linux-kernel.vger.kernel.org archive mirror
* [PATCH v3] sched/fair: sanitize vruntime of entity being placed
@ 2023-02-09 19:31 Roman Kagan
  2023-02-21  9:38 ` Vincent Guittot
  0 siblings, 1 reply; 14+ messages in thread
From: Roman Kagan @ 2023-02-09 19:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Valentin Schneider, Zhang Qiao, Ben Segall, Vincent Guittot,
	Waiman Long, Peter Zijlstra, Steven Rostedt, Mel Gorman,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Ingo Molnar,
	Juri Lelli

From: Zhang Qiao <zhangqiao22@huawei.com>

When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
to the base level (around cfs_rq->min_vruntime), so that the entity
doesn't gain extra boost when placed backwards.

However, if the entity being placed wasn't executed for a long time, its
vruntime may get too far behind (e.g. while cfs_rq was executing a
low-weight hog), which can invert the vruntime comparison due to s64
overflow.  This results in the entity being placed with its original
vruntime way forwards, so that it will effectively never get to the cpu.

To prevent that, ignore the vruntime of the entity being placed if it
didn't execute for longer than the time that can lead to an overflow.
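
For illustration only (not part of the patch): a minimal userspace sketch
of the wraparound, assuming the same signed-delta idiom that
max_vruntime() in kernel/sched/fair.c uses.  Once the two values drift
more than 2^63 apart, the comparison flips and the stale value wins:

#include <stdint.h>
#include <stdio.h>

/* Mirrors the signed-delta comparison of max_vruntime(). */
static uint64_t max_vruntime(uint64_t max_vruntime, uint64_t vruntime)
{
	int64_t delta = (int64_t)(vruntime - max_vruntime);

	if (delta > 0)
		max_vruntime = vruntime;
	return max_vruntime;
}

int main(void)
{
	uint64_t stale = 1000;				/* entity slept for ages */
	uint64_t base = stale + (1ULL << 63) + 1;	/* min_vruntime ran far ahead */

	/* Expected: the entity is pulled up to 'base'.  Actual: the delta
	 * wraps negative, the stale value is kept, and it now compares as
	 * being far ahead of every other entity on the cfs_rq. */
	printf("placed at %llu, base %llu\n",
	       (unsigned long long)max_vruntime(stale, base),
	       (unsigned long long)base);
	return 0;
}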

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
[rkagan: formatted, adjusted commit log, comments, cutoff value]
Co-developed-by: Roman Kagan <rkagan@amazon.de>
Signed-off-by: Roman Kagan <rkagan@amazon.de>
---
v2 -> v3:
- make cutoff less arbitrary and update comments [Vincent]

v1 -> v2:
- add Zhang Qiao's s-o-b
- fix constant promotion on 32bit

 kernel/sched/fair.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0f8736991427..3baa6b7ea860 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4656,6 +4656,7 @@ static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
 	u64 vruntime = cfs_rq->min_vruntime;
+	u64 sleep_time;
 
 	/*
 	 * The 'current' period is already promised to the current tasks,
@@ -4685,8 +4686,24 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 		vruntime -= thresh;
 	}
 
-	/* ensure we never gain time by being placed backwards. */
-	se->vruntime = max_vruntime(se->vruntime, vruntime);
+	/*
+	 * Pull vruntime of the entity being placed to the base level of
+	 * cfs_rq, to prevent boosting it if placed backwards.
+	 * However, min_vruntime can advance much faster than real time, with
+	 * the extreme being when an entity with the minimal weight always runs
+	 * on the cfs_rq.  If the new entity slept for long, its vruntime
+	 * difference from min_vruntime may overflow s64 and their comparison
+	 * may get inverted, so ignore the entity's original vruntime in that
+	 * case.
+	 * The maximal vruntime speedup is given by the ratio of normal to
+	 * minimal weight: NICE_0_LOAD / MIN_SHARES, so cutting off on the
+	 * sleep time of 2^63 / NICE_0_LOAD should be safe.
+	 */
+	sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
+	if ((s64)sleep_time > (1ULL << 63) / NICE_0_LOAD)
+		se->vruntime = vruntime;
+	else
+		se->vruntime = max_vruntime(se->vruntime, vruntime);
 }
 
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
-- 
2.34.1
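
For a rough sense of the magnitude of the cutoff in the hunk above
(back-of-the-envelope arithmetic, not part of the patch; NICE_0_LOAD is
1 << 20 on CONFIG_64BIT kernels and 1 << 10 on 32-bit ones via
scale_load() in kernel/sched/sched.h, as far as I can tell):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Assumed NICE_0_LOAD values, per the CONFIG_64BIT split above. */
	const uint64_t nice_0_load[] = { 1ULL << 20, 1ULL << 10 };

	for (int i = 0; i < 2; i++) {
		uint64_t cutoff_ns = (1ULL << 63) / nice_0_load[i];

		printf("NICE_0_LOAD=%llu: cutoff %llu ns (~%.1f hours)\n",
		       (unsigned long long)nice_0_load[i],
		       (unsigned long long)cutoff_ns,
		       cutoff_ns / 1e9 / 3600);
	}
	return 0;
}

That works out to roughly 2.4 hours of sleep on 64-bit configs and about
104 days on 32-bit ones before the entity's old vruntime is discarded.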





* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-02-09 19:31 [PATCH v3] sched/fair: sanitize vruntime of entity being placed Roman Kagan
@ 2023-02-21  9:38 ` Vincent Guittot
  2023-02-21 16:57   ` Roman Kagan
  0 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2023-02-21  9:38 UTC (permalink / raw)
  To: Roman Kagan
  Cc: linux-kernel, Valentin Schneider, Zhang Qiao, Ben Segall,
	Waiman Long, Peter Zijlstra, Steven Rostedt, Mel Gorman,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Ingo Molnar,
	Juri Lelli

On Thu, 9 Feb 2023 at 20:31, Roman Kagan <rkagan@amazon.de> wrote:
>
> From: Zhang Qiao <zhangqiao22@huawei.com>
>
> When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
> to the base level (around cfs_rq->min_vruntime), so that the entity
> doesn't gain extra boost when placed backwards.
>
> However, if the entity being placed wasn't executed for a long time, its
> vruntime may get too far behind (e.g. while cfs_rq was executing a
> low-weight hog), which can invert the vruntime comparison due to s64
> overflow.  This results in the entity being placed with its original
> vruntime way forwards, so that it will effectively never get to the cpu.
>
> To prevent that, ignore the vruntime of the entity being placed if it
> didn't execute for longer than the time that can lead to an overflow.
>
> Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
> [rkagan: formatted, adjusted commit log, comments, cutoff value]
> Co-developed-by: Roman Kagan <rkagan@amazon.de>
> Signed-off-by: Roman Kagan <rkagan@amazon.de>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
> v2 -> v3:
> - make cutoff less arbitrary and update comments [Vincent]
>
> v1 -> v2:
> - add Zhang Qiao's s-o-b
> - fix constant promotion on 32bit
>
>  kernel/sched/fair.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0f8736991427..3baa6b7ea860 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4656,6 +4656,7 @@ static void
>  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>  {
>         u64 vruntime = cfs_rq->min_vruntime;
> +       u64 sleep_time;
>
>         /*
>          * The 'current' period is already promised to the current tasks,
> @@ -4685,8 +4686,24 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>                 vruntime -= thresh;
>         }
>
> -       /* ensure we never gain time by being placed backwards. */
> -       se->vruntime = max_vruntime(se->vruntime, vruntime);
> +       /*
> +        * Pull vruntime of the entity being placed to the base level of
> +        * cfs_rq, to prevent boosting it if placed backwards.
> +        * However, min_vruntime can advance much faster than real time, with
> +        * the extreme being when an entity with the minimal weight always runs
> +        * on the cfs_rq.  If the new entity slept for long, its vruntime
> +        * difference from min_vruntime may overflow s64 and their comparison
> +        * may get inverted, so ignore the entity's original vruntime in that
> +        * case.
> +        * The maximal vruntime speedup is given by the ratio of normal to
> +        * minimal weight: NICE_0_LOAD / MIN_SHARES, so cutting off on the
> +        * sleep time of 2^63 / NICE_0_LOAD should be safe.
> +        */
> +       sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> +       if ((s64)sleep_time > (1ULL << 63) / NICE_0_LOAD)
> +               se->vruntime = vruntime;
> +       else
> +               se->vruntime = max_vruntime(se->vruntime, vruntime);
>  }
>
>  static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
> --
> 2.34.1
>


* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-02-21  9:38 ` Vincent Guittot
@ 2023-02-21 16:57   ` Roman Kagan
  2023-02-21 17:26     ` Vincent Guittot
  0 siblings, 1 reply; 14+ messages in thread
From: Roman Kagan @ 2023-02-21 16:57 UTC (permalink / raw)
  To: Vincent Guittot, Peter Zijlstra
  Cc: linux-kernel, Valentin Schneider, Zhang Qiao, Ben Segall,
	Waiman Long, Steven Rostedt, Mel Gorman, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Ingo Molnar, Juri Lelli

On Tue, Feb 21, 2023 at 10:38:44AM +0100, Vincent Guittot wrote:
> On Thu, 9 Feb 2023 at 20:31, Roman Kagan <rkagan@amazon.de> wrote:
> >
> > From: Zhang Qiao <zhangqiao22@huawei.com>
> >
> > When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
> > to the base level (around cfs_rq->min_vruntime), so that the entity
> > doesn't gain extra boost when placed backwards.
> >
> > However, if the entity being placed wasn't executed for a long time, its
> > vruntime may get too far behind (e.g. while cfs_rq was executing a
> > low-weight hog), which can invert the vruntime comparison due to s64
> > overflow.  This results in the entity being placed with its original
> > vruntime way forwards, so that it will effectively never get to the cpu.
> >
> > To prevent that, ignore the vruntime of the entity being placed if it
> > didn't execute for longer than the time that can lead to an overflow.
> >
> > Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
> > [rkagan: formatted, adjusted commit log, comments, cutoff value]
> > Co-developed-by: Roman Kagan <rkagan@amazon.de>
> > Signed-off-by: Roman Kagan <rkagan@amazon.de>
> 
> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> 
> > ---
> > v2 -> v3:
> > - make cutoff less arbitrary and update comments [Vincent]
> >
> > v1 -> v2:
> > - add Zhang Qiao's s-o-b
> > - fix constant promotion on 32bit
> >
> >  kernel/sched/fair.c | 21 +++++++++++++++++++--
> >  1 file changed, 19 insertions(+), 2 deletions(-)

Turns out Peter took v2 through his tree, and it has already landed in
Linus' master.

What scares me, though, is that I've got a message from the test robot
that this commit dramatically affected hackbench results; see the quote
below.  I expected the commit not to affect any benchmarks.

Any idea what could have caused this change?

Thanks,
Roman.


On Tue, Feb 21, 2023 at 03:34:16PM +0800, kernel test robot wrote:
> FYI, we noticed a 125.5% improvement of hackbench.throughput due to commit:
> 
> commit: 829c1651e9c4a6f78398d3e67651cef9bb6b42cc ("sched/fair: sanitize vruntime of entity being placed")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: hackbench
> on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
> with following parameters:
> 
>         nr_threads: 50%
>         iterations: 8
>         mode: process
>         ipc: pipe
>         cpufreq_governor: performance
> 
> test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
> test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c
> 
> In addition to that, the commit also has significant impact on the following tests:
> 
> +------------------+--------------------------------------------------+
> | testcase: change | hackbench: hackbench.throughput -8.1% regression |
> | test machine     | 104 threads 2 sockets (Skylake) with 192G memory |
> | test parameters  | cpufreq_governor=performance                     |
> |                  | ipc=socket                                       |
> |                  | iterations=4                                     |
> |                  | mode=process                                     |
> |                  | nr_threads=100%                                  |
> +------------------+--------------------------------------------------+
> 
> Details are as below:
> 
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
>   gcc-11/performance/pipe/8/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/hackbench
> 
> commit:
>   a2e90611b9 ("sched/fair: Remove capacity inversion detection")
>   829c1651e9 ("sched/fair: sanitize vruntime of entity being placed")
> 
> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>     308887 ±  5%    +125.5%     696539        hackbench.throughput
>     259291 ±  2%    +127.3%     589293        hackbench.throughput_avg
>     308887 ±  5%    +125.5%     696539        hackbench.throughput_best
>     198770 ±  2%    +105.5%     408552 ±  4%  hackbench.throughput_worst
>     319.60 ±  2%     -55.8%     141.24        hackbench.time.elapsed_time
>     319.60 ±  2%     -55.8%     141.24        hackbench.time.elapsed_time.max
>  1.298e+09 ±  8%     -87.6%  1.613e+08 ±  7%  hackbench.time.involuntary_context_switches
>     477107           -12.5%     417660        hackbench.time.minor_page_faults
>      24683 ±  2%     -57.2%      10562        hackbench.time.system_time
>       2136 ±  3%     -45.0%       1174        hackbench.time.user_time
>   3.21e+09 ±  4%     -83.0%  5.442e+08 ±  3%  hackbench.time.voluntary_context_switches
>   5.28e+08 ±  4%      +8.4%  5.723e+08 ±  3%  cpuidle..time
>     365.97 ±  2%     -48.9%     187.12        uptime.boot
>    3322559 ±  3%     +34.3%    4463206 ± 15%  vmstat.memory.cache
>   14194257 ±  2%     -62.8%    5279904 ±  3%  vmstat.system.cs
>    2120781 ±  3%     -72.8%     576421 ±  4%  vmstat.system.in
>       1.84 ± 12%      +2.6        4.48 ±  5%  mpstat.cpu.all.idle%
>       2.49 ±  3%      -1.1        1.39 ±  4%  mpstat.cpu.all.irq%
>       0.04 ± 12%      +0.0        0.05        mpstat.cpu.all.soft%
>       7.36            +2.2        9.56        mpstat.cpu.all.usr%
>      61555 ±  6%     -72.8%      16751 ± 16%  numa-meminfo.node1.Active
>      61515 ±  6%     -72.8%      16717 ± 16%  numa-meminfo.node1.Active(anon)
>     960182 ±102%    +225.6%    3125990 ± 42%  numa-meminfo.node1.FilePages
>    1754002 ± 53%    +137.9%    4173379 ± 34%  numa-meminfo.node1.MemUsed
>   35296824 ±  6%    +157.8%   91005048        numa-numastat.node0.local_node
>   35310119 ±  6%    +157.9%   91058472        numa-numastat.node0.numa_hit
>   35512423 ±  5%    +159.7%   92232951        numa-numastat.node1.local_node
>   35577275 ±  4%    +159.4%   92273266        numa-numastat.node1.numa_hit
>   35310253 ±  6%    +157.9%   91058211        numa-vmstat.node0.numa_hit
>   35296958 ±  6%    +157.8%   91004787        numa-vmstat.node0.numa_local
>      15337 ±  6%     -72.5%       4216 ± 17%  numa-vmstat.node1.nr_active_anon
>     239988 ±102%    +225.7%     781607 ± 42%  numa-vmstat.node1.nr_file_pages
>      15337 ±  6%     -72.5%       4216 ± 17%  numa-vmstat.node1.nr_zone_active_anon
>   35577325 ±  4%    +159.4%   92273215        numa-vmstat.node1.numa_hit
>   35512473 ±  5%    +159.7%   92232900        numa-vmstat.node1.numa_local
>      64500 ±  8%     -61.8%      24643 ± 32%  meminfo.Active
>      64422 ±  8%     -61.9%      24568 ± 32%  meminfo.Active(anon)
>     140271 ± 14%     -38.0%      86979 ± 24%  meminfo.AnonHugePages
>     372672 ±  2%     +13.3%     422069        meminfo.AnonPages
>    3205235 ±  3%     +35.1%    4329061 ± 15%  meminfo.Cached
>    1548601 ±  7%     +77.4%    2747319 ± 24%  meminfo.Committed_AS
>     783193 ± 14%    +154.9%    1996137 ± 33%  meminfo.Inactive
>     783010 ± 14%    +154.9%    1995951 ± 33%  meminfo.Inactive(anon)
>    4986534 ±  2%     +28.2%    6394741 ± 10%  meminfo.Memused
>     475092 ± 22%    +236.5%    1598918 ± 41%  meminfo.Shmem
>       2777            -2.1%       2719        turbostat.Bzy_MHz
>   11143123 ±  6%     +72.0%   19162667        turbostat.C1
>       0.24 ±  7%      +0.7        0.94 ±  3%  turbostat.C1%
>     100440 ± 18%    +203.8%     305136 ± 15%  turbostat.C1E
>       0.06 ±  9%      +0.1        0.18 ± 11%  turbostat.C1E%
>       1.24 ±  3%      +1.6        2.81 ±  4%  turbostat.C6%
>       1.38 ±  3%    +156.1%       3.55 ±  3%  turbostat.CPU%c1
>       0.33 ±  5%     +76.5%       0.58 ±  7%  turbostat.CPU%c6
>       0.16           +31.2%       0.21        turbostat.IPC
>  6.866e+08 ±  5%     -87.8%   83575393 ±  5%  turbostat.IRQ
>       0.33 ± 27%      +0.2        0.57        turbostat.POLL%
>       0.12 ± 10%    +176.4%       0.33 ± 12%  turbostat.Pkg%pc2
>       0.09 ±  7%    -100.0%       0.00        turbostat.Pkg%pc6
>      61.33            +5.2%      64.50 ±  2%  turbostat.PkgTmp
>      14.81            +2.0%      15.11        turbostat.RAMWatt
>      16242 ±  8%     -62.0%       6179 ± 32%  proc-vmstat.nr_active_anon
>      93150 ±  2%     +13.2%     105429        proc-vmstat.nr_anon_pages
>     801219 ±  3%     +35.1%    1082320 ± 15%  proc-vmstat.nr_file_pages
>     195506 ± 14%    +155.2%     498919 ± 33%  proc-vmstat.nr_inactive_anon
>     118682 ± 22%    +236.9%     399783 ± 41%  proc-vmstat.nr_shmem
>      16242 ±  8%     -62.0%       6179 ± 32%  proc-vmstat.nr_zone_active_anon
>     195506 ± 14%    +155.2%     498919 ± 33%  proc-vmstat.nr_zone_inactive_anon
>   70889233 ±  5%    +158.6%  1.833e+08        proc-vmstat.numa_hit
>   70811086 ±  5%    +158.8%  1.832e+08        proc-vmstat.numa_local
>      55885 ± 22%     -67.2%      18327 ± 38%  proc-vmstat.numa_pages_migrated
>     422312 ± 10%     -95.4%      19371 ±  7%  proc-vmstat.pgactivate
>   71068460 ±  5%    +158.1%  1.834e+08        proc-vmstat.pgalloc_normal
>    1554994           -19.6%    1250346 ±  4%  proc-vmstat.pgfault
>   71011267 ±  5%    +155.9%  1.817e+08        proc-vmstat.pgfree
>      55885 ± 22%     -67.2%      18327 ± 38%  proc-vmstat.pgmigrate_success
>     111247 ±  2%     -35.0%      72355 ±  2%  proc-vmstat.pgreuse
>    2506368 ±  2%     -53.1%    1176320        proc-vmstat.unevictable_pgs_scanned
>      20.06 ± 10%     -22.4%      15.56 ±  8%  sched_debug.cfs_rq:/.h_nr_running.max
>       0.81 ± 32%     -93.1%       0.06 ±223%  sched_debug.cfs_rq:/.h_nr_running.min
>       1917 ± 34%    -100.0%       0.00        sched_debug.cfs_rq:/.load.min
>      24.18 ± 10%     +39.0%      33.62 ± 11%  sched_debug.cfs_rq:/.load_avg.avg
>     245.61 ± 25%     +66.3%     408.33 ± 22%  sched_debug.cfs_rq:/.load_avg.max
>      47.52 ± 13%     +72.6%      82.03 ±  8%  sched_debug.cfs_rq:/.load_avg.stddev
>   13431147           -64.9%    4717147        sched_debug.cfs_rq:/.min_vruntime.avg
>   18161799 ±  7%     -67.4%    5925316 ±  6%  sched_debug.cfs_rq:/.min_vruntime.max
>   12413026           -65.0%    4340952        sched_debug.cfs_rq:/.min_vruntime.min
>     739748 ± 16%     -66.6%     247410 ± 17%  sched_debug.cfs_rq:/.min_vruntime.stddev
>       0.85           -16.4%       0.71        sched_debug.cfs_rq:/.nr_running.avg
>       0.61 ± 25%     -90.9%       0.06 ±223%  sched_debug.cfs_rq:/.nr_running.min
>       0.10 ± 25%    +109.3%       0.22 ±  7%  sched_debug.cfs_rq:/.nr_running.stddev
>     169.22          +101.7%     341.33        sched_debug.cfs_rq:/.removed.load_avg.max
>      32.41 ± 24%    +100.2%      64.90 ± 16%  sched_debug.cfs_rq:/.removed.load_avg.stddev
>      82.92 ± 10%    +108.1%     172.56        sched_debug.cfs_rq:/.removed.runnable_avg.max
>      13.60 ± 28%    +114.0%      29.10 ± 20%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
>      82.92 ± 10%    +108.1%     172.56        sched_debug.cfs_rq:/.removed.util_avg.max
>      13.60 ± 28%    +114.0%      29.10 ± 20%  sched_debug.cfs_rq:/.removed.util_avg.stddev
>       2156 ± 12%     -36.6%       1368 ± 27%  sched_debug.cfs_rq:/.runnable_avg.min
>       2285 ±  7%     -19.8%       1833 ±  6%  sched_debug.cfs_rq:/.runnable_avg.stddev
>   -2389921           -64.8%    -840940        sched_debug.cfs_rq:/.spread0.min
>     739781 ± 16%     -66.5%     247837 ± 17%  sched_debug.cfs_rq:/.spread0.stddev
>     843.88 ±  2%     -20.5%     670.53        sched_debug.cfs_rq:/.util_avg.avg
>     433.64 ±  7%     -43.5%     244.83 ± 17%  sched_debug.cfs_rq:/.util_avg.min
>     187.00 ±  6%     +40.6%     263.02 ±  4%  sched_debug.cfs_rq:/.util_avg.stddev
>     394.15 ± 14%     -29.5%     278.06 ±  3%  sched_debug.cfs_rq:/.util_est_enqueued.avg
>       1128 ± 12%     -17.6%     930.39 ±  5%  sched_debug.cfs_rq:/.util_est_enqueued.max
>      38.36 ± 29%    -100.0%       0.00        sched_debug.cfs_rq:/.util_est_enqueued.min
>       3596 ± 15%     -39.5%       2175 ±  7%  sched_debug.cpu.avg_idle.min
>     160647 ±  9%     -25.9%     118978 ±  9%  sched_debug.cpu.avg_idle.stddev
>     197365           -46.2%     106170        sched_debug.cpu.clock.avg
>     197450           -46.2%     106208        sched_debug.cpu.clock.max
>     197281           -46.2%     106128        sched_debug.cpu.clock.min
>      49.96 ± 22%     -53.1%      23.44 ± 19%  sched_debug.cpu.clock.stddev
>     193146           -45.7%     104898        sched_debug.cpu.clock_task.avg
>     194592           -45.8%     105455        sched_debug.cpu.clock_task.max
>     177878           -49.3%      90211        sched_debug.cpu.clock_task.min
>       1794 ±  5%     -10.7%       1602 ±  2%  sched_debug.cpu.clock_task.stddev
>      13154 ±  2%     -20.3%      10479        sched_debug.cpu.curr->pid.avg
>      15059           -17.2%      12468        sched_debug.cpu.curr->pid.max
>       7263 ± 33%    -100.0%       0.00        sched_debug.cpu.curr->pid.min
>       9321 ± 36%     +98.2%      18478 ± 44%  sched_debug.cpu.max_idle_balance_cost.stddev
>       0.00 ± 17%     -41.6%       0.00 ± 13%  sched_debug.cpu.next_balance.stddev
>      20.00 ± 11%     -21.4%      15.72 ±  7%  sched_debug.cpu.nr_running.max
>       0.86 ± 17%     -87.1%       0.11 ±141%  sched_debug.cpu.nr_running.min
>   25069883           -83.7%    4084117 ±  4%  sched_debug.cpu.nr_switches.avg
>   26486718           -82.8%    4544009 ±  4%  sched_debug.cpu.nr_switches.max
>   23680077           -84.5%    3663816 ±  4%  sched_debug.cpu.nr_switches.min
>     589836 ±  3%     -68.7%     184621 ± 16%  sched_debug.cpu.nr_switches.stddev
>     197278           -46.2%     106128        sched_debug.cpu_clk
>     194327           -46.9%     103176        sched_debug.ktime
>     197967           -46.0%     106821        sched_debug.sched_clk
>      14.91           -37.6%       9.31        perf-stat.i.MPKI
>  2.657e+10           +25.0%   3.32e+10        perf-stat.i.branch-instructions
>       1.17            -0.4        0.78        perf-stat.i.branch-miss-rate%
>  3.069e+08           -20.1%  2.454e+08        perf-stat.i.branch-misses
>       6.43 ±  8%      +2.2        8.59 ±  4%  perf-stat.i.cache-miss-rate%
>  1.952e+09           -24.3%  1.478e+09        perf-stat.i.cache-references
>   14344055 ±  2%     -58.6%    5932018 ±  3%  perf-stat.i.context-switches
>       1.83           -21.8%       1.43        perf-stat.i.cpi
>  2.403e+11            -3.4%  2.322e+11        perf-stat.i.cpu-cycles
>    1420139 ±  2%     -38.8%     869692 ±  5%  perf-stat.i.cpu-migrations
>       2619 ±  7%     -15.5%       2212 ±  8%  perf-stat.i.cycles-between-cache-misses
>       0.24 ± 19%      -0.1        0.10 ± 17%  perf-stat.i.dTLB-load-miss-rate%
>   90403286 ± 19%     -55.8%   39926283 ± 16%  perf-stat.i.dTLB-load-misses
>  3.823e+10           +28.6%  4.918e+10        perf-stat.i.dTLB-loads
>       0.01 ± 34%      -0.0        0.01 ± 33%  perf-stat.i.dTLB-store-miss-rate%
>    2779663 ± 34%     -52.7%    1315899 ± 31%  perf-stat.i.dTLB-store-misses
>   2.19e+10           +24.2%   2.72e+10        perf-stat.i.dTLB-stores
>      47.99 ±  2%     +28.0       75.94        perf-stat.i.iTLB-load-miss-rate%
>   89417955 ±  2%     +38.7%   1.24e+08 ±  4%  perf-stat.i.iTLB-load-misses
>   97721514 ±  2%     -58.2%   40865783 ±  3%  perf-stat.i.iTLB-loads
>  1.329e+11           +26.3%  1.678e+11        perf-stat.i.instructions
>       1503            -7.7%       1388 ±  3%  perf-stat.i.instructions-per-iTLB-miss
>       0.55           +30.2%       0.72        perf-stat.i.ipc
>       1.64 ± 18%    +217.4%       5.20 ± 11%  perf-stat.i.major-faults
>       2.73            -3.7%       2.63        perf-stat.i.metric.GHz
>       1098 ±  2%      -7.1%       1020 ±  3%  perf-stat.i.metric.K/sec
>       1008           +24.4%       1254        perf-stat.i.metric.M/sec
>       4334 ±  2%     +90.5%       8257 ±  7%  perf-stat.i.minor-faults
>      90.94           -14.9       75.99        perf-stat.i.node-load-miss-rate%
>   41932510 ±  8%     -43.0%   23899176 ± 10%  perf-stat.i.node-load-misses
>    3366677 ±  5%     +86.2%    6267816        perf-stat.i.node-loads
>      81.77 ±  3%     -36.3       45.52 ±  3%  perf-stat.i.node-store-miss-rate%
>   18498318 ±  7%     -31.8%   12613933 ±  7%  perf-stat.i.node-store-misses
>    3023556 ± 10%    +508.7%   18405880 ±  2%  perf-stat.i.node-stores
>       4336 ±  2%     +90.5%       8262 ±  7%  perf-stat.i.page-faults
>      14.70           -41.2%       8.65        perf-stat.overall.MPKI
>       1.16            -0.4        0.72        perf-stat.overall.branch-miss-rate%
>       6.22 ±  7%      +2.4        8.59 ±  4%  perf-stat.overall.cache-miss-rate%
>       1.81           -24.3%       1.37        perf-stat.overall.cpi
>       0.24 ± 19%      -0.2        0.07 ± 15%  perf-stat.overall.dTLB-load-miss-rate%
>       0.01 ± 34%      -0.0        0.00 ± 29%  perf-stat.overall.dTLB-store-miss-rate%
>      47.78 ±  2%     +29.3       77.12        perf-stat.overall.iTLB-load-miss-rate%
>       1486            -9.1%       1351 ±  4%  perf-stat.overall.instructions-per-iTLB-miss
>       0.55           +32.0%       0.73        perf-stat.overall.ipc
>      92.54           -15.4       77.16 ±  2%  perf-stat.overall.node-load-miss-rate%
>      85.82 ±  2%     -48.1       37.76 ±  5%  perf-stat.overall.node-store-miss-rate%
>  2.648e+10           +25.2%  3.314e+10        perf-stat.ps.branch-instructions
>   3.06e+08           -22.1%  2.383e+08        perf-stat.ps.branch-misses
>  1.947e+09           -25.5%  1.451e+09        perf-stat.ps.cache-references
>   14298713 ±  2%     -62.5%    5359285 ±  3%  perf-stat.ps.context-switches
>  2.396e+11            -4.0%  2.299e+11        perf-stat.ps.cpu-cycles
>    1415512 ±  2%     -42.2%     817981 ±  4%  perf-stat.ps.cpu-migrations
>   90073948 ± 19%     -60.4%   35711862 ± 15%  perf-stat.ps.dTLB-load-misses
>  3.811e+10           +29.7%  4.944e+10        perf-stat.ps.dTLB-loads
>    2767291 ± 34%     -56.3%    1210210 ± 29%  perf-stat.ps.dTLB-store-misses
>  2.183e+10           +25.0%  2.729e+10        perf-stat.ps.dTLB-stores
>   89118809 ±  2%     +39.6%  1.244e+08 ±  4%  perf-stat.ps.iTLB-load-misses
>   97404381 ±  2%     -62.2%   36860047 ±  3%  perf-stat.ps.iTLB-loads
>  1.324e+11           +26.7%  1.678e+11        perf-stat.ps.instructions
>       1.62 ± 18%    +164.7%       4.29 ±  8%  perf-stat.ps.major-faults
>       4310 ±  2%     +75.1%       7549 ±  5%  perf-stat.ps.minor-faults
>   41743097 ±  8%     -47.3%   21984450 ±  9%  perf-stat.ps.node-load-misses
>    3356259 ±  5%     +92.6%    6462631        perf-stat.ps.node-loads
>   18414647 ±  7%     -35.7%   11833799 ±  6%  perf-stat.ps.node-store-misses
>    3019790 ± 10%    +545.0%   19478071        perf-stat.ps.node-stores
>       4312 ±  2%     +75.2%       7553 ±  5%  perf-stat.ps.page-faults
>  4.252e+13           -43.7%  2.395e+13        perf-stat.total.instructions
>      29.92 ±  4%     -22.8        7.09 ± 29%  perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
>      28.53 ±  5%     -21.6        6.92 ± 29%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.vfs_write.ksys_write
>      27.86 ±  5%     -21.1        6.77 ± 29%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.vfs_write
>      27.55 ±  5%     -20.9        6.68 ± 29%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
>      22.28 ±  4%     -17.0        5.31 ± 30%  perf-profile.calltrace.cycles-pp.schedule.pipe_read.vfs_read.ksys_read.do_syscall_64
>      21.98 ±  4%     -16.7        5.24 ± 30%  perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_read.vfs_read.ksys_read
>      12.62 ±  4%      -9.6        3.00 ± 33%  perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
>      34.09            -9.2       24.92 ±  3%  perf-profile.calltrace.cycles-pp.pipe_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      11.48 ±  5%      -8.8        2.69 ± 38%  perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
>       9.60 ±  7%      -7.2        2.40 ± 35%  perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.pipe_read.vfs_read
>      36.39            -6.2       30.20        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
>      40.40            -6.1       34.28        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
>      40.95            -5.7       35.26        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
>      37.43            -5.4       32.07        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
>       6.30 ± 11%      -5.2        1.09 ± 36%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
>       5.66 ± 12%      -5.1        0.58 ± 75%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       6.46 ± 10%      -5.1        1.40 ± 28%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
>       5.53 ± 13%      -5.0        0.56 ± 75%  perf-profile.calltrace.cycles-pp.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>       5.42 ± 13%      -4.9        0.56 ± 75%  perf-profile.calltrace.cycles-pp.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
>       5.82 ±  9%      -4.7        1.10 ± 37%  perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
>       5.86 ± 16%      -4.6        1.31 ± 37%  perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
>       5.26 ±  9%      -4.4        0.89 ± 57%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common
>      45.18            -3.5       41.68        perf-profile.calltrace.cycles-pp.__libc_read
>      50.31            -3.2       47.12        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
>       4.00 ± 27%      -2.9        1.09 ± 40%  perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.pipe_read
>      50.75            -2.7       48.06        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write
>      40.80            -2.6       38.20        perf-profile.calltrace.cycles-pp.pipe_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       3.10 ± 15%      -2.5        0.62 ±103%  perf-profile.calltrace.cycles-pp.update_cfs_group.dequeue_task_fair.__schedule.schedule.pipe_read
>       2.94 ± 12%      -2.3        0.62 ±102%  perf-profile.calltrace.cycles-pp.update_cfs_group.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
>       2.38 ±  9%      -2.0        0.38 ±102%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__schedule.schedule.pipe_read.vfs_read
>       2.24 ±  7%      -1.8        0.40 ± 71%  perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_read.vfs_read.ksys_read.do_syscall_64
>       2.08 ±  6%      -1.8        0.29 ±100%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.pipe_read.vfs_read
>       2.10 ± 10%      -1.8        0.32 ±104%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__schedule.schedule.pipe_read
>       2.76 ±  7%      -1.5        1.24 ± 17%  perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
>       2.27 ±  5%      -1.4        0.88 ± 11%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
>       2.43 ±  7%      -1.3        1.16 ± 17%  perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common
>       2.46 ±  5%      -1.3        1.20 ±  7%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
>       1.54 ±  5%      -1.2        0.32 ±101%  perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
>       0.97 ±  9%      -0.3        0.66 ± 19%  perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function
>       0.86 ±  6%      +0.2        1.02        perf-profile.calltrace.cycles-pp.__might_fault._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write
>       0.64 ±  9%      +0.5        1.16 ±  5%  perf-profile.calltrace.cycles-pp.mutex_unlock.pipe_read.vfs_read.ksys_read.do_syscall_64
>       0.47 ± 45%      +0.5        0.99 ±  5%  perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.60 ±  8%      +0.5        1.13 ±  5%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
>       0.00            +0.5        0.54 ±  5%  perf-profile.calltrace.cycles-pp.current_time.file_update_time.pipe_write.vfs_write.ksys_write
>       0.00            +0.6        0.56 ±  4%  perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_from_iter.copy_page_from_iter.pipe_write
>       0.00            +0.6        0.56 ±  7%  perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_to_iter.copy_page_to_iter.pipe_read
>       0.00            +0.6        0.58 ±  5%  perf-profile.calltrace.cycles-pp.__might_resched.mutex_lock.pipe_write.vfs_write.ksys_write
>       0.00            +0.6        0.62 ±  3%  perf-profile.calltrace.cycles-pp.__might_resched.mutex_lock.pipe_read.vfs_read.ksys_read
>       0.00            +0.7        0.65 ±  6%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.prepare_to_wait_event.pipe_write.vfs_write
>       0.00            +0.7        0.65 ±  7%  perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       0.57 ±  5%      +0.7        1.24 ±  6%  perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.7        0.72 ±  6%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.prepare_to_wait_event.pipe_write.vfs_write.ksys_write
>       0.00            +0.8        0.75 ±  6%  perf-profile.calltrace.cycles-pp.mutex_spin_on_owner.__mutex_lock.pipe_write.vfs_write.ksys_write
>       0.74 ±  9%      +0.8        1.48 ±  5%  perf-profile.calltrace.cycles-pp.file_update_time.pipe_write.vfs_write.ksys_write.do_syscall_64
>       0.63 ±  5%      +0.8        1.40 ±  5%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
>       0.00            +0.8        0.78 ± 19%  perf-profile.calltrace.cycles-pp.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
>       0.00            +0.8        0.78 ± 19%  perf-profile.calltrace.cycles-pp.record__finish_output.__cmd_record
>       0.00            +0.8        0.78 ± 19%  perf-profile.calltrace.cycles-pp.perf_session__process_events.record__finish_output.__cmd_record
>       0.00            +0.8        0.80 ± 15%  perf-profile.calltrace.cycles-pp.__cmd_record
>       0.00            +0.8        0.82 ± 11%  perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
>       0.00            +0.9        0.85 ±  6%  perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_write.vfs_write.ksys_write.do_syscall_64
>       0.00            +0.9        0.86 ±  4%  perf-profile.calltrace.cycles-pp.current_time.atime_needs_update.touch_atime.pipe_read.vfs_read
>       0.00            +0.9        0.87 ±  5%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_write
>       0.00            +0.9        0.88 ±  5%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_read
>       0.26 ±100%      +1.0        1.22 ± 10%  perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_write.vfs_write.ksys_write
>       0.00            +1.0        0.96 ±  6%  perf-profile.calltrace.cycles-pp.__might_fault._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read
>       0.27 ±100%      +1.0        1.23 ± 10%  perf-profile.calltrace.cycles-pp.schedule.pipe_write.vfs_write.ksys_write.do_syscall_64
>       0.00            +1.0        0.97 ±  7%  perf-profile.calltrace.cycles-pp.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge.__folio_put.pipe_read
>       0.87 ±  8%      +1.1        1.98 ±  5%  perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
>       0.73 ±  6%      +1.1        1.85 ±  5%  perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_write.ksys_write.do_syscall_64
>       0.00            +1.2        1.15 ±  7%  perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge.__folio_put.pipe_read.vfs_read
>       0.00            +1.2        1.23 ±  6%  perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge.__folio_put.pipe_read.vfs_read.ksys_read
>       0.00            +1.2        1.24 ±  7%  perf-profile.calltrace.cycles-pp.__folio_put.pipe_read.vfs_read.ksys_read.do_syscall_64
>       0.48 ± 45%      +1.3        1.74 ±  6%  perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.pipe_read.vfs_read.ksys_read
>       0.60 ±  7%      +1.3        1.87 ±  8%  perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_read.vfs_read.ksys_read.do_syscall_64
>       1.23 ±  7%      +1.3        2.51 ±  4%  perf-profile.calltrace.cycles-pp.mutex_lock.pipe_read.vfs_read.ksys_read.do_syscall_64
>      43.42            +1.3       44.75        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
>       0.83 ±  7%      +1.3        2.17 ±  5%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.98 ±  7%      +1.4        2.36 ±  6%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.27 ±100%      +1.4        1.70 ±  9%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_read.vfs_read.ksys_read
>       0.79 ±  8%      +1.4        2.23 ±  6%  perf-profile.calltrace.cycles-pp.touch_atime.pipe_read.vfs_read.ksys_read.do_syscall_64
>       0.18 ±141%      +1.5        1.63 ±  9%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read
>       0.18 ±141%      +1.5        1.67 ±  9%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read.vfs_read
>       0.00            +1.6        1.57 ± 10%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
>       0.00            +1.6        1.57 ± 10%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
>       1.05 ±  8%      +1.7        2.73 ±  6%  perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin._copy_from_iter.copy_page_from_iter.pipe_write
>       1.84 ±  9%      +1.7        3.56 ±  5%  perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.copy_page_to_iter.pipe_read
>       1.41 ±  9%      +1.8        3.17 ±  6%  perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write
>       0.00            +1.8        1.79 ±  9%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       1.99 ±  9%      +2.0        3.95 ±  5%  perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read
>       2.40 ±  7%      +2.4        4.82 ±  5%  perf-profile.calltrace.cycles-pp._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write.ksys_write
>       0.00            +2.5        2.50 ±  7%  perf-profile.calltrace.cycles-pp.__mutex_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
>       2.89 ±  8%      +2.6        5.47 ±  5%  perf-profile.calltrace.cycles-pp.copy_page_from_iter.pipe_write.vfs_write.ksys_write.do_syscall_64
>       1.04 ± 30%      +2.8        3.86 ±  5%  perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_write
>       0.00            +2.9        2.90 ± 11%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       0.00            +2.9        2.91 ± 11%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       0.00            +2.9        2.91 ± 11%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
>       0.85 ± 27%      +2.9        3.80 ±  5%  perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_read
>       0.00            +3.0        2.96 ± 11%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
>       2.60 ±  9%      +3.1        5.74 ±  6%  perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read.ksys_read
>       2.93 ±  9%      +3.7        6.66 ±  5%  perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.vfs_read.ksys_read.do_syscall_64
>       1.60 ± 12%      +4.6        6.18 ±  7%  perf-profile.calltrace.cycles-pp.mutex_unlock.pipe_write.vfs_write.ksys_write.do_syscall_64
>       2.60 ± 10%      +4.6        7.24 ±  5%  perf-profile.calltrace.cycles-pp.mutex_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
>      28.75 ±  5%     -21.6        7.19 ± 28%  perf-profile.children.cycles-pp.schedule
>      30.52 ±  4%     -21.6        8.97 ± 22%  perf-profile.children.cycles-pp.__wake_up_common_lock
>      28.53 ±  6%     -21.0        7.56 ± 26%  perf-profile.children.cycles-pp.__schedule
>      29.04 ±  5%     -20.4        8.63 ± 23%  perf-profile.children.cycles-pp.__wake_up_common
>      28.37 ±  5%     -19.9        8.44 ± 23%  perf-profile.children.cycles-pp.autoremove_wake_function
>      28.08 ±  5%     -19.7        8.33 ± 23%  perf-profile.children.cycles-pp.try_to_wake_up
>      13.90 ±  2%     -10.2        3.75 ± 28%  perf-profile.children.cycles-pp.ttwu_do_activate
>      12.66 ±  3%      -9.2        3.47 ± 29%  perf-profile.children.cycles-pp.enqueue_task_fair
>      34.20            -9.2       25.05 ±  3%  perf-profile.children.cycles-pp.pipe_read
>      90.86            -9.1       81.73        perf-profile.children.cycles-pp.do_syscall_64
>      91.80            -8.3       83.49        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      10.28 ±  7%      -7.8        2.53 ± 27%  perf-profile.children.cycles-pp._raw_spin_lock
>       9.85 ±  7%      -6.9        2.92 ± 29%  perf-profile.children.cycles-pp.dequeue_task_fair
>       8.69 ±  7%      -6.6        2.05 ± 24%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>       8.99 ±  6%      -6.2        2.81 ± 16%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>      36.46            -6.1       30.34        perf-profile.children.cycles-pp.vfs_read
>       8.38 ±  8%      -5.8        2.60 ± 23%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>       6.10 ± 11%      -5.4        0.66 ± 61%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
>      37.45            -5.3       32.13        perf-profile.children.cycles-pp.ksys_read
>       6.50 ± 35%      -4.9        1.62 ± 61%  perf-profile.children.cycles-pp.update_curr
>       6.56 ± 15%      -4.6        1.95 ± 57%  perf-profile.children.cycles-pp.update_cfs_group
>       6.38 ± 14%      -4.5        1.91 ± 28%  perf-profile.children.cycles-pp.enqueue_entity
>       5.74 ±  5%      -3.8        1.92 ± 25%  perf-profile.children.cycles-pp.update_load_avg
>      45.56            -3.8       41.75        perf-profile.children.cycles-pp.__libc_read
>       3.99 ±  4%      -3.1        0.92 ± 24%  perf-profile.children.cycles-pp.pick_next_task_fair
>       4.12 ± 27%      -2.7        1.39 ± 34%  perf-profile.children.cycles-pp.dequeue_entity
>      40.88            -2.5       38.37        perf-profile.children.cycles-pp.pipe_write
>       3.11 ±  4%      -2.4        0.75 ± 22%  perf-profile.children.cycles-pp.switch_mm_irqs_off
>       2.06 ± 33%      -1.8        0.27 ± 27%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
>       2.38 ± 41%      -1.8        0.60 ± 72%  perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
>       2.29 ±  5%      -1.7        0.60 ± 25%  perf-profile.children.cycles-pp.switch_fpu_return
>       2.30 ±  6%      -1.6        0.68 ± 18%  perf-profile.children.cycles-pp.prepare_task_switch
>       1.82 ± 33%      -1.6        0.22 ± 31%  perf-profile.children.cycles-pp.sysvec_call_function_single
>       1.77 ± 33%      -1.6        0.20 ± 32%  perf-profile.children.cycles-pp.__sysvec_call_function_single
>       1.96 ±  5%      -1.5        0.50 ± 20%  perf-profile.children.cycles-pp.reweight_entity
>       2.80 ±  7%      -1.2        1.60 ± 12%  perf-profile.children.cycles-pp.select_task_rq
>       1.61 ±  6%      -1.2        0.42 ± 25%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
>       1.34 ±  9%      -1.2        0.16 ± 28%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
>       1.62 ±  4%      -1.2        0.45 ± 22%  perf-profile.children.cycles-pp.set_next_entity
>       1.55 ±  8%      -1.1        0.43 ± 12%  perf-profile.children.cycles-pp.update_rq_clock
>       1.49 ±  8%      -1.1        0.41 ± 14%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
>       1.30 ± 20%      -1.0        0.26 ± 18%  perf-profile.children.cycles-pp.finish_task_switch
>       1.44 ±  5%      -1.0        0.42 ± 19%  perf-profile.children.cycles-pp.__switch_to_asm
>       2.47 ±  7%      -1.0        1.50 ± 12%  perf-profile.children.cycles-pp.select_task_rq_fair
>       2.33 ±  7%      -0.9        1.40 ±  3%  perf-profile.children.cycles-pp.prepare_to_wait_event
>       1.24 ±  7%      -0.9        0.35 ± 14%  perf-profile.children.cycles-pp.__update_load_avg_se
>       1.41 ± 32%      -0.9        0.56 ± 24%  perf-profile.children.cycles-pp.sched_ttwu_pending
>       2.29 ±  8%      -0.8        1.45 ±  3%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       1.04 ±  7%      -0.8        0.24 ± 22%  perf-profile.children.cycles-pp.check_preempt_curr
>       1.01 ±  3%      -0.7        0.30 ± 20%  perf-profile.children.cycles-pp.__switch_to
>       0.92 ±  7%      -0.7        0.26 ± 12%  perf-profile.children.cycles-pp.update_min_vruntime
>       0.71 ±  2%      -0.6        0.08 ± 75%  perf-profile.children.cycles-pp.put_prev_entity
>       0.76 ±  6%      -0.6        0.14 ± 32%  perf-profile.children.cycles-pp.check_preempt_wakeup
>       0.81 ± 66%      -0.6        0.22 ± 34%  perf-profile.children.cycles-pp.set_task_cpu
>       0.82 ± 17%      -0.6        0.23 ± 10%  perf-profile.children.cycles-pp.cpuacct_charge
>       1.08 ± 15%      -0.6        0.51 ± 10%  perf-profile.children.cycles-pp.wake_affine
>       0.56 ± 15%      -0.5        0.03 ±100%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
>       0.66 ±  3%      -0.5        0.15 ± 28%  perf-profile.children.cycles-pp.os_xsave
>       0.52 ± 44%      -0.5        0.06 ±151%  perf-profile.children.cycles-pp.native_irq_return_iret
>       0.55 ±  5%      -0.4        0.15 ± 21%  perf-profile.children.cycles-pp.__calc_delta
>       0.56 ± 10%      -0.4        0.17 ± 26%  perf-profile.children.cycles-pp.___perf_sw_event
>       0.70 ± 15%      -0.4        0.32 ± 11%  perf-profile.children.cycles-pp.task_h_load
>       0.40 ±  4%      -0.3        0.06 ± 49%  perf-profile.children.cycles-pp.pick_next_entity
>       0.57 ±  6%      -0.3        0.26 ±  7%  perf-profile.children.cycles-pp.__list_del_entry_valid
>       0.39 ±  8%      -0.3        0.08 ± 24%  perf-profile.children.cycles-pp.set_next_buddy
>       0.64 ±  6%      -0.3        0.36 ±  6%  perf-profile.children.cycles-pp._raw_spin_lock_irq
>       0.53 ± 20%      -0.3        0.25 ±  8%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
>       0.36 ±  8%      -0.3        0.08 ± 11%  perf-profile.children.cycles-pp.rb_insert_color
>       0.41 ±  6%      -0.3        0.14 ± 17%  perf-profile.children.cycles-pp.sched_clock_cpu
>       0.36 ± 33%      -0.3        0.10 ± 17%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
>       0.37 ±  4%      -0.2        0.13 ± 16%  perf-profile.children.cycles-pp.native_sched_clock
>       0.28 ±  5%      -0.2        0.07 ± 18%  perf-profile.children.cycles-pp.rb_erase
>       0.32 ±  7%      -0.2        0.12 ± 10%  perf-profile.children.cycles-pp.__list_add_valid
>       0.23 ±  6%      -0.2        0.03 ±103%  perf-profile.children.cycles-pp.resched_curr
>       0.27 ±  5%      -0.2        0.08 ± 20%  perf-profile.children.cycles-pp.__wrgsbase_inactive
>       0.26 ±  6%      -0.2        0.08 ± 17%  perf-profile.children.cycles-pp.finish_wait
>       0.26 ±  4%      -0.2        0.08 ± 11%  perf-profile.children.cycles-pp.rcu_note_context_switch
>       0.33 ± 21%      -0.2        0.15 ± 32%  perf-profile.children.cycles-pp.migrate_task_rq_fair
>       0.22 ±  9%      -0.2        0.07 ± 22%  perf-profile.children.cycles-pp.perf_trace_buf_update
>       0.17 ±  8%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.rb_next
>       0.15 ± 32%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.llist_reverse_order
>       0.34 ±  7%      -0.1        0.26 ±  3%  perf-profile.children.cycles-pp.anon_pipe_buf_release
>       0.14 ±  6%      -0.1        0.07 ± 17%  perf-profile.children.cycles-pp.read@plt
>       0.10 ± 17%      -0.1        0.04 ± 75%  perf-profile.children.cycles-pp.remove_entity_load_avg
>       0.07 ± 10%      -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.generic_update_time
>       0.11 ±  6%      -0.0        0.07 ±  8%  perf-profile.children.cycles-pp.__mark_inode_dirty
>       0.00            +0.1        0.06 ±  9%  perf-profile.children.cycles-pp.load_balance
>       0.00            +0.1        0.06 ± 11%  perf-profile.children.cycles-pp._raw_spin_trylock
>       0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.uncharge_folio
>       0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.__do_softirq
>       0.00            +0.1        0.07 ± 10%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
>       0.00            +0.1        0.08 ± 14%  perf-profile.children.cycles-pp.__get_obj_cgroup_from_memcg
>       0.15 ± 23%      +0.1        0.23 ±  7%  perf-profile.children.cycles-pp.task_tick_fair
>       0.19 ± 17%      +0.1        0.28 ±  7%  perf-profile.children.cycles-pp.scheduler_tick
>       0.00            +0.1        0.10 ± 21%  perf-profile.children.cycles-pp.select_idle_core
>       0.00            +0.1        0.10 ±  9%  perf-profile.children.cycles-pp.osq_unlock
>       0.23 ± 12%      +0.1        0.34 ±  6%  perf-profile.children.cycles-pp.update_process_times
>       0.37 ± 13%      +0.1        0.48 ±  5%  perf-profile.children.cycles-pp.hrtimer_interrupt
>       0.24 ± 12%      +0.1        0.35 ±  6%  perf-profile.children.cycles-pp.tick_sched_handle
>       0.31 ± 14%      +0.1        0.43 ±  4%  perf-profile.children.cycles-pp.__hrtimer_run_queues
>       0.37 ± 12%      +0.1        0.49 ±  5%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>       0.00            +0.1        0.12 ± 10%  perf-profile.children.cycles-pp.__mod_memcg_state
>       0.26 ± 10%      +0.1        0.38 ±  6%  perf-profile.children.cycles-pp.tick_sched_timer
>       0.00            +0.1        0.13 ±  7%  perf-profile.children.cycles-pp.free_unref_page
>       0.00            +0.1        0.14 ±  8%  perf-profile.children.cycles-pp.rmqueue
>       0.15 ±  8%      +0.2        0.30 ±  5%  perf-profile.children.cycles-pp.rcu_all_qs
>       0.16 ±  6%      +0.2        0.31 ±  5%  perf-profile.children.cycles-pp.__x64_sys_write
>       0.00            +0.2        0.16 ± 10%  perf-profile.children.cycles-pp.propagate_protected_usage
>       0.00            +0.2        0.16 ± 10%  perf-profile.children.cycles-pp.menu_select
>       0.00            +0.2        0.16 ±  9%  perf-profile.children.cycles-pp.memcg_account_kmem
>       0.42 ± 12%      +0.2        0.57 ±  4%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>       0.15 ± 11%      +0.2        0.31 ±  8%  perf-profile.children.cycles-pp.__x64_sys_read
>       0.00            +0.2        0.17 ±  8%  perf-profile.children.cycles-pp.get_page_from_freelist
>       0.44 ± 11%      +0.2        0.62 ±  4%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>       0.10 ± 31%      +0.2        0.28 ± 24%  perf-profile.children.cycles-pp.mnt_user_ns
>       0.16 ±  4%      +0.2        0.35 ±  5%  perf-profile.children.cycles-pp.kill_fasync
>       0.20 ± 10%      +0.2        0.40 ±  3%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
>       0.09 ±  7%      +0.2        0.29 ±  4%  perf-profile.children.cycles-pp.page_copy_sane
>       0.08 ±  8%      +0.2        0.31 ±  6%  perf-profile.children.cycles-pp.rw_verify_area
>       0.12 ± 11%      +0.2        0.36 ±  8%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
>       0.28 ± 12%      +0.2        0.52 ±  5%  perf-profile.children.cycles-pp.inode_needs_update_time
>       0.00            +0.3        0.27 ±  7%  perf-profile.children.cycles-pp.__memcg_kmem_charge_page
>       0.43 ±  6%      +0.3        0.73 ±  5%  perf-profile.children.cycles-pp.__cond_resched
>       0.21 ± 29%      +0.3        0.54 ± 15%  perf-profile.children.cycles-pp.select_idle_cpu
>       0.10 ± 10%      +0.3        0.43 ± 17%  perf-profile.children.cycles-pp.fsnotify_perm
>       0.23 ± 11%      +0.3        0.56 ±  6%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
>       0.06 ± 75%      +0.4        0.47 ± 27%  perf-profile.children.cycles-pp.queue_event
>       0.21 ±  9%      +0.4        0.62 ±  5%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
>       0.06 ± 75%      +0.4        0.48 ± 26%  perf-profile.children.cycles-pp.ordered_events__queue
>       0.06 ± 73%      +0.4        0.50 ± 24%  perf-profile.children.cycles-pp.process_simple
>       0.01 ±223%      +0.4        0.44 ±  9%  perf-profile.children.cycles-pp.schedule_idle
>       0.05 ±  8%      +0.5        0.52 ±  7%  perf-profile.children.cycles-pp.__alloc_pages
>       0.45 ±  7%      +0.5        0.94 ±  5%  perf-profile.children.cycles-pp.__get_task_ioprio
>       0.89 ±  8%      +0.5        1.41 ±  4%  perf-profile.children.cycles-pp.__might_sleep
>       0.01 ±223%      +0.5        0.54 ± 21%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
>       0.05 ± 46%      +0.5        0.60 ±  7%  perf-profile.children.cycles-pp.osq_lock
>       0.34 ±  8%      +0.6        0.90 ±  5%  perf-profile.children.cycles-pp.aa_file_perm
>       0.01 ±223%      +0.7        0.67 ±  7%  perf-profile.children.cycles-pp.poll_idle
>       0.14 ± 17%      +0.7        0.82 ±  6%  perf-profile.children.cycles-pp.mutex_spin_on_owner
>       0.12 ± 12%      +0.7        0.82 ± 15%  perf-profile.children.cycles-pp.__cmd_record
>       0.07 ± 72%      +0.7        0.78 ± 19%  perf-profile.children.cycles-pp.reader__read_event
>       0.07 ± 72%      +0.7        0.78 ± 19%  perf-profile.children.cycles-pp.record__finish_output
>       0.07 ± 72%      +0.7        0.78 ± 19%  perf-profile.children.cycles-pp.perf_session__process_events
>       0.76 ±  8%      +0.8        1.52 ±  5%  perf-profile.children.cycles-pp.file_update_time
>       0.08 ± 61%      +0.8        0.85 ± 11%  perf-profile.children.cycles-pp.intel_idle_irq
>       1.23 ±  8%      +0.9        2.11 ±  4%  perf-profile.children.cycles-pp.__might_fault
>       0.02 ±141%      +1.0        0.97 ±  7%  perf-profile.children.cycles-pp.page_counter_uncharge
>       0.51 ±  9%      +1.0        1.48 ±  4%  perf-profile.children.cycles-pp.current_time
>       0.05 ± 46%      +1.1        1.15 ±  7%  perf-profile.children.cycles-pp.uncharge_batch
>       1.12 ±  6%      +1.1        2.23 ±  5%  perf-profile.children.cycles-pp.__fget_light
>       0.06 ± 14%      +1.2        1.23 ±  6%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge
>       0.06 ± 14%      +1.2        1.24 ±  7%  perf-profile.children.cycles-pp.__folio_put
>       0.64 ±  7%      +1.2        1.83 ±  5%  perf-profile.children.cycles-pp.syscall_return_via_sysret
>       1.19 ±  8%      +1.2        2.42 ±  4%  perf-profile.children.cycles-pp.__might_resched
>       0.59 ±  9%      +1.3        1.84 ±  6%  perf-profile.children.cycles-pp.atime_needs_update
>      43.47            +1.4       44.83        perf-profile.children.cycles-pp.ksys_write
>       1.28 ±  6%      +1.4        2.68 ±  5%  perf-profile.children.cycles-pp.__fdget_pos
>       0.80 ±  8%      +1.5        2.28 ±  6%  perf-profile.children.cycles-pp.touch_atime
>       0.11 ± 49%      +1.5        1.59 ±  9%  perf-profile.children.cycles-pp.cpuidle_enter_state
>       0.11 ± 49%      +1.5        1.60 ±  9%  perf-profile.children.cycles-pp.cpuidle_enter
>       0.12 ± 51%      +1.7        1.81 ±  9%  perf-profile.children.cycles-pp.cpuidle_idle_call
>       1.44 ±  8%      +1.8        3.22 ±  6%  perf-profile.children.cycles-pp.copyin
>       2.00 ±  9%      +2.0        4.03 ±  5%  perf-profile.children.cycles-pp.copyout
>       1.02 ±  8%      +2.0        3.07 ±  5%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       1.63 ±  7%      +2.3        3.90 ±  5%  perf-profile.children.cycles-pp.apparmor_file_permission
>       2.64 ±  8%      +2.3        4.98 ±  5%  perf-profile.children.cycles-pp._copy_from_iter
>       0.40 ± 14%      +2.5        2.92 ±  7%  perf-profile.children.cycles-pp.__mutex_lock
>       2.91 ±  8%      +2.6        5.54 ±  5%  perf-profile.children.cycles-pp.copy_page_from_iter
>       0.17 ± 62%      +2.7        2.91 ± 11%  perf-profile.children.cycles-pp.start_secondary
>       1.83 ±  7%      +2.8        4.59 ±  5%  perf-profile.children.cycles-pp.security_file_permission
>       0.17 ± 60%      +2.8        2.94 ± 11%  perf-profile.children.cycles-pp.do_idle
>       0.17 ± 60%      +2.8        2.96 ± 11%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
>       0.17 ± 60%      +2.8        2.96 ± 11%  perf-profile.children.cycles-pp.cpu_startup_entry
>       2.62 ±  9%      +3.2        5.84 ±  6%  perf-profile.children.cycles-pp._copy_to_iter
>       1.55 ±  8%      +3.2        4.79 ±  5%  perf-profile.children.cycles-pp.__entry_text_start
>       3.09 ±  8%      +3.7        6.77 ±  5%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
>       2.95 ±  9%      +3.8        6.73 ±  5%  perf-profile.children.cycles-pp.copy_page_to_iter
>       2.28 ± 11%      +5.1        7.40 ±  6%  perf-profile.children.cycles-pp.mutex_unlock
>       3.92 ±  9%      +6.0        9.94 ±  5%  perf-profile.children.cycles-pp.mutex_lock
>       8.37 ±  9%      -5.8        2.60 ± 23%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>       6.54 ± 15%      -4.6        1.95 ± 57%  perf-profile.self.cycles-pp.update_cfs_group
>       3.08 ±  4%      -2.3        0.74 ± 22%  perf-profile.self.cycles-pp.switch_mm_irqs_off
>       2.96 ±  4%      -1.8        1.13 ± 33%  perf-profile.self.cycles-pp.update_load_avg
>       2.22 ±  8%      -1.5        0.74 ± 12%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       1.96 ±  9%      -1.5        0.48 ± 15%  perf-profile.self.cycles-pp.update_curr
>       1.94 ±  5%      -1.3        0.64 ± 16%  perf-profile.self.cycles-pp._raw_spin_lock
>       1.78 ±  5%      -1.3        0.50 ± 18%  perf-profile.self.cycles-pp.__schedule
>       1.59 ±  7%      -1.2        0.40 ± 12%  perf-profile.self.cycles-pp.enqueue_entity
>       1.61 ±  6%      -1.2        0.42 ± 25%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
>       1.44 ±  8%      -1.0        0.39 ± 14%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
>       1.42 ±  5%      -1.0        0.41 ± 19%  perf-profile.self.cycles-pp.__switch_to_asm
>       1.18 ±  7%      -0.9        0.33 ± 14%  perf-profile.self.cycles-pp.__update_load_avg_se
>       1.14 ± 10%      -0.8        0.31 ±  9%  perf-profile.self.cycles-pp.update_rq_clock
>       0.90 ±  7%      -0.7        0.19 ± 21%  perf-profile.self.cycles-pp.pick_next_task_fair
>       1.04 ±  7%      -0.7        0.33 ± 13%  perf-profile.self.cycles-pp.prepare_task_switch
>       0.98 ±  4%      -0.7        0.29 ± 20%  perf-profile.self.cycles-pp.__switch_to
>       0.88 ±  6%      -0.7        0.20 ± 17%  perf-profile.self.cycles-pp.enqueue_task_fair
>       1.01 ±  6%      -0.7        0.35 ± 10%  perf-profile.self.cycles-pp.prepare_to_wait_event
>       0.90 ±  8%      -0.6        0.25 ± 12%  perf-profile.self.cycles-pp.update_min_vruntime
>       0.79 ± 17%      -0.6        0.22 ±  9%  perf-profile.self.cycles-pp.cpuacct_charge
>       1.10 ±  5%      -0.6        0.54 ±  9%  perf-profile.self.cycles-pp.try_to_wake_up
>       0.66 ±  3%      -0.5        0.15 ± 27%  perf-profile.self.cycles-pp.os_xsave
>       0.71 ±  6%      -0.5        0.22 ± 18%  perf-profile.self.cycles-pp.reweight_entity
>       0.68 ±  9%      -0.5        0.19 ± 10%  perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
>       0.67 ±  9%      -0.5        0.18 ± 11%  perf-profile.self.cycles-pp.__wake_up_common
>       0.65 ±  6%      -0.5        0.17 ± 23%  perf-profile.self.cycles-pp.switch_fpu_return
>       0.60 ± 11%      -0.5        0.14 ± 28%  perf-profile.self.cycles-pp.perf_tp_event
>       0.52 ± 44%      -0.5        0.06 ±151%  perf-profile.self.cycles-pp.native_irq_return_iret
>       0.52 ±  7%      -0.4        0.08 ± 25%  perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
>       0.55 ±  4%      -0.4        0.15 ± 22%  perf-profile.self.cycles-pp.__calc_delta
>       0.61 ±  5%      -0.4        0.21 ± 12%  perf-profile.self.cycles-pp.dequeue_task_fair
>       0.69 ± 14%      -0.4        0.32 ± 11%  perf-profile.self.cycles-pp.task_h_load
>       0.49 ± 11%      -0.3        0.15 ± 29%  perf-profile.self.cycles-pp.___perf_sw_event
>       0.37 ±  4%      -0.3        0.05 ± 73%  perf-profile.self.cycles-pp.pick_next_entity
>       0.50 ±  3%      -0.3        0.19 ± 15%  perf-profile.self.cycles-pp.select_idle_sibling
>       0.38 ±  9%      -0.3        0.08 ± 24%  perf-profile.self.cycles-pp.set_next_buddy
>       0.32 ±  4%      -0.3        0.03 ±100%  perf-profile.self.cycles-pp.put_prev_entity
>       0.64 ±  6%      -0.3        0.35 ±  7%  perf-profile.self.cycles-pp._raw_spin_lock_irq
>       0.52 ±  5%      -0.3        0.25 ±  6%  perf-profile.self.cycles-pp.__list_del_entry_valid
>       0.34 ±  5%      -0.3        0.07 ± 29%  perf-profile.self.cycles-pp.schedule
>       0.35 ±  9%      -0.3        0.08 ± 10%  perf-profile.self.cycles-pp.rb_insert_color
>       0.40 ±  5%      -0.3        0.14 ± 16%  perf-profile.self.cycles-pp.select_task_rq_fair
>       0.33 ±  6%      -0.3        0.08 ± 16%  perf-profile.self.cycles-pp.check_preempt_wakeup
>       0.33 ±  8%      -0.2        0.10 ± 16%  perf-profile.self.cycles-pp.select_task_rq
>       0.36 ±  3%      -0.2        0.13 ± 16%  perf-profile.self.cycles-pp.native_sched_clock
>       0.32 ±  7%      -0.2        0.10 ± 14%  perf-profile.self.cycles-pp.finish_task_switch
>       0.32 ±  4%      -0.2        0.11 ± 13%  perf-profile.self.cycles-pp.dequeue_entity
>       0.32 ±  8%      -0.2        0.12 ± 10%  perf-profile.self.cycles-pp.__list_add_valid
>       0.23 ±  5%      -0.2        0.03 ±103%  perf-profile.self.cycles-pp.resched_curr
>       0.27 ±  6%      -0.2        0.07 ± 21%  perf-profile.self.cycles-pp.rb_erase
>       0.27 ±  5%      -0.2        0.08 ± 20%  perf-profile.self.cycles-pp.__wrgsbase_inactive
>       0.28 ± 13%      -0.2        0.09 ± 12%  perf-profile.self.cycles-pp.check_preempt_curr
>       0.30 ± 13%      -0.2        0.12 ±  7%  perf-profile.self.cycles-pp.ttwu_queue_wakelist
>       0.24 ±  5%      -0.2        0.06 ± 19%  perf-profile.self.cycles-pp.set_next_entity
>       0.21 ± 34%      -0.2        0.04 ± 71%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
>       0.25 ±  5%      -0.2        0.08 ± 16%  perf-profile.self.cycles-pp.rcu_note_context_switch
>       0.19 ± 26%      -0.1        0.04 ± 73%  perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
>       0.20 ±  8%      -0.1        0.06 ± 13%  perf-profile.self.cycles-pp.ttwu_do_activate
>       0.17 ±  8%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.rb_next
>       0.22 ± 23%      -0.1        0.09 ± 31%  perf-profile.self.cycles-pp.migrate_task_rq_fair
>       0.15 ± 32%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.llist_reverse_order
>       0.16 ±  8%      -0.1        0.06 ± 14%  perf-profile.self.cycles-pp.wake_affine
>       0.10 ± 31%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.sched_ttwu_pending
>       0.14 ±  5%      -0.1        0.07 ± 20%  perf-profile.self.cycles-pp.read@plt
>       0.32 ±  8%      -0.1        0.26 ±  3%  perf-profile.self.cycles-pp.anon_pipe_buf_release
>       0.10 ±  6%      -0.1        0.04 ± 45%  perf-profile.self.cycles-pp.__wake_up_common_lock
>       0.10 ±  9%      -0.0        0.07 ±  8%  perf-profile.self.cycles-pp.__mark_inode_dirty
>       0.00            +0.1        0.06 ± 11%  perf-profile.self.cycles-pp.free_unref_page
>       0.00            +0.1        0.06 ±  6%  perf-profile.self.cycles-pp.__alloc_pages
>       0.00            +0.1        0.06 ± 11%  perf-profile.self.cycles-pp._raw_spin_trylock
>       0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.uncharge_folio
>       0.00            +0.1        0.06 ± 11%  perf-profile.self.cycles-pp.uncharge_batch
>       0.00            +0.1        0.07 ± 10%  perf-profile.self.cycles-pp.menu_select
>       0.00            +0.1        0.08 ± 14%  perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg
>       0.00            +0.1        0.08 ±  7%  perf-profile.self.cycles-pp.__memcg_kmem_charge_page
>       0.00            +0.1        0.10 ± 10%  perf-profile.self.cycles-pp.osq_unlock
>       0.07 ±  5%      +0.1        0.17 ±  8%  perf-profile.self.cycles-pp.copyin
>       0.00            +0.1        0.11 ± 11%  perf-profile.self.cycles-pp.__mod_memcg_state
>       0.13 ±  8%      +0.1        0.24 ±  6%  perf-profile.self.cycles-pp.rcu_all_qs
>       0.14 ±  5%      +0.1        0.28 ±  5%  perf-profile.self.cycles-pp.__x64_sys_write
>       0.07 ± 10%      +0.1        0.21 ±  5%  perf-profile.self.cycles-pp.page_copy_sane
>       0.13 ± 12%      +0.1        0.28 ±  9%  perf-profile.self.cycles-pp.__x64_sys_read
>       0.00            +0.2        0.15 ± 10%  perf-profile.self.cycles-pp.propagate_protected_usage
>       0.18 ±  9%      +0.2        0.33 ±  4%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
>       0.07 ±  8%      +0.2        0.23 ±  5%  perf-profile.self.cycles-pp.rw_verify_area
>       0.08 ± 34%      +0.2        0.24 ± 27%  perf-profile.self.cycles-pp.mnt_user_ns
>       0.13 ±  5%      +0.2        0.31 ±  7%  perf-profile.self.cycles-pp.kill_fasync
>       0.21 ±  8%      +0.2        0.39 ±  5%  perf-profile.self.cycles-pp.__might_fault
>       0.06 ± 13%      +0.2        0.26 ±  9%  perf-profile.self.cycles-pp.copyout
>       0.10 ± 11%      +0.2        0.31 ±  8%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
>       0.26 ± 13%      +0.2        0.49 ±  6%  perf-profile.self.cycles-pp.inode_needs_update_time
>       0.23 ±  8%      +0.2        0.47 ±  5%  perf-profile.self.cycles-pp.copy_page_from_iter
>       0.14 ±  7%      +0.2        0.38 ±  6%  perf-profile.self.cycles-pp.file_update_time
>       0.36 ±  7%      +0.3        0.62 ±  4%  perf-profile.self.cycles-pp.ksys_read
>       0.54 ± 13%      +0.3        0.80 ±  4%  perf-profile.self.cycles-pp._copy_from_iter
>       0.15 ±  5%      +0.3        0.41 ±  8%  perf-profile.self.cycles-pp.touch_atime
>       0.14 ±  5%      +0.3        0.40 ±  6%  perf-profile.self.cycles-pp.__cond_resched
>       0.18 ±  5%      +0.3        0.47 ±  4%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
>       0.16 ±  8%      +0.3        0.46 ±  6%  perf-profile.self.cycles-pp.syscall_enter_from_user_mode
>       0.16 ±  9%      +0.3        0.47 ±  6%  perf-profile.self.cycles-pp.__fdget_pos
>       1.79 ±  8%      +0.3        2.12 ±  3%  perf-profile.self.cycles-pp.pipe_read
>       0.10 ±  8%      +0.3        0.43 ± 17%  perf-profile.self.cycles-pp.fsnotify_perm
>       0.20 ±  4%      +0.4        0.55 ±  5%  perf-profile.self.cycles-pp.ksys_write
>       0.05 ± 76%      +0.4        0.46 ± 27%  perf-profile.self.cycles-pp.queue_event
>       0.32 ±  6%      +0.4        0.73 ±  6%  perf-profile.self.cycles-pp.exit_to_user_mode_prepare
>       0.21 ±  9%      +0.4        0.62 ±  6%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
>       0.79 ±  8%      +0.4        1.22 ±  4%  perf-profile.self.cycles-pp.__might_sleep
>       0.44 ±  5%      +0.4        0.88 ±  7%  perf-profile.self.cycles-pp.do_syscall_64
>       0.26 ±  8%      +0.4        0.70 ±  4%  perf-profile.self.cycles-pp.atime_needs_update
>       0.42 ±  7%      +0.5        0.88 ±  5%  perf-profile.self.cycles-pp.__get_task_ioprio
>       0.28 ± 12%      +0.5        0.75 ±  5%  perf-profile.self.cycles-pp.copy_page_to_iter
>       0.19 ±  6%      +0.5        0.68 ± 10%  perf-profile.self.cycles-pp.security_file_permission
>       0.31 ±  8%      +0.5        0.83 ±  5%  perf-profile.self.cycles-pp.aa_file_perm
>       0.05 ± 46%      +0.5        0.59 ±  8%  perf-profile.self.cycles-pp.osq_lock
>       0.30 ±  7%      +0.5        0.85 ±  6%  perf-profile.self.cycles-pp._copy_to_iter
>       0.00            +0.6        0.59 ±  6%  perf-profile.self.cycles-pp.poll_idle
>       0.13 ± 20%      +0.7        0.81 ±  6%  perf-profile.self.cycles-pp.mutex_spin_on_owner
>       0.38 ±  9%      +0.7        1.12 ±  5%  perf-profile.self.cycles-pp.current_time
>       0.08 ± 59%      +0.8        0.82 ± 11%  perf-profile.self.cycles-pp.intel_idle_irq
>       0.92 ±  6%      +0.8        1.72 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       0.01 ±223%      +0.8        0.82 ±  6%  perf-profile.self.cycles-pp.page_counter_uncharge
>       0.86 ±  7%      +1.1        1.91 ±  4%  perf-profile.self.cycles-pp.vfs_read
>       1.07 ±  6%      +1.1        2.14 ±  5%  perf-profile.self.cycles-pp.__fget_light
>       0.67 ±  7%      +1.1        1.74 ±  6%  perf-profile.self.cycles-pp.vfs_write
>       0.15 ± 12%      +1.1        1.28 ±  7%  perf-profile.self.cycles-pp.__mutex_lock
>       1.09 ±  6%      +1.1        2.22 ±  5%  perf-profile.self.cycles-pp.__libc_read
>       0.62 ±  6%      +1.2        1.79 ±  5%  perf-profile.self.cycles-pp.syscall_return_via_sysret
>       1.16 ±  8%      +1.2        2.38 ±  4%  perf-profile.self.cycles-pp.__might_resched
>       0.91 ±  7%      +1.3        2.20 ±  5%  perf-profile.self.cycles-pp.__libc_write
>       0.59 ±  8%      +1.3        1.93 ±  6%  perf-profile.self.cycles-pp.__entry_text_start
>       1.27 ±  7%      +1.7        3.00 ±  6%  perf-profile.self.cycles-pp.apparmor_file_permission
>       0.99 ±  8%      +2.0        2.98 ±  5%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       1.74 ±  8%      +3.4        5.15 ±  6%  perf-profile.self.cycles-pp.pipe_write
>       2.98 ±  8%      +3.7        6.64 ±  5%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
>       2.62 ± 10%      +4.8        7.38 ±  5%  perf-profile.self.cycles-pp.mutex_lock
>       2.20 ± 10%      +5.1        7.30 ±  6%  perf-profile.self.cycles-pp.mutex_unlock
> 
> 
> ***************************************************************************************************
> lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
> =========================================================================================
> compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
>   gcc-11/performance/socket/4/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/hackbench
> 
> commit:
>   a2e90611b9 ("sched/fair: Remove capacity inversion detection")
>   829c1651e9 ("sched/fair: sanitize vruntime of entity being placed")
> 
> a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>     177139            -8.1%     162815        hackbench.throughput
>     174484           -18.8%     141618 ±  2%  hackbench.throughput_avg
>     177139            -8.1%     162815        hackbench.throughput_best
>     168530           -37.3%     105615 ±  3%  hackbench.throughput_worst
>     281.38           +23.1%     346.39 ±  2%  hackbench.time.elapsed_time
>     281.38           +23.1%     346.39 ±  2%  hackbench.time.elapsed_time.max
>  1.053e+08 ±  2%    +688.4%  8.302e+08 ±  9%  hackbench.time.involuntary_context_switches
>      21992           +27.8%      28116 ±  2%  hackbench.time.system_time
>       6652            +8.2%       7196        hackbench.time.user_time
>  3.482e+08          +289.2%  1.355e+09 ±  9%  hackbench.time.voluntary_context_switches
>    2110813 ±  5%     +21.6%    2565791 ±  3%  cpuidle..usage
>     333.95           +19.5%     399.05        uptime.boot
>       0.03            -0.0        0.03        mpstat.cpu.all.soft%
>      22.68            -2.9       19.77        mpstat.cpu.all.usr%
>     561083 ± 10%     +45.5%     816171 ± 12%  numa-numastat.node0.local_node
>     614314 ±  9%     +36.9%     841173 ± 12%  numa-numastat.node0.numa_hit
>    1393279 ±  7%     -16.8%    1158997 ±  2%  numa-numastat.node1.local_node
>    1443679 ±  5%     -14.9%    1229074 ±  3%  numa-numastat.node1.numa_hit
>    4129900 ±  8%     -23.0%    3181115        vmstat.memory.cache
>       1731           +30.8%       2265        vmstat.procs.r
>    1598044          +290.3%    6237840 ±  7%  vmstat.system.cs
>     320762           +60.5%     514672 ±  8%  vmstat.system.in
>     962111 ±  6%     +46.0%    1404646 ±  7%  turbostat.C1
>     233987 ±  5%     +51.2%     353892        turbostat.C1E
>   91515563           +97.3%  1.806e+08 ± 10%  turbostat.IRQ
>     448466 ± 14%     -34.2%     294934 ±  5%  turbostat.POLL
>      34.60            -7.3%      32.07        turbostat.RAMWatt
>     514028 ±  2%     -14.0%     442125 ±  2%  meminfo.AnonPages
>    4006312 ±  8%     -23.9%    3047078        meminfo.Cached
>    3321064 ± 10%     -32.7%    2236362 ±  2%  meminfo.Committed_AS
>    1714752 ± 21%     -60.3%     680479 ±  8%  meminfo.Inactive
>    1714585 ± 21%     -60.3%     680305 ±  8%  meminfo.Inactive(anon)
>     757124 ± 18%     -67.2%     248485 ± 27%  meminfo.Mapped
>    6476123 ±  6%     -19.4%    5220738        meminfo.Memused
>    1275724 ± 26%     -75.2%     316896 ± 15%  meminfo.Shmem
>    6806047 ±  3%     -13.3%    5901974        meminfo.max_used_kB
>     161311 ± 23%     +31.7%     212494 ±  5%  numa-meminfo.node0.AnonPages
>     165693 ± 22%     +30.5%     216264 ±  5%  numa-meminfo.node0.Inactive
>     165563 ± 22%     +30.6%     216232 ±  5%  numa-meminfo.node0.Inactive(anon)
>     140638 ± 19%     -36.7%      89034 ± 11%  numa-meminfo.node0.Mapped
>     352173 ± 14%     -35.3%     227805 ±  8%  numa-meminfo.node1.AnonPages
>     501396 ± 11%     -22.6%     388042 ±  5%  numa-meminfo.node1.AnonPages.max
>    1702242 ± 43%     -77.8%     378325 ± 22%  numa-meminfo.node1.FilePages
>    1540803 ± 25%     -70.4%     455592 ± 13%  numa-meminfo.node1.Inactive
>    1540767 ± 25%     -70.4%     455451 ± 13%  numa-meminfo.node1.Inactive(anon)
>     612123 ± 18%     -74.9%     153752 ± 37%  numa-meminfo.node1.Mapped
>    3085231 ± 24%     -53.9%    1420940 ± 14%  numa-meminfo.node1.MemUsed
>     254052 ±  4%     -19.1%     205632 ± 21%  numa-meminfo.node1.SUnreclaim
>    1259640 ± 27%     -75.9%     303123 ± 15%  numa-meminfo.node1.Shmem
>     304597 ±  7%     -20.2%     242920 ± 17%  numa-meminfo.node1.Slab
>      40345 ± 23%     +31.5%      53054 ±  5%  numa-vmstat.node0.nr_anon_pages
>      41412 ± 22%     +30.4%      53988 ±  5%  numa-vmstat.node0.nr_inactive_anon
>      35261 ± 19%     -36.9%      22256 ± 12%  numa-vmstat.node0.nr_mapped
>      41412 ± 22%     +30.4%      53988 ±  5%  numa-vmstat.node0.nr_zone_inactive_anon
>     614185 ±  9%     +36.9%     841065 ± 12%  numa-vmstat.node0.numa_hit
>     560955 ± 11%     +45.5%     816063 ± 12%  numa-vmstat.node0.numa_local
>      88129 ± 14%     -35.2%      57097 ±  8%  numa-vmstat.node1.nr_anon_pages
>     426425 ± 43%     -77.9%      94199 ± 22%  numa-vmstat.node1.nr_file_pages
>     386166 ± 25%     -70.5%     113880 ± 13%  numa-vmstat.node1.nr_inactive_anon
>     153658 ± 18%     -75.3%      38021 ± 37%  numa-vmstat.node1.nr_mapped
>     315775 ± 27%     -76.1%      75399 ± 16%  numa-vmstat.node1.nr_shmem
>      63411 ±  4%     -18.6%      51593 ± 21%  numa-vmstat.node1.nr_slab_unreclaimable
>     386166 ± 25%     -70.5%     113880 ± 13%  numa-vmstat.node1.nr_zone_inactive_anon
>    1443470 ±  5%     -14.9%    1228740 ±  3%  numa-vmstat.node1.numa_hit
>    1393069 ±  7%     -16.8%    1158664 ±  2%  numa-vmstat.node1.numa_local
>     128457 ±  2%     -14.0%     110530 ±  3%  proc-vmstat.nr_anon_pages
>     999461 ±  8%     -23.8%     761774        proc-vmstat.nr_file_pages
>     426485 ± 21%     -60.1%     170237 ±  9%  proc-vmstat.nr_inactive_anon
>      82464            -2.6%      80281        proc-vmstat.nr_kernel_stack
>     187777 ± 18%     -66.9%      62076 ± 28%  proc-vmstat.nr_mapped
>     316813 ± 27%     -75.0%      79228 ± 16%  proc-vmstat.nr_shmem
>      31469            -2.0%      30840        proc-vmstat.nr_slab_reclaimable
>     117889            -8.4%     108036        proc-vmstat.nr_slab_unreclaimable
>     426485 ± 21%     -60.1%     170237 ±  9%  proc-vmstat.nr_zone_inactive_anon
>     187187 ± 12%     -43.5%     105680 ±  9%  proc-vmstat.numa_hint_faults
>     128363 ± 15%     -61.5%      49371 ± 19%  proc-vmstat.numa_hint_faults_local
>      47314 ± 22%     +39.2%      65863 ± 13%  proc-vmstat.numa_pages_migrated
>     457026 ±  9%     -18.1%     374188 ± 13%  proc-vmstat.numa_pte_updates
>    2586600 ±  3%     +27.7%    3302787 ±  8%  proc-vmstat.pgalloc_normal
>    1589970            -6.2%    1491838        proc-vmstat.pgfault
>    2347186 ± 10%     +37.7%    3232369 ±  8%  proc-vmstat.pgfree
>      47314 ± 22%     +39.2%      65863 ± 13%  proc-vmstat.pgmigrate_success
>     112713            +7.0%     120630 ±  3%  proc-vmstat.pgreuse
>    2189056           +22.2%    2674944 ±  2%  proc-vmstat.unevictable_pgs_scanned
>      14.08 ±  2%     +29.3%      18.20 ±  5%  sched_debug.cfs_rq:/.h_nr_running.avg
>       0.80 ± 14%    +179.2%       2.23 ± 24%  sched_debug.cfs_rq:/.h_nr_running.min
>     245.23 ± 12%     -19.7%     196.97 ±  6%  sched_debug.cfs_rq:/.load_avg.max
>       2.27 ± 16%     +75.0%       3.97 ±  4%  sched_debug.cfs_rq:/.load_avg.min
>      45.77 ± 16%     -17.8%      37.60 ±  6%  sched_debug.cfs_rq:/.load_avg.stddev
>   11842707           +39.9%   16567992        sched_debug.cfs_rq:/.min_vruntime.avg
>   13773080 ±  3%    +113.9%   29460281 ±  7%  sched_debug.cfs_rq:/.min_vruntime.max
>   11423218           +30.3%   14885830        sched_debug.cfs_rq:/.min_vruntime.min
>     301190 ± 12%    +439.9%    1626088 ± 10%  sched_debug.cfs_rq:/.min_vruntime.stddev
>     203.83           -16.3%     170.67        sched_debug.cfs_rq:/.removed.load_avg.max
>      14330 ±  3%     +30.9%      18756 ±  5%  sched_debug.cfs_rq:/.runnable_avg.avg
>      25115 ±  4%     +15.5%      28999 ±  6%  sched_debug.cfs_rq:/.runnable_avg.max
>       3811 ± 11%     +68.0%       6404 ± 21%  sched_debug.cfs_rq:/.runnable_avg.min
>       3818 ±  6%     +15.3%       4404 ±  7%  sched_debug.cfs_rq:/.runnable_avg.stddev
>    -849635          +410.6%   -4338612        sched_debug.cfs_rq:/.spread0.avg
>    1092373 ± 54%    +691.1%    8641673 ± 21%  sched_debug.cfs_rq:/.spread0.max
>   -1263082          +378.1%   -6038905        sched_debug.cfs_rq:/.spread0.min
>     300764 ± 12%    +441.8%    1629507 ±  9%  sched_debug.cfs_rq:/.spread0.stddev
>       1591 ±  4%     -11.1%       1413 ±  3%  sched_debug.cfs_rq:/.util_avg.max
>     288.90 ± 11%     +64.5%     475.23 ± 13%  sched_debug.cfs_rq:/.util_avg.min
>     240.33 ±  2%     -32.1%     163.09 ±  3%  sched_debug.cfs_rq:/.util_avg.stddev
>     494.27 ±  3%     +41.6%     699.85 ±  3%  sched_debug.cfs_rq:/.util_est_enqueued.avg
>      11.23 ± 54%    +634.1%      82.47 ± 22%  sched_debug.cfs_rq:/.util_est_enqueued.min
>     174576           +20.7%     210681        sched_debug.cpu.clock.avg
>     174926           +21.2%     211944        sched_debug.cpu.clock.max
>     174164           +20.3%     209436        sched_debug.cpu.clock.min
>     230.84 ± 33%    +226.1%     752.67 ± 20%  sched_debug.cpu.clock.stddev
>     172836           +20.6%     208504        sched_debug.cpu.clock_task.avg
>     173552           +21.0%     210079        sched_debug.cpu.clock_task.max
>     156807           +22.3%     191789        sched_debug.cpu.clock_task.min
>       1634           +17.1%       1914 ±  5%  sched_debug.cpu.clock_task.stddev
>       0.00 ± 32%    +220.1%       0.00 ± 20%  sched_debug.cpu.next_balance.stddev
>      14.12 ±  2%     +28.7%      18.18 ±  5%  sched_debug.cpu.nr_running.avg
>       0.73 ± 25%    +213.6%       2.30 ± 24%  sched_debug.cpu.nr_running.min
>    1810086          +461.3%   10159215 ± 10%  sched_debug.cpu.nr_switches.avg
>    2315994 ±  3%    +515.6%   14258195 ±  9%  sched_debug.cpu.nr_switches.max
>    1529863          +380.3%    7348324 ±  9%  sched_debug.cpu.nr_switches.min
>     167487 ± 18%    +770.8%    1458519 ± 21%  sched_debug.cpu.nr_switches.stddev
>     174149           +20.2%     209410        sched_debug.cpu_clk
>     170980           +20.6%     206240        sched_debug.ktime
>     174896           +20.2%     210153        sched_debug.sched_clk
>       7.35           +24.9%       9.18 ±  4%  perf-stat.i.MPKI
>  1.918e+10           +14.4%  2.194e+10        perf-stat.i.branch-instructions
>       2.16            -0.1        2.09        perf-stat.i.branch-miss-rate%
>  4.133e+08            +6.6%  4.405e+08        perf-stat.i.branch-misses
>      23.08            -9.2       13.86 ±  7%  perf-stat.i.cache-miss-rate%
>  1.714e+08           -37.2%  1.076e+08 ±  3%  perf-stat.i.cache-misses
>  7.497e+08           +33.7%  1.002e+09 ±  5%  perf-stat.i.cache-references
>    1636365          +382.4%    7893858 ±  5%  perf-stat.i.context-switches
>       2.74            -6.8%       2.56        perf-stat.i.cpi
>     131725          +288.0%     511159 ± 10%  perf-stat.i.cpu-migrations
>       1672          +160.8%       4361 ±  4%  perf-stat.i.cycles-between-cache-misses
>       0.49            +0.6        1.11 ±  5%  perf-stat.i.dTLB-load-miss-rate%
>  1.417e+08          +158.7%  3.665e+08 ±  5%  perf-stat.i.dTLB-load-misses
>  2.908e+10            +9.1%  3.172e+10        perf-stat.i.dTLB-loads
>       0.12 ±  4%      +0.1        0.20 ±  4%  perf-stat.i.dTLB-store-miss-rate%
>   20805655 ±  4%     +90.9%   39716345 ±  4%  perf-stat.i.dTLB-store-misses
>  1.755e+10            +8.6%  1.907e+10        perf-stat.i.dTLB-stores
>      29.04            +3.6       32.62 ±  2%  perf-stat.i.iTLB-load-miss-rate%
>   56676082           +60.4%   90917582 ±  3%  perf-stat.i.iTLB-load-misses
>  1.381e+08           +30.6%  1.804e+08        perf-stat.i.iTLB-loads
>   1.03e+11           +10.5%  1.139e+11        perf-stat.i.instructions
>       1840           -21.1%       1451 ±  4%  perf-stat.i.instructions-per-iTLB-miss
>       0.37           +10.9%       0.41        perf-stat.i.ipc
>       1084            -4.5%       1035 ±  2%  perf-stat.i.metric.K/sec
>     640.69           +10.3%     706.44        perf-stat.i.metric.M/sec
>       5249            -9.3%       4762 ±  3%  perf-stat.i.minor-faults
>      23.57           +18.7       42.30 ±  8%  perf-stat.i.node-load-miss-rate%
>   40174555           -45.0%   22109431 ± 10%  perf-stat.i.node-loads
>       8.84 ±  2%     +24.5       33.30 ± 10%  perf-stat.i.node-store-miss-rate%
>    2912322           +60.3%    4667137 ± 16%  perf-stat.i.node-store-misses
>   34046752           -50.6%   16826621 ±  9%  perf-stat.i.node-stores
>       5278            -9.2%       4791 ±  3%  perf-stat.i.page-faults
>       7.24           +12.1%       8.12 ±  4%  perf-stat.overall.MPKI
>       2.15            -0.1        2.05        perf-stat.overall.branch-miss-rate%
>      22.92            -9.5       13.41 ±  7%  perf-stat.overall.cache-miss-rate%
>       2.73            -6.3%       2.56        perf-stat.overall.cpi
>       1644           +43.4%       2358 ±  3%  perf-stat.overall.cycles-between-cache-misses
>       0.48            +0.5        0.99 ±  4%  perf-stat.overall.dTLB-load-miss-rate%
>       0.12 ±  4%      +0.1        0.19 ±  4%  perf-stat.overall.dTLB-store-miss-rate%
>      29.06            +2.9       32.01 ±  2%  perf-stat.overall.iTLB-load-miss-rate%
>       1826           -26.6%       1340 ±  4%  perf-stat.overall.instructions-per-iTLB-miss
>       0.37            +6.8%       0.39        perf-stat.overall.ipc
>      22.74            +6.8       29.53 ± 13%  perf-stat.overall.node-load-miss-rate%
>       7.63            +8.4       16.02 ± 20%  perf-stat.overall.node-store-miss-rate%
>  1.915e+10            +9.0%  2.088e+10        perf-stat.ps.branch-instructions
>  4.119e+08            +3.9%  4.282e+08        perf-stat.ps.branch-misses
>  1.707e+08           -30.5%  1.186e+08 ±  3%  perf-stat.ps.cache-misses
>  7.446e+08           +19.2%  8.874e+08 ±  4%  perf-stat.ps.cache-references
>    1611874          +289.1%    6271376 ±  7%  perf-stat.ps.context-switches
>     127362          +189.0%     368041 ± 11%  perf-stat.ps.cpu-migrations
>  1.407e+08          +116.2%  3.042e+08 ±  5%  perf-stat.ps.dTLB-load-misses
>  2.901e+10            +5.4%  3.057e+10        perf-stat.ps.dTLB-loads
>   20667480 ±  4%     +66.8%   34473793 ±  4%  perf-stat.ps.dTLB-store-misses
>  1.751e+10            +5.1%   1.84e+10        perf-stat.ps.dTLB-stores
>   56310692           +45.0%   81644183 ±  4%  perf-stat.ps.iTLB-load-misses
>  1.375e+08           +26.1%  1.733e+08        perf-stat.ps.iTLB-loads
>  1.028e+11            +6.3%  1.093e+11        perf-stat.ps.instructions
>       4929           -24.5%       3723 ±  2%  perf-stat.ps.minor-faults
>   40134633           -32.9%   26946247 ±  9%  perf-stat.ps.node-loads
>    2805073           +39.5%    3914304 ± 16%  perf-stat.ps.node-store-misses
>   33938259           -38.9%   20726382 ±  8%  perf-stat.ps.node-stores
>       4952           -24.5%       3741 ±  2%  perf-stat.ps.page-faults
>  2.911e+13           +30.9%  3.809e+13 ±  2%  perf-stat.total.instructions
>      15.30 ±  4%      -8.6        6.66 ±  5%  perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
>      13.84 ±  6%      -7.9        5.98 ±  6%  perf-profile.calltrace.cycles-pp.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>      13.61 ±  6%      -7.8        5.84 ±  6%  perf-profile.calltrace.cycles-pp.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg
>       9.00 ±  2%      -5.5        3.48 ±  4%  perf-profile.calltrace.cycles-pp.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
>       6.44 ±  4%      -4.3        2.14 ±  6%  perf-profile.calltrace.cycles-pp.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
>       5.83 ±  8%      -3.4        2.44 ±  5%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
>       5.81 ±  6%      -3.3        2.48 ±  6%  perf-profile.calltrace.cycles-pp.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
>       5.50 ±  7%      -3.2        2.32 ±  6%  perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>       5.07 ±  8%      -3.0        2.04 ±  6%  perf-profile.calltrace.cycles-pp.__kmem_cache_alloc_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
>       6.22 ±  2%      -2.9        3.33 ±  3%  perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
>       6.17 ±  2%      -2.9        3.30 ±  3%  perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
>       6.11 ±  2%      -2.9        3.24 ±  3%  perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
>      50.99            -2.6       48.39        perf-profile.calltrace.cycles-pp.__libc_read
>       5.66 ±  3%      -2.3        3.35 ±  3%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_read
>       5.52 ±  3%      -2.3        3.27 ±  3%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_write
>       3.14 ±  2%      -1.7        1.42 ±  4%  perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
>       2.73 ±  2%      -1.6        1.15 ±  4%  perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
>       2.59 ±  2%      -1.5        1.07 ±  4%  perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
>       2.72 ±  3%      -1.4        1.34 ±  6%  perf-profile.calltrace.cycles-pp.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
>      41.50            -1.2       40.27        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
>       2.26 ±  4%      -1.1        1.12        perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
>       2.76 ±  3%      -1.1        1.63 ±  3%  perf-profile.calltrace.cycles-pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
>       2.84 ±  3%      -1.1        1.71 ±  2%  perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
>       2.20 ±  4%      -1.1        1.08        perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
>       2.98 ±  2%      -1.1        1.90 ±  6%  perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
>       1.99 ±  4%      -1.1        0.92 ±  2%  perf-profile.calltrace.cycles-pp.sock_wfree.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic
>       2.10 ±  3%      -1.0        1.08 ±  4%  perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
>       2.08 ±  4%      -0.8        1.24 ±  3%  perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_write
>       2.16 ±  3%      -0.7        1.47        perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_read
>       2.20 ±  2%      -0.7        1.52 ±  3%  perf-profile.calltrace.cycles-pp.__kmem_cache_free.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
>       1.46 ±  3%      -0.6        0.87 ±  8%  perf-profile.calltrace.cycles-pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>       4.82 ±  2%      -0.6        4.24        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
>       1.31 ±  2%      -0.4        0.90 ±  4%  perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>       0.96 ±  3%      -0.4        0.57 ± 10%  perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg
>       1.14 ±  3%      -0.4        0.76 ±  5%  perf-profile.calltrace.cycles-pp.memcg_slab_post_alloc_hook.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>       0.99 ±  3%      -0.3        0.65 ±  8%  perf-profile.calltrace.cycles-pp.memcg_slab_post_alloc_hook.__kmem_cache_alloc_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb
>       1.30 ±  4%      -0.3        0.99 ±  3%  perf-profile.calltrace.cycles-pp.sock_recvmsg.sock_read_iter.vfs_read.ksys_read.do_syscall_64
>       0.98 ±  2%      -0.3        0.69 ±  3%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.67            -0.2        0.42 ± 50%  perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg
>       0.56 ±  4%      -0.2        0.32 ± 81%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
>       0.86 ±  2%      -0.2        0.63 ±  3%  perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_write.ksys_write.do_syscall_64
>       1.15 ±  4%      -0.2        0.93 ±  4%  perf-profile.calltrace.cycles-pp.security_socket_recvmsg.sock_recvmsg.sock_read_iter.vfs_read.ksys_read
>       0.90            -0.2        0.69 ±  3%  perf-profile.calltrace.cycles-pp.get_obj_cgroup_from_current.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
>       1.23 ±  3%      -0.2        1.07 ±  3%  perf-profile.calltrace.cycles-pp.security_socket_sendmsg.sock_sendmsg.sock_write_iter.vfs_write.ksys_write
>       1.05 ±  2%      -0.2        0.88 ±  2%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.84 ±  4%      -0.2        0.68 ±  4%  perf-profile.calltrace.cycles-pp.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.sock_read_iter.vfs_read
>       0.88            -0.1        0.78 ±  5%  perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
>       0.94 ±  3%      -0.1        0.88 ±  4%  perf-profile.calltrace.cycles-pp.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
>       0.62 ±  2%      +0.3        0.90 ±  2%  perf-profile.calltrace.cycles-pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
>       0.00            +0.6        0.58 ±  3%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.prepare_to_wait.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
>       0.00            +0.6        0.61 ±  6%  perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
>       0.00            +0.6        0.62 ±  4%  perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop
>       0.00            +0.7        0.67 ± 11%  perf-profile.calltrace.cycles-pp.update_load_avg.dequeue_entity.dequeue_task_fair.__schedule.schedule
>       0.00            +0.7        0.67 ±  7%  perf-profile.calltrace.cycles-pp.__switch_to_asm.__libc_write
>       0.00            +0.8        0.76 ±  4%  perf-profile.calltrace.cycles-pp.reweight_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
>       0.00            +0.8        0.77 ±  4%  perf-profile.calltrace.cycles-pp.___perf_sw_event.prepare_task_switch.__schedule.schedule.schedule_timeout
>       0.00            +0.8        0.77 ±  8%  perf-profile.calltrace.cycles-pp.put_prev_entity.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop
>       0.00            +0.8        0.81 ±  5%  perf-profile.calltrace.cycles-pp.prepare_task_switch.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
>       0.00            +0.8        0.81 ±  5%  perf-profile.calltrace.cycles-pp.check_preempt_wakeup.check_preempt_curr.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
>       0.00            +0.8        0.82 ±  2%  perf-profile.calltrace.cycles-pp.__switch_to_asm.__libc_read
>       0.00            +0.8        0.82 ±  3%  perf-profile.calltrace.cycles-pp.reweight_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
>       0.00            +0.9        0.86 ±  5%  perf-profile.calltrace.cycles-pp.perf_trace_sched_wakeup_template.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
>       0.00            +0.9        0.87 ±  8%  perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
>      29.66            +0.9       30.58        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
>       0.00            +1.0        0.95 ±  3%  perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule.schedule_timeout
>       0.00            +1.0        0.98 ±  4%  perf-profile.calltrace.cycles-pp.check_preempt_curr.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
>       0.00            +1.0        0.99 ±  3%  perf-profile.calltrace.cycles-pp.update_curr.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
>       0.00            +1.0        1.05 ±  4%  perf-profile.calltrace.cycles-pp.prepare_to_wait.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
>       0.00            +1.1        1.07 ± 12%  perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function
>      27.81 ±  2%      +1.2       28.98        perf-profile.calltrace.cycles-pp.unix_stream_recvmsg.sock_read_iter.vfs_read.ksys_read.do_syscall_64
>      27.36 ±  2%      +1.2       28.59        perf-profile.calltrace.cycles-pp.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read.ksys_read
>       0.00            +1.5        1.46 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common
>       0.00            +1.6        1.55 ±  4%  perf-profile.calltrace.cycles-pp.prepare_task_switch.__schedule.schedule.schedule_timeout.unix_stream_data_wait
>       0.00            +1.6        1.60 ±  4%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
>      27.58            +1.6       29.19        perf-profile.calltrace.cycles-pp.sock_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +1.6        1.63 ±  5%  perf-profile.calltrace.cycles-pp.update_curr.dequeue_entity.dequeue_task_fair.__schedule.schedule
>       0.00            +1.6        1.65 ±  5%  perf-profile.calltrace.cycles-pp.restore_fpregs_from_fpstate.switch_fpu_return.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>       0.00            +1.7        1.66 ±  6%  perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
>       0.00            +1.8        1.80        perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
>       0.00            +1.8        1.84 ±  2%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.schedule_timeout.unix_stream_data_wait
>       0.00            +2.0        1.97 ±  2%  perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule.schedule_timeout.unix_stream_data_wait
>      26.63 ±  2%      +2.0       28.61        perf-profile.calltrace.cycles-pp.sock_sendmsg.sock_write_iter.vfs_write.ksys_write.do_syscall_64
>       0.00            +2.0        2.01 ±  6%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
>       0.00            +2.1        2.09 ±  6%  perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common
>       0.00            +2.1        2.11 ±  5%  perf-profile.calltrace.cycles-pp.switch_fpu_return.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      25.21 ±  2%      +2.2       27.43        perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write.ksys_write
>       0.00            +2.4        2.43 ±  5%  perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
>      48.00            +2.7       50.69        perf-profile.calltrace.cycles-pp.__libc_write
>       0.00            +2.9        2.87 ±  5%  perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
>       0.09 ±223%      +3.4        3.47 ±  3%  perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
>      39.07            +4.8       43.84        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write
>       0.66 ± 18%      +5.0        5.62 ±  4%  perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.schedule_timeout.unix_stream_data_wait
>       4.73            +5.1        9.88 ±  3%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
>       0.66 ± 20%      +5.3        5.98 ±  3%  perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
>      35.96            +5.7       41.68        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
>       0.00            +6.0        6.02 ±  6%  perf-profile.calltrace.cycles-pp.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
>       0.00            +6.2        6.18 ±  6%  perf-profile.calltrace.cycles-pp.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
>       0.00            +6.4        6.36 ±  6%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.78 ± 19%      +6.4        7.15 ±  3%  perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
>       0.18 ±141%      +7.0        7.18 ±  6%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
>       1.89 ± 15%     +12.1       13.96 ±  3%  perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic
>       1.92 ± 15%     +12.3       14.23 ±  3%  perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
>       1.66 ± 19%     +12.4       14.06 ±  2%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable
>       1.96 ± 15%     +12.5       14.48 ±  3%  perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
>       1.69 ± 19%     +12.7       14.38 ±  2%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg
>       1.75 ± 19%     +13.0       14.75 ±  2%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg.sock_sendmsg
>       2.53 ± 10%     +13.4       15.90 ±  2%  perf-profile.calltrace.cycles-pp.sock_def_readable.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
>       1.96 ± 16%     +13.5       15.42 ±  2%  perf-profile.calltrace.cycles-pp.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
>       2.28 ± 15%     +14.6       16.86 ±  3%  perf-profile.calltrace.cycles-pp.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
>      15.31 ±  4%      -8.6        6.67 ±  5%  perf-profile.children.cycles-pp.sock_alloc_send_pskb
>      13.85 ±  6%      -7.9        5.98 ±  5%  perf-profile.children.cycles-pp.alloc_skb_with_frags
>      13.70 ±  6%      -7.8        5.89 ±  6%  perf-profile.children.cycles-pp.__alloc_skb
>       9.01 ±  2%      -5.5        3.48 ±  4%  perf-profile.children.cycles-pp.consume_skb
>       6.86 ± 26%      -4.7        2.15 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>      11.27 ±  3%      -4.6        6.67 ±  3%  perf-profile.children.cycles-pp.syscall_return_via_sysret
>       6.46 ±  4%      -4.3        2.15 ±  6%  perf-profile.children.cycles-pp.skb_release_data
>       4.18 ± 25%      -4.0        0.15 ± 69%  perf-profile.children.cycles-pp.___slab_alloc
>       5.76 ± 32%      -3.9        1.91 ±  3%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>       5.98 ±  8%      -3.5        2.52 ±  5%  perf-profile.children.cycles-pp.kmem_cache_alloc_node
>       5.84 ±  6%      -3.3        2.50 ±  6%  perf-profile.children.cycles-pp.kmalloc_reserve
>       3.33 ± 30%      -3.3        0.05 ± 88%  perf-profile.children.cycles-pp.get_partial_node
>       5.63 ±  7%      -3.3        2.37 ±  6%  perf-profile.children.cycles-pp.__kmalloc_node_track_caller
>       5.20 ±  7%      -3.1        2.12 ±  6%  perf-profile.children.cycles-pp.__kmem_cache_alloc_node
>       6.23 ±  2%      -2.9        3.33 ±  3%  perf-profile.children.cycles-pp.unix_stream_read_actor
>       6.18 ±  2%      -2.9        3.31 ±  3%  perf-profile.children.cycles-pp.skb_copy_datagram_iter
>       6.11 ±  2%      -2.9        3.25 ±  3%  perf-profile.children.cycles-pp.__skb_datagram_iter
>      51.39            -2.5       48.85        perf-profile.children.cycles-pp.__libc_read
>       3.14 ±  3%      -2.5        0.61 ± 13%  perf-profile.children.cycles-pp.__slab_free
>       5.34 ±  3%      -2.1        3.23 ±  3%  perf-profile.children.cycles-pp.__entry_text_start
>       3.57 ±  2%      -1.9        1.66 ±  6%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
>       3.16 ±  2%      -1.7        1.43 ±  4%  perf-profile.children.cycles-pp._copy_to_iter
>       2.74 ±  2%      -1.6        1.16 ±  4%  perf-profile.children.cycles-pp.copyout
>       4.16 ±  2%      -1.5        2.62 ±  3%  perf-profile.children.cycles-pp.__check_object_size
>       2.73 ±  3%      -1.4        1.35 ±  6%  perf-profile.children.cycles-pp.kmem_cache_free
>       2.82 ±  2%      -1.2        1.63 ±  3%  perf-profile.children.cycles-pp.check_heap_object
>       2.27 ±  4%      -1.1        1.13 ±  2%  perf-profile.children.cycles-pp.skb_release_head_state
>       2.85 ±  3%      -1.1        1.72 ±  2%  perf-profile.children.cycles-pp.simple_copy_to_iter
>       2.22 ±  4%      -1.1        1.10        perf-profile.children.cycles-pp.unix_destruct_scm
>       3.00 ±  2%      -1.1        1.91 ±  5%  perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
>       2.00 ±  4%      -1.1        0.92 ±  2%  perf-profile.children.cycles-pp.sock_wfree
>       2.16 ±  3%      -0.7        1.43 ±  7%  perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
>       1.45 ±  3%      -0.7        0.73 ±  7%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
>       2.21 ±  2%      -0.7        1.52 ±  3%  perf-profile.children.cycles-pp.__kmem_cache_free
>       1.49 ±  3%      -0.6        0.89 ±  8%  perf-profile.children.cycles-pp._copy_from_iter
>       1.40 ±  3%      -0.6        0.85 ± 13%  perf-profile.children.cycles-pp.mod_objcg_state
>       0.74            -0.5        0.24 ± 16%  perf-profile.children.cycles-pp.__build_skb_around
>       1.48            -0.5        1.01 ±  2%  perf-profile.children.cycles-pp.get_obj_cgroup_from_current
>       2.05 ±  2%      -0.5        1.59 ±  2%  perf-profile.children.cycles-pp.security_file_permission
>       0.98 ±  2%      -0.4        0.59 ± 10%  perf-profile.children.cycles-pp.copyin
>       1.08 ±  3%      -0.4        0.72 ±  3%  perf-profile.children.cycles-pp.__might_resched
>       1.75            -0.3        1.42 ±  4%  perf-profile.children.cycles-pp.apparmor_file_permission
>       1.32 ±  4%      -0.3        1.00 ±  3%  perf-profile.children.cycles-pp.sock_recvmsg
>       0.54 ±  4%      -0.3        0.25 ±  6%  perf-profile.children.cycles-pp.skb_unlink
>       0.54 ±  6%      -0.3        0.26 ±  3%  perf-profile.children.cycles-pp.unix_write_space
>       0.66 ±  3%      -0.3        0.39 ±  4%  perf-profile.children.cycles-pp.obj_cgroup_charge
>       0.68 ±  2%      -0.3        0.41 ±  4%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.86 ±  4%      -0.3        0.59 ±  3%  perf-profile.children.cycles-pp.__check_heap_object
>       0.75 ±  9%      -0.3        0.48 ±  2%  perf-profile.children.cycles-pp.skb_set_owner_w
>       1.84 ±  3%      -0.3        1.58 ±  4%  perf-profile.children.cycles-pp.aa_sk_perm
>       0.68 ± 11%      -0.2        0.44 ±  3%  perf-profile.children.cycles-pp.skb_queue_tail
>       1.22 ±  4%      -0.2        0.99 ±  5%  perf-profile.children.cycles-pp.__fdget_pos
>       0.70 ±  2%      -0.2        0.48 ±  5%  perf-profile.children.cycles-pp.__get_obj_cgroup_from_memcg
>       1.16 ±  4%      -0.2        0.93 ±  3%  perf-profile.children.cycles-pp.security_socket_recvmsg
>       0.48 ±  3%      -0.2        0.29 ±  4%  perf-profile.children.cycles-pp.__might_fault
>       0.24 ±  7%      -0.2        0.05 ± 56%  perf-profile.children.cycles-pp.fsnotify_perm
>       1.12 ±  4%      -0.2        0.93 ±  6%  perf-profile.children.cycles-pp.__fget_light
>       1.24 ±  3%      -0.2        1.07 ±  3%  perf-profile.children.cycles-pp.security_socket_sendmsg
>       0.61 ±  3%      -0.2        0.45 ±  2%  perf-profile.children.cycles-pp.__might_sleep
>       0.33 ±  5%      -0.2        0.17 ±  6%  perf-profile.children.cycles-pp.refill_obj_stock
>       0.40 ±  2%      -0.1        0.25 ±  4%  perf-profile.children.cycles-pp.kmalloc_slab
>       0.57 ±  2%      -0.1        0.45        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>       0.54 ±  3%      -0.1        0.42 ±  2%  perf-profile.children.cycles-pp.wait_for_unix_gc
>       0.42 ±  2%      -0.1        0.30 ±  3%  perf-profile.children.cycles-pp.is_vmalloc_addr
>       1.00 ±  2%      -0.1        0.87 ±  5%  perf-profile.children.cycles-pp.__virt_addr_valid
>       0.52 ±  2%      -0.1        0.41        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>       0.33 ±  3%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.tick_sched_handle
>       0.36 ±  2%      -0.1        0.25 ±  4%  perf-profile.children.cycles-pp.tick_sched_timer
>       0.47 ±  2%      -0.1        0.36 ±  2%  perf-profile.children.cycles-pp.hrtimer_interrupt
>       0.48 ±  2%      -0.1        0.36 ±  2%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
>       0.32 ±  3%      -0.1        0.21 ±  5%  perf-profile.children.cycles-pp.update_process_times
>       0.42 ±  3%      -0.1        0.31 ±  2%  perf-profile.children.cycles-pp.__hrtimer_run_queues
>       0.26 ±  6%      -0.1        0.16 ±  4%  perf-profile.children.cycles-pp.kmalloc_size_roundup
>       0.20 ±  4%      -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.task_tick_fair
>       0.24 ±  3%      -0.1        0.15 ±  4%  perf-profile.children.cycles-pp.scheduler_tick
>       0.30 ±  5%      -0.1        0.21 ±  8%  perf-profile.children.cycles-pp.obj_cgroup_uncharge_pages
>       0.20 ±  2%      -0.1        0.11 ±  6%  perf-profile.children.cycles-pp.should_failslab
>       0.51 ±  2%      -0.1        0.43 ±  6%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
>       0.15 ±  8%      -0.1        0.07 ± 13%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
>       0.19 ±  4%      -0.1        0.12 ±  5%  perf-profile.children.cycles-pp.apparmor_socket_sendmsg
>       0.20 ±  4%      -0.1        0.13 ±  5%  perf-profile.children.cycles-pp.aa_file_perm
>       0.18 ±  5%      -0.1        0.12 ±  5%  perf-profile.children.cycles-pp.apparmor_socket_recvmsg
>       0.14 ± 13%      -0.1        0.08 ± 55%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
>       0.24 ±  4%      -0.1        0.18 ±  2%  perf-profile.children.cycles-pp.rcu_all_qs
>       0.18 ± 10%      -0.1        0.12 ± 11%  perf-profile.children.cycles-pp.memcg_account_kmem
>       0.37 ±  3%      -0.1        0.31 ±  3%  perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
>       0.08            -0.0        0.06 ±  8%  perf-profile.children.cycles-pp.put_pid
>       0.18 ±  3%      -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.apparmor_socket_getpeersec_dgram
>       0.21 ±  3%      +0.0        0.23 ±  2%  perf-profile.children.cycles-pp.__get_task_ioprio
>       0.00            +0.1        0.05        perf-profile.children.cycles-pp.perf_exclude_event
>       0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.invalidate_user_asid
>       0.00            +0.1        0.07 ±  6%  perf-profile.children.cycles-pp.__bitmap_and
>       0.05            +0.1        0.13 ±  8%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
>       0.00            +0.1        0.08 ±  7%  perf-profile.children.cycles-pp.schedule_debug
>       0.00            +0.1        0.08 ± 13%  perf-profile.children.cycles-pp.read@plt
>       0.00            +0.1        0.08 ±  5%  perf-profile.children.cycles-pp.sysvec_reschedule_ipi
>       0.00            +0.1        0.10 ±  4%  perf-profile.children.cycles-pp.tracing_gen_ctx_irq_test
>       0.00            +0.1        0.10 ±  4%  perf-profile.children.cycles-pp.place_entity
>       0.00            +0.1        0.12 ± 10%  perf-profile.children.cycles-pp.native_irq_return_iret
>       0.07 ± 14%      +0.1        0.19 ±  3%  perf-profile.children.cycles-pp.__list_add_valid
>       0.00            +0.1        0.13 ±  6%  perf-profile.children.cycles-pp.perf_trace_buf_alloc
>       0.00            +0.1        0.13 ± 34%  perf-profile.children.cycles-pp._find_next_and_bit
>       0.00            +0.1        0.14 ±  5%  perf-profile.children.cycles-pp.switch_ldt
>       0.00            +0.1        0.15 ±  5%  perf-profile.children.cycles-pp.check_cfs_rq_runtime
>       0.00            +0.1        0.15 ± 30%  perf-profile.children.cycles-pp.migrate_task_rq_fair
>       0.00            +0.2        0.15 ±  5%  perf-profile.children.cycles-pp.__rdgsbase_inactive
>       0.00            +0.2        0.16 ±  3%  perf-profile.children.cycles-pp.save_fpregs_to_fpstate
>       0.00            +0.2        0.16 ±  6%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
>       0.00            +0.2        0.17        perf-profile.children.cycles-pp.perf_trace_buf_update
>       0.00            +0.2        0.18 ±  2%  perf-profile.children.cycles-pp.rb_insert_color
>       0.00            +0.2        0.18 ±  4%  perf-profile.children.cycles-pp.rb_next
>       0.00            +0.2        0.18 ± 21%  perf-profile.children.cycles-pp.__cgroup_account_cputime
>       0.01 ±223%      +0.2        0.21 ± 28%  perf-profile.children.cycles-pp.perf_trace_sched_switch
>       0.00            +0.2        0.20 ±  3%  perf-profile.children.cycles-pp.select_idle_cpu
>       0.00            +0.2        0.20 ±  3%  perf-profile.children.cycles-pp.rcu_note_context_switch
>       0.00            +0.2        0.21 ± 26%  perf-profile.children.cycles-pp.set_task_cpu
>       0.00            +0.2        0.22 ±  8%  perf-profile.children.cycles-pp.resched_curr
>       0.08 ±  5%      +0.2        0.31 ± 11%  perf-profile.children.cycles-pp.task_h_load
>       0.00            +0.2        0.24 ±  3%  perf-profile.children.cycles-pp.finish_wait
>       0.04 ± 44%      +0.3        0.29 ±  5%  perf-profile.children.cycles-pp.rb_erase
>       0.19 ±  6%      +0.3        0.46        perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
>       0.20 ±  6%      +0.3        0.47 ±  3%  perf-profile.children.cycles-pp.__list_del_entry_valid
>       0.00            +0.3        0.28 ±  3%  perf-profile.children.cycles-pp.__wrgsbase_inactive
>       0.02 ±141%      +0.3        0.30 ±  2%  perf-profile.children.cycles-pp.native_sched_clock
>       0.06 ± 13%      +0.3        0.34 ±  2%  perf-profile.children.cycles-pp.sched_clock_cpu
>       0.64 ±  2%      +0.3        0.93        perf-profile.children.cycles-pp.mutex_lock
>       0.00            +0.3        0.30 ±  5%  perf-profile.children.cycles-pp.cr4_update_irqsoff
>       0.00            +0.3        0.30 ±  4%  perf-profile.children.cycles-pp.clear_buddies
>       0.07 ± 55%      +0.3        0.37 ±  5%  perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
>       0.10 ± 66%      +0.3        0.42 ±  5%  perf-profile.children.cycles-pp.perf_tp_event
>       0.02 ±142%      +0.3        0.36 ±  6%  perf-profile.children.cycles-pp.cpuacct_charge
>       0.12 ±  9%      +0.4        0.47 ± 11%  perf-profile.children.cycles-pp.wake_affine
>       0.00            +0.4        0.36 ± 13%  perf-profile.children.cycles-pp.available_idle_cpu
>       0.05 ± 48%      +0.4        0.42 ±  6%  perf-profile.children.cycles-pp.finish_task_switch
>       0.12 ±  4%      +0.4        0.49 ±  4%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
>       0.07 ± 17%      +0.4        0.48        perf-profile.children.cycles-pp.__calc_delta
>       0.03 ±100%      +0.5        0.49 ±  4%  perf-profile.children.cycles-pp.pick_next_entity
>       0.00            +0.5        0.48 ±  8%  perf-profile.children.cycles-pp.set_next_buddy
>       0.08 ± 14%      +0.6        0.66 ±  4%  perf-profile.children.cycles-pp.update_min_vruntime
>       0.07 ± 17%      +0.6        0.68 ±  2%  perf-profile.children.cycles-pp.os_xsave
>       0.29 ±  7%      +0.7        0.99 ±  3%  perf-profile.children.cycles-pp.update_cfs_group
>       0.17 ± 17%      +0.7        0.87 ±  4%  perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
>       0.14 ±  7%      +0.7        0.87 ±  3%  perf-profile.children.cycles-pp.__update_load_avg_se
>       0.14 ± 16%      +0.8        0.90 ±  2%  perf-profile.children.cycles-pp.update_rq_clock
>       0.08 ± 17%      +0.8        0.84 ±  5%  perf-profile.children.cycles-pp.check_preempt_wakeup
>       0.12 ± 14%      +0.8        0.95 ±  3%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
>       0.22 ±  5%      +0.8        1.07 ±  3%  perf-profile.children.cycles-pp.prepare_to_wait
>       0.10 ± 18%      +0.9        0.98 ±  3%  perf-profile.children.cycles-pp.check_preempt_curr
>      29.72            +0.9       30.61        perf-profile.children.cycles-pp.vfs_write
>       0.14 ± 11%      +0.9        1.03 ±  4%  perf-profile.children.cycles-pp.__switch_to
>       0.07 ± 20%      +0.9        0.99 ±  6%  perf-profile.children.cycles-pp.put_prev_entity
>       0.12 ± 16%      +1.0        1.13 ±  5%  perf-profile.children.cycles-pp.___perf_sw_event
>       0.07 ± 17%      +1.0        1.10 ± 13%  perf-profile.children.cycles-pp.select_idle_sibling
>      27.82 ±  2%      +1.2       28.99        perf-profile.children.cycles-pp.unix_stream_recvmsg
>      27.41 ±  2%      +1.2       28.63        perf-profile.children.cycles-pp.unix_stream_read_generic
>       0.20 ± 15%      +1.4        1.59 ±  3%  perf-profile.children.cycles-pp.reweight_entity
>       0.21 ± 13%      +1.4        1.60 ±  4%  perf-profile.children.cycles-pp.__switch_to_asm
>       0.23 ± 10%      +1.4        1.65 ±  5%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
>       0.20 ± 13%      +1.5        1.69 ±  3%  perf-profile.children.cycles-pp.set_next_entity
>      27.59            +1.6       29.19        perf-profile.children.cycles-pp.sock_write_iter
>       0.28 ± 10%      +1.8        2.12 ±  5%  perf-profile.children.cycles-pp.switch_fpu_return
>       0.26 ± 11%      +1.8        2.10 ±  6%  perf-profile.children.cycles-pp.select_task_rq_fair
>      26.66 ±  2%      +2.0       28.63        perf-profile.children.cycles-pp.sock_sendmsg
>       0.31 ± 12%      +2.1        2.44 ±  5%  perf-profile.children.cycles-pp.select_task_rq
>       0.30 ± 14%      +2.2        2.46 ±  4%  perf-profile.children.cycles-pp.prepare_task_switch
>      25.27 ±  2%      +2.2       27.47        perf-profile.children.cycles-pp.unix_stream_sendmsg
>       2.10            +2.3        4.38 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock
>       0.40 ± 14%      +2.5        2.92 ±  5%  perf-profile.children.cycles-pp.dequeue_entity
>      48.40            +2.6       51.02        perf-profile.children.cycles-pp.__libc_write
>       0.46 ± 15%      +3.1        3.51 ±  3%  perf-profile.children.cycles-pp.enqueue_entity
>       0.49 ± 10%      +3.2        3.64 ±  7%  perf-profile.children.cycles-pp.update_load_avg
>       0.53 ± 20%      +3.4        3.91 ±  3%  perf-profile.children.cycles-pp.update_curr
>      80.81            +3.4       84.24        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>       0.50 ± 12%      +3.5        4.00 ±  4%  perf-profile.children.cycles-pp.switch_mm_irqs_off
>       0.55 ±  9%      +3.8        4.38 ±  4%  perf-profile.children.cycles-pp.pick_next_task_fair
>       9.60            +4.6       14.15 ±  2%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       0.78 ± 13%      +4.9        5.65 ±  4%  perf-profile.children.cycles-pp.dequeue_task_fair
>       0.78 ± 15%      +5.2        5.99 ±  3%  perf-profile.children.cycles-pp.enqueue_task_fair
>      74.30            +5.6       79.86        perf-profile.children.cycles-pp.do_syscall_64
>       0.90 ± 15%      +6.3        7.16 ±  3%  perf-profile.children.cycles-pp.ttwu_do_activate
>       0.33 ± 31%      +6.3        6.61 ±  6%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
>       0.82 ± 15%      +8.1        8.92 ±  5%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>       1.90 ± 16%     +12.2       14.10 ±  2%  perf-profile.children.cycles-pp.try_to_wake_up
>       2.36 ± 11%     +12.2       14.60 ±  3%  perf-profile.children.cycles-pp.schedule_timeout
>       1.95 ± 15%     +12.5       14.41 ±  2%  perf-profile.children.cycles-pp.autoremove_wake_function
>       2.01 ± 15%     +12.8       14.76 ±  2%  perf-profile.children.cycles-pp.__wake_up_common
>       2.23 ± 13%     +13.2       15.45 ±  2%  perf-profile.children.cycles-pp.__wake_up_common_lock
>       2.53 ± 10%     +13.4       15.90 ±  2%  perf-profile.children.cycles-pp.sock_def_readable
>       2.29 ± 15%     +14.6       16.93 ±  3%  perf-profile.children.cycles-pp.unix_stream_data_wait
>       2.61 ± 13%     +18.0       20.65 ±  4%  perf-profile.children.cycles-pp.schedule
>       2.66 ± 13%     +18.1       20.77 ±  4%  perf-profile.children.cycles-pp.__schedule
>      11.25 ±  3%      -4.6        6.67 ±  3%  perf-profile.self.cycles-pp.syscall_return_via_sysret
>       5.76 ± 32%      -3.9        1.90 ±  3%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>       8.69 ±  3%      -3.4        5.27 ±  3%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
>       3.11 ±  3%      -2.5        0.60 ± 13%  perf-profile.self.cycles-pp.__slab_free
>       6.65 ±  2%      -2.2        4.47 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       4.78 ±  3%      -1.9        2.88 ±  3%  perf-profile.self.cycles-pp.__entry_text_start
>       3.52 ±  2%      -1.9        1.64 ±  6%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
>       2.06 ±  3%      -1.1        0.96 ±  5%  perf-profile.self.cycles-pp.kmem_cache_free
>       1.42 ±  3%      -1.0        0.46 ± 10%  perf-profile.self.cycles-pp.check_heap_object
>       1.43 ±  4%      -0.8        0.64        perf-profile.self.cycles-pp.sock_wfree
>       0.99 ±  3%      -0.8        0.21 ± 12%  perf-profile.self.cycles-pp.skb_release_data
>       0.84 ±  8%      -0.7        0.10 ± 64%  perf-profile.self.cycles-pp.___slab_alloc
>       1.97 ±  2%      -0.6        1.32        perf-profile.self.cycles-pp.unix_stream_read_generic
>       1.60 ±  3%      -0.5        1.11 ±  4%  perf-profile.self.cycles-pp.memcg_slab_post_alloc_hook
>       1.24 ±  2%      -0.5        0.75 ± 11%  perf-profile.self.cycles-pp.mod_objcg_state
>       0.71            -0.5        0.23 ± 15%  perf-profile.self.cycles-pp.__build_skb_around
>       0.95 ±  3%      -0.5        0.50 ±  6%  perf-profile.self.cycles-pp.__alloc_skb
>       0.97 ±  4%      -0.4        0.55 ±  5%  perf-profile.self.cycles-pp.kmem_cache_alloc_node
>       0.99 ±  3%      -0.4        0.59 ±  4%  perf-profile.self.cycles-pp.vfs_write
>       1.38 ±  2%      -0.4        0.99        perf-profile.self.cycles-pp.__kmem_cache_free
>       0.86 ±  2%      -0.4        0.50 ±  3%  perf-profile.self.cycles-pp.__kmem_cache_alloc_node
>       0.92 ±  4%      -0.4        0.56 ±  4%  perf-profile.self.cycles-pp.sock_write_iter
>       1.06 ±  3%      -0.4        0.70 ±  3%  perf-profile.self.cycles-pp.__might_resched
>       0.73 ±  4%      -0.3        0.44 ±  4%  perf-profile.self.cycles-pp.__cond_resched
>       0.85 ±  3%      -0.3        0.59 ±  4%  perf-profile.self.cycles-pp.__check_heap_object
>       1.46 ±  7%      -0.3        1.20 ±  2%  perf-profile.self.cycles-pp.unix_stream_sendmsg
>       0.73 ±  9%      -0.3        0.47 ±  2%  perf-profile.self.cycles-pp.skb_set_owner_w
>       1.54            -0.3        1.28 ±  4%  perf-profile.self.cycles-pp.apparmor_file_permission
>       0.74 ±  3%      -0.2        0.50 ±  2%  perf-profile.self.cycles-pp.get_obj_cgroup_from_current
>       1.15 ±  3%      -0.2        0.91 ±  8%  perf-profile.self.cycles-pp.aa_sk_perm
>       0.60            -0.2        0.36 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.65 ±  4%      -0.2        0.45 ±  6%  perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg
>       0.24 ±  6%      -0.2        0.05 ± 56%  perf-profile.self.cycles-pp.fsnotify_perm
>       0.76 ±  3%      -0.2        0.58 ±  2%  perf-profile.self.cycles-pp.sock_read_iter
>       1.10 ±  4%      -0.2        0.92 ±  6%  perf-profile.self.cycles-pp.__fget_light
>       0.42 ±  3%      -0.2        0.25 ±  4%  perf-profile.self.cycles-pp.obj_cgroup_charge
>       0.32 ±  4%      -0.2        0.17 ±  6%  perf-profile.self.cycles-pp.refill_obj_stock
>       0.29            -0.2        0.14 ±  8%  perf-profile.self.cycles-pp.__kmalloc_node_track_caller
>       0.54 ±  3%      -0.1        0.40 ±  2%  perf-profile.self.cycles-pp.__might_sleep
>       0.30 ±  7%      -0.1        0.16 ± 22%  perf-profile.self.cycles-pp.security_file_permission
>       0.34 ±  3%      -0.1        0.21 ±  6%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
>       0.41 ±  3%      -0.1        0.29 ±  3%  perf-profile.self.cycles-pp.is_vmalloc_addr
>       0.27 ±  3%      -0.1        0.16 ±  6%  perf-profile.self.cycles-pp._copy_from_iter
>       0.24 ±  3%      -0.1        0.12 ±  3%  perf-profile.self.cycles-pp.ksys_write
>       0.95 ±  2%      -0.1        0.84 ±  5%  perf-profile.self.cycles-pp.__virt_addr_valid
>       0.56 ± 11%      -0.1        0.46 ±  4%  perf-profile.self.cycles-pp.sock_def_readable
>       0.16 ±  7%      -0.1        0.06 ± 18%  perf-profile.self.cycles-pp.sock_recvmsg
>       0.22 ±  5%      -0.1        0.14 ±  2%  perf-profile.self.cycles-pp.ksys_read
>       0.27 ±  4%      -0.1        0.19 ±  5%  perf-profile.self.cycles-pp.kmalloc_slab
>       0.28 ±  2%      -0.1        0.20 ±  2%  perf-profile.self.cycles-pp.consume_skb
>       0.35 ±  2%      -0.1        0.28 ±  3%  perf-profile.self.cycles-pp.__check_object_size
>       0.13 ±  8%      -0.1        0.06 ± 18%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
>       0.20 ±  5%      -0.1        0.12 ±  6%  perf-profile.self.cycles-pp.kmalloc_reserve
>       0.26 ±  5%      -0.1        0.19 ±  4%  perf-profile.self.cycles-pp.sock_alloc_send_pskb
>       0.42 ±  2%      -0.1        0.35 ±  7%  perf-profile.self.cycles-pp.syscall_enter_from_user_mode
>       0.19 ±  5%      -0.1        0.12 ±  6%  perf-profile.self.cycles-pp.aa_file_perm
>       0.16 ±  4%      -0.1        0.10 ±  4%  perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
>       0.18 ±  4%      -0.1        0.12 ±  6%  perf-profile.self.cycles-pp.apparmor_socket_sendmsg
>       0.18 ±  5%      -0.1        0.12 ±  4%  perf-profile.self.cycles-pp.apparmor_socket_recvmsg
>       0.15 ±  5%      -0.1        0.10 ±  5%  perf-profile.self.cycles-pp.alloc_skb_with_frags
>       0.64 ±  3%      -0.1        0.59        perf-profile.self.cycles-pp.__libc_write
>       0.20 ±  4%      -0.1        0.15 ±  3%  perf-profile.self.cycles-pp._copy_to_iter
>       0.15 ±  5%      -0.1        0.10 ± 11%  perf-profile.self.cycles-pp.sock_sendmsg
>       0.08 ±  4%      -0.1        0.03 ± 81%  perf-profile.self.cycles-pp.copyout
>       0.11 ±  6%      -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.__fdget_pos
>       0.12 ±  5%      -0.0        0.07 ± 10%  perf-profile.self.cycles-pp.kmalloc_size_roundup
>       0.34 ±  3%      -0.0        0.29        perf-profile.self.cycles-pp.do_syscall_64
>       0.20 ±  4%      -0.0        0.15 ±  4%  perf-profile.self.cycles-pp.rcu_all_qs
>       0.41 ±  3%      -0.0        0.37 ±  8%  perf-profile.self.cycles-pp.unix_stream_recvmsg
>       0.22 ±  2%      -0.0        0.17 ±  4%  perf-profile.self.cycles-pp.unix_destruct_scm
>       0.09 ±  4%      -0.0        0.05        perf-profile.self.cycles-pp.should_failslab
>       0.10 ± 15%      -0.0        0.06 ± 50%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
>       0.11 ±  4%      -0.0        0.07        perf-profile.self.cycles-pp.__might_fault
>       0.16 ±  2%      -0.0        0.13 ±  6%  perf-profile.self.cycles-pp.obj_cgroup_uncharge_pages
>       0.18 ±  4%      -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.security_socket_getpeersec_dgram
>       0.28 ±  2%      -0.0        0.25 ±  2%  perf-profile.self.cycles-pp.unix_write_space
>       0.17 ±  2%      -0.0        0.15 ±  5%  perf-profile.self.cycles-pp.apparmor_socket_getpeersec_dgram
>       0.08 ±  6%      -0.0        0.05 ±  7%  perf-profile.self.cycles-pp.security_socket_sendmsg
>       0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.__skb_datagram_iter
>       0.24 ±  2%      -0.0        0.22        perf-profile.self.cycles-pp.mutex_unlock
>       0.08 ±  5%      +0.0        0.10 ±  6%  perf-profile.self.cycles-pp.scm_recv
>       0.17 ±  2%      +0.0        0.19 ±  3%  perf-profile.self.cycles-pp.__x64_sys_read
>       0.19 ±  3%      +0.0        0.22 ±  2%  perf-profile.self.cycles-pp.__get_task_ioprio
>       0.00            +0.1        0.06        perf-profile.self.cycles-pp.finish_wait
>       0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.cr4_update_irqsoff
>       0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.invalidate_user_asid
>       0.00            +0.1        0.07 ± 12%  perf-profile.self.cycles-pp.wake_affine
>       0.00            +0.1        0.07 ±  7%  perf-profile.self.cycles-pp.check_cfs_rq_runtime
>       0.00            +0.1        0.07 ±  5%  perf-profile.self.cycles-pp.perf_trace_buf_update
>       0.00            +0.1        0.07 ±  9%  perf-profile.self.cycles-pp.asm_sysvec_reschedule_ipi
>       0.00            +0.1        0.07 ± 10%  perf-profile.self.cycles-pp.__bitmap_and
>       0.00            +0.1        0.08 ± 10%  perf-profile.self.cycles-pp.schedule_debug
>       0.00            +0.1        0.08 ± 13%  perf-profile.self.cycles-pp.read@plt
>       0.00            +0.1        0.08 ± 12%  perf-profile.self.cycles-pp.perf_trace_buf_alloc
>       0.00            +0.1        0.09 ± 35%  perf-profile.self.cycles-pp.migrate_task_rq_fair
>       0.00            +0.1        0.09 ±  5%  perf-profile.self.cycles-pp.place_entity
>       0.00            +0.1        0.10 ±  4%  perf-profile.self.cycles-pp.tracing_gen_ctx_irq_test
>       0.00            +0.1        0.10        perf-profile.self.cycles-pp.__wake_up_common_lock
>       0.07 ± 17%      +0.1        0.18 ±  3%  perf-profile.self.cycles-pp.__list_add_valid
>       0.00            +0.1        0.11 ±  8%  perf-profile.self.cycles-pp.native_irq_return_iret
>       0.00            +0.1        0.12 ±  6%  perf-profile.self.cycles-pp.select_idle_cpu
>       0.00            +0.1        0.12 ± 34%  perf-profile.self.cycles-pp._find_next_and_bit
>       0.00            +0.1        0.13 ± 25%  perf-profile.self.cycles-pp.__cgroup_account_cputime
>       0.00            +0.1        0.13 ±  7%  perf-profile.self.cycles-pp.switch_ldt
>       0.00            +0.1        0.14 ±  5%  perf-profile.self.cycles-pp.check_preempt_curr
>       0.00            +0.1        0.15 ±  2%  perf-profile.self.cycles-pp.save_fpregs_to_fpstate
>       0.00            +0.1        0.15 ±  5%  perf-profile.self.cycles-pp.__rdgsbase_inactive
>       0.14 ±  3%      +0.2        0.29        perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
>       0.00            +0.2        0.15 ±  7%  perf-profile.self.cycles-pp.ttwu_queue_wakelist
>       0.00            +0.2        0.17 ±  4%  perf-profile.self.cycles-pp.rb_insert_color
>       0.00            +0.2        0.17 ±  5%  perf-profile.self.cycles-pp.rb_next
>       0.00            +0.2        0.18 ±  2%  perf-profile.self.cycles-pp.autoremove_wake_function
>       0.01 ±223%      +0.2        0.19 ±  6%  perf-profile.self.cycles-pp.ttwu_do_activate
>       0.00            +0.2        0.20 ±  2%  perf-profile.self.cycles-pp.rcu_note_context_switch
>       0.00            +0.2        0.20 ±  7%  perf-profile.self.cycles-pp.exit_to_user_mode_loop
>       0.27            +0.2        0.47 ±  3%  perf-profile.self.cycles-pp.mutex_lock
>       0.00            +0.2        0.20 ± 28%  perf-profile.self.cycles-pp.perf_trace_sched_switch
>       0.00            +0.2        0.21 ±  9%  perf-profile.self.cycles-pp.resched_curr
>       0.04 ± 45%      +0.2        0.26 ±  7%  perf-profile.self.cycles-pp.perf_tp_event
>       0.06 ±  7%      +0.2        0.28 ±  8%  perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
>       0.19 ±  7%      +0.2        0.41 ±  5%  perf-profile.self.cycles-pp.__list_del_entry_valid
>       0.08 ±  5%      +0.2        0.31 ± 11%  perf-profile.self.cycles-pp.task_h_load
>       0.00            +0.2        0.23 ±  5%  perf-profile.self.cycles-pp.finish_task_switch
>       0.03 ± 70%      +0.2        0.27 ±  5%  perf-profile.self.cycles-pp.rb_erase
>       0.02 ±142%      +0.3        0.29 ±  2%  perf-profile.self.cycles-pp.native_sched_clock
>       0.00            +0.3        0.28 ±  3%  perf-profile.self.cycles-pp.__wrgsbase_inactive
>       0.00            +0.3        0.28 ±  6%  perf-profile.self.cycles-pp.clear_buddies
>       0.07 ± 10%      +0.3        0.35 ±  3%  perf-profile.self.cycles-pp.schedule_timeout
>       0.03 ± 70%      +0.3        0.33 ±  3%  perf-profile.self.cycles-pp.select_task_rq
>       0.06 ± 13%      +0.3        0.36 ±  4%  perf-profile.self.cycles-pp.__wake_up_common
>       0.06 ± 13%      +0.3        0.36 ±  3%  perf-profile.self.cycles-pp.dequeue_entity
>       0.06 ± 18%      +0.3        0.37 ±  7%  perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
>       0.01 ±223%      +0.3        0.33 ±  4%  perf-profile.self.cycles-pp.schedule
>       0.02 ±142%      +0.3        0.35 ±  7%  perf-profile.self.cycles-pp.cpuacct_charge
>       0.01 ±223%      +0.3        0.35        perf-profile.self.cycles-pp.set_next_entity
>       0.00            +0.4        0.35 ± 13%  perf-profile.self.cycles-pp.available_idle_cpu
>       0.08 ± 10%      +0.4        0.44 ±  5%  perf-profile.self.cycles-pp.prepare_to_wait
>       0.63 ±  3%      +0.4        1.00 ±  4%  perf-profile.self.cycles-pp.vfs_read
>       0.02 ±142%      +0.4        0.40 ±  4%  perf-profile.self.cycles-pp.check_preempt_wakeup
>       0.02 ±141%      +0.4        0.42 ±  4%  perf-profile.self.cycles-pp.pick_next_entity
>       0.07 ± 17%      +0.4        0.48        perf-profile.self.cycles-pp.__calc_delta
>       0.06 ± 14%      +0.4        0.47 ±  3%  perf-profile.self.cycles-pp.unix_stream_data_wait
>       0.04 ± 45%      +0.4        0.45 ±  4%  perf-profile.self.cycles-pp.switch_fpu_return
>       0.00            +0.5        0.46 ±  7%  perf-profile.self.cycles-pp.set_next_buddy
>       0.07 ± 17%      +0.5        0.53 ±  3%  perf-profile.self.cycles-pp.select_task_rq_fair
>       0.08 ± 16%      +0.5        0.55 ±  4%  perf-profile.self.cycles-pp.try_to_wake_up
>       0.08 ± 19%      +0.5        0.56 ±  3%  perf-profile.self.cycles-pp.update_rq_clock
>       0.02 ±141%      +0.5        0.50 ± 10%  perf-profile.self.cycles-pp.select_idle_sibling
>       0.77 ±  2%      +0.5        1.25 ±  2%  perf-profile.self.cycles-pp.__libc_read
>       0.09 ± 19%      +0.5        0.59 ±  3%  perf-profile.self.cycles-pp.reweight_entity
>       0.08 ± 14%      +0.5        0.59 ±  2%  perf-profile.self.cycles-pp.dequeue_task_fair
>       0.08 ± 13%      +0.6        0.64 ±  5%  perf-profile.self.cycles-pp.update_min_vruntime
>       0.02 ±141%      +0.6        0.58 ±  7%  perf-profile.self.cycles-pp.put_prev_entity
>       0.06 ± 11%      +0.6        0.64 ±  4%  perf-profile.self.cycles-pp.enqueue_task_fair
>       0.07 ± 18%      +0.6        0.68 ±  3%  perf-profile.self.cycles-pp.os_xsave
>       1.39 ±  2%      +0.7        2.06 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       0.28 ±  8%      +0.7        0.97 ±  4%  perf-profile.self.cycles-pp.update_cfs_group
>       0.14 ±  8%      +0.7        0.83 ±  3%  perf-profile.self.cycles-pp.__update_load_avg_se
>       1.76 ±  3%      +0.7        2.47 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock
>       0.12 ± 12%      +0.7        0.85 ±  5%  perf-profile.self.cycles-pp.prepare_task_switch
>       0.12 ± 12%      +0.8        0.91 ±  3%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
>       0.13 ± 12%      +0.8        0.93 ±  5%  perf-profile.self.cycles-pp.pick_next_task_fair
>       0.13 ± 12%      +0.9        0.98 ±  4%  perf-profile.self.cycles-pp.__switch_to
>       0.11 ± 18%      +0.9        1.06 ±  5%  perf-profile.self.cycles-pp.___perf_sw_event
>       0.16 ± 11%      +1.2        1.34 ±  4%  perf-profile.self.cycles-pp.enqueue_entity
>       0.20 ± 12%      +1.4        1.58 ±  4%  perf-profile.self.cycles-pp.__switch_to_asm
>       0.23 ± 10%      +1.4        1.65 ±  5%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
>       0.25 ± 12%      +1.5        1.77 ±  4%  perf-profile.self.cycles-pp.__schedule
>       0.22 ± 10%      +1.6        1.78 ± 10%  perf-profile.self.cycles-pp.update_load_avg
>       0.23 ± 16%      +1.7        1.91 ±  7%  perf-profile.self.cycles-pp.update_curr
>       0.48 ± 11%      +3.4        3.86 ±  4%  perf-profile.self.cycles-pp.switch_mm_irqs_off
> 
> 
> To reproduce:
> 
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         sudo bin/lkp install job.yaml           # job file is attached in this email
>         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>         sudo bin/lkp run generated-yaml-file
> 
>         # if you come across any failure that blocks the test,
>         # please remove the ~/.lkp and /lkp dirs to run from a clean state.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-02-21 16:57   ` Roman Kagan
@ 2023-02-21 17:26     ` Vincent Guittot
  2023-02-27  8:42       ` Roman Kagan
  0 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2023-02-21 17:26 UTC (permalink / raw)
  To: Roman Kagan, Vincent Guittot, Peter Zijlstra, linux-kernel,
	Valentin Schneider, Zhang Qiao, Ben Segall, Waiman Long,
	Steven Rostedt, Mel Gorman, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Ingo Molnar, Juri Lelli

On Tue, 21 Feb 2023 at 17:57, Roman Kagan <rkagan@amazon.de> wrote:
>
> On Tue, Feb 21, 2023 at 10:38:44AM +0100, Vincent Guittot wrote:
> > On Thu, 9 Feb 2023 at 20:31, Roman Kagan <rkagan@amazon.de> wrote:
> > >
> > > From: Zhang Qiao <zhangqiao22@huawei.com>
> > >
> > > When a scheduling entity is placed onto cfs_rq, its vruntime is pulled
> > > to the base level (around cfs_rq->min_vruntime), so that the entity
> > > doesn't gain extra boost when placed backwards.
> > >
> > > However, if the entity being placed wasn't executed for a long time, its
> > > vruntime may get too far behind (e.g. while cfs_rq was executing a
> > > low-weight hog), which can inverse the vruntime comparison due to s64
> > > overflow.  This results in the entity being placed with its original
> > > vruntime way forwards, so that it will effectively never get to the cpu.
> > >
> > > To prevent that, ignore the vruntime of the entity being placed if it
> > > didn't execute for longer than the time that can lead to an overflow.
> > >
> > > Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
> > > [rkagan: formatted, adjusted commit log, comments, cutoff value]
> > > Co-developed-by: Roman Kagan <rkagan@amazon.de>
> > > Signed-off-by: Roman Kagan <rkagan@amazon.de>
> >
> > Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> >
> > > ---
> > > v2 -> v3:
> > > - make cutoff less arbitrary and update comments [Vincent]
> > >
> > > v1 -> v2:
> > > - add Zhang Qiao's s-o-b
> > > - fix constant promotion on 32bit
> > >
> > >  kernel/sched/fair.c | 21 +++++++++++++++++++--
> > >  1 file changed, 19 insertions(+), 2 deletions(-)
>
> Turns out Peter took v2 through his tree, and it has already landed in
> Linus' master.
>
> What scares me, though, is that I've got a message from the test robot
> that this commit dramatically affected hackbench results; see the quote
> below.  I expected the commit not to affect any benchmarks.
>
> Any idea what could have caused this change?

Hmm, it's most probably because se->exec_start is reset after a
migration, so the condition becomes true for a newly migrated task
even though its vruntime should remain ahead of min_vruntime.

We missed this case.
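
For illustration only (this is not a posted patch, and it assumes
migrate_task_rq_fair() still clears se->exec_start when a task is
migrated), one way to keep the cutoff from firing for a freshly
migrated entity would be to skip it whenever exec_start was reset:

	/*
	 * Hypothetical helper, sketched for this discussion only: treat
	 * an entity as a long sleeper only if it carries a valid
	 * exec_start stamp.  A freshly migrated entity has
	 * exec_start == 0 (cleared in migrate_task_rq_fair()), so its
	 * vruntime is preserved.
	 */
	static bool entity_is_long_sleeper(struct cfs_rq *cfs_rq,
					   struct sched_entity *se)
	{
		u64 sleep_time;

		if (se->exec_start == 0)
			return false;

		sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;

		/* overflow cutoff as chosen in the posted patch */
		return (s64)sleep_time > (1ULL << 63) / NICE_0_LOAD;
	}

and place_entity() would then use:

	if (entity_is_long_sleeper(cfs_rq, se))
		se->vruntime = vruntime;
	else
		se->vruntime = max_vruntime(se->vruntime, vruntime);

This only sketches the idea; it doesn't decide which vruntime a
migrated long sleeper should actually end up with.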

>
> Thanks,
> Roman.
>
>
> On Tue, Feb 21, 2023 at 03:34:16PM +0800, kernel test robot wrote:
> > FYI, we noticed a 125.5% improvement of hackbench.throughput due to commit:
> >
> > commit: 829c1651e9c4a6f78398d3e67651cef9bb6b42cc ("sched/fair: sanitize vruntime of entity being placed")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: hackbench
> > on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
> > with following parameters:
> >
> >         nr_threads: 50%
> >         iterations: 8
> >         mode: process
> >         ipc: pipe
> >         cpufreq_governor: performance
> >
> > test-description: Hackbench is both a benchmark and a stress test for the Linux kernel scheduler.
> > test-url: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sched/cfs-scheduler/hackbench.c
> >
> > In addition to that, the commit also has significant impact on the following tests:
> >
> > +------------------+--------------------------------------------------+
> > | testcase: change | hackbench: hackbench.throughput -8.1% regression |
> > | test machine     | 104 threads 2 sockets (Skylake) with 192G memory |
> > | test parameters  | cpufreq_governor=performance                     |
> > |                  | ipc=socket                                       |
> > |                  | iterations=4                                     |
> > |                  | mode=process                                     |
> > |                  | nr_threads=100%                                  |
> > +------------------+--------------------------------------------------+
> >
> > Details are as below:
> >
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> >   gcc-11/performance/pipe/8/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/hackbench
> >
> > commit:
> >   a2e90611b9 ("sched/fair: Remove capacity inversion detection")
> >   829c1651e9 ("sched/fair: sanitize vruntime of entity being placed")
> >
> > a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >     308887 ±  5%    +125.5%     696539        hackbench.throughput
> >     259291 ±  2%    +127.3%     589293        hackbench.throughput_avg
> >     308887 ±  5%    +125.5%     696539        hackbench.throughput_best
> >     198770 ±  2%    +105.5%     408552 ±  4%  hackbench.throughput_worst
> >     319.60 ±  2%     -55.8%     141.24        hackbench.time.elapsed_time
> >     319.60 ±  2%     -55.8%     141.24        hackbench.time.elapsed_time.max
> >  1.298e+09 ±  8%     -87.6%  1.613e+08 ±  7%  hackbench.time.involuntary_context_switches
> >     477107           -12.5%     417660        hackbench.time.minor_page_faults
> >      24683 ±  2%     -57.2%      10562        hackbench.time.system_time
> >       2136 ±  3%     -45.0%       1174        hackbench.time.user_time
> >   3.21e+09 ±  4%     -83.0%  5.442e+08 ±  3%  hackbench.time.voluntary_context_switches
> >   5.28e+08 ±  4%      +8.4%  5.723e+08 ±  3%  cpuidle..time
> >     365.97 ±  2%     -48.9%     187.12        uptime.boot
> >    3322559 ±  3%     +34.3%    4463206 ± 15%  vmstat.memory.cache
> >   14194257 ±  2%     -62.8%    5279904 ±  3%  vmstat.system.cs
> >    2120781 ±  3%     -72.8%     576421 ±  4%  vmstat.system.in
> >       1.84 ± 12%      +2.6        4.48 ±  5%  mpstat.cpu.all.idle%
> >       2.49 ±  3%      -1.1        1.39 ±  4%  mpstat.cpu.all.irq%
> >       0.04 ± 12%      +0.0        0.05        mpstat.cpu.all.soft%
> >       7.36            +2.2        9.56        mpstat.cpu.all.usr%
> >      61555 ±  6%     -72.8%      16751 ± 16%  numa-meminfo.node1.Active
> >      61515 ±  6%     -72.8%      16717 ± 16%  numa-meminfo.node1.Active(anon)
> >     960182 ±102%    +225.6%    3125990 ± 42%  numa-meminfo.node1.FilePages
> >    1754002 ± 53%    +137.9%    4173379 ± 34%  numa-meminfo.node1.MemUsed
> >   35296824 ±  6%    +157.8%   91005048        numa-numastat.node0.local_node
> >   35310119 ±  6%    +157.9%   91058472        numa-numastat.node0.numa_hit
> >   35512423 ±  5%    +159.7%   92232951        numa-numastat.node1.local_node
> >   35577275 ±  4%    +159.4%   92273266        numa-numastat.node1.numa_hit
> >   35310253 ±  6%    +157.9%   91058211        numa-vmstat.node0.numa_hit
> >   35296958 ±  6%    +157.8%   91004787        numa-vmstat.node0.numa_local
> >      15337 ±  6%     -72.5%       4216 ± 17%  numa-vmstat.node1.nr_active_anon
> >     239988 ±102%    +225.7%     781607 ± 42%  numa-vmstat.node1.nr_file_pages
> >      15337 ±  6%     -72.5%       4216 ± 17%  numa-vmstat.node1.nr_zone_active_anon
> >   35577325 ±  4%    +159.4%   92273215        numa-vmstat.node1.numa_hit
> >   35512473 ±  5%    +159.7%   92232900        numa-vmstat.node1.numa_local
> >      64500 ±  8%     -61.8%      24643 ± 32%  meminfo.Active
> >      64422 ±  8%     -61.9%      24568 ± 32%  meminfo.Active(anon)
> >     140271 ± 14%     -38.0%      86979 ± 24%  meminfo.AnonHugePages
> >     372672 ±  2%     +13.3%     422069        meminfo.AnonPages
> >    3205235 ±  3%     +35.1%    4329061 ± 15%  meminfo.Cached
> >    1548601 ±  7%     +77.4%    2747319 ± 24%  meminfo.Committed_AS
> >     783193 ± 14%    +154.9%    1996137 ± 33%  meminfo.Inactive
> >     783010 ± 14%    +154.9%    1995951 ± 33%  meminfo.Inactive(anon)
> >    4986534 ±  2%     +28.2%    6394741 ± 10%  meminfo.Memused
> >     475092 ± 22%    +236.5%    1598918 ± 41%  meminfo.Shmem
> >       2777            -2.1%       2719        turbostat.Bzy_MHz
> >   11143123 ±  6%     +72.0%   19162667        turbostat.C1
> >       0.24 ±  7%      +0.7        0.94 ±  3%  turbostat.C1%
> >     100440 ± 18%    +203.8%     305136 ± 15%  turbostat.C1E
> >       0.06 ±  9%      +0.1        0.18 ± 11%  turbostat.C1E%
> >       1.24 ±  3%      +1.6        2.81 ±  4%  turbostat.C6%
> >       1.38 ±  3%    +156.1%       3.55 ±  3%  turbostat.CPU%c1
> >       0.33 ±  5%     +76.5%       0.58 ±  7%  turbostat.CPU%c6
> >       0.16           +31.2%       0.21        turbostat.IPC
> >  6.866e+08 ±  5%     -87.8%   83575393 ±  5%  turbostat.IRQ
> >       0.33 ± 27%      +0.2        0.57        turbostat.POLL%
> >       0.12 ± 10%    +176.4%       0.33 ± 12%  turbostat.Pkg%pc2
> >       0.09 ±  7%    -100.0%       0.00        turbostat.Pkg%pc6
> >      61.33            +5.2%      64.50 ±  2%  turbostat.PkgTmp
> >      14.81            +2.0%      15.11        turbostat.RAMWatt
> >      16242 ±  8%     -62.0%       6179 ± 32%  proc-vmstat.nr_active_anon
> >      93150 ±  2%     +13.2%     105429        proc-vmstat.nr_anon_pages
> >     801219 ±  3%     +35.1%    1082320 ± 15%  proc-vmstat.nr_file_pages
> >     195506 ± 14%    +155.2%     498919 ± 33%  proc-vmstat.nr_inactive_anon
> >     118682 ± 22%    +236.9%     399783 ± 41%  proc-vmstat.nr_shmem
> >      16242 ±  8%     -62.0%       6179 ± 32%  proc-vmstat.nr_zone_active_anon
> >     195506 ± 14%    +155.2%     498919 ± 33%  proc-vmstat.nr_zone_inactive_anon
> >   70889233 ±  5%    +158.6%  1.833e+08        proc-vmstat.numa_hit
> >   70811086 ±  5%    +158.8%  1.832e+08        proc-vmstat.numa_local
> >      55885 ± 22%     -67.2%      18327 ± 38%  proc-vmstat.numa_pages_migrated
> >     422312 ± 10%     -95.4%      19371 ±  7%  proc-vmstat.pgactivate
> >   71068460 ±  5%    +158.1%  1.834e+08        proc-vmstat.pgalloc_normal
> >    1554994           -19.6%    1250346 ±  4%  proc-vmstat.pgfault
> >   71011267 ±  5%    +155.9%  1.817e+08        proc-vmstat.pgfree
> >      55885 ± 22%     -67.2%      18327 ± 38%  proc-vmstat.pgmigrate_success
> >     111247 ±  2%     -35.0%      72355 ±  2%  proc-vmstat.pgreuse
> >    2506368 ±  2%     -53.1%    1176320        proc-vmstat.unevictable_pgs_scanned
> >      20.06 ± 10%     -22.4%      15.56 ±  8%  sched_debug.cfs_rq:/.h_nr_running.max
> >       0.81 ± 32%     -93.1%       0.06 ±223%  sched_debug.cfs_rq:/.h_nr_running.min
> >       1917 ± 34%    -100.0%       0.00        sched_debug.cfs_rq:/.load.min
> >      24.18 ± 10%     +39.0%      33.62 ± 11%  sched_debug.cfs_rq:/.load_avg.avg
> >     245.61 ± 25%     +66.3%     408.33 ± 22%  sched_debug.cfs_rq:/.load_avg.max
> >      47.52 ± 13%     +72.6%      82.03 ±  8%  sched_debug.cfs_rq:/.load_avg.stddev
> >   13431147           -64.9%    4717147        sched_debug.cfs_rq:/.min_vruntime.avg
> >   18161799 ±  7%     -67.4%    5925316 ±  6%  sched_debug.cfs_rq:/.min_vruntime.max
> >   12413026           -65.0%    4340952        sched_debug.cfs_rq:/.min_vruntime.min
> >     739748 ± 16%     -66.6%     247410 ± 17%  sched_debug.cfs_rq:/.min_vruntime.stddev
> >       0.85           -16.4%       0.71        sched_debug.cfs_rq:/.nr_running.avg
> >       0.61 ± 25%     -90.9%       0.06 ±223%  sched_debug.cfs_rq:/.nr_running.min
> >       0.10 ± 25%    +109.3%       0.22 ±  7%  sched_debug.cfs_rq:/.nr_running.stddev
> >     169.22          +101.7%     341.33        sched_debug.cfs_rq:/.removed.load_avg.max
> >      32.41 ± 24%    +100.2%      64.90 ± 16%  sched_debug.cfs_rq:/.removed.load_avg.stddev
> >      82.92 ± 10%    +108.1%     172.56        sched_debug.cfs_rq:/.removed.runnable_avg.max
> >      13.60 ± 28%    +114.0%      29.10 ± 20%  sched_debug.cfs_rq:/.removed.runnable_avg.stddev
> >      82.92 ± 10%    +108.1%     172.56        sched_debug.cfs_rq:/.removed.util_avg.max
> >      13.60 ± 28%    +114.0%      29.10 ± 20%  sched_debug.cfs_rq:/.removed.util_avg.stddev
> >       2156 ± 12%     -36.6%       1368 ± 27%  sched_debug.cfs_rq:/.runnable_avg.min
> >       2285 ±  7%     -19.8%       1833 ±  6%  sched_debug.cfs_rq:/.runnable_avg.stddev
> >   -2389921           -64.8%    -840940        sched_debug.cfs_rq:/.spread0.min
> >     739781 ± 16%     -66.5%     247837 ± 17%  sched_debug.cfs_rq:/.spread0.stddev
> >     843.88 ±  2%     -20.5%     670.53        sched_debug.cfs_rq:/.util_avg.avg
> >     433.64 ±  7%     -43.5%     244.83 ± 17%  sched_debug.cfs_rq:/.util_avg.min
> >     187.00 ±  6%     +40.6%     263.02 ±  4%  sched_debug.cfs_rq:/.util_avg.stddev
> >     394.15 ± 14%     -29.5%     278.06 ±  3%  sched_debug.cfs_rq:/.util_est_enqueued.avg
> >       1128 ± 12%     -17.6%     930.39 ±  5%  sched_debug.cfs_rq:/.util_est_enqueued.max
> >      38.36 ± 29%    -100.0%       0.00        sched_debug.cfs_rq:/.util_est_enqueued.min
> >       3596 ± 15%     -39.5%       2175 ±  7%  sched_debug.cpu.avg_idle.min
> >     160647 ±  9%     -25.9%     118978 ±  9%  sched_debug.cpu.avg_idle.stddev
> >     197365           -46.2%     106170        sched_debug.cpu.clock.avg
> >     197450           -46.2%     106208        sched_debug.cpu.clock.max
> >     197281           -46.2%     106128        sched_debug.cpu.clock.min
> >      49.96 ± 22%     -53.1%      23.44 ± 19%  sched_debug.cpu.clock.stddev
> >     193146           -45.7%     104898        sched_debug.cpu.clock_task.avg
> >     194592           -45.8%     105455        sched_debug.cpu.clock_task.max
> >     177878           -49.3%      90211        sched_debug.cpu.clock_task.min
> >       1794 ±  5%     -10.7%       1602 ±  2%  sched_debug.cpu.clock_task.stddev
> >      13154 ±  2%     -20.3%      10479        sched_debug.cpu.curr->pid.avg
> >      15059           -17.2%      12468        sched_debug.cpu.curr->pid.max
> >       7263 ± 33%    -100.0%       0.00        sched_debug.cpu.curr->pid.min
> >       9321 ± 36%     +98.2%      18478 ± 44%  sched_debug.cpu.max_idle_balance_cost.stddev
> >       0.00 ± 17%     -41.6%       0.00 ± 13%  sched_debug.cpu.next_balance.stddev
> >      20.00 ± 11%     -21.4%      15.72 ±  7%  sched_debug.cpu.nr_running.max
> >       0.86 ± 17%     -87.1%       0.11 ±141%  sched_debug.cpu.nr_running.min
> >   25069883           -83.7%    4084117 ±  4%  sched_debug.cpu.nr_switches.avg
> >   26486718           -82.8%    4544009 ±  4%  sched_debug.cpu.nr_switches.max
> >   23680077           -84.5%    3663816 ±  4%  sched_debug.cpu.nr_switches.min
> >     589836 ±  3%     -68.7%     184621 ± 16%  sched_debug.cpu.nr_switches.stddev
> >     197278           -46.2%     106128        sched_debug.cpu_clk
> >     194327           -46.9%     103176        sched_debug.ktime
> >     197967           -46.0%     106821        sched_debug.sched_clk
> >      14.91           -37.6%       9.31        perf-stat.i.MPKI
> >  2.657e+10           +25.0%   3.32e+10        perf-stat.i.branch-instructions
> >       1.17            -0.4        0.78        perf-stat.i.branch-miss-rate%
> >  3.069e+08           -20.1%  2.454e+08        perf-stat.i.branch-misses
> >       6.43 ±  8%      +2.2        8.59 ±  4%  perf-stat.i.cache-miss-rate%
> >  1.952e+09           -24.3%  1.478e+09        perf-stat.i.cache-references
> >   14344055 ±  2%     -58.6%    5932018 ±  3%  perf-stat.i.context-switches
> >       1.83           -21.8%       1.43        perf-stat.i.cpi
> >  2.403e+11            -3.4%  2.322e+11        perf-stat.i.cpu-cycles
> >    1420139 ±  2%     -38.8%     869692 ±  5%  perf-stat.i.cpu-migrations
> >       2619 ±  7%     -15.5%       2212 ±  8%  perf-stat.i.cycles-between-cache-misses
> >       0.24 ± 19%      -0.1        0.10 ± 17%  perf-stat.i.dTLB-load-miss-rate%
> >   90403286 ± 19%     -55.8%   39926283 ± 16%  perf-stat.i.dTLB-load-misses
> >  3.823e+10           +28.6%  4.918e+10        perf-stat.i.dTLB-loads
> >       0.01 ± 34%      -0.0        0.01 ± 33%  perf-stat.i.dTLB-store-miss-rate%
> >    2779663 ± 34%     -52.7%    1315899 ± 31%  perf-stat.i.dTLB-store-misses
> >   2.19e+10           +24.2%   2.72e+10        perf-stat.i.dTLB-stores
> >      47.99 ±  2%     +28.0       75.94        perf-stat.i.iTLB-load-miss-rate%
> >   89417955 ±  2%     +38.7%   1.24e+08 ±  4%  perf-stat.i.iTLB-load-misses
> >   97721514 ±  2%     -58.2%   40865783 ±  3%  perf-stat.i.iTLB-loads
> >  1.329e+11           +26.3%  1.678e+11        perf-stat.i.instructions
> >       1503            -7.7%       1388 ±  3%  perf-stat.i.instructions-per-iTLB-miss
> >       0.55           +30.2%       0.72        perf-stat.i.ipc
> >       1.64 ± 18%    +217.4%       5.20 ± 11%  perf-stat.i.major-faults
> >       2.73            -3.7%       2.63        perf-stat.i.metric.GHz
> >       1098 ±  2%      -7.1%       1020 ±  3%  perf-stat.i.metric.K/sec
> >       1008           +24.4%       1254        perf-stat.i.metric.M/sec
> >       4334 ±  2%     +90.5%       8257 ±  7%  perf-stat.i.minor-faults
> >      90.94           -14.9       75.99        perf-stat.i.node-load-miss-rate%
> >   41932510 ±  8%     -43.0%   23899176 ± 10%  perf-stat.i.node-load-misses
> >    3366677 ±  5%     +86.2%    6267816        perf-stat.i.node-loads
> >      81.77 ±  3%     -36.3       45.52 ±  3%  perf-stat.i.node-store-miss-rate%
> >   18498318 ±  7%     -31.8%   12613933 ±  7%  perf-stat.i.node-store-misses
> >    3023556 ± 10%    +508.7%   18405880 ±  2%  perf-stat.i.node-stores
> >       4336 ±  2%     +90.5%       8262 ±  7%  perf-stat.i.page-faults
> >      14.70           -41.2%       8.65        perf-stat.overall.MPKI
> >       1.16            -0.4        0.72        perf-stat.overall.branch-miss-rate%
> >       6.22 ±  7%      +2.4        8.59 ±  4%  perf-stat.overall.cache-miss-rate%
> >       1.81           -24.3%       1.37        perf-stat.overall.cpi
> >       0.24 ± 19%      -0.2        0.07 ± 15%  perf-stat.overall.dTLB-load-miss-rate%
> >       0.01 ± 34%      -0.0        0.00 ± 29%  perf-stat.overall.dTLB-store-miss-rate%
> >      47.78 ±  2%     +29.3       77.12        perf-stat.overall.iTLB-load-miss-rate%
> >       1486            -9.1%       1351 ±  4%  perf-stat.overall.instructions-per-iTLB-miss
> >       0.55           +32.0%       0.73        perf-stat.overall.ipc
> >      92.54           -15.4       77.16 ±  2%  perf-stat.overall.node-load-miss-rate%
> >      85.82 ±  2%     -48.1       37.76 ±  5%  perf-stat.overall.node-store-miss-rate%
> >  2.648e+10           +25.2%  3.314e+10        perf-stat.ps.branch-instructions
> >   3.06e+08           -22.1%  2.383e+08        perf-stat.ps.branch-misses
> >  1.947e+09           -25.5%  1.451e+09        perf-stat.ps.cache-references
> >   14298713 ±  2%     -62.5%    5359285 ±  3%  perf-stat.ps.context-switches
> >  2.396e+11            -4.0%  2.299e+11        perf-stat.ps.cpu-cycles
> >    1415512 ±  2%     -42.2%     817981 ±  4%  perf-stat.ps.cpu-migrations
> >   90073948 ± 19%     -60.4%   35711862 ± 15%  perf-stat.ps.dTLB-load-misses
> >  3.811e+10           +29.7%  4.944e+10        perf-stat.ps.dTLB-loads
> >    2767291 ± 34%     -56.3%    1210210 ± 29%  perf-stat.ps.dTLB-store-misses
> >  2.183e+10           +25.0%  2.729e+10        perf-stat.ps.dTLB-stores
> >   89118809 ±  2%     +39.6%  1.244e+08 ±  4%  perf-stat.ps.iTLB-load-misses
> >   97404381 ±  2%     -62.2%   36860047 ±  3%  perf-stat.ps.iTLB-loads
> >  1.324e+11           +26.7%  1.678e+11        perf-stat.ps.instructions
> >       1.62 ± 18%    +164.7%       4.29 ±  8%  perf-stat.ps.major-faults
> >       4310 ±  2%     +75.1%       7549 ±  5%  perf-stat.ps.minor-faults
> >   41743097 ±  8%     -47.3%   21984450 ±  9%  perf-stat.ps.node-load-misses
> >    3356259 ±  5%     +92.6%    6462631        perf-stat.ps.node-loads
> >   18414647 ±  7%     -35.7%   11833799 ±  6%  perf-stat.ps.node-store-misses
> >    3019790 ± 10%    +545.0%   19478071        perf-stat.ps.node-stores
> >       4312 ±  2%     +75.2%       7553 ±  5%  perf-stat.ps.page-faults
> >  4.252e+13           -43.7%  2.395e+13        perf-stat.total.instructions
> >      29.92 ±  4%     -22.8        7.09 ± 29%  perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
> >      28.53 ±  5%     -21.6        6.92 ± 29%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.vfs_write.ksys_write
> >      27.86 ±  5%     -21.1        6.77 ± 29%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.vfs_write
> >      27.55 ±  5%     -20.9        6.68 ± 29%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
> >      22.28 ±  4%     -17.0        5.31 ± 30%  perf-profile.calltrace.cycles-pp.schedule.pipe_read.vfs_read.ksys_read.do_syscall_64
> >      21.98 ±  4%     -16.7        5.24 ± 30%  perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_read.vfs_read.ksys_read
> >      12.62 ±  4%      -9.6        3.00 ± 33%  perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> >      34.09            -9.2       24.92 ±  3%  perf-profile.calltrace.cycles-pp.pipe_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      11.48 ±  5%      -8.8        2.69 ± 38%  perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
> >       9.60 ±  7%      -7.2        2.40 ± 35%  perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.pipe_read.vfs_read
> >      36.39            -6.2       30.20        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> >      40.40            -6.1       34.28        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> >      40.95            -5.7       35.26        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
> >      37.43            -5.4       32.07        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> >       6.30 ± 11%      -5.2        1.09 ± 36%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> >       5.66 ± 12%      -5.1        0.58 ± 75%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       6.46 ± 10%      -5.1        1.40 ± 28%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> >       5.53 ± 13%      -5.0        0.56 ± 75%  perf-profile.calltrace.cycles-pp.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> >       5.42 ± 13%      -4.9        0.56 ± 75%  perf-profile.calltrace.cycles-pp.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
> >       5.82 ±  9%      -4.7        1.10 ± 37%  perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> >       5.86 ± 16%      -4.6        1.31 ± 37%  perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> >       5.26 ±  9%      -4.4        0.89 ± 57%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common
> >      45.18            -3.5       41.68        perf-profile.calltrace.cycles-pp.__libc_read
> >      50.31            -3.2       47.12        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> >       4.00 ± 27%      -2.9        1.09 ± 40%  perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.pipe_read
> >      50.75            -2.7       48.06        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write
> >      40.80            -2.6       38.20        perf-profile.calltrace.cycles-pp.pipe_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       3.10 ± 15%      -2.5        0.62 ±103%  perf-profile.calltrace.cycles-pp.update_cfs_group.dequeue_task_fair.__schedule.schedule.pipe_read
> >       2.94 ± 12%      -2.3        0.62 ±102%  perf-profile.calltrace.cycles-pp.update_cfs_group.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> >       2.38 ±  9%      -2.0        0.38 ±102%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__schedule.schedule.pipe_read.vfs_read
> >       2.24 ±  7%      -1.8        0.40 ± 71%  perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_read.vfs_read.ksys_read.do_syscall_64
> >       2.08 ±  6%      -1.8        0.29 ±100%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.pipe_read.vfs_read
> >       2.10 ± 10%      -1.8        0.32 ±104%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__schedule.schedule.pipe_read
> >       2.76 ±  7%      -1.5        1.24 ± 17%  perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> >       2.27 ±  5%      -1.4        0.88 ± 11%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> >       2.43 ±  7%      -1.3        1.16 ± 17%  perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common
> >       2.46 ±  5%      -1.3        1.20 ±  7%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> >       1.54 ±  5%      -1.2        0.32 ±101%  perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
> >       0.97 ±  9%      -0.3        0.66 ± 19%  perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function
> >       0.86 ±  6%      +0.2        1.02        perf-profile.calltrace.cycles-pp.__might_fault._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write
> >       0.64 ±  9%      +0.5        1.16 ±  5%  perf-profile.calltrace.cycles-pp.mutex_unlock.pipe_read.vfs_read.ksys_read.do_syscall_64
> >       0.47 ± 45%      +0.5        0.99 ±  5%  perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.60 ±  8%      +0.5        1.13 ±  5%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> >       0.00            +0.5        0.54 ±  5%  perf-profile.calltrace.cycles-pp.current_time.file_update_time.pipe_write.vfs_write.ksys_write
> >       0.00            +0.6        0.56 ±  4%  perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_from_iter.copy_page_from_iter.pipe_write
> >       0.00            +0.6        0.56 ±  7%  perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_to_iter.copy_page_to_iter.pipe_read
> >       0.00            +0.6        0.58 ±  5%  perf-profile.calltrace.cycles-pp.__might_resched.mutex_lock.pipe_write.vfs_write.ksys_write
> >       0.00            +0.6        0.62 ±  3%  perf-profile.calltrace.cycles-pp.__might_resched.mutex_lock.pipe_read.vfs_read.ksys_read
> >       0.00            +0.7        0.65 ±  6%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.prepare_to_wait_event.pipe_write.vfs_write
> >       0.00            +0.7        0.65 ±  7%  perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >       0.57 ±  5%      +0.7        1.24 ±  6%  perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.00            +0.7        0.72 ±  6%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.prepare_to_wait_event.pipe_write.vfs_write.ksys_write
> >       0.00            +0.8        0.75 ±  6%  perf-profile.calltrace.cycles-pp.mutex_spin_on_owner.__mutex_lock.pipe_write.vfs_write.ksys_write
> >       0.74 ±  9%      +0.8        1.48 ±  5%  perf-profile.calltrace.cycles-pp.file_update_time.pipe_write.vfs_write.ksys_write.do_syscall_64
> >       0.63 ±  5%      +0.8        1.40 ±  5%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> >       0.00            +0.8        0.78 ± 19%  perf-profile.calltrace.cycles-pp.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
> >       0.00            +0.8        0.78 ± 19%  perf-profile.calltrace.cycles-pp.record__finish_output.__cmd_record
> >       0.00            +0.8        0.78 ± 19%  perf-profile.calltrace.cycles-pp.perf_session__process_events.record__finish_output.__cmd_record
> >       0.00            +0.8        0.80 ± 15%  perf-profile.calltrace.cycles-pp.__cmd_record
> >       0.00            +0.8        0.82 ± 11%  perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
> >       0.00            +0.9        0.85 ±  6%  perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_write.vfs_write.ksys_write.do_syscall_64
> >       0.00            +0.9        0.86 ±  4%  perf-profile.calltrace.cycles-pp.current_time.atime_needs_update.touch_atime.pipe_read.vfs_read
> >       0.00            +0.9        0.87 ±  5%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_write
> >       0.00            +0.9        0.88 ±  5%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_read
> >       0.26 ±100%      +1.0        1.22 ± 10%  perf-profile.calltrace.cycles-pp.__schedule.schedule.pipe_write.vfs_write.ksys_write
> >       0.00            +1.0        0.96 ±  6%  perf-profile.calltrace.cycles-pp.__might_fault._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read
> >       0.27 ±100%      +1.0        1.23 ± 10%  perf-profile.calltrace.cycles-pp.schedule.pipe_write.vfs_write.ksys_write.do_syscall_64
> >       0.00            +1.0        0.97 ±  7%  perf-profile.calltrace.cycles-pp.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge.__folio_put.pipe_read
> >       0.87 ±  8%      +1.1        1.98 ±  5%  perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
> >       0.73 ±  6%      +1.1        1.85 ±  5%  perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_write.ksys_write.do_syscall_64
> >       0.00            +1.2        1.15 ±  7%  perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge.__folio_put.pipe_read.vfs_read
> >       0.00            +1.2        1.23 ±  6%  perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge.__folio_put.pipe_read.vfs_read.ksys_read
> >       0.00            +1.2        1.24 ±  7%  perf-profile.calltrace.cycles-pp.__folio_put.pipe_read.vfs_read.ksys_read.do_syscall_64
> >       0.48 ± 45%      +1.3        1.74 ±  6%  perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.pipe_read.vfs_read.ksys_read
> >       0.60 ±  7%      +1.3        1.87 ±  8%  perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_read.vfs_read.ksys_read.do_syscall_64
> >       1.23 ±  7%      +1.3        2.51 ±  4%  perf-profile.calltrace.cycles-pp.mutex_lock.pipe_read.vfs_read.ksys_read.do_syscall_64
> >      43.42            +1.3       44.75        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> >       0.83 ±  7%      +1.3        2.17 ±  5%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.98 ±  7%      +1.4        2.36 ±  6%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.27 ±100%      +1.4        1.70 ±  9%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_read.vfs_read.ksys_read
> >       0.79 ±  8%      +1.4        2.23 ±  6%  perf-profile.calltrace.cycles-pp.touch_atime.pipe_read.vfs_read.ksys_read.do_syscall_64
> >       0.18 ±141%      +1.5        1.63 ±  9%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read
> >       0.18 ±141%      +1.5        1.67 ±  9%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read.vfs_read
> >       0.00            +1.6        1.57 ± 10%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
> >       0.00            +1.6        1.57 ± 10%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
> >       1.05 ±  8%      +1.7        2.73 ±  6%  perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin._copy_from_iter.copy_page_from_iter.pipe_write
> >       1.84 ±  9%      +1.7        3.56 ±  5%  perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.copy_page_to_iter.pipe_read
> >       1.41 ±  9%      +1.8        3.17 ±  6%  perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write
> >       0.00            +1.8        1.79 ±  9%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> >       1.99 ±  9%      +2.0        3.95 ±  5%  perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read
> >       2.40 ±  7%      +2.4        4.82 ±  5%  perf-profile.calltrace.cycles-pp._copy_from_iter.copy_page_from_iter.pipe_write.vfs_write.ksys_write
> >       0.00            +2.5        2.50 ±  7%  perf-profile.calltrace.cycles-pp.__mutex_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
> >       2.89 ±  8%      +2.6        5.47 ±  5%  perf-profile.calltrace.cycles-pp.copy_page_from_iter.pipe_write.vfs_write.ksys_write.do_syscall_64
> >       1.04 ± 30%      +2.8        3.86 ±  5%  perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_write
> >       0.00            +2.9        2.90 ± 11%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> >       0.00            +2.9        2.91 ± 11%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> >       0.00            +2.9        2.91 ± 11%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
> >       0.85 ± 27%      +2.9        3.80 ±  5%  perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_read
> >       0.00            +3.0        2.96 ± 11%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
> >       2.60 ±  9%      +3.1        5.74 ±  6%  perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.pipe_read.vfs_read.ksys_read
> >       2.93 ±  9%      +3.7        6.66 ±  5%  perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.vfs_read.ksys_read.do_syscall_64
> >       1.60 ± 12%      +4.6        6.18 ±  7%  perf-profile.calltrace.cycles-pp.mutex_unlock.pipe_write.vfs_write.ksys_write.do_syscall_64
> >       2.60 ± 10%      +4.6        7.24 ±  5%  perf-profile.calltrace.cycles-pp.mutex_lock.pipe_write.vfs_write.ksys_write.do_syscall_64
> >      28.75 ±  5%     -21.6        7.19 ± 28%  perf-profile.children.cycles-pp.schedule
> >      30.52 ±  4%     -21.6        8.97 ± 22%  perf-profile.children.cycles-pp.__wake_up_common_lock
> >      28.53 ±  6%     -21.0        7.56 ± 26%  perf-profile.children.cycles-pp.__schedule
> >      29.04 ±  5%     -20.4        8.63 ± 23%  perf-profile.children.cycles-pp.__wake_up_common
> >      28.37 ±  5%     -19.9        8.44 ± 23%  perf-profile.children.cycles-pp.autoremove_wake_function
> >      28.08 ±  5%     -19.7        8.33 ± 23%  perf-profile.children.cycles-pp.try_to_wake_up
> >      13.90 ±  2%     -10.2        3.75 ± 28%  perf-profile.children.cycles-pp.ttwu_do_activate
> >      12.66 ±  3%      -9.2        3.47 ± 29%  perf-profile.children.cycles-pp.enqueue_task_fair
> >      34.20            -9.2       25.05 ±  3%  perf-profile.children.cycles-pp.pipe_read
> >      90.86            -9.1       81.73        perf-profile.children.cycles-pp.do_syscall_64
> >      91.80            -8.3       83.49        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >      10.28 ±  7%      -7.8        2.53 ± 27%  perf-profile.children.cycles-pp._raw_spin_lock
> >       9.85 ±  7%      -6.9        2.92 ± 29%  perf-profile.children.cycles-pp.dequeue_task_fair
> >       8.69 ±  7%      -6.6        2.05 ± 24%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> >       8.99 ±  6%      -6.2        2.81 ± 16%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> >      36.46            -6.1       30.34        perf-profile.children.cycles-pp.vfs_read
> >       8.38 ±  8%      -5.8        2.60 ± 23%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> >       6.10 ± 11%      -5.4        0.66 ± 61%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
> >      37.45            -5.3       32.13        perf-profile.children.cycles-pp.ksys_read
> >       6.50 ± 35%      -4.9        1.62 ± 61%  perf-profile.children.cycles-pp.update_curr
> >       6.56 ± 15%      -4.6        1.95 ± 57%  perf-profile.children.cycles-pp.update_cfs_group
> >       6.38 ± 14%      -4.5        1.91 ± 28%  perf-profile.children.cycles-pp.enqueue_entity
> >       5.74 ±  5%      -3.8        1.92 ± 25%  perf-profile.children.cycles-pp.update_load_avg
> >      45.56            -3.8       41.75        perf-profile.children.cycles-pp.__libc_read
> >       3.99 ±  4%      -3.1        0.92 ± 24%  perf-profile.children.cycles-pp.pick_next_task_fair
> >       4.12 ± 27%      -2.7        1.39 ± 34%  perf-profile.children.cycles-pp.dequeue_entity
> >      40.88            -2.5       38.37        perf-profile.children.cycles-pp.pipe_write
> >       3.11 ±  4%      -2.4        0.75 ± 22%  perf-profile.children.cycles-pp.switch_mm_irqs_off
> >       2.06 ± 33%      -1.8        0.27 ± 27%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
> >       2.38 ± 41%      -1.8        0.60 ± 72%  perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
> >       2.29 ±  5%      -1.7        0.60 ± 25%  perf-profile.children.cycles-pp.switch_fpu_return
> >       2.30 ±  6%      -1.6        0.68 ± 18%  perf-profile.children.cycles-pp.prepare_task_switch
> >       1.82 ± 33%      -1.6        0.22 ± 31%  perf-profile.children.cycles-pp.sysvec_call_function_single
> >       1.77 ± 33%      -1.6        0.20 ± 32%  perf-profile.children.cycles-pp.__sysvec_call_function_single
> >       1.96 ±  5%      -1.5        0.50 ± 20%  perf-profile.children.cycles-pp.reweight_entity
> >       2.80 ±  7%      -1.2        1.60 ± 12%  perf-profile.children.cycles-pp.select_task_rq
> >       1.61 ±  6%      -1.2        0.42 ± 25%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
> >       1.34 ±  9%      -1.2        0.16 ± 28%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
> >       1.62 ±  4%      -1.2        0.45 ± 22%  perf-profile.children.cycles-pp.set_next_entity
> >       1.55 ±  8%      -1.1        0.43 ± 12%  perf-profile.children.cycles-pp.update_rq_clock
> >       1.49 ±  8%      -1.1        0.41 ± 14%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> >       1.30 ± 20%      -1.0        0.26 ± 18%  perf-profile.children.cycles-pp.finish_task_switch
> >       1.44 ±  5%      -1.0        0.42 ± 19%  perf-profile.children.cycles-pp.__switch_to_asm
> >       2.47 ±  7%      -1.0        1.50 ± 12%  perf-profile.children.cycles-pp.select_task_rq_fair
> >       2.33 ±  7%      -0.9        1.40 ±  3%  perf-profile.children.cycles-pp.prepare_to_wait_event
> >       1.24 ±  7%      -0.9        0.35 ± 14%  perf-profile.children.cycles-pp.__update_load_avg_se
> >       1.41 ± 32%      -0.9        0.56 ± 24%  perf-profile.children.cycles-pp.sched_ttwu_pending
> >       2.29 ±  8%      -0.8        1.45 ±  3%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> >       1.04 ±  7%      -0.8        0.24 ± 22%  perf-profile.children.cycles-pp.check_preempt_curr
> >       1.01 ±  3%      -0.7        0.30 ± 20%  perf-profile.children.cycles-pp.__switch_to
> >       0.92 ±  7%      -0.7        0.26 ± 12%  perf-profile.children.cycles-pp.update_min_vruntime
> >       0.71 ±  2%      -0.6        0.08 ± 75%  perf-profile.children.cycles-pp.put_prev_entity
> >       0.76 ±  6%      -0.6        0.14 ± 32%  perf-profile.children.cycles-pp.check_preempt_wakeup
> >       0.81 ± 66%      -0.6        0.22 ± 34%  perf-profile.children.cycles-pp.set_task_cpu
> >       0.82 ± 17%      -0.6        0.23 ± 10%  perf-profile.children.cycles-pp.cpuacct_charge
> >       1.08 ± 15%      -0.6        0.51 ± 10%  perf-profile.children.cycles-pp.wake_affine
> >       0.56 ± 15%      -0.5        0.03 ±100%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
> >       0.66 ±  3%      -0.5        0.15 ± 28%  perf-profile.children.cycles-pp.os_xsave
> >       0.52 ± 44%      -0.5        0.06 ±151%  perf-profile.children.cycles-pp.native_irq_return_iret
> >       0.55 ±  5%      -0.4        0.15 ± 21%  perf-profile.children.cycles-pp.__calc_delta
> >       0.56 ± 10%      -0.4        0.17 ± 26%  perf-profile.children.cycles-pp.___perf_sw_event
> >       0.70 ± 15%      -0.4        0.32 ± 11%  perf-profile.children.cycles-pp.task_h_load
> >       0.40 ±  4%      -0.3        0.06 ± 49%  perf-profile.children.cycles-pp.pick_next_entity
> >       0.57 ±  6%      -0.3        0.26 ±  7%  perf-profile.children.cycles-pp.__list_del_entry_valid
> >       0.39 ±  8%      -0.3        0.08 ± 24%  perf-profile.children.cycles-pp.set_next_buddy
> >       0.64 ±  6%      -0.3        0.36 ±  6%  perf-profile.children.cycles-pp._raw_spin_lock_irq
> >       0.53 ± 20%      -0.3        0.25 ±  8%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
> >       0.36 ±  8%      -0.3        0.08 ± 11%  perf-profile.children.cycles-pp.rb_insert_color
> >       0.41 ±  6%      -0.3        0.14 ± 17%  perf-profile.children.cycles-pp.sched_clock_cpu
> >       0.36 ± 33%      -0.3        0.10 ± 17%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
> >       0.37 ±  4%      -0.2        0.13 ± 16%  perf-profile.children.cycles-pp.native_sched_clock
> >       0.28 ±  5%      -0.2        0.07 ± 18%  perf-profile.children.cycles-pp.rb_erase
> >       0.32 ±  7%      -0.2        0.12 ± 10%  perf-profile.children.cycles-pp.__list_add_valid
> >       0.23 ±  6%      -0.2        0.03 ±103%  perf-profile.children.cycles-pp.resched_curr
> >       0.27 ±  5%      -0.2        0.08 ± 20%  perf-profile.children.cycles-pp.__wrgsbase_inactive
> >       0.26 ±  6%      -0.2        0.08 ± 17%  perf-profile.children.cycles-pp.finish_wait
> >       0.26 ±  4%      -0.2        0.08 ± 11%  perf-profile.children.cycles-pp.rcu_note_context_switch
> >       0.33 ± 21%      -0.2        0.15 ± 32%  perf-profile.children.cycles-pp.migrate_task_rq_fair
> >       0.22 ±  9%      -0.2        0.07 ± 22%  perf-profile.children.cycles-pp.perf_trace_buf_update
> >       0.17 ±  8%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.rb_next
> >       0.15 ± 32%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.llist_reverse_order
> >       0.34 ±  7%      -0.1        0.26 ±  3%  perf-profile.children.cycles-pp.anon_pipe_buf_release
> >       0.14 ±  6%      -0.1        0.07 ± 17%  perf-profile.children.cycles-pp.read@plt
> >       0.10 ± 17%      -0.1        0.04 ± 75%  perf-profile.children.cycles-pp.remove_entity_load_avg
> >       0.07 ± 10%      -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.generic_update_time
> >       0.11 ±  6%      -0.0        0.07 ±  8%  perf-profile.children.cycles-pp.__mark_inode_dirty
> >       0.00            +0.1        0.06 ±  9%  perf-profile.children.cycles-pp.load_balance
> >       0.00            +0.1        0.06 ± 11%  perf-profile.children.cycles-pp._raw_spin_trylock
> >       0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.uncharge_folio
> >       0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.__do_softirq
> >       0.00            +0.1        0.07 ± 10%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
> >       0.00            +0.1        0.08 ± 14%  perf-profile.children.cycles-pp.__get_obj_cgroup_from_memcg
> >       0.15 ± 23%      +0.1        0.23 ±  7%  perf-profile.children.cycles-pp.task_tick_fair
> >       0.19 ± 17%      +0.1        0.28 ±  7%  perf-profile.children.cycles-pp.scheduler_tick
> >       0.00            +0.1        0.10 ± 21%  perf-profile.children.cycles-pp.select_idle_core
> >       0.00            +0.1        0.10 ±  9%  perf-profile.children.cycles-pp.osq_unlock
> >       0.23 ± 12%      +0.1        0.34 ±  6%  perf-profile.children.cycles-pp.update_process_times
> >       0.37 ± 13%      +0.1        0.48 ±  5%  perf-profile.children.cycles-pp.hrtimer_interrupt
> >       0.24 ± 12%      +0.1        0.35 ±  6%  perf-profile.children.cycles-pp.tick_sched_handle
> >       0.31 ± 14%      +0.1        0.43 ±  4%  perf-profile.children.cycles-pp.__hrtimer_run_queues
> >       0.37 ± 12%      +0.1        0.49 ±  5%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> >       0.00            +0.1        0.12 ± 10%  perf-profile.children.cycles-pp.__mod_memcg_state
> >       0.26 ± 10%      +0.1        0.38 ±  6%  perf-profile.children.cycles-pp.tick_sched_timer
> >       0.00            +0.1        0.13 ±  7%  perf-profile.children.cycles-pp.free_unref_page
> >       0.00            +0.1        0.14 ±  8%  perf-profile.children.cycles-pp.rmqueue
> >       0.15 ±  8%      +0.2        0.30 ±  5%  perf-profile.children.cycles-pp.rcu_all_qs
> >       0.16 ±  6%      +0.2        0.31 ±  5%  perf-profile.children.cycles-pp.__x64_sys_write
> >       0.00            +0.2        0.16 ± 10%  perf-profile.children.cycles-pp.propagate_protected_usage
> >       0.00            +0.2        0.16 ± 10%  perf-profile.children.cycles-pp.menu_select
> >       0.00            +0.2        0.16 ±  9%  perf-profile.children.cycles-pp.memcg_account_kmem
> >       0.42 ± 12%      +0.2        0.57 ±  4%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> >       0.15 ± 11%      +0.2        0.31 ±  8%  perf-profile.children.cycles-pp.__x64_sys_read
> >       0.00            +0.2        0.17 ±  8%  perf-profile.children.cycles-pp.get_page_from_freelist
> >       0.44 ± 11%      +0.2        0.62 ±  4%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> >       0.10 ± 31%      +0.2        0.28 ± 24%  perf-profile.children.cycles-pp.mnt_user_ns
> >       0.16 ±  4%      +0.2        0.35 ±  5%  perf-profile.children.cycles-pp.kill_fasync
> >       0.20 ± 10%      +0.2        0.40 ±  3%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
> >       0.09 ±  7%      +0.2        0.29 ±  4%  perf-profile.children.cycles-pp.page_copy_sane
> >       0.08 ±  8%      +0.2        0.31 ±  6%  perf-profile.children.cycles-pp.rw_verify_area
> >       0.12 ± 11%      +0.2        0.36 ±  8%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
> >       0.28 ± 12%      +0.2        0.52 ±  5%  perf-profile.children.cycles-pp.inode_needs_update_time
> >       0.00            +0.3        0.27 ±  7%  perf-profile.children.cycles-pp.__memcg_kmem_charge_page
> >       0.43 ±  6%      +0.3        0.73 ±  5%  perf-profile.children.cycles-pp.__cond_resched
> >       0.21 ± 29%      +0.3        0.54 ± 15%  perf-profile.children.cycles-pp.select_idle_cpu
> >       0.10 ± 10%      +0.3        0.43 ± 17%  perf-profile.children.cycles-pp.fsnotify_perm
> >       0.23 ± 11%      +0.3        0.56 ±  6%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
> >       0.06 ± 75%      +0.4        0.47 ± 27%  perf-profile.children.cycles-pp.queue_event
> >       0.21 ±  9%      +0.4        0.62 ±  5%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> >       0.06 ± 75%      +0.4        0.48 ± 26%  perf-profile.children.cycles-pp.ordered_events__queue
> >       0.06 ± 73%      +0.4        0.50 ± 24%  perf-profile.children.cycles-pp.process_simple
> >       0.01 ±223%      +0.4        0.44 ±  9%  perf-profile.children.cycles-pp.schedule_idle
> >       0.05 ±  8%      +0.5        0.52 ±  7%  perf-profile.children.cycles-pp.__alloc_pages
> >       0.45 ±  7%      +0.5        0.94 ±  5%  perf-profile.children.cycles-pp.__get_task_ioprio
> >       0.89 ±  8%      +0.5        1.41 ±  4%  perf-profile.children.cycles-pp.__might_sleep
> >       0.01 ±223%      +0.5        0.54 ± 21%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
> >       0.05 ± 46%      +0.5        0.60 ±  7%  perf-profile.children.cycles-pp.osq_lock
> >       0.34 ±  8%      +0.6        0.90 ±  5%  perf-profile.children.cycles-pp.aa_file_perm
> >       0.01 ±223%      +0.7        0.67 ±  7%  perf-profile.children.cycles-pp.poll_idle
> >       0.14 ± 17%      +0.7        0.82 ±  6%  perf-profile.children.cycles-pp.mutex_spin_on_owner
> >       0.12 ± 12%      +0.7        0.82 ± 15%  perf-profile.children.cycles-pp.__cmd_record
> >       0.07 ± 72%      +0.7        0.78 ± 19%  perf-profile.children.cycles-pp.reader__read_event
> >       0.07 ± 72%      +0.7        0.78 ± 19%  perf-profile.children.cycles-pp.record__finish_output
> >       0.07 ± 72%      +0.7        0.78 ± 19%  perf-profile.children.cycles-pp.perf_session__process_events
> >       0.76 ±  8%      +0.8        1.52 ±  5%  perf-profile.children.cycles-pp.file_update_time
> >       0.08 ± 61%      +0.8        0.85 ± 11%  perf-profile.children.cycles-pp.intel_idle_irq
> >       1.23 ±  8%      +0.9        2.11 ±  4%  perf-profile.children.cycles-pp.__might_fault
> >       0.02 ±141%      +1.0        0.97 ±  7%  perf-profile.children.cycles-pp.page_counter_uncharge
> >       0.51 ±  9%      +1.0        1.48 ±  4%  perf-profile.children.cycles-pp.current_time
> >       0.05 ± 46%      +1.1        1.15 ±  7%  perf-profile.children.cycles-pp.uncharge_batch
> >       1.12 ±  6%      +1.1        2.23 ±  5%  perf-profile.children.cycles-pp.__fget_light
> >       0.06 ± 14%      +1.2        1.23 ±  6%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge
> >       0.06 ± 14%      +1.2        1.24 ±  7%  perf-profile.children.cycles-pp.__folio_put
> >       0.64 ±  7%      +1.2        1.83 ±  5%  perf-profile.children.cycles-pp.syscall_return_via_sysret
> >       1.19 ±  8%      +1.2        2.42 ±  4%  perf-profile.children.cycles-pp.__might_resched
> >       0.59 ±  9%      +1.3        1.84 ±  6%  perf-profile.children.cycles-pp.atime_needs_update
> >      43.47            +1.4       44.83        perf-profile.children.cycles-pp.ksys_write
> >       1.28 ±  6%      +1.4        2.68 ±  5%  perf-profile.children.cycles-pp.__fdget_pos
> >       0.80 ±  8%      +1.5        2.28 ±  6%  perf-profile.children.cycles-pp.touch_atime
> >       0.11 ± 49%      +1.5        1.59 ±  9%  perf-profile.children.cycles-pp.cpuidle_enter_state
> >       0.11 ± 49%      +1.5        1.60 ±  9%  perf-profile.children.cycles-pp.cpuidle_enter
> >       0.12 ± 51%      +1.7        1.81 ±  9%  perf-profile.children.cycles-pp.cpuidle_idle_call
> >       1.44 ±  8%      +1.8        3.22 ±  6%  perf-profile.children.cycles-pp.copyin
> >       2.00 ±  9%      +2.0        4.03 ±  5%  perf-profile.children.cycles-pp.copyout
> >       1.02 ±  8%      +2.0        3.07 ±  5%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> >       1.63 ±  7%      +2.3        3.90 ±  5%  perf-profile.children.cycles-pp.apparmor_file_permission
> >       2.64 ±  8%      +2.3        4.98 ±  5%  perf-profile.children.cycles-pp._copy_from_iter
> >       0.40 ± 14%      +2.5        2.92 ±  7%  perf-profile.children.cycles-pp.__mutex_lock
> >       2.91 ±  8%      +2.6        5.54 ±  5%  perf-profile.children.cycles-pp.copy_page_from_iter
> >       0.17 ± 62%      +2.7        2.91 ± 11%  perf-profile.children.cycles-pp.start_secondary
> >       1.83 ±  7%      +2.8        4.59 ±  5%  perf-profile.children.cycles-pp.security_file_permission
> >       0.17 ± 60%      +2.8        2.94 ± 11%  perf-profile.children.cycles-pp.do_idle
> >       0.17 ± 60%      +2.8        2.96 ± 11%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
> >       0.17 ± 60%      +2.8        2.96 ± 11%  perf-profile.children.cycles-pp.cpu_startup_entry
> >       2.62 ±  9%      +3.2        5.84 ±  6%  perf-profile.children.cycles-pp._copy_to_iter
> >       1.55 ±  8%      +3.2        4.79 ±  5%  perf-profile.children.cycles-pp.__entry_text_start
> >       3.09 ±  8%      +3.7        6.77 ±  5%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
> >       2.95 ±  9%      +3.8        6.73 ±  5%  perf-profile.children.cycles-pp.copy_page_to_iter
> >       2.28 ± 11%      +5.1        7.40 ±  6%  perf-profile.children.cycles-pp.mutex_unlock
> >       3.92 ±  9%      +6.0        9.94 ±  5%  perf-profile.children.cycles-pp.mutex_lock
> >       8.37 ±  9%      -5.8        2.60 ± 23%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> >       6.54 ± 15%      -4.6        1.95 ± 57%  perf-profile.self.cycles-pp.update_cfs_group
> >       3.08 ±  4%      -2.3        0.74 ± 22%  perf-profile.self.cycles-pp.switch_mm_irqs_off
> >       2.96 ±  4%      -1.8        1.13 ± 33%  perf-profile.self.cycles-pp.update_load_avg
> >       2.22 ±  8%      -1.5        0.74 ± 12%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> >       1.96 ±  9%      -1.5        0.48 ± 15%  perf-profile.self.cycles-pp.update_curr
> >       1.94 ±  5%      -1.3        0.64 ± 16%  perf-profile.self.cycles-pp._raw_spin_lock
> >       1.78 ±  5%      -1.3        0.50 ± 18%  perf-profile.self.cycles-pp.__schedule
> >       1.59 ±  7%      -1.2        0.40 ± 12%  perf-profile.self.cycles-pp.enqueue_entity
> >       1.61 ±  6%      -1.2        0.42 ± 25%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
> >       1.44 ±  8%      -1.0        0.39 ± 14%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
> >       1.42 ±  5%      -1.0        0.41 ± 19%  perf-profile.self.cycles-pp.__switch_to_asm
> >       1.18 ±  7%      -0.9        0.33 ± 14%  perf-profile.self.cycles-pp.__update_load_avg_se
> >       1.14 ± 10%      -0.8        0.31 ±  9%  perf-profile.self.cycles-pp.update_rq_clock
> >       0.90 ±  7%      -0.7        0.19 ± 21%  perf-profile.self.cycles-pp.pick_next_task_fair
> >       1.04 ±  7%      -0.7        0.33 ± 13%  perf-profile.self.cycles-pp.prepare_task_switch
> >       0.98 ±  4%      -0.7        0.29 ± 20%  perf-profile.self.cycles-pp.__switch_to
> >       0.88 ±  6%      -0.7        0.20 ± 17%  perf-profile.self.cycles-pp.enqueue_task_fair
> >       1.01 ±  6%      -0.7        0.35 ± 10%  perf-profile.self.cycles-pp.prepare_to_wait_event
> >       0.90 ±  8%      -0.6        0.25 ± 12%  perf-profile.self.cycles-pp.update_min_vruntime
> >       0.79 ± 17%      -0.6        0.22 ±  9%  perf-profile.self.cycles-pp.cpuacct_charge
> >       1.10 ±  5%      -0.6        0.54 ±  9%  perf-profile.self.cycles-pp.try_to_wake_up
> >       0.66 ±  3%      -0.5        0.15 ± 27%  perf-profile.self.cycles-pp.os_xsave
> >       0.71 ±  6%      -0.5        0.22 ± 18%  perf-profile.self.cycles-pp.reweight_entity
> >       0.68 ±  9%      -0.5        0.19 ± 10%  perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
> >       0.67 ±  9%      -0.5        0.18 ± 11%  perf-profile.self.cycles-pp.__wake_up_common
> >       0.65 ±  6%      -0.5        0.17 ± 23%  perf-profile.self.cycles-pp.switch_fpu_return
> >       0.60 ± 11%      -0.5        0.14 ± 28%  perf-profile.self.cycles-pp.perf_tp_event
> >       0.52 ± 44%      -0.5        0.06 ±151%  perf-profile.self.cycles-pp.native_irq_return_iret
> >       0.52 ±  7%      -0.4        0.08 ± 25%  perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
> >       0.55 ±  4%      -0.4        0.15 ± 22%  perf-profile.self.cycles-pp.__calc_delta
> >       0.61 ±  5%      -0.4        0.21 ± 12%  perf-profile.self.cycles-pp.dequeue_task_fair
> >       0.69 ± 14%      -0.4        0.32 ± 11%  perf-profile.self.cycles-pp.task_h_load
> >       0.49 ± 11%      -0.3        0.15 ± 29%  perf-profile.self.cycles-pp.___perf_sw_event
> >       0.37 ±  4%      -0.3        0.05 ± 73%  perf-profile.self.cycles-pp.pick_next_entity
> >       0.50 ±  3%      -0.3        0.19 ± 15%  perf-profile.self.cycles-pp.select_idle_sibling
> >       0.38 ±  9%      -0.3        0.08 ± 24%  perf-profile.self.cycles-pp.set_next_buddy
> >       0.32 ±  4%      -0.3        0.03 ±100%  perf-profile.self.cycles-pp.put_prev_entity
> >       0.64 ±  6%      -0.3        0.35 ±  7%  perf-profile.self.cycles-pp._raw_spin_lock_irq
> >       0.52 ±  5%      -0.3        0.25 ±  6%  perf-profile.self.cycles-pp.__list_del_entry_valid
> >       0.34 ±  5%      -0.3        0.07 ± 29%  perf-profile.self.cycles-pp.schedule
> >       0.35 ±  9%      -0.3        0.08 ± 10%  perf-profile.self.cycles-pp.rb_insert_color
> >       0.40 ±  5%      -0.3        0.14 ± 16%  perf-profile.self.cycles-pp.select_task_rq_fair
> >       0.33 ±  6%      -0.3        0.08 ± 16%  perf-profile.self.cycles-pp.check_preempt_wakeup
> >       0.33 ±  8%      -0.2        0.10 ± 16%  perf-profile.self.cycles-pp.select_task_rq
> >       0.36 ±  3%      -0.2        0.13 ± 16%  perf-profile.self.cycles-pp.native_sched_clock
> >       0.32 ±  7%      -0.2        0.10 ± 14%  perf-profile.self.cycles-pp.finish_task_switch
> >       0.32 ±  4%      -0.2        0.11 ± 13%  perf-profile.self.cycles-pp.dequeue_entity
> >       0.32 ±  8%      -0.2        0.12 ± 10%  perf-profile.self.cycles-pp.__list_add_valid
> >       0.23 ±  5%      -0.2        0.03 ±103%  perf-profile.self.cycles-pp.resched_curr
> >       0.27 ±  6%      -0.2        0.07 ± 21%  perf-profile.self.cycles-pp.rb_erase
> >       0.27 ±  5%      -0.2        0.08 ± 20%  perf-profile.self.cycles-pp.__wrgsbase_inactive
> >       0.28 ± 13%      -0.2        0.09 ± 12%  perf-profile.self.cycles-pp.check_preempt_curr
> >       0.30 ± 13%      -0.2        0.12 ±  7%  perf-profile.self.cycles-pp.ttwu_queue_wakelist
> >       0.24 ±  5%      -0.2        0.06 ± 19%  perf-profile.self.cycles-pp.set_next_entity
> >       0.21 ± 34%      -0.2        0.04 ± 71%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
> >       0.25 ±  5%      -0.2        0.08 ± 16%  perf-profile.self.cycles-pp.rcu_note_context_switch
> >       0.19 ± 26%      -0.1        0.04 ± 73%  perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
> >       0.20 ±  8%      -0.1        0.06 ± 13%  perf-profile.self.cycles-pp.ttwu_do_activate
> >       0.17 ±  8%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.rb_next
> >       0.22 ± 23%      -0.1        0.09 ± 31%  perf-profile.self.cycles-pp.migrate_task_rq_fair
> >       0.15 ± 32%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.llist_reverse_order
> >       0.16 ±  8%      -0.1        0.06 ± 14%  perf-profile.self.cycles-pp.wake_affine
> >       0.10 ± 31%      -0.1        0.03 ±100%  perf-profile.self.cycles-pp.sched_ttwu_pending
> >       0.14 ±  5%      -0.1        0.07 ± 20%  perf-profile.self.cycles-pp.read@plt
> >       0.32 ±  8%      -0.1        0.26 ±  3%  perf-profile.self.cycles-pp.anon_pipe_buf_release
> >       0.10 ±  6%      -0.1        0.04 ± 45%  perf-profile.self.cycles-pp.__wake_up_common_lock
> >       0.10 ±  9%      -0.0        0.07 ±  8%  perf-profile.self.cycles-pp.__mark_inode_dirty
> >       0.00            +0.1        0.06 ± 11%  perf-profile.self.cycles-pp.free_unref_page
> >       0.00            +0.1        0.06 ±  6%  perf-profile.self.cycles-pp.__alloc_pages
> >       0.00            +0.1        0.06 ± 11%  perf-profile.self.cycles-pp._raw_spin_trylock
> >       0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.uncharge_folio
> >       0.00            +0.1        0.06 ± 11%  perf-profile.self.cycles-pp.uncharge_batch
> >       0.00            +0.1        0.07 ± 10%  perf-profile.self.cycles-pp.menu_select
> >       0.00            +0.1        0.08 ± 14%  perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg
> >       0.00            +0.1        0.08 ±  7%  perf-profile.self.cycles-pp.__memcg_kmem_charge_page
> >       0.00            +0.1        0.10 ± 10%  perf-profile.self.cycles-pp.osq_unlock
> >       0.07 ±  5%      +0.1        0.17 ±  8%  perf-profile.self.cycles-pp.copyin
> >       0.00            +0.1        0.11 ± 11%  perf-profile.self.cycles-pp.__mod_memcg_state
> >       0.13 ±  8%      +0.1        0.24 ±  6%  perf-profile.self.cycles-pp.rcu_all_qs
> >       0.14 ±  5%      +0.1        0.28 ±  5%  perf-profile.self.cycles-pp.__x64_sys_write
> >       0.07 ± 10%      +0.1        0.21 ±  5%  perf-profile.self.cycles-pp.page_copy_sane
> >       0.13 ± 12%      +0.1        0.28 ±  9%  perf-profile.self.cycles-pp.__x64_sys_read
> >       0.00            +0.2        0.15 ± 10%  perf-profile.self.cycles-pp.propagate_protected_usage
> >       0.18 ±  9%      +0.2        0.33 ±  4%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
> >       0.07 ±  8%      +0.2        0.23 ±  5%  perf-profile.self.cycles-pp.rw_verify_area
> >       0.08 ± 34%      +0.2        0.24 ± 27%  perf-profile.self.cycles-pp.mnt_user_ns
> >       0.13 ±  5%      +0.2        0.31 ±  7%  perf-profile.self.cycles-pp.kill_fasync
> >       0.21 ±  8%      +0.2        0.39 ±  5%  perf-profile.self.cycles-pp.__might_fault
> >       0.06 ± 13%      +0.2        0.26 ±  9%  perf-profile.self.cycles-pp.copyout
> >       0.10 ± 11%      +0.2        0.31 ±  8%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
> >       0.26 ± 13%      +0.2        0.49 ±  6%  perf-profile.self.cycles-pp.inode_needs_update_time
> >       0.23 ±  8%      +0.2        0.47 ±  5%  perf-profile.self.cycles-pp.copy_page_from_iter
> >       0.14 ±  7%      +0.2        0.38 ±  6%  perf-profile.self.cycles-pp.file_update_time
> >       0.36 ±  7%      +0.3        0.62 ±  4%  perf-profile.self.cycles-pp.ksys_read
> >       0.54 ± 13%      +0.3        0.80 ±  4%  perf-profile.self.cycles-pp._copy_from_iter
> >       0.15 ±  5%      +0.3        0.41 ±  8%  perf-profile.self.cycles-pp.touch_atime
> >       0.14 ±  5%      +0.3        0.40 ±  6%  perf-profile.self.cycles-pp.__cond_resched
> >       0.18 ±  5%      +0.3        0.47 ±  4%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> >       0.16 ±  8%      +0.3        0.46 ±  6%  perf-profile.self.cycles-pp.syscall_enter_from_user_mode
> >       0.16 ±  9%      +0.3        0.47 ±  6%  perf-profile.self.cycles-pp.__fdget_pos
> >       1.79 ±  8%      +0.3        2.12 ±  3%  perf-profile.self.cycles-pp.pipe_read
> >       0.10 ±  8%      +0.3        0.43 ± 17%  perf-profile.self.cycles-pp.fsnotify_perm
> >       0.20 ±  4%      +0.4        0.55 ±  5%  perf-profile.self.cycles-pp.ksys_write
> >       0.05 ± 76%      +0.4        0.46 ± 27%  perf-profile.self.cycles-pp.queue_event
> >       0.32 ±  6%      +0.4        0.73 ±  6%  perf-profile.self.cycles-pp.exit_to_user_mode_prepare
> >       0.21 ±  9%      +0.4        0.62 ±  6%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> >       0.79 ±  8%      +0.4        1.22 ±  4%  perf-profile.self.cycles-pp.__might_sleep
> >       0.44 ±  5%      +0.4        0.88 ±  7%  perf-profile.self.cycles-pp.do_syscall_64
> >       0.26 ±  8%      +0.4        0.70 ±  4%  perf-profile.self.cycles-pp.atime_needs_update
> >       0.42 ±  7%      +0.5        0.88 ±  5%  perf-profile.self.cycles-pp.__get_task_ioprio
> >       0.28 ± 12%      +0.5        0.75 ±  5%  perf-profile.self.cycles-pp.copy_page_to_iter
> >       0.19 ±  6%      +0.5        0.68 ± 10%  perf-profile.self.cycles-pp.security_file_permission
> >       0.31 ±  8%      +0.5        0.83 ±  5%  perf-profile.self.cycles-pp.aa_file_perm
> >       0.05 ± 46%      +0.5        0.59 ±  8%  perf-profile.self.cycles-pp.osq_lock
> >       0.30 ±  7%      +0.5        0.85 ±  6%  perf-profile.self.cycles-pp._copy_to_iter
> >       0.00            +0.6        0.59 ±  6%  perf-profile.self.cycles-pp.poll_idle
> >       0.13 ± 20%      +0.7        0.81 ±  6%  perf-profile.self.cycles-pp.mutex_spin_on_owner
> >       0.38 ±  9%      +0.7        1.12 ±  5%  perf-profile.self.cycles-pp.current_time
> >       0.08 ± 59%      +0.8        0.82 ± 11%  perf-profile.self.cycles-pp.intel_idle_irq
> >       0.92 ±  6%      +0.8        1.72 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> >       0.01 ±223%      +0.8        0.82 ±  6%  perf-profile.self.cycles-pp.page_counter_uncharge
> >       0.86 ±  7%      +1.1        1.91 ±  4%  perf-profile.self.cycles-pp.vfs_read
> >       1.07 ±  6%      +1.1        2.14 ±  5%  perf-profile.self.cycles-pp.__fget_light
> >       0.67 ±  7%      +1.1        1.74 ±  6%  perf-profile.self.cycles-pp.vfs_write
> >       0.15 ± 12%      +1.1        1.28 ±  7%  perf-profile.self.cycles-pp.__mutex_lock
> >       1.09 ±  6%      +1.1        2.22 ±  5%  perf-profile.self.cycles-pp.__libc_read
> >       0.62 ±  6%      +1.2        1.79 ±  5%  perf-profile.self.cycles-pp.syscall_return_via_sysret
> >       1.16 ±  8%      +1.2        2.38 ±  4%  perf-profile.self.cycles-pp.__might_resched
> >       0.91 ±  7%      +1.3        2.20 ±  5%  perf-profile.self.cycles-pp.__libc_write
> >       0.59 ±  8%      +1.3        1.93 ±  6%  perf-profile.self.cycles-pp.__entry_text_start
> >       1.27 ±  7%      +1.7        3.00 ±  6%  perf-profile.self.cycles-pp.apparmor_file_permission
> >       0.99 ±  8%      +2.0        2.98 ±  5%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> >       1.74 ±  8%      +3.4        5.15 ±  6%  perf-profile.self.cycles-pp.pipe_write
> >       2.98 ±  8%      +3.7        6.64 ±  5%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
> >       2.62 ± 10%      +4.8        7.38 ±  5%  perf-profile.self.cycles-pp.mutex_lock
> >       2.20 ± 10%      +5.1        7.30 ±  6%  perf-profile.self.cycles-pp.mutex_unlock
> >
> >
> > ***************************************************************************************************
> > lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
> > =========================================================================================
> > compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
> >   gcc-11/performance/socket/4/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/hackbench
> >
> > commit:
> >   a2e90611b9 ("sched/fair: Remove capacity inversion detection")
> >   829c1651e9 ("sched/fair: sanitize vruntime of entity being placed")
> >
> > a2e90611b9f425ad 829c1651e9c4a6f78398d3e6765
> > ---------------- ---------------------------
> >          %stddev     %change         %stddev
> >              \          |                \
> >     177139            -8.1%     162815        hackbench.throughput
> >     174484           -18.8%     141618 ±  2%  hackbench.throughput_avg
> >     177139            -8.1%     162815        hackbench.throughput_best
> >     168530           -37.3%     105615 ±  3%  hackbench.throughput_worst
> >     281.38           +23.1%     346.39 ±  2%  hackbench.time.elapsed_time
> >     281.38           +23.1%     346.39 ±  2%  hackbench.time.elapsed_time.max
> >  1.053e+08 ±  2%    +688.4%  8.302e+08 ±  9%  hackbench.time.involuntary_context_switches
> >      21992           +27.8%      28116 ±  2%  hackbench.time.system_time
> >       6652            +8.2%       7196        hackbench.time.user_time
> >  3.482e+08          +289.2%  1.355e+09 ±  9%  hackbench.time.voluntary_context_switches
> >    2110813 ±  5%     +21.6%    2565791 ±  3%  cpuidle..usage
> >     333.95           +19.5%     399.05        uptime.boot
> >       0.03            -0.0        0.03        mpstat.cpu.all.soft%
> >      22.68            -2.9       19.77        mpstat.cpu.all.usr%
> >     561083 ± 10%     +45.5%     816171 ± 12%  numa-numastat.node0.local_node
> >     614314 ±  9%     +36.9%     841173 ± 12%  numa-numastat.node0.numa_hit
> >    1393279 ±  7%     -16.8%    1158997 ±  2%  numa-numastat.node1.local_node
> >    1443679 ±  5%     -14.9%    1229074 ±  3%  numa-numastat.node1.numa_hit
> >    4129900 ±  8%     -23.0%    3181115        vmstat.memory.cache
> >       1731           +30.8%       2265        vmstat.procs.r
> >    1598044          +290.3%    6237840 ±  7%  vmstat.system.cs
> >     320762           +60.5%     514672 ±  8%  vmstat.system.in
> >     962111 ±  6%     +46.0%    1404646 ±  7%  turbostat.C1
> >     233987 ±  5%     +51.2%     353892        turbostat.C1E
> >   91515563           +97.3%  1.806e+08 ± 10%  turbostat.IRQ
> >     448466 ± 14%     -34.2%     294934 ±  5%  turbostat.POLL
> >      34.60            -7.3%      32.07        turbostat.RAMWatt
> >     514028 ±  2%     -14.0%     442125 ±  2%  meminfo.AnonPages
> >    4006312 ±  8%     -23.9%    3047078        meminfo.Cached
> >    3321064 ± 10%     -32.7%    2236362 ±  2%  meminfo.Committed_AS
> >    1714752 ± 21%     -60.3%     680479 ±  8%  meminfo.Inactive
> >    1714585 ± 21%     -60.3%     680305 ±  8%  meminfo.Inactive(anon)
> >     757124 ± 18%     -67.2%     248485 ± 27%  meminfo.Mapped
> >    6476123 ±  6%     -19.4%    5220738        meminfo.Memused
> >    1275724 ± 26%     -75.2%     316896 ± 15%  meminfo.Shmem
> >    6806047 ±  3%     -13.3%    5901974        meminfo.max_used_kB
> >     161311 ± 23%     +31.7%     212494 ±  5%  numa-meminfo.node0.AnonPages
> >     165693 ± 22%     +30.5%     216264 ±  5%  numa-meminfo.node0.Inactive
> >     165563 ± 22%     +30.6%     216232 ±  5%  numa-meminfo.node0.Inactive(anon)
> >     140638 ± 19%     -36.7%      89034 ± 11%  numa-meminfo.node0.Mapped
> >     352173 ± 14%     -35.3%     227805 ±  8%  numa-meminfo.node1.AnonPages
> >     501396 ± 11%     -22.6%     388042 ±  5%  numa-meminfo.node1.AnonPages.max
> >    1702242 ± 43%     -77.8%     378325 ± 22%  numa-meminfo.node1.FilePages
> >    1540803 ± 25%     -70.4%     455592 ± 13%  numa-meminfo.node1.Inactive
> >    1540767 ± 25%     -70.4%     455451 ± 13%  numa-meminfo.node1.Inactive(anon)
> >     612123 ± 18%     -74.9%     153752 ± 37%  numa-meminfo.node1.Mapped
> >    3085231 ± 24%     -53.9%    1420940 ± 14%  numa-meminfo.node1.MemUsed
> >     254052 ±  4%     -19.1%     205632 ± 21%  numa-meminfo.node1.SUnreclaim
> >    1259640 ± 27%     -75.9%     303123 ± 15%  numa-meminfo.node1.Shmem
> >     304597 ±  7%     -20.2%     242920 ± 17%  numa-meminfo.node1.Slab
> >      40345 ± 23%     +31.5%      53054 ±  5%  numa-vmstat.node0.nr_anon_pages
> >      41412 ± 22%     +30.4%      53988 ±  5%  numa-vmstat.node0.nr_inactive_anon
> >      35261 ± 19%     -36.9%      22256 ± 12%  numa-vmstat.node0.nr_mapped
> >      41412 ± 22%     +30.4%      53988 ±  5%  numa-vmstat.node0.nr_zone_inactive_anon
> >     614185 ±  9%     +36.9%     841065 ± 12%  numa-vmstat.node0.numa_hit
> >     560955 ± 11%     +45.5%     816063 ± 12%  numa-vmstat.node0.numa_local
> >      88129 ± 14%     -35.2%      57097 ±  8%  numa-vmstat.node1.nr_anon_pages
> >     426425 ± 43%     -77.9%      94199 ± 22%  numa-vmstat.node1.nr_file_pages
> >     386166 ± 25%     -70.5%     113880 ± 13%  numa-vmstat.node1.nr_inactive_anon
> >     153658 ± 18%     -75.3%      38021 ± 37%  numa-vmstat.node1.nr_mapped
> >     315775 ± 27%     -76.1%      75399 ± 16%  numa-vmstat.node1.nr_shmem
> >      63411 ±  4%     -18.6%      51593 ± 21%  numa-vmstat.node1.nr_slab_unreclaimable
> >     386166 ± 25%     -70.5%     113880 ± 13%  numa-vmstat.node1.nr_zone_inactive_anon
> >    1443470 ±  5%     -14.9%    1228740 ±  3%  numa-vmstat.node1.numa_hit
> >    1393069 ±  7%     -16.8%    1158664 ±  2%  numa-vmstat.node1.numa_local
> >     128457 ±  2%     -14.0%     110530 ±  3%  proc-vmstat.nr_anon_pages
> >     999461 ±  8%     -23.8%     761774        proc-vmstat.nr_file_pages
> >     426485 ± 21%     -60.1%     170237 ±  9%  proc-vmstat.nr_inactive_anon
> >      82464            -2.6%      80281        proc-vmstat.nr_kernel_stack
> >     187777 ± 18%     -66.9%      62076 ± 28%  proc-vmstat.nr_mapped
> >     316813 ± 27%     -75.0%      79228 ± 16%  proc-vmstat.nr_shmem
> >      31469            -2.0%      30840        proc-vmstat.nr_slab_reclaimable
> >     117889            -8.4%     108036        proc-vmstat.nr_slab_unreclaimable
> >     426485 ± 21%     -60.1%     170237 ±  9%  proc-vmstat.nr_zone_inactive_anon
> >     187187 ± 12%     -43.5%     105680 ±  9%  proc-vmstat.numa_hint_faults
> >     128363 ± 15%     -61.5%      49371 ± 19%  proc-vmstat.numa_hint_faults_local
> >      47314 ± 22%     +39.2%      65863 ± 13%  proc-vmstat.numa_pages_migrated
> >     457026 ±  9%     -18.1%     374188 ± 13%  proc-vmstat.numa_pte_updates
> >    2586600 ±  3%     +27.7%    3302787 ±  8%  proc-vmstat.pgalloc_normal
> >    1589970            -6.2%    1491838        proc-vmstat.pgfault
> >    2347186 ± 10%     +37.7%    3232369 ±  8%  proc-vmstat.pgfree
> >      47314 ± 22%     +39.2%      65863 ± 13%  proc-vmstat.pgmigrate_success
> >     112713            +7.0%     120630 ±  3%  proc-vmstat.pgreuse
> >    2189056           +22.2%    2674944 ±  2%  proc-vmstat.unevictable_pgs_scanned
> >      14.08 ±  2%     +29.3%      18.20 ±  5%  sched_debug.cfs_rq:/.h_nr_running.avg
> >       0.80 ± 14%    +179.2%       2.23 ± 24%  sched_debug.cfs_rq:/.h_nr_running.min
> >     245.23 ± 12%     -19.7%     196.97 ±  6%  sched_debug.cfs_rq:/.load_avg.max
> >       2.27 ± 16%     +75.0%       3.97 ±  4%  sched_debug.cfs_rq:/.load_avg.min
> >      45.77 ± 16%     -17.8%      37.60 ±  6%  sched_debug.cfs_rq:/.load_avg.stddev
> >   11842707           +39.9%   16567992        sched_debug.cfs_rq:/.min_vruntime.avg
> >   13773080 ±  3%    +113.9%   29460281 ±  7%  sched_debug.cfs_rq:/.min_vruntime.max
> >   11423218           +30.3%   14885830        sched_debug.cfs_rq:/.min_vruntime.min
> >     301190 ± 12%    +439.9%    1626088 ± 10%  sched_debug.cfs_rq:/.min_vruntime.stddev
> >     203.83           -16.3%     170.67        sched_debug.cfs_rq:/.removed.load_avg.max
> >      14330 ±  3%     +30.9%      18756 ±  5%  sched_debug.cfs_rq:/.runnable_avg.avg
> >      25115 ±  4%     +15.5%      28999 ±  6%  sched_debug.cfs_rq:/.runnable_avg.max
> >       3811 ± 11%     +68.0%       6404 ± 21%  sched_debug.cfs_rq:/.runnable_avg.min
> >       3818 ±  6%     +15.3%       4404 ±  7%  sched_debug.cfs_rq:/.runnable_avg.stddev
> >    -849635          +410.6%   -4338612        sched_debug.cfs_rq:/.spread0.avg
> >    1092373 ± 54%    +691.1%    8641673 ± 21%  sched_debug.cfs_rq:/.spread0.max
> >   -1263082          +378.1%   -6038905        sched_debug.cfs_rq:/.spread0.min
> >     300764 ± 12%    +441.8%    1629507 ±  9%  sched_debug.cfs_rq:/.spread0.stddev
> >       1591 ±  4%     -11.1%       1413 ±  3%  sched_debug.cfs_rq:/.util_avg.max
> >     288.90 ± 11%     +64.5%     475.23 ± 13%  sched_debug.cfs_rq:/.util_avg.min
> >     240.33 ±  2%     -32.1%     163.09 ±  3%  sched_debug.cfs_rq:/.util_avg.stddev
> >     494.27 ±  3%     +41.6%     699.85 ±  3%  sched_debug.cfs_rq:/.util_est_enqueued.avg
> >      11.23 ± 54%    +634.1%      82.47 ± 22%  sched_debug.cfs_rq:/.util_est_enqueued.min
> >     174576           +20.7%     210681        sched_debug.cpu.clock.avg
> >     174926           +21.2%     211944        sched_debug.cpu.clock.max
> >     174164           +20.3%     209436        sched_debug.cpu.clock.min
> >     230.84 ± 33%    +226.1%     752.67 ± 20%  sched_debug.cpu.clock.stddev
> >     172836           +20.6%     208504        sched_debug.cpu.clock_task.avg
> >     173552           +21.0%     210079        sched_debug.cpu.clock_task.max
> >     156807           +22.3%     191789        sched_debug.cpu.clock_task.min
> >       1634           +17.1%       1914 ±  5%  sched_debug.cpu.clock_task.stddev
> >       0.00 ± 32%    +220.1%       0.00 ± 20%  sched_debug.cpu.next_balance.stddev
> >      14.12 ±  2%     +28.7%      18.18 ±  5%  sched_debug.cpu.nr_running.avg
> >       0.73 ± 25%    +213.6%       2.30 ± 24%  sched_debug.cpu.nr_running.min
> >    1810086          +461.3%   10159215 ± 10%  sched_debug.cpu.nr_switches.avg
> >    2315994 ±  3%    +515.6%   14258195 ±  9%  sched_debug.cpu.nr_switches.max
> >    1529863          +380.3%    7348324 ±  9%  sched_debug.cpu.nr_switches.min
> >     167487 ± 18%    +770.8%    1458519 ± 21%  sched_debug.cpu.nr_switches.stddev
> >     174149           +20.2%     209410        sched_debug.cpu_clk
> >     170980           +20.6%     206240        sched_debug.ktime
> >     174896           +20.2%     210153        sched_debug.sched_clk
> >       7.35           +24.9%       9.18 ±  4%  perf-stat.i.MPKI
> >  1.918e+10           +14.4%  2.194e+10        perf-stat.i.branch-instructions
> >       2.16            -0.1        2.09        perf-stat.i.branch-miss-rate%
> >  4.133e+08            +6.6%  4.405e+08        perf-stat.i.branch-misses
> >      23.08            -9.2       13.86 ±  7%  perf-stat.i.cache-miss-rate%
> >  1.714e+08           -37.2%  1.076e+08 ±  3%  perf-stat.i.cache-misses
> >  7.497e+08           +33.7%  1.002e+09 ±  5%  perf-stat.i.cache-references
> >    1636365          +382.4%    7893858 ±  5%  perf-stat.i.context-switches
> >       2.74            -6.8%       2.56        perf-stat.i.cpi
> >     131725          +288.0%     511159 ± 10%  perf-stat.i.cpu-migrations
> >       1672          +160.8%       4361 ±  4%  perf-stat.i.cycles-between-cache-misses
> >       0.49            +0.6        1.11 ±  5%  perf-stat.i.dTLB-load-miss-rate%
> >  1.417e+08          +158.7%  3.665e+08 ±  5%  perf-stat.i.dTLB-load-misses
> >  2.908e+10            +9.1%  3.172e+10        perf-stat.i.dTLB-loads
> >       0.12 ±  4%      +0.1        0.20 ±  4%  perf-stat.i.dTLB-store-miss-rate%
> >   20805655 ±  4%     +90.9%   39716345 ±  4%  perf-stat.i.dTLB-store-misses
> >  1.755e+10            +8.6%  1.907e+10        perf-stat.i.dTLB-stores
> >      29.04            +3.6       32.62 ±  2%  perf-stat.i.iTLB-load-miss-rate%
> >   56676082           +60.4%   90917582 ±  3%  perf-stat.i.iTLB-load-misses
> >  1.381e+08           +30.6%  1.804e+08        perf-stat.i.iTLB-loads
> >   1.03e+11           +10.5%  1.139e+11        perf-stat.i.instructions
> >       1840           -21.1%       1451 ±  4%  perf-stat.i.instructions-per-iTLB-miss
> >       0.37           +10.9%       0.41        perf-stat.i.ipc
> >       1084            -4.5%       1035 ±  2%  perf-stat.i.metric.K/sec
> >     640.69           +10.3%     706.44        perf-stat.i.metric.M/sec
> >       5249            -9.3%       4762 ±  3%  perf-stat.i.minor-faults
> >      23.57           +18.7       42.30 ±  8%  perf-stat.i.node-load-miss-rate%
> >   40174555           -45.0%   22109431 ± 10%  perf-stat.i.node-loads
> >       8.84 ±  2%     +24.5       33.30 ± 10%  perf-stat.i.node-store-miss-rate%
> >    2912322           +60.3%    4667137 ± 16%  perf-stat.i.node-store-misses
> >   34046752           -50.6%   16826621 ±  9%  perf-stat.i.node-stores
> >       5278            -9.2%       4791 ±  3%  perf-stat.i.page-faults
> >       7.24           +12.1%       8.12 ±  4%  perf-stat.overall.MPKI
> >       2.15            -0.1        2.05        perf-stat.overall.branch-miss-rate%
> >      22.92            -9.5       13.41 ±  7%  perf-stat.overall.cache-miss-rate%
> >       2.73            -6.3%       2.56        perf-stat.overall.cpi
> >       1644           +43.4%       2358 ±  3%  perf-stat.overall.cycles-between-cache-misses
> >       0.48            +0.5        0.99 ±  4%  perf-stat.overall.dTLB-load-miss-rate%
> >       0.12 ±  4%      +0.1        0.19 ±  4%  perf-stat.overall.dTLB-store-miss-rate%
> >      29.06            +2.9       32.01 ±  2%  perf-stat.overall.iTLB-load-miss-rate%
> >       1826           -26.6%       1340 ±  4%  perf-stat.overall.instructions-per-iTLB-miss
> >       0.37            +6.8%       0.39        perf-stat.overall.ipc
> >      22.74            +6.8       29.53 ± 13%  perf-stat.overall.node-load-miss-rate%
> >       7.63            +8.4       16.02 ± 20%  perf-stat.overall.node-store-miss-rate%
> >  1.915e+10            +9.0%  2.088e+10        perf-stat.ps.branch-instructions
> >  4.119e+08            +3.9%  4.282e+08        perf-stat.ps.branch-misses
> >  1.707e+08           -30.5%  1.186e+08 ±  3%  perf-stat.ps.cache-misses
> >  7.446e+08           +19.2%  8.874e+08 ±  4%  perf-stat.ps.cache-references
> >    1611874          +289.1%    6271376 ±  7%  perf-stat.ps.context-switches
> >     127362          +189.0%     368041 ± 11%  perf-stat.ps.cpu-migrations
> >  1.407e+08          +116.2%  3.042e+08 ±  5%  perf-stat.ps.dTLB-load-misses
> >  2.901e+10            +5.4%  3.057e+10        perf-stat.ps.dTLB-loads
> >   20667480 ±  4%     +66.8%   34473793 ±  4%  perf-stat.ps.dTLB-store-misses
> >  1.751e+10            +5.1%   1.84e+10        perf-stat.ps.dTLB-stores
> >   56310692           +45.0%   81644183 ±  4%  perf-stat.ps.iTLB-load-misses
> >  1.375e+08           +26.1%  1.733e+08        perf-stat.ps.iTLB-loads
> >  1.028e+11            +6.3%  1.093e+11        perf-stat.ps.instructions
> >       4929           -24.5%       3723 ±  2%  perf-stat.ps.minor-faults
> >   40134633           -32.9%   26946247 ±  9%  perf-stat.ps.node-loads
> >    2805073           +39.5%    3914304 ± 16%  perf-stat.ps.node-store-misses
> >   33938259           -38.9%   20726382 ±  8%  perf-stat.ps.node-stores
> >       4952           -24.5%       3741 ±  2%  perf-stat.ps.page-faults
> >  2.911e+13           +30.9%  3.809e+13 ±  2%  perf-stat.total.instructions
> >      15.30 ±  4%      -8.6        6.66 ±  5%  perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> >      13.84 ±  6%      -7.9        5.98 ±  6%  perf-profile.calltrace.cycles-pp.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> >      13.61 ±  6%      -7.8        5.84 ±  6%  perf-profile.calltrace.cycles-pp.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg
> >       9.00 ±  2%      -5.5        3.48 ±  4%  perf-profile.calltrace.cycles-pp.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> >       6.44 ±  4%      -4.3        2.14 ±  6%  perf-profile.calltrace.cycles-pp.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> >       5.83 ±  8%      -3.4        2.44 ±  5%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
> >       5.81 ±  6%      -3.3        2.48 ±  6%  perf-profile.calltrace.cycles-pp.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
> >       5.50 ±  7%      -3.2        2.32 ±  6%  perf-profile.calltrace.cycles-pp.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> >       5.07 ±  8%      -3.0        2.04 ±  6%  perf-profile.calltrace.cycles-pp.__kmem_cache_alloc_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb.alloc_skb_with_frags
> >       6.22 ±  2%      -2.9        3.33 ±  3%  perf-profile.calltrace.cycles-pp.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> >       6.17 ±  2%      -2.9        3.30 ±  3%  perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> >       6.11 ±  2%      -2.9        3.24 ±  3%  perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic.unix_stream_recvmsg
> >      50.99            -2.6       48.39        perf-profile.calltrace.cycles-pp.__libc_read
> >       5.66 ±  3%      -2.3        3.35 ±  3%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_read
> >       5.52 ±  3%      -2.3        3.27 ±  3%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_write
> >       3.14 ±  2%      -1.7        1.42 ±  4%  perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
> >       2.73 ±  2%      -1.6        1.15 ±  4%  perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
> >       2.59 ±  2%      -1.5        1.07 ±  4%  perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
> >       2.72 ±  3%      -1.4        1.34 ±  6%  perf-profile.calltrace.cycles-pp.kmem_cache_free.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> >      41.50            -1.2       40.27        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
> >       2.26 ±  4%      -1.1        1.12        perf-profile.calltrace.cycles-pp.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> >       2.76 ±  3%      -1.1        1.63 ±  3%  perf-profile.calltrace.cycles-pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor
> >       2.84 ±  3%      -1.1        1.71 ±  2%  perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.unix_stream_read_actor.unix_stream_read_generic
> >       2.20 ±  4%      -1.1        1.08        perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
> >       2.98 ±  2%      -1.1        1.90 ±  6%  perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> >       1.99 ±  4%      -1.1        0.92 ±  2%  perf-profile.calltrace.cycles-pp.sock_wfree.unix_destruct_scm.skb_release_head_state.consume_skb.unix_stream_read_generic
> >       2.10 ±  3%      -1.0        1.08 ±  4%  perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
> >       2.08 ±  4%      -0.8        1.24 ±  3%  perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_write
> >       2.16 ±  3%      -0.7        1.47        perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_read
> >       2.20 ±  2%      -0.7        1.52 ±  3%  perf-profile.calltrace.cycles-pp.__kmem_cache_free.skb_release_data.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
> >       1.46 ±  3%      -0.6        0.87 ±  8%  perf-profile.calltrace.cycles-pp._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> >       4.82 ±  2%      -0.6        4.24        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> >       1.31 ±  2%      -0.4        0.90 ±  4%  perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> >       0.96 ±  3%      -0.4        0.57 ± 10%  perf-profile.calltrace.cycles-pp.copyin._copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg
> >       1.14 ±  3%      -0.4        0.76 ±  5%  perf-profile.calltrace.cycles-pp.memcg_slab_post_alloc_hook.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> >       0.99 ±  3%      -0.3        0.65 ±  8%  perf-profile.calltrace.cycles-pp.memcg_slab_post_alloc_hook.__kmem_cache_alloc_node.__kmalloc_node_track_caller.kmalloc_reserve.__alloc_skb
> >       1.30 ±  4%      -0.3        0.99 ±  3%  perf-profile.calltrace.cycles-pp.sock_recvmsg.sock_read_iter.vfs_read.ksys_read.do_syscall_64
> >       0.98 ±  2%      -0.3        0.69 ±  3%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.67            -0.2        0.42 ± 50%  perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg
> >       0.56 ±  4%      -0.2        0.32 ± 81%  perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> >       0.86 ±  2%      -0.2        0.63 ±  3%  perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_write.ksys_write.do_syscall_64
> >       1.15 ±  4%      -0.2        0.93 ±  4%  perf-profile.calltrace.cycles-pp.security_socket_recvmsg.sock_recvmsg.sock_read_iter.vfs_read.ksys_read
> >       0.90            -0.2        0.69 ±  3%  perf-profile.calltrace.cycles-pp.get_obj_cgroup_from_current.kmem_cache_alloc_node.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
> >       1.23 ±  3%      -0.2        1.07 ±  3%  perf-profile.calltrace.cycles-pp.security_socket_sendmsg.sock_sendmsg.sock_write_iter.vfs_write.ksys_write
> >       1.05 ±  2%      -0.2        0.88 ±  2%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.84 ±  4%      -0.2        0.68 ±  4%  perf-profile.calltrace.cycles-pp.aa_sk_perm.security_socket_recvmsg.sock_recvmsg.sock_read_iter.vfs_read
> >       0.88            -0.1        0.78 ±  5%  perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
> >       0.94 ±  3%      -0.1        0.88 ±  4%  perf-profile.calltrace.cycles-pp.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> >       0.62 ±  2%      +0.3        0.90 ±  2%  perf-profile.calltrace.cycles-pp.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> >       0.00            +0.6        0.58 ±  3%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.prepare_to_wait.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
> >       0.00            +0.6        0.61 ±  6%  perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> >       0.00            +0.6        0.62 ±  4%  perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop
> >       0.00            +0.7        0.67 ± 11%  perf-profile.calltrace.cycles-pp.update_load_avg.dequeue_entity.dequeue_task_fair.__schedule.schedule
> >       0.00            +0.7        0.67 ±  7%  perf-profile.calltrace.cycles-pp.__switch_to_asm.__libc_write
> >       0.00            +0.8        0.76 ±  4%  perf-profile.calltrace.cycles-pp.reweight_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
> >       0.00            +0.8        0.77 ±  4%  perf-profile.calltrace.cycles-pp.___perf_sw_event.prepare_task_switch.__schedule.schedule.schedule_timeout
> >       0.00            +0.8        0.77 ±  8%  perf-profile.calltrace.cycles-pp.put_prev_entity.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop
> >       0.00            +0.8        0.81 ±  5%  perf-profile.calltrace.cycles-pp.prepare_task_switch.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
> >       0.00            +0.8        0.81 ±  5%  perf-profile.calltrace.cycles-pp.check_preempt_wakeup.check_preempt_curr.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> >       0.00            +0.8        0.82 ±  2%  perf-profile.calltrace.cycles-pp.__switch_to_asm.__libc_read
> >       0.00            +0.8        0.82 ±  3%  perf-profile.calltrace.cycles-pp.reweight_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> >       0.00            +0.9        0.86 ±  5%  perf-profile.calltrace.cycles-pp.perf_trace_sched_wakeup_template.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> >       0.00            +0.9        0.87 ±  8%  perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
> >      29.66            +0.9       30.58        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> >       0.00            +1.0        0.95 ±  3%  perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule.schedule_timeout
> >       0.00            +1.0        0.98 ±  4%  perf-profile.calltrace.cycles-pp.check_preempt_curr.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
> >       0.00            +1.0        0.99 ±  3%  perf-profile.calltrace.cycles-pp.update_curr.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
> >       0.00            +1.0        1.05 ±  4%  perf-profile.calltrace.cycles-pp.prepare_to_wait.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> >       0.00            +1.1        1.07 ± 12%  perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function
> >      27.81 ±  2%      +1.2       28.98        perf-profile.calltrace.cycles-pp.unix_stream_recvmsg.sock_read_iter.vfs_read.ksys_read.do_syscall_64
> >      27.36 ±  2%      +1.2       28.59        perf-profile.calltrace.cycles-pp.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read.ksys_read
> >       0.00            +1.5        1.46 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common
> >       0.00            +1.6        1.55 ±  4%  perf-profile.calltrace.cycles-pp.prepare_task_switch.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> >       0.00            +1.6        1.60 ±  4%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
> >      27.58            +1.6       29.19        perf-profile.calltrace.cycles-pp.sock_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.00            +1.6        1.63 ±  5%  perf-profile.calltrace.cycles-pp.update_curr.dequeue_entity.dequeue_task_fair.__schedule.schedule
> >       0.00            +1.6        1.65 ±  5%  perf-profile.calltrace.cycles-pp.restore_fpregs_from_fpstate.switch_fpu_return.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> >       0.00            +1.7        1.66 ±  6%  perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
> >       0.00            +1.8        1.80        perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> >       0.00            +1.8        1.84 ±  2%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> >       0.00            +2.0        1.97 ±  2%  perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> >      26.63 ±  2%      +2.0       28.61        perf-profile.calltrace.cycles-pp.sock_sendmsg.sock_write_iter.vfs_write.ksys_write.do_syscall_64
> >       0.00            +2.0        2.01 ±  6%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare
> >       0.00            +2.1        2.09 ±  6%  perf-profile.calltrace.cycles-pp.select_task_rq_fair.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common
> >       0.00            +2.1        2.11 ±  5%  perf-profile.calltrace.cycles-pp.switch_fpu_return.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >      25.21 ±  2%      +2.2       27.43        perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write.ksys_write
> >       0.00            +2.4        2.43 ±  5%  perf-profile.calltrace.cycles-pp.select_task_rq.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> >      48.00            +2.7       50.69        perf-profile.calltrace.cycles-pp.__libc_write
> >       0.00            +2.9        2.87 ±  5%  perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
> >       0.09 ±223%      +3.4        3.47 ±  3%  perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
> >      39.07            +4.8       43.84        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write
> >       0.66 ± 18%      +5.0        5.62 ±  4%  perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.schedule_timeout.unix_stream_data_wait
> >       4.73            +5.1        9.88 ±  3%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> >       0.66 ± 20%      +5.3        5.98 ±  3%  perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
> >      35.96            +5.7       41.68        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> >       0.00            +6.0        6.02 ±  6%  perf-profile.calltrace.cycles-pp.__schedule.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
> >       0.00            +6.2        6.18 ±  6%  perf-profile.calltrace.cycles-pp.schedule.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
> >       0.00            +6.4        6.36 ±  6%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >       0.78 ± 19%      +6.4        7.15 ±  3%  perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
> >       0.18 ±141%      +7.0        7.18 ±  6%  perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
> >       1.89 ± 15%     +12.1       13.96 ±  3%  perf-profile.calltrace.cycles-pp.__schedule.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic
> >       1.92 ± 15%     +12.3       14.23 ±  3%  perf-profile.calltrace.cycles-pp.schedule.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg
> >       1.66 ± 19%     +12.4       14.06 ±  2%  perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable
> >       1.96 ± 15%     +12.5       14.48 ±  3%  perf-profile.calltrace.cycles-pp.schedule_timeout.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
> >       1.69 ± 19%     +12.7       14.38 ±  2%  perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg
> >       1.75 ± 19%     +13.0       14.75 ±  2%  perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg.sock_sendmsg
> >       2.53 ± 10%     +13.4       15.90 ±  2%  perf-profile.calltrace.cycles-pp.sock_def_readable.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.vfs_write
> >       1.96 ± 16%     +13.5       15.42 ±  2%  perf-profile.calltrace.cycles-pp.__wake_up_common_lock.sock_def_readable.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
> >       2.28 ± 15%     +14.6       16.86 ±  3%  perf-profile.calltrace.cycles-pp.unix_stream_data_wait.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter.vfs_read
> >      15.31 ±  4%      -8.6        6.67 ±  5%  perf-profile.children.cycles-pp.sock_alloc_send_pskb
> >      13.85 ±  6%      -7.9        5.98 ±  5%  perf-profile.children.cycles-pp.alloc_skb_with_frags
> >      13.70 ±  6%      -7.8        5.89 ±  6%  perf-profile.children.cycles-pp.__alloc_skb
> >       9.01 ±  2%      -5.5        3.48 ±  4%  perf-profile.children.cycles-pp.consume_skb
> >       6.86 ± 26%      -4.7        2.15 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> >      11.27 ±  3%      -4.6        6.67 ±  3%  perf-profile.children.cycles-pp.syscall_return_via_sysret
> >       6.46 ±  4%      -4.3        2.15 ±  6%  perf-profile.children.cycles-pp.skb_release_data
> >       4.18 ± 25%      -4.0        0.15 ± 69%  perf-profile.children.cycles-pp.___slab_alloc
> >       5.76 ± 32%      -3.9        1.91 ±  3%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> >       5.98 ±  8%      -3.5        2.52 ±  5%  perf-profile.children.cycles-pp.kmem_cache_alloc_node
> >       5.84 ±  6%      -3.3        2.50 ±  6%  perf-profile.children.cycles-pp.kmalloc_reserve
> >       3.33 ± 30%      -3.3        0.05 ± 88%  perf-profile.children.cycles-pp.get_partial_node
> >       5.63 ±  7%      -3.3        2.37 ±  6%  perf-profile.children.cycles-pp.__kmalloc_node_track_caller
> >       5.20 ±  7%      -3.1        2.12 ±  6%  perf-profile.children.cycles-pp.__kmem_cache_alloc_node
> >       6.23 ±  2%      -2.9        3.33 ±  3%  perf-profile.children.cycles-pp.unix_stream_read_actor
> >       6.18 ±  2%      -2.9        3.31 ±  3%  perf-profile.children.cycles-pp.skb_copy_datagram_iter
> >       6.11 ±  2%      -2.9        3.25 ±  3%  perf-profile.children.cycles-pp.__skb_datagram_iter
> >      51.39            -2.5       48.85        perf-profile.children.cycles-pp.__libc_read
> >       3.14 ±  3%      -2.5        0.61 ± 13%  perf-profile.children.cycles-pp.__slab_free
> >       5.34 ±  3%      -2.1        3.23 ±  3%  perf-profile.children.cycles-pp.__entry_text_start
> >       3.57 ±  2%      -1.9        1.66 ±  6%  perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
> >       3.16 ±  2%      -1.7        1.43 ±  4%  perf-profile.children.cycles-pp._copy_to_iter
> >       2.74 ±  2%      -1.6        1.16 ±  4%  perf-profile.children.cycles-pp.copyout
> >       4.16 ±  2%      -1.5        2.62 ±  3%  perf-profile.children.cycles-pp.__check_object_size
> >       2.73 ±  3%      -1.4        1.35 ±  6%  perf-profile.children.cycles-pp.kmem_cache_free
> >       2.82 ±  2%      -1.2        1.63 ±  3%  perf-profile.children.cycles-pp.check_heap_object
> >       2.27 ±  4%      -1.1        1.13 ±  2%  perf-profile.children.cycles-pp.skb_release_head_state
> >       2.85 ±  3%      -1.1        1.72 ±  2%  perf-profile.children.cycles-pp.simple_copy_to_iter
> >       2.22 ±  4%      -1.1        1.10        perf-profile.children.cycles-pp.unix_destruct_scm
> >       3.00 ±  2%      -1.1        1.91 ±  5%  perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
> >       2.00 ±  4%      -1.1        0.92 ±  2%  perf-profile.children.cycles-pp.sock_wfree
> >       2.16 ±  3%      -0.7        1.43 ±  7%  perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
> >       1.45 ±  3%      -0.7        0.73 ±  7%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
> >       2.21 ±  2%      -0.7        1.52 ±  3%  perf-profile.children.cycles-pp.__kmem_cache_free
> >       1.49 ±  3%      -0.6        0.89 ±  8%  perf-profile.children.cycles-pp._copy_from_iter
> >       1.40 ±  3%      -0.6        0.85 ± 13%  perf-profile.children.cycles-pp.mod_objcg_state
> >       0.74            -0.5        0.24 ± 16%  perf-profile.children.cycles-pp.__build_skb_around
> >       1.48            -0.5        1.01 ±  2%  perf-profile.children.cycles-pp.get_obj_cgroup_from_current
> >       2.05 ±  2%      -0.5        1.59 ±  2%  perf-profile.children.cycles-pp.security_file_permission
> >       0.98 ±  2%      -0.4        0.59 ± 10%  perf-profile.children.cycles-pp.copyin
> >       1.08 ±  3%      -0.4        0.72 ±  3%  perf-profile.children.cycles-pp.__might_resched
> >       1.75            -0.3        1.42 ±  4%  perf-profile.children.cycles-pp.apparmor_file_permission
> >       1.32 ±  4%      -0.3        1.00 ±  3%  perf-profile.children.cycles-pp.sock_recvmsg
> >       0.54 ±  4%      -0.3        0.25 ±  6%  perf-profile.children.cycles-pp.skb_unlink
> >       0.54 ±  6%      -0.3        0.26 ±  3%  perf-profile.children.cycles-pp.unix_write_space
> >       0.66 ±  3%      -0.3        0.39 ±  4%  perf-profile.children.cycles-pp.obj_cgroup_charge
> >       0.68 ±  2%      -0.3        0.41 ±  4%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> >       0.86 ±  4%      -0.3        0.59 ±  3%  perf-profile.children.cycles-pp.__check_heap_object
> >       0.75 ±  9%      -0.3        0.48 ±  2%  perf-profile.children.cycles-pp.skb_set_owner_w
> >       1.84 ±  3%      -0.3        1.58 ±  4%  perf-profile.children.cycles-pp.aa_sk_perm
> >       0.68 ± 11%      -0.2        0.44 ±  3%  perf-profile.children.cycles-pp.skb_queue_tail
> >       1.22 ±  4%      -0.2        0.99 ±  5%  perf-profile.children.cycles-pp.__fdget_pos
> >       0.70 ±  2%      -0.2        0.48 ±  5%  perf-profile.children.cycles-pp.__get_obj_cgroup_from_memcg
> >       1.16 ±  4%      -0.2        0.93 ±  3%  perf-profile.children.cycles-pp.security_socket_recvmsg
> >       0.48 ±  3%      -0.2        0.29 ±  4%  perf-profile.children.cycles-pp.__might_fault
> >       0.24 ±  7%      -0.2        0.05 ± 56%  perf-profile.children.cycles-pp.fsnotify_perm
> >       1.12 ±  4%      -0.2        0.93 ±  6%  perf-profile.children.cycles-pp.__fget_light
> >       1.24 ±  3%      -0.2        1.07 ±  3%  perf-profile.children.cycles-pp.security_socket_sendmsg
> >       0.61 ±  3%      -0.2        0.45 ±  2%  perf-profile.children.cycles-pp.__might_sleep
> >       0.33 ±  5%      -0.2        0.17 ±  6%  perf-profile.children.cycles-pp.refill_obj_stock
> >       0.40 ±  2%      -0.1        0.25 ±  4%  perf-profile.children.cycles-pp.kmalloc_slab
> >       0.57 ±  2%      -0.1        0.45        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> >       0.54 ±  3%      -0.1        0.42 ±  2%  perf-profile.children.cycles-pp.wait_for_unix_gc
> >       0.42 ±  2%      -0.1        0.30 ±  3%  perf-profile.children.cycles-pp.is_vmalloc_addr
> >       1.00 ±  2%      -0.1        0.87 ±  5%  perf-profile.children.cycles-pp.__virt_addr_valid
> >       0.52 ±  2%      -0.1        0.41        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> >       0.33 ±  3%      -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.tick_sched_handle
> >       0.36 ±  2%      -0.1        0.25 ±  4%  perf-profile.children.cycles-pp.tick_sched_timer
> >       0.47 ±  2%      -0.1        0.36 ±  2%  perf-profile.children.cycles-pp.hrtimer_interrupt
> >       0.48 ±  2%      -0.1        0.36 ±  2%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
> >       0.32 ±  3%      -0.1        0.21 ±  5%  perf-profile.children.cycles-pp.update_process_times
> >       0.42 ±  3%      -0.1        0.31 ±  2%  perf-profile.children.cycles-pp.__hrtimer_run_queues
> >       0.26 ±  6%      -0.1        0.16 ±  4%  perf-profile.children.cycles-pp.kmalloc_size_roundup
> >       0.20 ±  4%      -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.task_tick_fair
> >       0.24 ±  3%      -0.1        0.15 ±  4%  perf-profile.children.cycles-pp.scheduler_tick
> >       0.30 ±  5%      -0.1        0.21 ±  8%  perf-profile.children.cycles-pp.obj_cgroup_uncharge_pages
> >       0.20 ±  2%      -0.1        0.11 ±  6%  perf-profile.children.cycles-pp.should_failslab
> >       0.51 ±  2%      -0.1        0.43 ±  6%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
> >       0.15 ±  8%      -0.1        0.07 ± 13%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
> >       0.19 ±  4%      -0.1        0.12 ±  5%  perf-profile.children.cycles-pp.apparmor_socket_sendmsg
> >       0.20 ±  4%      -0.1        0.13 ±  5%  perf-profile.children.cycles-pp.aa_file_perm
> >       0.18 ±  5%      -0.1        0.12 ±  5%  perf-profile.children.cycles-pp.apparmor_socket_recvmsg
> >       0.14 ± 13%      -0.1        0.08 ± 55%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
> >       0.24 ±  4%      -0.1        0.18 ±  2%  perf-profile.children.cycles-pp.rcu_all_qs
> >       0.18 ± 10%      -0.1        0.12 ± 11%  perf-profile.children.cycles-pp.memcg_account_kmem
> >       0.37 ±  3%      -0.1        0.31 ±  3%  perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
> >       0.08            -0.0        0.06 ±  8%  perf-profile.children.cycles-pp.put_pid
> >       0.18 ±  3%      -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.apparmor_socket_getpeersec_dgram
> >       0.21 ±  3%      +0.0        0.23 ±  2%  perf-profile.children.cycles-pp.__get_task_ioprio
> >       0.00            +0.1        0.05        perf-profile.children.cycles-pp.perf_exclude_event
> >       0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.invalidate_user_asid
> >       0.00            +0.1        0.07 ±  6%  perf-profile.children.cycles-pp.__bitmap_and
> >       0.05            +0.1        0.13 ±  8%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
> >       0.00            +0.1        0.08 ±  7%  perf-profile.children.cycles-pp.schedule_debug
> >       0.00            +0.1        0.08 ± 13%  perf-profile.children.cycles-pp.read@plt
> >       0.00            +0.1        0.08 ±  5%  perf-profile.children.cycles-pp.sysvec_reschedule_ipi
> >       0.00            +0.1        0.10 ±  4%  perf-profile.children.cycles-pp.tracing_gen_ctx_irq_test
> >       0.00            +0.1        0.10 ±  4%  perf-profile.children.cycles-pp.place_entity
> >       0.00            +0.1        0.12 ± 10%  perf-profile.children.cycles-pp.native_irq_return_iret
> >       0.07 ± 14%      +0.1        0.19 ±  3%  perf-profile.children.cycles-pp.__list_add_valid
> >       0.00            +0.1        0.13 ±  6%  perf-profile.children.cycles-pp.perf_trace_buf_alloc
> >       0.00            +0.1        0.13 ± 34%  perf-profile.children.cycles-pp._find_next_and_bit
> >       0.00            +0.1        0.14 ±  5%  perf-profile.children.cycles-pp.switch_ldt
> >       0.00            +0.1        0.15 ±  5%  perf-profile.children.cycles-pp.check_cfs_rq_runtime
> >       0.00            +0.1        0.15 ± 30%  perf-profile.children.cycles-pp.migrate_task_rq_fair
> >       0.00            +0.2        0.15 ±  5%  perf-profile.children.cycles-pp.__rdgsbase_inactive
> >       0.00            +0.2        0.16 ±  3%  perf-profile.children.cycles-pp.save_fpregs_to_fpstate
> >       0.00            +0.2        0.16 ±  6%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
> >       0.00            +0.2        0.17        perf-profile.children.cycles-pp.perf_trace_buf_update
> >       0.00            +0.2        0.18 ±  2%  perf-profile.children.cycles-pp.rb_insert_color
> >       0.00            +0.2        0.18 ±  4%  perf-profile.children.cycles-pp.rb_next
> >       0.00            +0.2        0.18 ± 21%  perf-profile.children.cycles-pp.__cgroup_account_cputime
> >       0.01 ±223%      +0.2        0.21 ± 28%  perf-profile.children.cycles-pp.perf_trace_sched_switch
> >       0.00            +0.2        0.20 ±  3%  perf-profile.children.cycles-pp.select_idle_cpu
> >       0.00            +0.2        0.20 ±  3%  perf-profile.children.cycles-pp.rcu_note_context_switch
> >       0.00            +0.2        0.21 ± 26%  perf-profile.children.cycles-pp.set_task_cpu
> >       0.00            +0.2        0.22 ±  8%  perf-profile.children.cycles-pp.resched_curr
> >       0.08 ±  5%      +0.2        0.31 ± 11%  perf-profile.children.cycles-pp.task_h_load
> >       0.00            +0.2        0.24 ±  3%  perf-profile.children.cycles-pp.finish_wait
> >       0.04 ± 44%      +0.3        0.29 ±  5%  perf-profile.children.cycles-pp.rb_erase
> >       0.19 ±  6%      +0.3        0.46        perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
> >       0.20 ±  6%      +0.3        0.47 ±  3%  perf-profile.children.cycles-pp.__list_del_entry_valid
> >       0.00            +0.3        0.28 ±  3%  perf-profile.children.cycles-pp.__wrgsbase_inactive
> >       0.02 ±141%      +0.3        0.30 ±  2%  perf-profile.children.cycles-pp.native_sched_clock
> >       0.06 ± 13%      +0.3        0.34 ±  2%  perf-profile.children.cycles-pp.sched_clock_cpu
> >       0.64 ±  2%      +0.3        0.93        perf-profile.children.cycles-pp.mutex_lock
> >       0.00            +0.3        0.30 ±  5%  perf-profile.children.cycles-pp.cr4_update_irqsoff
> >       0.00            +0.3        0.30 ±  4%  perf-profile.children.cycles-pp.clear_buddies
> >       0.07 ± 55%      +0.3        0.37 ±  5%  perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
> >       0.10 ± 66%      +0.3        0.42 ±  5%  perf-profile.children.cycles-pp.perf_tp_event
> >       0.02 ±142%      +0.3        0.36 ±  6%  perf-profile.children.cycles-pp.cpuacct_charge
> >       0.12 ±  9%      +0.4        0.47 ± 11%  perf-profile.children.cycles-pp.wake_affine
> >       0.00            +0.4        0.36 ± 13%  perf-profile.children.cycles-pp.available_idle_cpu
> >       0.05 ± 48%      +0.4        0.42 ±  6%  perf-profile.children.cycles-pp.finish_task_switch
> >       0.12 ±  4%      +0.4        0.49 ±  4%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
> >       0.07 ± 17%      +0.4        0.48        perf-profile.children.cycles-pp.__calc_delta
> >       0.03 ±100%      +0.5        0.49 ±  4%  perf-profile.children.cycles-pp.pick_next_entity
> >       0.00            +0.5        0.48 ±  8%  perf-profile.children.cycles-pp.set_next_buddy
> >       0.08 ± 14%      +0.6        0.66 ±  4%  perf-profile.children.cycles-pp.update_min_vruntime
> >       0.07 ± 17%      +0.6        0.68 ±  2%  perf-profile.children.cycles-pp.os_xsave
> >       0.29 ±  7%      +0.7        0.99 ±  3%  perf-profile.children.cycles-pp.update_cfs_group
> >       0.17 ± 17%      +0.7        0.87 ±  4%  perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
> >       0.14 ±  7%      +0.7        0.87 ±  3%  perf-profile.children.cycles-pp.__update_load_avg_se
> >       0.14 ± 16%      +0.8        0.90 ±  2%  perf-profile.children.cycles-pp.update_rq_clock
> >       0.08 ± 17%      +0.8        0.84 ±  5%  perf-profile.children.cycles-pp.check_preempt_wakeup
> >       0.12 ± 14%      +0.8        0.95 ±  3%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
> >       0.22 ±  5%      +0.8        1.07 ±  3%  perf-profile.children.cycles-pp.prepare_to_wait
> >       0.10 ± 18%      +0.9        0.98 ±  3%  perf-profile.children.cycles-pp.check_preempt_curr
> >      29.72            +0.9       30.61        perf-profile.children.cycles-pp.vfs_write
> >       0.14 ± 11%      +0.9        1.03 ±  4%  perf-profile.children.cycles-pp.__switch_to
> >       0.07 ± 20%      +0.9        0.99 ±  6%  perf-profile.children.cycles-pp.put_prev_entity
> >       0.12 ± 16%      +1.0        1.13 ±  5%  perf-profile.children.cycles-pp.___perf_sw_event
> >       0.07 ± 17%      +1.0        1.10 ± 13%  perf-profile.children.cycles-pp.select_idle_sibling
> >      27.82 ±  2%      +1.2       28.99        perf-profile.children.cycles-pp.unix_stream_recvmsg
> >      27.41 ±  2%      +1.2       28.63        perf-profile.children.cycles-pp.unix_stream_read_generic
> >       0.20 ± 15%      +1.4        1.59 ±  3%  perf-profile.children.cycles-pp.reweight_entity
> >       0.21 ± 13%      +1.4        1.60 ±  4%  perf-profile.children.cycles-pp.__switch_to_asm
> >       0.23 ± 10%      +1.4        1.65 ±  5%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
> >       0.20 ± 13%      +1.5        1.69 ±  3%  perf-profile.children.cycles-pp.set_next_entity
> >      27.59            +1.6       29.19        perf-profile.children.cycles-pp.sock_write_iter
> >       0.28 ± 10%      +1.8        2.12 ±  5%  perf-profile.children.cycles-pp.switch_fpu_return
> >       0.26 ± 11%      +1.8        2.10 ±  6%  perf-profile.children.cycles-pp.select_task_rq_fair
> >      26.66 ±  2%      +2.0       28.63        perf-profile.children.cycles-pp.sock_sendmsg
> >       0.31 ± 12%      +2.1        2.44 ±  5%  perf-profile.children.cycles-pp.select_task_rq
> >       0.30 ± 14%      +2.2        2.46 ±  4%  perf-profile.children.cycles-pp.prepare_task_switch
> >      25.27 ±  2%      +2.2       27.47        perf-profile.children.cycles-pp.unix_stream_sendmsg
> >       2.10            +2.3        4.38 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock
> >       0.40 ± 14%      +2.5        2.92 ±  5%  perf-profile.children.cycles-pp.dequeue_entity
> >      48.40            +2.6       51.02        perf-profile.children.cycles-pp.__libc_write
> >       0.46 ± 15%      +3.1        3.51 ±  3%  perf-profile.children.cycles-pp.enqueue_entity
> >       0.49 ± 10%      +3.2        3.64 ±  7%  perf-profile.children.cycles-pp.update_load_avg
> >       0.53 ± 20%      +3.4        3.91 ±  3%  perf-profile.children.cycles-pp.update_curr
> >      80.81            +3.4       84.24        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >       0.50 ± 12%      +3.5        4.00 ±  4%  perf-profile.children.cycles-pp.switch_mm_irqs_off
> >       0.55 ±  9%      +3.8        4.38 ±  4%  perf-profile.children.cycles-pp.pick_next_task_fair
> >       9.60            +4.6       14.15 ±  2%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
> >       0.78 ± 13%      +4.9        5.65 ±  4%  perf-profile.children.cycles-pp.dequeue_task_fair
> >       0.78 ± 15%      +5.2        5.99 ±  3%  perf-profile.children.cycles-pp.enqueue_task_fair
> >      74.30            +5.6       79.86        perf-profile.children.cycles-pp.do_syscall_64
> >       0.90 ± 15%      +6.3        7.16 ±  3%  perf-profile.children.cycles-pp.ttwu_do_activate
> >       0.33 ± 31%      +6.3        6.61 ±  6%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
> >       0.82 ± 15%      +8.1        8.92 ±  5%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
> >       1.90 ± 16%     +12.2       14.10 ±  2%  perf-profile.children.cycles-pp.try_to_wake_up
> >       2.36 ± 11%     +12.2       14.60 ±  3%  perf-profile.children.cycles-pp.schedule_timeout
> >       1.95 ± 15%     +12.5       14.41 ±  2%  perf-profile.children.cycles-pp.autoremove_wake_function
> >       2.01 ± 15%     +12.8       14.76 ±  2%  perf-profile.children.cycles-pp.__wake_up_common
> >       2.23 ± 13%     +13.2       15.45 ±  2%  perf-profile.children.cycles-pp.__wake_up_common_lock
> >       2.53 ± 10%     +13.4       15.90 ±  2%  perf-profile.children.cycles-pp.sock_def_readable
> >       2.29 ± 15%     +14.6       16.93 ±  3%  perf-profile.children.cycles-pp.unix_stream_data_wait
> >       2.61 ± 13%     +18.0       20.65 ±  4%  perf-profile.children.cycles-pp.schedule
> >       2.66 ± 13%     +18.1       20.77 ±  4%  perf-profile.children.cycles-pp.__schedule
> >      11.25 ±  3%      -4.6        6.67 ±  3%  perf-profile.self.cycles-pp.syscall_return_via_sysret
> >       5.76 ± 32%      -3.9        1.90 ±  3%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> >       8.69 ±  3%      -3.4        5.27 ±  3%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
> >       3.11 ±  3%      -2.5        0.60 ± 13%  perf-profile.self.cycles-pp.__slab_free
> >       6.65 ±  2%      -2.2        4.47 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> >       4.78 ±  3%      -1.9        2.88 ±  3%  perf-profile.self.cycles-pp.__entry_text_start
> >       3.52 ±  2%      -1.9        1.64 ±  6%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
> >       2.06 ±  3%      -1.1        0.96 ±  5%  perf-profile.self.cycles-pp.kmem_cache_free
> >       1.42 ±  3%      -1.0        0.46 ± 10%  perf-profile.self.cycles-pp.check_heap_object
> >       1.43 ±  4%      -0.8        0.64        perf-profile.self.cycles-pp.sock_wfree
> >       0.99 ±  3%      -0.8        0.21 ± 12%  perf-profile.self.cycles-pp.skb_release_data
> >       0.84 ±  8%      -0.7        0.10 ± 64%  perf-profile.self.cycles-pp.___slab_alloc
> >       1.97 ±  2%      -0.6        1.32        perf-profile.self.cycles-pp.unix_stream_read_generic
> >       1.60 ±  3%      -0.5        1.11 ±  4%  perf-profile.self.cycles-pp.memcg_slab_post_alloc_hook
> >       1.24 ±  2%      -0.5        0.75 ± 11%  perf-profile.self.cycles-pp.mod_objcg_state
> >       0.71            -0.5        0.23 ± 15%  perf-profile.self.cycles-pp.__build_skb_around
> >       0.95 ±  3%      -0.5        0.50 ±  6%  perf-profile.self.cycles-pp.__alloc_skb
> >       0.97 ±  4%      -0.4        0.55 ±  5%  perf-profile.self.cycles-pp.kmem_cache_alloc_node
> >       0.99 ±  3%      -0.4        0.59 ±  4%  perf-profile.self.cycles-pp.vfs_write
> >       1.38 ±  2%      -0.4        0.99        perf-profile.self.cycles-pp.__kmem_cache_free
> >       0.86 ±  2%      -0.4        0.50 ±  3%  perf-profile.self.cycles-pp.__kmem_cache_alloc_node
> >       0.92 ±  4%      -0.4        0.56 ±  4%  perf-profile.self.cycles-pp.sock_write_iter
> >       1.06 ±  3%      -0.4        0.70 ±  3%  perf-profile.self.cycles-pp.__might_resched
> >       0.73 ±  4%      -0.3        0.44 ±  4%  perf-profile.self.cycles-pp.__cond_resched
> >       0.85 ±  3%      -0.3        0.59 ±  4%  perf-profile.self.cycles-pp.__check_heap_object
> >       1.46 ±  7%      -0.3        1.20 ±  2%  perf-profile.self.cycles-pp.unix_stream_sendmsg
> >       0.73 ±  9%      -0.3        0.47 ±  2%  perf-profile.self.cycles-pp.skb_set_owner_w
> >       1.54            -0.3        1.28 ±  4%  perf-profile.self.cycles-pp.apparmor_file_permission
> >       0.74 ±  3%      -0.2        0.50 ±  2%  perf-profile.self.cycles-pp.get_obj_cgroup_from_current
> >       1.15 ±  3%      -0.2        0.91 ±  8%  perf-profile.self.cycles-pp.aa_sk_perm
> >       0.60            -0.2        0.36 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> >       0.65 ±  4%      -0.2        0.45 ±  6%  perf-profile.self.cycles-pp.__get_obj_cgroup_from_memcg
> >       0.24 ±  6%      -0.2        0.05 ± 56%  perf-profile.self.cycles-pp.fsnotify_perm
> >       0.76 ±  3%      -0.2        0.58 ±  2%  perf-profile.self.cycles-pp.sock_read_iter
> >       1.10 ±  4%      -0.2        0.92 ±  6%  perf-profile.self.cycles-pp.__fget_light
> >       0.42 ±  3%      -0.2        0.25 ±  4%  perf-profile.self.cycles-pp.obj_cgroup_charge
> >       0.32 ±  4%      -0.2        0.17 ±  6%  perf-profile.self.cycles-pp.refill_obj_stock
> >       0.29            -0.2        0.14 ±  8%  perf-profile.self.cycles-pp.__kmalloc_node_track_caller
> >       0.54 ±  3%      -0.1        0.40 ±  2%  perf-profile.self.cycles-pp.__might_sleep
> >       0.30 ±  7%      -0.1        0.16 ± 22%  perf-profile.self.cycles-pp.security_file_permission
> >       0.34 ±  3%      -0.1        0.21 ±  6%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
> >       0.41 ±  3%      -0.1        0.29 ±  3%  perf-profile.self.cycles-pp.is_vmalloc_addr
> >       0.27 ±  3%      -0.1        0.16 ±  6%  perf-profile.self.cycles-pp._copy_from_iter
> >       0.24 ±  3%      -0.1        0.12 ±  3%  perf-profile.self.cycles-pp.ksys_write
> >       0.95 ±  2%      -0.1        0.84 ±  5%  perf-profile.self.cycles-pp.__virt_addr_valid
> >       0.56 ± 11%      -0.1        0.46 ±  4%  perf-profile.self.cycles-pp.sock_def_readable
> >       0.16 ±  7%      -0.1        0.06 ± 18%  perf-profile.self.cycles-pp.sock_recvmsg
> >       0.22 ±  5%      -0.1        0.14 ±  2%  perf-profile.self.cycles-pp.ksys_read
> >       0.27 ±  4%      -0.1        0.19 ±  5%  perf-profile.self.cycles-pp.kmalloc_slab
> >       0.28 ±  2%      -0.1        0.20 ±  2%  perf-profile.self.cycles-pp.consume_skb
> >       0.35 ±  2%      -0.1        0.28 ±  3%  perf-profile.self.cycles-pp.__check_object_size
> >       0.13 ±  8%      -0.1        0.06 ± 18%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
> >       0.20 ±  5%      -0.1        0.12 ±  6%  perf-profile.self.cycles-pp.kmalloc_reserve
> >       0.26 ±  5%      -0.1        0.19 ±  4%  perf-profile.self.cycles-pp.sock_alloc_send_pskb
> >       0.42 ±  2%      -0.1        0.35 ±  7%  perf-profile.self.cycles-pp.syscall_enter_from_user_mode
> >       0.19 ±  5%      -0.1        0.12 ±  6%  perf-profile.self.cycles-pp.aa_file_perm
> >       0.16 ±  4%      -0.1        0.10 ±  4%  perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
> >       0.18 ±  4%      -0.1        0.12 ±  6%  perf-profile.self.cycles-pp.apparmor_socket_sendmsg
> >       0.18 ±  5%      -0.1        0.12 ±  4%  perf-profile.self.cycles-pp.apparmor_socket_recvmsg
> >       0.15 ±  5%      -0.1        0.10 ±  5%  perf-profile.self.cycles-pp.alloc_skb_with_frags
> >       0.64 ±  3%      -0.1        0.59        perf-profile.self.cycles-pp.__libc_write
> >       0.20 ±  4%      -0.1        0.15 ±  3%  perf-profile.self.cycles-pp._copy_to_iter
> >       0.15 ±  5%      -0.1        0.10 ± 11%  perf-profile.self.cycles-pp.sock_sendmsg
> >       0.08 ±  4%      -0.1        0.03 ± 81%  perf-profile.self.cycles-pp.copyout
> >       0.11 ±  6%      -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.__fdget_pos
> >       0.12 ±  5%      -0.0        0.07 ± 10%  perf-profile.self.cycles-pp.kmalloc_size_roundup
> >       0.34 ±  3%      -0.0        0.29        perf-profile.self.cycles-pp.do_syscall_64
> >       0.20 ±  4%      -0.0        0.15 ±  4%  perf-profile.self.cycles-pp.rcu_all_qs
> >       0.41 ±  3%      -0.0        0.37 ±  8%  perf-profile.self.cycles-pp.unix_stream_recvmsg
> >       0.22 ±  2%      -0.0        0.17 ±  4%  perf-profile.self.cycles-pp.unix_destruct_scm
> >       0.09 ±  4%      -0.0        0.05        perf-profile.self.cycles-pp.should_failslab
> >       0.10 ± 15%      -0.0        0.06 ± 50%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
> >       0.11 ±  4%      -0.0        0.07        perf-profile.self.cycles-pp.__might_fault
> >       0.16 ±  2%      -0.0        0.13 ±  6%  perf-profile.self.cycles-pp.obj_cgroup_uncharge_pages
> >       0.18 ±  4%      -0.0        0.16 ±  3%  perf-profile.self.cycles-pp.security_socket_getpeersec_dgram
> >       0.28 ±  2%      -0.0        0.25 ±  2%  perf-profile.self.cycles-pp.unix_write_space
> >       0.17 ±  2%      -0.0        0.15 ±  5%  perf-profile.self.cycles-pp.apparmor_socket_getpeersec_dgram
> >       0.08 ±  6%      -0.0        0.05 ±  7%  perf-profile.self.cycles-pp.security_socket_sendmsg
> >       0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.self.cycles-pp.__skb_datagram_iter
> >       0.24 ±  2%      -0.0        0.22        perf-profile.self.cycles-pp.mutex_unlock
> >       0.08 ±  5%      +0.0        0.10 ±  6%  perf-profile.self.cycles-pp.scm_recv
> >       0.17 ±  2%      +0.0        0.19 ±  3%  perf-profile.self.cycles-pp.__x64_sys_read
> >       0.19 ±  3%      +0.0        0.22 ±  2%  perf-profile.self.cycles-pp.__get_task_ioprio
> >       0.00            +0.1        0.06        perf-profile.self.cycles-pp.finish_wait
> >       0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.cr4_update_irqsoff
> >       0.00            +0.1        0.06 ±  7%  perf-profile.self.cycles-pp.invalidate_user_asid
> >       0.00            +0.1        0.07 ± 12%  perf-profile.self.cycles-pp.wake_affine
> >       0.00            +0.1        0.07 ±  7%  perf-profile.self.cycles-pp.check_cfs_rq_runtime
> >       0.00            +0.1        0.07 ±  5%  perf-profile.self.cycles-pp.perf_trace_buf_update
> >       0.00            +0.1        0.07 ±  9%  perf-profile.self.cycles-pp.asm_sysvec_reschedule_ipi
> >       0.00            +0.1        0.07 ± 10%  perf-profile.self.cycles-pp.__bitmap_and
> >       0.00            +0.1        0.08 ± 10%  perf-profile.self.cycles-pp.schedule_debug
> >       0.00            +0.1        0.08 ± 13%  perf-profile.self.cycles-pp.read@plt
> >       0.00            +0.1        0.08 ± 12%  perf-profile.self.cycles-pp.perf_trace_buf_alloc
> >       0.00            +0.1        0.09 ± 35%  perf-profile.self.cycles-pp.migrate_task_rq_fair
> >       0.00            +0.1        0.09 ±  5%  perf-profile.self.cycles-pp.place_entity
> >       0.00            +0.1        0.10 ±  4%  perf-profile.self.cycles-pp.tracing_gen_ctx_irq_test
> >       0.00            +0.1        0.10        perf-profile.self.cycles-pp.__wake_up_common_lock
> >       0.07 ± 17%      +0.1        0.18 ±  3%  perf-profile.self.cycles-pp.__list_add_valid
> >       0.00            +0.1        0.11 ±  8%  perf-profile.self.cycles-pp.native_irq_return_iret
> >       0.00            +0.1        0.12 ±  6%  perf-profile.self.cycles-pp.select_idle_cpu
> >       0.00            +0.1        0.12 ± 34%  perf-profile.self.cycles-pp._find_next_and_bit
> >       0.00            +0.1        0.13 ± 25%  perf-profile.self.cycles-pp.__cgroup_account_cputime
> >       0.00            +0.1        0.13 ±  7%  perf-profile.self.cycles-pp.switch_ldt
> >       0.00            +0.1        0.14 ±  5%  perf-profile.self.cycles-pp.check_preempt_curr
> >       0.00            +0.1        0.15 ±  2%  perf-profile.self.cycles-pp.save_fpregs_to_fpstate
> >       0.00            +0.1        0.15 ±  5%  perf-profile.self.cycles-pp.__rdgsbase_inactive
> >       0.14 ±  3%      +0.2        0.29        perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
> >       0.00            +0.2        0.15 ±  7%  perf-profile.self.cycles-pp.ttwu_queue_wakelist
> >       0.00            +0.2        0.17 ±  4%  perf-profile.self.cycles-pp.rb_insert_color
> >       0.00            +0.2        0.17 ±  5%  perf-profile.self.cycles-pp.rb_next
> >       0.00            +0.2        0.18 ±  2%  perf-profile.self.cycles-pp.autoremove_wake_function
> >       0.01 ±223%      +0.2        0.19 ±  6%  perf-profile.self.cycles-pp.ttwu_do_activate
> >       0.00            +0.2        0.20 ±  2%  perf-profile.self.cycles-pp.rcu_note_context_switch
> >       0.00            +0.2        0.20 ±  7%  perf-profile.self.cycles-pp.exit_to_user_mode_loop
> >       0.27            +0.2        0.47 ±  3%  perf-profile.self.cycles-pp.mutex_lock
> >       0.00            +0.2        0.20 ± 28%  perf-profile.self.cycles-pp.perf_trace_sched_switch
> >       0.00            +0.2        0.21 ±  9%  perf-profile.self.cycles-pp.resched_curr
> >       0.04 ± 45%      +0.2        0.26 ±  7%  perf-profile.self.cycles-pp.perf_tp_event
> >       0.06 ±  7%      +0.2        0.28 ±  8%  perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
> >       0.19 ±  7%      +0.2        0.41 ±  5%  perf-profile.self.cycles-pp.__list_del_entry_valid
> >       0.08 ±  5%      +0.2        0.31 ± 11%  perf-profile.self.cycles-pp.task_h_load
> >       0.00            +0.2        0.23 ±  5%  perf-profile.self.cycles-pp.finish_task_switch
> >       0.03 ± 70%      +0.2        0.27 ±  5%  perf-profile.self.cycles-pp.rb_erase
> >       0.02 ±142%      +0.3        0.29 ±  2%  perf-profile.self.cycles-pp.native_sched_clock
> >       0.00            +0.3        0.28 ±  3%  perf-profile.self.cycles-pp.__wrgsbase_inactive
> >       0.00            +0.3        0.28 ±  6%  perf-profile.self.cycles-pp.clear_buddies
> >       0.07 ± 10%      +0.3        0.35 ±  3%  perf-profile.self.cycles-pp.schedule_timeout
> >       0.03 ± 70%      +0.3        0.33 ±  3%  perf-profile.self.cycles-pp.select_task_rq
> >       0.06 ± 13%      +0.3        0.36 ±  4%  perf-profile.self.cycles-pp.__wake_up_common
> >       0.06 ± 13%      +0.3        0.36 ±  3%  perf-profile.self.cycles-pp.dequeue_entity
> >       0.06 ± 18%      +0.3        0.37 ±  7%  perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
> >       0.01 ±223%      +0.3        0.33 ±  4%  perf-profile.self.cycles-pp.schedule
> >       0.02 ±142%      +0.3        0.35 ±  7%  perf-profile.self.cycles-pp.cpuacct_charge
> >       0.01 ±223%      +0.3        0.35        perf-profile.self.cycles-pp.set_next_entity
> >       0.00            +0.4        0.35 ± 13%  perf-profile.self.cycles-pp.available_idle_cpu
> >       0.08 ± 10%      +0.4        0.44 ±  5%  perf-profile.self.cycles-pp.prepare_to_wait
> >       0.63 ±  3%      +0.4        1.00 ±  4%  perf-profile.self.cycles-pp.vfs_read
> >       0.02 ±142%      +0.4        0.40 ±  4%  perf-profile.self.cycles-pp.check_preempt_wakeup
> >       0.02 ±141%      +0.4        0.42 ±  4%  perf-profile.self.cycles-pp.pick_next_entity
> >       0.07 ± 17%      +0.4        0.48        perf-profile.self.cycles-pp.__calc_delta
> >       0.06 ± 14%      +0.4        0.47 ±  3%  perf-profile.self.cycles-pp.unix_stream_data_wait
> >       0.04 ± 45%      +0.4        0.45 ±  4%  perf-profile.self.cycles-pp.switch_fpu_return
> >       0.00            +0.5        0.46 ±  7%  perf-profile.self.cycles-pp.set_next_buddy
> >       0.07 ± 17%      +0.5        0.53 ±  3%  perf-profile.self.cycles-pp.select_task_rq_fair
> >       0.08 ± 16%      +0.5        0.55 ±  4%  perf-profile.self.cycles-pp.try_to_wake_up
> >       0.08 ± 19%      +0.5        0.56 ±  3%  perf-profile.self.cycles-pp.update_rq_clock
> >       0.02 ±141%      +0.5        0.50 ± 10%  perf-profile.self.cycles-pp.select_idle_sibling
> >       0.77 ±  2%      +0.5        1.25 ±  2%  perf-profile.self.cycles-pp.__libc_read
> >       0.09 ± 19%      +0.5        0.59 ±  3%  perf-profile.self.cycles-pp.reweight_entity
> >       0.08 ± 14%      +0.5        0.59 ±  2%  perf-profile.self.cycles-pp.dequeue_task_fair
> >       0.08 ± 13%      +0.6        0.64 ±  5%  perf-profile.self.cycles-pp.update_min_vruntime
> >       0.02 ±141%      +0.6        0.58 ±  7%  perf-profile.self.cycles-pp.put_prev_entity
> >       0.06 ± 11%      +0.6        0.64 ±  4%  perf-profile.self.cycles-pp.enqueue_task_fair
> >       0.07 ± 18%      +0.6        0.68 ±  3%  perf-profile.self.cycles-pp.os_xsave
> >       1.39 ±  2%      +0.7        2.06 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> >       0.28 ±  8%      +0.7        0.97 ±  4%  perf-profile.self.cycles-pp.update_cfs_group
> >       0.14 ±  8%      +0.7        0.83 ±  3%  perf-profile.self.cycles-pp.__update_load_avg_se
> >       1.76 ±  3%      +0.7        2.47 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock
> >       0.12 ± 12%      +0.7        0.85 ±  5%  perf-profile.self.cycles-pp.prepare_task_switch
> >       0.12 ± 12%      +0.8        0.91 ±  3%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
> >       0.13 ± 12%      +0.8        0.93 ±  5%  perf-profile.self.cycles-pp.pick_next_task_fair
> >       0.13 ± 12%      +0.9        0.98 ±  4%  perf-profile.self.cycles-pp.__switch_to
> >       0.11 ± 18%      +0.9        1.06 ±  5%  perf-profile.self.cycles-pp.___perf_sw_event
> >       0.16 ± 11%      +1.2        1.34 ±  4%  perf-profile.self.cycles-pp.enqueue_entity
> >       0.20 ± 12%      +1.4        1.58 ±  4%  perf-profile.self.cycles-pp.__switch_to_asm
> >       0.23 ± 10%      +1.4        1.65 ±  5%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
> >       0.25 ± 12%      +1.5        1.77 ±  4%  perf-profile.self.cycles-pp.__schedule
> >       0.22 ± 10%      +1.6        1.78 ± 10%  perf-profile.self.cycles-pp.update_load_avg
> >       0.23 ± 16%      +1.7        1.91 ±  7%  perf-profile.self.cycles-pp.update_curr
> >       0.48 ± 11%      +3.4        3.86 ±  4%  perf-profile.self.cycles-pp.switch_mm_irqs_off
> >
> >
> > To reproduce:
> >
> >         git clone https://github.com/intel/lkp-tests.git
> >         cd lkp-tests
> >         sudo bin/lkp install job.yaml           # job file is attached in this email
> >         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> >         sudo bin/lkp run generated-yaml-file
> >
> >         # if come across any failure that blocks the test,
> >         # please remove ~/.lkp and /lkp dir to run from a clean state.
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-02-21 17:26     ` Vincent Guittot
@ 2023-02-27  8:42       ` Roman Kagan
  2023-02-27 14:37         ` Vincent Guittot
  0 siblings, 1 reply; 14+ messages in thread
From: Roman Kagan @ 2023-02-27  8:42 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, linux-kernel, Valentin Schneider, Zhang Qiao,
	Ben Segall, Waiman Long, Steven Rostedt, Mel Gorman,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Ingo Molnar,
	Juri Lelli

On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <rkagan@amazon.de> wrote:
> > What scares me, though, is that I've got a message from the test robot
> > that this commit dramatically affected hackbench results; see the quote
> > below.  I expected the commit not to affect any benchmarks.
> >
> > Any idea what could have caused this change?
> 
> Hmm, it's most probably because se->exec_start is reset after a
> migration, so the condition becomes true for a newly migrated task even
> though its vruntime should be after min_vruntime.
> 
> We have missed this condition

Makes sense to me.

But what would then be a reliable way to detect a sched_entity that has
slept so long that it risks an overflow in the .vruntime comparison?
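
To make the hazard concrete, here is a tiny standalone illustration
(userspace C, not kernel code; the values are made up purely to show the
sign flip, and the variable names just mirror the kernel fields):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* cfs_rq->min_vruntime has advanced a lot ... */
        uint64_t min_vruntime = (1ULL << 63) + 1000;
        /* ... while the long sleeper's vruntime stayed far behind */
        uint64_t se_vruntime = 0;

        /* same pattern as the kernel's vruntime comparisons:
         * a signed difference of two u64 values */
        int64_t delta = (int64_t)(se_vruntime - min_vruntime);

        /* delta comes out positive although the entity is about 2^63
         * weighted ns behind, so it looks far ahead and would never
         * be picked */
        printf("delta = %lld\n", (long long)delta);
        return 0;
}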

Thanks,
Roman.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-02-27  8:42       ` Roman Kagan
@ 2023-02-27 14:37         ` Vincent Guittot
  2023-02-27 17:00           ` Dietmar Eggemann
  2023-03-02  9:36           ` Zhang Qiao
  0 siblings, 2 replies; 14+ messages in thread
From: Vincent Guittot @ 2023-02-27 14:37 UTC (permalink / raw)
  To: Roman Kagan, Vincent Guittot, Peter Zijlstra, linux-kernel,
	Valentin Schneider, Zhang Qiao, Ben Segall, Waiman Long,
	Steven Rostedt, Mel Gorman, Dietmar Eggemann,
	Daniel Bristot de Oliveira, Ingo Molnar, Juri Lelli

On Mon, 27 Feb 2023 at 09:43, Roman Kagan <rkagan@amazon.de> wrote:
>
> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> > On Tue, 21 Feb 2023 at 17:57, Roman Kagan <rkagan@amazon.de> wrote:
> > > What scares me, though, is that I've got a message from the test robot
> > > that this commit dramatically affected hackbench results; see the quote
> > > below.  I expected the commit not to affect any benchmarks.
> > >
> > > Any idea what could have caused this change?
> >
> > Hmm, it's most probably because se->exec_start is reset after a
> > migration, so the condition becomes true for a newly migrated task even
> > though its vruntime should be after min_vruntime.
> >
> > We have missed this condition
>
> Makes sense to me.
>
> But what would then be a reliable way to detect a sched_entity that has
> slept so long that it risks an overflow in the .vruntime comparison?

For now I don't have a better idea than adding the same check in
migrate_task_rq_fair()
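
Something along these lines, perhaps (only a rough sketch of the idea, not
a tested patch: it reuses the cutoff from place_entity(), assumes the usual
mainline helpers such as cfs_rq_of(), rq_of(), rq_clock_task() and
NICE_0_LOAD, and glosses over whether the old rq's clock can be read safely
at this point):

static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
{
        struct sched_entity *se = &p->se;
        struct cfs_rq *cfs_rq = cfs_rq_of(se);
        u64 sleep_time;

        /*
         * Apply the same "slept too long" cutoff as place_entity() while
         * se->exec_start still holds the old CPU's timestamp, so that the
         * reset done for the migration cannot later make the task look
         * like a long sleeper on the new CPU.
         */
        sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
        if ((s64)sleep_time > (1ULL << 63) / NICE_0_LOAD)
                se->vruntime = cfs_rq->min_vruntime;

        /* ... existing migrate_task_rq_fair() body unchanged ... */
}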

>
> Thanks,
> Roman.
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-02-27 14:37         ` Vincent Guittot
@ 2023-02-27 17:00           ` Dietmar Eggemann
  2023-02-27 17:15             ` Vincent Guittot
  2023-03-02  9:36           ` Zhang Qiao
  1 sibling, 1 reply; 14+ messages in thread
From: Dietmar Eggemann @ 2023-02-27 17:00 UTC (permalink / raw)
  To: Vincent Guittot, Roman Kagan, Peter Zijlstra, linux-kernel,
	Valentin Schneider, Zhang Qiao, Ben Segall, Waiman Long,
	Steven Rostedt, Mel Gorman, Daniel Bristot de Oliveira,
	Ingo Molnar, Juri Lelli

On 27/02/2023 15:37, Vincent Guittot wrote:
> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <rkagan@amazon.de> wrote:
>>
>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <rkagan@amazon.de> wrote:
>>>> What scares me, though, is that I've got a message from the test robot
>>>> that this commit dramatically affected hackbench results; see the quote
>>>> below.  I expected the commit not to affect any benchmarks.
>>>>
>>>> Any idea what could have caused this change?
>>>
>>> Hmm, it's most probably because se->exec_start is reset after a
>>> migration, so the condition becomes true for a newly migrated task even
>>> though its vruntime should be after min_vruntime.
>>>
>>> We have missed this condition
>>
>> Makes sense to me.
>>
>> But what would then be the reliable way to detect a sched_entity which
>> has slept for long and risks overflowing in .vruntime comparison?
> 
> For now I don't have a better idea than adding the same check in
> migrate_task_rq_fair()

Don't we have the issue that the rq clock might not be up to date in
migrate? No rq lock is held in the `!task_on_rq_migrating(p)` case.

Also deferring `se->exec_start = 0` from `migrate` into `enqueue ->
place entity` doesn't seem to work since the rq clocks of different CPUs
are not in sync.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-02-27 17:00           ` Dietmar Eggemann
@ 2023-02-27 17:15             ` Vincent Guittot
  0 siblings, 0 replies; 14+ messages in thread
From: Vincent Guittot @ 2023-02-27 17:15 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Roman Kagan, Peter Zijlstra, linux-kernel, Valentin Schneider,
	Zhang Qiao, Ben Segall, Waiman Long, Steven Rostedt, Mel Gorman,
	Daniel Bristot de Oliveira, Ingo Molnar, Juri Lelli

On Mon, 27 Feb 2023 at 18:00, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 27/02/2023 15:37, Vincent Guittot wrote:
> > On Mon, 27 Feb 2023 at 09:43, Roman Kagan <rkagan@amazon.de> wrote:
> >>
> >> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> >>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <rkagan@amazon.de> wrote:
> >>>> What scares me, though, is that I've got a message from the test robot
> >>>> that this commit drammatically affected hackbench results, see the quote
> >>>> below.  I expected the commit not to affect any benchmarks.
> >>>>
> >>>> Any idea what could have caused this change?
> >>>
> >>> Hmm, It's most probably because se->exec_start is reset after a
> >>> migration and the condition becomes true for newly migrated task
> >>> whereas its vruntime should be after min_vruntime.
> >>>
> >>> We have missed this condition
> >>
> >> Makes sense to me.
> >>
> >> But what would then be the reliable way to detect a sched_entity which
> >> has slept for long and risks overflowing in .vruntime comparison?
> >
> > For now I don't have a better idea than adding the same check in
> > migrate_task_rq_fair()
>
> Don't we have the issue that we could have a non-up-to-date rq clock in
> migrate? No rq lock held in `!task_on_rq_migrating(p)`.

Yes, the rq clock may not be up to date, but that would also mean that the
cfs_rq was idle; as a result its min_vruntime has not moved forward and there
is no risk of overflow.
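
For reference, a standalone sketch of the comparison inversion that the
long-sleep detection guards against (userspace illustration, not kernel code;
max_vruntime() below just follows the usual wraparound-safe vruntime
comparison idiom):

#include <stdio.h>
#include <stdint.h>

static uint64_t max_vruntime(uint64_t max_vr, uint64_t vr)
{
	int64_t delta = (int64_t)(vr - max_vr);

	if (delta > 0)
		max_vr = vr;
	return max_vr;
}

int main(void)
{
	uint64_t sleeper_vruntime = 1000;	/* entity that slept for ages */
	uint64_t base_vruntime;

	/* min_vruntime advanced by 2^62: the comparison still behaves */
	base_vruntime = sleeper_vruntime + (1ULL << 62);
	printf("%llu\n", (unsigned long long)max_vruntime(sleeper_vruntime, base_vruntime));

	/* min_vruntime advanced by more than 2^63: delta wraps negative, the
	 * stale vruntime "wins", and the entity is effectively placed far in
	 * the future */
	base_vruntime = sleeper_vruntime + (1ULL << 63) + 1;
	printf("%llu\n", (unsigned long long)max_vruntime(sleeper_vruntime, base_vruntime));
	return 0;
}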

>
> Also deferring `se->exec_start = 0` from `migrate` into `enqueue ->
> place entity` doesn't seem to work since the rq clocks of different CPUs
> are not in sync.

yes

>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-02-27 14:37         ` Vincent Guittot
  2023-02-27 17:00           ` Dietmar Eggemann
@ 2023-03-02  9:36           ` Zhang Qiao
  2023-03-02 13:34             ` Vincent Guittot
  1 sibling, 1 reply; 14+ messages in thread
From: Zhang Qiao @ 2023-03-02  9:36 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Roman Kagan, Peter Zijlstra, linux-kernel, Valentin Schneider,
	Ben Segall, Waiman Long, Steven Rostedt, Mel Gorman,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Ingo Molnar,
	Juri Lelli



On 2023/2/27 22:37, Vincent Guittot wrote:
> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <rkagan@amazon.de> wrote:
>>
>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <rkagan@amazon.de> wrote:
>>>> What scares me, though, is that I've got a message from the test robot
>>>> that this commit drammatically affected hackbench results, see the quote
>>>> below.  I expected the commit not to affect any benchmarks.
>>>>
>>>> Any idea what could have caused this change?
>>>
>>> Hmm, It's most probably because se->exec_start is reset after a
>>> migration and the condition becomes true for newly migrated task
>>> whereas its vruntime should be after min_vruntime.
>>>
>>> We have missed this condition
>>
>> Makes sense to me.
>>
>> But what would then be the reliable way to detect a sched_entity which
>> has slept for long and risks overflowing in .vruntime comparison?
> 
> For now I don't have a better idea than adding the same check in
> migrate_task_rq_fair()

Hi, Vincent,
I fixed this condition as you said, and the test results are as follows.

testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
version1: v6.2
version2: v6.2 + commit 829c1651e9c4
version3: v6.2 + commit 829c1651e9c4 + this patch

-------------------------------------------------
	version1	version2	version3
test1	81.0 		118.1 		82.1
test2	82.1 		116.9 		80.3
test3	83.2 		103.9 		83.3
avg(s)	82.1 		113.0 		81.9

-------------------------------------------------
After dealing with the task migration case, the hackbench result is restored.

The patch is as follows; how does this look?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ff4dbbae3b10..3a88d20fd29e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
 #endif
 }

+static inline u64 sched_sleeper_credit(struct sched_entity *se)
+{
+
+       unsigned long thresh;
+
+       if (se_is_idle(se))
+               thresh = sysctl_sched_min_granularity;
+       else
+               thresh = sysctl_sched_latency;
+
+       /*
+        * Halve their sleep time's effect, to allow
+        * for a gentler effect of sleepers:
+        */
+       if (sched_feat(GENTLE_FAIR_SLEEPERS))
+               thresh >>= 1;
+
+       return thresh;
+}
+
 static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
@@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
                vruntime += sched_vslice(cfs_rq, se);

        /* sleeps up to a single latency don't count. */
-       if (!initial) {
-               unsigned long thresh;
-
-               if (se_is_idle(se))
-                       thresh = sysctl_sched_min_granularity;
-               else
-                       thresh = sysctl_sched_latency;
-
-               /*
-                * Halve their sleep time's effect, to allow
-                * for a gentler effect of sleepers:
-                */
-               if (sched_feat(GENTLE_FAIR_SLEEPERS))
-                       thresh >>= 1;
-
-               vruntime -= thresh;
-       }
+       if (!initial)
+               vruntime -= sched_sleeper_credit(se);

        /*
         * Pull vruntime of the entity being placed to the base level of
@@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
         * inversed due to s64 overflow.
         */
        sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
-       if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+       if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
                se->vruntime = vruntime;
        else
                se->vruntime = max_vruntime(se->vruntime, vruntime);
@@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
         */
        if (READ_ONCE(p->__state) == TASK_WAKING) {
                struct cfs_rq *cfs_rq = cfs_rq_of(se);
+               u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;

-               se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
+               if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
+                       se->vruntime = -sched_sleeper_credit(se);
+               else
+                       se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
        }

        if (!task_on_rq_migrating(p)) {



Thanks.
Zhang Qiao.

> 
>>
>> Thanks,
>> Roman.
>>
>>
>>
>>
>>
>>
> .
> 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-03-02  9:36           ` Zhang Qiao
@ 2023-03-02 13:34             ` Vincent Guittot
  2023-03-02 14:29               ` Zhang Qiao
  0 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2023-03-02 13:34 UTC (permalink / raw)
  To: Zhang Qiao
  Cc: Roman Kagan, Peter Zijlstra, linux-kernel, Valentin Schneider,
	Ben Segall, Waiman Long, Steven Rostedt, Mel Gorman,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Ingo Molnar,
	Juri Lelli

On Thu, 2 Mar 2023 at 10:36, Zhang Qiao <zhangqiao22@huawei.com> wrote:
>
>
>
> On 2023/2/27 22:37, Vincent Guittot wrote:
> > On Mon, 27 Feb 2023 at 09:43, Roman Kagan <rkagan@amazon.de> wrote:
> >>
> >> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> >>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <rkagan@amazon.de> wrote:
> >>>> What scares me, though, is that I've got a message from the test robot
> >>>> that this commit drammatically affected hackbench results, see the quote
> >>>> below.  I expected the commit not to affect any benchmarks.
> >>>>
> >>>> Any idea what could have caused this change?
> >>>
> >>> Hmm, It's most probably because se->exec_start is reset after a
> >>> migration and the condition becomes true for newly migrated task
> >>> whereas its vruntime should be after min_vruntime.
> >>>
> >>> We have missed this condition
> >>
> >> Makes sense to me.
> >>
> >> But what would then be the reliable way to detect a sched_entity which
> >> has slept for long and risks overflowing in .vruntime comparison?
> >
> > For now I don't have a better idea than adding the same check in
> > migrate_task_rq_fair()
>
> Hi, Vincent,
> I fixed this condition as you said, and the test results are as follows.
>
> testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
> version1: v6.2
> version2: v6.2 + commit 829c1651e9c4
> version3: v6.2 + commit 829c1651e9c4 + this patch
>
> -------------------------------------------------
>         version1        version2        version3
> test1   81.0            118.1           82.1
> test2   82.1            116.9           80.3
> test3   83.2            103.9           83.3
> avg(s)  82.1            113.0           81.9
>
> -------------------------------------------------
> After deal with the task migration case, the hackbench result has restored.
>
> The patch as follow, how does this look?
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ff4dbbae3b10..3a88d20fd29e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  #endif
>  }
>
> +static inline u64 sched_sleeper_credit(struct sched_entity *se)
> +{
> +
> +       unsigned long thresh;
> +
> +       if (se_is_idle(se))
> +               thresh = sysctl_sched_min_granularity;
> +       else
> +               thresh = sysctl_sched_latency;
> +
> +       /*
> +        * Halve their sleep time's effect, to allow
> +        * for a gentler effect of sleepers:
> +        */
> +       if (sched_feat(GENTLE_FAIR_SLEEPERS))
> +               thresh >>= 1;
> +
> +       return thresh;
> +}
> +
>  static void
>  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>  {
> @@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>                 vruntime += sched_vslice(cfs_rq, se);
>
>         /* sleeps up to a single latency don't count. */
> -       if (!initial) {
> -               unsigned long thresh;
> -
> -               if (se_is_idle(se))
> -                       thresh = sysctl_sched_min_granularity;
> -               else
> -                       thresh = sysctl_sched_latency;
> -
> -               /*
> -                * Halve their sleep time's effect, to allow
> -                * for a gentler effect of sleepers:
> -                */
> -               if (sched_feat(GENTLE_FAIR_SLEEPERS))
> -                       thresh >>= 1;
> -
> -               vruntime -= thresh;
> -       }
> +       if (!initial)
> +               vruntime -= sched_sleeper_credit(se);
>
>         /*
>          * Pull vruntime of the entity being placed to the base level of
> @@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>          * inversed due to s64 overflow.
>          */
>         sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> -       if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> +       if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
>                 se->vruntime = vruntime;
>         else
>                 se->vruntime = max_vruntime(se->vruntime, vruntime);
> @@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
>          */
>         if (READ_ONCE(p->__state) == TASK_WAKING) {
>                 struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +               u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>
> -               se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
> +               if ((s64)sleep_time > 60LL * NSEC_PER_SEC)

You also need to test (se->exec_start != 0) here because the task might
migrate another time before being scheduled. You should create a
helper function like the one below and use it in both places:

/* true if @se has been sleeping long enough to risk a vruntime comparison overflow */
static inline bool entity_long_sleep(struct sched_entity *se)
{
        struct cfs_rq *cfs_rq;
        u64 sleep_time;

        if (se->exec_start == 0)
                return false;

        cfs_rq = cfs_rq_of(se);
        sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
        if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
                return true;

        return false;
}
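
Used in both places, the hunks from the patch above would then read roughly as
follows (a sketch assuming the helper above, not a tested change):

	/* place_entity() */
	if (entity_long_sleep(se))
		se->vruntime = vruntime;
	else
		se->vruntime = max_vruntime(se->vruntime, vruntime);

	/* migrate_task_rq_fair(), TASK_WAKING case */
	if (entity_long_sleep(se))
		se->vruntime = -sched_sleeper_credit(se);
	else
		se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);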


> +                       se->vruntime = -sched_sleeper_credit(se);
> +               else
> +                       se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
>         }
>
>         if (!task_on_rq_migrating(p)) {
>
>
>
> Thanks.
> Zhang Qiao.
>
> >
> >>
> >> Thanks,
> >> Roman.
> >>
> >>
> >>
> >>
> >>
> >>
> > .
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-03-02 13:34             ` Vincent Guittot
@ 2023-03-02 14:29               ` Zhang Qiao
  2023-03-02 14:55                 ` Vincent Guittot
  0 siblings, 1 reply; 14+ messages in thread
From: Zhang Qiao @ 2023-03-02 14:29 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Roman Kagan, Peter Zijlstra, linux-kernel, Valentin Schneider,
	Ben Segall, Waiman Long, Steven Rostedt, Mel Gorman,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Ingo Molnar,
	Juri Lelli



On 2023/3/2 21:34, Vincent Guittot wrote:
> On Thu, 2 Mar 2023 at 10:36, Zhang Qiao <zhangqiao22@huawei.com> wrote:
>>
>>
>>
>> On 2023/2/27 22:37, Vincent Guittot wrote:
>>> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <rkagan@amazon.de> wrote:
>>>>
>>>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
>>>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <rkagan@amazon.de> wrote:
>>>>>> What scares me, though, is that I've got a message from the test robot
>>>>>> that this commit drammatically affected hackbench results, see the quote
>>>>>> below.  I expected the commit not to affect any benchmarks.
>>>>>>
>>>>>> Any idea what could have caused this change?
>>>>>
>>>>> Hmm, It's most probably because se->exec_start is reset after a
>>>>> migration and the condition becomes true for newly migrated task
>>>>> whereas its vruntime should be after min_vruntime.
>>>>>
>>>>> We have missed this condition
>>>>
>>>> Makes sense to me.
>>>>
>>>> But what would then be the reliable way to detect a sched_entity which
>>>> has slept for long and risks overflowing in .vruntime comparison?
>>>
>>> For now I don't have a better idea than adding the same check in
>>> migrate_task_rq_fair()
>>
>> Hi, Vincent,
>> I fixed this condition as you said, and the test results are as follows.
>>
>> testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
>> version1: v6.2
>> version2: v6.2 + commit 829c1651e9c4
>> version3: v6.2 + commit 829c1651e9c4 + this patch
>>
>> -------------------------------------------------
>>         version1        version2        version3
>> test1   81.0            118.1           82.1
>> test2   82.1            116.9           80.3
>> test3   83.2            103.9           83.3
>> avg(s)  82.1            113.0           81.9
>>
>> -------------------------------------------------
>> After deal with the task migration case, the hackbench result has restored.
>>
>> The patch as follow, how does this look?
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ff4dbbae3b10..3a88d20fd29e 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>  #endif
>>  }
>>
>> +static inline u64 sched_sleeper_credit(struct sched_entity *se)
>> +{
>> +
>> +       unsigned long thresh;
>> +
>> +       if (se_is_idle(se))
>> +               thresh = sysctl_sched_min_granularity;
>> +       else
>> +               thresh = sysctl_sched_latency;
>> +
>> +       /*
>> +        * Halve their sleep time's effect, to allow
>> +        * for a gentler effect of sleepers:
>> +        */
>> +       if (sched_feat(GENTLE_FAIR_SLEEPERS))
>> +               thresh >>= 1;
>> +
>> +       return thresh;
>> +}
>> +
>>  static void
>>  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>  {
>> @@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>                 vruntime += sched_vslice(cfs_rq, se);
>>
>>         /* sleeps up to a single latency don't count. */
>> -       if (!initial) {
>> -               unsigned long thresh;
>> -
>> -               if (se_is_idle(se))
>> -                       thresh = sysctl_sched_min_granularity;
>> -               else
>> -                       thresh = sysctl_sched_latency;
>> -
>> -               /*
>> -                * Halve their sleep time's effect, to allow
>> -                * for a gentler effect of sleepers:
>> -                */
>> -               if (sched_feat(GENTLE_FAIR_SLEEPERS))
>> -                       thresh >>= 1;
>> -
>> -               vruntime -= thresh;
>> -       }
>> +       if (!initial)
>> +               vruntime -= sched_sleeper_credit(se);
>>
>>         /*
>>          * Pull vruntime of the entity being placed to the base level of
>> @@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>          * inversed due to s64 overflow.
>>          */
>>         sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>> -       if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
>> +       if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
>>                 se->vruntime = vruntime;
>>         else
>>                 se->vruntime = max_vruntime(se->vruntime, vruntime);
>> @@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
>>          */
>>         if (READ_ONCE(p->__state) == TASK_WAKING) {
>>                 struct cfs_rq *cfs_rq = cfs_rq_of(se);
>> +               u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>>
>> -               se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
>> +               if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> 
> You also need to test (se->exec_start !=0) here because the task might

Hi,

I don't understand when this other migration would happen. Could you explain in more detail?

I think the next migration would happen after the wakee task has been enqueued, but at that
point p->__state is no longer TASK_WAKING; it has already been changed to TASK_RUNNING in
ttwu_do_wakeup().

If such a migration exists, the previous code "se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);"
might be performed multiple times; wouldn't that go wrong?

> migrate another time before being scheduled. You should create a
> helper function like below and use it in both place

Ok, I will update at next version.


Thanks,
ZhangQiao.

>
> static inline bool entity_long_sleep(se)
> {
>         struct cfs_rq *cfs_rq;
>         u64 sleep_time;
> 
>         if (se->exec_start == 0)
>                 return false;
> 
>         cfs_rq = cfs_rq_of(se);
>         sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>         if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
>                 return true;
> 
>         return false;
> }
> 
> 
>> +                       se->vruntime = -sched_sleeper_credit(se);
>> +               else
>> +                       se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
>>         }
>>
>>         if (!task_on_rq_migrating(p)) {
>>
>>
>>
>> Thanks.
>> Zhang Qiao.
>>
>>>
>>>>
>>>> Thanks,
>>>> Roman.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>> .
>>>
> .
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-03-02 14:29               ` Zhang Qiao
@ 2023-03-02 14:55                 ` Vincent Guittot
  2023-03-03  6:51                   ` Zhang Qiao
  0 siblings, 1 reply; 14+ messages in thread
From: Vincent Guittot @ 2023-03-02 14:55 UTC (permalink / raw)
  To: Zhang Qiao
  Cc: Roman Kagan, Peter Zijlstra, linux-kernel, Valentin Schneider,
	Ben Segall, Waiman Long, Steven Rostedt, Mel Gorman,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Ingo Molnar,
	Juri Lelli

On Thu, 2 Mar 2023 at 15:29, Zhang Qiao <zhangqiao22@huawei.com> wrote:
>
>
>
> On 2023/3/2 21:34, Vincent Guittot wrote:
> > On Thu, 2 Mar 2023 at 10:36, Zhang Qiao <zhangqiao22@huawei.com> wrote:
> >>
> >>
> >>
> >> On 2023/2/27 22:37, Vincent Guittot wrote:
> >>> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <rkagan@amazon.de> wrote:
> >>>>
> >>>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> >>>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <rkagan@amazon.de> wrote:
> >>>>>> What scares me, though, is that I've got a message from the test robot
> >>>>>> that this commit drammatically affected hackbench results, see the quote
> >>>>>> below.  I expected the commit not to affect any benchmarks.
> >>>>>>
> >>>>>> Any idea what could have caused this change?
> >>>>>
> >>>>> Hmm, It's most probably because se->exec_start is reset after a
> >>>>> migration and the condition becomes true for newly migrated task
> >>>>> whereas its vruntime should be after min_vruntime.
> >>>>>
> >>>>> We have missed this condition
> >>>>
> >>>> Makes sense to me.
> >>>>
> >>>> But what would then be the reliable way to detect a sched_entity which
> >>>> has slept for long and risks overflowing in .vruntime comparison?
> >>>
> >>> For now I don't have a better idea than adding the same check in
> >>> migrate_task_rq_fair()
> >>
> >> Hi, Vincent,
> >> I fixed this condition as you said, and the test results are as follows.
> >>
> >> testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
> >> version1: v6.2
> >> version2: v6.2 + commit 829c1651e9c4
> >> version3: v6.2 + commit 829c1651e9c4 + this patch
> >>
> >> -------------------------------------------------
> >>         version1        version2        version3
> >> test1   81.0            118.1           82.1
> >> test2   82.1            116.9           80.3
> >> test3   83.2            103.9           83.3
> >> avg(s)  82.1            113.0           81.9
> >>
> >> -------------------------------------------------
> >> After deal with the task migration case, the hackbench result has restored.
> >>
> >> The patch as follow, how does this look?
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index ff4dbbae3b10..3a88d20fd29e 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >>  #endif
> >>  }
> >>
> >> +static inline u64 sched_sleeper_credit(struct sched_entity *se)
> >> +{
> >> +
> >> +       unsigned long thresh;
> >> +
> >> +       if (se_is_idle(se))
> >> +               thresh = sysctl_sched_min_granularity;
> >> +       else
> >> +               thresh = sysctl_sched_latency;
> >> +
> >> +       /*
> >> +        * Halve their sleep time's effect, to allow
> >> +        * for a gentler effect of sleepers:
> >> +        */
> >> +       if (sched_feat(GENTLE_FAIR_SLEEPERS))
> >> +               thresh >>= 1;
> >> +
> >> +       return thresh;
> >> +}
> >> +
> >>  static void
> >>  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >>  {
> >> @@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >>                 vruntime += sched_vslice(cfs_rq, se);
> >>
> >>         /* sleeps up to a single latency don't count. */
> >> -       if (!initial) {
> >> -               unsigned long thresh;
> >> -
> >> -               if (se_is_idle(se))
> >> -                       thresh = sysctl_sched_min_granularity;
> >> -               else
> >> -                       thresh = sysctl_sched_latency;
> >> -
> >> -               /*
> >> -                * Halve their sleep time's effect, to allow
> >> -                * for a gentler effect of sleepers:
> >> -                */
> >> -               if (sched_feat(GENTLE_FAIR_SLEEPERS))
> >> -                       thresh >>= 1;
> >> -
> >> -               vruntime -= thresh;
> >> -       }
> >> +       if (!initial)
> >> +               vruntime -= sched_sleeper_credit(se);
> >>
> >>         /*
> >>          * Pull vruntime of the entity being placed to the base level of
> >> @@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >>          * inversed due to s64 overflow.
> >>          */
> >>         sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> >> -       if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> >> +       if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
> >>                 se->vruntime = vruntime;
> >>         else
> >>                 se->vruntime = max_vruntime(se->vruntime, vruntime);
> >> @@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
> >>          */
> >>         if (READ_ONCE(p->__state) == TASK_WAKING) {
> >>                 struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >> +               u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> >>
> >> -               se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
> >> +               if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> >
> > You also need to test (se->exec_start !=0) here because the task might
>
> Hi,
>
> I don't understand when the another migration happend. Could you tell me in more detail?

se->exec_start is updated when the task becomes current.

You can have the sequence:

task TA runs on CPU0
    TA's se->exec_start = xxxx
TA is put back into the rb tree waiting for next slice while another
task is running
CPU1 pulls TA which migrates on CPU1
    migrate_task_rq_fair() w/ TA's se->exec_start == xxxx
        TA's se->exec_start = 0
TA is put into the rb tree of CPU1 waiting to run on CPU1
CPU2 pulls TA which migrates on CPU2
    migrate_task_rq_fair() w/ TA's se->exec_start == 0
        TA's se->exec_start = 0
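
A minimal userspace sketch of why the extra se->exec_start != 0 test matters
(illustrative only, not kernel code): once a previous migration has zeroed
exec_start, the computed "sleep time" is just the raw rq clock value and trips
the 60 s cutoff on any system that has been up for more than a minute.

#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* naive check: treats exec_start == 0 as an enormous sleep */
static int naive_long_sleep(uint64_t rq_clock, uint64_t exec_start)
{
	return (int64_t)(rq_clock - exec_start) > 60LL * NSEC_PER_SEC;
}

/* guarded check, as proposed above */
static int guarded_long_sleep(uint64_t rq_clock, uint64_t exec_start)
{
	return exec_start != 0 &&
	       (int64_t)(rq_clock - exec_start) > 60LL * NSEC_PER_SEC;
}

int main(void)
{
	uint64_t rq_clock = 3600ULL * NSEC_PER_SEC;	/* 1 h of "uptime" */
	uint64_t migrated = 0;				/* exec_start reset by a migration */
	uint64_t recent = rq_clock - 2ULL * NSEC_PER_SEC;

	printf("migrated task: naive=%d guarded=%d\n",
	       naive_long_sleep(rq_clock, migrated),
	       guarded_long_sleep(rq_clock, migrated));
	printf("recent runner: naive=%d guarded=%d\n",
	       naive_long_sleep(rq_clock, recent),
	       guarded_long_sleep(rq_clock, recent));
	return 0;
}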

>
> I think the next migration will happend after the wakee task enqueued, but at this time
> the p->__state isn't TASK_WAKING, p->__state already be changed to TASK_RUNNING at ttwu_do_wakeup().
>
> If such a migration exists, Previous code "se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);" maybe
> perform multiple times,wouldn't it go wrong in this way?

The vruntime has been updated at enqueue time, but exec_start has not.

>
> > migrate another time before being scheduled. You should create a
> > helper function like below and use it in both place
>
> Ok, I will update at next version.
>
>
> Thanks,
> ZhangQiao.
>
> >
> > static inline bool entity_long_sleep(se)
> > {
> >         struct cfs_rq *cfs_rq;
> >         u64 sleep_time;
> >
> >         if (se->exec_start == 0)
> >                 return false;
> >
> >         cfs_rq = cfs_rq_of(se);
> >         sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> >         if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> >                 return true;
> >
> >         return false;
> > }
> >
> >
> >> +                       se->vruntime = -sched_sleeper_credit(se);
> >> +               else
> >> +                       se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
> >>         }
> >>
> >>         if (!task_on_rq_migrating(p)) {
> >>
> >>
> >>
> >> Thanks.
> >> Zhang Qiao.
> >>
> >>>
> >>>>
> >>>> Thanks,
> >>>> Roman.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>> .
> >>>
> > .
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-03-02 14:55                 ` Vincent Guittot
@ 2023-03-03  6:51                   ` Zhang Qiao
  2023-03-03  8:32                     ` Vincent Guittot
  0 siblings, 1 reply; 14+ messages in thread
From: Zhang Qiao @ 2023-03-03  6:51 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Roman Kagan, Peter Zijlstra, linux-kernel, Valentin Schneider,
	Ben Segall, Waiman Long, Steven Rostedt, Mel Gorman,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Ingo Molnar,
	Juri Lelli



On 2023/3/2 22:55, Vincent Guittot wrote:
> On Thu, 2 Mar 2023 at 15:29, Zhang Qiao <zhangqiao22@huawei.com> wrote:
>>
>>
>>
>> On 2023/3/2 21:34, Vincent Guittot wrote:
>>> On Thu, 2 Mar 2023 at 10:36, Zhang Qiao <zhangqiao22@huawei.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2023/2/27 22:37, Vincent Guittot wrote:
>>>>> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <rkagan@amazon.de> wrote:
>>>>>>
>>>>>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
>>>>>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <rkagan@amazon.de> wrote:
>>>>>>>> What scares me, though, is that I've got a message from the test robot
>>>>>>>> that this commit drammatically affected hackbench results, see the quote
>>>>>>>> below.  I expected the commit not to affect any benchmarks.
>>>>>>>>
>>>>>>>> Any idea what could have caused this change?
>>>>>>>
>>>>>>> Hmm, It's most probably because se->exec_start is reset after a
>>>>>>> migration and the condition becomes true for newly migrated task
>>>>>>> whereas its vruntime should be after min_vruntime.
>>>>>>>
>>>>>>> We have missed this condition
>>>>>>
>>>>>> Makes sense to me.
>>>>>>
>>>>>> But what would then be the reliable way to detect a sched_entity which
>>>>>> has slept for long and risks overflowing in .vruntime comparison?
>>>>>
>>>>> For now I don't have a better idea than adding the same check in
>>>>> migrate_task_rq_fair()
>>>>
>>>> Hi, Vincent,
>>>> I fixed this condition as you said, and the test results are as follows.
>>>>
>>>> testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
>>>> version1: v6.2
>>>> version2: v6.2 + commit 829c1651e9c4
>>>> version3: v6.2 + commit 829c1651e9c4 + this patch
>>>>
>>>> -------------------------------------------------
>>>>         version1        version2        version3
>>>> test1   81.0            118.1           82.1
>>>> test2   82.1            116.9           80.3
>>>> test3   83.2            103.9           83.3
>>>> avg(s)  82.1            113.0           81.9
>>>>
>>>> -------------------------------------------------
>>>> After deal with the task migration case, the hackbench result has restored.
>>>>
>>>> The patch as follow, how does this look?
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index ff4dbbae3b10..3a88d20fd29e 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>>>  #endif
>>>>  }
>>>>
>>>> +static inline u64 sched_sleeper_credit(struct sched_entity *se)
>>>> +{
>>>> +
>>>> +       unsigned long thresh;
>>>> +
>>>> +       if (se_is_idle(se))
>>>> +               thresh = sysctl_sched_min_granularity;
>>>> +       else
>>>> +               thresh = sysctl_sched_latency;
>>>> +
>>>> +       /*
>>>> +        * Halve their sleep time's effect, to allow
>>>> +        * for a gentler effect of sleepers:
>>>> +        */
>>>> +       if (sched_feat(GENTLE_FAIR_SLEEPERS))
>>>> +               thresh >>= 1;
>>>> +
>>>> +       return thresh;
>>>> +}
>>>> +
>>>>  static void
>>>>  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>>>  {
>>>> @@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>>>                 vruntime += sched_vslice(cfs_rq, se);
>>>>
>>>>         /* sleeps up to a single latency don't count. */
>>>> -       if (!initial) {
>>>> -               unsigned long thresh;
>>>> -
>>>> -               if (se_is_idle(se))
>>>> -                       thresh = sysctl_sched_min_granularity;
>>>> -               else
>>>> -                       thresh = sysctl_sched_latency;
>>>> -
>>>> -               /*
>>>> -                * Halve their sleep time's effect, to allow
>>>> -                * for a gentler effect of sleepers:
>>>> -                */
>>>> -               if (sched_feat(GENTLE_FAIR_SLEEPERS))
>>>> -                       thresh >>= 1;
>>>> -
>>>> -               vruntime -= thresh;
>>>> -       }
>>>> +       if (!initial)
>>>> +               vruntime -= sched_sleeper_credit(se);
>>>>
>>>>         /*
>>>>          * Pull vruntime of the entity being placed to the base level of
>>>> @@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>>>>          * inversed due to s64 overflow.
>>>>          */
>>>>         sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>>>> -       if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
>>>> +       if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
>>>>                 se->vruntime = vruntime;
>>>>         else
>>>>                 se->vruntime = max_vruntime(se->vruntime, vruntime);
>>>> @@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
>>>>          */
>>>>         if (READ_ONCE(p->__state) == TASK_WAKING) {
>>>>                 struct cfs_rq *cfs_rq = cfs_rq_of(se);
>>>> +               u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>>>>
>>>> -               se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
>>>> +               if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
>>>
>>> You also need to test (se->exec_start !=0) here because the task might
>>
>> Hi,
>>
>> I don't understand when the another migration happend. Could you tell me in more detail?
> 
> se->exec_start is update when the task becomes current.
> 
> You can have the sequence:
> 
> task TA runs on CPU0
>     TA's se->exec_start = xxxx
> TA is put back into the rb tree waiting for next slice while another
> task is running
> CPU1 pulls TA which migrates on CPU1
>     migrate_task_rq_fair() w/ TA's se->exec_start == xxxx
>         TA's se->exec_start = 0
> TA is put into the rb tree of CPU1 waiting to run on CPU1
> CPU2 pulls TA which migrates on CPU2
>     migrate_task_rq_fair() w/ TA's se->exec_start == 0
>         TA's se->exec_start = 0
Hi, Vincent,

Yes, you're right, such a sequence does exist. But at this point, p->__state != TASK_WAKING.

I have a question: is there a case where "p->se.exec_start == 0 && p->__state == TASK_WAKING" holds?
I analyzed the code and concluded that this case doesn't exist; is that right?

Thanks.
ZhangQiao.

> 
>>
>> I think the next migration will happend after the wakee task enqueued, but at this time
>> the p->__state isn't TASK_WAKING, p->__state already be changed to TASK_RUNNING at ttwu_do_wakeup().
>>
>> If such a migration exists, Previous code "se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);" maybe
>> perform multiple times,wouldn't it go wrong in this way?
> 
> the vruntime have been updated when enqueued but not exec_start
> 
>>
>>> migrate another time before being scheduled. You should create a
>>> helper function like below and use it in both place
>>
>> Ok, I will update at next version.
>>
>>
>> Thanks,
>> ZhangQiao.
>>
>>>
>>> static inline bool entity_long_sleep(se)
>>> {
>>>         struct cfs_rq *cfs_rq;
>>>         u64 sleep_time;
>>>
>>>         if (se->exec_start == 0)
>>>                 return false;
>>>
>>>         cfs_rq = cfs_rq_of(se);
>>>         sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
>>>         if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
>>>                 return true;
>>>
>>>         return false;
>>> }
>>>
>>>
>>>> +                       se->vruntime = -sched_sleeper_credit(se);
>>>> +               else
>>>> +                       se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
>>>>         }
>>>>
>>>>         if (!task_on_rq_migrating(p)) {
>>>>
>>>>
>>>>
>>>> Thanks.
>>>> Zhang Qiao.
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Roman.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> .
>>>>>
>>> .
>>>
> .
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] sched/fair: sanitize vruntime of entity being placed
  2023-03-03  6:51                   ` Zhang Qiao
@ 2023-03-03  8:32                     ` Vincent Guittot
  0 siblings, 0 replies; 14+ messages in thread
From: Vincent Guittot @ 2023-03-03  8:32 UTC (permalink / raw)
  To: Zhang Qiao
  Cc: Roman Kagan, Peter Zijlstra, linux-kernel, Valentin Schneider,
	Ben Segall, Waiman Long, Steven Rostedt, Mel Gorman,
	Dietmar Eggemann, Daniel Bristot de Oliveira, Ingo Molnar,
	Juri Lelli

On Fri, 3 Mar 2023 at 07:51, Zhang Qiao <zhangqiao22@huawei.com> wrote:
>
>
>
> On 2023/3/2 22:55, Vincent Guittot wrote:
> > On Thu, 2 Mar 2023 at 15:29, Zhang Qiao <zhangqiao22@huawei.com> wrote:
> >>
> >>
> >>
> >> On 2023/3/2 21:34, Vincent Guittot wrote:
> >>> On Thu, 2 Mar 2023 at 10:36, Zhang Qiao <zhangqiao22@huawei.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2023/2/27 22:37, Vincent Guittot wrote:
> >>>>> On Mon, 27 Feb 2023 at 09:43, Roman Kagan <rkagan@amazon.de> wrote:
> >>>>>>
> >>>>>> On Tue, Feb 21, 2023 at 06:26:11PM +0100, Vincent Guittot wrote:
> >>>>>>> On Tue, 21 Feb 2023 at 17:57, Roman Kagan <rkagan@amazon.de> wrote:
> >>>>>>>> What scares me, though, is that I've got a message from the test robot
> >>>>>>>> that this commit drammatically affected hackbench results, see the quote
> >>>>>>>> below.  I expected the commit not to affect any benchmarks.
> >>>>>>>>
> >>>>>>>> Any idea what could have caused this change?
> >>>>>>>
> >>>>>>> Hmm, It's most probably because se->exec_start is reset after a
> >>>>>>> migration and the condition becomes true for newly migrated task
> >>>>>>> whereas its vruntime should be after min_vruntime.
> >>>>>>>
> >>>>>>> We have missed this condition
> >>>>>>
> >>>>>> Makes sense to me.
> >>>>>>
> >>>>>> But what would then be the reliable way to detect a sched_entity which
> >>>>>> has slept for long and risks overflowing in .vruntime comparison?
> >>>>>
> >>>>> For now I don't have a better idea than adding the same check in
> >>>>> migrate_task_rq_fair()
> >>>>
> >>>> Hi, Vincent,
> >>>> I fixed this condition as you said, and the test results are as follows.
> >>>>
> >>>> testcase: hackbench -g 44 -f 20 --process --pipe -l 60000 -s 100
> >>>> version1: v6.2
> >>>> version2: v6.2 + commit 829c1651e9c4
> >>>> version3: v6.2 + commit 829c1651e9c4 + this patch
> >>>>
> >>>> -------------------------------------------------
> >>>>         version1        version2        version3
> >>>> test1   81.0            118.1           82.1
> >>>> test2   82.1            116.9           80.3
> >>>> test3   83.2            103.9           83.3
> >>>> avg(s)  82.1            113.0           81.9
> >>>>
> >>>> -------------------------------------------------
> >>>> After deal with the task migration case, the hackbench result has restored.
> >>>>
> >>>> The patch as follow, how does this look?
> >>>>
> >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>>> index ff4dbbae3b10..3a88d20fd29e 100644
> >>>> --- a/kernel/sched/fair.c
> >>>> +++ b/kernel/sched/fair.c
> >>>> @@ -4648,6 +4648,26 @@ static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >>>>  #endif
> >>>>  }
> >>>>
> >>>> +static inline u64 sched_sleeper_credit(struct sched_entity *se)
> >>>> +{
> >>>> +
> >>>> +       unsigned long thresh;
> >>>> +
> >>>> +       if (se_is_idle(se))
> >>>> +               thresh = sysctl_sched_min_granularity;
> >>>> +       else
> >>>> +               thresh = sysctl_sched_latency;
> >>>> +
> >>>> +       /*
> >>>> +        * Halve their sleep time's effect, to allow
> >>>> +        * for a gentler effect of sleepers:
> >>>> +        */
> >>>> +       if (sched_feat(GENTLE_FAIR_SLEEPERS))
> >>>> +               thresh >>= 1;
> >>>> +
> >>>> +       return thresh;
> >>>> +}
> >>>> +
> >>>>  static void
> >>>>  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >>>>  {
> >>>> @@ -4664,23 +4684,8 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >>>>                 vruntime += sched_vslice(cfs_rq, se);
> >>>>
> >>>>         /* sleeps up to a single latency don't count. */
> >>>> -       if (!initial) {
> >>>> -               unsigned long thresh;
> >>>> -
> >>>> -               if (se_is_idle(se))
> >>>> -                       thresh = sysctl_sched_min_granularity;
> >>>> -               else
> >>>> -                       thresh = sysctl_sched_latency;
> >>>> -
> >>>> -               /*
> >>>> -                * Halve their sleep time's effect, to allow
> >>>> -                * for a gentler effect of sleepers:
> >>>> -                */
> >>>> -               if (sched_feat(GENTLE_FAIR_SLEEPERS))
> >>>> -                       thresh >>= 1;
> >>>> -
> >>>> -               vruntime -= thresh;
> >>>> -       }
> >>>> +       if (!initial)
> >>>> +               vruntime -= sched_sleeper_credit(se);
> >>>>
> >>>>         /*
> >>>>          * Pull vruntime of the entity being placed to the base level of
> >>>> @@ -4690,7 +4695,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> >>>>          * inversed due to s64 overflow.
> >>>>          */
> >>>>         sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> >>>> -       if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> >>>> +       if (se->exec_start != 0 && (s64)sleep_time > 60LL * NSEC_PER_SEC)
> >>>>                 se->vruntime = vruntime;
> >>>>         else
> >>>>                 se->vruntime = max_vruntime(se->vruntime, vruntime);
> >>>> @@ -7634,8 +7639,12 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
> >>>>          */
> >>>>         if (READ_ONCE(p->__state) == TASK_WAKING) {
> >>>>                 struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >>>> +               u64 sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> >>>>
> >>>> -               se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
> >>>> +               if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> >>>
> >>> You also need to test (se->exec_start !=0) here because the task might
> >>
> >> Hi,
> >>
> >> I don't understand when the another migration happend. Could you tell me in more detail?
> >
> > se->exec_start is update when the task becomes current.
> >
> > You can have the sequence:
> >
> > task TA runs on CPU0
> >     TA's se->exec_start = xxxx
> > TA is put back into the rb tree waiting for next slice while another
> > task is running
> > CPU1 pulls TA which migrates on CPU1
> >     migrate_task_rq_fair() w/ TA's se->exec_start == xxxx
> >         TA's se->exec_start = 0
> > TA is put into the rb tree of CPU1 waiting to run on CPU1
> > CPU2 pulls TA which migrates on CPU2
> >     migrate_task_rq_fair() w/ TA's se->exec_start == 0
> >         TA's se->exec_start = 0
> Hi, Vincent,
>
> yes, you're right, such sequence does exist. But at this point, p->__state != TASK_WAKING.
>
> I have a question, Whether there is case that is "p->se.exec_start == 0 && p->__state == TASK_WAKING" ?
> I analyzed the code and concluded that this case isn't existed, is it right?

Yes, you're right. Your proposal is enough

Thanks

>
> Thanks.
> ZhangQiao.
>
> >
> >>
> >> I think the next migration will happend after the wakee task enqueued, but at this time
> >> the p->__state isn't TASK_WAKING, p->__state already be changed to TASK_RUNNING at ttwu_do_wakeup().
> >>
> >> If such a migration exists, Previous code "se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);" maybe
> >> perform multiple times,wouldn't it go wrong in this way?
> >
> > the vruntime have been updated when enqueued but not exec_start
> >
> >>
> >>> migrate another time before being scheduled. You should create a
> >>> helper function like below and use it in both place
> >>
> >> Ok, I will update at next version.
> >>
> >>
> >> Thanks,
> >> ZhangQiao.
> >>
> >>>
> >>> static inline bool entity_long_sleep(se)
> >>> {
> >>>         struct cfs_rq *cfs_rq;
> >>>         u64 sleep_time;
> >>>
> >>>         if (se->exec_start == 0)
> >>>                 return false;
> >>>
> >>>         cfs_rq = cfs_rq_of(se);
> >>>         sleep_time = rq_clock_task(rq_of(cfs_rq)) - se->exec_start;
> >>>         if ((s64)sleep_time > 60LL * NSEC_PER_SEC)
> >>>                 return true;
> >>>
> >>>         return false;
> >>> }
> >>>
> >>>
> >>>> +                       se->vruntime = -sched_sleeper_credit(se);
> >>>> +               else
> >>>> +                       se->vruntime -= u64_u32_load(cfs_rq->min_vruntime);
> >>>>         }
> >>>>
> >>>>         if (!task_on_rq_migrating(p)) {
> >>>>
> >>>>
> >>>>
> >>>> Thanks.
> >>>> Zhang Qiao.
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Roman.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> .
> >>>>>
> >>> .
> >>>
> > .
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-03-03  8:33 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-09 19:31 [PATCH v3] sched/fair: sanitize vruntime of entity being placed Roman Kagan
2023-02-21  9:38 ` Vincent Guittot
2023-02-21 16:57   ` Roman Kagan
2023-02-21 17:26     ` Vincent Guittot
2023-02-27  8:42       ` Roman Kagan
2023-02-27 14:37         ` Vincent Guittot
2023-02-27 17:00           ` Dietmar Eggemann
2023-02-27 17:15             ` Vincent Guittot
2023-03-02  9:36           ` Zhang Qiao
2023-03-02 13:34             ` Vincent Guittot
2023-03-02 14:29               ` Zhang Qiao
2023-03-02 14:55                 ` Vincent Guittot
2023-03-03  6:51                   ` Zhang Qiao
2023-03-03  8:32                     ` Vincent Guittot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).