linux-kernel.vger.kernel.org archive mirror
* [Linux Kernel 5.13 GA] ESXi Performance regression
@ 2021-07-30 12:27 Abdul Anshad Azeez
  2021-07-30 13:26 ` Valentin Schneider
  0 siblings, 1 reply; 7+ messages in thread
From: Abdul Anshad Azeez @ 2021-07-30 12:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, valentin.schneider, mingo, juri.lelli, vincent.guittot,
	rostedt, Rajender M, Rahul Gopakumar

As part of VMware's performance regression testing for upstream Linux
kernel releases, we evaluated the performance of Linux kernel 5.13
against the 5.12 release. Our evaluation revealed performance
regressions of up to 3x in ESXi Compute workloads and up to 40% in
ESXi Networking workloads.

After bisecting between kernel 5.12 and 5.13, we identified the root
cause to be a scheduler-related commit from Peter Zijlstra,
8a99b6833c884fa0e7919030d93fecedc69fc625 ("sched: Move SCHED_DEBUG
sysctl to debugfs"). The issue appears to arise because this commit
changes the default value of "sched_wakeup_granularity_ns"; more
details are below.

Impacted test case details:

1. Compute:
- VM Config - RHEL 8.1 - 1VM with 8vCPU & 16G Memory
- Benchmark - kernel compile
- Measures time taken to compile Linux kernel source code (Linux
kernel version used - 4.9.24)
- make -j 2xVCPU - This uses all the available CPU threads to achieve
100% CPU utilization

2. Networking:
- VM Config - RHEL 8.1 - 1VM with 8vCPU & 16G Memory and 8VM with
4vCPU & 8G Memory
- Benchmark - Netperf
- Netperf TCP_STREAM RECV small packets (8K socket & 256B message,
TCP_NODELAY set) - Throughput (1VM)
- Netperf UDP_STREAM RECV (256K socket & 256B message) - Packet rate
(8VM)

Overall, our results indicate that the above-mentioned commit
introduced performance regressions in the kernel compile workload in
the Compute area and in the high-packet-rate test cases in the
Networking area.

We noticed that Peter Zijlstra's commit moved the scheduler tunables
to the debugfs file system. On closer inspection, the values of two
of these tunables differ before and after the above-mentioned
commit.

1. Before:
sched_min_granularity_ns    - 10000000 (10ms)
sched_wakeup_granularity_ns - 15000000 (15ms)

2. After:
sched_min_granularity_ns    - 3000000 (3ms)
sched_wakeup_granularity_ns - 4000000 (4ms)
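
To compare the two tunables on a given guest, a minimal sketch such as the following could be used. The paths are assumptions that do not appear in this report: the pre-5.13 sysctl location under /proc/sys/kernel and the debugfs location under /sys/kernel/debug/sched (which needs root and a mounted debugfs). The helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

# Assumed locations; neither path is quoted in the report itself.
SYSCTL_DIR = Path("/proc/sys/kernel")          # older kernels: sched_<name>
DEBUGFS_DIR = Path("/sys/kernel/debug/sched")  # 5.13 and later: <name>


def candidate_paths(name: str,
                    sysctl_dir: Path = SYSCTL_DIR,
                    debugfs_dir: Path = DEBUGFS_DIR) -> List[Path]:
    """Return the places a tunable such as 'wakeup_granularity_ns' may live."""
    return [sysctl_dir / f"sched_{name}", debugfs_dir / name]


def read_tunable(name: str,
                 sysctl_dir: Path = SYSCTL_DIR,
                 debugfs_dir: Path = DEBUGFS_DIR) -> Optional[int]:
    """Return the first readable value in nanoseconds, or None if unavailable."""
    for path in candidate_paths(name, sysctl_dir, debugfs_dir):
        try:
            return int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # missing file, no permission, or unexpected content
    return None


def format_ns(value: int) -> str:
    """Format a nanosecond value in the style used in this report."""
    return f"{value} ({value / 1_000_000:g}ms)"


if __name__ == "__main__":
    for name in ("min_granularity_ns", "wakeup_granularity_ns"):
        value = read_tunable(name)
        text = format_ns(value) if value is not None else "unavailable"
        print(f"sched_{name} - {text}")
```

Running this before and after booting the kernel under test gives a direct comparison of the values listed above.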

With further experiments, we have confirmed that the value of
"sched_wakeup_granularity_ns" is influencing these performance
regressions. After setting "sched_wakeup_granularity_ns" back to
"15000000" in Peter Zijlstra's commit, we regained the lost
performance in our Compute & Networking workloads.

Further, we collected guest scheduling stats (during the Kernel
compile workload) and observed more involuntary switches forced by
the scheduler when "sched_wakeup_granularity_ns" is set to
"4000000".

1. "sched_wakeup_granularity_ns = 4000000" (3 iterations):
nr_involuntary_switches : 3
nr_involuntary_switches : 2
nr_involuntary_switches : 2

2. "sched_wakeup_granularity_ns = 15000000" (3 iterations):
nr_involuntary_switches : 0
nr_involuntary_switches : 0
nr_involuntary_switches : 0
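
The report does not say how these counters were gathered. As a sketch, assuming they come from a per-task file such as /proc/<pid>/sched (available when the kernel exposes scheduler debug statistics), a counter in the "name : value" layout shown above could be extracted like this. The function names are hypothetical.

```python
import re
from typing import Optional

# Matches a line such as "nr_involuntary_switches : 3" with any spacing.
_FIELD = re.compile(r"^nr_involuntary_switches\s*:\s*(\d+)\s*$", re.MULTILINE)


def involuntary_switches(sched_text: str) -> Optional[int]:
    """Return nr_involuntary_switches from the given text, or None if absent."""
    match = _FIELD.search(sched_text)
    return int(match.group(1)) if match else None


def read_involuntary_switches(pid: str = "self") -> Optional[int]:
    """Read the counter for one task; the /proc path is an assumption."""
    try:
        with open(f"/proc/{pid}/sched") as handle:
            return involuntary_switches(handle.read())
    except OSError:
        return None  # file not present or not readable on this system


if __name__ == "__main__":
    print("nr_involuntary_switches :", read_involuntary_switches())
```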

So, we believe decreasing the value of "sched_wakeup_granularity_ns"
causes more preemption of running processes, which impacts the
CPU-bound tasks - Kernel compile & Netperf high packet rate
workloads.
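
To illustrate why a smaller wakeup granularity can lead to more preemption, here is a deliberately simplified model of the CFS wakeup preemption check. It is a sketch, not kernel code: the real scheduler also scales the granularity by task weight, and the function and variable names below are hypothetical.

```python
def should_preempt(curr_vruntime_ns: int,
                   wakee_vruntime_ns: int,
                   wakeup_granularity_ns: int) -> bool:
    """Simplified model: the waking task preempts the running task only if
    the running task is ahead in virtual runtime by more than the granularity."""
    vdiff = curr_vruntime_ns - wakee_vruntime_ns
    if vdiff <= 0:
        return False  # the running task is not ahead, so no preemption
    return vdiff > wakeup_granularity_ns


if __name__ == "__main__":
    # Hypothetical case: the running task is 6ms ahead of the waking task.
    lead_ns = 6_000_000
    for granularity_ns in (4_000_000, 15_000_000):
        print(granularity_ns, should_preempt(lead_ns, 0, granularity_ns))
```

In this model a 6ms lead triggers preemption with a 4ms granularity but not with a 15ms granularity, which is consistent with the direction of the involuntary switch counts above.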

Also, since the Linux 5.14-rc3 kernel was recently released, we
repeated the same experiments on 5.14-rc3 and observed the same
regressions in both areas (Compute & Networking).

We would like to understand the reason behind the change in default
values for the above two scheduler tunables. Since changing the
value of "sched_wakeup_granularity_ns" from 15ms to 4ms forces more
involuntary switches, which in turn introduces a performance
regression, can this be changed back to 15ms?

Abdul Anshad Azeez
Performance Engineering
VMware, Inc.


end of thread, other threads:[~2021-08-05 15:28 UTC | newest]

Thread overview: 7+ messages
2021-07-30 12:27 [Linux Kernel 5.13 GA] ESXi Performance regression Abdul Anshad Azeez
2021-07-30 13:26 ` Valentin Schneider
2021-08-05 14:33   ` Rahul Gopakumar
2021-08-05 14:58     ` Steven Rostedt
2021-08-05 15:05       ` Peter Zijlstra
2021-08-05 15:24         ` Steven Rostedt
2021-08-05 15:28           ` Peter Zijlstra
