From: Abdul Anshad Azeez <firstname.lastname@example.org>
To: "email@example.com" <firstname.lastname@example.org>
Cc: "email@example.com" <firstname.lastname@example.org>,
    "email@example.com" <firstname.lastname@example.org>,
    "email@example.com" <firstname.lastname@example.org>,
    "email@example.com" <firstname.lastname@example.org>,
    "email@example.com" <firstname.lastname@example.org>,
    "email@example.com" <firstname.lastname@example.org>,
    Rajender M <email@example.com>,
    Rahul Gopakumar <firstname.lastname@example.org>
Subject: [Linux Kernel 5.13 GA] ESXi Performance regression
Date: Fri, 30 Jul 2021 12:27:26 +0000
Message-ID: <BYAPR05MB483975D437F293A40BEF3189A6EC9@BYAPR05MB4839.namprd05.prod.outlook.com>

As part of VMware's performance regression testing for Linux Kernel upstream
releases, we evaluated the performance of Linux kernel 5.13 against the 5.12
release. Our evaluation revealed performance regressions of up to 3x in ESXi
Compute workloads and of up to 40% in ESXi Networking workloads.

After bisecting between kernel 5.13 and 5.12, we identified the root cause as
a scheduler-related commit from Peter Zijlstra:
8a99b6833c884fa0e7919030d93fecedc69fc625 ("sched: Move SCHED_DEBUG sysctl to
debugfs"). It appears that the issue arose because this commit changed the
default value of "sched_wakeup_granularity_ns". More details are below.

Impacted test case details:

1. Compute:
   - VM Config - RHEL 8.1 - 1VM with 8vCPU & 16G Memory
   - Benchmark - kernel compile
     - Measures time taken to compile Linux kernel source code
       (Linux kernel version used - 4.9.24)
     - make -j 2xVCPU
     - This uses all the available CPU threads to achieve 100% CPU
       utilization

2. Networking:
   - VM Config - RHEL 8.1 - 1VM with 8vCPU & 16G Memory and 8VM with
     4vCPU & 8G Memory
   - Benchmark - Netperf
     - Netperf TCP_STREAM RECV, small packets (8K socket & 256B message,
       TCP_NODELAY set) - Throughput (1VM)
     - Netperf UDP_STREAM RECV (256K socket & 256B message) - Packet rate
       (8VM)

Overall, our results indicate that the above-mentioned commit introduced
performance regressions in the kernel compile workload in the Compute area
and, in Networking, in the test cases with high packet rates.

We noticed that Peter Zijlstra's commit moved the scheduler tunables to the
debugfs file system. On taking a closer look, the values of two of these
tunables differ before and after the above-mentioned commit.

1. Before:
   sched_min_granularity_ns    - 10000000 (10ms)
   sched_wakeup_granularity_ns - 15000000 (15ms)

2. After:
   sched_min_granularity_ns    - 3000000 (3ms)
   sched_wakeup_granularity_ns - 4000000 (4ms)

With further experiments, we confirmed that the value of
"sched_wakeup_granularity_ns" is what influences these performance
regressions. On setting the "sched_wakeup_granularity_ns" value back to
"15000000" in Peter Zijlstra's commit, we are able to gain back the lost
performance in our Compute & Networking workloads.

Further, we collected guest scheduling stats (during the kernel compile
workload) and noticed more involuntary switches forced by the scheduler when
"sched_wakeup_granularity_ns" is set to "4000000".

1. "sched_wakeup_granularity_ns = 4000000" (3 iterations):
   nr_involuntary_switches : 3
   nr_involuntary_switches : 2
   nr_involuntary_switches : 2

2. "sched_wakeup_granularity_ns = 15000000" (3 iterations):
   nr_involuntary_switches : 0
   nr_involuntary_switches : 0
   nr_involuntary_switches : 0

So, we believe that decreasing the value of "sched_wakeup_granularity_ns"
causes more preemption of the running processes, and that this impacts the
CPU-bound tasks: the kernel compile and Netperf high packet rate workloads.

Also, since the Linux 5.14-rc3 kernel was recently released, we repeated the
same experiments on 5.14-rc3 and observed the same regressions in both areas
(Compute & Networking).

We would like to understand the reason behind the change in default values
for the above two scheduler tunables. Since changing the value of
"sched_wakeup_granularity_ns" from 15ms to 4ms forces more involuntary
switches, which in turn introduces a performance regression, can this be
changed back to 15ms?

Abdul Anshad Azeez
Performance Engineering
VMware, Inc.
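
For anyone reproducing the comparison above, the tunable has to be inspected in a different place depending on the kernel version, because the commit in question moved it from sysctl to debugfs. The following is a minimal sketch, not something taken from the report: it assumes debugfs is mounted at /sys/kernel/debug, that the kernel was built with CONFIG_SCHED_DEBUG, and that the two paths listed are the ones in use on the system under test. The function names and the `root` parameter are hypothetical and exist only for illustration.

```python
from pathlib import Path
from typing import Optional

# Candidate locations of the tunable, relative to the filesystem root.
# ASSUMPTION: these paths match the kernel under test; the debugfs one is
# normally readable by root only.
CANDIDATES = (
    "sys/kernel/debug/sched/wakeup_granularity_ns",  # after the debugfs move
    "proc/sys/kernel/sched_wakeup_granularity_ns",   # before the debugfs move
)


def find_tunable(root: str = "/") -> Optional[Path]:
    """Return the first candidate path that exists under root, else None."""
    for rel in CANDIDATES:
        path = Path(root) / rel
        try:
            if path.is_file():
                return path
        except OSError:
            # For example, permission denied on an unreadable debugfs mount.
            continue
    return None


def read_ns(path: Path) -> int:
    """Read the tunable value in nanoseconds."""
    return int(path.read_text().strip())


def write_ns(path: Path, value_ns: int) -> None:
    """Write a new value in nanoseconds (needs root on a real system)."""
    path.write_text(f"{value_ns}\n")


if __name__ == "__main__":
    tunable = find_tunable()
    if tunable is None:
        print("wakeup granularity tunable not found or not accessible")
    else:
        print(f"{tunable}: {read_ns(tunable)} ns")
```

Run as root, a call such as `write_ns(tunable, 15_000_000)` would mirror the 15ms setting that the report says recovered the lost performance. The change does not persist across reboots.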