From: Manikandan Jagatheesan
Cc: Peter Jonasson, Yiu Cho Lau, Rajender M,
	Abdul Anshad Azeez, Kodeswaran Kumarasamy,
	Rahul Gopakumar
Subject: Performance Regression in Linux Kernel 5.19
Date: Fri, 9 Sep 2022 11:46:08 +0000

As part of VMware's performance regression testing for Linux
Kernel upstream releases, we have evaluated the performance
of Linux kernel 5.19 against the 5.18 release and we have
noticed performance regressions in Linux VMs on ESXi as shown
below:
- Compute (up to -70%)
- Networking (up to -30%)
- Storage (up to -13%)
After bisecting between kernel 5.18 and 5.19, we identified
the root cause as the enablement of the IBRS mitigation for
the spectre_v2 vulnerability by commit 6ad0ad2bf8a6 ("x86/bugs:
Report Intel retbleed vulnerability").
To confirm this, we disabled the above security mitigation
through the kernel boot parameter (spectre_v2=off) in 5.19, re-ran
our tests and confirmed that the performance was on par with the
5.18 release.
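
When repeating the confirmation step above, it may help to check inside the guest that the boot parameter actually took effect. The following is only a minimal sketch: it reads /proc/cmdline and the kernel's sysfs vulnerability report for spectre_v2. The helper names are hypothetical, and the status strings matched here ("Mitigation: IBRS", "Vulnerable") are assumptions about what the kernel reports and may differ between kernel versions.

```python
from pathlib import Path

# Standard procfs/sysfs locations on Linux.
PROC_CMDLINE = Path("/proc/cmdline")
SYSFS_SPECTRE_V2 = Path("/sys/devices/system/cpu/vulnerabilities/spectre_v2")


def mitigation_disabled_on_cmdline(cmdline: str) -> bool:
    """Return True if spectre_v2=off appears as a boot parameter."""
    return "spectre_v2=off" in cmdline.split()


def uses_kernel_ibrs(status: str) -> bool:
    """Return True if the status line reports the (non-enhanced) IBRS mode.

    Assumption: the kernel reports this mode with the prefix
    "Mitigation: IBRS"; the exact wording may vary by kernel version.
    """
    return status.strip().startswith("Mitigation: IBRS")


def read_text(path: Path, default: str = "") -> str:
    """Read a small text file, returning `default` if it is unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        return default


def report() -> None:
    """Print the current boot-parameter and mitigation state."""
    cmdline = read_text(PROC_CMDLINE)
    status = read_text(SYSFS_SPECTRE_V2, default="(not available)")
    print("spectre_v2=off on cmdline:", mitigation_disabled_on_cmdline(cmdline))
    print("spectre_v2 status:", status)
    print("kernel IBRS in use:", uses_kernel_ibrs(status))
```

Calling report() in the guest before and after adding spectre_v2=off shows whether the two runs really differ in mitigation state.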
Performance data and workload details:
Linux VM used on the ESXi host: Ubuntu 20.04.3
ESXi Compute workloads:
Server config: 4-socket Skylake, 112 threads, 2 TB memory
1. Boot-halt test:
- Configs: Single VM with different CPU and Memory configurations
                 (1vCPU_32gb, 28vCPU_256gb, 56vCPU_512gb, 84vCPU_1024gb
                 & 112vCPU_1433gb)
- Test-desc: Measures the time the guest takes to boot and shut
                   itself down. We have "shutdown -h now" in
                   rc.local for Linux. Boothalt time is calculated
                   from the timestamps of the following patterns in
                   vmware.log:
                   * Begin Pattern - " PowerOn"
                   * End Pattern - "VMX exit"
- Boothalt time = Timestamp(End Pattern) - Timestamp(Begin Pattern)
- Highly affected case: The lowest vCPU config (1vCPU_32gb,
                                    up to -12%)
- Metric: Secs
- Performance data:
      * Immediately before commit: 14.844 secs
      * Intel retbleed/IBRS commit: 16.29 secs (absolute diff ~1.4 secs)
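
The boothalt calculation described above can be sketched as follows. This is an illustration, not the tooling used for the measurements: it assumes each vmware.log line starts with an ISO-style UTC timestamp followed by "|", and the sample log lines in the usage note are made up. It takes the first line matching the begin pattern and the last line matching the end pattern.

```python
from datetime import datetime

BEGIN_PATTERN = " PowerOn"   # begin pattern from the test description
END_PATTERN = "VMX exit"     # end pattern from the test description

# Assumed timestamp layout at the start of each log line,
# e.g. "2022-09-09T11:46:08.100Z| vmx| ...".
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def line_timestamp(line: str) -> datetime:
    """Parse the timestamp that precedes the first '|' on a log line."""
    stamp = line.split("|", 1)[0].strip()
    return datetime.strptime(stamp, TS_FORMAT)


def boothalt_seconds(log_lines) -> float:
    """Boothalt time = Timestamp(End Pattern) - Timestamp(Begin Pattern)."""
    begin = None
    end = None
    for line in log_lines:
        if begin is None and BEGIN_PATTERN in line:
            begin = line_timestamp(line)      # first PowerOn line
        if END_PATTERN in line:
            end = line_timestamp(line)        # keep the last VMX exit line
    if begin is None or end is None:
        raise ValueError("begin or end pattern not found in log")
    return (end - begin).total_seconds()
```

For example, with two hypothetical lines stamped 2022-09-09T11:46:08.100Z (containing " PowerOn") and 2022-09-09T11:46:22.944Z (containing "VMX exit"), boothalt_seconds() returns the difference between the two timestamps in seconds.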
2. Kernel Compile test:
- Configs: Single VM with different CPU and Memory configurations
                 (1vCPU_4gb, 28vCPU_64gb, 56vCPU_64gb, 84vCPU_64gb,
                 112vCPU_64gb & 126vCPU_64gb)
- Test-desc: A CPU-intensive benchmark. Measures the time taken to
                   compile the Linux kernel source (4.9.24).
- Highly affected case: Higher vCPU configs - 112vCPU_64gb (up to -10%)
- Command: make -j 2x$VCPU (twice as many parallel jobs as vCPUs).
                     This uses all the available CPU threads to
                     achieve 100% CPU utilization.
                     A timestamp is recorded in vmware.log before and
                     after compiling the source.
                     * Begin Pattern - "VMQARESULT BEGIN"
                     * End Pattern - "VMQARESULT END"
- Metric: Secs
- Performance data:
      * Immediately before commit: 21.316 secs
      * Intel retbleed/IBRS commit: 23.824 secs (absolute diff ~2.5 secs)
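
For readers who want to approximate the kernel compile measurement without the vmware.log markers, a rough in-guest sketch follows. Note that it swaps in a different timing method: it times the build with a monotonic clock inside the guest instead of the VMQARESULT BEGIN/END timestamps described above. The function names and the source directory in the usage note are hypothetical.

```python
import os
import subprocess
import time


def make_command(vcpus: int) -> list:
    """Build the 'make -j 2x$VCPU' command: two parallel jobs per vCPU."""
    return ["make", f"-j{2 * vcpus}"]


def timed_run(cmd, cwd=None, runner=subprocess.run) -> float:
    """Run `cmd` and return the elapsed wall-clock time in seconds.

    `runner` defaults to subprocess.run; it is a parameter only so the
    timing logic can be exercised without starting a real build.
    """
    start = time.monotonic()
    runner(cmd, cwd=cwd, check=True)
    return time.monotonic() - start


def compile_seconds(source_dir: str) -> float:
    """Time a kernel build in `source_dir` using all visible CPUs."""
    vcpus = os.cpu_count() or 1   # cpu_count() may return None
    return timed_run(make_command(vcpus), cwd=source_dir)
```

A call such as compile_seconds("linux-4.9.24") would time one build in an already configured source tree; the directory name is only an example.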
3. OSbench test:
- Configs: Single VM with 1vCPU_4gb config
- Test-desc: A publicly available collection of benchmarks that
                   measure the performance of operating system
                   primitives, such as process and thread creation.
                   Building the benchmarks requires a C compiler,
                   meson and ninja.
- Highly affected case: 1vCPU_4gb (up to -70%)
- Command: To run - ./create_threads 
- Metric: Milliseconds
- Performance data:
   i) create_threads
      * Immediately before commit: 16.46 msecs
      * Intel retbleed/IBRS commit: 27.97 msecs (absolute diff ~11 msecs)
   ii) create_processes
      * Immediately before commit: 69.03 msecs
      * Intel retbleed/IBRS commit: 83.20 msecs (absolute diff ~14 msecs)
ESXi Networking workloads:
- Server config: 56 threads 2 sockets Skylake with 192G memory
- Benchmark: Netperf 2.7.0
- Topology: A Linux VM on an ESXi host is connected to a bare-metal
                   Linux client over a back-to-back direct connection,
                   with no physical switch involved.
- Test-Desc: We measure bulk data transfer and request/response
                    performance using TCP and UDP protocols.
- Highly affected case: Single VM on 8vCPU with TCP_STREAM RECV,
                                     large packets (256K socket & 16K message size),
                                     up to -30%
- Netperf command: (TCP_STREAM_RECV large packets)
netperf -l 60 -H DestinationIP -p port -t TCP_STREAM -- -s 256K 
-S 256K -m 16K -M 16K
The Linux VM on the ESXi host acts as RECEIVER and the bare-metal
Linux host acts as SENDER.
We initiate netperf from Bare Metal Client Linux host and start 
netserver from Linux VM on the ESXi host with 16 parallel netperf 
- Metrics: TCP_STREAM(Cpu/Gbits, Gbps), UDP_STREAM(Kilo packets per
                second), TCP_RR(ResponseTime in microseconds)
TCP_STREAM_Throughput - Capture Throughput from netperf output file.
TCP_STREAM_CPU - Capture CPU/Gbits from Total CPU spent in all
                                 of the threads in given duration divided by 
                                 respective throughput Gbps.
UDP_STREAM Msgs - Capture from netstats & netperf out files.
TCP_RR RespTimeMean - Capture output from netperf out file.
- NIC Model used: Intel(R) Ethernet Controller XL710 for 40GbE QSFP+
- Performance data:
      * Immediately before commit: 11.932 Gbps
      * Intel retbleed/IBRS commit: 8.56 Gbps (~3.4 Gbps of throughput drop)
ESXi Storage workloads:
- Server config: 56 threads 2 sockets Skylake with 192G memory
- Benchmark: FIO v3.20
- Test-Desc: We measure how many read/write I/O operations can be
                    performed in a given period of time, the average
                    time it takes to complete an I/O and the total CPU
                    cycles spent.
- I/O  Block size: 4KiB, 64KiB & 256KiB
- Read write Ratio: 100% read, 100% write & 70/30 mixed readwrite
- Access Patterns: Random & Sequential
- # of VMs: Single VM (1VM_8vCPU) & Multi VMs(16VM_4vCPU)
- Devices under test: Local device and SAN
- Local device: Local NVMe (Intel Corporation DC P3700 SSD)
- SAN connected: QLogic QLE2692 FC-16G (connected to DELL EMC
                             PowerStore 5000T array)
- Highly affected case: 1VM-cpucost_64K_seq_7030readwrite (up to -13%)
- Throughput and latency tests are not affected.
- Command: fio --name=fio-test --ioengine=libaio --iodepth=16 --rw=rw 
         --rwmixread=70 --rwmixwrite=30 --bs=65536 --thread --direct=1 
         --numjobs=8 --group_reporting=1 --time_based --runtime=180 
         /dev/sdg:/dev/sdh:/dev/sdi --significant_figures=10
- Metrics: Throughput (IOPS), Latency (milliseconds) and Cpucost
                (CPIO - cycles per I/O).
                CPIO (an internal tool) is implemented as a Python
                script that uses the processor’s performance counters
                to arrive at the CPU cycles used in a given duration.
- Command: python3 /usr/lib/vmware/cpio/cpio.pyc -i 25 -n 5 -D all 
                    -v -d -o outputDir
                     here, 25 is the interval of collection
                     5 is the number of intervals
                     all is the device for which we intend to collect data.
- Topology: A standalone server (ESXi image) with local NVMe disks and
                   an FC-16G HBA is connected to a “DELL EMC PowerStore 5000T”
                   array for Storage I/O performance measurements.
- Performance data:
      * Immediately before commit: 269928 cycles/io
      * Intel retbleed/IBRS commit: 303937 cycles/io (absolute
                                                      diff 34009 cycles/io)
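
To relate the before/after figures reported above to the headline percentages, the relative change can be recomputed from the data points in this mail. This is a small sketch using only the numbers quoted above; the function name and the higher_is_better flag are illustrative. For time and cost metrics a larger value is worse, while for throughput a smaller value is worse.

```python
# (label, value immediately before commit, value at IBRS commit,
#  True if a higher value is better)
RESULTS = [
    ("boot-halt, 1vCPU_32gb (secs)",       14.844,   16.29,    False),
    ("kernel compile (secs)",              21.316,   23.824,   False),
    ("OSbench create_threads (msecs)",     16.46,    27.97,    False),
    ("OSbench create_processes (msecs)",   69.03,    83.20,    False),
    ("netperf TCP_STREAM RECV (Gbps)",     11.932,   8.56,     True),
    ("fio cpucost (cycles/io)",            269928.0, 303937.0, False),
]


def regression_percent(before: float, after: float, higher_is_better: bool) -> float:
    """Return how much worse `after` is than `before`, in percent of `before`."""
    if higher_is_better:
        return (before - after) / before * 100.0
    return (after - before) / before * 100.0


def print_summary() -> None:
    """Print one line per workload with its relative regression."""
    for label, before, after, higher_is_better in RESULTS:
        pct = regression_percent(before, after, higher_is_better)
        print(f"{label:36s} {before:>10} -> {after:>10}  ({pct:.1f}% worse)")
```

Running print_summary() lists each workload with its relative regression, which can be compared against the "up to" figures quoted per test.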
We believe these findings would be useful to the Linux community and
wanted to document the same.

Manikandan Jagatheesan
Performance Engineering
VMware, Inc.
