Performance Regression due to ASPM disable patch

* Performance Regression due to ASPM disable patch
       [not found] <CGME20230712155834epcas5p1140d90c8a0a181930956622728c4dd89@epcas5p1.samsung.com>
@ 2023-07-12 15:55 ` Anuj Gupta
  2023-07-13  5:59   ` Heiner Kallweit
  2023-07-13 12:37   ` Linux regression tracking #adding (Thorsten Leemhuis)
  0 siblings, 2 replies; 5+ messages in thread
From: Anuj Gupta @ 2023-07-12 15:55 UTC (permalink / raw)
  To: hkallweit1, davem
  Cc: holger, kai.heng.feng, simon.horman, nic_swsd, netdev, linux-nvme


Hi,

I see a performance regression for read/write workloads on our NVMe over
Fabrics setup, which uses TCP as the transport.
IOPS drop by 23% for 4k-randread [1] and by 18% for 4k-randwrite [2].
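
For anyone reproducing this, the percentages above are relative IOPS drops
between a kernel with the commit and one without it. A minimal sketch of
that arithmetic follows; the helper name pct_drop and the sample values are
hypothetical and are not measurements from this report.

```shell
# pct_drop BASELINE CURRENT: relative drop in percent, one decimal place.
pct_drop() {
    awk -v base="$1" -v cur="$2" \
        'BEGIN { printf "%.1f\n", (base - cur) / base * 100 }'
}

# Placeholder IOPS values chosen only to illustrate the arithmetic;
# substitute the IOPS figures that fio reports for each kernel.
pct_drop 1000 770
```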

I bisected and found that the commit
e1ed3e4d91112027b90c7ee61479141b3f948e6a ("r8169: disable ASPM during
NAPI poll") is the trigger.
When I revert this commit, the performance drop goes away.

The target machine uses a Realtek Ethernet controller:
root@testpc:/home/test# lspci | grep -i eth
29:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Device 2600 (rev 21)
2a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. Killer E3000 2.5GbE Controller (rev 03)

I tried disabling ASPM by passing "pcie_aspm=off" as a boot parameter and
by setting the PCIe ASPM policy to performance, but neither improved the
performance.
Is this already known, and should the original issue be handled
differently?
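
As a sketch of how the ASPM policy setting can be verified: the kernel lists
the available policies in /sys/module/pcie_aspm/parameters/policy and marks
the active one with square brackets. The helper name active_aspm_policy is
hypothetical, and the device address in the comment is taken from the lspci
listing above on the assumption that this port carries the NVMe/TCP traffic.

```shell
# Print the active policy from a policy string such as
# "default performance [powersave] powersupersave".
active_aspm_policy() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

policy_file=/sys/module/pcie_aspm/parameters/policy
if [ -r "$policy_file" ]; then
    echo "active ASPM policy: $(active_aspm_policy "$(cat "$policy_file")")"
fi

# The per-device link state can be checked as root; the LnkCtl line
# reports whether ASPM is disabled or which states are enabled
# (device address assumed, see above):
#   lspci -vv -s 2a:00.0 | grep -i aspm
```

To select the performance policy at runtime, write the policy name to the
same file as root, for example:
echo performance > /sys/module/pcie_aspm/parameters/policy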

[1] fio randread
fio -direct=1 -iodepth=1 -rw=randread -ioengine=psync -bs=4k -numjobs=1 \
    -runtime=30 -group_reporting -filename=/dev/nvme1n1 -name=psync_read \
    -output=psync_read
[2] fio randwrite
fio -direct=1 -iodepth=1 -rw=randwrite -ioengine=psync -bs=4k -numjobs=1 \
    -runtime=30 -group_reporting -filename=/dev/nvme1n1 -name=psync_read \
    -output=psync_write
