* Scrub of my nvme SSD has slowed by about 2/3
@ 2023-07-03 20:19 Tim Cuthbertson
  2023-07-03 23:49 ` Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Tim Cuthbertson @ 2023-07-03 20:19 UTC (permalink / raw)
  To: linux-btrfs

Yesterday, I noticed that a scrub of my main system filesystem has
slowed from about 2.9 gb/sec to about 949 mb/sec. My scrub used to run
in about 12 seconds, now it is taking 51 seconds. I had just installed
Linux kernel 6.4.1 on Arch Linux, upgrading from 6.3.9. At first I
suspected the new kernel, but now I am not so sure.

I have btrfs-progs v 6.3.2-1. It was last upgraded on June 23.

Here are the results of a recent scrub:

btrfs scrub status /mnt/nvme0n1p3/
UUID:             20db1fe2-60a4-4eb7-87ac-1953a55dda16
Scrub started:    Sun Jul  2 19:19:53 2023
Status:           finished
Duration:         0:00:51
Total to scrub:   47.28GiB
Rate:             948.61MiB/s
Error summary:    no errors found

Here is hdparm performance output of the drive:

/dev/nvme0n1:
 Timing O_DIRECT cached reads:   3744 MB in  2.00 seconds = 1871.94 MB/sec
 Timing O_DIRECT disk reads: 9180 MB in  3.00 seconds = 3059.63 MB/sec

Here is an attempt at describing my system:
inxi -F
System:

  Host: tux Kernel: 6.4.1-arch1-1 arch: x86_64 bits: 64 Console: pty
pts/2 Distro: Arch Linux
Machine:
  Type: Desktop Mobo: ASUSTeK model: TUF GAMING X570-PLUS (WI-FI) v: Rev X.0x
    serial: 200771405807421 UEFI: American Megatrends v: 4602 date: 02/23/2023
CPU:
  Info: 12-core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP cache: L2: 6 MiB
  Speed (MHz): avg: 2666 min/max: 2200/4672 cores: 1: 3800 2: 2200 3:
2200 4: 2200 5: 2200
    6: 3800 7: 2200 8: 3800 9: 2200 10: 2200 11: 3800 12: 2200 13:
3800 14: 2200 15: 2200 16: 2200
    17: 2200 18: 2200 19: 2200 20: 2200 21: 3800 22: 2200 23: 2200 24: 3800
Graphics:
  Device-1: NVIDIA TU104 [GeForce RTX 2060] driver: nvidia v: 535.54.03
  Display: server: X.org v: 1.21.1.8 driver: X: loaded: nvidia
unloaded: modesetting gpu: nvidia
    tty: 273x63
  API: OpenGL Message: GL data unavailable in console and glxinfo missing.
Audio:
  Device-1: NVIDIA TU104 HD Audio driver: snd_hda_intel
  Device-2: AMD Starship/Matisse HD Audio driver: snd_hda_intel
  API: ALSA v: k6.4.1-arch1-1 status: kernel-api
Network:
  Device-1: Intel Wireless-AC 9260 driver: iwlwifi
  IF: wlan0 state: up mac: cc:d9:ac:3a:b4:9d
  Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169
  IF: enp5s0 state: down mac: 24:4b:fe:96:38:f9
Bluetooth:
  Device-1: N/A driver: btusb type: USB
  Report: rfkill ID: hci0 rfk-id: 0 state: down bt-service: disabled
rfk-block: hardware: no
    software: no address: see --recommends
Drives:
  Local Storage: total: 7.73 TiB used: 378.62 GiB (4.8%)
  ID-1: /dev/nvme0n1 vendor: Western Digital model: WDBRPG0010BNC-WRSN
size: 931.51 GiB
  ID-2: /dev/sda vendor: Samsung model: SSD 860 EVO 500GB size: 465.76 GiB
  ID-3: /dev/sdb vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB
  ID-4: /dev/sdc vendor: Western Digital model: WD50NDZW-11BGSS1 size:
4.55 TiB type: USB
Partition:
  ID-1: / size: 915.26 GiB used: 47.37 GiB (5.2%) fs: btrfs dev: /dev/nvme0n1p3
  ID-2: /boot size: 252 MiB used: 92.1 MiB (36.5%) fs: vfat dev: /dev/nvme0n1p1
  ID-3: /home size: 915.26 GiB used: 47.37 GiB (5.2%) fs: btrfs dev:
/dev/nvme0n1p3
Swap:
  ID-1: swap-1 type: partition size: 16 GiB used: 0 KiB (0.0%) dev:
/dev/nvme0n1p2
Sensors:
  System Temperatures: cpu: 27.5 C mobo: 26.0 C gpu: nvidia temp: 32 C
  Fan Speeds (RPM): fan-1: 847 fan-2: 1074 fan-3: 0 fan-4: 0 fan-5:
1002 fan-6: 0 fan-7: 782
Info:
  Processes: 407 Uptime: 23m Memory: available: 31.25 GiB used: 1.54
GiB (4.9%) Init: systemd
  Shell: Bash inxi: 3.3.27

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-03 20:19 Scrub of my nvme SSD has slowed by about 2/3 Tim Cuthbertson
@ 2023-07-03 23:49 ` Qu Wenruo
  2023-07-05  2:44   ` Qu Wenruo
  2023-07-11  5:33 ` Martin Steigerwald
  2023-07-12 11:02 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2 siblings, 1 reply; 30+ messages in thread
From: Qu Wenruo @ 2023-07-03 23:49 UTC (permalink / raw)
  To: Tim Cuthbertson, linux-btrfs



On 2023/7/4 04:19, Tim Cuthbertson wrote:
> Yesterday, I noticed that a scrub of my main system filesystem has
> slowed from about 2.9 gb/sec to about 949 mb/sec. My scrub used to run
> in about 12 seconds, now it is taking 51 seconds. I had just installed
> Linux kernel 6.4.1 on Arch Linux, upgrading from 6.3.9. At first I
> suspected the new kernel, but now I am not so sure.

Well, the v6.4 kernel has introduced a new scrub implementation, which
handles IO in a completely different way.

In my initial tests, the new scrub should lead to fewer IOPS with higher
throughput.
But it doesn't look good at all in your case.

Have you tried rolling the kernel back to 6.3.x and re-testing?

One of the behavior changes is in how csums are verified.
Previously csums were verified with one thread per sector (4K block),
but now it's one thread per stripe (64K block), i.e. a much larger block
size to reduce IOPS. (A 64K stripe holds 16 4K sectors, so up to 16
parallel verifications become a single-threaded one.)

All the changes should lead to better performance on slower disks, but
with your blazing fast devices the csum verification may become a
bottleneck instead.

If that's really the case, would you mind also monitoring your CPU usage
during scrub and comparing it between the v6.4 and v6.3 kernels?
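One way to do that: run the scrub in the foreground with
"btrfs scrub start -B /mnt/nvme0n1p3" and watch the btrfs kworker
threads in top or htop on both kernels.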

Thanks,
Qu
>
> I have btrfs-progs v 6.3.2-1. It was last upgraded on June 23.
>
> Here are the results of a recent scrub:
>
> btrfs scrub status /mnt/nvme0n1p3/
> UUID:             20db1fe2-60a4-4eb7-87ac-1953a55dda16
> Scrub started:    Sun Jul  2 19:19:53 2023
> Status:           finished
> Duration:         0:00:51
> Total to scrub:   47.28GiB
> Rate:             948.61MiB/s
> Error summary:    no errors found
>
> Here is hdparm performance output of the drive:
>
> /dev/nvme0n1:
>   Timing O_DIRECT cached reads:   3744 MB in  2.00 seconds = 1871.94 MB/sec
>   Timing O_DIRECT disk reads: 9180 MB in  3.00 seconds = 3059.63 MB/sec
>
> Here is an attempt at describing my system:
> inxi -F
> System:
>
>    Host: tux Kernel: 6.4.1-arch1-1 arch: x86_64 bits: 64 Console: pty
> pts/2 Distro: Arch Linux
> Machine:
>    Type: Desktop Mobo: ASUSTeK model: TUF GAMING X570-PLUS (WI-FI) v: Rev X.0x
>      serial: 200771405807421 UEFI: American Megatrends v: 4602 date: 02/23/2023
> CPU:
>    Info: 12-core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP cache: L2: 6 MiB
>    Speed (MHz): avg: 2666 min/max: 2200/4672 cores: 1: 3800 2: 2200 3:
> 2200 4: 2200 5: 2200
>      6: 3800 7: 2200 8: 3800 9: 2200 10: 2200 11: 3800 12: 2200 13:
> 3800 14: 2200 15: 2200 16: 2200
>      17: 2200 18: 2200 19: 2200 20: 2200 21: 3800 22: 2200 23: 2200 24: 3800
> Graphics:
>    Device-1: NVIDIA TU104 [GeForce RTX 2060] driver: nvidia v: 535.54.03
>    Display: server: X.org v: 1.21.1.8 driver: X: loaded: nvidia
> unloaded: modesetting gpu: nvidia
>      tty: 273x63
>    API: OpenGL Message: GL data unavailable in console and glxinfo missing.
> Audio:
>    Device-1: NVIDIA TU104 HD Audio driver: snd_hda_intel
>    Device-2: AMD Starship/Matisse HD Audio driver: snd_hda_intel
>    API: ALSA v: k6.4.1-arch1-1 status: kernel-api
> Network:
>    Device-1: Intel Wireless-AC 9260 driver: iwlwifi
>    IF: wlan0 state: up mac: cc:d9:ac:3a:b4:9d
>    Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169
>    IF: enp5s0 state: down mac: 24:4b:fe:96:38:f9
> Bluetooth:
>    Device-1: N/A driver: btusb type: USB
>    Report: rfkill ID: hci0 rfk-id: 0 state: down bt-service: disabled
> rfk-block: hardware: no
>      software: no address: see --recommends
> Drives:
>    Local Storage: total: 7.73 TiB used: 378.62 GiB (4.8%)
>    ID-1: /dev/nvme0n1 vendor: Western Digital model: WDBRPG0010BNC-WRSN
> size: 931.51 GiB
>    ID-2: /dev/sda vendor: Samsung model: SSD 860 EVO 500GB size: 465.76 GiB
>    ID-3: /dev/sdb vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB
>    ID-4: /dev/sdc vendor: Western Digital model: WD50NDZW-11BGSS1 size:
> 4.55 TiB type: USB
> Partition:
>    ID-1: / size: 915.26 GiB used: 47.37 GiB (5.2%) fs: btrfs dev: /dev/nvme0n1p3
>    ID-2: /boot size: 252 MiB used: 92.1 MiB (36.5%) fs: vfat dev: /dev/nvme0n1p1
>    ID-3: /home size: 915.26 GiB used: 47.37 GiB (5.2%) fs: btrfs dev:
> /dev/nvme0n1p3
> Swap:
>    ID-1: swap-1 type: partition size: 16 GiB used: 0 KiB (0.0%) dev:
> /dev/nvme0n1p2
> Sensors:
>    System Temperatures: cpu: 27.5 C mobo: 26.0 C gpu: nvidia temp: 32 C
>    Fan Speeds (RPM): fan-1: 847 fan-2: 1074 fan-3: 0 fan-4: 0 fan-5:
> 1002 fan-6: 0 fan-7: 782
> Info:
>    Processes: 407 Uptime: 23m Memory: available: 31.25 GiB used: 1.54
> GiB (4.9%) Init: systemd
>    Shell: Bash inxi: 3.3.27

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-03 23:49 ` Qu Wenruo
@ 2023-07-05  2:44   ` Qu Wenruo
  2023-07-11  5:36     ` Martin Steigerwald
  0 siblings, 1 reply; 30+ messages in thread
From: Qu Wenruo @ 2023-07-05  2:44 UTC (permalink / raw)
  To: Tim Cuthbertson, linux-btrfs



On 2023/7/4 07:49, Qu Wenruo wrote:
>
>
> On 2023/7/4 04:19, Tim Cuthbertson wrote:
>> Yesterday, I noticed that a scrub of my main system filesystem has
>> slowed from about 2.9 gb/sec to about 949 mb/sec. My scrub used to run
>> in about 12 seconds, now it is taking 51 seconds. I had just installed
>> Linux kernel 6.4.1 on Arch Linux, upgrading from 6.3.9. At first I
>> suspected the new kernel, but now I am not so sure.
>
> Well, the v6.4 kernel has introduced a new scrub implementation, which
> has a completely different way of handling IOs.

In fact, I'm considering going back to the old way of checksum
verification, one thread per block, to address the performance problem.

If you're fine with compiling a custom kernel, I can craft a branch for
you to test.

My educated guess is that the performance regression is caused by this
behavior change.

Currently we read one stripe and do the verification in a single thread.
Your disk can read at 3GB/s, while your CPU can only do CRC32
verification at around 2GB/s.

In that case csum verification is the bottleneck, resulting in a
theoretical performance of around 1.2GB/s, close to your observed
performance.

But if we allow multi-threaded verification we can use 16 threads, which
results in something around 2.7GB/s, and that also matches your
observation on the old kernel.
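
Treating read and csum verification as a serial pipeline with those
assumed speeds, the back-of-the-envelope numbers are:

	1 thread:	1 / (1/3 + 1/2)       ≈ 1.2 GB/s
	16 threads:	1 / (1/3 + 1/(16*2))  ≈ 2.7 GB/s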

Thanks,
Qu
>
> In my initial tests, the new scrub should lead to less IOPS while higher
> throughput.
> But it doesn't look good at all for your case.
>
> Have you tried to roll the kernel back to 6.3.x and re-test?
>
> One of the new behavior change is in how csum is verified.
> Previously the csum is verified one thread per-sector (4K block), but
> now it's changed to one thread per stripe (64K block).
> But with a much larger block size to reduce IOPS.
>
> All the changes should lead to a better performance on slower disks, but
> with your blazing fast devices, the csum verification may be a
> bottleneck instead.
>
> If it's really the case, mind to also monitor your CPU usage during
> scrub and compare the CPU usage between v6.4 and v6.3 kernels?
>
> Thanks,
> Qu
>>
>> I have btrfs-progs v 6.3.2-1. It was last upgraded on June 23.
>>
>> Here are the results of a recent scrub:
>>
>> btrfs scrub status /mnt/nvme0n1p3/
>> UUID:             20db1fe2-60a4-4eb7-87ac-1953a55dda16
>> Scrub started:    Sun Jul  2 19:19:53 2023
>> Status:           finished
>> Duration:         0:00:51
>> Total to scrub:   47.28GiB
>> Rate:             948.61MiB/s
>> Error summary:    no errors found
>>
>> Here is hdparm performance output of the drive:
>>
>> /dev/nvme0n1:
>>   Timing O_DIRECT cached reads:   3744 MB in  2.00 seconds = 1871.94
>> MB/sec
>>   Timing O_DIRECT disk reads: 9180 MB in  3.00 seconds = 3059.63 MB/sec
>>
>> Here is an attempt at describing my system:
>> inxi -F
>> System:
>>
>>    Host: tux Kernel: 6.4.1-arch1-1 arch: x86_64 bits: 64 Console: pty
>> pts/2 Distro: Arch Linux
>> Machine:
>>    Type: Desktop Mobo: ASUSTeK model: TUF GAMING X570-PLUS (WI-FI) v:
>> Rev X.0x
>>      serial: 200771405807421 UEFI: American Megatrends v: 4602 date:
>> 02/23/2023
>> CPU:
>>    Info: 12-core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP cache:
>> L2: 6 MiB
>>    Speed (MHz): avg: 2666 min/max: 2200/4672 cores: 1: 3800 2: 2200 3:
>> 2200 4: 2200 5: 2200
>>      6: 3800 7: 2200 8: 3800 9: 2200 10: 2200 11: 3800 12: 2200 13:
>> 3800 14: 2200 15: 2200 16: 2200
>>      17: 2200 18: 2200 19: 2200 20: 2200 21: 3800 22: 2200 23: 2200
>> 24: 3800
>> Graphics:
>>    Device-1: NVIDIA TU104 [GeForce RTX 2060] driver: nvidia v: 535.54.03
>>    Display: server: X.org v: 1.21.1.8 driver: X: loaded: nvidia
>> unloaded: modesetting gpu: nvidia
>>      tty: 273x63
>>    API: OpenGL Message: GL data unavailable in console and glxinfo
>> missing.
>> Audio:
>>    Device-1: NVIDIA TU104 HD Audio driver: snd_hda_intel
>>    Device-2: AMD Starship/Matisse HD Audio driver: snd_hda_intel
>>    API: ALSA v: k6.4.1-arch1-1 status: kernel-api
>> Network:
>>    Device-1: Intel Wireless-AC 9260 driver: iwlwifi
>>    IF: wlan0 state: up mac: cc:d9:ac:3a:b4:9d
>>    Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
>> driver: r8169
>>    IF: enp5s0 state: down mac: 24:4b:fe:96:38:f9
>> Bluetooth:
>>    Device-1: N/A driver: btusb type: USB
>>    Report: rfkill ID: hci0 rfk-id: 0 state: down bt-service: disabled
>> rfk-block: hardware: no
>>      software: no address: see --recommends
>> Drives:
>>    Local Storage: total: 7.73 TiB used: 378.62 GiB (4.8%)
>>    ID-1: /dev/nvme0n1 vendor: Western Digital model: WDBRPG0010BNC-WRSN
>> size: 931.51 GiB
>>    ID-2: /dev/sda vendor: Samsung model: SSD 860 EVO 500GB size:
>> 465.76 GiB
>>    ID-3: /dev/sdb vendor: Seagate model: ST2000DM008-2FR102 size: 1.82
>> TiB
>>    ID-4: /dev/sdc vendor: Western Digital model: WD50NDZW-11BGSS1 size:
>> 4.55 TiB type: USB
>> Partition:
>>    ID-1: / size: 915.26 GiB used: 47.37 GiB (5.2%) fs: btrfs dev:
>> /dev/nvme0n1p3
>>    ID-2: /boot size: 252 MiB used: 92.1 MiB (36.5%) fs: vfat dev:
>> /dev/nvme0n1p1
>>    ID-3: /home size: 915.26 GiB used: 47.37 GiB (5.2%) fs: btrfs dev:
>> /dev/nvme0n1p3
>> Swap:
>>    ID-1: swap-1 type: partition size: 16 GiB used: 0 KiB (0.0%) dev:
>> /dev/nvme0n1p2
>> Sensors:
>>    System Temperatures: cpu: 27.5 C mobo: 26.0 C gpu: nvidia temp: 32 C
>>    Fan Speeds (RPM): fan-1: 847 fan-2: 1074 fan-3: 0 fan-4: 0 fan-5:
>> 1002 fan-6: 0 fan-7: 782
>> Info:
>>    Processes: 407 Uptime: 23m Memory: available: 31.25 GiB used: 1.54
>> GiB (4.9%) Init: systemd
>>    Shell: Bash inxi: 3.3.27

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-03 20:19 Scrub of my nvme SSD has slowed by about 2/3 Tim Cuthbertson
  2023-07-03 23:49 ` Qu Wenruo
@ 2023-07-11  5:33 ` Martin Steigerwald
  2023-07-11  5:49   ` Martin Steigerwald
  2023-07-12 11:02 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2 siblings, 1 reply; 30+ messages in thread
From: Martin Steigerwald @ 2023-07-11  5:33 UTC (permalink / raw)
  To: linux-btrfs, Tim Cuthbertson

Tim Cuthbertson - 03.07.23, 22:19:50 CEST:
> Yesterday, I noticed that a scrub of my main system filesystem has
> slowed from about 2.9 gb/sec to about 949 mb/sec. My scrub used to run
> in about 12 seconds, now it is taking 51 seconds. I had just
> installed Linux kernel 6.4.1 on Arch Linux, upgrading from 6.3.9. At
> first I suspected the new kernel, but now I am not so sure.
> 
> I have btrfs-progs v 6.3.2-1. It was last upgraded on June 23.

I can confirm this with similar values.

v6.3 was fine, with scrub speeds from 1.8 to 2.6 GiB/s; v6.4 only
reaches a bit less than 1 GiB/s.

atop shows 100% utilization of the NVMe SSD, which is odd at less than
1 GiB/s of sequential I/O, plus a lot of kworker threads using about
200-300% CPU in system time.

This is on a ThinkPad T14 AMD Gen 1 with an AMD Ryzen 7 PRO 4750U and
32 GiB RAM, with a Samsung 980 Pro 2TB NVMe SSD connected via PCIe 3.
The hardware can definitely do more throughput even with "just" PCIe 3.
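Even assuming just an x4 link, PCIe 3.0 gives 4 lanes x 8 GT/s with
128b/130b encoding, i.e. roughly 3.9 GB/s of raw bandwidth, so the link
is nowhere near saturated at ~1 GiB/s.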

-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-05  2:44   ` Qu Wenruo
@ 2023-07-11  5:36     ` Martin Steigerwald
  0 siblings, 0 replies; 30+ messages in thread
From: Martin Steigerwald @ 2023-07-11  5:36 UTC (permalink / raw)
  To: Tim Cuthbertson, linux-btrfs, Qu Wenruo

Qu Wenruo - 05.07.23, 04:44:02 CEST:
> If you're fine compiling a custom kernel, I can craft a branch for you
> to test.

I can compile a custom kernel. Actually, my findings are with a
custom-compiled kernel from either

git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

or

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

I can also confirm the faster speeds with Debian kernels <= 6.3. No
Debian 6.4 kernel is installed yet.

Thanks,
-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-11  5:33 ` Martin Steigerwald
@ 2023-07-11  5:49   ` Martin Steigerwald
  2023-07-11  5:52     ` Martin Steigerwald
  0 siblings, 1 reply; 30+ messages in thread
From: Martin Steigerwald @ 2023-07-11  5:49 UTC (permalink / raw)
  To: linux-btrfs, Tim Cuthbertson

Martin Steigerwald - 11.07.23, 07:33:36 CEST:
> This is with ThinkPad T14 AMD Gen 1 with AMD Ryzen 7 PRO 4750U and 32
> GiB RAM on Samsung 980 Pro 2TB NVME SSD connected via PCIe 3. The
> hardware can definitely do more throughput even with "just" PCIe 3.

I forgot to add that BTRFS is on top of LVM inside LUKS.

I see about 180000 reads in 10 seconds in atop. I have seen latency
values from 55 to 85 µs, which is highly unusual for an NVMe SSD ("avio"
in atop¹).

[1] according to man page atop(1) from atop 2.9:

the average number of milliseconds needed by a request ('avio') for 
seek, latency and data transfer

-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-11  5:49   ` Martin Steigerwald
@ 2023-07-11  5:52     ` Martin Steigerwald
  2023-07-11  8:59       ` Qu Wenruo
  0 siblings, 1 reply; 30+ messages in thread
From: Martin Steigerwald @ 2023-07-11  5:52 UTC (permalink / raw)
  To: linux-btrfs, Tim Cuthbertson

Martin Steigerwald - 11.07.23, 07:49:43 CEST:
> I see about 180000 reads in 10 seconds in atop. I have seen latency
> values from 55 to 85 µs which is highly unusual for NVME SSD ("avio"
> in atop¹).

Well, I did not compare against a baseline scrub with 6.3, so I am not
actually sure about the "unusual" bit. But at least during daily
activity I do not see those values.

Anyway, I am willing to test a patch.

> [1] according to man page atop(1) from atop 2.9:
> 
> the average number of milliseconds needed by a request ('avio') for
> seek, latency and data transfer
-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-11  5:52     ` Martin Steigerwald
@ 2023-07-11  8:59       ` Qu Wenruo
  2023-07-11  9:25         ` Martin Steigerwald
  0 siblings, 1 reply; 30+ messages in thread
From: Qu Wenruo @ 2023-07-11  8:59 UTC (permalink / raw)
  To: Martin Steigerwald, linux-btrfs, Tim Cuthbertson



On 2023/7/11 13:52, Martin Steigerwald wrote:
> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
>> I see about 180000 reads in 10 seconds in atop. I have seen latency
>> values from 55 to 85 µs which is highly unusual for NVME SSD ("avio"
>> in atop¹).
>
> Well I did not compare to a base line during scrub with 6.3. So not
> actually sure about the unusual bit. But at least during daily activity
> I do not see those values.
>
> Anyway, I am willing to test a patch.

Mind to try the following branch?

https://github.com/adam900710/linux/tree/scrub_multi_thread

Or you can grab the commit on top and backport to any kernel >= 6.4.
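
A minimal way to do that, assuming a kernel tree that already has the
v6.4 tag (the remote name, base tag and local branch name below are just
examples):

  git remote add adam https://github.com/adam900710/linux.git
  git fetch adam scrub_multi_thread
  git checkout -b scrub-test v6.4      # or any base >= 6.4
  git cherry-pick FETCH_HEAD           # the commit on top of the branch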

Thanks,
Qu
>
>> [1] according to man page atop(1) from atop 2.9:
>>
>> the average number of milliseconds needed by a request ('avio') for
>> seek, latency and data transfer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-11  8:59       ` Qu Wenruo
@ 2023-07-11  9:25         ` Martin Steigerwald
  2023-07-11  9:57           ` Qu Wenruo
  0 siblings, 1 reply; 30+ messages in thread
From: Martin Steigerwald @ 2023-07-11  9:25 UTC (permalink / raw)
  To: linux-btrfs, Tim Cuthbertson, Qu Wenruo

Qu Wenruo - 11.07.23, 10:59:55 CEST:
> On 2023/7/11 13:52, Martin Steigerwald wrote:
> > Martin Steigerwald - 11.07.23, 07:49:43 CEST:
> >> I see about 180000 reads in 10 seconds in atop. I have seen latency
> >> values from 55 to 85 µs which is highly unusual for NVME SSD
> >> ("avio"
> >> in atop¹).
> > 
> > Well I did not compare to a base line during scrub with 6.3. So not
> > actually sure about the unusual bit. But at least during daily
> > activity I do not see those values.
> > 
> > Anyway, I am willing to test a patch.
> 
> Mind to try the following branch?
> 
> https://github.com/adam900710/linux/tree/scrub_multi_thread
> 
> Or you can grab the commit on top and backport to any kernel >= 6.4.

Cherry-picking the commit on top of v6.4.3 led to a merge conflict.
Since this is a production machine and I am no kernel developer with
insight into the inner workings of BTRFS, I'd prefer a patch that
applies cleanly on top of v6.4.3. I'd rather not try out a tree unless I
know it's a stable kernel version, or at least rc3/4 or later. Again,
this is a production machine.

You know, I prefer to keep my data :)

> Thanks,
> Qu
> 
> >> [1] according to man page atop(1) from atop 2.9:
> >> 
> >> the average number of milliseconds needed by a request ('avio') for
> >> seek, latency and data transfer


-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-11  9:25         ` Martin Steigerwald
@ 2023-07-11  9:57           ` Qu Wenruo
  2023-07-11 10:56             ` Martin Steigerwald
  0 siblings, 1 reply; 30+ messages in thread
From: Qu Wenruo @ 2023-07-11  9:57 UTC (permalink / raw)
  To: Martin Steigerwald, linux-btrfs, Tim Cuthbertson, Qu Wenruo

[-- Attachment #1: Type: text/plain, Size: 1696 bytes --]



On 2023/7/11 17:25, Martin Steigerwald wrote:
> Qu Wenruo - 11.07.23, 10:59:55 CEST:
>> On 2023/7/11 13:52, Martin Steigerwald wrote:
>>> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
>>>> I see about 180000 reads in 10 seconds in atop. I have seen latency
>>>> values from 55 to 85 µs which is highly unusual for NVME SSD
>>>> ("avio"
>>>> in atop¹).
>>>
>>> Well I did not compare to a base line during scrub with 6.3. So not
>>> actually sure about the unusual bit. But at least during daily
>>> activity I do not see those values.
>>>
>>> Anyway, I am willing to test a patch.
>>
>> Mind to try the following branch?
>>
>> https://github.com/adam900710/linux/tree/scrub_multi_thread
>>
>> Or you can grab the commit on top and backport to any kernel >= 6.4.
> 
> Cherry picking the commit on top of v6.4.3 lead to a merge conflict.
> Since this is a production machine and I am no kernel developer with
> insight to the inner workings of BTRFS, I'd prefer a patch that applies
> cleanly on top of v6.4.3. I'd rather not try out a tree, unless I know
> its a stable kernel version or at least rc3/4 or later. Again this is a
> production machine.

Well, I have only tested that patch on that development branch, thus I
cannot guarantee the result of the backport.

But still, here is the backported patch.

I'd recommend testing scrub functionality on some less important
machine first, before your production laptop, though.

Thanks,
Qu
> 
> You know, I prefer to keep my data :)
> 
>> Thanks,
>> Qu
>>
>>>> [1] according to man page atop(1) from atop 2.9:
>>>>
>>>> the average number of milliseconds needed by a request ('avio') for
>>>> seek, latency and data transfer
> 
> 

[-- Attachment #2: 0001-btrfs-speedup-scrub-csum-verification.patch --]
[-- Type: text/x-patch, Size: 6209 bytes --]

From 7b5d9f4c59cb071f92ffa5af06827c58eaf4030e Mon Sep 17 00:00:00 2001
Message-ID: <7b5d9f4c59cb071f92ffa5af06827c58eaf4030e.1689069371.git.wqu@suse.com>
From: Qu Wenruo <wqu@suse.com>
Date: Wed, 5 Jul 2023 14:02:01 +0800
Subject: [PATCH] btrfs: speedup scrub csum verification

[REGRESSION]
There is a report that scrub is much slower on the v6.4 kernel on fast
NVMe devices.

The system has an NVMe device which can reach over 3GBytes/s, but scrub
speed is below 1GBytes/s.

[CAUSE]
Since commit e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to
scrub_stripe infrastructure") scrub uses a completely new
implementation.

There is a behavior change: previously scrub did csum verification in a
one-thread-per-block way, but the new code works one thread per stripe.

This means that in the worst case the new code has only one thread
verifying a whole 64K stripe filled with data, while the old code used
up to 16 threads to handle the same stripe.

Considering the reporter's CPU can only do CRC32C at around 2GBytes/s,
while the NVME drive can do 3GBytes/s, the difference can be big:

	1 thread:	1 / ( 1 / 3 + 1 / 2)     = 1.2 Gbytes/s
	8 threads: 	1 / ( 1 / 3 + 1 / 8 / 2) = 2.5 Gbytes/s

[FIX]
To fix the performance regression, this patch would introduce
multi-thread csum verification by:

- Introduce a new workqueue for scrub csum verification
  The new workqueue is needed as we can not queue the same csum work
  into the main scrub worker, where we are waiting for the csum work
  to finish.
  Or this can lead to dead lock if there is no more worker allocated.

- Add extra members to scrub_sector_verification
  This allows a work to be queued for the specific sector.
  Although this means we will have 20 bytes overhead per sector.

- Queue sector verification work into scrub_csum_worker
  This allows multiple threads to handle the csum verification workload.

- Do not reset stripe->sectors during scrub_find_fill_first_stripe()
  Since sectors now contain extra info, we should not touch those
  members.

Reported-by: Bernd Lentes <bernd.lentes@helmholtz-muenchen.de>
Link: https://lore.kernel.org/linux-btrfs/CAAKzf7=yS9vnf5zNid1CyvN19wyAgPz5o9sJP0vBqN6LReqXVg@mail.gmail.com/
Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/scrub.c | 49 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 41 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 16c228344cbb..3577b8d927b2 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -72,6 +72,11 @@ struct scrub_sector_verification {
 		 */
 		u64 generation;
 	};
+
+	/* For multi-thread verification. */
+	struct scrub_stripe *stripe;
+	struct work_struct work;
+	unsigned int sector_nr;
 };
 
 enum scrub_stripe_flags {
@@ -259,6 +264,12 @@ static int init_scrub_stripe(struct btrfs_fs_info *fs_info,
 				  GFP_KERNEL);
 	if (!stripe->sectors)
 		goto error;
+	for (int i = 0; i < stripe->nr_sectors; i++) {
+		struct scrub_sector_verification *sector = &stripe->sectors[i];
+
+		sector->stripe = stripe;
+		sector->sector_nr = i;
+	}
 
 	stripe->csums = kcalloc(BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits,
 				fs_info->csum_size, GFP_KERNEL);
@@ -730,11 +741,11 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
 
 	/* Sector not utilized, skip it. */
 	if (!test_bit(sector_nr, &stripe->extent_sector_bitmap))
-		return;
+		goto out;
 
 	/* IO error, no need to check. */
 	if (test_bit(sector_nr, &stripe->io_error_bitmap))
-		return;
+		goto out;
 
 	/* Metadata, verify the full tree block. */
 	if (sector->is_metadata) {
@@ -752,10 +763,10 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
 				      stripe->logical +
 				      (sector_nr << fs_info->sectorsize_bits),
 				      stripe->logical);
-			return;
+			goto out;
 		}
 		scrub_verify_one_metadata(stripe, sector_nr);
-		return;
+		goto out;
 	}
 
 	/*
@@ -764,7 +775,7 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
 	 */
 	if (!sector->csum) {
 		clear_bit(sector_nr, &stripe->error_bitmap);
-		return;
+		goto out;
 	}
 
 	ret = btrfs_check_sector_csum(fs_info, page, pgoff, csum_buf, sector->csum);
@@ -775,6 +786,17 @@ static void scrub_verify_one_sector(struct scrub_stripe *stripe, int sector_nr)
 		clear_bit(sector_nr, &stripe->csum_error_bitmap);
 		clear_bit(sector_nr, &stripe->error_bitmap);
 	}
+out:
+	atomic_dec(&stripe->pending_io);
+	wake_up(&stripe->io_wait);
+}
+
+static void scrub_verify_work(struct work_struct *work)
+{
+	struct scrub_sector_verification *sector = container_of(work,
+			struct scrub_sector_verification, work);
+
+	scrub_verify_one_sector(sector->stripe, sector->sector_nr);
 }
 
 /* Verify specified sectors of a stripe. */
@@ -784,11 +806,24 @@ static void scrub_verify_one_stripe(struct scrub_stripe *stripe, unsigned long b
 	const u32 sectors_per_tree = fs_info->nodesize >> fs_info->sectorsize_bits;
 	int sector_nr;
 
+	/* All IO should have finished, and we will reuse pending_io soon. */
+	ASSERT(atomic_read(&stripe->pending_io) == 0);
+
 	for_each_set_bit(sector_nr, &bitmap, stripe->nr_sectors) {
-		scrub_verify_one_sector(stripe, sector_nr);
+		struct scrub_sector_verification *sector = &stripe->sectors[sector_nr];
+
+		/* The sector should have been initialized. */
+		ASSERT(sector->sector_nr == sector_nr);
+		ASSERT(sector->stripe == stripe);
+
+		atomic_inc(&stripe->pending_io);
+		INIT_WORK(&sector->work, scrub_verify_work);
+		queue_work(fs_info->scrub_wr_completion_workers, &sector->work);
+
 		if (stripe->sectors[sector_nr].is_metadata)
 			sector_nr += sectors_per_tree - 1;
 	}
+	wait_scrub_stripe_io(stripe);
 }
 
 static int calc_sector_number(struct scrub_stripe *stripe, struct bio_vec *first_bvec)
@@ -1534,8 +1569,6 @@ static int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
 	u64 extent_gen;
 	int ret;
 
-	memset(stripe->sectors, 0, sizeof(struct scrub_sector_verification) *
-				   stripe->nr_sectors);
 	scrub_stripe_reset_bitmaps(stripe);
 
 	/* The range must be inside the bg. */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-11  9:57           ` Qu Wenruo
@ 2023-07-11 10:56             ` Martin Steigerwald
  2023-07-11 11:05               ` Qu Wenruo
  0 siblings, 1 reply; 30+ messages in thread
From: Martin Steigerwald @ 2023-07-11 10:56 UTC (permalink / raw)
  To: linux-btrfs, Tim Cuthbertson, Qu Wenruo, Qu Wenruo

Qu Wenruo - 11.07.23, 11:57:52 CEST:
> On 2023/7/11 17:25, Martin Steigerwald wrote:
> > Qu Wenruo - 11.07.23, 10:59:55 CEST:
> >> On 2023/7/11 13:52, Martin Steigerwald wrote:
> >>> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
> >>>> I see about 180000 reads in 10 seconds in atop. I have seen
> >>>> latency
> >>>> values from 55 to 85 µs which is highly unusual for NVME SSD
> >>>> ("avio"
> >>>> in atop¹).
> >>> 
> >>> Well I did not compare to a base line during scrub with 6.3. So
> >>> not actually sure about the unusual bit. But at least during daily
> >>> activity I do not see those values.
> >>> 
> >>> Anyway, I am willing to test a patch.
> >> 
> >> Mind to try the following branch?
> >> 
> >> https://github.com/adam900710/linux/tree/scrub_multi_thread
> >> 
> >> Or you can grab the commit on top and backport to any kernel >=
> >> 6.4.
> > 
> > Cherry picking the commit on top of v6.4.3 lead to a merge conflict.
> > Since this is a production machine and I am no kernel developer with
> > insight to the inner workings of BTRFS, I'd prefer a patch that
> > applies cleanly on top of v6.4.3. I'd rather not try out a tree,
> > unless I know its a stable kernel version or at least rc3/4 or
> > later. Again this is a production machine.
> 
> Well, I have only tested that patch on that development branch, thus I
> can not ensure the result of the backport.
> 
> But still, here you go the backported patch.
> 
> I'd recommend to test the functionality of scrub on some less
> important machine first then on your production latptop though.

I took this calculated risk.

However, with the patch applied there seem to be more kworker threads
doing work, using 500-600% of CPU time in system (8 cores with
hyper-threading, so 16 logical cores), yet the result is even less
performance. Latency values got even worse, going up to 0.2 ms. An
unrelated BTRFS filesystem in another logical volume even stalls for
almost a second on (mostly) write accesses.

Scrubbing runs at about 650 to 750 MiB/s for a volume with about
1.2 TiB of data, mostly in larger files. On a second attempt it was even
only 620 MiB/s, which is less than before; without the patch it reached
about 1 GiB/s. I made sure that no desktop search indexing was
interfering.

Oh, I forgot to mention: BTRFS uses xxhash here. However, it was easily
scrubbing at 1.5 to 2.5 GiB/s with 5.3. The filesystem uses zstd
compression and the free space tree (free space cache v2).

So from what I can see here, your patch made it worse.

Best,
-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-11 10:56             ` Martin Steigerwald
@ 2023-07-11 11:05               ` Qu Wenruo
  2023-07-11 11:26                 ` Martin Steigerwald
  0 siblings, 1 reply; 30+ messages in thread
From: Qu Wenruo @ 2023-07-11 11:05 UTC (permalink / raw)
  To: Martin Steigerwald, linux-btrfs, Tim Cuthbertson, Qu Wenruo



On 2023/7/11 18:56, Martin Steigerwald wrote:
> Qu Wenruo - 11.07.23, 11:57:52 CEST:
>> On 2023/7/11 17:25, Martin Steigerwald wrote:
>>> Qu Wenruo - 11.07.23, 10:59:55 CEST:
>>>> On 2023/7/11 13:52, Martin Steigerwald wrote:
>>>>> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
>>>>>> I see about 180000 reads in 10 seconds in atop. I have seen
>>>>>> latency
>>>>>> values from 55 to 85 µs which is highly unusual for NVME SSD
>>>>>> ("avio"
>>>>>> in atop¹).
>>>>>
>>>>> Well I did not compare to a base line during scrub with 6.3. So
>>>>> not actually sure about the unusual bit. But at least during daily
>>>>> activity I do not see those values.
>>>>>
>>>>> Anyway, I am willing to test a patch.
>>>>
>>>> Mind to try the following branch?
>>>>
>>>> https://github.com/adam900710/linux/tree/scrub_multi_thread
>>>>
>>>> Or you can grab the commit on top and backport to any kernel >=
>>>> 6.4.
>>>
>>> Cherry picking the commit on top of v6.4.3 lead to a merge conflict.
>>> Since this is a production machine and I am no kernel developer with
>>> insight to the inner workings of BTRFS, I'd prefer a patch that
>>> applies cleanly on top of v6.4.3. I'd rather not try out a tree,
>>> unless I know its a stable kernel version or at least rc3/4 or
>>> later. Again this is a production machine.
>>
>> Well, I have only tested that patch on that development branch, thus I
>> can not ensure the result of the backport.
>>
>> But still, here you go the backported patch.
>>
>> I'd recommend to test the functionality of scrub on some less
>> important machine first then on your production latptop though.
> 
> I took this calculated risk.
> 
> However, while with the patch applied there seem to be more kworker
> threads doing work using 500-600% of CPU time in system (8 cores with
> hyper threading, so 16 logical cores) the result is even less
> performance. Latency values got even worse going up to 0,2 ms. An
> unrelated BTRFS filesystem in another logical volume is even stalled to
> almost a second for (mostly) write accesses.
> 
> Scrubbing about 650 to 750 MiB/s for a volume with about 1,2 TiB of
> data, mostly in larger files. Now on second attempt even only 620 MiB/s.
> Which is less than before. Before it reaches about 1 GiB/s. I made sure
> that no desktop search indexing was interfering.
> 
> Oh, I forgot to mention, BTRFS uses xxhash here. However it was easily
> scrubbing at 1,5 to 2,5 GiB/s with 5.3. The filesystem uses zstd
> compression and free space tree (free space cache v2).
> 
> So from what I can see here, your patch made it worse.

Thanks for confirming; this at least proves it's not the hashing thread
limit causing the regression.

Which is pretty weird: the read pattern is in fact better than the
original behavior, all reads are 64K (even if there are some holes, we
are fine reading the garbage, which should reduce the IOPS workload),
and we submit a batch of 8 such reads in one go.

BTW, what's the CPU usage on the v6.3 kernel? Is it higher or lower?
And what about the latency?

Currently I'm out of ideas; for now you can revert that testing patch.

If you're interested in more testing, you can apply the following small
diff, which changes the scrub batch size.

You can try either doubling or halving the number to see which helps
more.

Thanks,
Qu

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 16c228344cbb..26689d98c58f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -45,7 +45,7 @@ struct scrub_ctx;
   *
   * This determines the batch size for stripe submitted in one go.
   */
-#define SCRUB_STRIPES_PER_SCTX 8       /* That would be 8 64K stripe per-device. */
+#define SCRUB_STRIPES_PER_SCTX 16      /* That would be 8 64K stripe per-device. */

  /*
   * The following value times PAGE_SIZE needs to be large enough to match the
> 
> Best,

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-11 11:05               ` Qu Wenruo
@ 2023-07-11 11:26                 ` Martin Steigerwald
  2023-07-11 11:33                   ` Qu Wenruo
  0 siblings, 1 reply; 30+ messages in thread
From: Martin Steigerwald @ 2023-07-11 11:26 UTC (permalink / raw)
  To: linux-btrfs, Tim Cuthbertson, Qu Wenruo, Qu Wenruo

Qu Wenruo - 11.07.23, 13:05:42 CEST:
> On 2023/7/11 18:56, Martin Steigerwald wrote:
> > Qu Wenruo - 11.07.23, 11:57:52 CEST:
> >> On 2023/7/11 17:25, Martin Steigerwald wrote:
> >>> Qu Wenruo - 11.07.23, 10:59:55 CEST:
> >>>> On 2023/7/11 13:52, Martin Steigerwald wrote:
> >>>>> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
> >>>>>> I see about 180000 reads in 10 seconds in atop. I have seen
> >>>>>> latency
> >>>>>> values from 55 to 85 µs which is highly unusual for NVME SSD
> >>>>>> ("avio"
> >>>>>> in atop¹).
[…]
> >>>> Mind to try the following branch?
> >>>> 
> >>>> https://github.com/adam900710/linux/tree/scrub_multi_thread
> >>>> 
> >>>> Or you can grab the commit on top and backport to any kernel >=
> >>>> 6.4.
> >>> 
> >>> Cherry picking the commit on top of v6.4.3 lead to a merge
> >>> conflict.
[…]
> >> Well, I have only tested that patch on that development branch,
> >> thus I can not ensure the result of the backport.
> >> 
> >> But still, here you go the backported patch.
> >> 
> >> I'd recommend to test the functionality of scrub on some less
> >> important machine first then on your production latptop though.
> > 
> > I took this calculated risk.
> > 
> > However, while with the patch applied there seem to be more kworker
> > threads doing work using 500-600% of CPU time in system (8 cores
> > with
> > hyper threading, so 16 logical cores) the result is even less
> > performance. Latency values got even worse going up to 0,2 ms. An
> > unrelated BTRFS filesystem in another logical volume is even stalled
> > to almost a second for (mostly) write accesses.
> > 
> > Scrubbing about 650 to 750 MiB/s for a volume with about 1,2 TiB of
> > data, mostly in larger files. Now on second attempt even only 620
> > MiB/s. Which is less than before. Before it reaches about 1 GiB/s.
> > I made sure that no desktop search indexing was interfering.
> > 
> > Oh, I forgot to mention, BTRFS uses xxhash here. However it was
> > easily scrubbing at 1,5 to 2,5 GiB/s with 5.3. The filesystem uses
> > zstd compression and free space tree (free space cache v2).
> > 
> > So from what I can see here, your patch made it worse.
> 
> Thanks for the confirming, this at least prove it's not the hashing
> threads limit causing the regression.
> 
> Which is pretty weird, the read pattern is in fact better than the
> original behavior, all read are in 64K (even if there are some holes,
> we are fine reading the garbage, this should reduce IOPS workload),
> and we submit a batch of 8 of such read in one go.
> 
> BTW, what's the CPU usage of v6.3 kernel? Is it higher or lower?
> And what about the latency?

CPU usage is between 600-700% on 6.3.9, latency between 50-70 µs, and
scrubbing speed is above 2 GiB/s, peaking at 2.27 GiB/s. Later it went
down a bit to 1.7 GiB/s, maybe due to background activity.

I'd say the CPU usage to result (= scrubbing speed) ratio is much, much
better with 6.3. However, the latencies during scrubbing are pretty much
the same; I have even seen up to 0.2 ms.

> Currently I'm out of ideas, for now you can revert that testing patch.
> 
> If you're interested in more testing, you can apply the following
> small diff, which changed the batch number of scrub.
> 
> You can try either double or half the number to see which change helps
> more.

No time for further testing at the moment. Maybe at a later time.

It might be good if you put together a test setup yourself. Any computer
with an NVMe SSD should do, I think, unless there is something very
special about my laptop, which I doubt. That would greatly reduce the
turn-around time.

I think for now I am back on 6.3. It works. :)

-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-11 11:26                 ` Martin Steigerwald
@ 2023-07-11 11:33                   ` Qu Wenruo
  2023-07-11 11:47                     ` Martin Steigerwald
  2023-07-14  0:28                     ` Qu Wenruo
  0 siblings, 2 replies; 30+ messages in thread
From: Qu Wenruo @ 2023-07-11 11:33 UTC (permalink / raw)
  To: Martin Steigerwald, linux-btrfs, Tim Cuthbertson, Qu Wenruo



On 2023/7/11 19:26, Martin Steigerwald wrote:
> Qu Wenruo - 11.07.23, 13:05:42 CEST:
>> On 2023/7/11 18:56, Martin Steigerwald wrote:
>>> Qu Wenruo - 11.07.23, 11:57:52 CEST:
>>>> On 2023/7/11 17:25, Martin Steigerwald wrote:
>>>>> Qu Wenruo - 11.07.23, 10:59:55 CEST:
>>>>>> On 2023/7/11 13:52, Martin Steigerwald wrote:
>>>>>>> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
>>>>>>>> I see about 180000 reads in 10 seconds in atop. I have seen
>>>>>>>> latency
>>>>>>>> values from 55 to 85 µs which is highly unusual for NVME SSD
>>>>>>>> ("avio"
>>>>>>>> in atop¹).
> […]
>>>>>> Mind to try the following branch?
>>>>>>
>>>>>> https://github.com/adam900710/linux/tree/scrub_multi_thread
>>>>>>
>>>>>> Or you can grab the commit on top and backport to any kernel >=
>>>>>> 6.4.
>>>>>
>>>>> Cherry picking the commit on top of v6.4.3 lead to a merge
>>>>> conflict.
> […]
>>>> Well, I have only tested that patch on that development branch,
>>>> thus I can not ensure the result of the backport.
>>>>
>>>> But still, here you go the backported patch.
>>>>
>>>> I'd recommend to test the functionality of scrub on some less
>>>> important machine first then on your production latptop though.
>>>
>>> I took this calculated risk.
>>>
>>> However, while with the patch applied there seem to be more kworker
>>> threads doing work using 500-600% of CPU time in system (8 cores
>>> with
>>> hyper threading, so 16 logical cores) the result is even less
>>> performance. Latency values got even worse going up to 0,2 ms. An
>>> unrelated BTRFS filesystem in another logical volume is even stalled
>>> to almost a second for (mostly) write accesses.
>>>
>>> Scrubbing about 650 to 750 MiB/s for a volume with about 1,2 TiB of
>>> data, mostly in larger files. Now on second attempt even only 620
>>> MiB/s. Which is less than before. Before it reaches about 1 GiB/s.
>>> I made sure that no desktop search indexing was interfering.
>>>
>>> Oh, I forgot to mention, BTRFS uses xxhash here. However it was
>>> easily scrubbing at 1,5 to 2,5 GiB/s with 5.3. The filesystem uses
>>> zstd compression and free space tree (free space cache v2).
>>>
>>> So from what I can see here, your patch made it worse.
>>
>> Thanks for the confirming, this at least prove it's not the hashing
>> threads limit causing the regression.
>>
>> Which is pretty weird, the read pattern is in fact better than the
>> original behavior, all read are in 64K (even if there are some holes,
>> we are fine reading the garbage, this should reduce IOPS workload),
>> and we submit a batch of 8 of such read in one go.
>>
>> BTW, what's the CPU usage of v6.3 kernel? Is it higher or lower?
>> And what about the latency?
>
> CPU usage is between 600-700% on 6.3.9, Latency between 50-70 µs. And
> scrubbing speed is above 2 GiB/s, peaking at 2,27 GiB/s. Later it went
> down a bit to 1,7 GiB/s, maybe due to background activity.

That 600~700% means btrfs is using its entire available thread pool
(min(nr_cpu + 2, 8) workers).
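With your 16 logical CPUs that is min(16 + 2, 8) = 8 workers, i.e. an
~800% ceiling, so 600~700% is close to saturation.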

So although the patch doesn't work as expected, we're still limited by
the csum verification part.

At least I have some clue now.
>
> I'd say the CPU usage to result (=scrubbing speed) ratio is much, much
> better with 6.3. However the latencies during scrubbing are pretty much
> the same. I even seen up to 0.2 ms.
>
>> Currently I'm out of ideas, for now you can revert that testing patch.
>>
>> If you're interested in more testing, you can apply the following
>> small diff, which changed the batch number of scrub.
>>
>> You can try either double or half the number to see which change helps
>> more.
>
> No time for further testing at the moment. Maybe at a later time.
>
> It might be good you put together a test setup yourself. Any computer
> with NVME SSD should do I think. Unless there is something very special
> about my laptop, but I doubt this. This reduces greatly on the turn-
> around time.

Sure, I'll prepare a dedicated machine for this.

Thanks for all your effort!
Qu

>
> I think for now I am back at 6.3. It works. :)
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-11 11:33                   ` Qu Wenruo
@ 2023-07-11 11:47                     ` Martin Steigerwald
  2023-07-14  0:28                     ` Qu Wenruo
  1 sibling, 0 replies; 30+ messages in thread
From: Martin Steigerwald @ 2023-07-11 11:47 UTC (permalink / raw)
  To: linux-btrfs, Tim Cuthbertson, Qu Wenruo, Qu Wenruo

Qu Wenruo - 11.07.23, 13:33:50 CEST:
> >> BTW, what's the CPU usage of v6.3 kernel? Is it higher or lower?
> >> And what about the latency?
> > 
> > CPU usage is between 600-700% on 6.3.9, Latency between 50-70 µs.
> > And
> > scrubbing speed is above 2 GiB/s, peaking at 2,27 GiB/s. Later it
> > went down a bit to 1,7 GiB/s, maybe due to background activity.
> 
> That 600~700% means btrfs is taking all its available thread_pool
> (min(nr_cpu + 2, 8)).
> 
> So although the patch doesn't work as expected, we're still limited by
> the csum verification part.
> 
> At least I have some clue now.

Well, it would have an additional 800-900% of CPU time left over to use
on this machine; these modern processors are crazy. But for that it
would have to use more threads. However, if you can make this more
efficient CPU-time-wise… all the better.

> > I'd say the CPU usage to result (=scrubbing speed) ratio is much,
> > much better with 6.3. However the latencies during scrubbing are
> > pretty much the same. I even seen up to 0.2 ms.
[…]
> >> If you're interested in more testing, you can apply the following
> >> small diff, which changed the batch number of scrub.
[…]
> > No time for further testing at the moment. Maybe at a later time.
> > 
> > It might be good you put together a test setup yourself. Any
[…]
> Sure, I'll prepare a dedicated machine for this.
> 
> Thanks for all your effort!

You are welcome.

Thanks,
-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-03 20:19 Scrub of my nvme SSD has slowed by about 2/3 Tim Cuthbertson
  2023-07-03 23:49 ` Qu Wenruo
  2023-07-11  5:33 ` Martin Steigerwald
@ 2023-07-12 11:02 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2023-07-19  6:42   ` Martin Steigerwald
  2023-08-29 12:17   ` Linux regression tracking #update (Thorsten Leemhuis)
  2 siblings, 2 replies; 30+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-07-12 11:02 UTC (permalink / raw)
  To: Tim Cuthbertson, linux-btrfs; +Cc: Linux kernel regressions list

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 03.07.23 22:19, Tim Cuthbertson wrote:
> Yesterday, I noticed that a scrub of my main system filesystem has
> slowed from about 2.9 gb/sec to about 949 mb/sec. My scrub used to run
> in about 12 seconds, now it is taking 51 seconds. I had just installed
> Linux kernel 6.4.1 on Arch Linux, upgrading from 6.3.9. At first I
> suspected the new kernel, but now I am not so sure.

Thanks for the report. It seems it will take some work to address this,
so to be sure the issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced e02ee89baa66
#regzbot title btrfs: scrub nvme SSD has slowed by about 2/3 due to csum
#regzbot monitor:
https://lore.kernel.org/all/6c1ffe48e93fee9aa975ecc22dc2e7a1f3d7a0de.1688539673.git.wqu@suse.com/
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-11 11:33                   ` Qu Wenruo
  2023-07-11 11:47                     ` Martin Steigerwald
@ 2023-07-14  0:28                     ` Qu Wenruo
  2023-07-14  6:01                       ` Qu Wenruo
  2023-07-16  9:57                       ` Sebastian Döring
  1 sibling, 2 replies; 30+ messages in thread
From: Qu Wenruo @ 2023-07-14  0:28 UTC (permalink / raw)
  To: Martin Steigerwald, linux-btrfs, Tim Cuthbertson, Qu Wenruo

Just a quick update on the situation.

I got a dedicated VM with a PCIe 3.0 NVMe device passed through to it,
without any host cache.

With v6.3 the scrub speed can reach 3GB/s, while v6.4 (misc-next) can
only reach around 1GB/s.

With the dedicated VM and more comprehensive telemetry, two major
problems show up:

- Lack of block layer merging
   All 64K stripes are just submitted as-is, while the old code can
   merge its read requests into roughly 512K ones.

   The cause is the removal of the block layer plug/unplug (a rough
   sketch of the idea follows below).

   A quick 4-line fix can improve the performance to around 1.5GB/s.

- Bad csum distribution
   With the above problem fixed, I observed that the csum verification
   seems to have only one worker.

   Still investigating the cause.
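
For illustration only, this is the rough shape of the plug part (not the
actual fix; the loop and the scrub_submit_one_stripe() helper are
made-up placeholders, only blk_start_plug()/blk_finish_plug() are the
real block layer API):

	struct blk_plug plug;

	/*
	 * Hold a plug across the batched submission so the block layer
	 * can merge the adjacent 64K stripe reads into larger requests.
	 */
	blk_start_plug(&plug);
	for (int i = 0; i < nr_stripes; i++)
		scrub_submit_one_stripe(sctx, i); /* placeholder helper */
	blk_finish_plug(&plug);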

Thanks,
Qu

On 2023/7/11 19:33, Qu Wenruo wrote:
>
>
> On 2023/7/11 19:26, Martin Steigerwald wrote:
>> Qu Wenruo - 11.07.23, 13:05:42 CEST:
>>> On 2023/7/11 18:56, Martin Steigerwald wrote:
>>>> Qu Wenruo - 11.07.23, 11:57:52 CEST:
>>>>> On 2023/7/11 17:25, Martin Steigerwald wrote:
>>>>>> Qu Wenruo - 11.07.23, 10:59:55 CEST:
>>>>>>> On 2023/7/11 13:52, Martin Steigerwald wrote:
>>>>>>>> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
>>>>>>>>> I see about 180000 reads in 10 seconds in atop. I have seen
>>>>>>>>> latency
>>>>>>>>> values from 55 to 85 µs which is highly unusual for NVME SSD
>>>>>>>>> ("avio"
>>>>>>>>> in atop¹).
>> […]
>>>>>>> Mind to try the following branch?
>>>>>>>
>>>>>>> https://github.com/adam900710/linux/tree/scrub_multi_thread
>>>>>>>
>>>>>>> Or you can grab the commit on top and backport to any kernel >=
>>>>>>> 6.4.
>>>>>>
>>>>>> Cherry picking the commit on top of v6.4.3 lead to a merge
>>>>>> conflict.
>> […]
>>>>> Well, I have only tested that patch on that development branch,
>>>>> thus I can not ensure the result of the backport.
>>>>>
>>>>> But still, here you go the backported patch.
>>>>>
>>>>> I'd recommend to test the functionality of scrub on some less
>>>>> important machine first then on your production latptop though.
>>>>
>>>> I took this calculated risk.
>>>>
>>>> However, while with the patch applied there seem to be more kworker
>>>> threads doing work using 500-600% of CPU time in system (8 cores
>>>> with
>>>> hyper threading, so 16 logical cores) the result is even less
>>>> performance. Latency values got even worse going up to 0,2 ms. An
>>>> unrelated BTRFS filesystem in another logical volume is even stalled
>>>> to almost a second for (mostly) write accesses.
>>>>
>>>> Scrubbing about 650 to 750 MiB/s for a volume with about 1,2 TiB of
>>>> data, mostly in larger files. Now on second attempt even only 620
>>>> MiB/s. Which is less than before. Before it reaches about 1 GiB/s.
>>>> I made sure that no desktop search indexing was interfering.
>>>>
>>>> Oh, I forgot to mention, BTRFS uses xxhash here. However it was
>>>> easily scrubbing at 1,5 to 2,5 GiB/s with 5.3. The filesystem uses
>>>> zstd compression and free space tree (free space cache v2).
>>>>
>>>> So from what I can see here, your patch made it worse.
>>>
>>> Thanks for the confirming, this at least prove it's not the hashing
>>> threads limit causing the regression.
>>>
>>> Which is pretty weird, the read pattern is in fact better than the
>>> original behavior, all read are in 64K (even if there are some holes,
>>> we are fine reading the garbage, this should reduce IOPS workload),
>>> and we submit a batch of 8 of such read in one go.
>>>
>>> BTW, what's the CPU usage of v6.3 kernel? Is it higher or lower?
>>> And what about the latency?
>>
>> CPU usage is between 600-700% on 6.3.9, Latency between 50-70 µs. And
>> scrubbing speed is above 2 GiB/s, peaking at 2,27 GiB/s. Later it went
>> down a bit to 1,7 GiB/s, maybe due to background activity.
>
> That 600~700% means btrfs is taking all its available thread_pool
> (min(nr_cpu + 2, 8)).
>
> So although the patch doesn't work as expected, we're still limited by
> the csum verification part.
>
> At least I have some clue now.
>>
>> I'd say the CPU usage to result (=scrubbing speed) ratio is much, much
>> better with 6.3. However the latencies during scrubbing are pretty much
>> the same. I even seen up to 0.2 ms.
>>
>>> Currently I'm out of ideas, for now you can revert that testing patch.
>>>
>>> If you're interested in more testing, you can apply the following
>>> small diff, which changed the batch number of scrub.
>>>
>>> You can try either double or half the number to see which change helps
>>> more.
>>
>> No time for further testing at the moment. Maybe at a later time.
>>
>> It might be good you put together a test setup yourself. Any computer
>> with NVME SSD should do I think. Unless there is something very special
>> about my laptop, but I doubt this. This reduces greatly on the turn-
>> around time.
>
> Sure, I'll prepare a dedicated machine for this.
>
> Thanks for all your effort!
> Qu
>
>>
>> I think for now I am back at 6.3. It works. :)
>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-14  0:28                     ` Qu Wenruo
@ 2023-07-14  6:01                       ` Qu Wenruo
  2023-07-14  6:58                         ` Martin Steigerwald
  2023-07-16  9:57                       ` Sebastian Döring
  1 sibling, 1 reply; 30+ messages in thread
From: Qu Wenruo @ 2023-07-14  6:01 UTC (permalink / raw)
  To: Martin Steigerwald, linux-btrfs, Tim Cuthbertson, Qu Wenruo



On 2023/7/14 08:28, Qu Wenruo wrote:
> Just a quick update on the situation.
>
> I got a dedicated VM with a PCIE3.0 NVME passed to it without any host
> cache.
>
> With v6.3 the scrub speed can reach 3GB/s while the v6.4 (misc-next) can
> only go around 1GB/s.
>
> With dedicated VM and more comprehensive telemetry, it shows there are
> two major problems:
>
> - Lack of block layer merging
>    All 64K stripes are just submitted as is, while the old code can
>    merge its read requests to around 512K.
>
>    The cause is the removal of block layer plug/unplug.
>
>    A quick 4 lines fix can improve the performance to around 1.5GB/s.
>
> - Bad csum distribution
>    With above problem fixed, I observed that the csum verification seems
>    to have only one worker.
>
>    Still investigating the cause.

This turns out to be a problem with the read submission queue depth.

New:

Device          r/s      rkB/s    rrqm/s  %rrqm  r_await  rareq-sz  aqu-sz   %util
vda         3982.00 1754704.00  23522.00  85.52     0.71    440.66    2.85  100.00

Old:

Device          r/s      rkB/s    rrqm/s  %rrqm  r_await  rareq-sz  aqu-sz   %util
vda         7640.00 3427420.00  19355.00  71.70     2.17    448.62   16.60  100.00

The queue depth difference means we may need to change the mostly
submit-and-wait behavior of the stripe handling...
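
The "quick 4 lines fix" mentioned above is presumably a block-layer plug
held across the batched submission, so that adjacent 64K reads can be
merged again before they reach the device. A minimal sketch of that idea
(illustrative only, with a hypothetical helper name, not the actual
patch):

/*
 * Hold a blk_plug across a batch of scrub read submissions so the block
 * layer can merge adjacent 64K reads into larger requests (~512K)
 * before dispatching them.
 */
#include <linux/blkdev.h>

struct scrub_stripe;					/* opaque here */
void scrub_submit_stripe_read(struct scrub_stripe *);	/* hypothetical */

static void scrub_submit_batch(struct scrub_stripe **stripes, int nr)
{
	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);
	for (i = 0; i < nr; i++)
		scrub_submit_stripe_read(stripes[i]);
	blk_finish_plug(&plug);
}

With the plug held, the I/O scheduler gets a chance to coalesce the
batch of 64K reads before dispatch, which the pre-rework code
effectively got for free.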

Thanks,
Qu

>
> Thanks,
> Qu
>
> On 2023/7/11 19:33, Qu Wenruo wrote:
>>
>>
>> On 2023/7/11 19:26, Martin Steigerwald wrote:
>>> Qu Wenruo - 11.07.23, 13:05:42 CEST:
>>>> On 2023/7/11 18:56, Martin Steigerwald wrote:
>>>>> Qu Wenruo - 11.07.23, 11:57:52 CEST:
>>>>>> On 2023/7/11 17:25, Martin Steigerwald wrote:
>>>>>>> Qu Wenruo - 11.07.23, 10:59:55 CEST:
>>>>>>>> On 2023/7/11 13:52, Martin Steigerwald wrote:
>>>>>>>>> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
>>>>>>>>>> I see about 180000 reads in 10 seconds in atop. I have seen
>>>>>>>>>> latency
>>>>>>>>>> values from 55 to 85 µs which is highly unusual for NVME SSD
>>>>>>>>>> ("avio"
>>>>>>>>>> in atop¹).
>>> […]
>>>>>>>> Mind to try the following branch?
>>>>>>>>
>>>>>>>> https://github.com/adam900710/linux/tree/scrub_multi_thread
>>>>>>>>
>>>>>>>> Or you can grab the commit on top and backport to any kernel >=
>>>>>>>> 6.4.
>>>>>>>
>>>>>>> Cherry picking the commit on top of v6.4.3 lead to a merge
>>>>>>> conflict.
>>> […]
>>>>>> Well, I have only tested that patch on that development branch,
>>>>>> thus I can not ensure the result of the backport.
>>>>>>
>>>>>> But still, here you go the backported patch.
>>>>>>
>>>>>> I'd recommend to test the functionality of scrub on some less
>>>>>> important machine first then on your production latptop though.
>>>>>
>>>>> I took this calculated risk.
>>>>>
>>>>> However, while with the patch applied there seem to be more kworker
>>>>> threads doing work using 500-600% of CPU time in system (8 cores
>>>>> with
>>>>> hyper threading, so 16 logical cores) the result is even less
>>>>> performance. Latency values got even worse going up to 0,2 ms. An
>>>>> unrelated BTRFS filesystem in another logical volume is even stalled
>>>>> to almost a second for (mostly) write accesses.
>>>>>
>>>>> Scrubbing about 650 to 750 MiB/s for a volume with about 1,2 TiB of
>>>>> data, mostly in larger files. Now on second attempt even only 620
>>>>> MiB/s. Which is less than before. Before it reaches about 1 GiB/s.
>>>>> I made sure that no desktop search indexing was interfering.
>>>>>
>>>>> Oh, I forgot to mention, BTRFS uses xxhash here. However it was
>>>>> easily scrubbing at 1,5 to 2,5 GiB/s with 5.3. The filesystem uses
>>>>> zstd compression and free space tree (free space cache v2).
>>>>>
>>>>> So from what I can see here, your patch made it worse.
>>>>
>>>> Thanks for the confirming, this at least prove it's not the hashing
>>>> threads limit causing the regression.
>>>>
>>>> Which is pretty weird, the read pattern is in fact better than the
>>>> original behavior, all read are in 64K (even if there are some holes,
>>>> we are fine reading the garbage, this should reduce IOPS workload),
>>>> and we submit a batch of 8 of such read in one go.
>>>>
>>>> BTW, what's the CPU usage of v6.3 kernel? Is it higher or lower?
>>>> And what about the latency?
>>>
>>> CPU usage is between 600-700% on 6.3.9, Latency between 50-70 µs. And
>>> scrubbing speed is above 2 GiB/s, peaking at 2,27 GiB/s. Later it went
>>> down a bit to 1,7 GiB/s, maybe due to background activity.
>>
>> That 600~700% means btrfs is taking all its available thread_pool
>> (min(nr_cpu + 2, 8)).
>>
>> So although the patch doesn't work as expected, we're still limited by
>> the csum verification part.
>>
>> At least I have some clue now.
>>>
>>> I'd say the CPU usage to result (=scrubbing speed) ratio is much, much
>>> better with 6.3. However the latencies during scrubbing are pretty much
>>> the same. I even seen up to 0.2 ms.
>>>
>>>> Currently I'm out of ideas, for now you can revert that testing patch.
>>>>
>>>> If you're interested in more testing, you can apply the following
>>>> small diff, which changed the batch number of scrub.
>>>>
>>>> You can try either double or half the number to see which change helps
>>>> more.
>>>
>>> No time for further testing at the moment. Maybe at a later time.
>>>
>>> It might be good you put together a test setup yourself. Any computer
>>> with NVME SSD should do I think. Unless there is something very special
>>> about my laptop, but I doubt this. This reduces greatly on the turn-
>>> around time.
>>
>> Sure, I'll prepare a dedicated machine for this.
>>
>> Thanks for all your effort!
>> Qu
>>
>>>
>>> I think for now I am back at 6.3. It works. :)
>>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-14  6:01                       ` Qu Wenruo
@ 2023-07-14  6:58                         ` Martin Steigerwald
  0 siblings, 0 replies; 30+ messages in thread
From: Martin Steigerwald @ 2023-07-14  6:58 UTC (permalink / raw)
  To: linux-btrfs, Tim Cuthbertson, Qu Wenruo, Qu Wenruo

Hi Qu.

Qu Wenruo - 14.07.23, 08:01:34 CEST:
> >   Still investigating the cause.
> 
> This turns out to be a problem with the read submission queue depth.

Thanks for your updates!

Good luck!

Best,
-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-14  0:28                     ` Qu Wenruo
  2023-07-14  6:01                       ` Qu Wenruo
@ 2023-07-16  9:57                       ` Sebastian Döring
  2023-07-16 10:55                         ` Qu Wenruo
  1 sibling, 1 reply; 30+ messages in thread
From: Sebastian Döring @ 2023-07-16  9:57 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Qu Wenruo

Hi all,

> I got a dedicated VM with a PCIE3.0 NVME passed to it without any host
> cache.

> With v6.3 the scrub speed can reach 3GB/s while the v6.4 (misc-next) can
> only go around 1GB/s.

I'm also observing severely degraded scrub performance (~50%) on a
spinning disk (on top of mdraid and LUKS). Are we sure this regression
is in any way NVME related?

Best regards,
Sebastian

On Fri, Jul 14, 2023 at 3:01 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> Just a quick update on the situation.
>
> I got a dedicated VM with a PCIE3.0 NVME passed to it without any host
> cache.
>
> With v6.3 the scrub speed can reach 3GB/s while the v6.4 (misc-next) can
> only go around 1GB/s.
>
> With dedicated VM and more comprehensive telemetry, it shows there are
> two major problems:
>
> - Lack of block layer merging
>    All 64K stripes are just submitted as is, while the old code can
>    merge its read requests to around 512K.
>
>    The cause is the removal of block layer plug/unplug.
>
>    A quick 4 lines fix can improve the performance to around 1.5GB/s.
>
> - Bad csum distribution
>    With above problem fixed, I observed that the csum verification seems
>    to have only one worker.
>
>    Still investigating the cause.
>
> Thanks,
> Qu
>
> On 2023/7/11 19:33, Qu Wenruo wrote:
> >
> >
> > On 2023/7/11 19:26, Martin Steigerwald wrote:
> >> Qu Wenruo - 11.07.23, 13:05:42 CEST:
> >>> On 2023/7/11 18:56, Martin Steigerwald wrote:
> >>>> Qu Wenruo - 11.07.23, 11:57:52 CEST:
> >>>>> On 2023/7/11 17:25, Martin Steigerwald wrote:
> >>>>>> Qu Wenruo - 11.07.23, 10:59:55 CEST:
> >>>>>>> On 2023/7/11 13:52, Martin Steigerwald wrote:
> >>>>>>>> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
> >>>>>>>>> I see about 180000 reads in 10 seconds in atop. I have seen
> >>>>>>>>> latency
> >>>>>>>>> values from 55 to 85 µs which is highly unusual for NVME SSD
> >>>>>>>>> ("avio"
> >>>>>>>>> in atop¹).
> >> […]
> >>>>>>> Mind to try the following branch?
> >>>>>>>
> >>>>>>> https://github.com/adam900710/linux/tree/scrub_multi_thread
> >>>>>>>
> >>>>>>> Or you can grab the commit on top and backport to any kernel >=
> >>>>>>> 6.4.
> >>>>>>
> >>>>>> Cherry picking the commit on top of v6.4.3 lead to a merge
> >>>>>> conflict.
> >> […]
> >>>>> Well, I have only tested that patch on that development branch,
> >>>>> thus I can not ensure the result of the backport.
> >>>>>
> >>>>> But still, here you go the backported patch.
> >>>>>
> >>>>> I'd recommend to test the functionality of scrub on some less
> >>>>> important machine first then on your production latptop though.
> >>>>
> >>>> I took this calculated risk.
> >>>>
> >>>> However, while with the patch applied there seem to be more kworker
> >>>> threads doing work using 500-600% of CPU time in system (8 cores
> >>>> with
> >>>> hyper threading, so 16 logical cores) the result is even less
> >>>> performance. Latency values got even worse going up to 0,2 ms. An
> >>>> unrelated BTRFS filesystem in another logical volume is even stalled
> >>>> to almost a second for (mostly) write accesses.
> >>>>
> >>>> Scrubbing about 650 to 750 MiB/s for a volume with about 1,2 TiB of
> >>>> data, mostly in larger files. Now on second attempt even only 620
> >>>> MiB/s. Which is less than before. Before it reaches about 1 GiB/s.
> >>>> I made sure that no desktop search indexing was interfering.
> >>>>
> >>>> Oh, I forgot to mention, BTRFS uses xxhash here. However it was
> >>>> easily scrubbing at 1,5 to 2,5 GiB/s with 5.3. The filesystem uses
> >>>> zstd compression and free space tree (free space cache v2).
> >>>>
> >>>> So from what I can see here, your patch made it worse.
> >>>
> >>> Thanks for the confirming, this at least prove it's not the hashing
> >>> threads limit causing the regression.
> >>>
> >>> Which is pretty weird, the read pattern is in fact better than the
> >>> original behavior, all read are in 64K (even if there are some holes,
> >>> we are fine reading the garbage, this should reduce IOPS workload),
> >>> and we submit a batch of 8 of such read in one go.
> >>>
> >>> BTW, what's the CPU usage of v6.3 kernel? Is it higher or lower?
> >>> And what about the latency?
> >>
> >> CPU usage is between 600-700% on 6.3.9, Latency between 50-70 µs. And
> >> scrubbing speed is above 2 GiB/s, peaking at 2,27 GiB/s. Later it went
> >> down a bit to 1,7 GiB/s, maybe due to background activity.
> >
> > That 600~700% means btrfs is taking all its available thread_pool
> > (min(nr_cpu + 2, 8)).
> >
> > So although the patch doesn't work as expected, we're still limited by
> > the csum verification part.
> >
> > At least I have some clue now.
> >>
> >> I'd say the CPU usage to result (=scrubbing speed) ratio is much, much
> >> better with 6.3. However the latencies during scrubbing are pretty much
> >> the same. I even seen up to 0.2 ms.
> >>
> >>> Currently I'm out of ideas, for now you can revert that testing patch.
> >>>
> >>> If you're interested in more testing, you can apply the following
> >>> small diff, which changed the batch number of scrub.
> >>>
> >>> You can try either double or half the number to see which change helps
> >>> more.
> >>
> >> No time for further testing at the moment. Maybe at a later time.
> >>
> >> It might be good you put together a test setup yourself. Any computer
> >> with NVME SSD should do I think. Unless there is something very special
> >> about my laptop, but I doubt this. This reduces greatly on the turn-
> >> around time.
> >
> > Sure, I'll prepare a dedicated machine for this.
> >
> > Thanks for all your effort!
> > Qu
> >
> >>
> >> I think for now I am back at 6.3. It works. :)
> >>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-16  9:57                       ` Sebastian Döring
@ 2023-07-16 10:55                         ` Qu Wenruo
  2023-07-16 16:01                           ` Sebastian Döring
  0 siblings, 1 reply; 30+ messages in thread
From: Qu Wenruo @ 2023-07-16 10:55 UTC (permalink / raw)
  To: Sebastian Döring; +Cc: linux-btrfs, Qu Wenruo



On 2023/7/16 17:57, Sebastian Döring wrote:
> Hi all,
>
>> I got a dedicated VM with a PCIE3.0 NVME passed to it without any host
> cache.
>
>> With v6.3 the scrub speed can reach 3GB/s while the v6.4 (misc-next) can
> only go around 1GB/s.
>
> I'm also observing severely degraded scrub performance (~50%) on a
> spinning disk (on top of mdraid and LUKS). Are we sure this regression
> is in any way NVME related?

The regression shows up when the storage device doesn't do any
firmware-level request merging (the SATA NCQ feature).

In that case, the reworked scrub read size is much smaller than the old
one (64K vs 512K), which causes the performance regression.

You can try this patch to see if it helps with your setup:

https://patchwork.kernel.org/project/linux-btrfs/patch/ef3951fa130f9b61fe097e8d5f6e425525165a28.1689330324.git.wqu@suse.com/

For NVME it still doesn't reach the old performance, but SATA HDDs,
even without NCQ, should more or less get back there.
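
A quick way to see whether a given SATA disk has NCQ available at all is
its SCSI queue depth in sysfs; a depth of 1 means no firmware-level
queuing. A minimal userspace sketch (assuming the usual
/sys/block/<dev>/device/queue_depth attribute; this is not something
posted in this thread):

/* Print the queue depth of a block device, e.g. "sda". A value of 1
 * typically means NCQ is absent or disabled, so the drive cannot accept
 * multiple queued requests in firmware.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
	char path[256];
	FILE *f;
	int depth;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <blockdev, e.g. sda>\n", argv[0]);
		return 1;
	}
	snprintf(path, sizeof(path), "/sys/block/%s/device/queue_depth",
		 argv[1]);
	f = fopen(path, "r");
	if (!f || fscanf(f, "%d", &depth) != 1) {
		perror(path);
		return 1;
	}
	fclose(f);
	printf("%s: queue depth %d (%s)\n", argv[1], depth,
	       depth > 1 ? "queuing available" : "no firmware queuing");
	return 0;
}

For a stack like btrfs on LUKS on mdraid, the check applies to the
underlying sdX member devices, not to the dm/md devices on top.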

Thanks,
Qu

>
> Best regards,
> Sebastian
>
> On Fri, Jul 14, 2023 at 3:01 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>> Just a quick update on the situation.
>>
>> I got a dedicated VM with a PCIE3.0 NVME passed to it without any host
>> cache.
>>
>> With v6.3 the scrub speed can reach 3GB/s while the v6.4 (misc-next) can
>> only go around 1GB/s.
>>
>> With dedicated VM and more comprehensive telemetry, it shows there are
>> two major problems:
>>
>> - Lack of block layer merging
>>     All 64K stripes are just submitted as is, while the old code can
>>     merge its read requests to around 512K.
>>
>>     The cause is the removal of block layer plug/unplug.
>>
>>     A quick 4 lines fix can improve the performance to around 1.5GB/s.
>>
>> - Bad csum distribution
>>     With above problem fixed, I observed that the csum verification seems
>>     to have only one worker.
>>
>>     Still investigating the cause.
>>
>> Thanks,
>> Qu
>>
>> On 2023/7/11 19:33, Qu Wenruo wrote:
>>>
>>>
>>> On 2023/7/11 19:26, Martin Steigerwald wrote:
>>>> Qu Wenruo - 11.07.23, 13:05:42 CEST:
>>>>> On 2023/7/11 18:56, Martin Steigerwald wrote:
>>>>>> Qu Wenruo - 11.07.23, 11:57:52 CEST:
>>>>>>> On 2023/7/11 17:25, Martin Steigerwald wrote:
>>>>>>>> Qu Wenruo - 11.07.23, 10:59:55 CEST:
>>>>>>>>> On 2023/7/11 13:52, Martin Steigerwald wrote:
>>>>>>>>>> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
>>>>>>>>>>> I see about 180000 reads in 10 seconds in atop. I have seen
>>>>>>>>>>> latency
>>>>>>>>>>> values from 55 to 85 µs which is highly unusual for NVME SSD
>>>>>>>>>>> ("avio"
>>>>>>>>>>> in atop¹).
>>>> […]
>>>>>>>>> Mind to try the following branch?
>>>>>>>>>
>>>>>>>>> https://github.com/adam900710/linux/tree/scrub_multi_thread
>>>>>>>>>
>>>>>>>>> Or you can grab the commit on top and backport to any kernel >=
>>>>>>>>> 6.4.
>>>>>>>>
>>>>>>>> Cherry picking the commit on top of v6.4.3 lead to a merge
>>>>>>>> conflict.
>>>> […]
>>>>>>> Well, I have only tested that patch on that development branch,
>>>>>>> thus I can not ensure the result of the backport.
>>>>>>>
>>>>>>> But still, here you go the backported patch.
>>>>>>>
>>>>>>> I'd recommend to test the functionality of scrub on some less
>>>>>>> important machine first then on your production latptop though.
>>>>>>
>>>>>> I took this calculated risk.
>>>>>>
>>>>>> However, while with the patch applied there seem to be more kworker
>>>>>> threads doing work using 500-600% of CPU time in system (8 cores
>>>>>> with
>>>>>> hyper threading, so 16 logical cores) the result is even less
>>>>>> performance. Latency values got even worse going up to 0,2 ms. An
>>>>>> unrelated BTRFS filesystem in another logical volume is even stalled
>>>>>> to almost a second for (mostly) write accesses.
>>>>>>
>>>>>> Scrubbing about 650 to 750 MiB/s for a volume with about 1,2 TiB of
>>>>>> data, mostly in larger files. Now on second attempt even only 620
>>>>>> MiB/s. Which is less than before. Before it reaches about 1 GiB/s.
>>>>>> I made sure that no desktop search indexing was interfering.
>>>>>>
>>>>>> Oh, I forgot to mention, BTRFS uses xxhash here. However it was
>>>>>> easily scrubbing at 1,5 to 2,5 GiB/s with 5.3. The filesystem uses
>>>>>> zstd compression and free space tree (free space cache v2).
>>>>>>
>>>>>> So from what I can see here, your patch made it worse.
>>>>>
>>>>> Thanks for the confirming, this at least prove it's not the hashing
>>>>> threads limit causing the regression.
>>>>>
>>>>> Which is pretty weird, the read pattern is in fact better than the
>>>>> original behavior, all read are in 64K (even if there are some holes,
>>>>> we are fine reading the garbage, this should reduce IOPS workload),
>>>>> and we submit a batch of 8 of such read in one go.
>>>>>
>>>>> BTW, what's the CPU usage of v6.3 kernel? Is it higher or lower?
>>>>> And what about the latency?
>>>>
>>>> CPU usage is between 600-700% on 6.3.9, Latency between 50-70 µs. And
>>>> scrubbing speed is above 2 GiB/s, peaking at 2,27 GiB/s. Later it went
>>>> down a bit to 1,7 GiB/s, maybe due to background activity.
>>>
>>> That 600~700% means btrfs is taking all its available thread_pool
>>> (min(nr_cpu + 2, 8)).
>>>
>>> So although the patch doesn't work as expected, we're still limited by
>>> the csum verification part.
>>>
>>> At least I have some clue now.
>>>>
>>>> I'd say the CPU usage to result (=scrubbing speed) ratio is much, much
>>>> better with 6.3. However the latencies during scrubbing are pretty much
>>>> the same. I even seen up to 0.2 ms.
>>>>
>>>>> Currently I'm out of ideas, for now you can revert that testing patch.
>>>>>
>>>>> If you're interested in more testing, you can apply the following
>>>>> small diff, which changed the batch number of scrub.
>>>>>
>>>>> You can try either double or half the number to see which change helps
>>>>> more.
>>>>
>>>> No time for further testing at the moment. Maybe at a later time.
>>>>
>>>> It might be good you put together a test setup yourself. Any computer
>>>> with NVME SSD should do I think. Unless there is something very special
>>>> about my laptop, but I doubt this. This reduces greatly on the turn-
>>>> around time.
>>>
>>> Sure, I'll prepare a dedicated machine for this.
>>>
>>> Thanks for all your effort!
>>> Qu
>>>
>>>>
>>>> I think for now I am back at 6.3. It works. :)
>>>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-16 10:55                         ` Qu Wenruo
@ 2023-07-16 16:01                           ` Sebastian Döring
  2023-07-17  5:23                             ` Qu Wenruo
  0 siblings, 1 reply; 30+ messages in thread
From: Sebastian Döring @ 2023-07-16 16:01 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, Qu Wenruo

Hi Qu,

> You can try this patch to see if it helps with your setup:

> https://patchwork.kernel.org/project/linux-btrfs/patch/ef3951fa130f9b61fe097e8d5f6e425525165a28.1689330324.git.wqu@suse.com/

I gave this patch a shot, applied to 6.4.3, and it did indeed return
scrub performance to 6.3.*-ish levels for me (at least in this short
5-minute test):

rockpro64 ~ # scrub canceled for 0a1dfaa5-e448-44df-b5ca-3024b9f35b43
Scrub started:    Sun Jul 16 17:48:48 2023
Status:           aborted
Duration:         0:05:28
Total to scrub:   103.34GiB
Rate:             322.61MiB/s
Error summary:    no errors found

Unpatched, I saw around ~140MiB/s.

As an aside, are you aware that the "Total to scrub" value seems totally
borked in the per-device scrub status (-d flag)?

Scrub device /dev/mapper/disk5-6 (id 1) status
Scrub resumed:    Sun Jul 16 17:56:53 2023
Status:           running
Duration:         33:32:24
Time left:        0:00:00
ETA:              Sun Jul 16 17:59:36 2023
Total to scrub:   7.49TiB
Bytes scrubbed:   7.49TiB  (100.00%)
Rate:             65.03MiB/s
Error summary:    no errors found

Scrub device /dev/mapper/disk7-8 (id 2) status
Scrub resumed:    Sun Jul 16 17:56:53 2023
Status:           running
Duration:         33:32:24
Time left:        0:00:00
ETA:              Sun Jul 16 17:59:36 2023
Total to scrub:   8.35TiB
Bytes scrubbed:   8.35TiB  (100.00%)
Rate:             72.48MiB/s
Error summary:    no errors found


Best regards,
Sebastian

On Sun, Jul 16, 2023 at 12:55 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/7/16 17:57, Sebastian Döring wrote:
> > Hi all,
> >
> >> I got a dedicated VM with a PCIE3.0 NVME passed to it without any host
> > cache.
> >
> >> With v6.3 the scrub speed can reach 3GB/s while the v6.4 (misc-next) can
> > only go around 1GB/s.
> >
> > I'm also observing severely degraded scrub performance (~50%) on a
> > spinning disk (on top of mdraid and LUKS). Are we sure this regression
> > is in any way NVME related?
>
> The regression would happen if the storage devices don't have any
> firmware level request merge (SATA NCQ feature).
>
> In that case, the rework scrub block size is way smaller than the old
> one (64K vs 512K), which would cause performance regression.
>
> You can try this patch to see if it helps with your setup:
>
> https://patchwork.kernel.org/project/linux-btrfs/patch/ef3951fa130f9b61fe097e8d5f6e425525165a28.1689330324.git.wqu@suse.com/
>
> For NVME, it still doesn't reach the old performance, but for SATA HDDs
> even without NCQ, it should more or less reach the old performance.
>
> Thanks,
> Qu
>
> >
> > Best regards,
> > Sebastian
> >
> > On Fri, Jul 14, 2023 at 3:01 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>
> >> Just a quick update on the situation.
> >>
> >> I got a dedicated VM with a PCIE3.0 NVME passed to it without any host
> >> cache.
> >>
> >> With v6.3 the scrub speed can reach 3GB/s while the v6.4 (misc-next) can
> >> only go around 1GB/s.
> >>
> >> With dedicated VM and more comprehensive telemetry, it shows there are
> >> two major problems:
> >>
> >> - Lack of block layer merging
> >>     All 64K stripes are just submitted as is, while the old code can
> >>     merge its read requests to around 512K.
> >>
> >>     The cause is the removal of block layer plug/unplug.
> >>
> >>     A quick 4 lines fix can improve the performance to around 1.5GB/s.
> >>
> >> - Bad csum distribution
> >>     With above problem fixed, I observed that the csum verification seems
> >>     to have only one worker.
> >>
> >>     Still investigating the cause.
> >>
> >> Thanks,
> >> Qu
> >>
> >> On 2023/7/11 19:33, Qu Wenruo wrote:
> >>>
> >>>
> >>> On 2023/7/11 19:26, Martin Steigerwald wrote:
> >>>> Qu Wenruo - 11.07.23, 13:05:42 CEST:
> >>>>> On 2023/7/11 18:56, Martin Steigerwald wrote:
> >>>>>> Qu Wenruo - 11.07.23, 11:57:52 CEST:
> >>>>>>> On 2023/7/11 17:25, Martin Steigerwald wrote:
> >>>>>>>> Qu Wenruo - 11.07.23, 10:59:55 CEST:
> >>>>>>>>> On 2023/7/11 13:52, Martin Steigerwald wrote:
> >>>>>>>>>> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
> >>>>>>>>>>> I see about 180000 reads in 10 seconds in atop. I have seen
> >>>>>>>>>>> latency
> >>>>>>>>>>> values from 55 to 85 µs which is highly unusual for NVME SSD
> >>>>>>>>>>> ("avio"
> >>>>>>>>>>> in atop¹).
> >>>> […]
> >>>>>>>>> Mind to try the following branch?
> >>>>>>>>>
> >>>>>>>>> https://github.com/adam900710/linux/tree/scrub_multi_thread
> >>>>>>>>>
> >>>>>>>>> Or you can grab the commit on top and backport to any kernel >=
> >>>>>>>>> 6.4.
> >>>>>>>>
> >>>>>>>> Cherry picking the commit on top of v6.4.3 lead to a merge
> >>>>>>>> conflict.
> >>>> […]
> >>>>>>> Well, I have only tested that patch on that development branch,
> >>>>>>> thus I can not ensure the result of the backport.
> >>>>>>>
> >>>>>>> But still, here you go the backported patch.
> >>>>>>>
> >>>>>>> I'd recommend to test the functionality of scrub on some less
> >>>>>>> important machine first then on your production latptop though.
> >>>>>>
> >>>>>> I took this calculated risk.
> >>>>>>
> >>>>>> However, while with the patch applied there seem to be more kworker
> >>>>>> threads doing work using 500-600% of CPU time in system (8 cores
> >>>>>> with
> >>>>>> hyper threading, so 16 logical cores) the result is even less
> >>>>>> performance. Latency values got even worse going up to 0,2 ms. An
> >>>>>> unrelated BTRFS filesystem in another logical volume is even stalled
> >>>>>> to almost a second for (mostly) write accesses.
> >>>>>>
> >>>>>> Scrubbing about 650 to 750 MiB/s for a volume with about 1,2 TiB of
> >>>>>> data, mostly in larger files. Now on second attempt even only 620
> >>>>>> MiB/s. Which is less than before. Before it reaches about 1 GiB/s.
> >>>>>> I made sure that no desktop search indexing was interfering.
> >>>>>>
> >>>>>> Oh, I forgot to mention, BTRFS uses xxhash here. However it was
> >>>>>> easily scrubbing at 1,5 to 2,5 GiB/s with 5.3. The filesystem uses
> >>>>>> zstd compression and free space tree (free space cache v2).
> >>>>>>
> >>>>>> So from what I can see here, your patch made it worse.
> >>>>>
> >>>>> Thanks for the confirming, this at least prove it's not the hashing
> >>>>> threads limit causing the regression.
> >>>>>
> >>>>> Which is pretty weird, the read pattern is in fact better than the
> >>>>> original behavior, all read are in 64K (even if there are some holes,
> >>>>> we are fine reading the garbage, this should reduce IOPS workload),
> >>>>> and we submit a batch of 8 of such read in one go.
> >>>>>
> >>>>> BTW, what's the CPU usage of v6.3 kernel? Is it higher or lower?
> >>>>> And what about the latency?
> >>>>
> >>>> CPU usage is between 600-700% on 6.3.9, Latency between 50-70 µs. And
> >>>> scrubbing speed is above 2 GiB/s, peaking at 2,27 GiB/s. Later it went
> >>>> down a bit to 1,7 GiB/s, maybe due to background activity.
> >>>
> >>> That 600~700% means btrfs is taking all its available thread_pool
> >>> (min(nr_cpu + 2, 8)).
> >>>
> >>> So although the patch doesn't work as expected, we're still limited by
> >>> the csum verification part.
> >>>
> >>> At least I have some clue now.
> >>>>
> >>>> I'd say the CPU usage to result (=scrubbing speed) ratio is much, much
> >>>> better with 6.3. However the latencies during scrubbing are pretty much
> >>>> the same. I even seen up to 0.2 ms.
> >>>>
> >>>>> Currently I'm out of ideas, for now you can revert that testing patch.
> >>>>>
> >>>>> If you're interested in more testing, you can apply the following
> >>>>> small diff, which changed the batch number of scrub.
> >>>>>
> >>>>> You can try either double or half the number to see which change helps
> >>>>> more.
> >>>>
> >>>> No time for further testing at the moment. Maybe at a later time.
> >>>>
> >>>> It might be good you put together a test setup yourself. Any computer
> >>>> with NVME SSD should do I think. Unless there is something very special
> >>>> about my laptop, but I doubt this. This reduces greatly on the turn-
> >>>> around time.
> >>>
> >>> Sure, I'll prepare a dedicated machine for this.
> >>>
> >>> Thanks for all your effort!
> >>> Qu
> >>>
> >>>>
> >>>> I think for now I am back at 6.3. It works. :)
> >>>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-16 16:01                           ` Sebastian Döring
@ 2023-07-17  5:23                             ` Qu Wenruo
  0 siblings, 0 replies; 30+ messages in thread
From: Qu Wenruo @ 2023-07-17  5:23 UTC (permalink / raw)
  To: Sebastian Döring, Qu Wenruo; +Cc: linux-btrfs



On 2023/7/17 00:01, Sebastian Döring wrote:
> Hi Qu,
> 
>> You can try this patch to see if it helps with your setup:
> 
>> https://patchwork.kernel.org/project/linux-btrfs/patch/ef3951fa130f9b61fe097e8d5f6e425525165a28.1689330324.git.wqu@suse.com/
> 
> I gave this patch a shot, applied to 6.4.3, and it did indeed return
> scrub performance to 6.3.*-ish levels for me (at least in this short 5
> minute test):
> 
> rockpro64 ~ # scrub canceled for 0a1dfaa5-e448-44df-b5ca-3024b9f35b43
> Scrub started:    Sun Jul 16 17:48:48 2023
> Status:           aborted
> Duration:         0:05:28
> Total to scrub:   103.34GiB
> Rate:             322.61MiB/s
> Error summary:    no errors found
> 
> Unpatched, I saw around ~140MiB/s.
> 
> As an aside, are you aware that the "Total to scrub" seems totally
> borked in per device scrub status (-d flag)?

That's a bug in btrfs-progs AFAIK.

There are some cases where btrfs-progs reports the total size of the disk
instead of the used space.

For the case of per-device scrubbing, though, it may need more accurate
checks (it needs to go through the dev-extent tree to grab the real
per-device used space).

There is already a patch to fix some of the cases, but there may be more:

https://patchwork.kernel.org/project/linux-btrfs/patch/2e1ee8fb0a05dbb2f6a4327d5b1383c3f7635dea.1685924954.git.wqu@suse.com/
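
For illustration, the per-device numbers are also visible from user space
via the BTRFS_IOC_DEV_INFO ioctl, which reports both the device size and
the bytes allocated on that device; the latter is much closer to what
"Total to scrub" should show than the raw device size. A minimal sketch
(assuming the uapi definitions in <linux/btrfs.h>; this is not the
btrfs-progs fix itself):

/* Compare a btrfs device's total size with the space allocated on it. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

int main(int argc, char **argv)
{
	struct btrfs_ioctl_dev_info_args args;
	int fd;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <mountpoint> <devid>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror(argv[1]);
		return 1;
	}
	memset(&args, 0, sizeof(args));
	args.devid = strtoull(argv[2], NULL, 0);	/* look up by devid, zeroed uuid */
	if (ioctl(fd, BTRFS_IOC_DEV_INFO, &args) < 0) {
		perror("BTRFS_IOC_DEV_INFO");
		close(fd);
		return 1;
	}
	printf("devid %llu: total %llu bytes, allocated %llu bytes\n",
	       (unsigned long long)args.devid,
	       (unsigned long long)args.total_bytes,
	       (unsigned long long)args.bytes_used);
	close(fd);
	return 0;
}

Run against the mountpoint and a devid (as shown by btrfs filesystem
show), the difference between the two numbers is roughly the gap between
the bogus and the expected per-device "Total to scrub" value.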

Thanks,
Qu

> 
> Scrub device /dev/mapper/disk5-6 (id 1) status
> Scrub resumed:    Sun Jul 16 17:56:53 2023
> Status:           running
> Duration:         33:32:24
> Time left:        0:00:00
> ETA:              Sun Jul 16 17:59:36 2023
> Total to scrub:   7.49TiB
> Bytes scrubbed:   7.49TiB  (100.00%)
> Rate:             65.03MiB/s
> Error summary:    no errors found
> 
> Scrub device /dev/mapper/disk7-8 (id 2) status
> Scrub resumed:    Sun Jul 16 17:56:53 2023
> Status:           running
> Duration:         33:32:24
> Time left:        0:00:00
> ETA:              Sun Jul 16 17:59:36 2023
> Total to scrub:   8.35TiB
> Bytes scrubbed:   8.35TiB  (100.00%)
> Rate:             72.48MiB/s
> Error summary:    no errors found
> 
> 
> Best regards,
> Sebastian
> 
> On Sun, Jul 16, 2023 at 12:55 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2023/7/16 17:57, Sebastian Döring wrote:
>>> Hi all,
>>>
>>>> I got a dedicated VM with a PCIE3.0 NVME passed to it without any host
>>> cache.
>>>
>>>> With v6.3 the scrub speed can reach 3GB/s while the v6.4 (misc-next) can
>>> only go around 1GB/s.
>>>
>>> I'm also observing severely degraded scrub performance (~50%) on a
>>> spinning disk (on top of mdraid and LUKS). Are we sure this regression
>>> is in any way NVME related?
>>
>> The regression would happen if the storage devices don't have any
>> firmware level request merge (SATA NCQ feature).
>>
>> In that case, the rework scrub block size is way smaller than the old
>> one (64K vs 512K), which would cause performance regression.
>>
>> You can try this patch to see if it helps with your setup:
>>
>> https://patchwork.kernel.org/project/linux-btrfs/patch/ef3951fa130f9b61fe097e8d5f6e425525165a28.1689330324.git.wqu@suse.com/
>>
>> For NVME, it still doesn't reach the old performance, but for SATA HDDs
>> even without NCQ, it should more or less reach the old performance.
>>
>> Thanks,
>> Qu
>>
>>>
>>> Best regards,
>>> Sebastian
>>>
>>> On Fri, Jul 14, 2023 at 3:01 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>> Just a quick update on the situation.
>>>>
>>>> I got a dedicated VM with a PCIE3.0 NVME passed to it without any host
>>>> cache.
>>>>
>>>> With v6.3 the scrub speed can reach 3GB/s while the v6.4 (misc-next) can
>>>> only go around 1GB/s.
>>>>
>>>> With dedicated VM and more comprehensive telemetry, it shows there are
>>>> two major problems:
>>>>
>>>> - Lack of block layer merging
>>>>      All 64K stripes are just submitted as is, while the old code can
>>>>      merge its read requests to around 512K.
>>>>
>>>>      The cause is the removal of block layer plug/unplug.
>>>>
>>>>      A quick 4 lines fix can improve the performance to around 1.5GB/s.
>>>>
>>>> - Bad csum distribution
>>>>      With above problem fixed, I observed that the csum verification seems
>>>>      to have only one worker.
>>>>
>>>>      Still investigating the cause.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>> On 2023/7/11 19:33, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2023/7/11 19:26, Martin Steigerwald wrote:
>>>>>> Qu Wenruo - 11.07.23, 13:05:42 CEST:
>>>>>>> On 2023/7/11 18:56, Martin Steigerwald wrote:
>>>>>>>> Qu Wenruo - 11.07.23, 11:57:52 CEST:
>>>>>>>>> On 2023/7/11 17:25, Martin Steigerwald wrote:
>>>>>>>>>> Qu Wenruo - 11.07.23, 10:59:55 CEST:
>>>>>>>>>>> On 2023/7/11 13:52, Martin Steigerwald wrote:
>>>>>>>>>>>> Martin Steigerwald - 11.07.23, 07:49:43 CEST:
>>>>>>>>>>>>> I see about 180000 reads in 10 seconds in atop. I have seen
>>>>>>>>>>>>> latency
>>>>>>>>>>>>> values from 55 to 85 µs which is highly unusual for NVME SSD
>>>>>>>>>>>>> ("avio"
>>>>>>>>>>>>> in atop¹).
>>>>>> […]
>>>>>>>>>>> Mind to try the following branch?
>>>>>>>>>>>
>>>>>>>>>>> https://github.com/adam900710/linux/tree/scrub_multi_thread
>>>>>>>>>>>
>>>>>>>>>>> Or you can grab the commit on top and backport to any kernel >=
>>>>>>>>>>> 6.4.
>>>>>>>>>>
>>>>>>>>>> Cherry picking the commit on top of v6.4.3 lead to a merge
>>>>>>>>>> conflict.
>>>>>> […]
>>>>>>>>> Well, I have only tested that patch on that development branch,
>>>>>>>>> thus I can not ensure the result of the backport.
>>>>>>>>>
>>>>>>>>> But still, here you go the backported patch.
>>>>>>>>>
>>>>>>>>> I'd recommend to test the functionality of scrub on some less
>>>>>>>>> important machine first then on your production latptop though.
>>>>>>>>
>>>>>>>> I took this calculated risk.
>>>>>>>>
>>>>>>>> However, while with the patch applied there seem to be more kworker
>>>>>>>> threads doing work using 500-600% of CPU time in system (8 cores
>>>>>>>> with
>>>>>>>> hyper threading, so 16 logical cores) the result is even less
>>>>>>>> performance. Latency values got even worse going up to 0,2 ms. An
>>>>>>>> unrelated BTRFS filesystem in another logical volume is even stalled
>>>>>>>> to almost a second for (mostly) write accesses.
>>>>>>>>
>>>>>>>> Scrubbing about 650 to 750 MiB/s for a volume with about 1,2 TiB of
>>>>>>>> data, mostly in larger files. Now on second attempt even only 620
>>>>>>>> MiB/s. Which is less than before. Before it reaches about 1 GiB/s.
>>>>>>>> I made sure that no desktop search indexing was interfering.
>>>>>>>>
>>>>>>>> Oh, I forgot to mention, BTRFS uses xxhash here. However it was
>>>>>>>> easily scrubbing at 1,5 to 2,5 GiB/s with 5.3. The filesystem uses
>>>>>>>> zstd compression and free space tree (free space cache v2).
>>>>>>>>
>>>>>>>> So from what I can see here, your patch made it worse.
>>>>>>>
>>>>>>> Thanks for the confirming, this at least prove it's not the hashing
>>>>>>> threads limit causing the regression.
>>>>>>>
>>>>>>> Which is pretty weird, the read pattern is in fact better than the
>>>>>>> original behavior, all read are in 64K (even if there are some holes,
>>>>>>> we are fine reading the garbage, this should reduce IOPS workload),
>>>>>>> and we submit a batch of 8 of such read in one go.
>>>>>>>
>>>>>>> BTW, what's the CPU usage of v6.3 kernel? Is it higher or lower?
>>>>>>> And what about the latency?
>>>>>>
>>>>>> CPU usage is between 600-700% on 6.3.9, Latency between 50-70 µs. And
>>>>>> scrubbing speed is above 2 GiB/s, peaking at 2,27 GiB/s. Later it went
>>>>>> down a bit to 1,7 GiB/s, maybe due to background activity.
>>>>>
>>>>> That 600~700% means btrfs is taking all its available thread_pool
>>>>> (min(nr_cpu + 2, 8)).
>>>>>
>>>>> So although the patch doesn't work as expected, we're still limited by
>>>>> the csum verification part.
>>>>>
>>>>> At least I have some clue now.
>>>>>>
>>>>>> I'd say the CPU usage to result (=scrubbing speed) ratio is much, much
>>>>>> better with 6.3. However the latencies during scrubbing are pretty much
>>>>>> the same. I even seen up to 0.2 ms.
>>>>>>
>>>>>>> Currently I'm out of ideas, for now you can revert that testing patch.
>>>>>>>
>>>>>>> If you're interested in more testing, you can apply the following
>>>>>>> small diff, which changed the batch number of scrub.
>>>>>>>
>>>>>>> You can try either double or half the number to see which change helps
>>>>>>> more.
>>>>>>
>>>>>> No time for further testing at the moment. Maybe at a later time.
>>>>>>
>>>>>> It might be good you put together a test setup yourself. Any computer
>>>>>> with NVME SSD should do I think. Unless there is something very special
>>>>>> about my laptop, but I doubt this. This reduces greatly on the turn-
>>>>>> around time.
>>>>>
>>>>> Sure, I'll prepare a dedicated machine for this.
>>>>>
>>>>> Thanks for all your effort!
>>>>> Qu
>>>>>
>>>>>>
>>>>>> I think for now I am back at 6.3. It works. :)
>>>>>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-12 11:02 ` Linux regression tracking #adding (Thorsten Leemhuis)
@ 2023-07-19  6:42   ` Martin Steigerwald
  2023-07-19  6:55     ` Martin Steigerwald
  2023-08-29 12:17   ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 1 reply; 30+ messages in thread
From: Martin Steigerwald @ 2023-07-19  6:42 UTC (permalink / raw)
  To: Tim Cuthbertson, linux-btrfs, Linux regressions mailing list
  Cc: Linux kernel regressions list

Hi Thorsten.

#regzbot monitor: https://lore.kernel.org/linux-btrfs/cover.1689744163.git.wqu@suse.com/

I hope that does it.

The above links to a patch from Qu asking for other ideas to fix the
regression. It has, however, been superseded by a later patch of his.

So feel free to drop this.

Best,
Martin

Linux regression tracking #adding (Thorsten Leemhuis) - 12.07.23, 13:02:18 CEST:
> [CCing the regression list, as it should be in the loop for
> regressions:
> https://docs.kernel.org/admin-guide/reporting-regressions.html]
> 
> [TLDR: I'm adding this report to the list of tracked Linux kernel
> regressions; the text you find below is based on a few templates
> paragraphs you might have encountered already in similar form.
> See link in footer if these mails annoy you.]
> 
> On 03.07.23 22:19, Tim Cuthbertson wrote:
> > Yesterday, I noticed that a scrub of my main system filesystem has
> > slowed from about 2.9 gb/sec to about 949 mb/sec. My scrub used to
> > run in about 12 seconds, now it is taking 51 seconds. I had just
> > installed Linux kernel 6.4.1 on Arch Linux, upgrading from 6.3.9.
> > At first I suspected the new kernel, but now I am not so sure.
> 
> Thanks for the report. It seems it will take some work to address
> this, so to be sure the issue doesn't fall through the cracks
> unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
> 
> #regzbot ^introduced e02ee89baa66
> #regzbot title btrfs: scrub nvme SSD has slowed by about 2/3 due to
> csum #regzbot monitor:
> https://lore.kernel.org/all/6c1ffe48e93fee9aa975ecc22dc2e7a1f3d7a0de.1
> 688539673.git.wqu@suse.com/ #regzbot ignore-activity
> 
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify
> when the regression started to happen? Or point out I got the title
> or something else totally wrong? Then just reply and tell me --
> ideally while also telling regzbot about it, as explained by the page
> listed in the footer of this mail.
> 
> Developers: When fixing the issue, remember to add 'Link:' tags
> pointing to the report (the parent of this mail). See page linked in
> footer for details.
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
> hat) --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.


-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-19  6:42   ` Martin Steigerwald
@ 2023-07-19  6:55     ` Martin Steigerwald
  0 siblings, 0 replies; 30+ messages in thread
From: Martin Steigerwald @ 2023-07-19  6:55 UTC (permalink / raw)
  To: Tim Cuthbertson, linux-btrfs, Linux regressions mailing list
  Cc: Linux kernel regressions list

Martin Steigerwald - 19.07.23, 08:42:58 CEST:
> Hi Thorsten.
> 
> #regzbot monitor:
> https://lore.kernel.org/linux-btrfs/cover.1689744163.git.wqu@suse.com
> /
> 
> I hope that does it.
> 
> Above links to a patch of Qu about asking for other ideas to fix the
> regression. It however has been superseded by a later patch of him.

FWIW, time-wise it was the other way around. The other patch was slightly
earlier, but it does not fully recover the original 6.3 scrubbing speed on
NVME devices¹. Thus Qu was asking for additional ideas.

[1] https://lore.kernel.org/linux-btrfs/4f48e79c-93f7-b473-648d-4c995070c8ac@gmx.com/T/#t

Best,
-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-07-12 11:02 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2023-07-19  6:42   ` Martin Steigerwald
@ 2023-08-29 12:17   ` Linux regression tracking #update (Thorsten Leemhuis)
  2023-09-08 11:54     ` Martin Steigerwald
  1 sibling, 1 reply; 30+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-08-29 12:17 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Linux kernel regressions list

On 12.07.23 13:02, Linux regression tracking #adding (Thorsten Leemhuis)
wrote:

> On 03.07.23 22:19, Tim Cuthbertson wrote:
>> Yesterday, I noticed that a scrub of my main system filesystem has
>> slowed from about 2.9 gb/sec to about 949 mb/sec. My scrub used to run
>> in about 12 seconds, now it is taking 51 seconds. I had just installed
>> Linux kernel 6.4.1 on Arch Linux, upgrading from 6.3.9. At first I
>> suspected the new kernel, but now I am not so sure.
> 
> Thanks for the report. It seems it will take some work to address this,
> so to be sure the issue doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, the Linux kernel regression tracking bot:
> 
> #regzbot ^introduced e02ee89baa66

#regzbot resolve: various changes merged for 6.6 improve things again;
more planned; backporting is planned, too;
#regzbot ignore-activity

(yes, that is not idea, but that's how it is sometimes)

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-08-29 12:17   ` Linux regression tracking #update (Thorsten Leemhuis)
@ 2023-09-08 11:54     ` Martin Steigerwald
  2023-09-08 22:03       ` Qu Wenruo
  0 siblings, 1 reply; 30+ messages in thread
From: Martin Steigerwald @ 2023-09-08 11:54 UTC (permalink / raw)
  To: linux-btrfs, Linux regressions mailing list,
	Linux kernel regressions list
  Cc: Qu Wenruo, Tim Cuthbertson

Hi Thorsten, Qu, Tim, everyone,

Linux regression tracking #update (Thorsten Leemhuis) - 29.08.23, 14:17:33 
CEST:
> On 12.07.23 13:02, Linux regression tracking #adding (Thorsten Leemhuis)
> wrote:
> > On 03.07.23 22:19, Tim Cuthbertson wrote:
> >> Yesterday, I noticed that a scrub of my main system filesystem has
> >> slowed from about 2.9 gb/sec to about 949 mb/sec. My scrub used to
> >> run
> >> in about 12 seconds, now it is taking 51 seconds. I had just
> >> installed
> >> Linux kernel 6.4.1 on Arch Linux, upgrading from 6.3.9. At first I
> >> suspected the new kernel, but now I am not so sure.
> > 
> > Thanks for the report. It seems it will take some work to address
> > this,
> > so to be sure the issue doesn't fall through the cracks unnoticed, I'm
> > adding it to regzbot, the Linux kernel regression tracking bot:
> > 
> > #regzbot ^introduced e02ee89baa66
> 
> #regzbot resolve: various changes merged for 6.6 improve things again;
> more planned; backporting is planned, too;
> #regzbot ignore-activity
> 
> (yes, that is not idea, but that's how it is sometimes)

ideal?

Scrubbing "/home" with 304.61GiB (interestingly both back then with 6.4 
and now with 6.5.2):

- 6.4: 966.84MiB/s
- 6.5.2:  748.02MiB/s

I expected an improvement.

Same Lenovo ThinkPad T14 AMD Gen 1 with AMD Ryzen 7 PRO 4750U, 32 GiB RAM 
and 2TB Samsung 980 Pro NVME SSD as before.

Ciao,
-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-09-08 11:54     ` Martin Steigerwald
@ 2023-09-08 22:03       ` Qu Wenruo
  2023-09-09  8:06         ` Martin Steigerwald
  2023-10-13 13:07         ` Martin Steigerwald
  0 siblings, 2 replies; 30+ messages in thread
From: Qu Wenruo @ 2023-09-08 22:03 UTC (permalink / raw)
  To: Martin Steigerwald, linux-btrfs, Linux regressions mailing list
  Cc: Tim Cuthbertson



On 2023/9/8 19:54, Martin Steigerwald wrote:
> Hi Thorsten, Qu, Tim, everyone,
>
> Linux regression tracking #update (Thorsten Leemhuis) - 29.08.23, 14:17:33
> CEST:
>> On 12.07.23 13:02, Linux regression tracking #adding (Thorsten Leemhuis)
>> wrote:
>>> On 03.07.23 22:19, Tim Cuthbertson wrote:
>>>> Yesterday, I noticed that a scrub of my main system filesystem has
>>>> slowed from about 2.9 gb/sec to about 949 mb/sec. My scrub used to
>>>> run
>>>> in about 12 seconds, now it is taking 51 seconds. I had just
>>>> installed
>>>> Linux kernel 6.4.1 on Arch Linux, upgrading from 6.3.9. At first I
>>>> suspected the new kernel, but now I am not so sure.
>>>
>>> Thanks for the report. It seems it will take some work to address
>>> this,
>>> so to be sure the issue doesn't fall through the cracks unnoticed, I'm
>>> adding it to regzbot, the Linux kernel regression tracking bot:
>>>
>>> #regzbot ^introduced e02ee89baa66
>>
>> #regzbot resolve: various changes merged for 6.6 improve things again;
>> more planned; backporting is planned, too;
>> #regzbot ignore-activity
>>
>> (yes, that is not idea, but that's how it is sometimes)
>
> ideal?
>
> Scrubbing "/home" with 304.61GiB (interestingly both back then with 6.4
> and now with 6.5.2):
>
> - 6.4: 966.84MiB/s
> - 6.5.2:  748.02MiB/s
>
> I expected an improvement.
>
> Same Lenovo ThinkPad T14 AMD Gen 1 with AMD Ryzen 7 PRO 4750U, 32 GiB RAM
> and 2TB Samsung 980 Pro NVME SSD as before.

The fixes didn't arrive until v6.6.

Thanks,
Qu

>
> Ciao,

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-09-08 22:03       ` Qu Wenruo
@ 2023-09-09  8:06         ` Martin Steigerwald
  2023-10-13 13:07         ` Martin Steigerwald
  1 sibling, 0 replies; 30+ messages in thread
From: Martin Steigerwald @ 2023-09-09  8:06 UTC (permalink / raw)
  To: linux-btrfs, Linux regressions mailing list, Qu Wenruo; +Cc: Tim Cuthbertson

Qu Wenruo - 09.09.23, 00:03:38 CEST:
> > Scrubbing "/home" with 304.61GiB (interestingly both back then with
> > 6.4
> > and now with 6.5.2):
> > 
> > - 6.4: 966.84MiB/s
> > - 6.5.2:  748.02MiB/s
> > 
> > I expected an improvement.
> > 
> > Same Lenovo ThinkPad T14 AMD Gen 1 with AMD Ryzen 7 PRO 4750U, 32 GiB
> > RAM and 2TB Samsung 980 Pro NVME SSD as before.
> 
> The fixes didn't arrive until v6.6.

Ah, okay. That explains it.

Well I can wait.

Best,
-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Scrub of my nvme SSD has slowed by about 2/3
  2023-09-08 22:03       ` Qu Wenruo
  2023-09-09  8:06         ` Martin Steigerwald
@ 2023-10-13 13:07         ` Martin Steigerwald
  1 sibling, 0 replies; 30+ messages in thread
From: Martin Steigerwald @ 2023-10-13 13:07 UTC (permalink / raw)
  To: linux-btrfs, Linux regressions mailing list, Qu Wenruo; +Cc: Tim Cuthbertson

Hello Qu, hello,

Qu Wenruo - 09.09.23, 00:03:38 CEST:
> > Scrubbing "/home" with 304.61GiB (interestingly both back then with
> > 6.4
> > and now with 6.5.2):
> > 
> > - 6.4: 966.84MiB/s
> > - 6.5.2:  748.02MiB/s
> > 
> > I expected an improvement.
> > 
> > Same Lenovo ThinkPad T14 AMD Gen 1 with AMD Ryzen 7 PRO 4750U, 32 GiB
> > RAM and 2TB Samsung 980 Pro NVME SSD as before.
> 
> The fixes didn't arrive until v6.6.

Thank you for making scrub fast again: /home now scrubs at 1.88GiB/s on 6.5.6. ;)

Best,
-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2023-10-13 13:16 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-03 20:19 Scrub of my nvme SSD has slowed by about 2/3 Tim Cuthbertson
2023-07-03 23:49 ` Qu Wenruo
2023-07-05  2:44   ` Qu Wenruo
2023-07-11  5:36     ` Martin Steigerwald
2023-07-11  5:33 ` Martin Steigerwald
2023-07-11  5:49   ` Martin Steigerwald
2023-07-11  5:52     ` Martin Steigerwald
2023-07-11  8:59       ` Qu Wenruo
2023-07-11  9:25         ` Martin Steigerwald
2023-07-11  9:57           ` Qu Wenruo
2023-07-11 10:56             ` Martin Steigerwald
2023-07-11 11:05               ` Qu Wenruo
2023-07-11 11:26                 ` Martin Steigerwald
2023-07-11 11:33                   ` Qu Wenruo
2023-07-11 11:47                     ` Martin Steigerwald
2023-07-14  0:28                     ` Qu Wenruo
2023-07-14  6:01                       ` Qu Wenruo
2023-07-14  6:58                         ` Martin Steigerwald
2023-07-16  9:57                       ` Sebastian Döring
2023-07-16 10:55                         ` Qu Wenruo
2023-07-16 16:01                           ` Sebastian Döring
2023-07-17  5:23                             ` Qu Wenruo
2023-07-12 11:02 ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-07-19  6:42   ` Martin Steigerwald
2023-07-19  6:55     ` Martin Steigerwald
2023-08-29 12:17   ` Linux regression tracking #update (Thorsten Leemhuis)
2023-09-08 11:54     ` Martin Steigerwald
2023-09-08 22:03       ` Qu Wenruo
2023-09-09  8:06         ` Martin Steigerwald
2023-10-13 13:07         ` Martin Steigerwald
