* [dm-devel] Integrity discard/trim extremely slow on NVMe SSD storage (~10GiB/minute)
@ 2021-04-19 18:29 Melvin Vermeeren
  2021-04-26 15:33 ` Mikulas Patocka
  0 siblings, 1 reply; 6+ messages in thread
From: Melvin Vermeeren @ 2021-04-19 18:29 UTC (permalink / raw)
  To: dm-devel


[-- Attachment #1.1: Type: text/plain, Size: 3008 bytes --]

Note: This was originally posted on cryptsetup GitLab.
Note: Reposting here for better visibility as it appears to be a kernel bug.
Ref: https://gitlab.com/cryptsetup/cryptsetup/-/issues/639

Issue description
-----------------

With a Seagate FireCuda 520 2TB NVMe SSD running in PCIe 3.0 x4 mode (my 
motherboard does not have PCIe 4.0), discards through `dm-integrity` layer are 
extremely slow to the point of being almost unusable or in some cases fully 
unusable.

This is so slow that using the `discard` option on swap is not possible, as 
it takes around 3 minutes to complete for 32GiB of swap, causing timeouts 
during boot which in turn cause various other services to fail, resulting in 
a drop to the emergency shell.

`blkdiscard` directly on the NVMe device takes roughly 10 seconds for the 
entire 2TB, but through `dm-integrity` the rate is approximately 10GiB per 
minute, meaning over 3 hours to discard the entire 2TB. Normal read and write 
operations are not affected and perform well, easily reaching 2GiB/s through 
the entire stack: `disk dm-integrity mdadm luks lvm ext4`.

Checking kernel thread usage in htop while discarding, quite a few 
`dm-integrity-offload` threads sit in the `D` state with `0.0` CPU usage, 
which is rather odd. No integrity threads are actually working, and read-write 
disk usage measured with `dstat` is not even 1MiB/s.

To detail the above, `dstat` shows very regular timings: 2 seconds of 0k 
writes, 1 second of 512k writes, repeat. Possibly a timeout in a lock 
somewhere, or some other problematic locking situation?

Steps for reproducing the issue
-------------------------------

1. Create two 10G partitions on the SSD.
2. Set up `dm-integrity` on one of them and open the device with `--allow-
discards`.
3. `blkdiscard` both partitions.
	* The raw partition finishes instantly.
	* The integrity partition takes around a minute.
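A minimal sketch of these steps, assuming a standalone (keyless) dm-integrity device; the partition names are placeholders, so adapt them to your layout:

```shell
#!/bin/sh
# Sketch of the reproduction above. Partition names are placeholders.
set -e

RAW=/dev/nvme0n1p1     # 10G partition discarded directly (placeholder)
INTEG=/dev/nvme0n1p2   # 10G partition wrapped in dm-integrity (placeholder)

# Standalone dm-integrity with 4096-byte sectors; discards are only
# passed through if the device is opened with --allow-discards.
integritysetup format --sector-size 4096 "$INTEG"
integritysetup open --allow-discards "$INTEG" integ-test

# Compare discard times: raw is near-instant, the integrity device
# takes around a minute on the affected drive.
time blkdiscard "$RAW"
time blkdiscard /dev/mapper/integ-test

integritysetup close integ-test
```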

Additional info
---------------

The NVMe device is formatted with native 4096-byte sectors and the 
`dm-integrity` layer also uses 4096-byte sectors.

Debian bullseye (testing), kernel 5.10.0-6-rt-amd64 (5.10.28-1). The same 
issue occurred during testing with an Arch Linux live ISO, which ships kernel 
5.11.x. Cryptsetup package version 2.3.5.

On another server system (IBM POWER9, ppc64le) with a SAS 3.0 SSD, discard 
works properly at more than acceptable speeds, showing significant CPU usage 
while discarding. The affected machine here is a regular Intel amd64 desktop 
system.

Debug log
---------

Nothing actually fails: dmesg and syslog show no issues or warnings at all, 
so I am not sure what to include.

Only appears to affect NVMe
---------------------------

Further tests on the same machine show that a SATA SSD is not affected by 
this issue and discards at high speed. This appears to be an NVMe-specific bug:
Ref: https://gitlab.com/cryptsetup/cryptsetup/-/issues/639#note_555208783

If there is anything I can do to help, feel free to let me know.
Note that I am not subscribed to dm-devel, so please CC me directly.

Thanks,

-- 
Melvin Vermeeren
Systems engineer

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 97 bytes --]

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dm-devel] Integrity discard/trim extremely slow on NVMe SSD storage (~10GiB/minute)
  2021-04-19 18:29 [dm-devel] Integrity discard/trim extremely slow on NVMe SSD storage (~10GiB/minute) Melvin Vermeeren
@ 2021-04-26 15:33 ` Mikulas Patocka
  2021-04-26 16:02   ` Mikulas Patocka
  2021-04-30 19:23   ` Melvin Vermeeren
  0 siblings, 2 replies; 6+ messages in thread
From: Mikulas Patocka @ 2021-04-26 15:33 UTC (permalink / raw)
  To: Melvin Vermeeren; +Cc: dm-devel, Milan Broz



On Mon, 19 Apr 2021, Melvin Vermeeren wrote:

> Note: This was originally posted on cryptsetup GitLab.
> Note: Reposting here for better visibility as it appears to be a kernel bug.
> Ref: https://gitlab.com/cryptsetup/cryptsetup/-/issues/639
> 
> Issue description
> -----------------
> 
> With a Seagate FireCuda 520 2TB NVMe SSD running in PCIe 3.0 x4 mode (my 
> motherboard does not have PCIe 4.0), discards through `dm-integrity` 
> layer are extremely slow to the point of being almost unusable or in 
> some cases fully unusable.
> 
> This is so slow that having the `discard` option on swap is not 
> possible, as it takes around 3 minutes to complete for 32GiB swap 
> causing timeouts during boot which in turn causes various other services 
> to fail resulting in a drop to the emergency shell.
> 
> `blkdiscard` directly to NVMe device takes I think 10 sec or so for the 
> entire 2TB, but through `dm-integrity` the rate is approx 10GiB per 
> minute, meaning over 3 hours to discard the entire 2TB. Normal read and 
> write operations are not affected and are high performance, easily 
> reaching 2GiB/s through the entire layer: `disk dm-integrity mdadm luks 
> lvm ext4`.
> 
> Checking the kernel thread usage in htop quite some 
> `dm-integrity-offload` threads are in the `D` state with `0.0` CPU usage 
> when discarding, which is rather odd. No integrity threads are actually 
> working and read-write disk usage measured with `dstat` is not even 
> 1MiB/s.
> 
> To detail the above, `dstat` shows extremely clear timings: 2 seconds 0k 
> write, 1 second 512k write, repeat. Possible timeout in locks somewhere 
> or other problematic lock situation?
> 
> Steps for reproducing the issue
> -------------------------------
> 
> 1. Create two 10G partitions on SSD.
> 2. Setup `dm-integrity` on one of these and open the device with `--allow-
> discards`.
> 3. `blkdiscard` both partitions.
> 	* Raw partition is done instantly.
> 	* Integrity partition takes around a minute.
> 
> Additional info
> ---------------
> 
> The NVMe device is formatted to native 4096 byte sectors and the `dm-
> integrity` layer also uses 4096 byte sectors.
> 
> Debian bullseye (testing), kernel 5.10.0-6-rt-amd64 5.10.28-1. Same issue 
> occurred during testing with Arch Linux liveiso which is kernel 5.11.x. 
> Cryptsetup package version 2.3.5.
> 
> On another server system (IBM POWER9, ppc64le) with SAS 3.0 SSD discard is 
> working properly at more than acceptable speeds, showing significant CPU usage 
> while discarding. In this case it is a regular Intel amd64 desktop system.
> 
> Debug log
> ---------
> 
> Nothing really fails, dmesg and syslog show no issues/warnings at all, not 
> sure what to include.
> 
> Only appears to affect NVMe
> ---------------------------
> 
> Further tests on the same machine show that SATA SSD is not affected by this 
> issue and discards at high performance. Appears to be NVMe-specific bug:
> Ref: https://gitlab.com/cryptsetup/cryptsetup/-/issues/639#note_555208783

I tried it on my nvme device (Samsung SSD 960 EVO 500GB) and I could 
discard 32GB in 5 seconds.

I assume that it is specific to the nvme device you are using. The device 
is perhaps slow due to a mix of discard+read+write requests that 
dm-integrity generates.

> If there is anything I can do to help feel free to let me know.
> Note that I am not subscribed to dm-devel, please CC me directly.
> 
> Thanks,

Could you try it on other nvme disks?

Mikulas


* Re: [dm-devel] Integrity discard/trim extremely slow on NVMe SSD storage (~10GiB/minute)
  2021-04-26 15:33 ` Mikulas Patocka
@ 2021-04-26 16:02   ` Mikulas Patocka
  2021-04-30 19:26     ` Melvin Vermeeren
  2021-04-30 19:23   ` Melvin Vermeeren
  1 sibling, 1 reply; 6+ messages in thread
From: Mikulas Patocka @ 2021-04-26 16:02 UTC (permalink / raw)
  To: Melvin Vermeeren; +Cc: dm-devel, Milan Broz



On Mon, 26 Apr 2021, Mikulas Patocka wrote:

> > Further tests on the same machine show that SATA SSD is not affected by this 
> > issue and discards at high performance. Appears to be NVMe-specific bug:
> > Ref: https://gitlab.com/cryptsetup/cryptsetup/-/issues/639#note_555208783
> 
> I tried it on my nvme device (Samsung SSD 960 EVO 500GB) and I could 
> discard 32GB in 5 seconds.
> 
> I assume that it is specific to the nvme device you are using. The device 
> is perhaps slow due to a mix of discard+read+write requests that 
> dm-integrity generates.
> 
> > If there is anything I can do to help feel free to let me know.
> > Note that I am not subscribed to dm-devel, please CC me directly.
> > 
> > Thanks,
> 
> Could you try it on other nvme disks?

Try this patch - it avoids writing the discard filler to the metadata if it 
is already there. It won't help with the first discard, but it may help when 
discarding already-discarded blocks.

Mikulas



dm-integrity: don't write metadata if we overwrite it with the same content

If we discard already discarded blocks, we do not need to write discard
filler to the metadata, because it is already there.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

Index: linux-2.6/drivers/md/dm-integrity.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-integrity.c
+++ linux-2.6/drivers/md/dm-integrity.c
@@ -1429,8 +1429,10 @@ static int dm_integrity_rw_tag(struct dm
 		if (op == TAG_READ) {
 			memcpy(tag, dp, to_copy);
 		} else if (op == TAG_WRITE) {
-			memcpy(dp, tag, to_copy);
-			dm_bufio_mark_partial_buffer_dirty(b, *metadata_offset, *metadata_offset + to_copy);
+			if (memcmp(dp, tag, to_copy)) {
+				memcpy(dp, tag, to_copy);
+				dm_bufio_mark_partial_buffer_dirty(b, *metadata_offset, *metadata_offset + to_copy);
+			}
 		} else {
 			/* e.g.: op == TAG_CMP */
 


* Re: [dm-devel] Integrity discard/trim extremely slow on NVMe SSD storage (~10GiB/minute)
  2021-04-26 15:33 ` Mikulas Patocka
  2021-04-26 16:02   ` Mikulas Patocka
@ 2021-04-30 19:23   ` Melvin Vermeeren
  1 sibling, 0 replies; 6+ messages in thread
From: Melvin Vermeeren @ 2021-04-30 19:23 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, Milan Broz



Hi Mikulas,

Got around to checking the patches today, apologies for the delay.

On Monday, 26 April 2021 17:33:32 CEST Mikulas Patocka wrote:
> I tried it on my nvme device (Samsung SSD 960 EVO 500GB) and I could
> discard 32GB in 5 seconds.
> 
> I assume that it is specific to the nvme device you are using. The device
> is perhaps slow due to a mix of discard+read+write requests that
> dm-integrity generates.

This makes sense to me as well.

> Could you try it on other nvme disks?

I cannot test this myself, but I asked a friend with a Crucial P5 NVMe SSD to 
test, and there the discards passed through dm-integrity also perform well, 
so it seems specific to the Seagate FireCuda 520.

I contacted Seagate about this problem with some references and reproduction 
steps; hopefully they will resolve it in a firmware update, or I will have to 
return the drives eventually. (The worst part is that all I/O blocks hard 
while integrity discards are being handled, causing a real system-freeze-like 
experience.)

Thanks,

-- 
Melvin Vermeeren
Systems engineer


* Re: [dm-devel] Integrity discard/trim extremely slow on NVMe SSD storage (~10GiB/minute)
  2021-04-26 16:02   ` Mikulas Patocka
@ 2021-04-30 19:26     ` Melvin Vermeeren
  2021-05-12 19:26       ` Melvin Vermeeren
  0 siblings, 1 reply; 6+ messages in thread
From: Melvin Vermeeren @ 2021-04-30 19:26 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, Milan Broz



Hi Mikulas,

On Monday, 26 April 2021 18:02:12 CEST Mikulas Patocka wrote:
> Try this patch - it will avoid writing discard filler to metadata if it is
> already there. It won't help on the first discard, but it may help when
> discarding already discarded blocks.

I applied the patch and verified that the patched kernel module is being 
used. Unfortunately there is no real difference while discarding. Presumably 
the drive is choking on the read requests alone? (As mentioned in the other 
mail, I contacted Seagate about this; it should be fixed in the drive 
firmware.)

Still, the patch itself seems good and does avoid unnecessary writes, so in 
my opinion it is worth merging.

Thanks,

-- 
Melvin Vermeeren
Systems engineer


* Re: [dm-devel] Integrity discard/trim extremely slow on NVMe SSD storage (~10GiB/minute)
  2021-04-30 19:26     ` Melvin Vermeeren
@ 2021-05-12 19:26       ` Melvin Vermeeren
  0 siblings, 0 replies; 6+ messages in thread
From: Melvin Vermeeren @ 2021-05-12 19:26 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: dm-devel, Milan Broz



Hi again,

On Friday, 30 April 2021 21:26:23 CEST Melvin Vermeeren wrote:
> (As mentioned in other mail I
> contacted Seagate about this, it should be fixed in drive firmware.)

After some back and forth with testing and diagnostics, Seagate has provided 
updated firmware version STNSC016; the drives originally had version 
STNSC014. Updating the firmware with the nvme-cli utilities has resolved the 
issue.
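For reference, an nvme-cli firmware update looks roughly like the following sketch; the device path, image file name, slot, and commit action are placeholders, and the vendor's documented procedure should take precedence:

```shell
#!/bin/sh
# Rough nvme-cli firmware update sketch; all names are placeholders.
set -e

DEV=/dev/nvme0
FW=STNSC016.bin        # hypothetical image file name

nvme fw-log "$DEV"                          # inspect current firmware slots
nvme fw-download "$DEV" --fw="$FW"          # transfer the image to the controller
nvme fw-commit "$DEV" --slot=1 --action=1   # commit to slot 1 and activate
# A controller reset or power cycle then boots the new firmware.
```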

For validation, I allocated 500GiB space for testing in the full stack:
FireCuda 520 -> dm-integrity -> mdadm raid1 -> luks cryptsetup -> lvm -> lv

Then I discarded the LV. Disk utilisation of both drives, as checked with 
dstat, moves to 100% while discarding. Discarding the 500GiB took 20.920 
seconds (real time) on a live system.

With the old firmware, the rate for this workload was approximately 10GiB per 
70 seconds. This means the new firmware is approximately 
(500/10) * (70/20.92) = 167.3 times faster for this workload, a rate of 
approximately 1434GiB per minute.
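The arithmetic can be reproduced directly from the rates quoted above (a sketch, using only numbers from this thread):

```shell
# Old firmware: ~10 GiB per 70 s; new firmware: 500 GiB in 20.92 s.
awk 'BEGIN {
    old_rate = 10 / 70        # GiB/s with STNSC014
    new_rate = 500 / 20.92    # GiB/s with STNSC016
    printf "speedup: %.1fx\n", new_rate / old_rate
    printf "rate: %.0f GiB/min\n", new_rate * 60
}'
# prints "speedup: 167.3x" and "rate: 1434 GiB/min"
```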

Perhaps there are still optimisations possible in dm-integrity, but in this 
case it really was a device issue, as you suspected. Thanks again for all the 
help and the work on dm-integrity!

Cheers,

-- 
Melvin Vermeeren
Systems engineer


end of thread, other threads:[~2021-05-17  7:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-19 18:29 [dm-devel] Integrity discard/trim extremely slow on NVMe SSD storage (~10GiB/minute) Melvin Vermeeren
2021-04-26 15:33 ` Mikulas Patocka
2021-04-26 16:02   ` Mikulas Patocka
2021-04-30 19:26     ` Melvin Vermeeren
2021-05-12 19:26       ` Melvin Vermeeren
2021-04-30 19:23   ` Melvin Vermeeren
