My Western Digital SN850 appears to suffer from deep power state issues

* My Western Digital SN850 appears to suffer from deep power state issues - considering submitting quirk patch.
@ 2022-05-15 16:00 Marcos Scriven
  2022-05-15 19:44 ` Keith Busch
  0 siblings, 1 reply; 6+ messages in thread
From: Marcos Scriven @ 2022-05-15 16:00 UTC (permalink / raw)
  To: linux-nvme

Hi all

I've been experiencing issues with my system freezing, and traced it down to the nvme controller resetting:

[268690.209099] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[268690.289109] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[268690.289234] nvme nvme0: Removing after probe failure status: -19
[268690.313116] nvme0n1: detected capacity change from 1953525168 to 0
[268690.313116] blk_update_request: I/O error, dev nvme0n1, sector 119170336 op 0x1:(WRITE) flags 0x800 phys_seg 14 prio class 0
[268690.313117] blk_update_request: I/O error, dev nvme0n1, sector 293367304 op 0x1:(WRITE) flags 0x8800 phys_seg 5 prio class 0
[268690.313118] blk_update_request: I/O error, dev nvme0n1, sector 1886015680 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

Only a reboot resolves this.

The vendor/product id:

01:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Black SN850 [15b7:5011] (rev 01)

This is installed in a desktop machine (the details of which I can give if relevant). I only mention this as the power profile is much less frugal than a laptop.

There are several bug reports on various distros about this, but to give just a couple:

https://bugzilla.kernel.org/show_bug.cgi?id=195039
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1705748

It's also clearly a know quirk. Here's a couple of examples:

https://github.com/torvalds/linux/commit/dc22c1c058b5c4fe967a20589e36f029ee42a706
https://github.com/torvalds/linux/commit/538e4a8c571efdf131834431e0c14808bcfb1004

On my particular system there seems to be two power state settings.

One from nvme id-ctrl /dev/nvme0:

ps    0 : mp:9.00W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:0.6300W active_power:9.00W
ps    1 : mp:4.10W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:0.6300W active_power:4.10W
ps    2 : mp:3.50W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:0.6300W active_power:3.50W
ps    3 : mp:0.0250W non-operational enlat:5000 exlat:10000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:0.0250W active_power:-
ps    4 : mp:0.0050W non-operational enlat:5000 exlat:45000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:0.0050W active_power:-

The other from nvme get-feature -f 0x0c -H /dev/nvme0

get-feature:0xc (Autonomous Power State Transition), Current value:0x000001
	Autonomous Power State Transition Enable (APSTE): Enabled
	Auto PST Entries	.................
	Entry[ 0]
	.................
	Idle Time Prior to Transition (ITPT): 750 ms
	Idle Transition Power State   (ITPS): 3
	.................
	Entry[ 1]
	.................
	Idle Time Prior to Transition (ITPT): 750 ms
	Idle Transition Power State   (ITPS): 3
	.................
	Entry[ 2]
	.................
	Idle Time Prior to Transition (ITPT): 750 ms
	Idle Transition Power State   (ITPS): 3
	.................
	Entry[ 3]
	.................
	Idle Time Prior to Transition (ITPT): 2500 ms
	Idle Transition Power State   (ITPS): 4
	.................

I don't really know how those two interact, or are related, if at all? The timings don't seem to match up.

Anyway, with all that background, I'm happy to try NVME_QUIRK_NO_DEEPEST_PS for 15b7:5011 locally, and submit here if it works.

However, the main problem is how to reproduce this issue reliably/deterministically, in order to be confident in the patch. It can happen with in minutes or days at the moment.

So, my questions:

1) How can I reproduce the issue deterministically?
2) Are there any other causes of this I'd need to rule out? E.g BIOS, PSU, broken drive rather than a power state quirk.

I also have a couple of more fundamental questions, the answer to which is probably way beyond my understanding:

3) Why are so many drives needing this qurik in Linux? Could it be that Windows also avoids these power states?
4) I looked at the code around the message, and it seems to be this is about an attempt to reset the controller, rather than just accept it's timed out an operation. Is that correct? And if so, could there be a problem with the way resetting is working - or is it again a quirk with these NVMEs?
5) On that note, some Googling yielded this patch, that I think was rejected https://patchwork.kernel.org/project/linux-block/patch/20180516040313.13596-12-ming.lei@redhat.com/. I'm unclear on the details, but felt it might be relevant.

Thanks to the maintainers of the NVME drivers.

Marcos

^ permalink raw reply	[flat|nested] 6+ messages in thread