* [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false @ 2022-04-08 2:57 me 2022-04-08 5:19 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: me @ 2022-04-08 2:57 UTC (permalink / raw) To: kbusch; +Cc: linux-nvme, axboe, hch, sagi, kingtous From: kingtous <kingtous@qq.com> I've encountered the suspend issue which causes NVMe SSD cannot be detected after reboot from Linux or wake from suspend. I've tried almost all popular Linux distributions like Ubuntu, Debian, Arch, Manjaro. None of them works with my NVMe SSD. I changed a brand new SSD but it doesn't work either. Originally, if nvme_ns_ids_equal method returns false, the nvme ns info will not try to update and namespace suddenly be removed. This is the issue I've found causing my laptop SSD not detected after suspend or reboot from Linux. After removing 'goto out_free_id', the suspend issue has gone, everything works. When nvme namespaced ids are not equal, we should also try to update namespace info, not just skip it. Signed-off-by: kingtous <kingtous@qq.com> --- drivers/nvme/host/core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index efb85c6d8..89c6a9598 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4056,7 +4056,6 @@ static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_ids *ids) if (!nvme_ns_ids_equal(&ns->head->ids, ids)) { dev_err(ns->ctrl->device, "identifiers changed for nsid %d\n", ns->head->ns_id); - goto out_free_id; } ret = nvme_update_ns_info(ns, id); -- 2.35.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-08 2:57 [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false me @ 2022-04-08 5:19 ` Christoph Hellwig [not found] ` <ABwAxgCYBwe5Iq2fM1-8qqrc.3.1649398333597.Hmail.me@kingtous.cn> 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2022-04-08 5:19 UTC (permalink / raw) To: me; +Cc: kbusch, linux-nvme, axboe, hch, sagi, kingtous On Fri, Apr 08, 2022 at 10:57:04AM +0800, me@kingtous.cn wrote: > When nvme namespaced ids are not equal, we should also try to update namespace info, not just skip it. This is, umm, od. Can you do a: nvme id-ns /dev/nvme0n1 before and after a suspend to see what changed there? (assuming you only have one controller with one namespace, else repeat for each /dev/nvme*n* device) ^ permalink raw reply [flat|nested] 21+ messages in thread
[parent not found: <ABwAxgCYBwe5Iq2fM1-8qqrc.3.1649398333597.Hmail.me@kingtous.cn>]
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false [not found] ` <ABwAxgCYBwe5Iq2fM1-8qqrc.3.1649398333597.Hmail.me@kingtous.cn> @ 2022-04-08 6:22 ` Christoph Hellwig 2022-04-08 7:56 ` 金韬 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2022-04-08 6:22 UTC (permalink / raw) To: me; +Cc: Christoph Hellwig, kbusch, linux-nvme, axboe, sagi, kingtous On Fri, Apr 08, 2022 at 02:12:13PM +0800, me@kingtous.cn wrote: > Yes, I only have one controller with one namespace. > I've done `nvme id-ns /dev/nvme0n1` times before and after suspend. The only difference between the two output is nuse address. The nuse address's output is dynamic, the outputs are different each time I execute the nvme id-ns command. > When the laptop wakes up, tty will show "identifiers changed for nsid 1". I checked the "nvme_ns_ids_equal" function, it will check uuid, nguid, eui64 and csi. So maybe there is something changed in uuid or csi? Let's try this: diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index efb85c6d8e2d5..0c38184a3ffa2 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1631,10 +1631,28 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns) static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b) { - return uuid_equal(&a->uuid, &b->uuid) && - memcmp(&a->nguid, &b->nguid, sizeof(a->nguid)) == 0 && - memcmp(&a->eui64, &b->eui64, sizeof(a->eui64)) == 0 && - a->csi == b->csi; + if (uuid_equal(&a->uuid, &b->uuid)) { + printk("uuid changed from %pU to %pU\n", + &a->uuid, &b->uuid); + return false; + } + if (memcmp(&a->nguid, &b->nguid, sizeof(a->nguid))) { + printk("nguid changed from %16phN to %16phN\n", + &a->nguid, &b->nguid); + return false; + } + if (memcmp(&a->eui64, &b->eui64, sizeof(a->eui64))) { + printk("eui changed from %8phN to %8phN\n", + &a->eui64, &b->eui64); + return false; + } + if (a->csi != b->csi) { + printk("csi changed from %u to %u\n", + a->csi, b->csi); + return false; + } + + return true; } static int nvme_init_ms(struct nvme_ns *ns, struct nvme_id_ns *id) ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-08 6:22 ` Christoph Hellwig @ 2022-04-08 7:56 ` 金韬 2022-04-08 8:07 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: 金韬 @ 2022-04-08 7:56 UTC (permalink / raw) To: Christoph Hellwig; +Cc: kbusch, linux-nvme, axboe, sagi, kingtous This is output from dmesg. Seems that "eui" has changed. [ 2.086226] loop0: detected capacity change from 0 to 8 [ 26.577001] eui changed from 0100000000000000 to 0000000000000001 [ 26.577003] nvme nvme0: identifiers changed for nsid 1 在 2022/4/8 14:22, Christoph Hellwig 写道: > On Fri, Apr 08, 2022 at 02:12:13PM +0800, me@kingtous.cn wrote: >> Yes, I only have one controller with one namespace. >> I've done `nvme id-ns /dev/nvme0n1` times before and after suspend. The only difference between the two output is nuse address. The nuse address's output is dynamic, the outputs are different each time I execute the nvme id-ns command. >> When the laptop wakes up, tty will show "identifiers changed for nsid 1". I checked the "nvme_ns_ids_equal" function, it will check uuid, nguid, eui64 and csi. So maybe there is something changed in uuid or csi? > > Let's try this: > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index efb85c6d8e2d5..0c38184a3ffa2 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -1631,10 +1631,28 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns) > > static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b) > { > - return uuid_equal(&a->uuid, &b->uuid) && > - memcmp(&a->nguid, &b->nguid, sizeof(a->nguid)) == 0 && > - memcmp(&a->eui64, &b->eui64, sizeof(a->eui64)) == 0 && > - a->csi == b->csi; > + if (uuid_equal(&a->uuid, &b->uuid)) { > + printk("uuid changed from %pU to %pU\n", > + &a->uuid, &b->uuid); > + return false; > + } > + if (memcmp(&a->nguid, &b->nguid, sizeof(a->nguid))) { > + printk("nguid changed from %16phN to %16phN\n", > + &a->nguid, &b->nguid); > + return false; > + } > + if (memcmp(&a->eui64, &b->eui64, sizeof(a->eui64))) { > + printk("eui changed from %8phN to %8phN\n", > + &a->eui64, &b->eui64); > + return false; > + } > + if (a->csi != b->csi) { > + printk("csi changed from %u to %u\n", > + a->csi, b->csi); > + return false; > + } > + > + return true; > } > > static int nvme_init_ms(struct nvme_ns *ns, struct nvme_id_ns *id) > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-08 7:56 ` 金韬 @ 2022-04-08 8:07 ` Christoph Hellwig 2022-04-08 8:35 ` 金韬 2022-04-08 15:18 ` Keith Busch 0 siblings, 2 replies; 21+ messages in thread From: Christoph Hellwig @ 2022-04-08 8:07 UTC (permalink / raw) To: 金韬 Cc: Christoph Hellwig, kbusch, linux-nvme, axboe, sagi, kingtous On Fri, Apr 08, 2022 at 03:56:49PM +0800, 金韬 wrote: > This is output from dmesg. Seems that "eui" has changed. > > [ 2.086226] loop0: detected capacity change from 0 to 8 > [ 26.577001] eui changed from 0100000000000000 to 0000000000000001 > [ 26.577003] nvme nvme0: identifiers changed for nsid 1 Ok, looks like the device is broken and changes the EUID after power cycles. Can you send the output of lspci -v? Also just out of curiousity, does the ID keep changing if you do more suspend cycles? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-08 8:07 ` Christoph Hellwig @ 2022-04-08 8:35 ` 金韬 2022-04-08 15:18 ` Keith Busch 1 sibling, 0 replies; 21+ messages in thread From: 金韬 @ 2022-04-08 8:35 UTC (permalink / raw) To: Christoph Hellwig; +Cc: kbusch, linux-nvme, axboe, sagi, kingtous Yes, the ID keeps changing if I do more suspend cycles. But numbers are the same, from 0100000000000000 to 0000000000000001. [ 26.574986] nvme nvme0: 8/0/0 default/read/poll queues [ 26.577001] eui changed from 0100000000000000 to 0000000000000001 [ 26.577003] nvme nvme0: identifiers changed for nsid 1 [ 2467.069659] eui changed from 0100000000000000 to 0000000000000001 [ 2467.069660] nvme nvme0: identifiers changed for nsid 1 [ 2500.790058] eui changed from 0100000000000000 to 0000000000000001 [ 2500.790059] nvme nvme0: identifiers changed for nsid 1 I've changed two SSDs(Hikvision C2000eco(current) 1T, Gloway NVMe Basic version 1T) and they behave the same. I think is a device-specific problem. The model is Lenovo Thinkbook 14p (amd ryzen 5800H + 32G). The suspend problem occurs in all Linux distributions like Ubuntu/Debian/Arch/Manjaro. Current using: Manjaro Linux 21.2.5 (Linux 5.17.1 stable kernel, directly downloaded from kernel.org) > lspci -k: 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex Subsystem: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU Subsystem: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge 00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge Kernel driver in use: pcieport 00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge Kernel driver in use: pcieport 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge 00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge Kernel driver in use: pcieport 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus Kernel driver in use: pcieport 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51) Subsystem: Lenovo Device 3846 Kernel driver in use: piix4_smbus Kernel modules: i2c_piix4, sp5100_tco 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51) Subsystem: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3 Kernel driver in use: k10temp Kernel modules: k10temp 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7 01:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a) Subsystem: Intel Corporation Wi-Fi 6 AX200NGW Kernel driver in use: iwlwifi Kernel modules: iwlwifi 02:00.0 SD Host controller: O2 Micro, Inc. SD/MMC Card Reader Controller (rev 01) Subsystem: O2 Micro, Inc. Device 0002 Kernel driver in use: sdhci-pci Kernel modules: sdhci_pci 03:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 (rev 01) Subsystem: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 Kernel driver in use: nvme 04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c5) Subsystem: Lenovo Device 3807 Kernel driver in use: amdgpu Kernel modules: amdgpu 04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller Subsystem: Lenovo Device 3814 Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel 04:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor Subsystem: Lenovo Device 3831 Kernel driver in use: ccp Kernel modules: ccp 04:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1 Subsystem: Lenovo Device 3839 Kernel driver in use: xhci_hcd Kernel modules: xhci_pci 04:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1 Subsystem: Lenovo Device 3838 Kernel driver in use: xhci_hcd Kernel modules: xhci_pci 04:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor (rev 01) Subsystem: Lenovo Device 3832 Kernel modules: snd_pci_acp3x, snd_rn_pci_acp3x, snd_pci_acp5x 04:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller Subsystem: Lenovo Device 3833 Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel 在 2022/4/8 16:07, Christoph Hellwig 写道: > On Fri, Apr 08, 2022 at 03:56:49PM +0800, 金韬 wrote: >> This is output from dmesg. Seems that "eui" has changed. >> >> [ 2.086226] loop0: detected capacity change from 0 to 8 >> [ 26.577001] eui changed from 0100000000000000 to 0000000000000001 >> [ 26.577003] nvme nvme0: identifiers changed for nsid 1 > > Ok, looks like the device is broken and changes the EUID after power > cycles. Can you send the output of lspci -v? > > Also just out of curiousity, does the ID keep changing if you do more > suspend cycles? > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-08 8:07 ` Christoph Hellwig 2022-04-08 8:35 ` 金韬 @ 2022-04-08 15:18 ` Keith Busch 2022-04-08 16:04 ` Christoph Hellwig 1 sibling, 1 reply; 21+ messages in thread From: Keith Busch @ 2022-04-08 15:18 UTC (permalink / raw) To: Christoph Hellwig; +Cc: 金韬, linux-nvme, axboe, sagi, kingtous On Fri, Apr 08, 2022 at 10:07:21AM +0200, Christoph Hellwig wrote: > On Fri, Apr 08, 2022 at 03:56:49PM +0800, 金韬 wrote: > > This is output from dmesg. Seems that "eui" has changed. > > > > [ 2.086226] loop0: detected capacity change from 0 to 8 > > [ 26.577001] eui changed from 0100000000000000 to 0000000000000001 > > [ 26.577003] nvme nvme0: identifiers changed for nsid 1 > > Ok, looks like the device is broken and changes the EUID after power > cycles. Can you send the output of lspci -v? > > Also just out of curiousity, does the ID keep changing if you do more > suspend cycles? The eui isn't legit in the first place (no OUI), and appears to be swqpping the byte order during resume. This should be reported to the vendor. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-08 15:18 ` Keith Busch @ 2022-04-08 16:04 ` Christoph Hellwig 2022-04-09 0:58 ` Tao Jin 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2022-04-08 16:04 UTC (permalink / raw) To: Keith Busch Cc: Christoph Hellwig, 金韬, linux-nvme, axboe, sagi, kingtous On Fri, Apr 08, 2022 at 09:18:19AM -0600, Keith Busch wrote: > On Fri, Apr 08, 2022 at 10:07:21AM +0200, Christoph Hellwig wrote: > > On Fri, Apr 08, 2022 at 03:56:49PM +0800, 金韬 wrote: > > > This is output from dmesg. Seems that "eui" has changed. > > > > > > [ 2.086226] loop0: detected capacity change from 0 to 8 > > > [ 26.577001] eui changed from 0100000000000000 to 0000000000000001 > > > [ 26.577003] nvme nvme0: identifiers changed for nsid 1 > > > > Ok, looks like the device is broken and changes the EUID after power > > cycles. Can you send the output of lspci -v? > > > > Also just out of curiousity, does the ID keep changing if you do more > > suspend cycles? > > The eui isn't legit in the first place (no OUI), and appears to be swqpping the Yes. > byte order during resume. This should be reported to the vendor. Well, the id-ns output posted earlier shows the same output before and after resume. Which is really weird. Either way we'll have to quirk it some way. Just to pointpoint this down a bit, what does nvme ns-descs /dev/nvme0n1 report? I wonder if we get different IDs from the different methods to retrive them given that namespace allocation looks at the Namespace Identification Descriptor last, while revalidation only looks at Identify Namespace. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-08 16:04 ` Christoph Hellwig @ 2022-04-09 0:58 ` Tao Jin 2022-04-09 4:43 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: Tao Jin @ 2022-04-09 0:58 UTC (permalink / raw) To: Christoph Hellwig, Keith Busch; +Cc: linux-nvme, axboe, sagi, kingtous Thanks for your kind reply. The output from command "nvme ns-descs /dev/nvme0n1" shows below: before suspend: NVME Namespace Identification Descriptors NS 1: uuid : 01000000-0000-0000-0000-000000000000 but after suspend, uuid seems disapeared: NVME Namespace Identification Descriptors NS 1: eui64 : 0100000000000000 If I do more suspend operations, the output is the same: NVME Namespace Identification Descriptors NS 1: eui64 : 0100000000000000 Note that I'm using the kernel which customed by myself, which comments out "goto out_free_id". It means "nvme_update_ns_info" will be called even if invalidate ids failed. Because I can't do suspend operation if using official kernel, which will cause my SSD directly invisible in Linux and trigger ext4 error, freezing the laptop. ``` static void nvme_validate_ns(struct nvme_ns *ns, struct nvme_ns_ids *ids) { struct nvme_id_ns *id; int ret = NVME_SC_INVALID_NS | NVME_SC_DNR; if (test_bit(NVME_NS_DEAD, &ns->flags)) goto out; ret = nvme_identify_ns(ns->ctrl, ns->head->ns_id, ids, &id); if (ret) goto out; ret = NVME_SC_INVALID_NS | NVME_SC_DNR; if (!nvme_ns_ids_equal(&ns->head->ids, ids)) { dev_err(ns->ctrl->device, "identifiers changed for nsid %d\n", ns->head->ns_id); - goto out_free_id; } ret = nvme_update_ns_info(ns, id); out_free_id: kfree(id); out: /* * Only remove the namespace if we got a fatal error back from the * device, otherwise ignore the error and just move on. * * TODO: we should probably schedule a delayed retry here. */ if (ret > 0 && (ret & NVME_SC_DNR)) nvme_ns_remove(ns); } ``` In addition, Windows 10/11 has no suspend issue in this laptop. It's really weird. 在 2022/4/9 00:04, Christoph Hellwig 写道: > On Fri, Apr 08, 2022 at 09:18:19AM -0600, Keith Busch wrote: >> On Fri, Apr 08, 2022 at 10:07:21AM +0200, Christoph Hellwig wrote: >>> On Fri, Apr 08, 2022 at 03:56:49PM +0800, 金韬 wrote: >>>> This is output from dmesg. Seems that "eui" has changed. >>>> >>>> [ 2.086226] loop0: detected capacity change from 0 to 8 >>>> [ 26.577001] eui changed from 0100000000000000 to 0000000000000001 >>>> [ 26.577003] nvme nvme0: identifiers changed for nsid 1 >>> >>> Ok, looks like the device is broken and changes the EUID after power >>> cycles. Can you send the output of lspci -v? >>> >>> Also just out of curiousity, does the ID keep changing if you do more >>> suspend cycles? >> >> The eui isn't legit in the first place (no OUI), and appears to be swqpping the > > Yes. > >> byte order during resume. This should be reported to the vendor. > > Well, the id-ns output posted earlier shows the same output before and > after resume. Which is really weird. > > Either way we'll have to quirk it some way. > > Just to pointpoint this down a bit, what does > > nvme ns-descs /dev/nvme0n1 > > report? I wonder if we get different IDs from the different methods > to retrive them given that namespace allocation looks at the > Namespace Identification Descriptor last, while revalidation only > looks at Identify Namespace. > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-09 0:58 ` Tao Jin @ 2022-04-09 4:43 ` Christoph Hellwig 2022-04-09 9:11 ` Tao Jin 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2022-04-09 4:43 UTC (permalink / raw) To: Tao Jin; +Cc: Christoph Hellwig, Keith Busch, linux-nvme, axboe, sagi, kingtous On Sat, Apr 09, 2022 at 08:58:27AM +0800, Tao Jin wrote: > Note that I'm using the kernel which customed by myself, which comments out > "goto out_free_id". It means "nvme_update_ns_info" will be called even if > invalidate ids failed. Because I can't do suspend operation if using > official kernel, which will cause my SSD directly invisible in Linux and > trigger ext4 error, freezing the laptop. Thanks for the update. This proves two things: a) the device is completely broken in reporting IDs b) the reason why the eui64 changes after each suspend is because the kernel is inconsistent in how it queries for the eui64 I'll prepare patches to fix both issues over the weekend. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-09 4:43 ` Christoph Hellwig @ 2022-04-09 9:11 ` Tao Jin 2022-04-11 5:49 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: Tao Jin @ 2022-04-09 9:11 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, axboe, sagi, kingtous Thanks. 在 2022/4/9 12:43, Christoph Hellwig 写道: > On Sat, Apr 09, 2022 at 08:58:27AM +0800, Tao Jin wrote: >> Note that I'm using the kernel which customed by myself, which comments out >> "goto out_free_id". It means "nvme_update_ns_info" will be called even if >> invalidate ids failed. Because I can't do suspend operation if using >> official kernel, which will cause my SSD directly invisible in Linux and >> trigger ext4 error, freezing the laptop. > > Thanks for the update. This proves two things: > > a) the device is completely broken in reporting IDs > b) the reason why the eui64 changes after each suspend is because the > kernel is inconsistent in how it queries for the eui64 > > I'll prepare patches to fix both issues over the weekend. > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-09 9:11 ` Tao Jin @ 2022-04-11 5:49 ` Christoph Hellwig 2022-04-11 5:56 ` 金韬 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2022-04-11 5:49 UTC (permalink / raw) To: Tao Jin; +Cc: Christoph Hellwig, Keith Busch, linux-nvme, axboe, sagi, kingtous So I couldn't really come up with a good heuristic for checking for a valid EUI64 that would work here given that in one of the cases the only bit set is in the OUI. We'll need to a blunt quirk to disable identifiers for this controllers. Can you send the output of "lscpi -v" that contains the vendor and device id? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-11 5:49 ` Christoph Hellwig @ 2022-04-11 5:56 ` 金韬 2022-04-11 6:07 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: 金韬 @ 2022-04-11 5:56 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, axboe, sagi, kingtous Yes. To ensure all outputs are shown, I ran "lspci -v" with sudo priviledge. "sudo lspci -v" shows below: 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex Subsystem: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex Flags: fast devsel 00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU Subsystem: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU Flags: fast devsel, IRQ -2147483648 Capabilities: [40] Secure device <?> Capabilities: [64] MSI: Enable- Count=1/4 Maskable- 64bit+ Capabilities: [74] HyperTransport: MSI Mapping Enable+ Fixed+ 00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge Flags: fast devsel, IOMMU group 0 00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0, IRQ 26, IOMMU group 1 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: [disabled] Memory behind bridge: fd600000-fd6fffff [size=1M] Prefetchable memory behind bridge: [disabled] Capabilities: [50] Power Management version 3 Capabilities: [58] Express Root Port (Slot+), MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453 Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [270] Secondary PCI Express Capabilities: [2a0] Access Control Services Capabilities: [370] L1 PM Substates Kernel driver in use: pcieport 00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0, IRQ 27, IOMMU group 2 Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 I/O behind bridge: [disabled] Memory behind bridge: fd500000-fd5fffff [size=1M] Prefetchable memory behind bridge: [disabled] Capabilities: [50] Power Management version 3 Capabilities: [58] Express Root Port (Slot+), MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453 Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [270] Secondary PCI Express Capabilities: [2a0] Access Control Services Capabilities: [370] L1 PM Substates Kernel driver in use: pcieport 00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge Flags: fast devsel, IOMMU group 3 00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0, IRQ 28, IOMMU group 4 Bus: primary=00, secondary=03, subordinate=03, sec-latency=0 I/O behind bridge: [disabled] Memory behind bridge: fd400000-fd4fffff [size=1M] Prefetchable memory behind bridge: [disabled] Capabilities: [50] Power Management version 3 Capabilities: [58] Express Root Port (Slot+), MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Device 1453 Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [270] Secondary PCI Express Capabilities: [2a0] Access Control Services Capabilities: [370] L1 PM Substates Kernel driver in use: pcieport 00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge Flags: fast devsel, IOMMU group 5 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0, IRQ 29, IOMMU group 5 Bus: primary=00, secondary=04, subordinate=04, sec-latency=0 I/O behind bridge: 00001000-00001fff [size=4K] Memory behind bridge: fd000000-fd3fffff [size=4M] Prefetchable memory behind bridge: 0000000860000000-000000ffffffffff [size=1014272M] Capabilities: [50] Power Management version 3 Capabilities: [58] Express Root Port (Slot-), MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [270] Secondary PCI Express Capabilities: [400] Data Link Feature <?> Capabilities: [410] Physical Layer 16.0 GT/s <?> Capabilities: [440] Lane Margining at the Receiver <?> Kernel driver in use: pcieport 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51) Subsystem: Lenovo Device 3846 Flags: 66MHz, medium devsel, IOMMU group 6 Kernel driver in use: piix4_smbus Kernel modules: i2c_piix4, sp5100_tco 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51) Subsystem: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge Flags: bus master, 66MHz, medium devsel, latency 0, IOMMU group 6 00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0 Flags: fast devsel, IOMMU group 7 00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1 Flags: fast devsel, IOMMU group 7 00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2 Flags: fast devsel, IOMMU group 7 00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3 Flags: fast devsel, IOMMU group 7 Kernel driver in use: k10temp Kernel modules: k10temp 00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4 Flags: fast devsel, IOMMU group 7 00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5 Flags: fast devsel, IOMMU group 7 00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6 Flags: fast devsel, IOMMU group 7 00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7 Flags: fast devsel, IOMMU group 7 01:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a) Subsystem: Intel Corporation Wi-Fi 6 AX200NGW Flags: bus master, fast devsel, latency 0, IRQ 76, IOMMU group 8 Memory at fd600000 (64-bit, non-prefetchable) [size=16K] Capabilities: [c8] Power Management version 3 Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [40] Express Endpoint, MSI 00 Capabilities: [80] MSI-X: Enable+ Count=16 Masked- Capabilities: [100] Advanced Error Reporting Capabilities: [14c] Latency Tolerance Reporting Capabilities: [154] L1 PM Substates Kernel driver in use: iwlwifi Kernel modules: iwlwifi 02:00.0 SD Host controller: O2 Micro, Inc. SD/MMC Card Reader Controller (rev 01) (prog-if 01) Subsystem: O2 Micro, Inc. Device 0002 Flags: bus master, fast devsel, latency 0, IRQ 58, IOMMU group 9 Memory at fd501000 (32-bit, non-prefetchable) [size=4K] Memory at fd500000 (32-bit, non-prefetchable) [size=2K] Capabilities: [6c] Power Management version 3 Capabilities: [48] MSI: Enable+ Count=1/1 Maskable+ 64bit+ Capabilities: [80] Express Endpoint, MSI 00 Capabilities: [100] Virtual Channel Capabilities: [200] Advanced Error Reporting Capabilities: [230] Latency Tolerance Reporting Capabilities: [240] L1 PM Substates Kernel driver in use: sdhci-pci Kernel modules: sdhci_pci 03:00.0 Non-Volatile memory controller: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 (rev 01) (prog-if 02 [NVM Express]) Subsystem: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 Flags: bus master, fast devsel, latency 0, IRQ 30, NUMA node 0, IOMMU group 10 Memory at fd400000 (64-bit, non-prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable- Count=1/32 Maskable+ 64bit+ Capabilities: [70] Express Endpoint, MSI 1f Capabilities: [b0] MSI-X: Enable+ Count=20 Masked- Capabilities: [100] Advanced Error Reporting Capabilities: [148] Device Serial Number 00-00-00-00-00-00-00-00 Capabilities: [158] Alternative Routing-ID Interpretation (ARI) Capabilities: [168] Secondary PCI Express Capabilities: [1d4] Latency Tolerance Reporting Capabilities: [1dc] L1 PM Substates Capabilities: [1ec] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?> Capabilities: [2ec] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?> Kernel driver in use: nvme 04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne (rev c5) (prog-if 00 [VGA controller]) Subsystem: Lenovo Device 3807 Flags: bus master, fast devsel, latency 0, IRQ 49, IOMMU group 5 Memory at 860000000 (64-bit, prefetchable) [size=256M] Memory at 870000000 (64-bit, prefetchable) [size=2M] I/O ports at 1000 [size=256] Memory at fd300000 (32-bit, non-prefetchable) [size=512K] Capabilities: [48] Vendor Specific Information: Len=08 <?> Capabilities: [50] Power Management version 3 Capabilities: [64] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable- Count=1/4 Maskable- 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=4 Masked- Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [270] Secondary PCI Express Capabilities: [2b0] Address Translation Service (ATS) Capabilities: [2c0] Page Request Interface (PRI) Capabilities: [2d0] Process Address Space ID (PASID) Capabilities: [400] Data Link Feature <?> Capabilities: [410] Physical Layer 16.0 GT/s <?> Capabilities: [440] Lane Margining at the Receiver <?> Kernel driver in use: amdgpu Kernel modules: amdgpu 04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller Subsystem: Lenovo Device 3814 Flags: bus master, fast devsel, latency 0, IRQ 75, IOMMU group 5 Memory at fd3c8000 (32-bit, non-prefetchable) [size=16K] Capabilities: [48] Vendor Specific Information: Len=08 <?> Capabilities: [50] Power Management version 3 Capabilities: [64] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel 04:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor Subsystem: Lenovo Device 3831 Flags: bus master, fast devsel, latency 0, IRQ 70, IOMMU group 5 Memory at fd200000 (32-bit, non-prefetchable) [size=1M] Memory at fd3cc000 (32-bit, non-prefetchable) [size=8K] Capabilities: [48] Vendor Specific Information: Len=08 <?> Capabilities: [50] Power Management version 3 Capabilities: [64] Express Endpoint, MSI 00 Capabilities: [a0] MSI: Enable- Count=1/2 Maskable- 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=2 Masked- Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Kernel driver in use: ccp Kernel modules: ccp 04:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1 (prog-if 30 [XHCI]) Subsystem: Lenovo Device 3839 Flags: bus master, fast devsel, latency 0, IRQ 40, IOMMU group 5 Memory at fd000000 (64-bit, non-prefetchable) [size=1M] Capabilities: [48] Vendor Specific Information: Len=08 <?> Capabilities: [50] Power Management version 3 Capabilities: [64] Express Endpoint, MSI 00 Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=8 Masked- Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Kernel driver in use: xhci_hcd Kernel modules: xhci_pci 04:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1 (prog-if 30 [XHCI]) Subsystem: Lenovo Device 3838 Flags: bus master, fast devsel, latency 0, IRQ 49, IOMMU group 5 Memory at fd100000 (64-bit, non-prefetchable) [size=1M] Capabilities: [48] Vendor Specific Information: Len=08 <?> Capabilities: [50] Power Management version 3 Capabilities: [64] Express Endpoint, MSI 00 Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=8 Masked- Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Kernel driver in use: xhci_hcd Kernel modules: xhci_pci 04:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor (rev 01) Subsystem: Lenovo Device 3832 Flags: fast devsel, IRQ 73, IOMMU group 5 Memory at fd380000 (32-bit, non-prefetchable) [size=256K] Capabilities: [48] Vendor Specific Information: Len=08 <?> Capabilities: [50] Power Management version 3 Capabilities: [64] Express Endpoint, MSI 00 Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Kernel modules: snd_pci_acp3x, snd_rn_pci_acp3x, snd_pci_acp5x 04:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller Subsystem: Lenovo Device 3833 Flags: bus master, fast devsel, latency 0, IRQ 31, IOMMU group 5 Memory at fd3c0000 (32-bit, non-prefetchable) [size=32K] Capabilities: [48] Vendor Specific Information: Len=08 <?> Capabilities: [50] Power Management version 3 Capabilities: [64] Express Endpoint, MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel 在 2022/4/11 13:49, Christoph Hellwig 写道: > So I couldn't really come up with a good heuristic for checking for > a valid EUI64 that would work here given that in one of the cases > the only bit set is in the OUI. We'll need to a blunt quirk to disable > identifiers for this controllers. Can you send the output of "lscpi -v" > that contains the vendor and device id? > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-11 5:56 ` 金韬 @ 2022-04-11 6:07 ` Christoph Hellwig 2022-04-11 14:20 ` 金韬 0 siblings, 1 reply; 21+ messages in thread From: Christoph Hellwig @ 2022-04-11 6:07 UTC (permalink / raw) To: 金韬 Cc: Christoph Hellwig, Keith Busch, linux-nvme, axboe, sagi, kingtous Plase try this patch: --- From 95cf9f00488ca5b203cb9205f108b8ec0c30d001 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch@lst.de> Date: Mon, 11 Apr 2022 08:05:27 +0200 Subject: nvme: add a Namespace Identifiers quirk MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The MAXIO MAP1202 controllers reports completely bogus Namespace identifiers that even change after suspend cycles. Disable using the Identifiers entirely. Reported-by: 金韬 <me@kingtous.cn> Signed-off-by: Christoph Hellwig <hch@lst.de> --- drivers/nvme/host/core.c | 24 ++++++++++++++++++------ drivers/nvme/host/nvme.h | 5 +++++ drivers/nvme/host/pci.c | 2 ++ 3 files changed, 25 insertions(+), 6 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index efb85c6d8e2d5..907f4a2b3de87 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1282,6 +1282,8 @@ static int nvme_process_ns_desc(struct nvme_ctrl *ctrl, struct nvme_ns_ids *ids, switch (cur->nidt) { case NVME_NIDT_EUI64: + if (ctrl->quirks & NVME_QUIRK_BOGUS_NID) + return -1; if (cur->nidl != NVME_NIDT_EUI64_LEN) { dev_warn(ctrl->device, "%s %d for NVME_NIDT_EUI64\n", warn_str, cur->nidl); @@ -1290,6 +1292,8 @@ static int nvme_process_ns_desc(struct nvme_ctrl *ctrl, struct nvme_ns_ids *ids, memcpy(ids->eui64, data + sizeof(*cur), NVME_NIDT_EUI64_LEN); return NVME_NIDT_EUI64_LEN; case NVME_NIDT_NGUID: + if (ctrl->quirks & NVME_QUIRK_BOGUS_NID) + return -1; if (cur->nidl != NVME_NIDT_NGUID_LEN) { dev_warn(ctrl->device, "%s %d for NVME_NIDT_NGUID\n", warn_str, cur->nidl); @@ -1298,6 +1302,8 @@ static int nvme_process_ns_desc(struct nvme_ctrl *ctrl, struct nvme_ns_ids *ids, memcpy(ids->nguid, data + sizeof(*cur), NVME_NIDT_NGUID_LEN); return NVME_NIDT_NGUID_LEN; case NVME_NIDT_UUID: + if (ctrl->quirks & NVME_QUIRK_BOGUS_NID) + return -1; if (cur->nidl != NVME_NIDT_UUID_LEN) { dev_warn(ctrl->device, "%s %d for NVME_NIDT_UUID\n", warn_str, cur->nidl); @@ -1399,12 +1405,18 @@ static int nvme_identify_ns(struct nvme_ctrl *ctrl, unsigned nsid, if ((*id)->ncap == 0) /* namespace not allocated or attached */ goto out_free_id; - if (ctrl->vs >= NVME_VS(1, 1, 0) && - !memchr_inv(ids->eui64, 0, sizeof(ids->eui64))) - memcpy(ids->eui64, (*id)->eui64, sizeof(ids->eui64)); - if (ctrl->vs >= NVME_VS(1, 2, 0) && - !memchr_inv(ids->nguid, 0, sizeof(ids->nguid))) - memcpy(ids->nguid, (*id)->nguid, sizeof(ids->nguid)); + + if (ctrl->quirks & NVME_QUIRK_BOGUS_NID) { + dev_info(ctrl->device, + "Ignoring bogus Namespace Identifiers\n"); + } else { + if (ctrl->vs >= NVME_VS(1, 1, 0) && + !memchr_inv(ids->eui64, 0, sizeof(ids->eui64))) + memcpy(ids->eui64, (*id)->eui64, sizeof(ids->eui64)); + if (ctrl->vs >= NVME_VS(1, 2, 0) && + !memchr_inv(ids->nguid, 0, sizeof(ids->nguid))) + memcpy(ids->nguid, (*id)->nguid, sizeof(ids->nguid)); + } return 0; diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 1393bbf82d71e..a2b53ca633359 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -144,6 +144,11 @@ enum nvme_quirks { * encoding the generation sequence number. */ NVME_QUIRK_SKIP_CID_GEN = (1 << 17), + + /* + * Reports garbage in the namespace identifiers (eui64, nguid, uuid). + */ + NVME_QUIRK_BOGUS_NID = (1 << 18), }; /* diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index d817ca17463ed..c386c91483505 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3447,6 +3447,8 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, { PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD */ .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, + { PCI_DEVICE(0x1e4B, 0x1202), /* MAXIO MAP1202 */ + .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0061), .driver_data = NVME_QUIRK_DMA_ADDRESS_BITS_48, }, { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0065), -- 2.30.2 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-11 6:07 ` Christoph Hellwig @ 2022-04-11 14:20 ` 金韬 2022-04-12 5:04 ` Christoph Hellwig 0 siblings, 1 reply; 21+ messages in thread From: 金韬 @ 2022-04-11 14:20 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, axboe, sagi, kingtous After applying the patch provided (based on official kernel 5.17.1), the suspend issue has gone, and works! dmesg shows below: [ 0.489981] nvme 0000:03:00.0: platform quirk: setting simple suspend [ 0.490002] nvme nvme0: pci function 0000:03:00.0 [ 0.494897] nvme nvme0: missing or invalid SUBNQN field. [ 0.498902] nvme nvme0: allocated 16 MiB host memory buffer. [ 0.537800] nvme nvme0: 8/0/0 default/read/poll queues [ 0.543528] nvme nvme0: Ignoring bogus Namespace Identifiers [ 0.546709] nvme0n1: p1 p2 p3 p4 [ 0.985096] EXT4-fs (nvme0n1p4): mounted filesystem with ordered data mode. Quota mode: none. [ 1.220554] EXT4-fs (nvme0n1p4): re-mounted. Quota mode: none. [ 36.672605] nvme nvme0: 8/0/0 default/read/poll queues [ 36.673717] nvme nvme0: Ignoring bogus Namespace Identifiers btw, I checked the patch code, I noticed "MAXIO MAP1202" is hard coded in pci.c. I think not only MAP1202 has the problem mentioned in this thread, but also "MAXIO MAP1002"(Gloway Basic 1T NVMe SSD, which has the same suspend issue). > + { PCI_DEVICE(0x1e4B, 0x1202), /* MAXIO MAP1202 */ > + .driver_data = NVME_QUIRK_BOGUS_NID, }, I highly doubt that MAXIO SSD controller may all have the issue of reporting nsid. 在 2022/4/11 14:07, Christoph Hellwig 写道: > Plase try this patch: > > --- > From 95cf9f00488ca5b203cb9205f108b8ec0c30d001 Mon Sep 17 00:00:00 2001 > From: Christoph Hellwig <hch@lst.de> > Date: Mon, 11 Apr 2022 08:05:27 +0200 > Subject: nvme: add a Namespace Identifiers quirk > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > The MAXIO MAP1202 controllers reports completely bogus Namespace > identifiers that even change after suspend cycles. Disable using > the Identifiers entirely. > > Reported-by: 金韬 <me@kingtous.cn> > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > drivers/nvme/host/core.c | 24 ++++++++++++++++++------ > drivers/nvme/host/nvme.h | 5 +++++ > drivers/nvme/host/pci.c | 2 ++ > 3 files changed, 25 insertions(+), 6 deletions(-) > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index efb85c6d8e2d5..907f4a2b3de87 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -1282,6 +1282,8 @@ static int nvme_process_ns_desc(struct nvme_ctrl *ctrl, struct nvme_ns_ids *ids, > > switch (cur->nidt) { > case NVME_NIDT_EUI64: > + if (ctrl->quirks & NVME_QUIRK_BOGUS_NID) > + return -1; > if (cur->nidl != NVME_NIDT_EUI64_LEN) { > dev_warn(ctrl->device, "%s %d for NVME_NIDT_EUI64\n", > warn_str, cur->nidl); > @@ -1290,6 +1292,8 @@ static int nvme_process_ns_desc(struct nvme_ctrl *ctrl, struct nvme_ns_ids *ids, > memcpy(ids->eui64, data + sizeof(*cur), NVME_NIDT_EUI64_LEN); > return NVME_NIDT_EUI64_LEN; > case NVME_NIDT_NGUID: > + if (ctrl->quirks & NVME_QUIRK_BOGUS_NID) > + return -1; > if (cur->nidl != NVME_NIDT_NGUID_LEN) { > dev_warn(ctrl->device, "%s %d for NVME_NIDT_NGUID\n", > warn_str, cur->nidl); > @@ -1298,6 +1302,8 @@ static int nvme_process_ns_desc(struct nvme_ctrl *ctrl, struct nvme_ns_ids *ids, > memcpy(ids->nguid, data + sizeof(*cur), NVME_NIDT_NGUID_LEN); > return NVME_NIDT_NGUID_LEN; > case NVME_NIDT_UUID: > + if (ctrl->quirks & NVME_QUIRK_BOGUS_NID) > + return -1; > if (cur->nidl != NVME_NIDT_UUID_LEN) { > dev_warn(ctrl->device, "%s %d for NVME_NIDT_UUID\n", > warn_str, cur->nidl); > @@ -1399,12 +1405,18 @@ static int nvme_identify_ns(struct nvme_ctrl *ctrl, unsigned nsid, > if ((*id)->ncap == 0) /* namespace not allocated or attached */ > goto out_free_id; > > - if (ctrl->vs >= NVME_VS(1, 1, 0) && > - !memchr_inv(ids->eui64, 0, sizeof(ids->eui64))) > - memcpy(ids->eui64, (*id)->eui64, sizeof(ids->eui64)); > - if (ctrl->vs >= NVME_VS(1, 2, 0) && > - !memchr_inv(ids->nguid, 0, sizeof(ids->nguid))) > - memcpy(ids->nguid, (*id)->nguid, sizeof(ids->nguid)); > + > + if (ctrl->quirks & NVME_QUIRK_BOGUS_NID) { > + dev_info(ctrl->device, > + "Ignoring bogus Namespace Identifiers\n"); > + } else { > + if (ctrl->vs >= NVME_VS(1, 1, 0) && > + !memchr_inv(ids->eui64, 0, sizeof(ids->eui64))) > + memcpy(ids->eui64, (*id)->eui64, sizeof(ids->eui64)); > + if (ctrl->vs >= NVME_VS(1, 2, 0) && > + !memchr_inv(ids->nguid, 0, sizeof(ids->nguid))) > + memcpy(ids->nguid, (*id)->nguid, sizeof(ids->nguid)); > + } > > return 0; > > diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h > index 1393bbf82d71e..a2b53ca633359 100644 > --- a/drivers/nvme/host/nvme.h > +++ b/drivers/nvme/host/nvme.h > @@ -144,6 +144,11 @@ enum nvme_quirks { > * encoding the generation sequence number. > */ > NVME_QUIRK_SKIP_CID_GEN = (1 << 17), > + > + /* > + * Reports garbage in the namespace identifiers (eui64, nguid, uuid). > + */ > + NVME_QUIRK_BOGUS_NID = (1 << 18), > }; > > /* > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > index d817ca17463ed..c386c91483505 100644 > --- a/drivers/nvme/host/pci.c > +++ b/drivers/nvme/host/pci.c > @@ -3447,6 +3447,8 @@ static const struct pci_device_id nvme_id_table[] = { > .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, > { PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD */ > .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, > + { PCI_DEVICE(0x1e4B, 0x1202), /* MAXIO MAP1202 */ > + .driver_data = NVME_QUIRK_BOGUS_NID, }, > { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0061), > .driver_data = NVME_QUIRK_DMA_ADDRESS_BITS_48, }, > { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0065), ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-11 14:20 ` 金韬 @ 2022-04-12 5:04 ` Christoph Hellwig 2022-04-12 6:34 ` 金韬 2022-05-23 18:18 ` Arman Hajishafieha 0 siblings, 2 replies; 21+ messages in thread From: Christoph Hellwig @ 2022-04-12 5:04 UTC (permalink / raw) To: 金韬 Cc: Christoph Hellwig, Keith Busch, linux-nvme, axboe, sagi, kingtous On Mon, Apr 11, 2022 at 10:20:13PM +0800, 金韬 wrote: > btw, I checked the patch code, I noticed "MAXIO MAP1202" is hard coded in > pci.c. I think not only MAP1202 has the problem mentioned in this thread, > but also "MAXIO MAP1002"(Gloway Basic 1T NVMe SSD, which has the same > suspend issue). If you can verify it has the issue we can add a quirk for it as well. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-12 5:04 ` Christoph Hellwig @ 2022-04-12 6:34 ` 金韬 2022-05-23 18:18 ` Arman Hajishafieha 1 sibling, 0 replies; 21+ messages in thread From: 金韬 @ 2022-04-12 6:34 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Keith Busch, linux-nvme, axboe, sagi, kingtous Yes, the issue also exists on Gloway Basic NVMe SSD("MAXIO MAP1002") and I can reproduce it every time when I trigger a suspend action. WD SN550 has no suspend issue on my laptop, where the NVMe controller is not "MAXIO MAP1002/1202". 在 2022/4/12 13:04, Christoph Hellwig 写道: > On Mon, Apr 11, 2022 at 10:20:13PM +0800, 金韬 wrote: >> btw, I checked the patch code, I noticed "MAXIO MAP1202" is hard coded in >> pci.c. I think not only MAP1202 has the problem mentioned in this thread, >> but also "MAXIO MAP1002"(Gloway Basic 1T NVMe SSD, which has the same >> suspend issue). > > If you can verify it has the issue we can add a quirk for it as well. > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-04-12 5:04 ` Christoph Hellwig 2022-04-12 6:34 ` 金韬 @ 2022-05-23 18:18 ` Arman Hajishafieha 2022-05-24 13:50 ` hch 1 sibling, 1 reply; 21+ messages in thread From: Arman Hajishafieha @ 2022-05-23 18:18 UTC (permalink / raw) To: hch, me; +Cc: kbusch, kingtous, sagi, linux-nvme, axboe On Tue, 2022-04-12 at 07:04 +0200, Christoph Hellwig wrote: > On Mon, Apr 11, 2022 at 10:20:13PM +0800, 金韬 wrote: > > btw, I checked the patch code, I noticed "MAXIO MAP1202" is hard > > coded in > > pci.c. I think not only MAP1202 has the problem mentioned in this > > thread, > > but also "MAXIO MAP1002"(Gloway Basic 1T NVMe SSD, which has the > > same > > suspend issue). > > If you can verify it has the issue we can add a quirk for it as well. Hello. I can confirm that "MAXIO MAP1001" (1e4b:1001) controller found in "Asgard N2 1TB SSD" suffers from the exact same issue as well. After applying the patch to disable checking namespace identifiers it can resume from suspend normally. It seems like this issue affects a wider range of Maxio controllers. Could you please add a quirk for this device as well? ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-05-23 18:18 ` Arman Hajishafieha @ 2022-05-24 13:50 ` hch 2022-05-24 16:51 ` Arman Hajishafieha 0 siblings, 1 reply; 21+ messages in thread From: hch @ 2022-05-24 13:50 UTC (permalink / raw) To: Arman Hajishafieha; +Cc: hch, me, kbusch, kingtous, sagi, linux-nvme, axboe On Mon, May 23, 2022 at 06:18:01PM +0000, Arman Hajishafieha wrote: > I can confirm that "MAXIO MAP1001" (1e4b:1001) controller found in > "Asgard N2 1TB SSD" suffers from the exact same issue as well. > After applying the patch to disable checking namespace identifiers it > can resume from suspend normally. It seems like this issue affects a > wider range of Maxio controllers. > Could you please add a quirk for this device as well? Please try this: diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 5a98a7de09642..cb4adc0c22843 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3453,6 +3453,8 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, { PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD */ .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, + { PCI_DEVICE(0x1e4B, 0x1001), /* MAXIO MAP1001 */ + .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(0x1e4B, 0x1002), /* MAXIO MAP1002 */ .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(0x1e4B, 0x1202), /* MAXIO MAP1202 */ ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-05-24 13:50 ` hch @ 2022-05-24 16:51 ` Arman Hajishafieha 2022-05-24 20:54 ` Chaitanya Kulkarni 0 siblings, 1 reply; 21+ messages in thread From: Arman Hajishafieha @ 2022-05-24 16:51 UTC (permalink / raw) To: hch; +Cc: kbusch, kingtous, sagi, me, linux-nvme, axboe On Tue, 2022-05-24 at 15:50 +0200, hch@lst.de wrote: > On Mon, May 23, 2022 at 06:18:01PM +0000, Arman Hajishafieha wrote: > > I can confirm that "MAXIO MAP1001" (1e4b:1001) controller found in > > "Asgard N2 1TB SSD" suffers from the exact same issue as well. > > After applying the patch to disable checking namespace identifiers > > it > > can resume from suspend normally. It seems like this issue affects > > a > > wider range of Maxio controllers. > > Could you please add a quirk for this device as well? > > Please try this: > > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > index 5a98a7de09642..cb4adc0c22843 100644 > --- a/drivers/nvme/host/pci.c > +++ b/drivers/nvme/host/pci.c > @@ -3453,6 +3453,8 @@ static const struct pci_device_id > nvme_id_table[] = { > .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, > { PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD > */ > .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, > + { PCI_DEVICE(0x1e4B, 0x1001), /* MAXIO MAP1001 */ > + .driver_data = NVME_QUIRK_BOGUS_NID, }, > { PCI_DEVICE(0x1e4B, 0x1002), /* MAXIO MAP1002 */ > .driver_data = NVME_QUIRK_BOGUS_NID, }, > { PCI_DEVICE(0x1e4B, 0x1202), /* MAXIO MAP1202 */ > > I applied the patch against kernel 5.18.0 As dmesg log below shows, the quirk is applied correctly and resume from suspend is working as expected. Thank you [ 1.684064] nvme 0000:04:00.0: platform quirk: setting simple suspend [ 1.684119] nvme nvme0: pci function 0000:04:00.0 [ 1.696073] nvme nvme0: missing or invalid SUBNQN field. [ 1.699332] nvme nvme0: allocated 16 MiB host memory buffer. [ 1.701789] nvme nvme0: 8/0/0 default/read/poll queues [ 1.707999] nvme nvme0: Ignoring bogus Namespace Identifiers [ 1.710368] nvme0n1: p1 p2 p3 p4 p5 p6 p7 [ 36.560018] XFS (nvme0n1p5): Mounting V5 Filesystem [ 36.982002] XFS (nvme0n1p5): Ending clean mount [ 91.027799] nvme nvme0: 8/0/0 default/read/poll queues [ 91.034207] nvme nvme0: Ignoring bogus Namespace Identifiers ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false 2022-05-24 16:51 ` Arman Hajishafieha @ 2022-05-24 20:54 ` Chaitanya Kulkarni 0 siblings, 0 replies; 21+ messages in thread From: Chaitanya Kulkarni @ 2022-05-24 20:54 UTC (permalink / raw) To: hch; +Cc: kbusch, kingtous, sagi, me, linux-nvme, axboe, Arman Hajishafieha >> Please try this: >> >> >> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c >> index 5a98a7de09642..cb4adc0c22843 100644 >> --- a/drivers/nvme/host/pci.c >> +++ b/drivers/nvme/host/pci.c >> @@ -3453,6 +3453,8 @@ static const struct pci_device_id >> nvme_id_table[] = { >> .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, >> { PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD >> */ >> .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, >> + { PCI_DEVICE(0x1e4B, 0x1001), /* MAXIO MAP1001 */ >> + .driver_data = NVME_QUIRK_BOGUS_NID, }, >> { PCI_DEVICE(0x1e4B, 0x1002), /* MAXIO MAP1002 */ >> .driver_data = NVME_QUIRK_BOGUS_NID, }, >> { PCI_DEVICE(0x1e4B, 0x1202), /* MAXIO MAP1202 */ >> >> > I applied the patch against kernel 5.18.0 > As dmesg log below shows, the quirk is applied correctly and resume > from suspend is working as expected. > Thank you > > [ 1.684064] nvme 0000:04:00.0: platform quirk: setting simple > suspend > [ 1.684119] nvme nvme0: pci function 0000:04:00.0 > [ 1.696073] nvme nvme0: missing or invalid SUBNQN field. > [ 1.699332] nvme nvme0: allocated 16 MiB host memory buffer. > [ 1.701789] nvme nvme0: 8/0/0 default/read/poll queues > [ 1.707999] nvme nvme0: Ignoring bogus Namespace Identifiers > [ 1.710368] nvme0n1: p1 p2 p3 p4 p5 p6 p7 > [ 36.560018] XFS (nvme0n1p5): Mounting V5 Filesystem > [ 36.982002] XFS (nvme0n1p5): Ending clean mount > [ 91.027799] nvme nvme0: 8/0/0 default/read/poll queues > [ 91.034207] nvme nvme0: Ignoring bogus Namespace Identifiers Feel free to add this if you are planning to send as a formal patch. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> -ck ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2022-05-24 20:54 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-04-08 2:57 [PATCH] fix: nvme_update_ns_info method should be called even if nvme_ms_ids_equal return false me 2022-04-08 5:19 ` Christoph Hellwig [not found] ` <ABwAxgCYBwe5Iq2fM1-8qqrc.3.1649398333597.Hmail.me@kingtous.cn> 2022-04-08 6:22 ` Christoph Hellwig 2022-04-08 7:56 ` 金韬 2022-04-08 8:07 ` Christoph Hellwig 2022-04-08 8:35 ` 金韬 2022-04-08 15:18 ` Keith Busch 2022-04-08 16:04 ` Christoph Hellwig 2022-04-09 0:58 ` Tao Jin 2022-04-09 4:43 ` Christoph Hellwig 2022-04-09 9:11 ` Tao Jin 2022-04-11 5:49 ` Christoph Hellwig 2022-04-11 5:56 ` 金韬 2022-04-11 6:07 ` Christoph Hellwig 2022-04-11 14:20 ` 金韬 2022-04-12 5:04 ` Christoph Hellwig 2022-04-12 6:34 ` 金韬 2022-05-23 18:18 ` Arman Hajishafieha 2022-05-24 13:50 ` hch 2022-05-24 16:51 ` Arman Hajishafieha 2022-05-24 20:54 ` Chaitanya Kulkarni
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.