All of lore.kernel.org
 help / color / mirror / Atom feed
* Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
@ 2023-06-26  1:15 Bagas Sanjaya
  2023-06-26 15:39 ` Chaitanya Kulkarni
  2023-06-27 16:10 ` Keith Busch
  0 siblings, 2 replies; 31+ messages in thread
From: Bagas Sanjaya @ 2023-06-26  1:15 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke
  Cc: Linux Kernel Mailing List, Linux Regressions, Linux NVMe

Hi,

I notice a regression report on Bugzilla [1]. Quoting from it:

> Hello,
> since upgrading one of my hosts to KErnel 6.2 i have a problem with 6 nvme disks (SAMSUNG MZ1WV480HCGL-000MV).
> 
> This problem didn't occur with Kernel 5.15.
> 
> Here is log from dmesg | grep nvme:
> 
> nvme nvme0: pci function 0000:04:00.0                                                                                                                                                                                                                                                                                         
> nvme nvme1: pci function 0000:05:00.0                                                                                                                                                                                                                                                                                         
> nvme nvme2: pci function 0000:06:00.0                                                                                                                                                                                                                                                                                         
> nvme nvme3: pci function 0000:07:00.0                                                                                                                                                                                                                                                                                         
> nvme nvme4: pci function 0000:0a:00.0                                                                                                                                                                                                                                                                                         
> nvme nvme5: pci function 0000:0b:00.0                                                                                                                                                                                                                                                                                         
> nvme nvme0: 8/0/0 default/read/poll queues                                                                                                                                                                                                                                                                                    
> nvme nvme1: 8/0/0 default/read/poll queues                                                                                                                                                                                                                                                                                    
> nvme nvme2: 8/0/0 default/read/poll queues                                                                                                                                                                                                                                                                                    
> nvme nvme3: 8/0/0 default/read/poll queues                                                                                                                                                                                                                                                                                    
> nvme nvme4: 8/0/0 default/read/poll queues                                                                                                                                                                                                                                                                                    
> nvme nvme5: 8/0/0 default/read/poll queues                                                                                                                                                                                                                                                                                    
> nvme nvme1: globally duplicate IDs for nsid 1                                                                                                                                                                                                                                                                                 
> nvme nvme1: VID:DID 144d:a802 model:SAMSUNG MZ1WV480HCGL-000MV firmware:BXU87M9Q                                                                                                                                                                                                                                              
> nvme nvme2: globally duplicate IDs for nsid 1                                                                                                                                                                                                                                                                                 
> nvme nvme2: VID:DID 144d:a802 model:SAMSUNG MZ1WV480HCGL-000MV firmware:BXU87M9Q                                                                                                                                                                                                                                              
> nvme nvme4: globally duplicate IDs for nsid 1                                                                                                                                                                                                                                                                                 
> nvme nvme3: globally duplicate IDs for nsid 1                                                                                                                                                                                                                                                                                 
> nvme nvme4: VID:DID 144d:a802 model:SAMSUNG MZ1WV480HCGL-000MV firmware:BXU87M9Q                                                                                                                                                                                                                                              
> nvme nvme3: VID:DID 144d:a802 model:SAMSUNG MZ1WV480HCGL-000MV firmware:BXU87M9Q                                                                                                                                                                                                                                              
> nvme nvme5: globally duplicate IDs for nsid 1                                                                                                                                                                                                                                                                                 
> nvme nvme5: VID:DID 144d:a802 model:SAMSUNG MZ1WV480HCGL-000MV firmware:BXU87M9Q                                                                                                                                                                                                                                              
> nvme0n1: p1 p9                           
> 
> Only one drive, out of 6 is usable.
> 
> These drives are rather common datacenter drives (Samsung SM-953).
> 
> From another Bug-Report (https://bugzilla.kernel.org/show_bug.cgi?id=217384):
> 
> To fix it, please add two strings in pci.c
> 
> { PCI_DEVICE(0x144d, 0xa802), /* SAMSUNG MZ1WV480HCGL-000MV  */
> .driver_data = NVME_QUIRK_BOGUS_NID, },
> 
> Thanks in advance.

See Bugzilla for the full thread.

The reporter had a quirk (see above) that fixed this regression,
nevertheless I'm adding it to regzbot to make sure it doesn't fall
through cracks unnoticed:

#regzbot introduced: 86c2457a8e8112f https://bugzilla.kernel.org/show_bug.cgi?id=217593
#regzbot title: NVME_QUIRK_BOGUS_NID is needed for SAMSUNG MZ1WV480HCGL-000MV

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217593

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-06-26  1:15 Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD) Bagas Sanjaya
@ 2023-06-26 15:39 ` Chaitanya Kulkarni
  2023-06-26 18:26   ` Linux regression tracking (Thorsten Leemhuis)
  2023-06-27 16:10 ` Keith Busch
  1 sibling, 1 reply; 31+ messages in thread
From: Chaitanya Kulkarni @ 2023-06-26 15:39 UTC (permalink / raw)
  To: Bagas Sanjaya, Clemens S.
  Cc: Linux Kernel Mailing List, Linux Regressions, Linux NVMe,
	John Meneghini, Chaitanya Kulkarni, Sagi Grimberg, Keith Busch,
	Christoph Hellwig, Hannes Reinecke, Martin Belanger

On 6/25/2023 6:15 PM, Bagas Sanjaya wrote:
> Hi,
> 
> I notice a regression report on Bugzilla [1]. Quoting from it:

[...]

>> Only one drive, out of 6 is usable.
>>
>> These drives are rather common datacenter drives (Samsung SM-953).
>>
>>  From another Bug-Report (https://bugzilla.kernel.org/show_bug.cgi?id=217384):
>>
>> To fix it, please add two strings in pci.c
>>
>> { PCI_DEVICE(0x144d, 0xa802), /* SAMSUNG MZ1WV480HCGL-000MV  */
>> .driver_data = NVME_QUIRK_BOGUS_NID, },
>>

Please send a formal patch I'll be happy to review it ..

-ck



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-06-26 15:39 ` Chaitanya Kulkarni
@ 2023-06-26 18:26   ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 0 replies; 31+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-06-26 18:26 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Bagas Sanjaya, Clemens S.
  Cc: Linux Kernel Mailing List, Linux Regressions, Linux NVMe,
	John Meneghini, Sagi Grimberg, Keith Busch, Christoph Hellwig,
	Hannes Reinecke, Martin Belanger

On 26.06.23 17:39, Chaitanya Kulkarni wrote:
> On 6/25/2023 6:15 PM, Bagas Sanjaya wrote:
>>
>> I notice a regression report on Bugzilla [1]. Quoting from it:
> 
> [...]
> 
>>> Only one drive, out of 6 is usable.
>>>
>>> These drives are rather common datacenter drives (Samsung SM-953).
>>>
>>>  From another Bug-Report (https://bugzilla.kernel.org/show_bug.cgi?id=217384):
>>>
>>> To fix it, please add two strings in pci.c
>>>
>>> { PCI_DEVICE(0x144d, 0xa802), /* SAMSUNG MZ1WV480HCGL-000MV  */
>>> .driver_data = NVME_QUIRK_BOGUS_NID, },
> 
> Please send a formal patch I'll be happy to review it ..

Huh, sorry, but why?

You might not know this, but Bagas is somebody that helps with
regression tracking (many thx for this BTW), so he's not even affected
by the problem -- and most likely is neither familiar with the code or
has the hardware to test the quirk entry.

Sure, asking others for help is fine. But from what I see this is
something that should be handled by those that developed the culprit.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-06-26  1:15 Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD) Bagas Sanjaya
  2023-06-26 15:39 ` Chaitanya Kulkarni
@ 2023-06-27 16:10 ` Keith Busch
  2023-06-27 21:57   ` Chaitanya Kulkarni
                     ` (3 more replies)
  1 sibling, 4 replies; 31+ messages in thread
From: Keith Busch @ 2023-06-27 16:10 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux Regressions,
	Linux NVMe, Pankaj Raghav, Kanchan Joshi

On Mon, Jun 26, 2023 at 08:15:49AM +0700, Bagas Sanjaya wrote:
> See Bugzilla for the full thread.
> 
> The reporter had a quirk (see above) that fixed this regression,
> nevertheless I'm adding it to regzbot to make sure it doesn't fall
> through cracks unnoticed:
> 
> #regzbot introduced: 86c2457a8e8112f https://bugzilla.kernel.org/show_bug.cgi?id=217593
> #regzbot title: NVME_QUIRK_BOGUS_NID is needed for SAMSUNG MZ1WV480HCGL-000MV

These bug reports really should go to the vendors that created the
broken device with non-unique "unique" fields. It's easy enough for me
to write the quirk patch, but it's not the ideal solution and may harm
devices/firmwares with the same VID:DID that don't have this problem.
Not being the vendor, I'm not in a postition to know that, so CC'ing
some Samsung folks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-06-27 16:10 ` Keith Busch
@ 2023-06-27 21:57   ` Chaitanya Kulkarni
  2023-06-29  8:49   ` Pankaj Raghav
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Chaitanya Kulkarni @ 2023-06-27 21:57 UTC (permalink / raw)
  To: Keith Busch, Bagas Sanjaya
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux Regressions,
	Linux NVMe, Pankaj Raghav, Kanchan Joshi

On 6/27/23 09:10, Keith Busch wrote:
> On Mon, Jun 26, 2023 at 08:15:49AM +0700, Bagas Sanjaya wrote:
>> See Bugzilla for the full thread.
>>
>> The reporter had a quirk (see above) that fixed this regression,
>> nevertheless I'm adding it to regzbot to make sure it doesn't fall
>> through cracks unnoticed:
>>
>> #regzbot introduced: 86c2457a8e8112f https://bugzilla.kernel.org/show_bug.cgi?id=217593
>> #regzbot title: NVME_QUIRK_BOGUS_NID is needed for SAMSUNG MZ1WV480HCGL-000MV
> These bug reports really should go to the vendors that created the
> broken device with non-unique "unique" fields. It's easy enough for me
> to write the quirk patch, but it's not the ideal solution and may harm
> devices/firmwares with the same VID:DID that don't have this problem.
> Not being the vendor, I'm not in a postition to know that, so CC'ing
> some Samsung folks.

It is easy enough to send a patch also but I don't have the device..
hopefully Samsung folks can help resolve this ..

-ck



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-06-27 16:10 ` Keith Busch
  2023-06-27 21:57   ` Chaitanya Kulkarni
@ 2023-06-29  8:49   ` Pankaj Raghav
  2023-06-30  8:40   ` Pankaj Raghav
  2023-07-10  8:52   ` Linux regression tracking (Thorsten Leemhuis)
  3 siblings, 0 replies; 31+ messages in thread
From: Pankaj Raghav @ 2023-06-29  8:49 UTC (permalink / raw)
  To: Keith Busch, Bagas Sanjaya
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux Regressions,
	Linux NVMe, Kanchan Joshi, Javier Gonzalez

Thanks for forwarding the report, Keith!

On 2023-06-27 18:10, Keith Busch wrote:
> On Mon, Jun 26, 2023 at 08:15:49AM +0700, Bagas Sanjaya wrote:
>> See Bugzilla for the full thread.
>>
>> The reporter had a quirk (see above) that fixed this regression,
>> nevertheless I'm adding it to regzbot to make sure it doesn't fall
>> through cracks unnoticed:
>>
>> #regzbot introduced: 86c2457a8e8112f https://bugzilla.kernel.org/show_bug.cgi?id=217593
>> #regzbot title: NVME_QUIRK_BOGUS_NID is needed for SAMSUNG MZ1WV480HCGL-000MV
> 
> These bug reports really should go to the vendors that created the
> broken device with non-unique "unique" fields. It's easy enough for me
> to write the quirk patch, but it's not the ideal solution and may harm
> devices/firmwares with the same VID:DID that don't have this problem.
> Not being the vendor, I'm not in a postition to know that, so CC'ing
> some Samsung folks.

I will forward this report internally. Usually the Datacenter SSDs firmware
are created based on customer requirements, so please wait before applying these
changes.

Regards,
Pankaj

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-06-27 16:10 ` Keith Busch
  2023-06-27 21:57   ` Chaitanya Kulkarni
  2023-06-29  8:49   ` Pankaj Raghav
@ 2023-06-30  8:40   ` Pankaj Raghav
  2023-06-30 15:18     ` Clemens Springsguth
  2023-07-10  8:52   ` Linux regression tracking (Thorsten Leemhuis)
  3 siblings, 1 reply; 31+ messages in thread
From: Pankaj Raghav @ 2023-06-30  8:40 UTC (permalink / raw)
  To: Keith Busch, Clemens S.
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Martin Belanger,
	Chaitanya Kulkarni, John Meneghini, Hannes Reinecke,
	Linux Kernel Mailing List, Linux Regressions, Linux NVMe,
	Kanchan Joshi, Jinhwan Park, Bagas Sanjaya, Javier Gonzalez

On 2023-06-27 18:10, Keith Busch wrote:
> On Mon, Jun 26, 2023 at 08:15:49AM +0700, Bagas Sanjaya wrote:
>> See Bugzilla for the full thread.
>>
>> The reporter had a quirk (see above) that fixed this regression,
>> nevertheless I'm adding it to regzbot to make sure it doesn't fall
>> through cracks unnoticed:
>>
>> #regzbot introduced: 86c2457a8e8112f https://bugzilla.kernel.org/show_bug.cgi?id=217593
>> #regzbot title: NVME_QUIRK_BOGUS_NID is needed for SAMSUNG MZ1WV480HCGL-000MV
> 
> These bug reports really should go to the vendors that created the
> broken device with non-unique "unique" fields. It's easy enough for me
> to write the quirk patch, but it's not the ideal solution and may harm
> devices/firmwares with the same VID:DID that don't have this problem.
> Not being the vendor, I'm not in a postition to know that, so CC'ing
> some Samsung folks.

I got a reply from our team internally. There is no plan on releasing
a new firmware for this model and the related models with the same VID:DID
pair. We can add a quirk.

@Clemens: Let me know if you want me to send the quirk as a proper patch.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-06-30  8:40   ` Pankaj Raghav
@ 2023-06-30 15:18     ` Clemens Springsguth
  2023-06-30 16:21       ` Pankaj Raghav
  0 siblings, 1 reply; 31+ messages in thread
From: Clemens Springsguth @ 2023-06-30 15:18 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux Regressions,
	Linux NVMe, Kanchan Joshi, Jinhwan Park, Bagas Sanjaya,
	Javier Gonzalez

Hi Pankaj,

i would be very very grateful if you would create and submit the quirk
as a proper patch.

kind regards
Clemens


On Fri, Jun 30, 2023 at 10:40 AM Pankaj Raghav <p.raghav@samsung.com> wrote:
>
> On 2023-06-27 18:10, Keith Busch wrote:
> > On Mon, Jun 26, 2023 at 08:15:49AM +0700, Bagas Sanjaya wrote:
> >> See Bugzilla for the full thread.
> >>
> >> The reporter had a quirk (see above) that fixed this regression,
> >> nevertheless I'm adding it to regzbot to make sure it doesn't fall
> >> through cracks unnoticed:
> >>
> >> #regzbot introduced: 86c2457a8e8112f https://bugzilla.kernel.org/show_bug.cgi?id=217593
> >> #regzbot title: NVME_QUIRK_BOGUS_NID is needed for SAMSUNG MZ1WV480HCGL-000MV
> >
> > These bug reports really should go to the vendors that created the
> > broken device with non-unique "unique" fields. It's easy enough for me
> > to write the quirk patch, but it's not the ideal solution and may harm
> > devices/firmwares with the same VID:DID that don't have this problem.
> > Not being the vendor, I'm not in a postition to know that, so CC'ing
> > some Samsung folks.
>
> I got a reply from our team internally. There is no plan on releasing
> a new firmware for this model and the related models with the same VID:DID
> pair. We can add a quirk.
>
> @Clemens: Let me know if you want me to send the quirk as a proper patch.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-06-30 15:18     ` Clemens Springsguth
@ 2023-06-30 16:21       ` Pankaj Raghav
  2023-07-03 13:25         ` Clemens Springsguth
  0 siblings, 1 reply; 31+ messages in thread
From: Pankaj Raghav @ 2023-06-30 16:21 UTC (permalink / raw)
  To: Clemens Springsguth
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux Regressions,
	Linux NVMe, Kanchan Joshi, Jinhwan Park, Bagas Sanjaya,
	Javier Gonzalez

On 2023-06-30 17:18, Clemens Springsguth wrote:
> Hi Pankaj,
> 
> i would be very very grateful if you would create and submit the quirk
> as a proper patch.
> 

No problem. I will send something soon, but could you test it and
give me a tested-by tag when I send the patch? I don't have that
model at my disposal.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-06-30 16:21       ` Pankaj Raghav
@ 2023-07-03 13:25         ` Clemens Springsguth
  0 siblings, 0 replies; 31+ messages in thread
From: Clemens Springsguth @ 2023-07-03 13:25 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux Regressions,
	Linux NVMe, Kanchan Joshi, Jinhwan Park, Bagas Sanjaya,
	Javier Gonzalez

On Fri, Jun 30, 2023 at 6:21 PM Pankaj Raghav <p.raghav@samsung.com> wrote:
>
> On 2023-06-30 17:18, Clemens Springsguth wrote:
> > Hi Pankaj,
> >
> > i would be very very grateful if you would create and submit the quirk
> > as a proper patch.
> >
>
> No problem. I will send something soon, but could you test it and
> give me a tested-by tag when I send the patch? I don't have that
> model at my disposal.
>

Hi Pankaj,

I am available for testing with pleasure.

kind regards
Clemens

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-06-27 16:10 ` Keith Busch
                     ` (2 preceding siblings ...)
  2023-06-30  8:40   ` Pankaj Raghav
@ 2023-07-10  8:52   ` Linux regression tracking (Thorsten Leemhuis)
  2023-07-10 15:58     ` Keith Busch
  3 siblings, 1 reply; 31+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-07-10  8:52 UTC (permalink / raw)
  To: Keith Busch, Bagas Sanjaya
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux Regressions,
	Linux NVMe, Pankaj Raghav, Kanchan Joshi

On 27.06.23 18:10, Keith Busch wrote:
> On Mon, Jun 26, 2023 at 08:15:49AM +0700, Bagas Sanjaya wrote:
>> See Bugzilla for the full thread.
>>
>> The reporter had a quirk (see above) that fixed this regression,
>> nevertheless I'm adding it to regzbot to make sure it doesn't fall
>> through cracks unnoticed:
>>
>> #regzbot introduced: 86c2457a8e8112f https://bugzilla.kernel.org/show_bug.cgi?id=217593
>> #regzbot title: NVME_QUIRK_BOGUS_NID is needed for SAMSUNG MZ1WV480HCGL-000MV
> 
> These bug reports really should go to the vendors that created the
> broken device with non-unique "unique" fields.

I understand that, but I think we need middlemen for that, as I or Bagas
don't have the contacts -- and it's IMHO also a bit much too ask us for
in general, as regression tracking is hard enough already. At least
unless this becomes something that happen regularly, then a list of
persons we could contact would be fine I guess. But we simply can't deal
with too many subsystem specific special cases.

> It's easy enough for me
> to write the quirk patch, but it's not the ideal solution and may harm
> devices/firmwares with the same VID:DID that don't have this problem.
> Not being the vendor, I'm not in a postition to know that, so CC'ing
> some Samsung folks.

Another request came in today, even with a pseudo-patch:
https://bugzilla.kernel.org/show_bug.cgi?id=217649

To quote:
```
As with numerous NVMe controllers these days, Samsung's
MZAL41T0HBLB-00BL2, which Lenovo builds into their 16ARP8 also suffers
from invalid IDs, breaking suspend and hibernate also on the latest
kernel 6.4.2.

The following change restores this functionality:

File: root/drivers/nvme/host/pci.c
Change:

-	{ PCI_DEVICE(0x144d, 0xa80b),   /* Samsung PM9B1 256G and 512G */
-		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },

+	{ PCI_DEVICE(0x144d, 0xa80b),   /* Samsung PM9B1 256G, 512G and 1TB */
+		.driver_data = NVME_QUIRK_BOGUS_NID |
+				NVME_QUIRK_DISABLE_WRITE_ZEROES, },
```

Ciao, Thorsten

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-10  8:52   ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-07-10 15:58     ` Keith Busch
  2023-07-11  6:54       ` Pankaj Raghav
  0 siblings, 1 reply; 31+ messages in thread
From: Keith Busch @ 2023-07-10 15:58 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Bagas Sanjaya, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Pankaj Raghav, Kanchan Joshi

On Mon, Jul 10, 2023 at 10:52:52AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 27.06.23 18:10, Keith Busch wrote:
> > On Mon, Jun 26, 2023 at 08:15:49AM +0700, Bagas Sanjaya wrote:
> >> See Bugzilla for the full thread.
> >>
> >> The reporter had a quirk (see above) that fixed this regression,
> >> nevertheless I'm adding it to regzbot to make sure it doesn't fall
> >> through cracks unnoticed:
> >>
> >> #regzbot introduced: 86c2457a8e8112f https://bugzilla.kernel.org/show_bug.cgi?id=217593
> >> #regzbot title: NVME_QUIRK_BOGUS_NID is needed for SAMSUNG MZ1WV480HCGL-000MV
> > 
> > These bug reports really should go to the vendors that created the
> > broken device with non-unique "unique" fields.
> 
> I understand that, but I think we need middlemen for that, as I or Bagas
> don't have the contacts -- and it's IMHO also a bit much too ask us for
> in general, as regression tracking is hard enough already. At least
> unless this becomes something that happen regularly, then a list of
> persons we could contact would be fine I guess. But we simply can't deal
> with too many subsystem specific special cases.

I'm not asking the Linux regression trackers to fill that role, though.
I'm asking people who experience these issues report it to their vendor
directly because these device makers apparently have zero clue that
their spec non-compliance is causing painful experiences for their
customers and annoyance for maintainers. They keep pumping out more and
more devices with the same breakage.

This particular vendor has been great at engaging with Linux, but that's
not necessarily normal among all device makers, and I don't have
contacts with the majority of the vendors we've had to quirk for this
issue.

We did complain to the NVMe spec workgroup that their complaince cert
suite is not testing for this. There was a little initial interest in
fixing that gap, but it fizzled out...

> Another request came in today, even with a pseudo-patch:
> https://bugzilla.kernel.org/show_bug.cgi?id=217649
> 
> To quote:
> ```
> As with numerous NVMe controllers these days, Samsung's
> MZAL41T0HBLB-00BL2, which Lenovo builds into their 16ARP8 also suffers
> from invalid IDs, breaking suspend and hibernate also on the latest
> kernel 6.4.2.
> 
> The following change restores this functionality:
> 
> File: root/drivers/nvme/host/pci.c
> Change:
> 
> -	{ PCI_DEVICE(0x144d, 0xa80b),   /* Samsung PM9B1 256G and 512G */
> -		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
> 
> +	{ PCI_DEVICE(0x144d, 0xa80b),   /* Samsung PM9B1 256G, 512G and 1TB */
> +		.driver_data = NVME_QUIRK_BOGUS_NID |
> +				NVME_QUIRK_DISABLE_WRITE_ZEROES, },

Panjaj, okay with this one too?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-10 15:58     ` Keith Busch
@ 2023-07-11  6:54       ` Pankaj Raghav
  2023-07-11  9:39         ` Linux regression tracking (Thorsten Leemhuis)
  2023-07-16 19:19         ` August Wikerfors
  0 siblings, 2 replies; 31+ messages in thread
From: Pankaj Raghav @ 2023-07-11  6:54 UTC (permalink / raw)
  To: Keith Busch, Linux regressions mailing list
  Cc: Bagas Sanjaya, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환

>> I understand that, but I think we need middlemen for that, as I or Bagas
>> don't have the contacts -- and it's IMHO also a bit much too ask us for
>> in general, as regression tracking is hard enough already. At least
>> unless this becomes something that happen regularly, then a list of
>> persons we could contact would be fine I guess. But we simply can't deal
>> with too many subsystem specific special cases.
> 
> I'm not asking the Linux regression trackers to fill that role, though.
> I'm asking people who experience these issues report it to their vendor
> directly because these device makers apparently have zero clue that
> their spec non-compliance is causing painful experiences for their
> customers and annoyance for maintainers. They keep pumping out more and
> more devices with the same breakage.
> 
> This particular vendor has been great at engaging with Linux, but that's
> not necessarily normal among all device makers, and I don't have
> contacts with the majority of the vendors we've had to quirk for this
> issue.
> 
> We did complain to the NVMe spec workgroup that their complaince cert
> suite is not testing for this. There was a little initial interest in
> fixing that gap, but it fizzled out...
> 
>> Another request came in today, even with a pseudo-patch:
>> https://bugzilla.kernel.org/show_bug.cgi?id=217649
>>
>> To quote:
>> ```
>> As with numerous NVMe controllers these days, Samsung's
>> MZAL41T0HBLB-00BL2, which Lenovo builds into their 16ARP8 also suffers
>> from invalid IDs, breaking suspend and hibernate also on the latest
>> kernel 6.4.2.
>>
>> The following change restores this functionality:
>>
>> File: root/drivers/nvme/host/pci.c
>> Change:
>>
>> -	{ PCI_DEVICE(0x144d, 0xa80b),   /* Samsung PM9B1 256G and 512G */
>> -		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
>>
>> +	{ PCI_DEVICE(0x144d, 0xa80b),   /* Samsung PM9B1 256G, 512G and 1TB */
>> +		.driver_data = NVME_QUIRK_BOGUS_NID |
>> +				NVME_QUIRK_DISABLE_WRITE_ZEROES, },
> 
> Panjaj, okay with this one too?

This looks a like a new device that might have a firmware update. I will ping
internally first.

As you mentioned, the recent addition of globally unique ID check
is breaking a lot of devices because of non-compliant firmware. I will try to create
some awareness about this issue internally as well.

--
Pankaj

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11  6:54       ` Pankaj Raghav
@ 2023-07-11  9:39         ` Linux regression tracking (Thorsten Leemhuis)
  2023-07-11 12:06           ` Christoph Hellwig
  2023-07-16 19:19         ` August Wikerfors
  1 sibling, 1 reply; 31+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-07-11  9:39 UTC (permalink / raw)
  To: Pankaj Raghav, Keith Busch, Linux regressions mailing list
  Cc: Bagas Sanjaya, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환,
	Linus Torvalds

[CCing Linus for the "whack a mole" aspect in the second half]

On 11.07.23 08:54, Pankaj Raghav wrote:
>>> I understand that, but I think we need middlemen for that, as I or Bagas
>>> don't have the contacts -- and it's IMHO also a bit much too ask us for
>>> in general, as regression tracking is hard enough already. At least
>>> unless this becomes something that happen regularly, then a list of
>>> persons we could contact would be fine I guess. But we simply can't deal
>>> with too many subsystem specific special cases.
>>
>> I'm not asking the Linux regression trackers to fill that role, though.

Well, during our work we often encounter those bugs -- often from people
that are no regular developers that already had a hard time
understanding the issue and reporting it to us somehow. Asking those to...

>> I'm asking people who experience these issues report it to their vendor

...find the right destination and format to report their Linux problems
to the vendors is unlikely to fly I suspect. And I'm not sure if that is
in our interest, as then it might take a lot longer to get those quirk
entries into the kernel source.

But whatever, the main reason why I write this mail is different:

>> directly because these device makers apparently have zero clue that
>> their spec non-compliance is causing painful experiences for their
>> customers and annoyance for maintainers. They keep pumping out more and
>> more devices with the same breakage.
>>
>> This particular vendor has been great at engaging with Linux, but that's
>> not necessarily normal among all device makers, and I don't have
>> contacts with the majority of the vendors we've had to quirk for this
>> issue.
>>
>> We did complain to the NVMe spec workgroup that their complaince cert
>> suite is not testing for this. There was a little initial interest in
>> fixing that gap, but it fizzled out...

Preface: this is not my area of expertise, and maybe I should keep my
mouth shut. But whatever.

Well, that "They keep pumping out more and more devices with the same
breakage" and the "new device" comment from Pankaj below bear the
question: should we stop trying to play "whack a mole" with all those
quirk entries and handle devices with duplicate ids just like Windows does?

That would "make things just work"(tm).

And yes, I suspect there are good reasons why we went down the "quirk"
route or why abandoning it might be hard. But maybe it's time to
reconsider that path, as from my outside point of view things sound a
lot like they are somewhat similar to the ACPI problems we dealt with
~15 years ago: we learned that we have to deal with broken ACPI
implementations and somehow use them in a way similar to how Windows
uses them, as that's the OS the machine was designed for and tested with.

Ciao, Thorsten

>>> Another request came in today, even with a pseudo-patch:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=217649
>>>
>>> To quote:
>>> ```
>>> As with numerous NVMe controllers these days, Samsung's
>>> MZAL41T0HBLB-00BL2, which Lenovo builds into their 16ARP8 also suffers
>>> from invalid IDs, breaking suspend and hibernate also on the latest
>>> kernel 6.4.2.
> [...]
>> Panjaj, okay with this one too?
> 
> This looks a like a new device that might have a firmware update. I will ping
> internally first.
> 
> As you mentioned, the recent addition of globally unique ID check
> is breaking a lot of devices because of non-compliant firmware. I will try to create
> some awareness about this issue internally as well.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11  9:39         ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-07-11 12:06           ` Christoph Hellwig
  2023-07-11 12:14             ` Sagi Grimberg
                               ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-07-11 12:06 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Pankaj Raghav, Keith Busch, Bagas Sanjaya, Jens Axboe,
	Christoph Hellwig, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환,
	Linus Torvalds

On Tue, Jul 11, 2023 at 11:39:11AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> Well, that "They keep pumping out more and more devices with the same
> breakage" and the "new device" comment from Pankaj below bear the
> question: should we stop trying to play "whack a mole" with all those
> quirk entries and handle devices with duplicate ids just like Windows does?

As far as I can tell Windows completely ignores the IDs.  Which, looking
back, I'd love to be able to do as well, but they are already used
by udev for the /dev/disk/by-id/ links.   Those are usually not used
on desktop systems, as they use the file system labels and UUIDs, but
that doesn't work for non-file system uses.

And all this has been working really well with the good old enterprise
SSDs, it's just that the cheap consumer devices keep fucking it up.

If we'd take it away now we'd break existing users, which puts us between
a rock and a hard place.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11 12:06           ` Christoph Hellwig
@ 2023-07-11 12:14             ` Sagi Grimberg
  2023-07-11 14:12               ` Greg KH
  2023-07-11 15:33             ` Keith Busch
  2023-07-11 16:47             ` Linus Torvalds
  2 siblings, 1 reply; 31+ messages in thread
From: Sagi Grimberg @ 2023-07-11 12:14 UTC (permalink / raw)
  To: Christoph Hellwig, Linux regressions mailing list
  Cc: Pankaj Raghav, Keith Busch, Bagas Sanjaya, Jens Axboe, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환,
	Linus Torvalds


>> Well, that "They keep pumping out more and more devices with the same
>> breakage" and the "new device" comment from Pankaj below bear the
>> question: should we stop trying to play "whack a mole" with all those
>> quirk entries and handle devices with duplicate ids just like Windows does?
> 
> As far as I can tell Windows completely ignores the IDs.  Which, looking
> back, I'd love to be able to do as well, but they are already used
> by udev for the /dev/disk/by-id/ links.   Those are usually not used
> on desktop systems, as they use the file system labels and UUIDs, but
> that doesn't work for non-file system uses.
> 
> And all this has been working really well with the good old enterprise
> SSDs, it's just that the cheap consumer devices keep fucking it up.
> 
> If we'd take it away now we'd break existing users, which puts us between
> a rock and a hard place.

Maybe the compromise would be to add a modparam that tells the driver
to ignore it altogether (like allow_bogus_identifiers) that would
default to false. Then people can just workaround the problem instead
of having the back-and-fourth with the vendor?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11 12:14             ` Sagi Grimberg
@ 2023-07-11 14:12               ` Greg KH
  2023-07-11 14:39                 ` Sagi Grimberg
  0 siblings, 1 reply; 31+ messages in thread
From: Greg KH @ 2023-07-11 14:12 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Linux regressions mailing list, Pankaj Raghav,
	Keith Busch, Bagas Sanjaya, Jens Axboe, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환,
	Linus Torvalds

On Tue, Jul 11, 2023 at 03:14:54PM +0300, Sagi Grimberg wrote:
> 
> > > Well, that "They keep pumping out more and more devices with the same
> > > breakage" and the "new device" comment from Pankaj below bear the
> > > question: should we stop trying to play "whack a mole" with all those
> > > quirk entries and handle devices with duplicate ids just like Windows does?
> > 
> > As far as I can tell Windows completely ignores the IDs.  Which, looking
> > back, I'd love to be able to do as well, but they are already used
> > by udev for the /dev/disk/by-id/ links.   Those are usually not used
> > on desktop systems, as they use the file system labels and UUIDs, but
> > that doesn't work for non-file system uses.
> > 
> > And all this has been working really well with the good old enterprise
> > SSDs, it's just that the cheap consumer devices keep fucking it up.
> > 
> > If we'd take it away now we'd break existing users, which puts us between
> > a rock and a hard place.
> 
> Maybe the compromise would be to add a modparam that tells the driver
> to ignore it altogether (like allow_bogus_identifiers) that would
> default to false. Then people can just workaround the problem instead
> of having the back-and-fourth with the vendor?
> 

Module parameters do not work on a per-device basis, sorry.  This isn't
the 1990's anymore, please do not attempt to add new ones :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11 14:12               ` Greg KH
@ 2023-07-11 14:39                 ` Sagi Grimberg
  0 siblings, 0 replies; 31+ messages in thread
From: Sagi Grimberg @ 2023-07-11 14:39 UTC (permalink / raw)
  To: Greg KH
  Cc: Christoph Hellwig, Linux regressions mailing list, Pankaj Raghav,
	Keith Busch, Bagas Sanjaya, Jens Axboe, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환,
	Linus Torvalds


>>>> Well, that "They keep pumping out more and more devices with the same
>>>> breakage" and the "new device" comment from Pankaj below bear the
>>>> question: should we stop trying to play "whack a mole" with all those
>>>> quirk entries and handle devices with duplicate ids just like Windows does?
>>>
>>> As far as I can tell Windows completely ignores the IDs.  Which, looking
>>> back, I'd love to be able to do as well, but they are already used
>>> by udev for the /dev/disk/by-id/ links.   Those are usually not used
>>> on desktop systems, as they use the file system labels and UUIDs, but
>>> that doesn't work for non-file system uses.
>>>
>>> And all this has been working really well with the good old enterprise
>>> SSDs, it's just that the cheap consumer devices keep fucking it up.
>>>
>>> If we'd take it away now we'd break existing users, which puts us between
>>> a rock and a hard place.
>>
>> Maybe the compromise would be to add a modparam that tells the driver
>> to ignore it altogether (like allow_bogus_identifiers) that would
>> default to false. Then people can just workaround the problem instead
>> of having the back-and-fourth with the vendor?
>>
> 
> Module parameters do not work on a per-device basis, sorry.  This isn't
> the 1990's anymore, please do not attempt to add new ones :)

Don't get me wrong, I don't like adding this. But the source of this
is that there are simply too many breakages of non-compliant consumer
drives out there that maybe a compromise would be "globally relax
compliance check in this specific area" as a workaround.

Right now each time this issue is seen in the wild, the only
resolution is either the vendor fixing it, or the driver adds
a quirk, which is positive and exactly what we want. But more
and more users complain, and there is no immediate workaround.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11 12:06           ` Christoph Hellwig
  2023-07-11 12:14             ` Sagi Grimberg
@ 2023-07-11 15:33             ` Keith Busch
  2023-07-11 16:47             ` Linus Torvalds
  2 siblings, 0 replies; 31+ messages in thread
From: Keith Busch @ 2023-07-11 15:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux regressions mailing list, Pankaj Raghav, Bagas Sanjaya,
	Jens Axboe, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환,
	Linus Torvalds

On Tue, Jul 11, 2023 at 02:06:09PM +0200, Christoph Hellwig wrote:
> On Tue, Jul 11, 2023 at 11:39:11AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> > Well, that "They keep pumping out more and more devices with the same
> > breakage" and the "new device" comment from Pankaj below bear the
> > question: should we stop trying to play "whack a mole" with all those
> > quirk entries and handle devices with duplicate ids just like Windows does?
> 
> As far as I can tell Windows completely ignores the IDs.  Which, looking
> back, I'd love to be able to do as well, but they are already used
> by udev for the /dev/disk/by-id/ links.   Those are usually not used
> on desktop systems, as they use the file system labels and UUIDs, but
> that doesn't work for non-file system uses.

It's also an inescapable requirement for supporting multipath, and a
number of the devices being quirked bizarrely also report CMIC/NMIC
capabilities.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11 12:06           ` Christoph Hellwig
  2023-07-11 12:14             ` Sagi Grimberg
  2023-07-11 15:33             ` Keith Busch
@ 2023-07-11 16:47             ` Linus Torvalds
  2023-07-11 17:21               ` Keith Busch
  2023-07-12 16:45               ` Christoph Hellwig
  2 siblings, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2023-07-11 16:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux regressions mailing list, Pankaj Raghav, Keith Busch,
	Bagas Sanjaya, Jens Axboe, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환

On Tue, 11 Jul 2023 at 05:06, Christoph Hellwig <hch@lst.de> wrote:
>
> As far as I can tell Windows completely ignores the IDs.  Which, looking
> back, I'd love to be able to do as well, but they are already used
> by udev for the /dev/disk/by-id/ links.   Those are usually not used
> on desktop systems, as they use the file system labels and UUIDs, but
> that doesn't work for non-file system uses.

The thing is, the nvme code seems to actively do completely stuipid
things in this area.

> And all this has been working really well with the good old enterprise
> SSDs, it's just that the cheap consumer devices keep f*cking it up.

Christoph, deal with reality, not with what you think things should look like.

Anybody who expected unique ID's is frankly completely incompetent.
People should have *known* not to do this.

 "Those Who Do Not Learn History Are Doomed To Repeat It"
          - Santayana

and we have NEVER EVER seen devices with reliably unique IDs. Really.
We've had these uuid's before (ask Greg about USB devices one day, and
that was *recent*).

We've always known that vendors will fill in a fixed value, and
somebody still decided to make this a correctness issue?

Christoph, don't blame vendors. Somebody did indeed f*ck up.  But it was you.

> If we'd take it away now we'd break existing users, which puts us between
> a rock and a hard place.

Well, here's a suggestion: stop making it worse.

For example, we have this completely unacceptable garbage:

        ret = nvme_global_check_duplicate_ids(ctrl->subsys, &info->ids);
        if (ret) {
                dev_err(ctrl->device,
                        "globally duplicate IDs for nsid %d\n", info->nsid);
                nvme_print_device_info(ctrl);
                return ret;
        }

iow, the code even checks for and *notices* that there are duplicate
IDs, and what does it do? It then errors out.

Then expecting people TO WAIT FOR A NEW KERNEL VERSION when you
noticed something wrong? What an absolute crock.

So stop blaming anybody else.

I think the code should *default* to "unreliable uuid", and then if
you're sure it's actually ok, then you use it. Then some rare
enterprise user with multipathing  - who is going to be very very
careful about which device to use anyway - can use the "approved
list".

Or "Oh, I noticed a non-unique UUID, let me generate one for you based
on physical location".

But this "my disk doesn't work in v6.0 and later because some clown
added a duplicate check that shouldn't be there" is not a good thing
to then try to make excuses for.

            Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11 16:47             ` Linus Torvalds
@ 2023-07-11 17:21               ` Keith Busch
  2023-07-11 22:07                 ` John Meneghini
  2023-07-12  7:38                 ` Sagi Grimberg
  2023-07-12 16:45               ` Christoph Hellwig
  1 sibling, 2 replies; 31+ messages in thread
From: Keith Busch @ 2023-07-11 17:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Linux regressions mailing list, Pankaj Raghav,
	Bagas Sanjaya, Jens Axboe, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환

On Tue, Jul 11, 2023 at 09:47:00AM -0700, Linus Torvalds wrote:
> On Tue, 11 Jul 2023 at 05:06, Christoph Hellwig <hch@lst.de> wrote:
> For example, we have this completely unacceptable garbage:
> 
>         ret = nvme_global_check_duplicate_ids(ctrl->subsys, &info->ids);
>         if (ret) {
>                 dev_err(ctrl->device,
>                         "globally duplicate IDs for nsid %d\n", info->nsid);
>                 nvme_print_device_info(ctrl);
>                 return ret;
>         }
> 
> iow, the code even checks for and *notices* that there are duplicate
> IDs, and what does it do? It then errors out.

This check came from a recent half-baked spec feature called "Dispersed
Namespaces" that caused breakage and data corruption when used in Linux.
Rather than attempt to support that mostly vendor specific feature, the
driver attempted to fence that off as unmaintainable. This check wasn't
aimed at enforcing "correctness", but it certainly found a lot of that
as collatoral damage. Let's see if we can find a better way to detect
the difference with a sane fallback as you suggest.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11 17:21               ` Keith Busch
@ 2023-07-11 22:07                 ` John Meneghini
  2023-07-12  7:38                 ` Sagi Grimberg
  1 sibling, 0 replies; 31+ messages in thread
From: John Meneghini @ 2023-07-11 22:07 UTC (permalink / raw)
  To: Keith Busch, Linus Torvalds
  Cc: Christoph Hellwig, Linux regressions mailing list, Pankaj Raghav,
	Bagas Sanjaya, Jens Axboe, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, Hannes Reinecke,
	Linux Kernel Mailing List, Linux NVMe, Kanchan Joshi,
	Javier Gonzalez, 박진환

Yes, this is what I thought.  This is all the result of the duplicate NID check added to deal with TP4034 Dispersed Namespaces.

One suggestion I have would be to limit this check to nvme-of subsystems only. These are the only devices I am aware of out 
there which support TP4034.  Moreover, all nvme-of devices report a valid NID.  It's required with NVMe Over Fabrics. The PCIe 
devices, I expect, don't care. You don't really need a valid NID with a private namespace - which is what most PCIe devices are.

I'll wager that if you change nvme_global_check_duplicate_ids() to check only nvme-of subsystems, and simply continue with PCIe 
subsystems, 90% of these nvme quirks can be removed.

John Meneghini
Senior Principal Platform Storage Engineer
RHEL SST - Platform Storage Group
jmeneghi@redhat.com

On 7/11/23 13:21, Keith Busch wrote:
> On Tue, Jul 11, 2023 at 09:47:00AM -0700, Linus Torvalds wrote:
>> On Tue, 11 Jul 2023 at 05:06, Christoph Hellwig <hch@lst.de> wrote:
>> For example, we have this completely unacceptable garbage:
>>
>>          ret = nvme_global_check_duplicate_ids(ctrl->subsys, &info->ids);
>>          if (ret) {
>>                  dev_err(ctrl->device,
>>                          "globally duplicate IDs for nsid %d\n", info->nsid);
>>                  nvme_print_device_info(ctrl);
>>                  return ret;
>>          }
>>
>> iow, the code even checks for and *notices* that there are duplicate
>> IDs, and what does it do? It then errors out.
> 
> This check came from a recent half-baked spec feature called "Dispersed
> Namespaces" that caused breakage and data corruption when used in Linux.
> Rather than attempt to support that mostly vendor specific feature, the
> driver attempted to fence that off as unmaintainable. This check wasn't
> aimed at enforcing "correctness", but it certainly found a lot of that
> as collatoral damage. Let's see if we can find a better way to detect
> the difference with a sane fallback as you suggest.
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11 17:21               ` Keith Busch
  2023-07-11 22:07                 ` John Meneghini
@ 2023-07-12  7:38                 ` Sagi Grimberg
  2023-07-12 16:48                   ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Sagi Grimberg @ 2023-07-12  7:38 UTC (permalink / raw)
  To: Keith Busch, Linus Torvalds
  Cc: Christoph Hellwig, Linux regressions mailing list, Pankaj Raghav,
	Bagas Sanjaya, Jens Axboe, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환


>> For example, we have this completely unacceptable garbage:
>>
>>          ret = nvme_global_check_duplicate_ids(ctrl->subsys, &info->ids);
>>          if (ret) {
>>                  dev_err(ctrl->device,
>>                          "globally duplicate IDs for nsid %d\n", info->nsid);
>>                  nvme_print_device_info(ctrl);
>>                  return ret;
>>          }
>>
>> iow, the code even checks for and *notices* that there are duplicate
>> IDs, and what does it do? It then errors out.
> 
> This check came from a recent half-baked spec feature called "Dispersed
> Namespaces" that caused breakage and data corruption when used in Linux.
> Rather than attempt to support that mostly vendor specific feature, the
> driver attempted to fence that off as unmaintainable. This check wasn't
> aimed at enforcing "correctness", but it certainly found a lot of that
> as collatoral damage. Let's see if we can find a better way to detect
> the difference with a sane fallback as you suggest.

Perhaps we could fallback to what we do in wwid_show()?

"nvme.%04x-%*phN-%*phN-%08x\n", subsys->vendor_id, serial_len,
subsys->serial, model_len, subsys->model, head->ns_id)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11 16:47             ` Linus Torvalds
  2023-07-11 17:21               ` Keith Busch
@ 2023-07-12 16:45               ` Christoph Hellwig
  2023-07-12 16:51                 ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2023-07-12 16:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Linux regressions mailing list, Pankaj Raghav,
	Keith Busch, Bagas Sanjaya, Jens Axboe, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환

On Tue, Jul 11, 2023 at 09:47:00AM -0700, Linus Torvalds wrote:
> Anybody who expected unique ID's is frankly completely incompetent.
> People should have *known* not to do this.

Except that storage software really does rely on them.  Not your
laptop (or mine) but all kinds of data center software.  Where these
IDs have worked fine for decades.  It just turns out nvme has the
misfortune of trying to deal with both that and cheap consumer crap.

> and we have NEVER EVER seen devices with reliably unique IDs. Really.

Sorry, but that's bullshit. 

> iow, the code even checks for and *notices* that there are duplicate
> IDs, and what does it do? It then errors out.

Yes.  Because we have applications which will lose data if they
suddently get the wrong device.  

> I think the code should *default* to "unreliable uuid", and then if
> you're sure it's actually ok, then you use it. Then some rare
> enterprise user with multipathing  - who is going to be very very
> careful about which device to use anyway - can use the "approved
> list".

That doesn't scale either.

> Or "Oh, I noticed a non-unique UUID, let me generate one for you based
> on physical location".
> 
> But this "my disk doesn't work in v6.0 and later because some clown
> added a duplicate check that shouldn't be there" is not a good thing
> to then try to make excuses for.

Well, let's try something like this (co-developed with Sagi) that
allows it for non-multipath PCIe devices, hich covers the quirks
we've added so far, but also add a big fat warning, because we know
people rely on the /dev/disk/by-id/ links.  Probably not on these
devices, but who knows.

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 47d7ba2827ff29..37b6fa74666204 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3431,10 +3431,40 @@ static int nvme_init_ns_head(struct nvme_ns *ns, struct nvme_ns_info *info)
 
 	ret = nvme_global_check_duplicate_ids(ctrl->subsys, &info->ids);
 	if (ret) {
-		dev_err(ctrl->device,
-			"globally duplicate IDs for nsid %d\n", info->nsid);
+		/*
+		 * We've found two different namespaces on two different
+		 * subsystems that report the same ID.  This is pretty nasty
+		 * for anything that actually requires unique device
+		 * identification.  In the kernel we need this for multipathing,
+		 * and in user space the /dev/disk/by-id/ links rely on it.
+		 *
+		 * If the device also claims to be multi-path capable back off
+		 * here now and refuse the probe the second device as this is a
+		 * recipe for data corruption.  If not this is probably a
+		 * cheap consumer device if on the PCIe bus, so let the user
+		 * proceed and use the shiny toy, but warn that with changing
+		 * probing order (which due to our async probing could just be
+		 * device taking longer to startup) the other device could show
+		 * up at any time.
+		 */
 		nvme_print_device_info(ctrl);
-		return ret;
+		if ((ns->ctrl->ops->flags & NVME_F_FABRICS) || /* !PCIe */
+		    ((ns->ctrl->subsys->cmic & NVME_CTRL_CMIC_MULTI_CTRL) &&
+		     info->is_shared)) {
+			dev_err(ctrl->device,
+				"ignoring nsid %d because of duplicate IDs\n",
+				info->nsid);
+			return ret;
+		}
+
+		dev_err(ctrl->device,
+			"clearing duplicate IDs for nsid %d\n", info->nsid);
+		dev_err(ctrl->device,
+			"use of /dev/disk/by-id/ may cause data corruption\n");
+		memset(&info->ids.nguid, 0, sizeof(info->ids.nguid));
+		memset(&info->ids.uuid, 0, sizeof(info->ids.uuid));
+		memset(&info->ids.eui64, 0, sizeof(info->ids.eui64));
+		ctrl->quirks |= NVME_QUIRK_BOGUS_NID;
 	}
 
 	mutex_lock(&ctrl->subsys->lock);

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-12  7:38                 ` Sagi Grimberg
@ 2023-07-12 16:48                   ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2023-07-12 16:48 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Keith Busch, Christoph Hellwig, Linux regressions mailing list,
	Pankaj Raghav, Bagas Sanjaya, Jens Axboe, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환

On Wed, 12 Jul 2023 at 00:38, Sagi Grimberg <sagi@grimberg.me> wrote:
>
> Perhaps we could fallback to what we do in wwid_show()?
>
> "nvme.%04x-%*phN-%*phN-%08x\n", subsys->vendor_id, serial_len,
> subsys->serial, model_len, subsys->model, head->ns_id)

That's not going to be any more unique.

Yes, there is *slightly* more hope that a simpler "serial number"
might be different between devices, but honestly, when I told
Christoph to learn from history and not assume UUID's are unique for
hardware, that "history" is literally decades of serial numbers.

Why do you think things like dm/md generate the UUID and literally
write it to disk? It's because those unique serial numbers have never
ever existed.

And absolutely NONE OF THIS IS NEW. This is not a nvme issue. SCSI
disks allegedly has a VPD page 0x80 with a serial number. Not only
won't many devices report that VPD in the first place, even if they do
there's no guarantee that it's actually unique.

And USB devices are *supposed* to have unique device serial numbers
("iSerial"). I think it's literally been in the spec since day#1. And
some even do. Quite a lot absolutely do NOT.

It's just fundamentally cheaper for hardware manufacturers that do
mass manufacturing to do completely identical pieces of hardware.
Serial numbers and UUID's are _fundamentally_ hard and expensive at
some level.

If you sell overpriced garbage to enterprise users, then a serial
number or UUID will most definitely be part of the package. Because
that is _literally_ what "Enterprise hardware" means: "Overpriced
garbage with a provenance and somebody else to blame".

But if you sell to the mass market, and aren't going for people who
pay extra for provenance, a serial number or a UUID is pure overhead.

Sometimes it's easy enough to do, and people do it.

But if you *rely* on it, *you* are the problem. Not the vendor that
was churning out a million devices for real users, and that was
spending all the effort on just making it work.

Basically, I think all of these things are purely for fabrics and the
nvme target side.

Expecting serial numbers from a PCIe device is pure fantasy.

And, more importantly, it's stupid and wrong. Exactly because we've
been here before. Don't repeat mistakes from the past.

By all means _report_ things like serial numbers. They can be very
useful for tracking. But never *ever* rely on them for basic
functionality. People absolutely will have two identical devices from
the same batch, and unless there is some actual hardware compliance
verification, they will not have unique serial numbers.

In many cases, the best you can hope for is that the duplicate ones
are *obviously* duplicate (ie "all zeroes" and similar). But you
shouldn't expect or depend on even that to be the case.

             Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-12 16:45               ` Christoph Hellwig
@ 2023-07-12 16:51                 ` Linus Torvalds
  2023-07-12 16:57                   ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2023-07-12 16:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux regressions mailing list, Pankaj Raghav, Keith Busch,
	Bagas Sanjaya, Jens Axboe, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환

On Wed, 12 Jul 2023 at 09:45, Christoph Hellwig <hch@lst.de> wrote:
>
> On Tue, Jul 11, 2023 at 09:47:00AM -0700, Linus Torvalds wrote:
>
> > and we have NEVER EVER seen devices with reliably unique IDs. Really.
>
> Sorry, but that's bullshit.

Christoph, you are *literally* involved in a discussion where this is the case.

What you call "bullshit" is what everybody else here calls REALITY.

The fact that *some* devices have serial numbers in no way implies
that all of them do.

And if you think that they all do, you are the problem. Literally in this case.

So you had better really internalize this "there are no reliable
serial numbers in general". Because it's simply a FACT.

> Well, let's try something like this (co-developed with Sagi) that
> allows it for non-multipath PCIe devices, which covers the quirks
> we've added so far, but also add a big fat warning, because we know
> people rely on the /dev/disk/by-id/ links.  Probably not on these
> devices, but who knows.

Looks sane to me.

          Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-12 16:51                 ` Linus Torvalds
@ 2023-07-12 16:57                   ` Christoph Hellwig
  2023-07-12 17:34                     ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2023-07-12 16:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Linux regressions mailing list, Pankaj Raghav,
	Keith Busch, Bagas Sanjaya, Jens Axboe, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환

On Wed, Jul 12, 2023 at 09:51:52AM -0700, Linus Torvalds wrote:
> On Wed, 12 Jul 2023 at 09:45, Christoph Hellwig <hch@lst.de> wrote:
> >
> > On Tue, Jul 11, 2023 at 09:47:00AM -0700, Linus Torvalds wrote:
> >
> > > and we have NEVER EVER seen devices with reliably unique IDs. Really.
> >
> > Sorry, but that's bullshit.
> 
> Christoph, you are *literally* involved in a discussion where this is the case.

Yes.  But that's not never ever.  Enterprise storage devices have been
pretty very good at it, because it is part of the purchase specs.

So don't claim NEVER EVER, which is just BS.  Claim we've been way
to optimistic in that even cheap devices would get something so
basic, and you're spot on.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-12 16:57                   ` Christoph Hellwig
@ 2023-07-12 17:34                     ` Linus Torvalds
  2023-07-13 11:44                       ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2023-07-12 17:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux regressions mailing list, Pankaj Raghav, Keith Busch,
	Bagas Sanjaya, Jens Axboe, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환

On Wed, 12 Jul 2023 at 09:57, Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Jul 12, 2023 at 09:51:52AM -0700, Linus Torvalds wrote:
> > Christoph, you are *literally* involved in a discussion where this is the case.
>
> Yes.  But that's not never ever.  Enterprise storage devices have been
> pretty very good at it, because it is part of the purchase specs.

Your basic logic is lacking.

"Humans are animals" and "Some animals have four legs" are both
undeniabvly true.

That still does *not* mean "Humans have four legs".

And yet hat is *literally* the kind of nonsense you are spouting.

I very much say that "some devices will not have UUID's".

That's a *FACT*.

You then say "Some devices have UUID's, so all devices must have UUIDs".

That's *STUPID*.

Can you really not tell the difference?

           Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-12 17:34                     ` Linus Torvalds
@ 2023-07-13 11:44                       ` Christoph Hellwig
  2023-07-13 16:53                         ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2023-07-13 11:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christoph Hellwig, Linux regressions mailing list, Pankaj Raghav,
	Keith Busch, Bagas Sanjaya, Jens Axboe, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환

On Wed, Jul 12, 2023 at 10:34:05AM -0700, Linus Torvalds wrote:
> That's a *FACT*.
> 
> You then say "Some devices have UUID's, so all devices must have UUIDs".

Please stop putting words in my mouth.  Maybe instead of shouting it
would help to actually read the text?

You said we've never seen devices with reliably IDs, which simply isn't
true.  And that doesn't mean I'm saying all devices have reliably IDs,
which somehow you're not trying to make me say.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-13 11:44                       ` Christoph Hellwig
@ 2023-07-13 16:53                         ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2023-07-13 16:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linux regressions mailing list, Pankaj Raghav, Keith Busch,
	Bagas Sanjaya, Jens Axboe, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환

On Thu, 13 Jul 2023 at 04:44, Christoph Hellwig <hch@lst.de> wrote:
>
> Please stop putting words in my mouth.  Maybe instead of shouting it
> would help to actually read the text?

I have read the text. You spout unbelievable garbage. Even in this email:

> You said we've never seen devices with reliably IDs, which simply isn't
> true.  And that doesn't mean I'm saying all devices have reliably IDs,
> which somehow you're not trying to make me say.

When I say "devices", I obviously mean in general.

You seem to argue against it.

OF COURSE a _single_ device that has shown to have a UUID is likely to
then repeat that UUID reliably. But that doesn't mean it's true in
general.

The fact is, any driver subsystem that deals with more than one device
manufacturer had better not rely on any serial number for uniqueness.

If some device layer does, and there is a device without serial
numbers, then the bug is squarely on the driver, not the device.
Because the driver author clearly didn't think things through.

THAT is what I mean by "devices do not EVER reliably have uuids".

I just checked. I have on my desk access to three machines with NVMe
drives. Of them

 (a) one didn't have any UUID at all, didn't report namespaces and the
"wwid" exposed in /proc is that

        return sysfs_emit(buf, "nvme.%04x-%*phN-%*phN-%08x\n",
subsys->vendor_id,
                serial_len, subsys->serial, model_len, subsys->model,
                head->ns_id);

     thing.

(b) the other two had a NULL uuid and nguid, but seem to have a eui64,
whatever the hell that is.

So just from my limited test, one in three didn't have any UUID at all.

Deal with it. Stop claiming it's somehow "reliable". Two out of three
out of a random selection is not "reliable".

And it has never been reliable, judging by those quirks. When we have
manufacturers like Samsung, Phison, ADATA, Lexar, Sandisk, Micron, and
Kingston mentioned in the quirk list, we probably have covered most of
the consumer SSD manufacturers.

It's a sad that NVMe - that started out as a "lean mean low-overhead
disk interface definition" results in this kind of mess.

           Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD)
  2023-07-11  6:54       ` Pankaj Raghav
  2023-07-11  9:39         ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-07-16 19:19         ` August Wikerfors
  1 sibling, 0 replies; 31+ messages in thread
From: August Wikerfors @ 2023-07-16 19:19 UTC (permalink / raw)
  To: Pankaj Raghav
  Cc: Keith Busch, Linux regressions mailing list, Bagas Sanjaya,
	Jens Axboe, Christoph Hellwig, Sagi Grimberg, Clemens S.,
	Martin Belanger, Chaitanya Kulkarni, John Meneghini,
	Hannes Reinecke, Linux Kernel Mailing List, Linux NVMe,
	Kanchan Joshi, Javier Gonzalez, 박진환

On 2023-07-11 08:54, Pankaj Raghav wrote:
>>> Another request came in today, even with a pseudo-patch:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=217649
>>>
>>> To quote:
>>> ```
>>> As with numerous NVMe controllers these days, Samsung's
>>> MZAL41T0HBLB-00BL2, which Lenovo builds into their 16ARP8 also suffers
>>> from invalid IDs, breaking suspend and hibernate also on the latest
>>> kernel 6.4.2.
>>>
>>> The following change restores this functionality:
>>>
>>> File: root/drivers/nvme/host/pci.c
>>> Change:
>>>
>>> -	{ PCI_DEVICE(0x144d, 0xa80b),   /* Samsung PM9B1 256G and 512G */
>>> -		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
>>>
>>> +	{ PCI_DEVICE(0x144d, 0xa80b),   /* Samsung PM9B1 256G, 512G and 1TB */
>>> +		.driver_data = NVME_QUIRK_BOGUS_NID |
>>> +				NVME_QUIRK_DISABLE_WRITE_ZEROES, },
>>
>> Panjaj, okay with this one too?
> 
> This looks a like a new device that might have a firmware update. I will ping
> internally first.

(Note that this is a different issue from the regression in this thread)

I reported this back in November [1] and was told by Kanchan Joshi at
Samsung that it was fixed in new firmware [2]. Lenovo was also contacted
and said they were working on the update in December [3]. I'm not sure
what happened then, but in March, Mark Pearson at Lenovo wrote [4]:

> I'm stuck on this one - the FW team reached out to Samsung to see if 
> there were fixes that we should be picking up and Samsung reported back 
> that there are no Linux issues reported against this part :(

The release process then seems to have started over [5] and the latest
update as of May is that the update is supposed to be released this month.

[1] https://lore.kernel.org/all/20221116171727.4083-1-git@augustwikerfors.se/
[2] https://lore.kernel.org/all/20221206055928.GB24451@test-zns/
[3] https://github.com/tomsom/yoga-linux/issues/9#issuecomment-1368013293
[4] https://github.com/fwupd/firmware-lenovo/issues/308#issuecomment-1466631468
[5] https://forums.lenovo.com/topic/findpost/27/5196929/5984302

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2023-07-16 19:25 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-26  1:15 Fwd: Need NVME QUIRK BOGUS for SAMSUNG MZ1WV480HCGL-000MV (Samsung SM-953 Datacenter SSD) Bagas Sanjaya
2023-06-26 15:39 ` Chaitanya Kulkarni
2023-06-26 18:26   ` Linux regression tracking (Thorsten Leemhuis)
2023-06-27 16:10 ` Keith Busch
2023-06-27 21:57   ` Chaitanya Kulkarni
2023-06-29  8:49   ` Pankaj Raghav
2023-06-30  8:40   ` Pankaj Raghav
2023-06-30 15:18     ` Clemens Springsguth
2023-06-30 16:21       ` Pankaj Raghav
2023-07-03 13:25         ` Clemens Springsguth
2023-07-10  8:52   ` Linux regression tracking (Thorsten Leemhuis)
2023-07-10 15:58     ` Keith Busch
2023-07-11  6:54       ` Pankaj Raghav
2023-07-11  9:39         ` Linux regression tracking (Thorsten Leemhuis)
2023-07-11 12:06           ` Christoph Hellwig
2023-07-11 12:14             ` Sagi Grimberg
2023-07-11 14:12               ` Greg KH
2023-07-11 14:39                 ` Sagi Grimberg
2023-07-11 15:33             ` Keith Busch
2023-07-11 16:47             ` Linus Torvalds
2023-07-11 17:21               ` Keith Busch
2023-07-11 22:07                 ` John Meneghini
2023-07-12  7:38                 ` Sagi Grimberg
2023-07-12 16:48                   ` Linus Torvalds
2023-07-12 16:45               ` Christoph Hellwig
2023-07-12 16:51                 ` Linus Torvalds
2023-07-12 16:57                   ` Christoph Hellwig
2023-07-12 17:34                     ` Linus Torvalds
2023-07-13 11:44                       ` Christoph Hellwig
2023-07-13 16:53                         ` Linus Torvalds
2023-07-16 19:19         ` August Wikerfors

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.