linux-ide.vger.kernel.org archive mirror
* Questions (and a possible bug) regarding the ata_device_blacklist and ATA_HORKAGE_ZERO_AFTER_TRIM
@ 2019-09-26 13:04 Stefan Tauner
  2019-09-26 22:01 ` Martin K. Petersen
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Tauner @ 2019-09-26 13:04 UTC (permalink / raw)
  To: linux-ide

Hi,

I am running an MD RAID5 with 3 SSDs (2x Crucial CT500MX500 (FW
M3CR022), 1x Samsung 860 EVO (because it was/is the only decent SSD
with msata; FW RVT41B6Q)) on Debian Buster (with an ancient^Wstable
4.19 kernel). Before this setup I was using a RAID1 and weekly fstrim
runs to issue discards.

Today I became aware of the RAID5 discard issue that led to trim being
disabled on such arrays:
https://github.com/torvalds/linux/commit/8e0e99ba64c7ba46133a7c8a3e3f7de01f23bd93

While checking whether my drives handle this correctly and whether I
should enable it, I came across the black (actually white) list in the
libata code. My question is regarding the matching of the model_num
field in
the black list. The name and the use of ATA_ID_PROD seem to indicate
that a single ATA "property" is used as the string to match. I don't
know the technical details how this is communicated by the drive but I
assume it's the same thing that smartctl and hdparm output as "Model
Number" and "Device Model" respectively.

If this is correct (is it?) then there is a problem with the list
AFAICT because the Crucial SSD I have reports this field simply as
"CT500MX500SSD4" but the kernel expects "Crucial" at the beginning of
almost all Crucial drives (line 4523+) including the vendor wildcard at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-core.c#n4586
Interestingly, in line 4520 there is an entry for the CT500BX100SSD1
that does not start with "Crucial".
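
A minimal sketch of the mismatch described above. It assumes the
kernel's glob matching behaves like Python's fnmatchcase for these
simple patterns. The list holds only the two patterns named in this
mail, not the kernel's real table, and it leaves out the firmware
revision that real entries can also match on. The M550 model string is
a made-up example of a name carrying the "Crucial" prefix.

```python
from fnmatch import fnmatchcase

# Illustrative subset only: the two patterns mentioned in this thread,
# not the actual table from libata-core.c.
ZERO_AFTER_TRIM_PATTERNS = ["Crucial*", "CT500BX100SSD1"]


def zero_after_trim_listed(model, patterns=ZERO_AFTER_TRIM_PATTERNS):
    """Return True if the model string matches any pattern in the list."""
    return any(fnmatchcase(model, pattern) for pattern in patterns)


if __name__ == "__main__":
    # "Crucial_CT512M550SSD1" is a hypothetical prefixed model string.
    for model in ("Crucial_CT512M550SSD1", "CT500BX100SSD1", "CT500MX500SSD4"):
        print(model, zero_after_trim_listed(model))
```

With these assumptions the MX500 string falls through both patterns,
which is the behaviour I suspect in the kernel.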

After looking into smartctl's drive database I guess the MX500 [2] (as
well as BX100, BX200, BX300 and BX500 [1]) series stand out in this
regard. This means that all of them do *not* get the
ATA_HORKAGE_ZERO_AFTER_TRIM flag set because they are not matched by
any of the model-specific entries nor the cumulative "Crucial*" vendor
entry.

I have not tested my drive to actually return zeros after trimming but
from the kernel code I would assume that its intent is to match all
Crucial SSDs and thus it is a bug that mine is not matched. If someone
tells me the preferred method to test it I am happy to do this. If
need be I can also submit a patch (just for MX500? all of the above?).

Is there any way to see which flags the kernel applies to a drive?
Interestingly, "lsblk -D" shows only "0" for the Samsung device
(although AFAICT it is matched by the white list AND reports
"Deterministic read ZEROs after TRIM" according to hdparm). But I don't
know what lsblk actually looks at...?

[1] https://www.smartmontools.org/changeset/4776
[2] https://www.smartmontools.org/browser/trunk/smartmontools/drivedb.h?#L1906

(Please CC since I am not subscribed)

KR
-- 
Dipl.-Ing. Stefan Tauner
Lecturer and former researcher
Embedded Systems Department

University of Applied Sciences Technikum Wien
Hoechstaedtplatz 6, 1200 Vienna, Austria
E: stefan.tauner@technikum-wien.at
I: embsys.technikum-wien.at
I: www.technikum-wien.at

* Re: Questions (and a possible bug) regarding the ata_device_blacklist and ATA_HORKAGE_ZERO_AFTER_TRIM
  2019-09-26 13:04 Questions (and a possible bug) regarding the ata_device_blacklist and ATA_HORKAGE_ZERO_AFTER_TRIM Stefan Tauner
@ 2019-09-26 22:01 ` Martin K. Petersen
  2021-09-15 16:35   ` Stefan Tauner
  0 siblings, 1 reply; 4+ messages in thread
From: Martin K. Petersen @ 2019-09-26 22:01 UTC (permalink / raw)
  To: Stefan Tauner; +Cc: linux-ide


Stefan,

> I don't know the technical details how this is communicated by the
> drive but I assume it's the same thing that smartctl and hdparm output
> as "Model Number" and "Device Model" respectively.

Yes.

> If this is correct (is it?) then there is a problem with the list
> AFAICT because the Crucial SSD I have reports this field simply as
> "CT500MX500SSD4" but the kernel expects "Crucial" at the beginning of
> almost all Crucial drives (line 4523+) including the vendor wildcard at
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-core.c#n4586
> Interestingly, in line 4520 there is an entry for the CT500BX100SSD1
> that does not start with "Crucial".

With a few exceptions, the entries in the libata white/blacklist were
submitted by Crucial/Micron themselves. But it's possible that they
changed their naming scheme.

> After looking into smartctl's drive database I guess the MX500 [2] (as
> well as BX100, BX200, BX300 and BX500 [1]) series stand out in this
> regard. This means that all of them do *not* get the
> ATA_HORKAGE_ZERO_AFTER_TRIM flag set because they are not matched by
> any of the model-specific entries nor the cumulative "Crucial*" vendor
> entry.

The newest drives I have are M550 models.

> I have not tested my drive to actually return zeros after trimming but
> from the kernel code I would assume that its intent is to match all
> Crucial SSDs and thus it is a bug that mine is not matched. If someone
> tells me the preferred method to test it I am happy to do this. If
> need be I can also submit a patch (just for MX500? all of the above?).

There's no way to exhaustively test. Many drives will return zeroes most
of the time but can have corner conditions that cause them to ignore
TRIM commands.

The ones we whitelisted were added as a result of feedback from the
vendors themselves (thanks to an advertised qualification for use with
hardware RAID5 controllers). As you know, there is no way for a drive
to express this capability/guarantee in the ATA protocol.

> Is there any way to see which flags the kernel applies to a drive?

# grep . /sys/class/ata_device/*/trim
/sys/class/ata_device/dev1.0/trim:unqueued
/sys/class/ata_device/dev2.0/trim:queued
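
The same lookup can be scripted. This is only a sketch of the grep
above: the function name and the sysfs_root parameter are made up for
illustration (the parameter exists so the code can be pointed at a
test directory), and it reports the attribute text without
interpreting it.

```python
from pathlib import Path


def ata_trim_modes(sysfs_root="/sys"):
    """Map each ATA device name to the text of its 'trim' attribute."""
    base = Path(sysfs_root) / "class" / "ata_device"
    modes = {}
    if not base.is_dir():
        return modes  # no libata devices, or not a Linux sysfs tree
    for dev in sorted(base.iterdir()):
        attr = dev / "trim"
        if attr.is_file():
            modes[dev.name] = attr.read_text().strip()
    return modes


if __name__ == "__main__":
    for dev, mode in ata_trim_modes().items():
        print(f"{dev}: {mode}")
```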

> Interestingly, "lsblk -D" shows only "0" for the Samsung device
> (although AFAICT it is matched by the white list AND reports
> "Deterministic read ZEROs after TRIM" according to hdparm). But I don't
> know what lsblk actually looks at...?

lsblk looks at /sys/block/*/queue/discard*

You get "0" for the discard granularity on the Samsung?
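
To see the raw values behind each lsblk column, something along these
lines should do. It is a sketch, not what lsblk itself runs: the
function name and the sysfs_root parameter are illustrative, and which
discard* files exist depends on the kernel version.

```python
import sys
from pathlib import Path


def discard_attrs(disk, sysfs_root="/sys"):
    """Return {attribute: value} for /sys/block/<disk>/queue/discard*."""
    queue = Path(sysfs_root) / "block" / disk / "queue"
    return {
        path.name: path.read_text().strip()
        for path in sorted(queue.glob("discard*"))
    }


if __name__ == "__main__":
    # Usage: python3 discard_attrs.py sdc
    for name, value in discard_attrs(sys.argv[1]).items():
        print(f"{name} = {value}")
```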

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: Questions (and a possible bug) regarding the ata_device_blacklist and ATA_HORKAGE_ZERO_AFTER_TRIM
  2019-09-26 22:01 ` Martin K. Petersen
@ 2021-09-15 16:35   ` Stefan Tauner
  2021-10-13  3:19     ` Martin K. Petersen
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Tauner @ 2021-09-15 16:35 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-ide

Hi,

sorry for the "small" delay... I got distracted and only now revisited
this topic as I wanted to use discard to improve backup space
efficiency and pondered using devices_handle_discard_safely of the
raid456 module (I run ext4 on lvm on luks on raid5 on 3 ssds) since
otherwise I cannot trim at all.

My inquiry deals with two points:
 - Discussing the addition of ATA_HORKAGE_ZERO_AFTER_TRIM for Crucial
   CT500MX500 (or CT*MX500 to include the 250 GB, 1 TB and 2 TB models)
 - Determining why the Samsung SSD 860 EVO is not recognized to zero
   after trim

On Thu, 26 Sep 2019 18:01:03 -0400
"Martin K. Petersen" <martin.petersen@oracle.com> wrote:

> > I don't know the technical details how this is communicated by the
> > drive but I assume it's the same thing that smartctl and hdparm output
> > as "Model Number" and "Device Model" respectively.  
> 
> Yes.
> 
> > If this is correct (is it?) then there is a problem with the list
> > AFAICT because the Crucial SSD I have reports this field simply as
> > "CT500MX500SSD4" but the kernel expects "Crucial" at the beginning of
> > almost all Crucial drives (line 4523+) including the vendor wildcard at
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-core.c#n4586
> > Interestingly, in line 4520 there is an entry for the CT500BX100SSD1
> > that does not start with "Crucial".  
> 
> With a few exceptions, the entries in the libata white/blacklist were
> submitted by Crucial/Micron themselves. But it's possible that they
> changed their naming scheme.

I can look for some smartctl logs of similar models but it is obviously
the case for mine.

> > After looking into smartctl's drive database I guess the MX500 [2] (as
> > well as BX100, BX200, BX300 and BX500 [1]) series stand out in this
> > regard. This means that all of them do *not* get the
> > ATA_HORKAGE_ZERO_AFTER_TRIM flag set because they are not matched by
> > any of the model-specific entries nor the cumulative "Crucial*" vendor
> > entry.  
> 
> The newest drives I have are M550 models.

Since Crucial has stopped producing new models I think it makes sense
to eventually conclude this topic and make some (final?) changes if
need be. Apparently the queued trim issues are not fully figured out
yet (saw commits to Linus' tree a short while ago on that) - so maybe
final-ish changes ;)

> > I have not tested my drive to actually return zeros after trimming but
> > from the kernel code I would assume that its intent is to match all
> > Crucial SSDs and thus it is a bug that mine is not matched. If someone
> > tells me the preferred method to test it I am happy to do this. If
> > need be I can also submit a patch (just for MX500? all of the above?).  
> 
> There's no way to exhaustively test. Many drives will return zeroes most
> of the time but can have corner conditions that cause them to ignore
> TRIM commands.

Sure, but since the whitelist was filled with devices that have been
tested/validated empirically, I wonder how thorough this needs to be
to add a drive with good confidence. After all, the vendor wildcard
for Crucial SSDs[1] has been quite broad and only restricted later
with blacklist entries (only due to NCQ trim and LPM problems AFAICT)...
So while queued trim is not blacklisted on my device, the safe zeroing
assumption is not whitelisted, for no reason other than the model
string missing "Crucial " at the beginning.
 
> > Is there any way to see which flags the kernel applies to a drive?  
> 
> # grep . /sys/class/ata_device/*/trim
> /sys/class/ata_device/dev1.0/trim:unqueued
> /sys/class/ata_device/dev2.0/trim:queued

But that's only to distinguish ATA_HORKAGE_NO_NCQ_TRIM I guess? While
this seems to be the major culprit of trim related issues I don't care
about that (yet).

> > Interestingly, "lsblk -D" shows only "0" for the Samsung device
> > (although AFAICT it is matched by the white list AND reports
> > "Deterministic read ZEROs after TRIM" according to hdparm). But I don't
> > know what lsblk actually looks at...?  
> 
> lsblk looks at /sys/block/*/queue/discard*

Yes, I could have checked strace :)

> You get "0" for the discard granularity on the Samsung?

Not for the granularity - that's fine I presume - but for the zeroing
capability. This is still the case (with Linux 5.10). I would have
expected that to be non-zero for devices with
ATA_HORKAGE_ZERO_AFTER_TRIM.

# lsblk -o PATH,MODEL,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO -d
PATH     MODEL                           DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
/dev/sda CT500MX500SSD4                         0        4K       2G         0
/dev/sdb CT500MX500SSD4                         0        4K       2G         0
/dev/sdc Samsung_SSD_860_EVO_mSATA_500GB        0      512B       2G         0

Just to make sure lsblk is not lying:
# cat /sys/block/sdc/queue/discard_zeroes_data 
0

I don't understand why that's the case.


1: https://github.com/torvalds/linux/blob/7a8526a5cd51cf5f070310c6c37dd7293334ac49/drivers/ata/libata-core.c#L4030

KR
-- 
Dipl.-Ing. Stefan Tauner
Lecturer and former researcher
Embedded Systems Department

University of Applied Sciences Technikum Wien
Hoechstaedtplatz 6, 1200 Vienna, Austria
E: stefan.tauner@technikum-wien.at
I: embsys.technikum-wien.at


* Re: Questions (and a possible bug) regarding the ata_device_blacklist and ATA_HORKAGE_ZERO_AFTER_TRIM
  2021-09-15 16:35   ` Stefan Tauner
@ 2021-10-13  3:19     ` Martin K. Petersen
  0 siblings, 0 replies; 4+ messages in thread
From: Martin K. Petersen @ 2021-10-13  3:19 UTC (permalink / raw)
  To: Stefan Tauner; +Cc: Martin K. Petersen, linux-ide


Stefan,

> This is still the case (with Linux 5.10). I would have expected that
> to be non-zero for devices with ATA_HORKAGE_ZERO_AFTER_TRIM.

The short answer is that we have split things into two distinct
operations: Deallocating a block range and zeroing a block range. And as
a result, upper layers no longer have to know whether a discard is
deterministic or not. That is all handled at the bottom of the stack
where the requested operation is translated into something suitable for
the device in question.

Because of the semantic distinction between deallocating and zeroing a
block range, discard_zeroes_data always reports 0 on modern kernels.
This is done to make applications pick the zeroing operation over
discard when they want to clear a block range.
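
From userspace the two operations are visible as two separate block
device ioctls. The following is a rough sketch under stated
assumptions, not a recommended tool: the helper names are made up, the
request numbers are the BLKDISCARD and BLKZEROOUT values from
<linux/fs.h>, and both calls destroy the data in the given range, so
they must only be pointed at a scratch device.

```python
import fcntl
import struct

# Request numbers from <linux/fs.h>: _IO(0x12, 119) and _IO(0x12, 127).
BLKDISCARD = 0x1277
BLKZEROOUT = 0x127F


def pack_range(start, length):
    """Pack a byte range as the two native uint64 values both ioctls take."""
    return struct.pack("QQ", start, length)


def deallocate(fd, start, length):
    """Discard the range; later reads are not guaranteed to return zeroes."""
    fcntl.ioctl(fd, BLKDISCARD, pack_range(start, length))


def zero_out(fd, start, length):
    """Zero the range; later reads return zeroes however the kernel does it."""
    fcntl.ioctl(fd, BLKZEROOUT, pack_range(start, length))
```

An application that needs a cleared range would open the block device
for writing and call zero_out(); deallocate() is the right choice only
when the contents afterwards do not matter.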

-- 
Martin K. Petersen	Oracle Linux Engineering
