linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.
@ 2021-09-01 15:16 Kate Hsuan
  2021-09-02 16:07 ` Hans de Goede
  0 siblings, 1 reply; 10+ messages in thread
From: Kate Hsuan @ 2021-09-01 15:16 UTC (permalink / raw)
  To: Jens Axboe, Hans de Goede, Damien Le Moal
  Cc: linux-ide, linux-kernel, Kate Hsuan

Many users are reporting that the Samsung 860 and 870 SSD are having
various issues when combined with AMD SATA controllers and only
completely disabling NCQ helps to avoid these issues.

Always disabling NCQ for Samsung 860/870 SSDs regardless of the host
SATA adapter vendor will cause I/O performance degradation with well
behaved adapters. To limit the performance impact to AMD adapters,
introduce the ATA_HORKAGE_NO_NCQ_ON_AMD flag to force disable NCQ
only for these adapters.

Also, two libata.force parameters (noncqamd and ncqamd) are introduced
to disable and enable the NCQ for the system which equiped with AMD
SATA adapter and Samsung 860 and 870 SSDs. The user can determine NCQ
function to be enabled or disabled according to the demand.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201693
Signed-off-by: Kate Hsuan <hpa@redhat.com>
---
Changes in v5:
* The libata.force parameters ncqamd and noncqamd are used to enable and
  disable the NCQ for the systems equiped with AMD SATA adapter and
  Samsung 860 and 870 SSDs.
* The character encoding of the patch comment was fixed.

Changes in v4:
* A function ata_dev_check_adapter() is added to check the vendor ID of
  the adapter.
* ATA_HORKAGE_NONCQ_ON_AMD was modified to ATA_HORKAGE_NO_NCQ_ON_AMD to
  align with the naming convention.
* Commit messages were improved according to reviewer comments.

Changes in v3:
* ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL was modified to
  ATA_HORKAGE_NONCQ_ON_AMD.
* Codes were fixed to completely disable NCQ on AMD controller.

---
 drivers/ata/libata-core.c | 34 ++++++++++++++++++++++++++++++++--
 include/linux/libata.h    |  1 +
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 3eda3291952b..e2e900085f99 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2199,6 +2199,25 @@ static void ata_dev_config_ncq_prio(struct ata_device *dev)

 }

+static bool ata_dev_check_adapter(struct ata_device *dev,
+				  unsigned short vendor_id)
+{
+	struct pci_dev *pcidev = NULL;
+	struct device *parent_dev = NULL;
+
+	for (parent_dev = dev->tdev.parent; parent_dev != NULL;
+	     parent_dev = parent_dev->parent) {
+		if (dev_is_pci(parent_dev)) {
+			pcidev = to_pci_dev(parent_dev);
+			if (pcidev->vendor == vendor_id)
+				return true;
+			break;
+		}
+	}
+
+	return false;
+}
+
 static int ata_dev_config_ncq(struct ata_device *dev,
 			       char *desc, size_t desc_sz)
 {
@@ -2217,6 +2236,13 @@ static int ata_dev_config_ncq(struct ata_device *dev,
 		snprintf(desc, desc_sz, "NCQ (not used)");
 		return 0;
 	}
+
+	if (dev->horkage & ATA_HORKAGE_NO_NCQ_ON_AMD &&
+	    ata_dev_check_adapter(dev, PCI_VENDOR_ID_AMD)) {
+		snprintf(desc, desc_sz, "NCQ (not used)");
+		return 0;
+	}
+
 	if (ap->flags & ATA_FLAG_NCQ) {
 		hdepth = min(ap->scsi_host->can_queue, ATA_MAX_QUEUE);
 		dev->flags |= ATA_DFLAG_NCQ;
@@ -3951,9 +3977,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
 	{ "Samsung SSD 850*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
 						ATA_HORKAGE_ZERO_AFTER_TRIM, },
 	{ "Samsung SSD 860*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
-						ATA_HORKAGE_ZERO_AFTER_TRIM, },
+						ATA_HORKAGE_ZERO_AFTER_TRIM |
+						ATA_HORKAGE_NO_NCQ_ON_AMD, },
 	{ "Samsung SSD 870*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
-						ATA_HORKAGE_ZERO_AFTER_TRIM, },
+						ATA_HORKAGE_ZERO_AFTER_TRIM |
+						ATA_HORKAGE_NO_NCQ_ON_AMD, },
 	{ "FCCT*M500*",			NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
 						ATA_HORKAGE_ZERO_AFTER_TRIM, },

@@ -6108,6 +6136,8 @@ static int __init ata_parse_force_one(char **cur,
 		{ "ncq",	.horkage_off	= ATA_HORKAGE_NONCQ },
 		{ "noncqtrim",	.horkage_on	= ATA_HORKAGE_NO_NCQ_TRIM },
 		{ "ncqtrim",	.horkage_off	= ATA_HORKAGE_NO_NCQ_TRIM },
+		{ "noncqamd",	.horkage_on	= ATA_HORKAGE_NO_NCQ_ON_AMD },
+		{ "ncqamd",	.horkage_off	= ATA_HORKAGE_NO_NCQ_ON_AMD },
 		{ "dump_id",	.horkage_on	= ATA_HORKAGE_DUMP_ID },
 		{ "pio0",	.xfer_mask	= 1 << (ATA_SHIFT_PIO + 0) },
 		{ "pio1",	.xfer_mask	= 1 << (ATA_SHIFT_PIO + 1) },
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 3fcd24236793..ef1417152ecd 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -422,6 +422,7 @@ enum {
 	ATA_HORKAGE_NOTRIM	= (1 << 24),	/* don't use TRIM */
 	ATA_HORKAGE_MAX_SEC_1024 = (1 << 25),	/* Limit max sects to 1024 */
 	ATA_HORKAGE_MAX_TRIM_128M = (1 << 26),	/* Limit max trim size to 128M */
+	ATA_HORKAGE_NO_NCQ_ON_AMD = (1 << 27),	/* Disable NCQ on AMD chipset */

 	 /* DMA mask for user DMA control: User visible values; DO NOT
 	    renumber */
--
2.31.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.
  2021-09-01 15:16 [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD Kate Hsuan
@ 2021-09-02 16:07 ` Hans de Goede
  2021-09-02 19:01   ` Martin K. Petersen
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2021-09-02 16:07 UTC (permalink / raw)
  To: Kate Hsuan, Jens Axboe, Damien Le Moal, Tor Vic; +Cc: linux-ide, linux-kernel

Hi All,

On 9/1/21 5:16 PM, Kate Hsuan wrote:
> Many users are reporting that the Samsung 860 and 870 SSD are having
> various issues when combined with AMD SATA controllers and only
> completely disabling NCQ helps to avoid these issues.
> 
> Always disabling NCQ for Samsung 860/870 SSDs regardless of the host
> SATA adapter vendor will cause I/O performance degradation with well
> behaved adapters. To limit the performance impact to AMD adapters,
> introduce the ATA_HORKAGE_NO_NCQ_ON_AMD flag to force disable NCQ
> only for these adapters.
> 
> Also, two libata.force parameters (noncqamd and ncqamd) are introduced
> to disable and enable the NCQ for the system which equiped with AMD
> SATA adapter and Samsung 860 and 870 SSDs. The user can determine NCQ
> function to be enabled or disabled according to the demand.
> 
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=201693
> Signed-off-by: Kate Hsuan <hpa@redhat.com>

Thanks, v5 looks good to me, but in the mean time I've also gotten the
first replies to my request to collect PCI-ids on systems having this
issue and it looks like checking for PCI_VENDOR_ID_AMD might not be
the right thing to do.

Tor Vic who has a very recent motherboard which does not show this
issue reports the following pci-ids:

06:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
SATA Controller [AHCI mode] [1022:7901] (rev 51)
07:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH
SATA Controller [AHCI mode] [1022:7901] (rev 51)

Where as we now have a bunch of lspci -nn outputs from different motherboard
with the issue from: https://bugzilla.kernel.org/show_bug.cgi?id=201693 :

00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] [1002:4391] (rev 40)

00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] [1002:4391] (rev 40)

00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] [1002:4391] (rev 40)

00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD/ATI] 
SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] [1002:4391]

00:11.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD/ATI]
SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] [1002:4391]

Notice these are basically all the same; and it seems these are
all from the AMD/ATI period where the motherboars chipset where still
using PCI_VENDOR_ID_ATI as vendor-id.

So it looks like we actually need to disable NCQ for Samsung 860/870
devices when the SATA controller has a vendor-id of PCI_VENDOR_ID_ATI
rather then AMD.

So lets wait a bit for some more info to become available and then
I think we need a v6 with s/AMD/ATI/ (everywhere, also in the name
of the horkage-flag, the cmdline options, etc.).

Regards,

Hans

> ---
> Changes in v5:
> * The libata.force parameters ncqamd and noncqamd are used to enable and
>   disable the NCQ for the systems equiped with AMD SATA adapter and
>   Samsung 860 and 870 SSDs.
> * The character encoding of the patch comment was fixed.
> 
> Changes in v4:
> * A function ata_dev_check_adapter() is added to check the vendor ID of
>   the adapter.
> * ATA_HORKAGE_NONCQ_ON_AMD was modified to ATA_HORKAGE_NO_NCQ_ON_AMD to
>   align with the naming convention.
> * Commit messages were improved according to reviewer comments.
> 
> Changes in v3:
> * ATA_HORKAGE_NONCQ_ON_ASMEDIA_AMD_MARVELL was modified to
>   ATA_HORKAGE_NONCQ_ON_AMD.
> * Codes were fixed to completely disable NCQ on AMD controller.
> 
> ---
>  drivers/ata/libata-core.c | 34 ++++++++++++++++++++++++++++++++--
>  include/linux/libata.h    |  1 +
>  2 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 3eda3291952b..e2e900085f99 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -2199,6 +2199,25 @@ static void ata_dev_config_ncq_prio(struct ata_device *dev)
> 
>  }
> 
> +static bool ata_dev_check_adapter(struct ata_device *dev,
> +				  unsigned short vendor_id)
> +{
> +	struct pci_dev *pcidev = NULL;
> +	struct device *parent_dev = NULL;
> +
> +	for (parent_dev = dev->tdev.parent; parent_dev != NULL;
> +	     parent_dev = parent_dev->parent) {
> +		if (dev_is_pci(parent_dev)) {
> +			pcidev = to_pci_dev(parent_dev);
> +			if (pcidev->vendor == vendor_id)
> +				return true;
> +			break;
> +		}
> +	}
> +
> +	return false;
> +}
> +
>  static int ata_dev_config_ncq(struct ata_device *dev,
>  			       char *desc, size_t desc_sz)
>  {
> @@ -2217,6 +2236,13 @@ static int ata_dev_config_ncq(struct ata_device *dev,
>  		snprintf(desc, desc_sz, "NCQ (not used)");
>  		return 0;
>  	}
> +
> +	if (dev->horkage & ATA_HORKAGE_NO_NCQ_ON_AMD &&
> +	    ata_dev_check_adapter(dev, PCI_VENDOR_ID_AMD)) {
> +		snprintf(desc, desc_sz, "NCQ (not used)");
> +		return 0;
> +	}
> +
>  	if (ap->flags & ATA_FLAG_NCQ) {
>  		hdepth = min(ap->scsi_host->can_queue, ATA_MAX_QUEUE);
>  		dev->flags |= ATA_DFLAG_NCQ;
> @@ -3951,9 +3977,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
>  	{ "Samsung SSD 850*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
>  						ATA_HORKAGE_ZERO_AFTER_TRIM, },
>  	{ "Samsung SSD 860*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
> -						ATA_HORKAGE_ZERO_AFTER_TRIM, },
> +						ATA_HORKAGE_ZERO_AFTER_TRIM |
> +						ATA_HORKAGE_NO_NCQ_ON_AMD, },
>  	{ "Samsung SSD 870*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
> -						ATA_HORKAGE_ZERO_AFTER_TRIM, },
> +						ATA_HORKAGE_ZERO_AFTER_TRIM |
> +						ATA_HORKAGE_NO_NCQ_ON_AMD, },
>  	{ "FCCT*M500*",			NULL,	ATA_HORKAGE_NO_NCQ_TRIM |
>  						ATA_HORKAGE_ZERO_AFTER_TRIM, },
> 
> @@ -6108,6 +6136,8 @@ static int __init ata_parse_force_one(char **cur,
>  		{ "ncq",	.horkage_off	= ATA_HORKAGE_NONCQ },
>  		{ "noncqtrim",	.horkage_on	= ATA_HORKAGE_NO_NCQ_TRIM },
>  		{ "ncqtrim",	.horkage_off	= ATA_HORKAGE_NO_NCQ_TRIM },
> +		{ "noncqamd",	.horkage_on	= ATA_HORKAGE_NO_NCQ_ON_AMD },
> +		{ "ncqamd",	.horkage_off	= ATA_HORKAGE_NO_NCQ_ON_AMD },
>  		{ "dump_id",	.horkage_on	= ATA_HORKAGE_DUMP_ID },
>  		{ "pio0",	.xfer_mask	= 1 << (ATA_SHIFT_PIO + 0) },
>  		{ "pio1",	.xfer_mask	= 1 << (ATA_SHIFT_PIO + 1) },
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 3fcd24236793..ef1417152ecd 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -422,6 +422,7 @@ enum {
>  	ATA_HORKAGE_NOTRIM	= (1 << 24),	/* don't use TRIM */
>  	ATA_HORKAGE_MAX_SEC_1024 = (1 << 25),	/* Limit max sects to 1024 */
>  	ATA_HORKAGE_MAX_TRIM_128M = (1 << 26),	/* Limit max trim size to 128M */
> +	ATA_HORKAGE_NO_NCQ_ON_AMD = (1 << 27),	/* Disable NCQ on AMD chipset */
> 
>  	 /* DMA mask for user DMA control: User visible values; DO NOT
>  	    renumber */
> --
> 2.31.1
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.
  2021-09-02 16:07 ` Hans de Goede
@ 2021-09-02 19:01   ` Martin K. Petersen
  2021-09-02 20:46     ` Hans de Goede
  0 siblings, 1 reply; 10+ messages in thread
From: Martin K. Petersen @ 2021-09-02 19:01 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Kate Hsuan, Jens Axboe, Damien Le Moal, Tor Vic, linux-ide, linux-kernel


Hans,

> So it looks like we actually need to disable NCQ for Samsung 860/870
> devices when the SATA controller has a vendor-id of PCI_VENDOR_ID_ATI
> rather then AMD.

That's another great data point!

I wonder if there actually is a Samsung problem (given that these drives
work fine on other controllers). Or if it is just the queued trim
handling that's broken on 1002:4391 controllers from ATI.

When I originally experimented with queued trim I had systems I could
not get to work. But queued trim worked fine when the same drives were
connected to more modern chipsets (note that this was "did not work at
all" as opposed to "randomly corrupting data").

Do we have any evidence at all of queued trim working with non-Samsung
drives on these controllers? Not sure how many modern SATA drives
actually implement this feature. Maybe the reason we see Samsung drives
in the bug reports is due to a combination of popularity and the fact
that these drives actually implement queued trim support.

> So lets wait a bit for some more info to become available and then
> I think we need a v6 with s/AMD/ATI/ (everywhere, also in the name
> of the horkage-flag, the cmdline options, etc.).

Sounds good!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.
  2021-09-02 19:01   ` Martin K. Petersen
@ 2021-09-02 20:46     ` Hans de Goede
  2021-09-02 20:50       ` Hans de Goede
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2021-09-02 20:46 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Kate Hsuan, Jens Axboe, Damien Le Moal, Tor Vic, linux-ide, linux-kernel

Hi,

On 9/2/21 9:01 PM, Martin K. Petersen wrote:
> 
> Hans,
> 
>> So it looks like we actually need to disable NCQ for Samsung 860/870
>> devices when the SATA controller has a vendor-id of PCI_VENDOR_ID_ATI
>> rather then AMD.
> 
> That's another great data point!
> 
> I wonder if there actually is a Samsung problem (given that these drives
> work fine on other controllers). Or if it is just the queued trim
> handling that's broken on 1002:4391 controllers from ATI.
> 
> When I originally experimented with queued trim I had systems I could
> not get to work. But queued trim worked fine when the same drives were
> connected to more modern chipsets (note that this was "did not work at
> all" as opposed to "randomly corrupting data").
>
> Do we have any evidence at all of queued trim working with non-Samsung
> drives on these controllers? Not sure how many modern SATA drives
> actually implement this feature. Maybe the reason we see Samsung drives
> in the bug reports is due to a combination of popularity and the fact
> that these drives actually implement queued trim support.

The Samsung 860 / 870 series causing issues when queued trim support
is enabled are quite wide-spread, covering many different controller
models from all well known controller vendors (Intel, Asmedia, Marvell
and AMD). So disabling queued-trim support definitely is the right
thing to do (and we should have done so a long time ago, I am to
blame for this not being done sooner).

As for your theory that it is really a problem with the controller
and not the the SSDs, I honestly do not know, but I doubt it,
there are no such reports with any other vendor's SSD or newer
Samsung models, so this seems unlikely.

Regards,

Hans


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.
  2021-09-02 20:46     ` Hans de Goede
@ 2021-09-02 20:50       ` Hans de Goede
  2021-09-03  3:21         ` Martin K. Petersen
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2021-09-02 20:50 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Kate Hsuan, Jens Axboe, Damien Le Moal, Tor Vic, linux-ide, linux-kernel

Hi Again,

On 9/2/21 10:46 PM, Hans de Goede wrote:
> Hi,
> 
> On 9/2/21 9:01 PM, Martin K. Petersen wrote:
>>
>> Hans,
>>
>>> So it looks like we actually need to disable NCQ for Samsung 860/870
>>> devices when the SATA controller has a vendor-id of PCI_VENDOR_ID_ATI
>>> rather then AMD.
>>
>> That's another great data point!
>>
>> I wonder if there actually is a Samsung problem (given that these drives
>> work fine on other controllers). Or if it is just the queued trim
>> handling that's broken on 1002:4391 controllers from ATI.
>>
>> When I originally experimented with queued trim I had systems I could
>> not get to work. But queued trim worked fine when the same drives were
>> connected to more modern chipsets (note that this was "did not work at
>> all" as opposed to "randomly corrupting data").
>>
>> Do we have any evidence at all of queued trim working with non-Samsung
>> drives on these controllers? Not sure how many modern SATA drives
>> actually implement this feature. Maybe the reason we see Samsung drives
>> in the bug reports is due to a combination of popularity and the fact
>> that these drives actually implement queued trim support.
> 
> The Samsung 860 / 870 series causing issues when queued trim support
> is enabled are quite wide-spread, covering many different controller
> models from all well known controller vendors (Intel, Asmedia, Marvell
> and AMD). So disabling queued-trim support definitely is the right
> thing to do (and we should have done so a long time ago, I am to
> blame for this not being done sooner).
> 
> As for your theory that it is really a problem with the controller
> and not the the SSDs, I honestly do not know, but I doubt it,
> there are no such reports with any other vendor's SSD or newer
> Samsung models, so this seems unlikely.

I just realized that all newer Samsung models are non SATA...

Still I cponsider it likely that some of the other vendors also
implement queued trim support and there are no reports of issues
with the other vendors' SSDs.

Regards,

Hans


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.
  2021-09-02 20:50       ` Hans de Goede
@ 2021-09-03  3:21         ` Martin K. Petersen
  2021-09-19 14:24           ` Tor Vic
  0 siblings, 1 reply; 10+ messages in thread
From: Martin K. Petersen @ 2021-09-03  3:21 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Martin K. Petersen, Kate Hsuan, Jens Axboe, Damien Le Moal,
	Tor Vic, linux-ide, linux-kernel


Hans,

> I just realized that all newer Samsung models are non SATA...
>
> Still I cponsider it likely that some of the other vendors also
> implement queued trim support and there are no reports of issues
> with the other vendors' SSDs.

When I originally worked on this the only other drive that supported
queued trim was a specific controller generation from Crucial/Micron.

Since performance-sensitive workloads quickly moved to NVMe, I don't
know if implementing queued trim has been very high on the SSD
manufacturers' todo lists. FWIW, I just checked and none of the more
recent SATA SSD drives I happen to have support queued trim.

Purely anecdotal: I have a Samsung 863 which I believe is
architecturally very similar to the 860. That drive clocked over 40K
hours as my main git repo/build drive until it was retired last
fall. And it ran a queued fstrim every night.

Anyway. I am not against disabling queued trim for these drives. As far
as I'm concerned it was a feature that didn't quite get enough industry
momentum. It just irks me that we don't have a good understanding of why
it works for some and not for others...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.
  2021-09-03  3:21         ` Martin K. Petersen
@ 2021-09-19 14:24           ` Tor Vic
  2021-09-19 15:27             ` Hans de Goede
  0 siblings, 1 reply; 10+ messages in thread
From: Tor Vic @ 2021-09-19 14:24 UTC (permalink / raw)
  To: Martin K. Petersen, Hans de Goede
  Cc: Kate Hsuan, Jens Axboe, Damien Le Moal, linux-ide, linux-kernel

Hi,

I saw that v2 (?) of this patch has made it into stable, which
is quite reasonable given the number of bug reports.
Are there any plans to "enhance" this patch once sufficient data
on controller support/drive combinations has been collected?

I didn't run any benchmarks to see whether performance has changed,
but I now have this on 5.14.6:

   /sys/class/ata_device/dev3.0/trim:forced_unqueued
   /sys/class/ata_device/dev4.0/trim:forced_unqueued

Before:

   /sys/class/ata_device/dev3.0/trim:queued
   /sys/class/ata_device/dev4.0/trim:queued

These correspond to 860 Pro and 860 Evo, connected to a X570
mainboard (AMD FCH controller).
Note that neither before nor after this commit I had any problems
with these drives.


On 03.09.21 03:21, Martin K. Petersen wrote:
> 
> Hans,
> 
>> I just realized that all newer Samsung models are non SATA...
>>
>> Still I cponsider it likely that some of the other vendors also
>> implement queued trim support and there are no reports of issues
>> with the other vendors' SSDs.
> 
> When I originally worked on this the only other drive that supported
> queued trim was a specific controller generation from Crucial/Micron.
> 
> Since performance-sensitive workloads quickly moved to NVMe, I don't
> know if implementing queued trim has been very high on the SSD
> manufacturers' todo lists. FWIW, I just checked and none of the more
> recent SATA SSD drives I happen to have support queued trim.
> 
> Purely anecdotal: I have a Samsung 863 which I believe is
> architecturally very similar to the 860. That drive clocked over 40K
> hours as my main git repo/build drive until it was retired last
> fall. And it ran a queued fstrim every night.
> 
> Anyway. I am not against disabling queued trim for these drives. As far
> as I'm concerned it was a feature that didn't quite get enough industry
> momentum. It just irks me that we don't have a good understanding of why
> it works for some and not for others...
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.
  2021-09-19 14:24           ` Tor Vic
@ 2021-09-19 15:27             ` Hans de Goede
  2021-09-19 16:27               ` Tor Vic
  0 siblings, 1 reply; 10+ messages in thread
From: Hans de Goede @ 2021-09-19 15:27 UTC (permalink / raw)
  To: Tor Vic, Martin K. Petersen
  Cc: Kate Hsuan, Jens Axboe, Damien Le Moal, linux-ide, linux-kernel

Hi Tor,

On 9/19/21 4:24 PM, Tor Vic wrote:
> Hi,
> 
> I saw that v2 (?) of this patch has made it into stable, which
> is quite reasonable given the number of bug reports.
> Are there any plans to "enhance" this patch once sufficient data
> on controller support/drive combinations has been collected?

ATM there are no plans to limit these quirks, we have bug
reports of queued trims being an issue over all usual chip-vendors
of sata controllers (including more recent AMD models).

Note that unless you have immediate "discard" enabled as an option
on all layers of your storage stack (dmcrypt, device-mapper/raid,
filesystem) then this change will not impact you at all.

Also note that AFAIK all major distros do not enable immediate
discard, instead relying on fstrim runs from a cronjob, which
again means this change will not impact users of those distros.

So chances are that your workload simply never triggered the issue;
and this is the cause of everything always having worked fine for
you.

Regards,

Hans


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.
  2021-09-19 15:27             ` Hans de Goede
@ 2021-09-19 16:27               ` Tor Vic
  2021-09-19 16:51                 ` Hans de Goede
  0 siblings, 1 reply; 10+ messages in thread
From: Tor Vic @ 2021-09-19 16:27 UTC (permalink / raw)
  To: Hans de Goede, Martin K. Petersen
  Cc: Kate Hsuan, Jens Axboe, Damien Le Moal, linux-ide, linux-kernel



On 19.09.21 15:27, Hans de Goede wrote:
> Hi Tor,
> 
> On 9/19/21 4:24 PM, Tor Vic wrote:
>> Hi,
>>
>> I saw that v2 (?) of this patch has made it into stable, which
>> is quite reasonable given the number of bug reports.
>> Are there any plans to "enhance" this patch once sufficient data
>> on controller support/drive combinations has been collected?
> 
> ATM there are no plans to limit these quirks, we have bug
> reports of queued trims being an issue over all usual chip-vendors
> of sata controllers (including more recent AMD models).
> 
> Note that unless you have immediate "discard" enabled as an option
> on all layers of your storage stack (dmcrypt, device-mapper/raid,
> filesystem) then this change will not impact you at all.

Is that the "discard" mount option?
I added this to one of the partitions residing on my 860 Evo,
reverted the patch, and it still seems to work just fine.

   $ mount | grep sdb 
 
 

   /dev/sdb1 on /mnt/vbox type ext4 (rw,nosuid,nodev,noatime,discard)

Is there another place where discard has to be enabled?
Or is there a way to check that discard is effectively enabled?

Not sure if relevant, but here are a couple of lines from the syslog:

   ata4.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 32), AA
   [...]
   ata4.00: Enabling discard_zeroes_data

Thanks!

> 
> Also note that AFAIK all major distros do not enable immediate
> discard, instead relying on fstrim runs from a cronjob, which
> again means this change will not impact users of those distros.
> 
> So chances are that your workload simply never triggered the issue;
> and this is the cause of everything always having worked fine for
> you.
> 
> Regards,
> 
> Hans
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.
  2021-09-19 16:27               ` Tor Vic
@ 2021-09-19 16:51                 ` Hans de Goede
  0 siblings, 0 replies; 10+ messages in thread
From: Hans de Goede @ 2021-09-19 16:51 UTC (permalink / raw)
  To: Tor Vic, Martin K. Petersen
  Cc: Kate Hsuan, Jens Axboe, Damien Le Moal, linux-ide, linux-kernel

Hi,

On 9/19/21 6:27 PM, Tor Vic wrote:
> 
> 
> On 19.09.21 15:27, Hans de Goede wrote:
>> Hi Tor,
>>
>> On 9/19/21 4:24 PM, Tor Vic wrote:
>>> Hi,
>>>
>>> I saw that v2 (?) of this patch has made it into stable, which
>>> is quite reasonable given the number of bug reports.
>>> Are there any plans to "enhance" this patch once sufficient data
>>> on controller support/drive combinations has been collected?
>>
>> ATM there are no plans to limit these quirks, we have bug
>> reports of queued trims being an issue over all usual chip-vendors
>> of sata controllers (including more recent AMD models).
>>
>> Note that unless you have immediate "discard" enabled as an option
>> on all layers of your storage stack (dmcrypt, device-mapper/raid,
>> filesystem) then this change will not impact you at all.
> 
> Is that the "discard" mount option?
> I added this to one of the partitions residing on my 860 Evo,
> reverted the patch, and it still seems to work just fine.
> 
>   $ mount | grep sdb
> 
> 
>   /dev/sdb1 on /mnt/vbox type ext4 (rw,nosuid,nodev,noatime,discard)
> 
> Is there another place where discard has to be enabled?

No since you do not seem to be using dmcrypt/raid/lvm that should
do the trick.

Except that it sounds like this is a partition carrying vm
images. Those never delete storage, they only grow, to
effectively trim you need to either punch holes in files,
or remove files. Discard only comes in to play when used
diskspace becomes unused.

To test preferably remove several large files at once while
also generating a whole bunch of other diskio (e.g.
compile the kernel while also deleting several large files
from the same disk, with discard enable).

But even if that works for you, that is 1 report that this
works in some cases, vs many that it does not work; and also
note that you had to manually enable this, it was not
enabled before. So this really is going to impact the
performance of very few users, while looking at the amount
of bugreports about hangs / disk-corruption the problem
of having queued-trim support enabled is much much bigger,
so I see very little reason to re-enable this even if it
happens to work in your case.

> Or is there a way to check that discard is effectively enabled?

There probably is, but I don't know where / how to check this.

Regards,

Hans




^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-09-19 16:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-01 15:16 [PATCH v5] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD Kate Hsuan
2021-09-02 16:07 ` Hans de Goede
2021-09-02 19:01   ` Martin K. Petersen
2021-09-02 20:46     ` Hans de Goede
2021-09-02 20:50       ` Hans de Goede
2021-09-03  3:21         ` Martin K. Petersen
2021-09-19 14:24           ` Tor Vic
2021-09-19 15:27             ` Hans de Goede
2021-09-19 16:27               ` Tor Vic
2021-09-19 16:51                 ` Hans de Goede

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).