linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sd: make ->no_write_same independent of reported ->max_ws_blocks
@ 2016-12-05 23:56 Nicolai Stange
  2016-12-06  3:29 ` Martin K. Petersen
  0 siblings, 1 reply; 4+ messages in thread
From: Nicolai Stange @ 2016-12-05 23:56 UTC (permalink / raw)
  To: Jens Axboe
  Cc: James E.J. Bottomley, Martin K. Petersen, Shaun Tancheff,
	Chaitanya Kulkarni, Christoph Hellwig, linux-scsi, linux-kernel,
	Nicolai Stange

Due to reported problems with Write Same on ATA devices,
commit 0ce1b18c42a5 ("libata: Some drives failing on SCT Write Same")
strived to report non-support for Write Same on non-zoned ATA devices.

However, due to the following control flow in sd_config_write_same() this
doesn't always take effect, namely if the ->max_ws_blocks as set in the
by the ATA Identify Device exceeds SD_WS10_BLOCKS:

  if (sdkp->max_ws_blocks > SD_MAX_WS10_BLOCKS)
    [...]
  else if (sdkp->ws16 || sdkp->ws10 || sdkp->device->no_report_opcodes)
    [...]
  else {
    sdkp->device->no_write_same = 1;
    sdkp->max_ws_blocks = 0;
  }

Since commit e73c23ff736e ("block: add async variant of
blkdev_issue_zeroout"), blkdev_issue_zeroout() got a little bit more
sensitive towards failing Write Sames on devices that claim to support them
and this results in messages like

  EXT4-fs (dm-1): Delayed block allocation failed for inode 2625094 at
                  logical offset 2032 with max blocks 2 with error 121
  EXT4-fs (dm-1): This should not happen!! Data will be lost

The block limits VPD page of the device in question quotes a value of
0x3fffc0 > 0xffff == SD_MAX_WS10_BLOCKS for the device in question.

The error code 121 is EREMOTEIO which gets asserted by scsi_io_completion()
in case of invalid requests due to invalid command opcodes.

Fix this by doing the sdkp->max_ws_blocks > SD_MAX_WS10_BLOCKS handling
only if some kind of Write Same support is reported, i.e. only if
  sdkp->ws16 || sdkp->ws10 || sdkp->device->no_report_opcodes
holds. Let the handling code for the non-support case thus take effect,
if needed.

Fixes: e73c23ff736e ("block: add async variant of blkdev_issue_zeroout")
Fixes: 0ce1b18c42a5 ("libata: Some drives failing on SCT Write Same")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
---
Applicable to next-20161202.

 drivers/scsi/sd.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 2cfeb3c..ef1bab5 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -806,18 +806,21 @@ static void sd_config_write_same(struct scsi_disk *sdkp)
 		goto out;
 	}
 
-	/* Some devices can not handle block counts above 0xffff despite
-	 * supporting WRITE SAME(16). Consequently we default to 64k
-	 * blocks per I/O unless the device explicitly advertises a
-	 * bigger limit.
-	 */
-	if (sdkp->max_ws_blocks > SD_MAX_WS10_BLOCKS)
-		sdkp->max_ws_blocks = min_not_zero(sdkp->max_ws_blocks,
-						   (u32)SD_MAX_WS16_BLOCKS);
-	else if (sdkp->ws16 || sdkp->ws10 || sdkp->device->no_report_opcodes)
-		sdkp->max_ws_blocks = min_not_zero(sdkp->max_ws_blocks,
-						   (u32)SD_MAX_WS10_BLOCKS);
-	else {
+	if (sdkp->ws16 || sdkp->ws10 || sdkp->device->no_report_opcodes) {
+		/*
+		 * Some devices can not handle block counts above 0xffff despite
+		 * supporting WRITE SAME(16). Consequently we default to 64k
+		 * blocks per I/O unless the device explicitly advertises a
+		 * bigger limit.
+		 */
+		if (sdkp->max_ws_blocks > SD_MAX_WS10_BLOCKS) {
+			sdkp->max_ws_blocks = min_not_zero(sdkp->max_ws_blocks,
+						(u32)SD_MAX_WS16_BLOCKS);
+		} else {
+			sdkp->max_ws_blocks = min_not_zero(sdkp->max_ws_blocks,
+						(u32)SD_MAX_WS10_BLOCKS);
+		}
+	} else {
 		sdkp->device->no_write_same = 1;
 		sdkp->max_ws_blocks = 0;
 	}
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] sd: make ->no_write_same independent of reported ->max_ws_blocks
  2016-12-05 23:56 [PATCH] sd: make ->no_write_same independent of reported ->max_ws_blocks Nicolai Stange
@ 2016-12-06  3:29 ` Martin K. Petersen
  2016-12-06  8:08   ` Nicolai Stange
  0 siblings, 1 reply; 4+ messages in thread
From: Martin K. Petersen @ 2016-12-06  3:29 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
	Shaun Tancheff, Chaitanya Kulkarni, Christoph Hellwig,
	linux-scsi, linux-kernel

>>>>> "Nicolai" == Nicolai Stange <nicstange@gmail.com> writes:

Nicolai,

Nicolai> Due to reported problems with Write Same on ATA devices, commit
Nicolai> 0ce1b18c42a5 ("libata: Some drives failing on SCT Write Same")
Nicolai> strived to report non-support for Write Same on non-zoned ATA
Nicolai> devices.

Nicolai> However, due to the following control flow in
Nicolai> sd_config_write_same() this doesn't always take effect, namely
Nicolai> if the ->max_ws_blocks as set in the by the ATA Identify Device
Nicolai> exceeds SD_WS10_BLOCKS:

I'd much prefer for libata to set no_write_same = 1 for non-ZAC devices.

Older SCSI devices have no way to explicitly report that WRITE SAME is
supported. So the heuristic is the way it is to permit trying WRITE SAME
unless no_write_same has been set by the device driver. libata used to
do this prior to the zoned support going in.

Nicolai> Since commit e73c23ff736e ("block: add async variant of
Nicolai> blkdev_issue_zeroout"), blkdev_issue_zeroout() got a little bit
Nicolai> more sensitive towards failing Write Sames on devices that
Nicolai> claim to support them and this results in messages like

That's something that needs to be addressed. blkdev_issue_zeroout() must
cope with WRITE SAME failing and fall back to a manual zeroout.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sd: make ->no_write_same independent of reported ->max_ws_blocks
  2016-12-06  3:29 ` Martin K. Petersen
@ 2016-12-06  8:08   ` Nicolai Stange
  2016-12-08  0:18     ` Martin K. Petersen
  0 siblings, 1 reply; 4+ messages in thread
From: Nicolai Stange @ 2016-12-06  8:08 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Nicolai Stange, Jens Axboe, James E.J. Bottomley, Shaun Tancheff,
	Chaitanya Kulkarni, Christoph Hellwig, linux-scsi, linux-kernel

Hello Martin,

"Martin K. Petersen" <martin.petersen@oracle.com> writes:

>>>>>> "Nicolai" == Nicolai Stange <nicstange@gmail.com> writes:
> Nicolai> Due to reported problems with Write Same on ATA devices, commit
> Nicolai> 0ce1b18c42a5 ("libata: Some drives failing on SCT Write Same")
> Nicolai> strived to report non-support for Write Same on non-zoned ATA
> Nicolai> devices.
>
> Nicolai> However, due to the following control flow in
> Nicolai> sd_config_write_same() this doesn't always take effect, namely
> Nicolai> if the ->max_ws_blocks as set in the by the ATA Identify Device
> Nicolai> exceeds SD_WS10_BLOCKS:
>
> I'd much prefer for libata to set no_write_same = 1 for non-ZAC devices.

Or just try it once and let the sd layer, i.e. sd_done(), disable it
once a ILLEGAL COMMAND OPCODE is reported. This works right now and as
you said below, calling code must cope gracefully with a failing Write
Same anyway (which doesn't work right now).

>
> Older SCSI devices have no way to explicitly report that WRITE SAME is
> supported. So the heuristic is the way it is to permit trying WRITE SAME
> unless no_write_same has been set by the device driver.

Ok, I didn't see that there might be a heuristic going on.

I've got a couple of questions about this, but they're mainly out of
curiosity. So feel free to ignore them.

1.) Do these older SCSI devices have a way to report ->max_ws_blocks?
    Because otherwise the heuristic would not work?
    Or is it set speculatively somewhere?

2.) If so, what about such older devices having
    0 < ->max_ws_blocks < SD_MAX_WS10_BLOCKS ?
    Wouldn't these also be suitable candidates for trying that
    heuristic on?

3.) Those older devices that have ->max_ws_blocks > SD_MAX_WS10_BLOCKS
    but ->ws16 == ->ws10 == 0, i.e. the heuristicated ones would
    always be given WRITE_SAME, not WRITE_SAME_16 commands?
    C.f. sd_setup_write_same_cmnd(): if ->ws16 is not set, do
    WRITE_SAME. Isn't this a little bit odd given that the reported 
    ->max_ws_blocks would be greater than SD_MAX_WS10_BLOCKS?
    Ok, given that these devices are older anyway, WRITE_SAME seems
    like the obvious choice to be made over WRITE_SAME_16. Which
    brings me back to question 2.).

    The answer to this question would possibly affect ATA devices with
    this heuristic going on as well: according to ata_scsiop_maint_in(),
    they would only support WRITE_SAME_16, but not WRITE_SAME.

    Heck, this is perhaps the reason why I'm seeing those errors this
    commit 0ce1b18c42a5 ("libata: Some drives failing on SCT Write
    Same") effectively turns the heuristics for my ATA device on,
    i.e. unsets ->ws16, resulting in WRITE_SAME's which are unsupported
    by libata-scsi, c.f. ata_get_xlat_func()...

>
> Nicolai> Since commit e73c23ff736e ("block: add async variant of
> Nicolai> blkdev_issue_zeroout"), blkdev_issue_zeroout() got a little bit
> Nicolai> more sensitive towards failing Write Sames on devices that
> Nicolai> claim to support them and this results in messages like
>
> That's something that needs to be addressed. blkdev_issue_zeroout() must
> cope with WRITE SAME failing and fall back to a manual zeroout.

That's very useful information! So this commit really needs a fixup in
either way.


Thank you!

Nicolai

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sd: make ->no_write_same independent of reported ->max_ws_blocks
  2016-12-06  8:08   ` Nicolai Stange
@ 2016-12-08  0:18     ` Martin K. Petersen
  0 siblings, 0 replies; 4+ messages in thread
From: Martin K. Petersen @ 2016-12-08  0:18 UTC (permalink / raw)
  To: Nicolai Stange
  Cc: Martin K. Petersen, Jens Axboe, James E.J. Bottomley,
	Shaun Tancheff, Chaitanya Kulkarni, Christoph Hellwig,
	linux-scsi, linux-kernel

>>>>> "Nicolai" == Nicolai Stange <nicstange@gmail.com> writes:

Nicolai,

Nicolai> 1.) Do these older SCSI devices have a way to report
Nicolai>     ->max_ws_blocks?

I'm afraid not.

Nicolai> 3.) Those older devices that have ->max_ws_blocks >
Nicolai>     SD_MAX_WS10_BLOCKS but ->ws16 == ->ws10 == 0, i.e. the
Nicolai>     heuristicated ones would always be given WRITE_SAME, not
Nicolai>     WRITE_SAME_16 commands?  C.f. sd_setup_write_same_cmnd():
Nicolai>     if ->ws16 is not set, do WRITE_SAME. Isn't this a little
Nicolai>     bit odd given that the reported -> max_ws_blocks would be
Nicolai>     greater than SD_MAX_WS10_BLOCKS?

The check looks confusing because it caps the number of blocks to 0xFFFF
(the WRITE SAME(10) limit) for WRITE SAME(16) commands. Several older
devices accept the 16-byte command which allows for a bigger block range
but they only actually check the lower two bytes. This will cause data
corruption as parts of a block range may not be zeroed as requested.

FWIW, I'm in agreement with your patch to disable write same in libata
as a quick fix for 4.9.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-12-08  0:19 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-05 23:56 [PATCH] sd: make ->no_write_same independent of reported ->max_ws_blocks Nicolai Stange
2016-12-06  3:29 ` Martin K. Petersen
2016-12-06  8:08   ` Nicolai Stange
2016-12-08  0:18     ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).