All of lore.kernel.org
 help / color / mirror / Atom feed
* [REGRESSION v4.10-rc1] blkdev_issue_zeroout() returns -EREMOTEIO on the first call for SCSI device that doesn't support WRITE SAME
@ 2017-02-03  7:55 Junichi Nomura
  2017-02-03 15:21 ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Junichi Nomura @ 2017-02-03  7:55 UTC (permalink / raw)
  To: linux-block, linux-scsi

I found following ext4 error occurs on a certain storage since v4.10-rc1:
  EXT4-fs (sdc1): Delayed block allocation failed for inode 12 at logical offset 100 with max blocks 2 with error 121
  EXT4-fs (sdc1): This should not happen!! Data will be lost

Error 121 (EREMOTEIO) was returned from blkdev_issue_zeroout().
That came from sd driver because WRITE SAME was sent to the device
which didn't support it.

The problem was introduced by commit e73c23ff736e ("block: add async
variant of blkdev_issue_zeroout"). Before the commit, blkdev_issue_zeroout
fell back to normal zero writing when WRITE SAME failed and it seems
sd driver's heuristics depends on that behaviour.

Below is a band-aid fix to restore the fallback behaviour for sd. Although
there should be better fix as retrying blindly is not a good idea...

v4.10-rc6:
  # cat /sys/block/sdc/queue/write_same_max_bytes
  33553920
  # fallocate -v -z -l 512 /dev/sdc1
  fallocate: fallocate failed: Remote I/O error
  # cat /sys/block/sdc/queue/write_same_max_bytes
  0
  # fallocate -v -z -l 512 /dev/sdc1
  # echo $?
  0

v4.9 or v4.10-rc6 + this patch:
  # grep . /sys/block/sdc/queue/write_same_max_bytes
  33553920
  # fallocate -v -z -l 512 /dev/sdc1
  # echo $?
  0
  # grep . /sys/block/sdc/queue/write_same_max_bytes
  0

diff --git a/block/blk-lib.c b/block/blk-lib.c
index f8c82a9..8e53474 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -360,6 +360,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			 sector_t nr_sects, gfp_t gfp_mask, bool discard)
 {
 	int ret;
+	int pass = 0;
 	struct bio *bio = NULL;
 	struct blk_plug plug;
 
@@ -369,6 +370,7 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			return 0;
 	}
 
+  retry_other_method:
 	blk_start_plug(&plug);
 	ret = __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask,
 			&bio, discard);
@@ -378,6 +380,11 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	}
 	blk_finish_plug(&plug);
 
+	if (ret && pass++ == 0) {
+		bio = NULL;
+		goto retry_other_method;
+	}
+
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
-- 
Jun'ichi Nomura, NEC Corporation / NEC Solution Innovators, Ltd.

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [REGRESSION v4.10-rc1] blkdev_issue_zeroout() returns -EREMOTEIO on the first call for SCSI device that doesn't support WRITE SAME
  2017-02-03  7:55 [REGRESSION v4.10-rc1] blkdev_issue_zeroout() returns -EREMOTEIO on the first call for SCSI device that doesn't support WRITE SAME Junichi Nomura
@ 2017-02-03 15:21 ` Jens Axboe
  2017-02-03 16:12   ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2017-02-03 15:21 UTC (permalink / raw)
  To: Junichi Nomura, linux-block, linux-scsi
  Cc: Christoph Hellwig, Chaitanya Kulkarni

On 02/03/2017 12:55 AM, Junichi Nomura wrote:
> I found following ext4 error occurs on a certain storage since v4.10-rc1:
>   EXT4-fs (sdc1): Delayed block allocation failed for inode 12 at logical offset 100 with max blocks 2 with error 121
>   EXT4-fs (sdc1): This should not happen!! Data will be lost
> 
> Error 121 (EREMOTEIO) was returned from blkdev_issue_zeroout().
> That came from sd driver because WRITE SAME was sent to the device
> which didn't support it.
> 
> The problem was introduced by commit e73c23ff736e ("block: add async
> variant of blkdev_issue_zeroout"). Before the commit, blkdev_issue_zeroout
> fell back to normal zero writing when WRITE SAME failed and it seems
> sd driver's heuristics depends on that behaviour.

CC Christoph and Chaitanya.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [REGRESSION v4.10-rc1] blkdev_issue_zeroout() returns -EREMOTEIO on the first call for SCSI device that doesn't support WRITE SAME
  2017-02-03 15:21 ` Jens Axboe
@ 2017-02-03 16:12   ` Christoph Hellwig
  2017-02-03 16:14     ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2017-02-03 16:12 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Junichi Nomura, linux-block, linux-scsi, Christoph Hellwig,
	Chaitanya Kulkarni, Martin K. Petersen

On Fri, Feb 03, 2017 at 08:21:31AM -0700, Jens Axboe wrote:
> > Error 121 (EREMOTEIO) was returned from blkdev_issue_zeroout().
> > That came from sd driver because WRITE SAME was sent to the device
> > which didn't support it.
> > 
> > The problem was introduced by commit e73c23ff736e ("block: add async
> > variant of blkdev_issue_zeroout"). Before the commit, blkdev_issue_zeroout
> > fell back to normal zero writing when WRITE SAME failed and it seems
> > sd driver's heuristics depends on that behaviour.
> 
> CC Christoph and Chaitanya.

And adding Martin as the sd.c Write Same code is his.

I suspect we'll have to restore the old way this works for 4.10 as it's
too late in the cycle, but that whole idea of trying Write Same first
and just disabling it if it doesn't work is a receipe for desaster -
it kinda works for a synchronous blkdev_issue_zeroout, but if we want
to be able to submit it asynchronously it's getting too hairy to handle.

I think we should fix sd.c to only send WRITE SAME if either of the
variants are explicitly listed as supported through
REPORT SUPPORTED OPERATION CODES, or maybe through a whitelist if
there are important enough devices.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [REGRESSION v4.10-rc1] blkdev_issue_zeroout() returns -EREMOTEIO on the first call for SCSI device that doesn't support WRITE SAME
  2017-02-03 16:12   ` Christoph Hellwig
@ 2017-02-03 16:14     ` Jens Axboe
  2017-02-03 22:45         ` Martin K. Petersen
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2017-02-03 16:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Junichi Nomura, linux-block, linux-scsi, Chaitanya Kulkarni,
	Martin K. Petersen

On 02/03/2017 09:12 AM, Christoph Hellwig wrote:
> On Fri, Feb 03, 2017 at 08:21:31AM -0700, Jens Axboe wrote:
>>> Error 121 (EREMOTEIO) was returned from blkdev_issue_zeroout().
>>> That came from sd driver because WRITE SAME was sent to the device
>>> which didn't support it.
>>>
>>> The problem was introduced by commit e73c23ff736e ("block: add async
>>> variant of blkdev_issue_zeroout"). Before the commit, blkdev_issue_zeroout
>>> fell back to normal zero writing when WRITE SAME failed and it seems
>>> sd driver's heuristics depends on that behaviour.
>>
>> CC Christoph and Chaitanya.
> 
> And adding Martin as the sd.c Write Same code is his.
> 
> I suspect we'll have to restore the old way this works for 4.10 as it's
> too late in the cycle, but that whole idea of trying Write Same first
> and just disabling it if it doesn't work is a receipe for desaster -
> it kinda works for a synchronous blkdev_issue_zeroout, but if we want
> to be able to submit it asynchronously it's getting too hairy to handle.

I agree, the current approach is a hot and ugly mess.

> I think we should fix sd.c to only send WRITE SAME if either of the
> variants are explicitly listed as supported through
> REPORT SUPPORTED OPERATION CODES, or maybe through a whitelist if
> there are important enough devices.

Yep

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [REGRESSION v4.10-rc1] blkdev_issue_zeroout() returns -EREMOTEIO on the first call for SCSI device that doesn't support WRITE SAME
  2017-02-03 16:14     ` Jens Axboe
@ 2017-02-03 22:45         ` Martin K. Petersen
  0 siblings, 0 replies; 8+ messages in thread
From: Martin K. Petersen @ 2017-02-03 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Junichi Nomura, linux-block, linux-scsi,
	Chaitanya Kulkarni, Martin K. Petersen

>>>>> "Jens" == Jens Axboe <axboe@kernel.dk> writes:

>> I think we should fix sd.c to only send WRITE SAME if either of the
>> variants are explicitly listed as supported through REPORT SUPPORTED
>> OPERATION CODES, or maybe through a whitelist if there are important
>> enough devices.

Jens> Yep

I hate it too. But the reason it's assumed on is that there is
essentially no heuristic that works. Just like we assume that READ
always works.

Out of the ~200 devices I have access to in the lab:

 - 100% of the SAS/FC disk drives and SSDs support WRITE SAME
 - Only 2 out of about 50 different drive models support RSOC
 - About half of the arrays support WRITE SAME(10/16)
 - None of the arrays I have support RSOC

So even if we were to entertain using RSOC for "enterprise" transport
classes (which I concur would be nice for other reasons), it wouldn't
solve the WRITE SAME problem...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [REGRESSION v4.10-rc1] blkdev_issue_zeroout() returns -EREMOTEIO on the first call for SCSI device that doesn't support WRITE SAME
@ 2017-02-03 22:45         ` Martin K. Petersen
  0 siblings, 0 replies; 8+ messages in thread
From: Martin K. Petersen @ 2017-02-03 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Junichi Nomura, linux-block, linux-scsi,
	Chaitanya Kulkarni, Martin K. Petersen

>>>>> "Jens" == Jens Axboe <axboe@kernel.dk> writes:

>> I think we should fix sd.c to only send WRITE SAME if either of the
>> variants are explicitly listed as supported through REPORT SUPPORTED
>> OPERATION CODES, or maybe through a whitelist if there are important
>> enough devices.

Jens> Yep

I hate it too. But the reason it's assumed on is that there is
essentially no heuristic that works. Just like we assume that READ
always works.

Out of the ~200 devices I have access to in the lab:

 - 100% of the SAS/FC disk drives and SSDs support WRITE SAME
 - Only 2 out of about 50 different drive models support RSOC
 - About half of the arrays support WRITE SAME(10/16)
 - None of the arrays I have support RSOC

So even if we were to entertain using RSOC for "enterprise" transport
classes (which I concur would be nice for other reasons), it wouldn't
solve the WRITE SAME problem...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [REGRESSION v4.10-rc1] blkdev_issue_zeroout() returns -EREMOTEIO on the first call for SCSI device that doesn't support WRITE SAME
  2017-02-03 22:45         ` Martin K. Petersen
  (?)
@ 2017-02-04  3:17         ` Jens Axboe
  2017-02-04  8:48           ` Christoph Hellwig
  -1 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2017-02-04  3:17 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Junichi Nomura, linux-block, linux-scsi,
	Chaitanya Kulkarni

On 02/03/2017 03:45 PM, Martin K. Petersen wrote:
>>>>>> "Jens" == Jens Axboe <axboe@kernel.dk> writes:
> 
>>> I think we should fix sd.c to only send WRITE SAME if either of the
>>> variants are explicitly listed as supported through REPORT SUPPORTED
>>> OPERATION CODES, or maybe through a whitelist if there are important
>>> enough devices.
> 
> Jens> Yep
> 
> I hate it too. But the reason it's assumed on is that there is
> essentially no heuristic that works. Just like we assume that READ
> always works.
> 
> Out of the ~200 devices I have access to in the lab:
> 
>  - 100% of the SAS/FC disk drives and SSDs support WRITE SAME
>  - Only 2 out of about 50 different drive models support RSOC
>  - About half of the arrays support WRITE SAME(10/16)
>  - None of the arrays I have support RSOC
> 
> So even if we were to entertain using RSOC for "enterprise" transport
> classes (which I concur would be nice for other reasons), it wouldn't
> solve the WRITE SAME problem...

We're at (almost) -rc7 time, we have to do more than hand wave about
this. What's the plan for 4.10 final?


-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [REGRESSION v4.10-rc1] blkdev_issue_zeroout() returns -EREMOTEIO on the first call for SCSI device that doesn't support WRITE SAME
  2017-02-04  3:17         ` Jens Axboe
@ 2017-02-04  8:48           ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2017-02-04  8:48 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Martin K. Petersen, Christoph Hellwig, Junichi Nomura,
	linux-block, linux-scsi, Chaitanya Kulkarni

On Fri, Feb 03, 2017 at 08:17:10PM -0700, Jens Axboe wrote:
> We're at (almost) -rc7 time, we have to do more than hand wave about
> this. What's the plan for 4.10 final?

I'll send your a fix to revert the async write same changes for now,
as suggested in the last mail from me.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-02-04  8:48 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-03  7:55 [REGRESSION v4.10-rc1] blkdev_issue_zeroout() returns -EREMOTEIO on the first call for SCSI device that doesn't support WRITE SAME Junichi Nomura
2017-02-03 15:21 ` Jens Axboe
2017-02-03 16:12   ` Christoph Hellwig
2017-02-03 16:14     ` Jens Axboe
2017-02-03 22:45       ` Martin K. Petersen
2017-02-03 22:45         ` Martin K. Petersen
2017-02-04  3:17         ` Jens Axboe
2017-02-04  8:48           ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.