linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] block: correctly fallback for zeroout
@ 2016-05-26 18:08 Shaohua Li
       [not found] ` <20160527054918.GA9521@sucs.org>
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Shaohua Li @ 2016-05-26 18:08 UTC (permalink / raw)
  To: linux-block, linux-kernel
  Cc: sitsofe, snitzer, axboe, martin.petersen, Kernel-team

blkdev_issue_zeroout try discard/writesame first, if they fail, zeroout
fallback to regular write. The problem is discard/writesame doesn't
return error for -EOPNOTSUPP, then zeroout can't do fallback and leave
disk data not changed. zeroout should have guaranteed zero-fill
behavior.

BTW, I saw several callers of blkdev_issue_discard can handle
-EOPNOTSUPP, not sure why blkdev_issue_discard not returns -EOPNOTSUPP.
The same story for blkdev_issue_write_same.

https://bugzilla.kernel.org/show_bug.cgi?id=118581

Cc: Sitsofe Wheeler <sitsofe@yahoo.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 block/blk-lib.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 23d7f30..232f9ea 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -95,8 +95,9 @@ EXPORT_SYMBOL(__blkdev_issue_discard);
  * Description:
  *    Issue a discard request for the sectors in question.
  */
-int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
-		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
+static int do_blkdev_issue_discard(struct block_device *bdev, sector_t sector,
+	sector_t nr_sects, gfp_t gfp_mask, unsigned long flags,
+	bool ignore_nosupport)
 {
 	int type = REQ_WRITE | REQ_DISCARD;
 	struct bio *bio = NULL;
@@ -111,13 +112,20 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			&bio);
 	if (!ret && bio) {
 		ret = submit_bio_wait(type, bio);
-		if (ret == -EOPNOTSUPP)
+		if (ignore_nosupport && ret == -EOPNOTSUPP)
 			ret = 0;
 	}
 	blk_finish_plug(&plug);
 
 	return ret;
 }
+
+int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
+{
+	return do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
+			flags, true);
+}
 EXPORT_SYMBOL(blkdev_issue_discard);
 
 /**
@@ -131,9 +139,9 @@ EXPORT_SYMBOL(blkdev_issue_discard);
  * Description:
  *    Issue a write same request for the sectors in question.
  */
-int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+static int do_blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 			    sector_t nr_sects, gfp_t gfp_mask,
-			    struct page *page)
+			    struct page *page, bool ignore_nosupport)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
 	unsigned int max_write_same_sectors;
@@ -167,7 +175,15 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 
 	if (bio)
 		ret = submit_bio_wait(REQ_WRITE | REQ_WRITE_SAME, bio);
-	return ret != -EOPNOTSUPP ? ret : 0;
+	return (ret != -EOPNOTSUPP || !ignore_nosupport) ? ret : 0;
+}
+
+int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+			    sector_t nr_sects, gfp_t gfp_mask,
+			    struct page *page)
+{
+	return do_blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
+		page, true);
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);
 
@@ -238,12 +254,13 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	struct request_queue *q = bdev_get_queue(bdev);
 
 	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
-	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
+	    do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0,
+					false) == 0)
 		return 0;
 
 	if (bdev_write_same(bdev) &&
-	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
-				    ZERO_PAGE(0)) == 0)
+	    do_blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
+				    ZERO_PAGE(0), false) == 0)
 		return 0;
 
 	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
-- 
2.8.0.rc2

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] block: correctly fallback for zeroout
       [not found] ` <20160527054918.GA9521@sucs.org>
@ 2016-05-28  9:27   ` Sitsofe Wheeler
  2016-06-02 16:58     ` Shaohua Li
  2016-06-03  2:56   ` Martin K. Petersen
  1 sibling, 1 reply; 12+ messages in thread
From: Sitsofe Wheeler @ 2016-05-28  9:27 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-block, linux-kernel, snitzer, axboe, martin.petersen, Kernel-team

On Sat, May 28, 2016 at 08:55:43AM +0000, Sitsofe Wheeler wrote:
> On Thu, May 26, 2016 at 11:08:14AM -0700, Shaohua Li wrote:
> > blkdev_issue_zeroout try discard/writesame first, if they fail, zeroout
> > fallback to regular write. The problem is discard/writesame doesn't
> > return error for -EOPNOTSUPP, then zeroout can't do fallback and leave
> > disk data not changed. zeroout should have guaranteed zero-fill
> > behavior.
> 
> It sounds like at least this patch should go in so BLKZEROOUT can always
> fall back (since those zeros are essential) but it would still be nice
> to see the disabling of write same being copied up to md device's
> write_same_max_bytes so everyone knows not to try using it in the future
> but perhaps someone will say "what if I re-enable it on the device
> below?" etc.

I've tested Shaohua's original patch on top of Linus' tree and even
without the suggested changes (above and below) it at least resolves the
success being returned but data not being zeroed (with both PVSCSI and
scsi_debug underlying devices) issue so:

Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>

> > BTW, I saw several callers of blkdev_issue_discard can handle
> > -EOPNOTSUPP, not sure why blkdev_issue_discard not returns -EOPNOTSUPP.
> > The same story for blkdev_issue_write_same.
> 
> Most of the time there's no harm if discard fails for any reason -
> there's no guarantee what state the data is in even if it succeeds so
> not doing anything is always legal. I guess there's an argument for why
> try harder. Further, perhaps not every caller is prepared to handle the
> case where an advertised feature suddenly becomes not supported and this
> papers over the problem.
> 
> The original SCSI WRITE SAME has overloaded semantics - not only does it
> mean "write this data multiple times" but it can also be used to mean
> "discard this range" too. If the kernel's command was modelled on the
> SCSI original perhaps this conflation clouded things?
> 
> My fear is that there is another user of blkdev_issue_write_same other
> than blkdev_issue_zeroout (e.g. if this call is somehow wrapped so user
> space can issue it) who doesn't know about the secret "don't hold back!"
> version and winds up ignoring (permanent) errors.
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=118581
> > 
> > Cc: Sitsofe Wheeler <sitsofe@yahoo.com>
> > Cc: Mike Snitzer <snitzer@redhat.com>
> > Cc: Jens Axboe <axboe@fb.com>
> > Cc: Martin K. Petersen <martin.petersen@oracle.com>
> > Signed-off-by: Shaohua Li <shli@fb.com>
> > ---
> >  block/blk-lib.c | 35 ++++++++++++++++++++++++++---------
> >  1 file changed, 26 insertions(+), 9 deletions(-)
> > 
> > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > index 23d7f30..232f9ea 100644
> > --- a/block/blk-lib.c
> > +++ b/block/blk-lib.c
> > @@ -95,8 +95,9 @@ EXPORT_SYMBOL(__blkdev_issue_discard);
> >  * Description:
> >  *    Issue a discard request for the sectors in question.
> >  */
> > -int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> > -        sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
> > +static int do_blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> > +    sector_t nr_sects, gfp_t gfp_mask, unsigned long flags,
> > +    bool ignore_nosupport)
> 
> I'm not sure about "ignore" and "no" being in the same variable name -
> it's almost like double negation.

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] block: correctly fallback for zeroout
  2016-05-26 18:08 [PATCH] block: correctly fallback for zeroout Shaohua Li
       [not found] ` <20160527054918.GA9521@sucs.org>
@ 2016-05-29  6:47 ` Christoph Hellwig
  2016-06-03  3:06   ` Martin K. Petersen
  2016-06-03  3:26 ` [PATCH] " Martin K. Petersen
  2 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-05-29  6:47 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-block, linux-kernel, sitsofe, snitzer, axboe,
	martin.petersen, Kernel-team

On Thu, May 26, 2016 at 11:08:14AM -0700, Shaohua Li wrote:
> -int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> -		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
> +static int do_blkdev_issue_discard(struct block_device *bdev, sector_t sector,

We've split blkdev_issue_discard into __blkdev_issue_discard and a small
wrapper around in for 4.7, so this will need a bit of an update.

As part of that I also removed the strange EOPNOTSUPP ignore, but Mike
reverted it just because it changed something in the dm testsuite.

I still believe we should never ignore it in this helper, and only
do so in callers that believe it's the right thing.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] block: correctly fallback for zeroout
  2016-05-28  9:27   ` Sitsofe Wheeler
@ 2016-06-02 16:58     ` Shaohua Li
  2016-06-02 17:02       ` Martin K. Petersen
  0 siblings, 1 reply; 12+ messages in thread
From: Shaohua Li @ 2016-06-02 16:58 UTC (permalink / raw)
  To: Sitsofe Wheeler, axboe
  Cc: linux-block, linux-kernel, snitzer, axboe, martin.petersen, Kernel-team

On Sat, May 28, 2016 at 10:27:55AM +0100, Sitsofe Wheeler wrote:
> On Sat, May 28, 2016 at 08:55:43AM +0000, Sitsofe Wheeler wrote:
> > On Thu, May 26, 2016 at 11:08:14AM -0700, Shaohua Li wrote:
> > > blkdev_issue_zeroout try discard/writesame first, if they fail, zeroout
> > > fallback to regular write. The problem is discard/writesame doesn't
> > > return error for -EOPNOTSUPP, then zeroout can't do fallback and leave
> > > disk data not changed. zeroout should have guaranteed zero-fill
> > > behavior.
> > 
> > It sounds like at least this patch should go in so BLKZEROOUT can always
> > fall back (since those zeros are essential) but it would still be nice
> > to see the disabling of write same being copied up to md device's
> > write_same_max_bytes so everyone knows not to try using it in the future
> > but perhaps someone will say "what if I re-enable it on the device
> > below?" etc.
> 
> I've tested Shaohua's original patch on top of Linus' tree and even
> without the suggested changes (above and below) it at least resolves the
> success being returned but data not being zeroed (with both PVSCSI and
> scsi_debug underlying devices) issue so:
> 
> Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>

Jens,

any chance you could merge this one? I'll fix the MD part later and make sure
write_same_max_bytes sets to 0 after IO failure.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] block: correctly fallback for zeroout
  2016-06-02 16:58     ` Shaohua Li
@ 2016-06-02 17:02       ` Martin K. Petersen
  0 siblings, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2016-06-02 17:02 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Sitsofe Wheeler, axboe, linux-block, linux-kernel, snitzer,
	martin.petersen, Kernel-team

>>>>> "Shaohua" == Shaohua Li <shli@fb.com> writes:

Shaohua> any chance you could merge this one? I'll fix the MD part later
Shaohua> and make sure write_same_max_bytes sets to 0 after IO failure.

I'd like to look it over first. I've been away for a few days. Will get
to it today.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] block: correctly fallback for zeroout
       [not found] ` <20160527054918.GA9521@sucs.org>
  2016-05-28  9:27   ` Sitsofe Wheeler
@ 2016-06-03  2:56   ` Martin K. Petersen
  1 sibling, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2016-06-03  2:56 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: Shaohua Li, linux-block, linux-kernel, snitzer, axboe,
	martin.petersen, Kernel-team

>>>>> "Sitsofe" == Sitsofe Wheeler <sitsofe@yahoo.com> writes:

Sitsofe> The original SCSI WRITE SAME has overloaded semantics - not
Sitsofe> only does it mean "write this data multiple times" but it can
Sitsofe> also be used to mean "discard this range" too. If the kernel's
Sitsofe> command was modelled on the SCSI original perhaps this
Sitsofe> conflation clouded things?

REQ_WRITE_SAME in the context of the kernel explicitly means "write
payload to this block range".

A REQ_DISCARD command may be serviced using WRITE SAME(16) with the
UNMAP bit set in the SCSI disk driver but that's entirely orthogonal.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] block: correctly fallback for zeroout
  2016-05-29  6:47 ` Christoph Hellwig
@ 2016-06-03  3:06   ` Martin K. Petersen
  2016-06-03  3:54     ` Mike Snitzer
  0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2016-06-03  3:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Shaohua Li, linux-block, linux-kernel, sitsofe, snitzer, axboe,
	martin.petersen, Kernel-team

>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

Christoph> As part of that I also removed the strange EOPNOTSUPP ignore,
Christoph> but Mike reverted it just because it changed something in the
Christoph> dm testsuite.

Mike?

Christoph> I still believe we should never ignore it in this helper, and
Christoph> only do so in callers that believe it's the right thing.

Yeah.

I really wish EOPNOTSUPP would just go away except for ioctl callers.
Now that we have real bi_error I don't understand why we need it.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] block: correctly fallback for zeroout
  2016-05-26 18:08 [PATCH] block: correctly fallback for zeroout Shaohua Li
       [not found] ` <20160527054918.GA9521@sucs.org>
  2016-05-29  6:47 ` Christoph Hellwig
@ 2016-06-03  3:26 ` Martin K. Petersen
  2 siblings, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2016-06-03  3:26 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-block, linux-kernel, sitsofe, snitzer, axboe,
	martin.petersen, Kernel-team

>>>>> "Shaohua" == Shaohua Li <shli@fb.com> writes:

Shaohua> blkdev_issue_zeroout try discard/writesame first, if they fail,
Shaohua> zeroout fallback to regular write. The problem is
Shaohua> discard/writesame doesn't return error for -EOPNOTSUPP, then
Shaohua> zeroout can't do fallback and leave disk data not
Shaohua> changed. zeroout should have guaranteed zero-fill behavior.

As discussed at LSF/MM, let's explicitly separate the exported/ioctl()
functions from the __/do_foo_bar iterators. That's essentially what you
have done. And then put all error handling and policy in the ioctl()
wrappers instead of the worker functions.

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 23d7f30..232f9ea 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -95,8 +95,9 @@ EXPORT_SYMBOL(__blkdev_issue_discard);
  * Description:
  *    Issue a discard request for the sectors in question.
  */
-int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
-		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
+static int do_blkdev_issue_discard(struct block_device *bdev, sector_t sector,
+	sector_t nr_sects, gfp_t gfp_mask, unsigned long flags,
+	bool ignore_nosupport)
 {
 	int type = REQ_WRITE | REQ_DISCARD;
 	struct bio *bio = NULL;
@@ -111,13 +112,20 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			&bio);
 	if (!ret && bio) {
 		ret = submit_bio_wait(type, bio);
-		if (ret == -EOPNOTSUPP)
+		if (ignore_nosupport && ret == -EOPNOTSUPP)
 			ret = 0;
 	}
 	blk_finish_plug(&plug);
 
 	return ret;
 }
+
+int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags)
+{
+	return do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask,
+			flags, true);
+}

I'd prefer to do the EOPNOTSUPP mapping for the ioctl here instead of in
the do_blkdev_issue_discard() function. Then you don't need the
ignore_nosupport flag.

 EXPORT_SYMBOL(blkdev_issue_discard);
 
 /**
@@ -131,9 +139,9 @@ EXPORT_SYMBOL(blkdev_issue_discard);
  * Description:
  *    Issue a write same request for the sectors in question.
  */
-int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+static int do_blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 			    sector_t nr_sects, gfp_t gfp_mask,
-			    struct page *page)
+			    struct page *page, bool ignore_nosupport)
 {
 	struct request_queue *q = bdev_get_queue(bdev);
 	unsigned int max_write_same_sectors;
@@ -167,7 +175,15 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 
 	if (bio)
 		ret = submit_bio_wait(REQ_WRITE | REQ_WRITE_SAME, bio);
-	return ret != -EOPNOTSUPP ? ret : 0;
+	return (ret != -EOPNOTSUPP || !ignore_nosupport) ? ret : 0;
+}
+
+int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+			    sector_t nr_sects, gfp_t gfp_mask,
+			    struct page *page)
+{
+	return do_blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
+		page, true);
 }
 EXPORT_SYMBOL(blkdev_issue_write_same);

There should not be a "soft" fail for WRITE SAME, it's not a hint.

@@ -238,12 +254,13 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	struct request_queue *q = bdev_get_queue(bdev);
 
 	if (discard && blk_queue_discard(q) && q->limits.discard_zeroes_data &&
-	    blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0) == 0)
+	    do_blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, 0,
+					false) == 0)
 		return 0;
 
 	if (bdev_write_same(bdev) &&
-	    blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
-				    ZERO_PAGE(0)) == 0)
+	    do_blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
+				    ZERO_PAGE(0), false) == 0)
 		return 0;
 
 	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: block: correctly fallback for zeroout
  2016-06-03  3:06   ` Martin K. Petersen
@ 2016-06-03  3:54     ` Mike Snitzer
  2016-06-07  2:32       ` Martin K. Petersen
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2016-06-03  3:54 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Shaohua Li, linux-block, linux-kernel,
	sitsofe, axboe, Kernel-team

On Thu, Jun 02 2016 at 11:06pm -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:
> 
> Christoph> As part of that I also removed the strange EOPNOTSUPP ignore,
> Christoph> but Mike reverted it just because it changed something in the
> Christoph> dm testsuite.
> 
> Mike?

Yes? ;)

Seems there is some serious confusion going on here.  The entirety of
hch's post (to which you quoted a subset) makes little sense to me.

shli's patch builds ontop of latest blk-lib.c code yet hch said this::
"We've split blkdev_issue_discard into __blkdev_issue_discard and a
small wrapper around in for 4.7, so this will need a bit of an update."

And hch never "removed the strange EOPNOTSUPP ignore".  He preserved it
(see his commit 38f25255330's "return ret != -EOPNOTSUPP ? ret : 0;"
that I adjusted in commit bbd848e0f -- _and_ he expanded it to eat the
early return that I restored).

So I can only infer that hch is still missing why my revert fixes
historic blkdev_issue_discard() behavior that his commit regressed.
Please read commit bbd848e0f's header.  That at least details the early
vs late -EOPNOTSUPP blkdev_issue_discard() return.

But all that nuance aside, AFAICT my commit bbd848e0f ("block: reinstate
early return of -EOPNOTSUPP from blkdev_issue_discard") really has
_nothing_ to do with the issue shli is addressing with his fix.

> Christoph> I still believe we should never ignore it in this helper, and
> Christoph> only do so in callers that believe it's the right thing.
> 
> Yeah.

Hmm...

You agreed to what hch said there about how we should probably always
return EOPNOTSUPP but then you immediately elaborated with details that
mean you don't agree:
 
> I really wish EOPNOTSUPP would just go away except for ioctl callers.
> Now that we have real bi_error I don't understand why we need it.

But hch was originally in favor of _always_ dropping EOPNOTSUPP on the
floor (that is what his commit 38f25255330 did).  Then he said he
disagrees with these interfaces playing games with masking EOPNOTSUPP --
to which you seemingly really don't agree.  Unless I'm completely
misreading you.

Anyway, shli is at least making it so that blkdev_issue_zerout() can
fallback to other mechanisms as needed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: block: correctly fallback for zeroout
  2016-06-03  3:54     ` Mike Snitzer
@ 2016-06-07  2:32       ` Martin K. Petersen
  2016-06-07  6:38         ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2016-06-07  2:32 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, Christoph Hellwig, Shaohua Li, linux-block,
	linux-kernel, sitsofe, axboe, Kernel-team

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> But hch was originally in favor of _always_ dropping EOPNOTSUPP on
Mike> the floor (that is what his commit 38f25255330 did).  Then he said
Mike> he disagrees with these interfaces playing games with masking
Mike> EOPNOTSUPP -- to which you seemingly really don't agree.  Unless
Mike> I'm completely misreading you.

Userland apps rely on EOPNOTSUPP, we can't break that.

What I don't like this is "soft" error special casing of EOPNOTSUPP in
the actual implementation of discard, write same, etc. These functions
should return either success or failure. And the ioctl wrapper should
then decide whether to return EOPNOTSUPP, EIO or EPONIES.

I.e. separate the policy from the implementation. This would also solve
some of the grievances for the target folks.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: block: correctly fallback for zeroout
  2016-06-07  2:32       ` Martin K. Petersen
@ 2016-06-07  6:38         ` Christoph Hellwig
  2016-06-10  2:05           ` Martin K. Petersen
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-06-07  6:38 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Mike Snitzer, Christoph Hellwig, Shaohua Li, linux-block,
	linux-kernel, sitsofe, axboe, Kernel-team

On Mon, Jun 06, 2016 at 10:32:38PM -0400, Martin K. Petersen wrote:
> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> But hch was originally in favor of _always_ dropping EOPNOTSUPP on
> Mike> the floor (that is what his commit 38f25255330 did).  Then he said
> Mike> he disagrees with these interfaces playing games with masking
> Mike> EOPNOTSUPP -- to which you seemingly really don't agree.  Unless
> Mike> I'm completely misreading you.
> 
> Userland apps rely on EOPNOTSUPP, we can't break that.

Rely on what exactly?  Current we return EOPNOTSUPP if the device
doesn't claim to support discards, but it returns 0 if the device first
claims to support it but then fails the I/O.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: block: correctly fallback for zeroout
  2016-06-07  6:38         ` Christoph Hellwig
@ 2016-06-10  2:05           ` Martin K. Petersen
  0 siblings, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2016-06-10  2:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin K. Petersen, Mike Snitzer, Shaohua Li, linux-block,
	linux-kernel, sitsofe, axboe, Kernel-team

>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

>> Userland apps rely on EOPNOTSUPP, we can't break that.

Christoph> Rely on what exactly?  Current we return EOPNOTSUPP if the
Christoph> device doesn't claim to support discards, but it returns 0 if
Christoph> the device first claims to support it but then fails the I/O.

Hopefully we can clean up this when/if we go the fallocate() route.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-06-10  2:05 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-26 18:08 [PATCH] block: correctly fallback for zeroout Shaohua Li
     [not found] ` <20160527054918.GA9521@sucs.org>
2016-05-28  9:27   ` Sitsofe Wheeler
2016-06-02 16:58     ` Shaohua Li
2016-06-02 17:02       ` Martin K. Petersen
2016-06-03  2:56   ` Martin K. Petersen
2016-05-29  6:47 ` Christoph Hellwig
2016-06-03  3:06   ` Martin K. Petersen
2016-06-03  3:54     ` Mike Snitzer
2016-06-07  2:32       ` Martin K. Petersen
2016-06-07  6:38         ` Christoph Hellwig
2016-06-10  2:05           ` Martin K. Petersen
2016-06-03  3:26 ` [PATCH] " Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).