From: Ming Lin <mlin@kernel.org>
To: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	lkml <linux-kernel@vger.kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Dongsu Park <dpark@posteo.net>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Ming Lin <ming.l@ssi.samsung.com>,
	linux-nvme@lists.infradead.org
Subject: Re: [PATCH v6 05/11] block: remove split code in blkdev_issue_{discard,write_same}
Date: Wed, 21 Oct 2015 13:13:09 -0700
Message-ID: <1445458389.26847.10.camel@ssi>
In-Reply-To: <20151021181812.GA5807@redhat.com>

On Wed, 2015-10-21 at 14:18 -0400, Mike Snitzer wrote:
> On Wed, Oct 21 2015 at  1:33pm -0400,
> Ming Lin <mlin@kernel.org> wrote:
> 
> > On Wed, 2015-10-21 at 12:19 -0400, Mike Snitzer wrote:
> > > On Wed, Oct 21 2015 at 12:02pm -0400,
> > > Mike Snitzer <snitzer@redhat.com> wrote:
> > > 
> > > > On Wed, Oct 14 2015 at  9:27am -0400,
> > > > Christoph Hellwig <hch@infradead.org> wrote:
> > > > 
> > > > > On Tue, Oct 13, 2015 at 10:44:11AM -0700, Ming Lin wrote:
> > > > > > I just did a quick test with a Samsung 900G NVMe device.
> > > > > > mkfs.xfs is OK on 4.3-rc5.
> > > > > > 
> > > > > > What's your device model? I may find a similar one to try.
> > > > > 
> > > > > > This is an HGST Ultrastar SN100
> > > > > 
> > > > > > Analysis and tentative fix below:
> > > > > 
> > > > > blktrace for before the commit:
> > > > > 
> > > > > 259,0    1        2     0.000002543  2394  G   D 0 + 8388607 [mkfs.xfs]
> > > > > 259,0    1        3     0.000008230  2394  I   D 0 + 8388607 [mkfs.xfs]
> > > > > 259,0    1        4     0.000031090   207  D   D 0 + 8388607 [kworker/1:1H]
> > > > > 259,0    1        5     0.000044869  2394  Q   D 8388607 + 8388607 [mkfs.xfs]
> > > > > 259,0    1        6     0.000045992  2394  G   D 8388607 + 8388607 [mkfs.xfs]
> > > > > 259,0    1        7     0.000049559  2394  I   D 8388607 + 8388607 [mkfs.xfs]
> > > > > 259,0    1        8     0.000061551   207  D   D 8388607 + 8388607 [kworker/1:1H]
> > > > > 
> > > > > .. and so on.
> > > > > 
> > > > > blktrace with the commit:
> > > > > 
> > > > > 259,0    2        1     0.000000000  1228  Q   D 0 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        2     0.000002543  1228  G   D 0 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        3     0.000010080  1228  I   D 0 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        4     0.000082187   267  D   D 0 + 4194304 [kworker/2:1H]
> > > > > 259,0    2        5     0.000224869  1228  Q   D 4194304 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        6     0.000225835  1228  G   D 4194304 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        7     0.000229457  1228  I   D 4194304 + 4194304 [mkfs.xfs]
> > > > > 259,0    2        8     0.000238507   267  D   D 4194304 + 4194304 [kworker/2:1H]
> > > > > 
> > > > > So discards are smaller, but better aligned.  Now if I tweak a single
> > > > > line in blk-lib.c to be able to use all of bi_size I get the old I/O
> > > > > pattern back and everything works fine again:
> > > > > 
> > > > > diff --git a/block/blk-lib.c b/block/blk-lib.c
> > > > > index bd40292..65b61dc 100644
> > > > > --- a/block/blk-lib.c
> > > > > +++ b/block/blk-lib.c
> > > > > @@ -82,7 +82,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
> > > > >  			break;
> > > > >  		}
> > > > >  
> > > > > -		req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
> > > > > +		req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
> > > > >  		end_sect = sector + req_sects;
> > > > >  
> > > > >  		bio->bi_iter.bi_sector = sector;
> > > > 
> > > > Can we change UINT_MAX >> 9 to rounddown to the first factor of
> > > > minimum_io_size?
> > > > 
> > > > That should work for all devices and for dm-thinp (and dm-cache) in
> > > > particular will ensure that all discards that are issued will be a
> > > > multiple of the underlying device's blocksize.
> > > 
> > > Jeff Moyer pointed out having req_sects be a factor of
> > > discard_granularity makes more sense.  And I agree.  Same difference in
> > > the end (since dm-thinp sets discard_granularity to the thinp
> > > blocksize).
> > 
> > An old version of this patch did use discard_granularity
> > https://www.redhat.com/archives/dm-devel/2015-August/msg00000.html
> > 
> > But you didn't agree.
> > https://www.redhat.com/archives/dm-devel/2015-August/msg00001.html
> > 
> > Maybe we can re-add discard_granularity now?
> 
> I disagreed on a more generic level than discard_granularity shaping the
> split boundary.
> 
> But we are where we are.  If we're going to split (due to 32-bit limits
> in bio->bi_iter.bi_size) then we should at least do so in terms of the
> supported discard_granularity.

How about the patch below?
It effectively reverts commit b49a0871 and adds the patch at
https://www.redhat.com/archives/dm-devel/2015-August/msg00000.html

Christoph, could you give it a try?

commit 122bf0a43cb1611ed62aaf945f25b649c27a71ed
Author: Ming Lin <mlin@kernel.org>
Date:   Wed Oct 21 11:24:48 2015 -0700

    block: check discard_granularity and alignment
    
    Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
---
 block/blk-lib.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index bd40292..9ebf653 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -26,13 +26,6 @@ static void bio_batch_end_io(struct bio *bio)
 	bio_put(bio);
 }
 
-/*
- * Ensure that max discard sectors doesn't overflow bi_size and hopefully
- * it is of the proper granularity as long as the granularity is a power
- * of two.
- */
-#define MAX_BIO_SECTORS ((1U << 31) >> 9)
-
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:	blockdev to issue discard for
@@ -50,6 +43,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
+	unsigned int granularity;
+	int alignment;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
@@ -61,6 +56,10 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
+	/* Zero-sector (unknown) and one-sector granularities are the same.  */
+	granularity = max(q->limits.discard_granularity >> 9, 1U);
+	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
+
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
@@ -74,7 +73,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	blk_start_plug(&plug);
 	while (nr_sects) {
 		unsigned int req_sects;
-		sector_t end_sect;
+		sector_t end_sect, tmp;
 
 		bio = bio_alloc(gfp_mask, 1);
 		if (!bio) {
@@ -82,8 +81,22 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			break;
 		}
 
-		req_sects = min_t(sector_t, nr_sects, MAX_BIO_SECTORS);
+		/* Make sure bi_size doesn't overflow */
+		req_sects = min_t(sector_t, nr_sects, UINT_MAX >> 9);
+
+		/*
+		 * If splitting a request, and the next starting sector would be
+		 * misaligned, stop the discard at the previous aligned sector.
+		 */
 		end_sect = sector + req_sects;
+		tmp = end_sect;
+		if (req_sects < nr_sects &&
+		    sector_div(tmp, granularity) != alignment) {
+			end_sect = end_sect - alignment;
+			sector_div(end_sect, granularity);
+			end_sect = end_sect * granularity + alignment;
+			req_sects = end_sect - sector;
+		}
 
 		bio->bi_iter.bi_sector = sector;
 		bio->bi_end_io = bio_batch_end_io;
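
To make the split arithmetic in the hunk above easier to check, here is a
minimal user-space sketch of it. It is not kernel code: next_discard_sects()
and count_discard_bios() are hypothetical helper names, sector_t is a plain
uint64_t stand-in, and sector_div() is replaced by ordinary / and %. The
sketch assumes a 32-bit bi_size, granularity >= 1, alignment < granularity,
and a granularity far smaller than the per-bio cap (otherwise the rounded
request could shrink to zero).

```c
#include <stdint.h>

typedef uint64_t sector_t;	/* user-space stand-in for the kernel type */

/* Per-bio caps in 512-byte sectors, assuming bi_size is a 32-bit field. */
#define OLD_MAX_SECTORS ((1U << 31) >> 9)	/* MAX_BIO_SECTORS: 4194304 */
#define NEW_MAX_SECTORS (UINT32_MAX >> 9)	/* UINT_MAX >> 9:   8388607 */

/*
 * Hypothetical helper (not part of blk-lib.c): number of sectors the next
 * discard bio would cover, mirroring the logic of the patch. When the
 * range has to be split and the split point is misaligned, the end is
 * rounded down to the previous sector that satisfies
 * (sector % granularity) == alignment.
 */
static sector_t next_discard_sects(sector_t sector, sector_t nr_sects,
				   unsigned int granularity,
				   unsigned int alignment)
{
	/* Make sure bi_size would not overflow. */
	sector_t req_sects = nr_sects < NEW_MAX_SECTORS ?
			     nr_sects : NEW_MAX_SECTORS;
	sector_t end_sect = sector + req_sects;

	if (req_sects < nr_sects && end_sect % granularity != alignment) {
		end_sect = (end_sect - alignment) / granularity * granularity
			   + alignment;
		req_sects = end_sect - sector;
	}
	return req_sects;
}

/* Hypothetical driver: how many bios a whole range would be split into. */
static unsigned int count_discard_bios(sector_t sector, sector_t nr_sects,
				       unsigned int granularity,
				       unsigned int alignment)
{
	unsigned int n = 0;

	while (nr_sects) {
		sector_t req = next_discard_sects(sector, nr_sects,
						  granularity, alignment);
		sector += req;
		nr_sects -= req;
		n++;
	}
	return n;
}
```

The two caps match the request sizes in the blktrace output quoted above
(4194304 with the commit, 8388607 before it). As an example, a discard of
20000000 sectors starting at sector 0 with an 8-sector granularity and zero
alignment would be issued as three bios of 8388600, 8388600 and 3222800
sectors, so every split lands on a granularity boundary.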


