From: Ming Lin <mlin@kernel.org>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Mike Snitzer <snitzer@redhat.com>, device-mapper development <dm-devel@redhat.com>, Ming Lei <ming.lei@canonical.com>, Christoph Hellwig <hch@lst.de>, Alasdair Kergon <agk@redhat.com>, Lars Ellenberg <drbd-dev@lists.linbit.com>, Philip Kelleher <pjk1939@linux.vnet.ibm.com>, Joshua Morris <josh.h.morris@us.ibm.com>, Christoph Hellwig <hch@infradead.org>, Kent Overstreet <kent.overstreet@gmail.com>, Nitin Gupta <ngupta@vflare.org>, Ming Lin <ming.l@ssi.samsung.com>, Oleg Drokin <oleg.drokin@intel.com>, Al Viro <viro@zeniv.linux.org.uk>, Jens Axboe <axboe@kernel.dk>, Andreas Dilger <andreas.dilger@intel.com>, Geoff Levand <geoff@infradead.org>, Jiri Kosina <jkosina@suse.cz>, lkml <linux-kernel@vger.kernel.org>, Jim Paris <jim@jtan.com>, Minchan Kim <minchan@kernel.org>, Dongsu Park <dpark@posteo.net>, drbd-user@lists.linbit.com
Subject: Re: [dm-devel] [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios
Date: Sat, 08 Aug 2015 22:59:50 -0700
Message-ID: <1439099990.7880.0.camel@hasee>
In-Reply-To: <yq18u9liwhj.fsf@sermon.lab.mkp.net>

On Sat, 2015-08-08 at 12:19 -0400, Martin K. Petersen wrote:
> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>
> Mike> This will translate to all intermediate layers that might split
> Mike> discards needing to worry about granularity/alignment too
> Mike> (e.g. how dm-thinp will have to care because it must generate
> Mike> discard mappings with associated bios based on how blocks were
> Mike> mapped to thinp).
>
> The fundamental issue here is that alignment and granularity should
> never, ever have been enforced at the top of the stack. Horrendous idea
> from the very beginning.
>
> For the < handful of braindead devices that get confused when you do
> partial or misaligned blocks we should have had a quirk that did any
> range adjusting at the bottom in sd_setup_discard_cmnd().
>
> There's a reason I turned discard_zeroes_data off for UNMAP!
>
> Wrt. the range size I don't have a problem with capping at the 32-bit
> bi_size limit. We probably don't want to send commands much bigger than
> that anyway.

How about below?

commit b8ca440bd77653d4d2bac90b7fd1599e9e0e150a
Author: Ming Lin <ming.l@ssi.samsung.com>
Date:   Fri Aug 7 15:07:07 2015 -0700

    block: remove split code in blkdev_issue_{discard,write_same}

    The split code in blkdev_issue_{discard,write_same} can go away
    now that any driver that cares does the split. We have to make
    sure bio size doesn't overflow.

    For discard, we set max discard sectors to (1<<31)>>9 to ensure
    it doesn't overflow bi_size and hopefully it is of the proper
    granularity as long as the granularity is a power of two.

    Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
---
 block/blk-lib.c | 47 +++++++++++------------------------------------
 1 file changed, 11 insertions(+), 36 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..4859e4b 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -26,6 +26,13 @@ static void bio_batch_end_io(struct bio *bio, int err)
 	bio_put(bio);
 }
 
+/*
+ * Ensure that max discard sectors doesn't overflow bi_size and hopefully
+ * it is of the proper granularity as long as the granularity is a power
+ * of two.
+ */
+#define MAX_DISCARD_SECTORS	((1U << 31) >> 9)
+
 /**
  * blkdev_issue_discard - queue a discard
  * @bdev:	blockdev to issue discard for
@@ -43,8 +50,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	DECLARE_COMPLETION_ONSTACK(wait);
 	struct request_queue *q = bdev_get_queue(bdev);
 	int type = REQ_WRITE | REQ_DISCARD;
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
@@ -56,21 +61,6 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	if (!blk_queue_discard(q))
 		return -EOPNOTSUPP;
 
-	/* Zero-sector (unknown) and one-sector granularities are the same. */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
-
-	/*
-	 * Ensure that max_discard_sectors is of the proper
-	 * granularity, so that requests stay aligned after a split.
-	 */
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-	if (unlikely(!max_discard_sectors)) {
-		/* Avoid infinite loop below. Being cautious never hurts. */
-		return -EOPNOTSUPP;
-	}
-
 	if (flags & BLKDEV_DISCARD_SECURE) {
 		if (!blk_queue_secdiscard(q))
 			return -EOPNOTSUPP;
@@ -84,7 +74,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 	blk_start_plug(&plug);
 	while (nr_sects) {
 		unsigned int req_sects;
-		sector_t end_sect, tmp;
+		sector_t end_sect;
 
 		bio = bio_alloc(gfp_mask, 1);
 		if (!bio) {
@@ -92,21 +82,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 			break;
 		}
 
-		req_sects = min_t(sector_t, nr_sects, max_discard_sectors);
-
-		/*
-		 * If splitting a request, and the next starting sector would be
-		 * misaligned, stop the discard at the previous aligned sector.
-		 */
+		req_sects = min_t(sector_t, nr_sects, MAX_DISCARD_SECTORS);
 		end_sect = sector + req_sects;
-		tmp = end_sect;
-		if (req_sects < nr_sects &&
-		    sector_div(tmp, granularity) != alignment) {
-			end_sect = end_sect - alignment;
-			sector_div(end_sect, granularity);
-			end_sect = end_sect * granularity + alignment;
-			req_sects = end_sect - sector;
-		}
 
 		bio->bi_iter.bi_sector = sector;
 		bio->bi_end_io = bio_batch_end_io;
@@ -166,10 +143,8 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
 	if (!q)
 		return -ENXIO;
 
-	max_write_same_sectors = q->limits.max_write_same_sectors;
-
-	if (max_write_same_sectors == 0)
-		return -EOPNOTSUPP;
+	/* Ensure that max_write_same_sectors doesn't overflow bi_size */
+	max_write_same_sectors = UINT_MAX >> 9;
 
 	atomic_set(&bb.done, 1);
 	bb.flags = 1 << BIO_UPTODATE;