* [PATCH 0/3] Fixes for gapped scatters @ 2015-07-15 13:19 Sagi Grimberg 2015-07-15 13:19 ` [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set Sagi Grimberg ` (3 more replies) 0 siblings, 4 replies; 29+ messages in thread From: Sagi Grimberg @ 2015-07-15 13:19 UTC (permalink / raw) This small set addresses some issues with gapped scatter I/Os. The motivation is to let the iser driver take advantage of this as well. Patch 1 prevents gapped SG_IO if the driver can't handle gaps. Patches 2,3 prevent gapped integrity payloads (for drivers that support data integrity). Review is welcome. Christoph Hellwig (1): block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set Sagi Grimberg (2): block: Refuse request/bio merges with gaps in the integrity payload block: Refuse adding appending a gapped integrity page to a bio block/bio-integrity.c | 9 +++++++++ block/blk-integrity.c | 32 +++++++++++++++++++++++++++----- block/blk-map.c | 4 +++- 3 files changed, 39 insertions(+), 6 deletions(-) -- 1.8.4.3 ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set 2015-07-15 13:19 [PATCH 0/3] Fixes for gapped scatters Sagi Grimberg @ 2015-07-15 13:19 ` Sagi Grimberg 2015-07-16 16:47 ` Keith Busch 2015-07-15 13:19 ` [PATCH 2/3] block: Refuse request/bio merges with gaps in the integrity payload Sagi Grimberg ` (2 subsequent siblings) 3 siblings, 1 reply; 29+ messages in thread From: Sagi Grimberg @ 2015-07-15 13:19 UTC (permalink / raw) From: Christoph Hellwig <hch@lst.de> For drivers that don't support gaps in the SG lists handed to them we must not create bios from multiple iovecs as they can be (and usually are) discontiguous. This doesn't matter for any current user, but will help to allow iSER which can't handle gaps to use the QUEUE_FLAG_SG_GAPS flag instead of using driver-local bounce buffering. Signed-off-by: Christoph Hellwig <hch at lst.de> Signed-off-by: Sagi Grimberg <sagig at mellanox.com> --- block/blk-map.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/block/blk-map.c b/block/blk-map.c index da310a1..86962eb 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -85,7 +85,9 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, unaligned = 1; } - if (unaligned || (q->dma_pad_mask & iter->count) || map_data) + if (unaligned || map_data || + (q->dma_pad_mask & iter->count) || + ((q->queue_flags & (1 << QUEUE_FLAG_SG_GAPS) && iter->nr_segs > 1))) bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); else bio = bio_map_user_iov(q, iter, gfp_mask); -- 1.8.4.3 ^ permalink raw reply related [flat|nested] 29+ messages in thread
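The decision the patch changes reduces to a boolean condition, modeled here as a rough standalone sketch (the function name is hypothetical; the real check operates on the request_queue flags and the iov_iter):

```c
#include <stdbool.h>

/*
 * Userspace model of the condition added to blk_rq_map_user_iov():
 * fall back to a bounce copy (bio_copy_user_iov) instead of a direct
 * mapping (bio_map_user_iov) whenever the queue has QUEUE_FLAG_SG_GAPS
 * set and the request spans more than one iovec, since multiple
 * iovecs are usually discontiguous.  Function name is hypothetical.
 */
static bool must_bounce_copy(bool unaligned, bool have_map_data,
			     unsigned long dma_pad_mask, unsigned long count,
			     bool sg_gaps_flag, unsigned long nr_segs)
{
	return unaligned || have_map_data ||
	       (dma_pad_mask & count) != 0 ||
	       (sg_gaps_flag && nr_segs > 1);
}
```

With the flag set, a two-iovec request is copied even when it is otherwise aligned; a single-iovec request is still mapped directly.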
* [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set 2015-07-15 13:19 ` [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set Sagi Grimberg @ 2015-07-16 16:47 ` Keith Busch 2015-07-16 16:49 ` Jens Axboe 2015-07-17 7:43 ` Christoph Hellwig 0 siblings, 2 replies; 29+ messages in thread From: Keith Busch @ 2015-07-16 16:47 UTC (permalink / raw) On Wed, 15 Jul 2015, Sagi Grimberg wrote: > For drivers that don't support gaps in the SG lists handed to them we must > not create bios from multiple iovecs as they can be (and usually are) > discontiguous. This doesn't matter for any current user, but will help > to allow iSER which can't handle gaps to use the QUEUE_FLAG_SG_GAPS flag > instead of using driver-local bounce buffering. Maybe SG_GAPS is a bit of a misnomer. There are cases we can directly map a user iov with multiple discontiguous vectors. At least for NVMe it'd work if each iov's base and length are on page boundaries, so we don't need to do the indirect copy. Like with this user example: --- int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT); struct iovec iov[2]; unsigned char *buffer; posix_memalign((void **)&buffer, 0x4000, 0x3000); iov[0].iov_base = &buffer[0]; iov[1].iov_base = &buffer[0x2000]; iov[0].iov_len = 0x1000; iov[1].iov_len = 0x1000; readv(fd, iov, 2); -- ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set 2015-07-16 16:47 ` Keith Busch @ 2015-07-16 16:49 ` Jens Axboe 2015-07-16 19:47 ` Matthew Wilcox 2015-07-17 7:44 ` Christoph Hellwig 2015-07-17 7:43 ` Christoph Hellwig 1 sibling, 2 replies; 29+ messages in thread From: Jens Axboe @ 2015-07-16 16:49 UTC (permalink / raw) On 07/16/2015 10:47 AM, Keith Busch wrote: > On Wed, 15 Jul 2015, Sagi Grimberg wrote: >> For drivers that don't support gaps in the SG lists handed to them we >> must >> not create bios from multiple iovecs as they can be (and usually are) >> discontiguous. This doesn't matter for any current user, but will help >> to allow iSER which can't handle gaps to use the QUEUE_FLAG_SG_GAPS flag >> instead of using driver-local bounce buffering. > > Maybe SG_GAPS is a bit of a misnomer. There are cases we can directly > map a user iov with multiple discontiguous vectors. At least for NVMe > it'd work if each iov's base and length are on page boundaries, we don't > need to do the indirect copy. > > Like with this user example: > --- > int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT); > struct iovec iov[2]; > unsigned char *buffer; > > posix_memalign((void **)&buffer, 0x4000, 0x1000); > > iov[0].iov_base = &buffer[0]; > iov[1].iov_base = &buffer[0x2000]; > iov[0].iov_len = 0x1000; > iov[1].iov_len = 0x1000; > > readv(fd, iov, 2); SG_GAPS is a bit of a misnomer, but it's not easy to explain exactly what that logic is in a few short words! SG gaps was the closest I could come up with. -- Jens Axboe ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set 2015-07-16 16:49 ` Jens Axboe @ 2015-07-16 19:47 ` Matthew Wilcox 2015-07-17 15:26 ` Jens Axboe 2015-07-17 7:44 ` Christoph Hellwig 1 sibling, 1 reply; 29+ messages in thread From: Matthew Wilcox @ 2015-07-16 19:47 UTC (permalink / raw) On Thu, Jul 16, 2015@10:49:54AM -0600, Jens Axboe wrote: > SG_GAPS is a bit of a misnomer, but it's not easy to explain exactly what > that logic is in a few short words! SG gaps was the closest I could come up > with. The sense is wrong -- surely it should be 'NOGAPS'? ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set 2015-07-16 19:47 ` Matthew Wilcox @ 2015-07-17 15:26 ` Jens Axboe 0 siblings, 0 replies; 29+ messages in thread From: Jens Axboe @ 2015-07-17 15:26 UTC (permalink / raw) On 07/16/2015 01:47 PM, Matthew Wilcox wrote: > On Thu, Jul 16, 2015@10:49:54AM -0600, Jens Axboe wrote: >> SG_GAPS is a bit of a misnomer, but it's not easy to explain exactly what >> that logic is in a few short words! SG gaps was the closest I could come up >> with. > > The sense is wrong -- surely it should be 'NOGAPS'? That might make it clearer :-) -- Jens Axboe ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set 2015-07-16 16:49 ` Jens Axboe 2015-07-16 19:47 ` Matthew Wilcox @ 2015-07-17 7:44 ` Christoph Hellwig 1 sibling, 0 replies; 29+ messages in thread From: Christoph Hellwig @ 2015-07-17 7:44 UTC (permalink / raw) On Thu, Jul 16, 2015@10:49:54AM -0600, Jens Axboe wrote: > SG_GAPS is a bit of a misnomer, but it's not easy to explain exactly what > that logic is in a few short words! SG gaps was the closest I could come up > with. SG_NO_OFFSETS_INTO_TRAILING_PAGES would describe it, but is a little too long... ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set 2015-07-16 16:47 ` Keith Busch 2015-07-16 16:49 ` Jens Axboe @ 2015-07-17 7:43 ` Christoph Hellwig 1 sibling, 0 replies; 29+ messages in thread From: Christoph Hellwig @ 2015-07-17 7:43 UTC (permalink / raw) On Thu, Jul 16, 2015@04:47:12PM +0000, Keith Busch wrote: > On Wed, 15 Jul 2015, Sagi Grimberg wrote: >> For drivers that don't support gaps in the SG lists handed to them we must >> not create bios from multiple iovecs as they can be (and usually are) >> discontiguous. This doesn't matter for any current user, but will help >> to allow iSER which can't handle gaps to use the QUEUE_FLAG_SG_GAPS flag >> instead of using driver-local bounce buffering. > > Maybe SG_GAPS is a bit of a misnomer. There are cases we can directly > map a user iov with multiple discontiguous vectors. At least for NVMe > it'd work if each iov's base and length are on page boundaries, we don't > need to do the indirect copy. Yes, right now the check is a little pessimistic. If we really care for this use case we can iterate over all iovecs and check if they would introduce offsets into a page. Note that NVMe never uses blk_rq_map_user_iov with more than a single iov so it doesn't apply there; the concern here is iSER, but the same rules apply there as well. ^ permalink raw reply [flat|nested] 29+ messages in thread
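The per-iovec check Christoph describes can be sketched as a standalone userspace function (hypothetical name, 4 KiB pages assumed): a multi-iovec request can be mapped directly only if no iovec introduces an offset into a page, i.e. every iovec but the first starts on a page boundary and every iovec but the last ends on one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

#define PAGE_MASK_4K 0xfffUL	/* assumes 4 KiB pages */

/* Hypothetical helper: true if the iovecs can be mapped without a bounce copy. */
static bool iov_can_map_direct(const struct iovec *iov, size_t nr_segs)
{
	size_t i;

	for (i = 0; i < nr_segs; i++) {
		unsigned long base = (unsigned long)iov[i].iov_base;

		if (i > 0 && (base & PAGE_MASK_4K))
			return false;	/* starts mid-page: gap to previous */
		if (i < nr_segs - 1 &&
		    ((base + iov[i].iov_len) & PAGE_MASK_4K))
			return false;	/* ends mid-page: gap to next */
	}
	return true;
}
```

Keith's example above passes such a check (both segments are page-aligned and page-sized), which is exactly why the blanket nr_segs > 1 test is pessimistic for that case.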
* [PATCH 2/3] block: Refuse request/bio merges with gaps in the integrity payload 2015-07-15 13:19 [PATCH 0/3] Fixes for gapped scatters Sagi Grimberg 2015-07-15 13:19 ` [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set Sagi Grimberg @ 2015-07-15 13:19 ` Sagi Grimberg 2015-07-15 13:19 ` [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio Sagi Grimberg 2015-07-17 1:39 ` [PATCH 0/3] Fixes for gapped scatters Martin K. Petersen 3 siblings, 0 replies; 29+ messages in thread From: Sagi Grimberg @ 2015-07-15 13:19 UTC (permalink / raw) If a driver turned on QUEUE_FLAG_SG_GAPS, it means that it cannot handle gaps so we must not allow those in the integrity payload as well. Signed-off-by: Sagi Grimberg <sagig at mellanox.com> --- block/blk-integrity.c | 32 +++++++++++++++++++++++++++----- 1 file changed, 27 insertions(+), 5 deletions(-) diff --git a/block/blk-integrity.c b/block/blk-integrity.c index 79ffb48..bb18e0a 100644 --- a/block/blk-integrity.c +++ b/block/blk-integrity.c @@ -189,20 +189,31 @@ EXPORT_SYMBOL(blk_integrity_compare); bool blk_integrity_merge_rq(struct request_queue *q, struct request *req, struct request *next) { + struct bio_integrity_payload *bip = bio_integrity(req->bio); + struct bio_integrity_payload *bip_next = bio_integrity(next->bio); + if (blk_integrity_rq(req) == 0 && blk_integrity_rq(next) == 0) return true; if (blk_integrity_rq(req) == 0 || blk_integrity_rq(next) == 0) return false; - if (bio_integrity(req->bio)->bip_flags != - bio_integrity(next->bio)->bip_flags) + if (bip->bip_flags != bip_next->bip_flags) return false; if (req->nr_integrity_segments + next->nr_integrity_segments > q->limits.max_integrity_segments) return false; + if (q->queue_flags & (1 << QUEUE_FLAG_SG_GAPS)) { + struct bio_vec *iv, *iv_next; + + iv = &bip->bip_vec[bip->bip_vcnt]; + iv_next = &bip_next->bip_vec[0]; + if (bvec_gap_to_prev(iv, iv_next->bv_offset)) + return false; + } + return true; } 
EXPORT_SYMBOL(blk_integrity_merge_rq); @@ -212,16 +223,27 @@ bool blk_integrity_merge_bio(struct request_queue *q, struct request *req, { int nr_integrity_segs; struct bio *next = bio->bi_next; + struct bio_integrity_payload *bip = bio_integrity(req->bio); + struct bio_integrity_payload *bip_next = bio_integrity(bio); - if (blk_integrity_rq(req) == 0 && bio_integrity(bio) == NULL) + if (blk_integrity_rq(req) == 0 && bip_next == NULL) return true; - if (blk_integrity_rq(req) == 0 || bio_integrity(bio) == NULL) + if (blk_integrity_rq(req) == 0 || bip_next == NULL) return false; - if (bio_integrity(req->bio)->bip_flags != bio_integrity(bio)->bip_flags) + if (bip->bip_flags != bip_next->bip_flags) return false; + if (q->queue_flags & (1 << QUEUE_FLAG_SG_GAPS)) { + struct bio_vec *iv, *iv_next; + + iv = &bip->bip_vec[bip->bip_vcnt - 1]; + iv_next = &bip_next->bip_vec[bip_next->bip_vcnt]; + if (bvec_gap_to_prev(iv, iv_next->bv_offset)) + return false; + } + bio->bi_next = NULL; nr_integrity_segs = blk_rq_count_integrity_sg(q, bio); bio->bi_next = next; -- 1.8.4.3 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-07-15 13:19 [PATCH 0/3] Fixes for gapped scatters Sagi Grimberg 2015-07-15 13:19 ` [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set Sagi Grimberg 2015-07-15 13:19 ` [PATCH 2/3] block: Refuse request/bio merges with gaps in the integrity payload Sagi Grimberg @ 2015-07-15 13:19 ` Sagi Grimberg 2015-07-15 15:28 ` Jens Axboe 2015-07-17 1:39 ` [PATCH 0/3] Fixes for gapped scatters Martin K. Petersen 3 siblings, 1 reply; 29+ messages in thread From: Sagi Grimberg @ 2015-07-15 13:19 UTC (permalink / raw) This is only theoretical at the moment given that the only subsystems that generate integrity payloads are the block layer itself and the scsi target. But when we expose integrity metadata to user-space, we'll need to refuse appending a page with a gap when the QUEUE_FLAG_SG_GAPS flag is on. Signed-off-by: Sagi Grimberg <sagig at mellanox.com> --- block/bio-integrity.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 5cbd5d9..a18edb8 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -140,6 +140,15 @@ int bio_integrity_add_page(struct bio *bio, struct page *page, iv = bip->bip_vec + bip->bip_vcnt; + if (bip->bip_vcnt) { + struct request_queue *q = bdev_get_queue(bio->bi_bdev); + struct bio_vec *iv_prev = &bip->bip_vec[bip->bip_vcnt - 1]; + + if (q->queue_flags & (1 << QUEUE_FLAG_SG_GAPS) && + bvec_gap_to_prev(iv_prev, offset)) + return 0; + } + iv->bv_page = page; iv->bv_len = len; iv->bv_offset = offset; -- 1.8.4.3 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-07-15 13:19 ` [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio Sagi Grimberg @ 2015-07-15 15:28 ` Jens Axboe 2015-07-16 9:26 ` Christoph Hellwig 0 siblings, 1 reply; 29+ messages in thread From: Jens Axboe @ 2015-07-15 15:28 UTC (permalink / raw) On 07/15/2015 07:19 AM, Sagi Grimberg wrote: > This is only theoretical at the moment given that the only > subsystems that generate integrity payloads are the block layer > itself and the scsi target. But when we will expose integrity > meta-data to user-space, we'll need to refuse appending a page > with a gap when QUEUE_FLAG_SG_GAPS flag is on. > > Signed-off-by: Sagi Grimberg <sagig at mellanox.com> > --- > block/bio-integrity.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/block/bio-integrity.c b/block/bio-integrity.c > index 5cbd5d9..a18edb8 100644 > --- a/block/bio-integrity.c > +++ b/block/bio-integrity.c > @@ -140,6 +140,15 @@ int bio_integrity_add_page(struct bio *bio, struct page *page, > > iv = bip->bip_vec + bip->bip_vcnt; > > + if (bip->bip_vcnt) { > + struct request_queue *q = bdev_get_queue(bio->bi_bdev); > + struct bio_vec *iv_prev = &bip->bip_vec[bip->bip_vcnt - 1]; > + > + if (q->queue_flags & (1 << QUEUE_FLAG_SG_GAPS) && > + bvec_gap_to_prev(iv_prev, offset)) > + return 0; > + } This chunk of code largely ends up being duplicated in several places with your patches, might be a good idea to pull that into a helper function. Then you can add the comment in one place as well :-) -- Jens Axboe ^ permalink raw reply [flat|nested] 29+ messages in thread
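For illustration, the duplicated check could collapse into one helper along these lines (a standalone userspace sketch with hypothetical names; a kernel version would take the request_queue and a bio_vec instead):

```c
#include <stdbool.h>

struct bvec { unsigned int bv_offset, bv_len; };

/*
 * Hypothetical centralized helper: would appending an element that
 * starts at @next_offset after @prev leave a gap the device cannot
 * handle?  QUEUE_FLAG_SG_GAPS set means the driver can NOT tolerate
 * gaps, so the check is skipped entirely when the flag is clear.
 */
static bool queue_gap_to_prev(bool sg_gaps_flag, const struct bvec *prev,
			      unsigned int next_offset,
			      unsigned int boundary_mask)
{
	if (!sg_gaps_flag)
		return false;
	return next_offset != 0 ||
	       ((prev->bv_offset + prev->bv_len) & boundary_mask) != 0;
}
```

Each call site would then pass its own prev bvec and next offset instead of open-coding the flag test plus bvec_gap_to_prev().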
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-07-15 15:28 ` Jens Axboe @ 2015-07-16 9:26 ` Christoph Hellwig 2015-07-16 15:58 ` Jens Axboe 2015-07-17 1:50 ` Martin K. Petersen 0 siblings, 2 replies; 29+ messages in thread From: Christoph Hellwig @ 2015-07-16 9:26 UTC (permalink / raw) On Wed, Jul 15, 2015@09:28:23AM -0600, Jens Axboe wrote: > This chunk of code largely ends up being duplicated in several places with > your patches, might be a good idea to pull that into a helper function. Then > you can add the comment in one place as well :-) Basically all the bio_vec-related code in block-integrity is copy & paste of the I/O path. In the long run we'll need to refactor it by adding bio_vec iteration helpers that can pick different anchors from the bio. But for now I think I'd rather have this quick fix. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-07-16 9:26 ` Christoph Hellwig @ 2015-07-16 15:58 ` Jens Axboe 2015-07-19 15:18 ` Sagi Grimberg 2015-07-17 1:50 ` Martin K. Petersen 1 sibling, 1 reply; 29+ messages in thread From: Jens Axboe @ 2015-07-16 15:58 UTC (permalink / raw) On 07/16/2015 03:26 AM, Christoph Hellwig wrote: > On Wed, Jul 15, 2015@09:28:23AM -0600, Jens Axboe wrote: >> This chunk of code largely ends up being duplicated in several places with >> your patches, might be a good idea to pull that into a helper function. Then >> you can add the comment in one place as well :-) > > Basically all the bio_vec-related code in block-integrity is copy & > paste of the I/O path. In the long run we'll need to refactor it by > adding bio_vec iteration helpers that can pick different anchors from > the bio. But for now I think I'd rather have this quick fix. I wasn't asking for a refactor of the bio_vec integrity code (though that would be nice...), just that we don't add 3 more instances of 'does this gap with previous'. -- Jens Axboe ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-07-16 15:58 ` Jens Axboe @ 2015-07-19 15:18 ` Sagi Grimberg 2015-08-19 9:40 ` Christoph Hellwig 0 siblings, 1 reply; 29+ messages in thread From: Sagi Grimberg @ 2015-07-19 15:18 UTC (permalink / raw) On 7/16/2015 6:58 PM, Jens Axboe wrote: > On 07/16/2015 03:26 AM, Christoph Hellwig wrote: >> On Wed, Jul 15, 2015@09:28:23AM -0600, Jens Axboe wrote: >>> This chunk of code largely ends up being duplicated in several places >>> with >>> your patches, might be a good idea to pull that into a helper >>> function. Then >>> you can add the comment in one place as well :-) >> >> Basically all the bio_vec-related code in block-integrity is copy & >> paste of the I/O path. In the long run we'll need to refactor it by >> adding bio_vec iteration helpers that can pick different anchors from >> the bio. But for now I think I'd rather have this quick fix. > > I wasn't asking for a refactor of the bio_vec integrity code (though > that would be nice...), just that we don't add 3 more instances of 'does > this gap with previous'. > Hi Jens, I'll centralize the checks. No problem. Oddly enough I still see "gappy" IO when I'm running in the non-mq path. I suspect it's coming from the elevator code somehow... I assume no one tried to run it on non-mq right? Sagi. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-07-19 15:18 ` Sagi Grimberg @ 2015-08-19 9:40 ` Christoph Hellwig 2015-08-19 10:30 ` Sagi Grimberg 0 siblings, 1 reply; 29+ messages in thread From: Christoph Hellwig @ 2015-08-19 9:40 UTC (permalink / raw) On Sun, Jul 19, 2015@06:18:09PM +0300, Sagi Grimberg wrote: > I'll centralize the checks. No problem. Did you get a chance to respin these patches? It would be good to get them in this merge window. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-08-19 9:40 ` Christoph Hellwig @ 2015-08-19 10:30 ` Sagi Grimberg 2015-08-19 10:42 ` Christoph Hellwig 0 siblings, 1 reply; 29+ messages in thread From: Sagi Grimberg @ 2015-08-19 10:30 UTC (permalink / raw) On 8/19/2015 12:40 PM, Christoph Hellwig wrote: > On Sun, Jul 19, 2015@06:18:09PM +0300, Sagi Grimberg wrote: >> I'll centralize the checks. No problem. > > Did you get a chance to respin these patches? It would be > good to get them in this merge window. > Actually I didn't. I started to, but then I noticed that I was still seeing gaps when using cfq (i.e. the non-mq code path); I assume this was never tested? I was busy with the iser-target iWARP support with Steve, so I didn't get a chance to get back to it yet... Sagi. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-08-19 10:30 ` Sagi Grimberg @ 2015-08-19 10:42 ` Christoph Hellwig 2015-09-02 8:04 ` Sagi Grimberg 0 siblings, 1 reply; 29+ messages in thread From: Christoph Hellwig @ 2015-08-19 10:42 UTC (permalink / raw) On Wed, Aug 19, 2015@01:30:56PM +0300, Sagi Grimberg wrote: > Actually I didn't. I started to, but then I noticed that > I was still seeing gaps when using cfq (e.g. non-mq code > path), I assume that this was never tested? It probably wasn't. The only user so far is the NVMe driver which is blk-mq only. > Since I was busy with the iser-target iWARP support with Steve > so I didn't get a chance to get back to it yet... Ok. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-08-19 10:42 ` Christoph Hellwig @ 2015-09-02 8:04 ` Sagi Grimberg 2015-09-02 14:37 ` Jens Axboe 0 siblings, 1 reply; 29+ messages in thread From: Sagi Grimberg @ 2015-09-02 8:04 UTC (permalink / raw) On 8/19/2015 1:42 PM, Christoph Hellwig wrote: > On Wed, Aug 19, 2015@01:30:56PM +0300, Sagi Grimberg wrote: >> Actually I didn't. I started to, but then I noticed that >> I was still seeing gaps when using cfq (e.g. non-mq code >> path), I assume that this was never tested? > > It probably wasn't. The only user so far is the NVMe driver which > is blk-mq only. So I got back to having a look at this. I originally thought that this issue was specific to io schedulers, but I don't think it is anymore; it's just easier to trigger with io schedulers. It seems we are only protecting against gapped back merges (i.e. appending a gapped bio to a request biotail) but we are not protecting against front merges (i.e. adding a gapped bio to a request as the bio head). Imagine we have two bio_vec elements and the queue boundary is 0xfff: req_bvec: offset=0xe00 length=0x200 bio_bvec: offset=0x0 length=0x200 bvec_gap_to_prev() will allow back merging {req_bvec, bio_bvec} as bio_bvec->offset=0x0 and req_bvec->offset + req_bvec->length is aligned to the queue boundary, but the problem is we might do a front merge {bio_bvec, req_bvec} which gives us a gap. I'm able to reproduce this with iser with 512B sequential reads (which encourage gapped merges), but I wonder how this issue was missed in nvme (the code seems to allow front merges)? Anyway, the patch below seems to solve the issue for me: Comments? -- commit 100383e208045368b4e3ac20e3fa46bdad966df2 Author: Sagi Grimberg <sagig at mellanox.com> Date: Wed Sep 2 01:31:39 2015 +0300 block: Protect against front merges with gaps We seem to have missed gap detection on front merges. Introduce a req_gap_to_next() helper that detects a gap in a front merge. 
Signed-off-by: Sagi Grimberg <sagig at mellanox.com> diff --git a/block/blk-merge.c b/block/blk-merge.c index 09bf5cb..c95152a 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -327,9 +327,20 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req, return ll_new_hw_segment(q, req, bio); } +static int req_gap_to_next(struct request *req, struct bio *bio) +{ + struct bio *next = req->bio; + + return bvec_gap_to_prev(req->q, &bio->bi_io_vec[bio->bi_vcnt - 1], + next->bi_io_vec[0].bv_offset); +} + int ll_front_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { + if (req_gap_to_next(req, bio)) + return 0; + if (blk_rq_sectors(req) + bio_sectors(bio) > blk_rq_get_max_sectors(req)) { req->cmd_flags |= REQ_NOMERGE; -- ^ permalink raw reply related [flat|nested] 29+ messages in thread
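Sagi's example can be replayed in a standalone sketch of the gap rule (modeled after bvec_gap_to_prev() as it looked in this era; the struct and macro names here are stand-ins): the pair merges cleanly in back-merge order but not in front-merge order.

```c
#include <stdbool.h>

struct bvec { unsigned int bv_offset, bv_len; };

#define BOUNDARY_MASK 0xfffU	/* the 0xfff queue boundary from the example */

/*
 * Model of bvec_gap_to_prev(): merging an element that starts at
 * @next_offset after @prev leaves a gap unless @prev ends exactly on
 * the boundary and the next element starts at offset 0.
 */
static bool gap_to_prev(const struct bvec *prev, unsigned int next_offset)
{
	return next_offset != 0 ||
	       ((prev->bv_offset + prev->bv_len) & BOUNDARY_MASK) != 0;
}
```

With req_bvec = {0xe00, 0x200} and bio_bvec = {0x0, 0x200}, the back merge {req_bvec, bio_bvec} passes (req_bvec ends on the boundary and bio_bvec starts at 0), while the front merge {bio_bvec, req_bvec} is gapped (req_bvec starts at 0xe00) — the case ll_front_merge_fn() was not checking.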
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-09-02 8:04 ` Sagi Grimberg @ 2015-09-02 14:37 ` Jens Axboe 2015-09-02 17:30 ` Sagi Grimberg 0 siblings, 1 reply; 29+ messages in thread From: Jens Axboe @ 2015-09-02 14:37 UTC (permalink / raw) On 09/02/2015 02:04 AM, Sagi Grimberg wrote: > On 8/19/2015 1:42 PM, Christoph Hellwig wrote: >> On Wed, Aug 19, 2015@01:30:56PM +0300, Sagi Grimberg wrote: >>> Actually I didn't. I started to, but then I noticed that >>> I was still seeing gaps when using cfq (e.g. non-mq code >>> path), I assume that this was never tested? >> >> It probably wasn't. The only user so far is the NVMe driver which >> is blk-mq only. > > So I got back to have a look on this. I originally thought that > this issue was specific to io schedulers, but I don't think it is > anymore, its just easier to trigger with io schedulers. > > It seems we are only protecting against gapped back merges (i.e. > appending a gapped bio to a request biotail) but we are not protecting > against front merges (i.e. adding a gapped bio to request as the bio > head). > > Imagine we have two bio_vec elements and the queue boundary is 0xfff: > req_bvec: offset=0xe00 length=0x200 > bio_bvec: offset=0x0 length=0x200 > > bvec_gap_to_prev() will allow back merging {req_bvec, bio_bvec} as > bio_vec->offset=0x0 and req_bvec->offset + req_bvec->length is aligned > to the queue boundary, but the problem is we might do a front merge > {bio_bvec, req_bvec} which gives us a gap. > > I'm able to reproduce this with iser with 512B sequential reads > (encourage gapped merges) but I wonder how this issue was missed in > nvme (the code seem to allow front merges)? > > Anyway, the patch below seems to solved the issue for me: > > Comments? Weird, I guess front merging was overlooked when the initial patch was added. Looks correct to me, and as far as I can see, we have now got all bases covered. 
But there's room for some cleanup now, where we check is a bit of a mess. If we kill the check in blk_rq_merge_ok() and only do them in the front/back merge points (and the req-to-req case), then I think that would be an improvement. Does the below work for you? diff --git a/block/blk-merge.c b/block/blk-merge.c index 30a0d9f89017..d226bc5e3b8d 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -309,9 +309,39 @@ no_merge: return 0; } +static bool queue_gap_check(struct request_queue *q, struct bio *bio) +{ + if (q->queue_flags & (1 << QUEUE_FLAG_SG_GAPS) && bio_has_data(bio)) + return true; + + return false; +} + +static bool bio_gap_to_prev(struct bio *prev, struct bio *next) +{ + return bvec_gap_to_prev(&prev->bi_io_vec[prev->bi_vcnt - 1], + next->bi_io_vec[0].bv_offset); +} + +static bool req_gap_to_prev(struct request *req, struct bio *bio) +{ + return bio_gap_to_prev(req->biotail, bio); +} + +static bool req_gap_to_next(struct request *req, struct bio *bio) +{ + struct bio *next = req->bio; + + return bvec_gap_to_prev(&bio->bi_io_vec[bio->bi_vcnt - 1], + next->bi_io_vec[0].bv_offset); +} + int ll_back_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { + if (queue_gap_check(q, bio) && req_gap_to_prev(req, bio)) + return 0; + if (blk_rq_sectors(req) + bio_sectors(bio) > blk_rq_get_max_sectors(req)) { req->cmd_flags |= REQ_NOMERGE; @@ -330,6 +360,9 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req, int ll_front_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { + if (queue_gap_check(q, bio) && req_gap_to_next(req, bio)) + return 0; + if (blk_rq_sectors(req) + bio_sectors(bio) > blk_rq_get_max_sectors(req)) { req->cmd_flags |= REQ_NOMERGE; @@ -356,14 +389,6 @@ static bool req_no_special_merge(struct request *req) return !q->mq_ops && req->special; } -static int req_gap_to_prev(struct request *req, struct request *next) -{ - struct bio *prev = req->biotail; - - return 
bvec_gap_to_prev(&prev->bi_io_vec[prev->bi_vcnt - 1], - next->bio->bi_io_vec[0].bv_offset); -} - static int ll_merge_requests_fn(struct request_queue *q, struct request *req, struct request *next) { @@ -379,7 +404,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, return 0; if (test_bit(QUEUE_FLAG_SG_GAPS, &q->queue_flags) && - req_gap_to_prev(req, next)) + req_gap_to_prev(req, next->bio)) return 0; /* @@ -564,8 +589,6 @@ int blk_attempt_req_merge(struct request_queue *q, struct request *rq, bool blk_rq_merge_ok(struct request *rq, struct bio *bio) { - struct request_queue *q = rq->q; - if (!rq_mergeable(rq) || !bio_mergeable(bio)) return false; @@ -589,15 +612,6 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) !blk_write_same_mergeable(rq->bio, bio)) return false; - /* Only check gaps if the bio carries data */ - if (q->queue_flags & (1 << QUEUE_FLAG_SG_GAPS) && bio_has_data(bio)) { - struct bio_vec *bprev; - - bprev = &rq->biotail->bi_io_vec[rq->biotail->bi_vcnt - 1]; - if (bvec_gap_to_prev(bprev, bio->bi_io_vec[0].bv_offset)) - return false; - } - return true; } -- Jens Axboe ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-09-02 14:37 ` Jens Axboe @ 2015-09-02 17:30 ` Sagi Grimberg 2015-09-02 18:03 ` Jens Axboe 2015-09-02 19:18 ` Jens Axboe 0 siblings, 2 replies; 29+ messages in thread From: Sagi Grimberg @ 2015-09-02 17:30 UTC (permalink / raw) On 9/2/2015 5:37 PM, Jens Axboe wrote: > On 09/02/2015 02:04 AM, Sagi Grimberg wrote: >> On 8/19/2015 1:42 PM, Christoph Hellwig wrote: >>> On Wed, Aug 19, 2015@01:30:56PM +0300, Sagi Grimberg wrote: >>>> Actually I didn't. I started to, but then I noticed that >>>> I was still seeing gaps when using cfq (e.g. non-mq code >>>> path), I assume that this was never tested? >>> >>> It probably wasn't. The only user so far is the NVMe driver which >>> is blk-mq only. >> >> So I got back to have a look on this. I originally thought that >> this issue was specific to io schedulers, but I don't think it is >> anymore, its just easier to trigger with io schedulers. >> >> It seems we are only protecting against gapped back merges (i.e. >> appending a gapped bio to a request biotail) but we are not protecting >> against front merges (i.e. adding a gapped bio to request as the bio >> head). >> >> Imagine we have two bio_vec elements and the queue boundary is 0xfff: >> req_bvec: offset=0xe00 length=0x200 >> bio_bvec: offset=0x0 length=0x200 >> >> bvec_gap_to_prev() will allow back merging {req_bvec, bio_bvec} as >> bio_vec->offset=0x0 and req_bvec->offset + req_bvec->length is aligned >> to the queue boundary, but the problem is we might do a front merge >> {bio_bvec, req_bvec} which gives us a gap. >> >> I'm able to reproduce this with iser with 512B sequential reads >> (encourage gapped merges) but I wonder how this issue was missed in >> nvme (the code seem to allow front merges)? >> >> Anyway, the patch below seems to solved the issue for me: >> >> Comments? > > Weird, I guess front merging was overlooked when the initial patch was > added. 
Looks correct to me, and as far as I can see, we have now got all > bases covered. > > But there's room for some cleanup now, where we check is a bit of a > mess. If we kill the check in blk_rq_merge_ok() and only do them in the > front/back merge points (and the req-to-req case), then I think that > would be an improvement. Yea, we do get to checking back merges even for front merges in this point... > Does the below work for you? It's not on top of Keith virt_boundary patch right? First glance looks ok though. > > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 30a0d9f89017..d226bc5e3b8d 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -309,9 +309,39 @@ no_merge: > return 0; > } > > +static bool queue_gap_check(struct request_queue *q, struct bio *bio) > +{ > + if (q->queue_flags & (1 << QUEUE_FLAG_SG_GAPS) && bio_has_data(bio)) > + return true; > + > + return false; > +} > + > +static bool bio_gap_to_prev(struct bio *prev, struct bio *next) > +{ > + return bvec_gap_to_prev(&prev->bi_io_vec[prev->bi_vcnt - 1], > + next->bi_io_vec[0].bv_offset); > +} > + > +static bool req_gap_to_prev(struct request *req, struct bio *bio) > +{ > + return bio_gap_to_prev(req->biotail, bio); > +} > + > +static bool req_gap_to_next(struct request *req, struct bio *bio) > +{ > + struct bio *next = req->bio; > + > + return bvec_gap_to_prev(&bio->bi_io_vec[bio->bi_vcnt - 1], > + next->bi_io_vec[0].bv_offset); > +} > + > int ll_back_merge_fn(struct request_queue *q, struct request *req, > struct bio *bio) > { > + if (queue_gap_check(q, bio) && req_gap_to_prev(req, bio)) > + return 0; > + > if (blk_rq_sectors(req) + bio_sectors(bio) > > blk_rq_get_max_sectors(req)) { > req->cmd_flags |= REQ_NOMERGE; > @@ -330,6 +360,9 @@ int ll_back_merge_fn(struct request_queue *q, struct > request *req, > int ll_front_merge_fn(struct request_queue *q, struct request *req, > struct bio *bio) > { > + if (queue_gap_check(q, bio) && req_gap_to_next(req, bio)) > + return 0; > + > 
if (blk_rq_sectors(req) + bio_sectors(bio) > > blk_rq_get_max_sectors(req)) { > req->cmd_flags |= REQ_NOMERGE; > @@ -356,14 +389,6 @@ static bool req_no_special_merge(struct request *req) > return !q->mq_ops && req->special; > } > > -static int req_gap_to_prev(struct request *req, struct request *next) > -{ > - struct bio *prev = req->biotail; > - > - return bvec_gap_to_prev(&prev->bi_io_vec[prev->bi_vcnt - 1], > - next->bio->bi_io_vec[0].bv_offset); > -} > - > static int ll_merge_requests_fn(struct request_queue *q, struct request > *req, > struct request *next) > { > @@ -379,7 +404,7 @@ static int ll_merge_requests_fn(struct request_queue > *q, struct request *req, > return 0; > > if (test_bit(QUEUE_FLAG_SG_GAPS, &q->queue_flags) && > - req_gap_to_prev(req, next)) > + req_gap_to_prev(req, next->bio)) > return 0; > > /* > @@ -564,8 +589,6 @@ int blk_attempt_req_merge(struct request_queue *q, > struct request *rq, > > bool blk_rq_merge_ok(struct request *rq, struct bio *bio) > { > - struct request_queue *q = rq->q; > - > if (!rq_mergeable(rq) || !bio_mergeable(bio)) > return false; > > @@ -589,15 +612,6 @@ bool blk_rq_merge_ok(struct request *rq, struct bio > *bio) > !blk_write_same_mergeable(rq->bio, bio)) > return false; > > - /* Only check gaps if the bio carries data */ > - if (q->queue_flags & (1 << QUEUE_FLAG_SG_GAPS) && bio_has_data(bio)) { > - struct bio_vec *bprev; > - > - bprev = &rq->biotail->bi_io_vec[rq->biotail->bi_vcnt - 1]; > - if (bvec_gap_to_prev(bprev, bio->bi_io_vec[0].bv_offset)) > - return false; > - } > - > return true; > } > > ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-09-02 17:30 ` Sagi Grimberg @ 2015-09-02 18:03 ` Jens Axboe 2015-09-02 19:18 ` Jens Axboe 1 sibling, 0 replies; 29+ messages in thread From: Jens Axboe @ 2015-09-02 18:03 UTC (permalink / raw) On 09/02/2015 11:30 AM, Sagi Grimberg wrote: > On 9/2/2015 5:37 PM, Jens Axboe wrote: >> On 09/02/2015 02:04 AM, Sagi Grimberg wrote: >>> On 8/19/2015 1:42 PM, Christoph Hellwig wrote: >>>> On Wed, Aug 19, 2015@01:30:56PM +0300, Sagi Grimberg wrote: >>>>> Actually I didn't. I started to, but then I noticed that >>>>> I was still seeing gaps when using cfq (e.g. non-mq code >>>>> path), I assume that this was never tested? >>>> >>>> It probably wasn't. The only user so far is the NVMe driver which >>>> is blk-mq only. >>> >>> So I got back to have a look on this. I originally thought that >>> this issue was specific to io schedulers, but I don't think it is >>> anymore, its just easier to trigger with io schedulers. >>> >>> It seems we are only protecting against gapped back merges (i.e. >>> appending a gapped bio to a request biotail) but we are not protecting >>> against front merges (i.e. adding a gapped bio to request as the bio >>> head). >>> >>> Imagine we have two bio_vec elements and the queue boundary is 0xfff: >>> req_bvec: offset=0xe00 length=0x200 >>> bio_bvec: offset=0x0 length=0x200 >>> >>> bvec_gap_to_prev() will allow back merging {req_bvec, bio_bvec} as >>> bio_vec->offset=0x0 and req_bvec->offset + req_bvec->length is aligned >>> to the queue boundary, but the problem is we might do a front merge >>> {bio_bvec, req_bvec} which gives us a gap. >>> >>> I'm able to reproduce this with iser with 512B sequential reads >>> (encourage gapped merges) but I wonder how this issue was missed in >>> nvme (the code seem to allow front merges)? >>> >>> Anyway, the patch below seems to solved the issue for me: >>> >>> Comments? 
>> >> Weird, I guess front merging was overlooked when the initial patch was >> added. Looks correct to me, and as far as I can see, we have now got all >> bases covered. >> >> But there's room for some cleanup now, where we check is a bit of a >> mess. If we kill the check in blk_rq_merge_ok() and only do them in the >> front/back merge points (and the req-to-req case), then I think that >> would be an improvement. > > > Yea, we do get to checking back merges even for front merges in this > point... Yup >> Does the below work for you? > > It's not on top of Keith virt_boundary patch right? > First glance looks ok though. Looks like I was on the wrong branch, master. Let me update it for post that patch. -- Jens Axboe ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-09-02 17:30 ` Sagi Grimberg 2015-09-02 18:03 ` Jens Axboe @ 2015-09-02 19:18 ` Jens Axboe 2015-09-03 9:07 ` Sagi Grimberg 1 sibling, 1 reply; 29+ messages in thread From: Jens Axboe @ 2015-09-02 19:18 UTC (permalink / raw) On Wed, Sep 02 2015, Sagi Grimberg wrote: > >Does the below work for you? > > It's not on top of Keith virt_boundary patch right? > First glance looks ok though. Updated for that. diff --git a/block/blk-merge.c b/block/blk-merge.c index b2625271a572..5f35a05337b1 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -436,9 +436,27 @@ no_merge: return 0; } +static bool req_gap_to_prev(struct request *req, struct bio *next) +{ + struct bio *prev = req->biotail; + + return bvec_gap_to_prev(req->q, &prev->bi_io_vec[prev->bi_vcnt - 1], + next->bi_io_vec[0].bv_offset); +} + +static bool req_gap_to_next(struct request *req, struct bio *bio) +{ + struct bio *next = req->bio; + + return bvec_gap_to_prev(req->q, &bio->bi_io_vec[bio->bi_vcnt - 1], + next->bi_io_vec[0].bv_offset); +} + int ll_back_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { + if (bio_has_data(bio) && req_gap_to_prev(req, bio)) + return 0; if (blk_rq_sectors(req) + bio_sectors(bio) > blk_rq_get_max_sectors(req)) { req->cmd_flags |= REQ_NOMERGE; @@ -457,6 +475,9 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req, int ll_front_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { + + if (bio_has_data(bio) && req_gap_to_next(req, bio)) + return 0; if (blk_rq_sectors(req) + bio_sectors(bio) > blk_rq_get_max_sectors(req)) { req->cmd_flags |= REQ_NOMERGE; @@ -483,14 +504,6 @@ static bool req_no_special_merge(struct request *req) return !q->mq_ops && req->special; } -static int req_gap_to_prev(struct request *req, struct bio *next) -{ - struct bio *prev = req->biotail; - - return bvec_gap_to_prev(req->q, &prev->bi_io_vec[prev->bi_vcnt - 1], - 
next->bi_io_vec[0].bv_offset); -} - static int ll_merge_requests_fn(struct request_queue *q, struct request *req, struct request *next) { @@ -713,10 +726,6 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) !blk_write_same_mergeable(rq->bio, bio)) return false; - /* Only check gaps if the bio carries data */ - if (bio_has_data(bio) && req_gap_to_prev(rq, bio)) - return false; - return true; } -- Jens Axboe ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-09-02 19:18 ` Jens Axboe @ 2015-09-03 9:07 ` Sagi Grimberg 2015-09-03 14:53 ` Jens Axboe 2015-09-03 15:04 ` Jens Axboe 0 siblings, 2 replies; 29+ messages in thread From: Sagi Grimberg @ 2015-09-03 9:07 UTC (permalink / raw) On 9/2/2015 10:18 PM, Jens Axboe wrote: > On Wed, Sep 02 2015, Sagi Grimberg wrote: >>> Does the below work for you? >> >> It's not on top of Keith virt_boundary patch right? >> First glance looks ok though. > > Updated for that. > Thanks Jens, I'll test it. > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index b2625271a572..5f35a05337b1 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -436,9 +436,27 @@ no_merge: > return 0; > } > > +static bool req_gap_to_prev(struct request *req, struct bio *next) > +{ > + struct bio *prev = req->biotail; > + > + return bvec_gap_to_prev(req->q, &prev->bi_io_vec[prev->bi_vcnt - 1], > + next->bi_io_vec[0].bv_offset); > +} > + > +static bool req_gap_to_next(struct request *req, struct bio *bio) > +{ > + struct bio *next = req->bio; > + > + return bvec_gap_to_prev(req->q, &bio->bi_io_vec[bio->bi_vcnt - 1], > + next->bi_io_vec[0].bv_offset); > +} > + > int ll_back_merge_fn(struct request_queue *q, struct request *req, > struct bio *bio) > { > + if (bio_has_data(bio) && req_gap_to_prev(req, bio)) > + return 0; > if (blk_rq_sectors(req) + bio_sectors(bio) > > blk_rq_get_max_sectors(req)) { > req->cmd_flags |= REQ_NOMERGE; > @@ -457,6 +475,9 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req, > int ll_front_merge_fn(struct request_queue *q, struct request *req, > struct bio *bio) > { > + > + if (bio_has_data(bio) && req_gap_to_next(req, bio)) > + return 0; I think it will be cleaner to put the bio_has_data(bio) check into req_gap_to_[prev|next] ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-09-03 9:07 ` Sagi Grimberg @ 2015-09-03 14:53 ` Jens Axboe 2015-09-03 15:04 ` Jens Axboe 1 sibling, 0 replies; 29+ messages in thread From: Jens Axboe @ 2015-09-03 14:53 UTC (permalink / raw) On 09/03/2015 03:07 AM, Sagi Grimberg wrote: > On 9/2/2015 10:18 PM, Jens Axboe wrote: >> On Wed, Sep 02 2015, Sagi Grimberg wrote: >>>> Does the below work for you? >>> >>> It's not on top of Keith virt_boundary patch right? >>> First glance looks ok though. >> >> Updated for that. >> > > Thanks Jens, > > I'll test it. > >> >> diff --git a/block/blk-merge.c b/block/blk-merge.c >> index b2625271a572..5f35a05337b1 100644 >> --- a/block/blk-merge.c >> +++ b/block/blk-merge.c >> @@ -436,9 +436,27 @@ no_merge: >> return 0; >> } >> >> +static bool req_gap_to_prev(struct request *req, struct bio *next) >> +{ >> + struct bio *prev = req->biotail; >> + >> + return bvec_gap_to_prev(req->q, &prev->bi_io_vec[prev->bi_vcnt - 1], >> + next->bi_io_vec[0].bv_offset); >> +} >> + >> +static bool req_gap_to_next(struct request *req, struct bio *bio) >> +{ >> + struct bio *next = req->bio; >> + >> + return bvec_gap_to_prev(req->q, &bio->bi_io_vec[bio->bi_vcnt - 1], >> + next->bi_io_vec[0].bv_offset); >> +} >> + >> int ll_back_merge_fn(struct request_queue *q, struct request *req, >> struct bio *bio) >> { >> + if (bio_has_data(bio) && req_gap_to_prev(req, bio)) >> + return 0; >> if (blk_rq_sectors(req) + bio_sectors(bio) > >> blk_rq_get_max_sectors(req)) { >> req->cmd_flags |= REQ_NOMERGE; >> @@ -457,6 +475,9 @@ int ll_back_merge_fn(struct request_queue *q, >> struct request *req, >> int ll_front_merge_fn(struct request_queue *q, struct request *req, >> struct bio *bio) >> { >> + >> + if (bio_has_data(bio) && req_gap_to_next(req, bio)) >> + return 0; > > I think it will be cleaner to git bio_has_data(bio) check into > req_gap_to_[prev|next] Yeah I agree, I'll make that change. 
-- Jens Axboe ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-09-03 9:07 ` Sagi Grimberg 2015-09-03 14:53 ` Jens Axboe @ 2015-09-03 15:04 ` Jens Axboe 2015-09-03 15:41 ` Sagi Grimberg 1 sibling, 1 reply; 29+ messages in thread From: Jens Axboe @ 2015-09-03 15:04 UTC (permalink / raw) On Thu, Sep 03 2015, Sagi Grimberg wrote: > On 9/2/2015 10:18 PM, Jens Axboe wrote: > >On Wed, Sep 02 2015, Sagi Grimberg wrote: > >>>Does the below work for you? > >> > >>It's not on top of Keith virt_boundary patch right? > >>First glance looks ok though. > > > >Updated for that. > > > > Thanks Jens, > > I'll test it. Cleaned up version, unified the gap checking and changed the names of the function to make it easier to understand. And then we only need to check bio_has_data() in bio_will_gap(). Functionally it should be identical to the previous one. diff --git a/block/blk-merge.c b/block/blk-merge.c index cce23ba1ae5f..bf9c45ddcd11 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -435,9 +435,31 @@ no_merge: return 0; } +static bool bio_will_gap(struct request_queue *q, struct bio *prev, + struct bio *next) +{ + if (!bio_has_data(prev)) + return false; + + return bvec_gap_to_prev(q, &prev->bi_io_vec[prev->bi_vcnt - 1], + next->bi_io_vec[0].bv_offset); +} + +static bool req_gap_back_merge(struct request *req, struct bio *next) +{ + return bio_will_gap(req->q, req->biotail, next); +} + +static bool req_gap_front_merge(struct request *req, struct bio *bio) +{ + return bio_will_gap(req->q, bio, req->bio); +} + int ll_back_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { + if (req_gap_back_merge(req, bio)) + return 0; if (blk_rq_sectors(req) + bio_sectors(bio) > blk_rq_get_max_sectors(req)) { req->cmd_flags |= REQ_NOMERGE; @@ -456,6 +478,9 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req, int ll_front_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { + + if (req_gap_front_merge(req, bio)) + 
return 0; if (blk_rq_sectors(req) + bio_sectors(bio) > blk_rq_get_max_sectors(req)) { req->cmd_flags |= REQ_NOMERGE; @@ -482,14 +507,6 @@ static bool req_no_special_merge(struct request *req) return !q->mq_ops && req->special; } -static int req_gap_to_prev(struct request *req, struct bio *next) -{ - struct bio *prev = req->biotail; - - return bvec_gap_to_prev(req->q, &prev->bi_io_vec[prev->bi_vcnt - 1], - next->bi_io_vec[0].bv_offset); -} - static int ll_merge_requests_fn(struct request_queue *q, struct request *req, struct request *next) { @@ -504,7 +521,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, if (req_no_special_merge(req) || req_no_special_merge(next)) return 0; - if (req_gap_to_prev(req, next->bio)) + if (req_gap_back_merge(req, next->bio)) return 0; /* @@ -712,10 +729,6 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) !blk_write_same_mergeable(rq->bio, bio)) return false; - /* Only check gaps if the bio carries data */ - if (bio_has_data(bio) && req_gap_to_prev(rq, bio)) - return false; - return true; } -- Jens Axboe ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-09-03 15:04 ` Jens Axboe @ 2015-09-03 15:41 ` Sagi Grimberg 2015-09-03 15:52 ` Jens Axboe 0 siblings, 1 reply; 29+ messages in thread From: Sagi Grimberg @ 2015-09-03 15:41 UTC (permalink / raw) On 9/3/2015 6:04 PM, Jens Axboe wrote: > On Thu, Sep 03 2015, Sagi Grimberg wrote: >> On 9/2/2015 10:18 PM, Jens Axboe wrote: >>> On Wed, Sep 02 2015, Sagi Grimberg wrote: >>>>> Does the below work for you? >>>> >>>> It's not on top of Keith virt_boundary patch right? >>>> First glance looks ok though. >>> >>> Updated for that. >>> >> >> Thanks Jens, >> >> I'll test it. > > Cleaned up version, unified the gap checking and changed the names of > the function to make it easier to understand. And then we only need to > check bio_has_data() in bio_will_gap(). :) I was just going to say that maybe we should keep the check explicit.. I am going to fix this also for integrity payload [1], and there I need to check for blk_integrity_rq(req) explicitly because it doesn't really make sense to call an integrity gap checker if you don't have integrity... Also, let's move the gap helpers to be static inline in blkdev.h so I can put my integrity gap helpers there too. Anyway, it's just cosmetics... Let me include your patch in a series I'm planning on sending soon enough as I don't see merges anymore so I guess the issue is gone now. [1]: commit 48679ce5f6ffe6827d40287b521ea139cdf95ff7 Author: Sagi Grimberg <sagig at mellanox.com> Date: Wed Jul 15 15:06:04 2015 +0300 block: Refuse request/bio merges with gaps in the integrity payload If a driver sets the block queue virtual boundary, it means that it cannot handle gaps so we must not allow those in the integrity payload as well. 
Signed-off-by: Sagi Grimberg <sagig at mellanox.com> diff --git a/block/blk-integrity.c b/block/blk-integrity.c index f548b64..eee1d74 100644 --- a/block/blk-integrity.c +++ b/block/blk-integrity.c @@ -204,6 +204,9 @@ bool blk_integrity_merge_rq(struct request_queue *q, struct request *req, q->limits.max_integrity_segments) return false; + if (integrity_req_gap_to_prev(req, next->bio)) + return false; + return true; } EXPORT_SYMBOL(blk_integrity_merge_rq); diff --git a/block/blk-merge.c b/block/blk-merge.c index e6b426f..1a3f105 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -315,6 +315,10 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req, if (bio_has_data(bio) && req_gap_to_prev(req, bio)) return 0; + if (blk_integrity_rq(req) && + integrity_req_gap_to_prev(req, bio)) + return 0; + if (blk_rq_sectors(req) + bio_sectors(bio) > blk_rq_get_max_sectors(req)) { req->cmd_flags |= REQ_NOMERGE; @@ -336,6 +340,10 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req, if (bio_has_data(bio) && req_gap_to_next(req, bio)) return 0; + if (blk_integrity_rq(req) && + integrity_req_gap_to_next(req, bio)) + return 0; + if (blk_rq_sectors(req) + bio_sectors(bio) > blk_rq_get_max_sectors(req)) { req->cmd_flags |= REQ_NOMERGE; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b1bb60a..ad5a21a 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1389,6 +1389,26 @@ static inline bool req_gap_to_next(struct request *req, struct bio *bio) next->bi_io_vec[0].bv_offset); } +static inline bool integrity_req_gap_to_prev(struct request *req, + struct bio *next) +{ + struct bio_integrity_payload *bip = bio_integrity(req->bio); + struct bio_integrity_payload *bip_next = bio_integrity(next); + + return bvec_gap_to_prev(req->q, &bip->bip_vec[bip->bip_vcnt - 1], + bip_next->bip_vec[0].bv_offset); +} + +static inline bool integrity_req_gap_to_next(struct request *req, + struct bio *bio) +{ + 
struct bio_integrity_payload *bip = bio_integrity(bio); + struct bio_integrity_payload *bip_next = bio_integrity(req->bio); + + return bvec_gap_to_prev(req->q, &bip->bip_vec[bip->bip_vcnt - 1], + bip_next->bip_vec[0].bv_offset); +} struct work_struct; int kblockd_schedule_work(struct work_struct *work); int kblockd_schedule_delayed_work(struct delayed_work *dwork, unsigned long delay); -- ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-09-03 15:41 ` Sagi Grimberg @ 2015-09-03 15:52 ` Jens Axboe 0 siblings, 0 replies; 29+ messages in thread From: Jens Axboe @ 2015-09-03 15:52 UTC (permalink / raw) On 09/03/2015 09:41 AM, Sagi Grimberg wrote: > On 9/3/2015 6:04 PM, Jens Axboe wrote: >> On Thu, Sep 03 2015, Sagi Grimberg wrote: >>> On 9/2/2015 10:18 PM, Jens Axboe wrote: >>>> On Wed, Sep 02 2015, Sagi Grimberg wrote: >>>>>> Does the below work for you? >>>>> >>>>> It's not on top of Keith virt_boundary patch right? >>>>> First glance looks ok though. >>>> >>>> Updated for that. >>>> >>> >>> Thanks Jens, >>> >>> I'll test it. >> >> Cleaned up version, unified the gap checking and changed the names of >> the function to make it easier to understand. And then we only need to >> check bio_has_data() in bio_will_gap(). > > :) > > I was just going to say that maybe we should keep the check > explicit.. Hehe. I think it makes sense to check whether it carries data or not in the gap check, it's somewhat different than the integrity check. > I am going to fix this also for integrity payload [1], and there > I need to check for blk_integrity_rq(req) explicitly because > it doesn't really makes sense to call an integrity gap checker > if you don't have integrity... > > Also, lets move the gap helpers to be static inline in blkdev.h > so I can put my integrity gap helpers there too. > > Anyway, its just cosmetics... > > Let me include your patch in a series I'm planning on > sending soon enough as I don't see merges anymore so I guess > the issue is gone now. Updated patch below. Let me know when you have sufficient confidence in it working, and I will add your reviewed/tested/whatever-by and we can queue it up for this series. 
> commit 48679ce5f6ffe6827d40287b521ea139cdf95ff7 > Author: Sagi Grimberg <sagig at mellanox.com> > Date: Wed Jul 15 15:06:04 2015 +0300 > > block: Refuse request/bio merges with gaps in the integrity payload > > If a driver sets the block queue virtual boundary, it means that > it cannot handle gaps so we must not allow those in the integrity > payload as well. > > Signed-off-by: Sagi Grimberg <sagig at mellanox.com> > > diff --git a/block/blk-integrity.c b/block/blk-integrity.c > index f548b64..eee1d74 100644 > --- a/block/blk-integrity.c > +++ b/block/blk-integrity.c > @@ -204,6 +204,9 @@ bool blk_integrity_merge_rq(struct request_queue *q, > struct request *req, > q->limits.max_integrity_segments) > return false; > > + if (integrity_req_gap_to_prev(req, next->bio)) > + return false; > + > return true; > } > EXPORT_SYMBOL(blk_integrity_merge_rq); > diff --git a/block/blk-merge.c b/block/blk-merge.c > index e6b426f..1a3f105 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -315,6 +315,10 @@ int ll_back_merge_fn(struct request_queue *q, > struct request *req, > if (bio_has_data(bio) && req_gap_to_prev(req, bio)) > return 0; > > + if (blk_integrity_rq(req) && > + integrity_req_gap_to_prev(req, bio)) > + return 0; > + > if (blk_rq_sectors(req) + bio_sectors(bio) > > blk_rq_get_max_sectors(req)) { > req->cmd_flags |= REQ_NOMERGE; > @@ -336,6 +340,10 @@ int ll_front_merge_fn(struct request_queue *q, > struct request *req, > if (bio_has_data(bio) && req_gap_to_next(req, bio)) > return 0; > > + if (blk_integrity_rq(req) && > + integrity_req_gap_to_next(req, bio)) > + return 0; > + > if (blk_rq_sectors(req) + bio_sectors(bio) > > blk_rq_get_max_sectors(req)) { > req->cmd_flags |= REQ_NOMERGE; > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h > index b1bb60a..ad5a21a 100644 > --- a/include/linux/blkdev.h > +++ b/include/linux/blkdev.h > @@ -1389,6 +1389,26 @@ static inline bool req_gap_to_next(struct request > 
*req, struct bio *bio) > next->bi_io_vec[0].bv_offset); > } > > +static inline bool integrity_req_gap_to_prev(struct request *req, > + struct bio *next) > +{ > + struct bio_integrity_payload *bip = bio_integrity(req->bio); > + struct bio_integrity_payload *bip_next = bio_integrity(next); > + > + return bvec_gap_to_prev(req->q, &bip->bip_vec[bip->bip_vcnt - 1], > + bip_next->bip_vec[0].bv_offset); > +} > + > +static inline bool integrity_req_gap_to_next(struct request *req, > + struct bio *bio) > +{ > + struct bio_integrity_payload *bip = bio_integrity(bio); > + struct bio_integrity_payload *bip_next = bio_integrity(req->bio); > + > + return bvec_gap_to_prev(req->q, &bip->bip_vec[bip->bip_vcnt - 1], > + bip_next->bip_vec[0].bv_offset); > +} > struct work_struct; > int kblockd_schedule_work(struct work_struct *work); > int kblockd_schedule_delayed_work(struct delayed_work *dwork, unsigned > long delay); Looks good to me. diff --git a/block/blk-merge.c b/block/blk-merge.c index cce23ba1ae5f..d9eddbc189f5 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -438,6 +438,8 @@ no_merge: int ll_back_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { + if (req_gap_back_merge(req, bio)) + return 0; if (blk_rq_sectors(req) + bio_sectors(bio) > blk_rq_get_max_sectors(req)) { req->cmd_flags |= REQ_NOMERGE; @@ -456,6 +458,9 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req, int ll_front_merge_fn(struct request_queue *q, struct request *req, struct bio *bio) { + + if (req_gap_front_merge(req, bio)) + return 0; if (blk_rq_sectors(req) + bio_sectors(bio) > blk_rq_get_max_sectors(req)) { req->cmd_flags |= REQ_NOMERGE; @@ -482,14 +487,6 @@ static bool req_no_special_merge(struct request *req) return !q->mq_ops && req->special; } -static int req_gap_to_prev(struct request *req, struct bio *next) -{ - struct bio *prev = req->biotail; - - return bvec_gap_to_prev(req->q, &prev->bi_io_vec[prev->bi_vcnt - 1], - next->bi_io_vec[0].bv_offset); 
-} - static int ll_merge_requests_fn(struct request_queue *q, struct request *req, struct request *next) { @@ -504,7 +501,7 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req, if (req_no_special_merge(req) || req_no_special_merge(next)) return 0; - if (req_gap_to_prev(req, next->bio)) + if (req_gap_back_merge(req, next->bio)) return 0; /* @@ -712,10 +709,6 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio) !blk_write_same_mergeable(rq->bio, bio)) return false; - /* Only check gaps if the bio carries data */ - if (bio_has_data(bio) && req_gap_to_prev(rq, bio)) - return false; - return true; } diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index a622f270f09e..0a362ed03c99 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1368,6 +1368,26 @@ static inline bool bvec_gap_to_prev(struct request_queue *q, ((bprv->bv_offset + bprv->bv_len) & queue_virt_boundary(q)); } +static inline bool bio_will_gap(struct request_queue *q, struct bio *prev, + struct bio *next) +{ + if (!bio_has_data(prev)) + return false; + + return bvec_gap_to_prev(q, &prev->bi_io_vec[prev->bi_vcnt - 1], + next->bi_io_vec[0].bv_offset); +} + +static inline bool req_gap_back_merge(struct request *req, struct bio *next) +{ + return bio_will_gap(req->q, req->biotail, next); +} + +static inline bool req_gap_front_merge(struct request *req, struct bio *bio) +{ + return bio_will_gap(req->q, bio, req->bio); +} + struct work_struct; int kblockd_schedule_work(struct work_struct *work); int kblockd_schedule_delayed_work(struct delayed_work *dwork, unsigned long delay); -- Jens Axboe ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio 2015-07-16 9:26 ` Christoph Hellwig 2015-07-16 15:58 ` Jens Axboe @ 2015-07-17 1:50 ` Martin K. Petersen 1 sibling, 0 replies; 29+ messages in thread From: Martin K. Petersen @ 2015-07-17 1:50 UTC (permalink / raw) >>>>> "Christoph" == Christoph Hellwig <hch at infradead.org> writes: Christoph> In the long run we'll need to refactor it by adding bio_vec Christoph> iteration helpers that can pick different anchors from the Christoph> bio. Yep. Always wanted to avoid that duplication. Keith's work made it easier. I just haven't taken things one step further yet... -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 0/3] Fixes for gapped scatters 2015-07-15 13:19 [PATCH 0/3] Fixes for gapped scatters Sagi Grimberg ` (2 preceding siblings ...) 2015-07-15 13:19 ` [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio Sagi Grimberg @ 2015-07-17 1:39 ` Martin K. Petersen 3 siblings, 0 replies; 29+ messages in thread From: Martin K. Petersen @ 2015-07-17 1:39 UTC (permalink / raw) >>>>> "Sagi" == Sagi Grimberg <sagig at mellanox.com> writes: Sagi> This small set addresses some issues with gapped scattered Sagi> IOs. The motivation of this is to have the iser driver take Sagi> advantage of it too. Sagi> Patch 1 prevent gapped SG_IO if the driver can't handle gaps. Sagi> Patches 2,3 prevent gapped integrity payloads (for drivers that Sagi> support data integrity). Back in the prototyping days of DIX we dealt with several controllers that had various oddball scatterlist requirements for PI. The only workaround we found that performed acceptably was to grab a page and bounce the PI to that. In the end none of these controllers ended up shipping, and DIX explicitly requires a controller to be able to handle a separate scatterlist element per PI interval. I know that for NVMe latency is king and so on. But I do question the sanity of having to issue a flurry 512+8 I/Os instead of quarter of a meg and a page of PI. You're at the whim of the kernel memory allocator and for a long time slab or slub was allocating backwards so you never ended up being able to merge PI scatterlist entries. Also, if you were to go down this path you'd have to handle cases where a bio itself would need to get split. The PI for two contiguous 512-byte data blocks in a bio_vec entry could reside in separate integrity pages at offsets that would make them "gappy". -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2015-09-03 15:52 UTC | newest] Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-07-15 13:19 [PATCH 0/3] Fixes for gapped scatters Sagi Grimberg 2015-07-15 13:19 ` [PATCH 1/3] block: copy multi iovec user mappings if QUEUE_FLAG_SG_GAPS is set Sagi Grimberg 2015-07-16 16:47 ` Keith Busch 2015-07-16 16:49 ` Jens Axboe 2015-07-16 19:47 ` Matthew Wilcox 2015-07-17 15:26 ` Jens Axboe 2015-07-17 7:44 ` Christoph Hellwig 2015-07-17 7:43 ` Christoph Hellwig 2015-07-15 13:19 ` [PATCH 2/3] block: Refuse request/bio merges with gaps in the integrity payload Sagi Grimberg 2015-07-15 13:19 ` [PATCH 3/3] block: Refuse adding appending a gapped integrity page to a bio Sagi Grimberg 2015-07-15 15:28 ` Jens Axboe 2015-07-16 9:26 ` Christoph Hellwig 2015-07-16 15:58 ` Jens Axboe 2015-07-19 15:18 ` Sagi Grimberg 2015-08-19 9:40 ` Christoph Hellwig 2015-08-19 10:30 ` Sagi Grimberg 2015-08-19 10:42 ` Christoph Hellwig 2015-09-02 8:04 ` Sagi Grimberg 2015-09-02 14:37 ` Jens Axboe 2015-09-02 17:30 ` Sagi Grimberg 2015-09-02 18:03 ` Jens Axboe 2015-09-02 19:18 ` Jens Axboe 2015-09-03 9:07 ` Sagi Grimberg 2015-09-03 14:53 ` Jens Axboe 2015-09-03 15:04 ` Jens Axboe 2015-09-03 15:41 ` Sagi Grimberg 2015-09-03 15:52 ` Jens Axboe 2015-07-17 1:50 ` Martin K. Petersen 2015-07-17 1:39 ` [PATCH 0/3] Fixes for gapped scatters Martin K. Petersen