All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch] dm: propagate QUEUE_FLAG_NO_SG_MERGE
@ 2014-08-08 15:03 Jeff Moyer
  2014-08-08 15:24 ` Greg KH
  2014-08-11  1:11 ` Mike Snitzer
  0 siblings, 2 replies; 5+ messages in thread
From: Jeff Moyer @ 2014-08-08 15:03 UTC (permalink / raw)
  To: axboe, dm-devel, linux-kernel; +Cc: stable

Hi,

Commit 05f1dd5 introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.
This gets set by default in blk_mq_init_queue for mq-enabled devices.
The effect of the flag is to bypass the SG segment merging.  Instead,
the bio->bi_vcnt is used as the number of hardware segments.

With a device mapper target on top of a device with
QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
than a driver is prepared to handle.  I ran into this when backporting
the virtio_blk mq support.  It triggerred this BUG_ON, in
virtio_queue_rq:

        BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);

The queue's max is set here:
        blk_queue_max_segments(q, vblk->sg_elems-2);

Basically, what happens is that a bio is built up for the dm device
(which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using
bio_add_page.  That path will call into __blk_recalc_rq_segments, so
what you end up with is bi_phys_segments being much smaller than bi_vcnt
(and bi_vcnt grows beyond the maximum sg elements).  Then, when the bio
is submitted, it gets cloned.  When the cloned bio is submitted, it will
end up in blk_recount_segments, here:

        if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
                bio->bi_phys_segments = bio->bi_vcnt;

and now we've set bio->bi_phys_segments to a number that is beyond what
was registered as queue_max_segments by the driver.

The right way to fix this is to propagate the queue flag up the stack.
Attached is a patch that does this, tested and confirmed to fix the
problem in my environment.

The rules for propagating the flag are simple:
- if the flag is set for any underlying device, it must be set for the
  upper device
- consequently, if the flag is not set for any underlying device, it
  should not be set for the upper device.

stable notes: this patch should be applied to 3.16.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 5f59f1e..bf756d1 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1386,6 +1386,14 @@ static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
 	return q && !blk_queue_add_random(q);
 }
 
+static int queue_supports_sg_merge(struct dm_target *ti, struct dm_dev *dev,
+				   sector_t start, sector_t len, void *data)
+{
+	struct request_queue *q = bdev_get_queue(dev->bdev);
+
+	return q && !test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags);
+}
+
 static bool dm_table_all_devices_attribute(struct dm_table *t,
 					   iterate_devices_callout_fn func)
 {
@@ -1464,6 +1472,9 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	if (!dm_table_supports_write_same(t))
 		q->limits.max_write_same_sectors = 0;
 
+	if (!dm_table_all_devices_attribute(t, queue_supports_sg_merge))
+		queue_flag_set_unlocked(QUEUE_FLAG_NO_SG_MERGE, q);
+
 	dm_table_set_integrity(t);
 
 	/*

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [patch] dm: propagate QUEUE_FLAG_NO_SG_MERGE
  2014-08-08 15:03 [patch] dm: propagate QUEUE_FLAG_NO_SG_MERGE Jeff Moyer
@ 2014-08-08 15:24 ` Greg KH
  2014-08-11  1:11 ` Mike Snitzer
  1 sibling, 0 replies; 5+ messages in thread
From: Greg KH @ 2014-08-08 15:24 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, dm-devel, linux-kernel, stable

On Fri, Aug 08, 2014 at 11:03:41AM -0400, Jeff Moyer wrote:
> Hi,
> 
> Commit 05f1dd5 introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.
> This gets set by default in blk_mq_init_queue for mq-enabled devices.
> The effect of the flag is to bypass the SG segment merging.  Instead,
> the bio->bi_vcnt is used as the number of hardware segments.
> 
> With a device mapper target on top of a device with
> QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
> than a driver is prepared to handle.  I ran into this when backporting
> the virtio_blk mq support.  It triggerred this BUG_ON, in
> virtio_queue_rq:
> 
>         BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
> 
> The queue's max is set here:
>         blk_queue_max_segments(q, vblk->sg_elems-2);
> 
> Basically, what happens is that a bio is built up for the dm device
> (which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using
> bio_add_page.  That path will call into __blk_recalc_rq_segments, so
> what you end up with is bi_phys_segments being much smaller than bi_vcnt
> (and bi_vcnt grows beyond the maximum sg elements).  Then, when the bio
> is submitted, it gets cloned.  When the cloned bio is submitted, it will
> end up in blk_recount_segments, here:
> 
>         if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
>                 bio->bi_phys_segments = bio->bi_vcnt;
> 
> and now we've set bio->bi_phys_segments to a number that is beyond what
> was registered as queue_max_segments by the driver.
> 
> The right way to fix this is to propagate the queue flag up the stack.
> Attached is a patch that does this, tested and confirmed to fix the
> problem in my environment.
> 
> The rules for propagating the flag are simple:
> - if the flag is set for any underlying device, it must be set for the
>   upper device
> - consequently, if the flag is not set for any underlying device, it
>   should not be set for the upper device.
> 
> stable notes: this patch should be applied to 3.16.
> 
> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
> 

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: dm: propagate QUEUE_FLAG_NO_SG_MERGE
  2014-08-08 15:03 [patch] dm: propagate QUEUE_FLAG_NO_SG_MERGE Jeff Moyer
  2014-08-08 15:24 ` Greg KH
@ 2014-08-11  1:11 ` Mike Snitzer
  2014-08-11 14:40   ` Jeff Moyer
  1 sibling, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2014-08-11  1:11 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: axboe, dm-devel, linux-kernel, stable

On Fri, Aug 08 2014 at 11:03am -0400,
Jeff Moyer <jmoyer@redhat.com> wrote:

> Hi,
> 
> Commit 05f1dd5 introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.
> This gets set by default in blk_mq_init_queue for mq-enabled devices.
> The effect of the flag is to bypass the SG segment merging.  Instead,
> the bio->bi_vcnt is used as the number of hardware segments.
> 
> With a device mapper target on top of a device with
> QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
> than a driver is prepared to handle.  I ran into this when backporting
> the virtio_blk mq support.  It triggerred this BUG_ON, in
> virtio_queue_rq:
> 
>         BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
> 
> The queue's max is set here:
>         blk_queue_max_segments(q, vblk->sg_elems-2);
> 
> Basically, what happens is that a bio is built up for the dm device
> (which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using
> bio_add_page.  That path will call into __blk_recalc_rq_segments, so
> what you end up with is bi_phys_segments being much smaller than bi_vcnt
> (and bi_vcnt grows beyond the maximum sg elements).  Then, when the bio
> is submitted, it gets cloned.  When the cloned bio is submitted, it will
> end up in blk_recount_segments, here:
> 
>         if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
>                 bio->bi_phys_segments = bio->bi_vcnt;
> 
> and now we've set bio->bi_phys_segments to a number that is beyond what
> was registered as queue_max_segments by the driver.
> 
> The right way to fix this is to propagate the queue flag up the stack.
> Attached is a patch that does this, tested and confirmed to fix the
> problem in my environment.
> 
> The rules for propagating the flag are simple:
> - if the flag is set for any underlying device, it must be set for the
>   upper device
> - consequently, if the flag is not set for any underlying device, it
>   should not be set for the upper device.

Hi Jeff,

Thanks for the patch.  But I think you need the following tweak.
Otherwise if the DM table is reloaded (and the devices in the table
happen to change) the flag won't get cleared as needed.

I've folded this in and staged it in linux-next for 3.17 here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=200612ec33e555a356eebc717630b866ae2b694f

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index feedafd..f9c6cb8 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1509,7 +1509,9 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	if (!dm_table_supports_write_same(t))
 		q->limits.max_write_same_sectors = 0;
 
-	if (!dm_table_all_devices_attribute(t, queue_supports_sg_merge))
+	if (dm_table_all_devices_attribute(t, queue_supports_sg_merge))
+		queue_flag_clear_unlocked(QUEUE_FLAG_NO_SG_MERGE, q);
+	else
 		queue_flag_set_unlocked(QUEUE_FLAG_NO_SG_MERGE, q);
 
 	dm_table_set_integrity(t);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: dm: propagate QUEUE_FLAG_NO_SG_MERGE
  2014-08-11  1:11 ` Mike Snitzer
@ 2014-08-11 14:40   ` Jeff Moyer
  2014-08-15 18:47     ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Moyer @ 2014-08-11 14:40 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: axboe, dm-devel, linux-kernel

Mike Snitzer <snitzer@redhat.com> writes:

> Hi Jeff,
>
> Thanks for the patch.  But I think you need the following tweak.
> Otherwise if the DM table is reloaded (and the devices in the table
> happen to change) the flag won't get cleared as needed.

Ah, thanks for the education.  Looks good to me.

Thanks!
Jeff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: dm: propagate QUEUE_FLAG_NO_SG_MERGE
  2014-08-11 14:40   ` Jeff Moyer
@ 2014-08-15 18:47     ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2014-08-15 18:47 UTC (permalink / raw)
  To: Jeff Moyer, Mike Snitzer; +Cc: dm-devel, linux-kernel

On 08/11/2014 08:40 AM, Jeff Moyer wrote:
> Mike Snitzer <snitzer@redhat.com> writes:
> 
>> Hi Jeff,
>>
>> Thanks for the patch.  But I think you need the following tweak.
>> Otherwise if the DM table is reloaded (and the devices in the table
>> happen to change) the flag won't get cleared as needed.
> 
> Ah, thanks for the education.  Looks good to me.

Ditto, you can add my acked-by, if not already queued up etc.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-08-15 18:47 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-08 15:03 [patch] dm: propagate QUEUE_FLAG_NO_SG_MERGE Jeff Moyer
2014-08-08 15:24 ` Greg KH
2014-08-11  1:11 ` Mike Snitzer
2014-08-11 14:40   ` Jeff Moyer
2014-08-15 18:47     ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.