* fragmented i/o with 2.6.31?
@ 2009-09-16 15:34 David Strand
  2009-09-16 16:22 ` David Strand
  0 siblings, 1 reply; 12+ messages in thread
From: David Strand @ 2009-09-16 15:34 UTC (permalink / raw)
  To: device-mapper development

I am issuing 512 Kbyte reads through the device mapper device node to
a fibre channel disk. With 2.6.30 one read command for the entire 512
Kbyte length is placed on the wire. With 2.6.31 this is being broken
up into 5 smaller read commands placed on the wire, decreasing
performance.

This is especially penalizing on some disks where we have prefetch
turned off via the scsi mode page. Is there any easy way (through
configuration or sysfs) to restore the single read per i/o behavior
that I used to get?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fragmented i/o with 2.6.31?
  2009-09-16 15:34 fragmented i/o with 2.6.31? David Strand
@ 2009-09-16 16:22 ` David Strand
  2009-09-17  8:02   ` Kiyoshi Ueda
  0 siblings, 1 reply; 12+ messages in thread
From: David Strand @ 2009-09-16 16:22 UTC (permalink / raw)
  To: device-mapper development

I should note that I am using dm-mpath, and the i/o is fragmented on
the wire when using the device mapper device node but it is not
fragmented when using one of the regular /dev/sd* device nodes for
that device.

On Wed, Sep 16, 2009 at 8:34 AM, David Strand <dpstrand@gmail.com> wrote:
> I am issuing 512 Kbyte reads through the device mapper device node to
> a fibre channel disk. With 2.6.30 one read command for the entire 512
> Kbyte length is placed on the wire. With 2.6.31 this is being broken
> up into 5 smaller read commands placed on the wire, decreasing
> performance.
>
> This is especially penalizing on some disks where we have prefetch
> turned off via the scsi mode page. Is there any easy way (through
> configuration or sysfs) to restore the single read per i/o behavior
> that I used to get?
>


* Re: fragmented i/o with 2.6.31?
  2009-09-16 16:22 ` David Strand
@ 2009-09-17  8:02   ` Kiyoshi Ueda
  2009-09-17  9:14     ` Jun'ichi Nomura
  0 siblings, 1 reply; 12+ messages in thread
From: Kiyoshi Ueda @ 2009-09-17  8:02 UTC (permalink / raw)
  To: David Strand, Mike Snitzer, Alasdair Kergon; +Cc: device-mapper development

Hi David, Mike, Alasdair,

On 09/17/2009 01:22 AM +0900, David Strand wrote:
> On Wed, Sep 16, 2009 at 8:34 AM, David Strand <dpstrand@gmail.com> wrote:
>> I am issuing 512 Kbyte reads through the device mapper device node to
>> a fibre channel disk. With 2.6.30 one read command for the entire 512
>> Kbyte length is placed on the wire. With 2.6.31 this is being broken
>> up into 5 smaller read commands placed on the wire, decreasing
>> performance.
>>
>> This is especially penalizing on some disks where we have prefetch
>> turned off via the scsi mode page. Is there any easy way (through
>> configuration or sysfs) to restore the single read per i/o behavior
>> that I used to get?
>
> I should note that I am using dm-mpath, and the i/o is fragmented on
> the wire when using the device mapper device node but it is not
> fragmented when using one of the regular /dev/sd* device nodes for
> that device.

David,
Thank you for reporting this.
I found on my test machine that max_sectors is set to SAFE_MAX_SECTORS,
which keeps the I/O size small.
The attached patch fixes it.  I guess the patch (and increasing
read-ahead size in /sys/block/dm-<n>/queue/read_ahead_kb) will solve
your fragmentation issue.  Please try it.


Mike, Alasdair,
I found that max_sectors and max_hw_sectors of dm device are set
in smaller values than those of underlying devices.  E.g:
    # cat /sys/block/sdj/queue/max_sectors_kb
    512
    # cat /sys/block/sdj/queue/max_hw_sectors_kb
    32767
    # echo "0 10 linear /dev/sdj 0" | dmsetup create test
    # cat /sys/block/dm-0/queue/max_sectors_kb
    127
    # cat /sys/block/dm-0/queue/max_hw_sectors_kb
    127
This prevents the I/O size of struct request from becoming large,
and causes undesired request fragmentation in request-based dm.

This is likely caused by the queue_limits stacking.
In dm_calculate_queue_limits(), the block-layer's small default size
is included in the merging process of the targets' queue_limits,
so the underlying queue_limits are not propagated correctly.

I think initializing the default values of all max_* to '0' is an easy fix.
Do you think my patch is acceptable?
Any other idea to fix this problem?

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: David Strand <dpstrand@gmail.com>
Cc: Mike Snitzer <snitzer@redhat.com>,
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm-table.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: 2.6.31/drivers/md/dm-table.c
===================================================================
--- 2.6.31.orig/drivers/md/dm-table.c
+++ 2.6.31/drivers/md/dm-table.c
@@ -992,9 +992,13 @@ int dm_calculate_queue_limits(struct dm_
 	unsigned i = 0;
 
 	blk_set_default_limits(limits);
+	limits->max_sectors = 0;
+	limits->max_hw_sectors = 0;
 
 	while (i < dm_table_get_num_targets(table)) {
 		blk_set_default_limits(&ti_limits);
+		ti_limits.max_sectors = 0;
+		ti_limits.max_hw_sectors = 0;
 
 		ti = dm_table_get_target(table, i++);
 


* Re: fragmented i/o with 2.6.31?
  2009-09-17  8:02   ` Kiyoshi Ueda
@ 2009-09-17  9:14     ` Jun'ichi Nomura
  2009-09-17 13:11       ` Mike Snitzer
  0 siblings, 1 reply; 12+ messages in thread
From: Jun'ichi Nomura @ 2009-09-17  9:14 UTC (permalink / raw)
  To: device-mapper development, Mike Snitzer, Alasdair Kergon; +Cc: Jens Axboe

Hi Mike, Alasdair,

Kiyoshi Ueda wrote:
> On 09/17/2009 01:22 AM +0900, David Strand wrote:
>> On Wed, Sep 16, 2009 at 8:34 AM, David Strand <dpstrand@gmail.com> wrote:
>>> I am issuing 512 Kbyte reads through the device mapper device node to
>>> a fibre channel disk. With 2.6.30 one read command for the entire 512
>>> Kbyte length is placed on the wire. With 2.6.31 this is being broken
>>> up into 5 smaller read commands placed on the wire, decreasing
>>> performance.
>>>
>>> This is especially penalizing on some disks where we have prefetch
>>> turned off via the scsi mode page. Is there any easy way (through
>>> configuration or sysfs) to restore the single read per i/o behavior
>>> that I used to get?
>>
>> I should note that I am using dm-mpath, and the i/o is fragmented on
>> the wire when using the device mapper device node but it is not
>> fragmented when using one of the regular /dev/sd* device nodes for
>> that device.
> 
> David,
> Thank you for reporting this.
> I found on my test machine that max_sectors is set to SAFE_MAX_SECTORS,
> which keeps the I/O size small.
> The attached patch fixes it.  I guess the patch (and increasing
> read-ahead size in /sys/block/dm-<n>/queue/read_ahead_kb) will solve
> your fragmentation issue.  Please try it.
> 
> 
> Mike, Alasdair,
> I found that max_sectors and max_hw_sectors of dm device are set
> in smaller values than those of underlying devices.  E.g:
>     # cat /sys/block/sdj/queue/max_sectors_kb
>     512
>     # cat /sys/block/sdj/queue/max_hw_sectors_kb
>     32767
>     # echo "0 10 linear /dev/sdj 0" | dmsetup create test
>     # cat /sys/block/dm-0/queue/max_sectors_kb
>     127
>     # cat /sys/block/dm-0/queue/max_hw_sectors_kb
>     127
> This prevents the I/O size of struct request from becoming large,
> and causes undesired request fragmentation in request-based dm.
> 
> This is likely caused by the queue_limits stacking.
> In dm_calculate_queue_limits(), the block-layer's small default size
> is included in the merging process of the targets' queue_limits,
> so the underlying queue_limits are not propagated correctly.
> 
> I think initializing the default values of all max_* to '0' is an easy fix.
> Do you think my patch is acceptable?
> Any other idea to fix this problem?

Well, sorry, we jumped the gun...
The patch should work fine for dm-multipath, but setting '0' by default
will cause problems on targets like 'zero' and 'error', which have no
underlying device and rely on the default value.

>  	blk_set_default_limits(limits);
> +	limits->max_sectors = 0;
> +	limits->max_hw_sectors = 0;

So this should either set something very big (e.g. UINT_MAX) by default,
or set 0 by default and then change it to a safe value if the end
result of merging the limits is still 0.

Attached is a revised patch with the latter approach.
Please check this.
If the approach is fine, I think we should bring this up with Jens to
decide whether to keep these helpers in dm-table.c or move them to
block/blk-settings.c.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

max_sectors and max_hw_sectors of dm device are set to smaller values
than those of underlying devices.  E.g:
    # cat /sys/block/sdj/queue/max_sectors_kb
    512
    # cat /sys/block/sdj/queue/max_hw_sectors_kb
    32767
    # echo "0 10 linear /dev/sdj 0" | dmsetup create test
    # cat /sys/block/dm-0/queue/max_sectors_kb
    127
    # cat /sys/block/dm-0/queue/max_hw_sectors_kb
    127
This prevents the I/O size of struct request from becoming large,
and causes undesired request fragmentation in request-based dm.

This is caused by the queue_limits stacking.
In dm_calculate_queue_limits(), the block-layer's safe default value
(SAFE_MAX_SECTORS, 255) is included in the merging process of target's
queue_limits. So the underlying queue_limits are not propagated correctly.

Initialize default values of all max_*sectors to '0'
and change the limits to SAFE_MAX_SECTORS only if the value is still
'0' after merging.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: David Strand <dpstrand@gmail.com>
Cc: Mike Snitzer <snitzer@redhat.com>,
Cc: Alasdair G Kergon <agk@redhat.com>
---
 drivers/md/dm-table.c |   30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

Index: linux-2.6.31/drivers/md/dm-table.c
===================================================================
--- linux-2.6.31.orig/drivers/md/dm-table.c
+++ linux-2.6.31/drivers/md/dm-table.c
@@ -647,6 +647,28 @@ int dm_split_args(int *argc, char ***arg
 }
 
 /*
+ * blk_stack_limits() chooses min_not_zero max_sectors value of underlying
+ * devices. So set the default to 0.
+ * Otherwise, the default SAFE_MAX_SECTORS dominates even if all underlying
+ * devices have max_sectors values larger than that.
+ */
+static void _set_default_limits_for_stacking(struct queue_limits *limits)
+{
+	blk_set_default_limits(limits);
+	limits->max_sectors = 0;
+	limits->max_hw_sectors = 0;
+}
+
+/* If there's no underlying device, use the default value in blockdev. */
+static void _adjust_limits_for_stacking(struct queue_limits *limits)
+{
+	if (limits->max_sectors == 0)
+		limits->max_sectors = SAFE_MAX_SECTORS;
+	if (limits->max_hw_sectors == 0)
+		limits->max_hw_sectors = SAFE_MAX_SECTORS;
+}
+
+/*
  * Impose necessary and sufficient conditions on a devices's table such
  * that any incoming bio which respects its logical_block_size can be
  * processed successfully.  If it falls across the boundary between
@@ -684,7 +706,7 @@ static int validate_hardware_logical_blo
 	while (i < dm_table_get_num_targets(table)) {
 		ti = dm_table_get_target(table, i++);
 
-		blk_set_default_limits(&ti_limits);
+		_set_default_limits_for_stacking(&ti_limits);
 
 		/* combine all target devices' limits */
 		if (ti->type->iterate_devices)
@@ -707,6 +729,8 @@ static int validate_hardware_logical_blo
 		    device_logical_block_size_sects - next_target_start : 0;
 	}
 
+	_adjust_limits_for_stacking(limits);
+
 	if (remaining) {
 		DMWARN("%s: table line %u (start sect %llu len %llu) "
 		       "not aligned to h/w logical block size %u",
@@ -991,10 +1015,10 @@ int dm_calculate_queue_limits(struct dm_
 	struct queue_limits ti_limits;
 	unsigned i = 0;
 
-	blk_set_default_limits(limits);
+	_set_default_limits_for_stacking(limits);
 
 	while (i < dm_table_get_num_targets(table)) {
-		blk_set_default_limits(&ti_limits);
+		_set_default_limits_for_stacking(&ti_limits);
 
 		ti = dm_table_get_target(table, i++);
 


* Re: fragmented i/o with 2.6.31?
  2009-09-17  9:14     ` Jun'ichi Nomura
@ 2009-09-17 13:11       ` Mike Snitzer
  2009-09-17 17:32         ` David Strand
  2009-09-18  6:00         ` Martin K. Petersen
  0 siblings, 2 replies; 12+ messages in thread
From: Mike Snitzer @ 2009-09-17 13:11 UTC (permalink / raw)
  To: Jun'ichi Nomura, Jens Axboe, Martin K. Petersen
  Cc: device-mapper development, Alasdair Kergon

On Thu, Sep 17 2009 at  5:14am -0400,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> Hi Mike, Alasdair,
> 
> Kiyoshi Ueda wrote:
> > On 09/17/2009 01:22 AM +0900, David Strand wrote:
> >> On Wed, Sep 16, 2009 at 8:34 AM, David Strand <dpstrand@gmail.com> wrote:
> >>> I am issuing 512 Kbyte reads through the device mapper device node to
> >>> a fibre channel disk. With 2.6.30 one read command for the entire 512
> >>> Kbyte length is placed on the wire. With 2.6.31 this is being broken
> >>> up into 5 smaller read commands placed on the wire, decreasing
> >>> performance.
> >>>
> >>> This is especially penalizing on some disks where we have prefetch
> >>> turned off via the scsi mode page. Is there any easy way (through
> >>> configuration or sysfs) to restore the single read per i/o behavior
> >>> that I used to get?
> >>
> >> I should note that I am using dm-mpath, and the i/o is fragmented on
> >> the wire when using the device mapper device node but it is not
> >> fragmented when using one of the regular /dev/sd* device nodes for
> >> that device.
> > 
> > David,
> > Thank you for reporting this.
> > I found on my test machine that max_sectors is set to SAFE_MAX_SECTORS,
> > which keeps the I/O size small.
> > The attached patch fixes it.  I guess the patch (and increasing
> > read-ahead size in /sys/block/dm-<n>/queue/read_ahead_kb) will solve
> > your fragmentation issue.  Please try it.
> > 
> > 
> > Mike, Alasdair,
> > I found that max_sectors and max_hw_sectors of dm device are set
> > in smaller values than those of underlying devices.  E.g:
> >     # cat /sys/block/sdj/queue/max_sectors_kb
> >     512
> >     # cat /sys/block/sdj/queue/max_hw_sectors_kb
> >     32767
> >     # echo "0 10 linear /dev/sdj 0" | dmsetup create test
> >     # cat /sys/block/dm-0/queue/max_sectors_kb
> >     127
> >     # cat /sys/block/dm-0/queue/max_hw_sectors_kb
> >     127
> > This prevents the I/O size of struct request from becoming large,
> > and causes undesired request fragmentation in request-based dm.
> > 
> > This is likely caused by the queue_limits stacking.
> > In dm_calculate_queue_limits(), the block-layer's small default size
> > is included in the merging process of the targets' queue_limits,
> > so the underlying queue_limits are not propagated correctly.
> > 
> > I think initializing the default values of all max_* to '0' is an easy fix.
> > Do you think my patch is acceptable?
> > Any other idea to fix this problem?
> 
> Well, sorry, we jumped the gun...
> The patch should work fine for dm-multipath, but setting '0' by default
> will cause problems on targets like 'zero' and 'error', which have no
> underlying device and rely on the default value.
> 
> >  	blk_set_default_limits(limits);
> > +	limits->max_sectors = 0;
> > +	limits->max_hw_sectors = 0;
> 
> So this should either set something very big (e.g. UINT_MAX) by default,
> or set 0 by default and then change it to a safe value if the end
> result of merging the limits is still 0.
> 
> Attached is a revised patch with the latter approach.
> Please check this.
> If the approach is fine, I think we should bring this up with Jens to
> decide whether to keep these helpers in dm-table.c or move them to
> block/blk-settings.c.
> 
> Thanks,
> -- 
> Jun'ichi Nomura, NEC Corporation

This revised patch looks good.  Thanks for sorting this out.

Seems like we may want some common variant in block even though I'm not
aware of other block drivers that would benefit...

But I'll defer to Martin and/or Jens on whether these helpers are fine
to stay in dm-table.c or should be worked into blk-settings.c.

Martin/Jens, what do you guys think?


> max_sectors and max_hw_sectors of dm device are set to smaller values
> than those of underlying devices.  E.g:
>     # cat /sys/block/sdj/queue/max_sectors_kb
>     512
>     # cat /sys/block/sdj/queue/max_hw_sectors_kb
>     32767
>     # echo "0 10 linear /dev/sdj 0" | dmsetup create test
>     # cat /sys/block/dm-0/queue/max_sectors_kb
>     127
>     # cat /sys/block/dm-0/queue/max_hw_sectors_kb
>     127
> This prevents the I/O size of struct request from becoming large,
> and causes undesired request fragmentation in request-based dm.
> 
> This is caused by the queue_limits stacking.
> In dm_calculate_queue_limits(), the block-layer's safe default value
> (SAFE_MAX_SECTORS, 255) is included in the merging process of target's
> queue_limits. So the underlying queue_limits are not propagated correctly.
> 
> Initialize default values of all max_*sectors to '0'
> and change the limits to SAFE_MAX_SECTORS only if the value is still
> '0' after merging.
> 
> Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
> Cc: David Strand <dpstrand@gmail.com>
> Cc: Mike Snitzer <snitzer@redhat.com>,
> Cc: Alasdair G Kergon <agk@redhat.com>
> ---
>  drivers/md/dm-table.c |   30 +++++++++++++++++++++++++++---
>  1 file changed, 27 insertions(+), 3 deletions(-)
> 
> Index: linux-2.6.31/drivers/md/dm-table.c
> ===================================================================
> --- linux-2.6.31.orig/drivers/md/dm-table.c
> +++ linux-2.6.31/drivers/md/dm-table.c
> @@ -647,6 +647,28 @@ int dm_split_args(int *argc, char ***arg
>  }
>  
>  /*
> + * blk_stack_limits() chooses min_not_zero max_sectors value of underlying
> + * devices. So set the default to 0.
> + * Otherwise, the default SAFE_MAX_SECTORS dominates even if all underlying
> + * devices have max_sectors values larger than that.
> + */
> +static void _set_default_limits_for_stacking(struct queue_limits *limits)
> +{
> +	blk_set_default_limits(limits);
> +	limits->max_sectors = 0;
> +	limits->max_hw_sectors = 0;
> +}
> +
> +/* If there's no underlying device, use the default value in blockdev. */
> +static void _adjust_limits_for_stacking(struct queue_limits *limits)
> +{
> +	if (limits->max_sectors == 0)
> +		limits->max_sectors = SAFE_MAX_SECTORS;
> +	if (limits->max_hw_sectors == 0)
> +		limits->max_hw_sectors = SAFE_MAX_SECTORS;
> +}
> +
> +/*
>   * Impose necessary and sufficient conditions on a devices's table such
>   * that any incoming bio which respects its logical_block_size can be
>   * processed successfully.  If it falls across the boundary between
> @@ -684,7 +706,7 @@ static int validate_hardware_logical_blo
>  	while (i < dm_table_get_num_targets(table)) {
>  		ti = dm_table_get_target(table, i++);
>  
> -		blk_set_default_limits(&ti_limits);
> +		_set_default_limits_for_stacking(&ti_limits);
>  
>  		/* combine all target devices' limits */
>  		if (ti->type->iterate_devices)
> @@ -707,6 +729,8 @@ static int validate_hardware_logical_blo
>  		    device_logical_block_size_sects - next_target_start : 0;
>  	}
>  
> +	_adjust_limits_for_stacking(limits);
> +
>  	if (remaining) {
>  		DMWARN("%s: table line %u (start sect %llu len %llu) "
>  		       "not aligned to h/w logical block size %u",
> @@ -991,10 +1015,10 @@ int dm_calculate_queue_limits(struct dm_
>  	struct queue_limits ti_limits;
>  	unsigned i = 0;
>  
> -	blk_set_default_limits(limits);
> +	_set_default_limits_for_stacking(limits);
>  
>  	while (i < dm_table_get_num_targets(table)) {
> -		blk_set_default_limits(&ti_limits);
> +		_set_default_limits_for_stacking(&ti_limits);
>  
>  		ti = dm_table_get_target(table, i++);
>  


* Re: fragmented i/o with 2.6.31?
  2009-09-17 13:11       ` Mike Snitzer
@ 2009-09-17 17:32         ` David Strand
  2009-09-18  6:00         ` Martin K. Petersen
  1 sibling, 0 replies; 12+ messages in thread
From: David Strand @ 2009-09-17 17:32 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jun'ichi Nomura, device-mapper development, Alasdair Kergon,
	Martin K. Petersen, Jens Axboe

I have confirmed that this was the problem and that the patch fixes it. Good job!


* Re: Re: fragmented i/o with 2.6.31?
  2009-09-17 13:11       ` Mike Snitzer
  2009-09-17 17:32         ` David Strand
@ 2009-09-18  6:00         ` Martin K. Petersen
  2009-09-18 14:30           ` Jun'ichi Nomura
  2009-09-18 15:06           ` Mike Snitzer
  1 sibling, 2 replies; 12+ messages in thread
From: Martin K. Petersen @ 2009-09-18  6:00 UTC (permalink / raw)
  To: device-mapper development
  Cc: Jun'ichi Nomura, Alasdair Kergon, Martin K. Petersen, Jens Axboe

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

>> >  	blk_set_default_limits(limits);
>> > + limits->max_sectors = 0;
>> > + limits->max_hw_sectors = 0;
>> 
Mike> Seems like we may want some common variant in block even though
Mike> I'm not aware of other block drivers that would benefit...

Mike> But I'll defer to Martin and/or Jens on whether these helpers are
Mike> fine to stay in dm-table.c or should be worked into blk-settings.c

In the pre-topology days we set max_sectors to SAFE_MAX_SECTORS upon
creation of a queue.  This is an old ATA-ism that's been around for
ages.

Ideally we'd simply nuke it and drivers that really needed to lower the
bar would explicitly call blk_queue_max_sectors().  However, I'm afraid
to change the default because I'm sure there are legacy drivers lurking
somewhere that depend on it.

Seeing as blk_set_default_limits() is mostly aimed at stacking drivers,
I think I'd prefer moving SAFE_MAX_SECTORS back to blk_queue_make_request
and then setting max_sectors and max_hw_sectors to 0 in the default limits.

Would that work for you guys?

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: Re: fragmented i/o with 2.6.31?
  2009-09-18  6:00         ` Martin K. Petersen
@ 2009-09-18 14:30           ` Jun'ichi Nomura
  2009-09-18 15:06           ` Mike Snitzer
  1 sibling, 0 replies; 12+ messages in thread
From: Jun'ichi Nomura @ 2009-09-18 14:30 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: device-mapper development, Alasdair Kergon, Jens Axboe

Martin K. Petersen wrote:
>>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
>>>>  	blk_set_default_limits(limits);
>>>> + limits->max_sectors = 0;
>>>> + limits->max_hw_sectors = 0;
> Mike> Seems like we may want some common variant in block even though
> Mike> I'm not aware of other block drivers that would benefit...
> 
> Mike> But I'll defer to Martin and/or Jens on whether these helpers are
> Mike> fine to stay in dm-table.c or should be worked into blk-settings.c
> 
> In the pre-topology days we set max_sectors to SAFE_MAX_SECTORS upon
> creation of a queue.  This is an old ATA-ism that's been around for
> ages.
> 
> Ideally we'd simply nuke it and drivers that really needed to lower the
> bar would explicitly call blk_queue_max_sectors().  However, I'm afraid
> to change the default because I'm sure there are legacy drivers lurking
> somewhere that depend on it.
> 
> Seeing as blk_set_default_limits() is mostly aimed at stacking drivers I
> think I'd prefer moving SAFE_MAX_SECTORS back to blk_queue_make_request
> and then set max_sectors and max_hw_sectors to 0 in default_limits.
> 
> Would that work for you guys?

Yes, that should work.

I'll split the patch into 2, one for changing blk_set_default_limits
and the other for dm to set the safe default if 'max_sectors == 0'.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation


* Re: fragmented i/o with 2.6.31?
  2009-09-18  6:00         ` Martin K. Petersen
  2009-09-18 14:30           ` Jun'ichi Nomura
@ 2009-09-18 15:06           ` Mike Snitzer
  2009-09-18 15:38             ` Jun'ichi Nomura
  1 sibling, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2009-09-18 15:06 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Jun'ichi Nomura, device-mapper development, Alasdair Kergon,
	Jens Axboe

On Fri, Sep 18 2009 at  2:00am -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> >> >  	blk_set_default_limits(limits);
> >> > + limits->max_sectors = 0;
> >> > + limits->max_hw_sectors = 0;
> >> 
> Mike> Seems like we may want some common variant in block even though
> Mike> I'm not aware of other block drivers that would benefit...
> 
> Mike> But I'll defer to Martin and/or Jens on whether these helpers are
> Mike> fine to stay in dm-table.c or should be worked into blk-settings.c
> 
> In the pre-topology days we set max_sectors to SAFE_MAX_SECTORS upon
> creation of a queue.  This is an old ATA-ism that's been around for a
> ages.
> 
> Ideally we'd simply nuke it and drivers that really needed to lower the
> bar would explicitly call blk_queue_max_sectors().  However, I'm afraid
> to change the default because I'm sure there are legacy drivers lurking
> somewhere that depend on it.
> 
> Seeing as blk_set_default_limits() is mostly aimed at stacking drivers I
> think I'd prefer moving SAFE_MAX_SECTORS back to blk_queue_make_request
> and then set max_sectors and max_hw_sectors to 0 in default_limits.
> 
> Would that work for you guys?

So you're referring to the fact that this commit removed
blk_queue_max_sectors(q, SAFE_MAX_SECTORS) from blk_queue_make_request:
http://git.kernel.org/linus/e475bba2

I think I like your proposal.  But, to clarify things further, are you
saying:

By moving SAFE_MAX_SECTORS back to blk_queue_make_request (after its
existing call to blk_set_default_limits, right?) and having
blk_set_default_limits set max_sectors and max_hw_sectors to 0:

DM will be free to establish the proper limit stacking because the DM
limits are no longer derived from the queue's default limits?  The DM
device limits are just stacked and then copied to the queue; some
background for those following along:

DM's actual stacking of limits takes place when the DM table is
translated to the DM device's final queue (at table resume time), see:
http://git.kernel.org/linus/754c5fc7e

drivers/md/dm.c:dm_swap_table() calls dm_calculate_queue_limits() to
stack the limits.

drivers/md/dm.c:__bind() sets the DM device's queue_limits via
dm_table_set_restrictions()

drivers/md/dm-table.c:dm_table_set_restrictions() simply copies the
queue_limits established by DM's stacking with:
        /*
         * Copy table's limits to the DM device's request_queue
         */
        q->limits = *limits;

Now coming full circle:
AFAIK the only piece I'm missing is how/where your proposed changes will
account for the need to establish SAFE_MAX_SECTORS _after_ the stacking
of queue_limits: IFF max_sectors and max_hw_sectors are still 0 (like
Jun'ichi did in DM with the 2nd patch posted).

But I don't pretend to have this all sorted out in my head.  I could
easily be missing some other piece(s) implicit in your proposal.

Maybe an RFC patch that illustrates your thinking would help further this
discussion?

Regards,
Mike


* Re: fragmented i/o with 2.6.31?
  2009-09-18 15:06           ` Mike Snitzer
@ 2009-09-18 15:38             ` Jun'ichi Nomura
  2009-09-18 15:57               ` Mike Snitzer
  0 siblings, 1 reply; 12+ messages in thread
From: Jun'ichi Nomura @ 2009-09-18 15:38 UTC (permalink / raw)
  To: Mike Snitzer, Martin K. Petersen
  Cc: device-mapper development, Alasdair Kergon, Jens Axboe

Mike Snitzer wrote:
> On Fri, Sep 18 2009 at  2:00am -0400,
> Martin K. Petersen <martin.petersen@oracle.com> wrote:
> 
>>>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>>>>>  	blk_set_default_limits(limits);
>>>>> + limits->max_sectors = 0;
>>>>> + limits->max_hw_sectors = 0;
>> Mike> Seems like we may want some common variant in block even though
>> Mike> I'm not aware of other block drivers that would benefit...
>>
>> Mike> But I'll defer to Martin and/or Jens on whether these helpers are
>> Mike> fine to stay in dm-table.c or should be worked into blk-settings.c
>>
>> In the pre-topology days we set max_sectors to SAFE_MAX_SECTORS upon
>> creation of a queue.  This is an old ATA-ism that's been around for
>> ages.
>>
>> Ideally we'd simply nuke it and drivers that really needed to lower the
>> bar would explicitly call blk_queue_max_sectors().  However, I'm afraid
>> to change the default because I'm sure there are legacy drivers lurking
>> somewhere that depend on it.
>>
>> Seeing as blk_set_default_limits() is mostly aimed at stacking drivers I
>> think I'd prefer moving SAFE_MAX_SECTORS back to blk_queue_make_request
>> and then set max_sectors and max_hw_sectors to 0 in default_limits.
>>
>> Would that work for you guys?
> 
> So you're referring to the fact that this commit removed
> blk_queue_max_sectors(q, SAFE_MAX_SECTORS) from blk_queue_make_request:
> http://git.kernel.org/linus/e475bba2
> 
> I think I like your proposal.  But, to clarify things further, are you
> saying:
> 
> By moving SAFE_MAX_SECTORS back to blk_queue_make_request (after its
> existing call to blk_set_default_limits right?) and having
> blk_set_default_limits set max_sectors and max_hw_sectors to 0:
> 
> DM will be free to establish the proper limit stacking because the DM
> limits are not derived from the queue's default limits?  Because the DM
> device limits are just stacked and copied to the queue, some background
> for those following along:
> 
> DM's actual stacking of limits takes place when the DM table is
> translated to the DM device's final queue (at table resume time), see:
> http://git.kernel.org/linus/754c5fc7e
> 
> drivers/md/dm.c:dm_swap_table() calls dm_calculate_queue_limits() to
> stack the limits.
> 
> drivers/md/dm.c:__bind() sets the DM device's queue_limits via
> dm_table_set_restrictions()
> 
> drivers/md/dm-table.c:dm_table_set_restrictions() simply copies the
> queue_limits established by DM's stacking with:
>         /*
>          * Copy table's limits to the DM device's request_queue
>          */
>         q->limits = *limits;
> 
> Now coming full circle:
> AFAIK the only piece I'm missing is how/where your proposed changes will
> account for the need to establish SAFE_MAX_SECTORS _after_ the stacking
> of queue_limits: IFF max_sectors and max_hw_sectors are still 0 (like
> Jun'ichi did in DM with the 2nd patch posted).
> 
> But I don't pretend to have this all sorted out in my head.  I could
> easily be missing some other piece(s) implicit in your proposal.
> 
> Maybe an RFC patch that illustrates your thinking would help further this
> discussion?

I just sent out a revised patchset:

[PATCH 1/2] dm: Set safe default max_sectors for targets with no 
underlying device
https://www.redhat.com/archives/dm-devel/2009-September/msg00203.html

[PATCH 2/2] block: blk_set_default_limits sets 0 to max_sectors
https://www.redhat.com/archives/dm-devel/2009-September/msg00205.html


But I wonder whether a better fix might be to provide blk_queue_copy_limits()
to replace this in dm-table.c:

 >         q->limits = *limits;

where blk_queue_copy_limits() looks like this:

void blk_queue_copy_limits(struct request_queue *q, struct queue_limits
*limits)
{
	q->limits = *limits;

	/* fix-up bad values */
	if (q->limits.max_sectors == 0 || q->limits.max_hw_sectors == 0)
		blk_queue_max_sectors(q, SAFE_MAX_SECTORS);
}

so that block/blk-settings.c has full control of the default value
and dm doesn't need to care about the magic 'SAFE_MAX_SECTORS'.

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fragmented i/o with 2.6.31?
  2009-09-18 15:38             ` Jun'ichi Nomura
@ 2009-09-18 15:57               ` Mike Snitzer
  2009-09-18 16:55                 ` Jun'ichi Nomura
  0 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2009-09-18 15:57 UTC (permalink / raw)
  To: Jun'ichi Nomura
  Cc: device-mapper development, Alasdair Kergon, Martin K. Petersen,
	Jens Axboe

On Fri, Sep 18 2009 at 11:38am -0400,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> Mike Snitzer wrote:
>> On Fri, Sep 18 2009 at  2:00am -0400,
>> Martin K. Petersen <martin.petersen@oracle.com> wrote:
>>
>>>>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>>>>>>  	blk_set_default_limits(limits);
>>>>>> + limits->max_sectors = 0;
>>>>>> + limits->max_hw_sectors = 0;
>>> Mike> Seems like we may want some common variant in block even though
>>> Mike> I'm not aware of other block drivers that would benefit...
>>>
>>> Mike> But I'll defer to Martin and/or Jens on whether these helpers are
>>> Mike> fine to stay in dm-table.c or should be worked into blk-settings.c
>>>
>>> In the pre-topology days we set max_sectors to SAFE_MAX_SECTORS upon
>>> creation of a queue.  This is an old ATA-ism that's been around for
>>> ages.
>>>
>>> Ideally we'd simply nuke it and drivers that really needed to lower the
>>> bar would explicitly call blk_queue_max_sectors().  However, I'm afraid
>>> to change the default because I'm sure there are legacy drivers lurking
>>> somewhere that depend on it.
>>>
>>> Seeing as blk_set_default_limits() is mostly aimed at stacking drivers I
>>> think I'd prefer moving SAFE_MAX_SECTORS back to blk_queue_make_request
>>> and then set max_sectors and max_hw_sectors to 0 in default_limits.
>>>
>>> Would that work for you guys?
>>
>> So you're referring to the fact that this commit removed
>> blk_queue_max_sectors(q, SAFE_MAX_SECTORS) from blk_queue_make_request:
>> http://git.kernel.org/linus/e475bba2
>>
>> I think I like your proposal.  But, to clarify things further, are you
>> saying:
>>
>> By moving SAFE_MAX_SECTORS back to blk_queue_make_request (after its
>> existing call to blk_set_default_limits right?) and having
>> blk_set_default_limits set max_sectors and max_hw_sectors to 0:
>>
>> DM will be free to establish the proper limit stacking because the DM
>> limits are not derived from the queue's default limits, i.e. because the
>> DM device limits are just stacked and then copied to the queue?  Some
>> background for those following along:
>>
>> DM's actual stacking of limits takes place when the DM table is
>> translated to the DM device's final queue (at table resume time), see:
>> http://git.kernel.org/linus/754c5fc7e
>>
>> drivers/md/dm.c:dm_swap_table() calls dm_calculate_queue_limits() to
>> stack the limits.
>>
>> drivers/md/dm.c:__bind() sets the DM device's queue_limits via
>> dm_table_set_restrictions()
>>
>> drivers/md/dm-table.c:dm_table_set_restrictions() simply copies the
>> queue_limits established by DM's stacking with:
>>         /*
>>          * Copy table's limits to the DM device's request_queue
>>          */
>>         q->limits = *limits;
>>
>> Now coming full circle:
>> AFAIK the only piece I'm missing is how/where your proposed changes will
>> account for the need to establish SAFE_MAX_SECTORS _after_ the stacking
>> of queue_limits: IFF max_sectors and max_hw_sectors are still 0 (like
>> Jun'ichi did in DM with the 2nd patch posted).
>>
>> But I don't pretend to have this all sorted out in my head.  I could
>> easily be missing some other piece(s) implicit in your proposal.
>>
>> Maybe an RFC patch that illustrates your thinking would help further this
>> discussion?
>
> I just sent out a revised patchset:
>
> [PATCH 1/2] dm: Set safe default max_sectors for targets with no  
> underlying device
> https://www.redhat.com/archives/dm-devel/2009-September/msg00203.html
>
> [PATCH 2/2] block: blk_set_default_limits sets 0 to max_sectors
> https://www.redhat.com/archives/dm-devel/2009-September/msg00205.html
>
>
> But I wonder if a better fix might be to provide blk_queue_copy_limits()
> to replace this in dm-table.c:
>
> >         q->limits = *limits;
>
> where blk_queue_copy_limits() looks like this:
>
> void blk_queue_copy_limits(struct request_queue *q, struct queue_limits
> *limits)
> {
> 	q->limits = *limits;
>
> 	/* fix-up bad values */
> 	if (q->limits.max_sectors == 0 || q->limits.max_hw_sectors == 0)
> 		blk_queue_max_sectors(q, SAFE_MAX_SECTORS);
> }
>
> so that block/blk-settings.c has full control of the default value
> and dm doesn't need to care about the magic 'SAFE_MAX_SECTORS'.

Even better; I like that much better than your DM-specific changes I
just commented on.

But rather than "fix-up bad values" I'd suggest a more helpful comment
block (like the one from your patch that I just commented on).

You likely planned on cleaning the above up with a more robust comment
and I'm jumping the gun on being critical :)

Thanks,
Mike

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: fragmented i/o with 2.6.31?
  2009-09-18 15:57               ` Mike Snitzer
@ 2009-09-18 16:55                 ` Jun'ichi Nomura
  0 siblings, 0 replies; 12+ messages in thread
From: Jun'ichi Nomura @ 2009-09-18 16:55 UTC (permalink / raw)
  To: device-mapper development, Mike Snitzer
  Cc: Martin K. Petersen, Alasdair Kergon, Jens Axboe

Mike Snitzer wrote:
> On Fri, Sep 18 2009 at 11:38am -0400,
> Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:
> 
>> Mike Snitzer wrote:
>>> On Fri, Sep 18 2009 at  2:00am -0400,
>>> Martin K. Petersen <martin.petersen@oracle.com> wrote:
>>>
>>>>>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>>>>>>>  	blk_set_default_limits(limits);
>>>>>>> + limits->max_sectors = 0;
>>>>>>> + limits->max_hw_sectors = 0;
>>>> Mike> Seems like we may want some common variant in block even though
>>>> Mike> I'm not aware of other block drivers that would benefit...
>>>>
>>>> Mike> But I'll defer to Martin and/or Jens on whether these helpers are
>>>> Mike> fine to stay in dm-table.c or should be worked into blk-settings.c
>>>>
>>>> In the pre-topology days we set max_sectors to SAFE_MAX_SECTORS upon
>>>> creation of a queue.  This is an old ATA-ism that's been around for
>>>> ages.
>>>>
>>>> Ideally we'd simply nuke it and drivers that really needed to lower the
>>>> bar would explicitly call blk_queue_max_sectors().  However, I'm afraid
>>>> to change the default because I'm sure there are legacy drivers lurking
>>>> somewhere that depend on it.
>>>>
>>>> Seeing as blk_set_default_limits() is mostly aimed at stacking drivers I
>>>> think I'd prefer moving SAFE_MAX_SECTORS back to blk_queue_make_request
>>>> and then set max_sectors and max_hw_sectors to 0 in default_limits.
>>>>
>>>> Would that work for you guys?
>>> So you're referring to the fact that this commit removed
>>> blk_queue_max_sectors(q, SAFE_MAX_SECTORS) from blk_queue_make_request:
>>> http://git.kernel.org/linus/e475bba2
>>>
>>> I think I like your proposal.  But, to clarify things further, are you
>>> saying:
>>>
>>> By moving SAFE_MAX_SECTORS back to blk_queue_make_request (after its
>>> existing call to blk_set_default_limits right?) and having
>>> blk_set_default_limits set max_sectors and max_hw_sectors to 0:
>>>
>>> DM will be free to establish the proper limit stacking because the DM
>>> limits are not derived from the queue's default limits, i.e. because the
>>> DM device limits are just stacked and then copied to the queue?  Some
>>> background for those following along:
>>>
>>> DM's actual stacking of limits takes place when the DM table is
>>> translated to the DM device's final queue (at table resume time), see:
>>> http://git.kernel.org/linus/754c5fc7e
>>>
>>> drivers/md/dm.c:dm_swap_table() calls dm_calculate_queue_limits() to
>>> stack the limits.
>>>
>>> drivers/md/dm.c:__bind() sets the DM device's queue_limits via
>>> dm_table_set_restrictions()
>>>
>>> drivers/md/dm-table.c:dm_table_set_restrictions() simply copies the
>>> queue_limits established by DM's stacking with:
>>>         /*
>>>          * Copy table's limits to the DM device's request_queue
>>>          */
>>>         q->limits = *limits;
>>>
>>> Now coming full circle:
>>> AFAIK the only piece I'm missing is how/where your proposed changes will
>>> account for the need to establish SAFE_MAX_SECTORS _after_ the stacking
>>> of queue_limits: IFF max_sectors and max_hw_sectors are still 0 (like
>>> Jun'ichi did in DM with the 2nd patch posted).
>>>
>>> But I don't pretend to have this all sorted out in my head.  I could
>>> easily be missing some other piece(s) implicit in your proposal.
>>>
>>> Maybe an RFC patch that illustrates your thinking would help further this
>>> discussion?
>> I just sent out a revised patchset:
>>
>> [PATCH 1/2] dm: Set safe default max_sectors for targets with no  
>> underlying device
>> https://www.redhat.com/archives/dm-devel/2009-September/msg00203.html
>>
>> [PATCH 2/2] block: blk_set_default_limits sets 0 to max_sectors
>> https://www.redhat.com/archives/dm-devel/2009-September/msg00205.html
>>
>>
>> But I wonder if a better fix might be to provide blk_queue_copy_limits()
>> to replace this in dm-table.c:
>>
>>>         q->limits = *limits;
>> where blk_queue_copy_limits() looks like this:
>>
>> void blk_queue_copy_limits(struct request_queue *q, struct queue_limits
>> *limits)
>> {
>> 	q->limits = *limits;
>>
>> 	/* fix-up bad values */
>> 	if (q->limits.max_sectors == 0 || q->limits.max_hw_sectors == 0)
>> 		blk_queue_max_sectors(q, SAFE_MAX_SECTORS);
>> }
>>
>> so that block/blk-settings.c has full control of the default value
>> and dm doesn't need to care about the magic 'SAFE_MAX_SECTORS'.
> 
> Even better; I like that much better than your DM-specific changes I
> just commented on.
> 
> But rather than "fix-up bad values" I'd suggest a more helpful comment
> block (like the one from your patch that I just commented on).

Thanks for the comments.
I re-posted the patchset. Please check them.

   [PATCH 1/3] block: Add blk_queue_copy_limits()
   https://www.redhat.com/archives/dm-devel/2009-September/msg00209.html

   [PATCH 2/3] dm: Use blk_queue_copy_limits()
   https://www.redhat.com/archives/dm-devel/2009-September/msg00210.html

   [PATCH 3/3] block: blk_set_default_limits sets 0 to max_sectors
   https://www.redhat.com/archives/dm-devel/2009-September/msg00211.html

> You likely planned on cleaning the above up with a more robust comment
> and I'm jumping the gun on being critical :)

I was falling asleep but woken up by your comment :)

Thanks,
-- 
Jun'ichi Nomura, NEC Corporation

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2009-09-18 16:55 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-16 15:34 fragmented i/o with 2.6.31? David Strand
2009-09-16 16:22 ` David Strand
2009-09-17  8:02   ` Kiyoshi Ueda
2009-09-17  9:14     ` Jun'ichi Nomura
2009-09-17 13:11       ` Mike Snitzer
2009-09-17 17:32         ` David Strand
2009-09-18  6:00         ` Martin K. Petersen
2009-09-18 14:30           ` Jun'ichi Nomura
2009-09-18 15:06           ` Mike Snitzer
2009-09-18 15:38             ` Jun'ichi Nomura
2009-09-18 15:57               ` Mike Snitzer
2009-09-18 16:55                 ` Jun'ichi Nomura
