* [PATCH] Direct I/O bio size regression
@ 2006-04-24  6:14 David Chinner
  2006-04-24  7:02 ` Jens Axboe
From: David Chinner @ 2006-04-24  6:14 UTC (permalink / raw)
  To: linux-kernel

The change introduced here in 2.6.15:

http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=defd94b75409b983f94548ea2f52ff5787ddb848

sets the request queue max_sectors value unconditionally to 1024 sectors in
blk_queue_max_sectors(), even if the underlying hardware can support a larger
number of sectors.

Hence when building direct I/O bios, we have the situation where:

	- dio_new_bio() artificially limits the bio vector size to
	  1024 sectors / page size, because bio_get_nr_vecs()
	  uses q->max_sectors to size the new bio; and
	- dio_bio_add_page() limits the total bio size to 1024
	  sectors because bio_add_page() now uses q->max_sectors
	  to limit the size of the bio.
	  
Therefore, we can't build direct I/Os larger than 1024 sectors (512 KB)
even if the hardware supports larger I/Os.  This is a regression, as before
this mod we were able to issue direct I/Os limited by either the
maximum number of vectors in a bio or the hardware limits.
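
To make the cap concrete, here is a quick userspace sketch (my
illustration, not part of the patch; the 4 KB page size is an assumed
value) of the arithmetic bio_get_nr_vecs() ends up doing:

/* Illustrative only: the per-bio cap derived from q->max_sectors.
 * 1024 sectors and 4 KB pages are assumed values. */
#include <stdio.h>

int main(void)
{
	unsigned int max_sectors = 1024;	/* the hard-coded soft default */
	unsigned int page_size = 4096, page_shift = 12;
	unsigned int nr_pages =
		((max_sectors << 9) + page_size - 1) >> page_shift;

	/* 1024 sectors * 512 bytes = 512 KB, i.e. 128 pages of 4 KB */
	printf("bio capped at %u pages (%u KB)\n",
	       nr_pages, (max_sectors << 9) >> 10);
	return 0;
}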

The patch below (against 2.6.16) allows direct I/O to build bios as
large as the underlying hardware will allow.

Signed-off-by: Dave Chinner <dgc@sgi.com>
---
 bio.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

Index: 2.6.x-xfs-new/fs/bio.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/bio.c	2006-02-06 11:57:50.000000000 +1100
+++ 2.6.x-xfs-new/fs/bio.c	2006-04-24 15:46:16.849484424 +1000
@@ -304,7 +304,7 @@ int bio_get_nr_vecs(struct block_device 
 	request_queue_t *q = bdev_get_queue(bdev);
 	int nr_pages;
 
-	nr_pages = ((q->max_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	nr_pages = ((q->max_hw_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	if (nr_pages > q->max_phys_segments)
 		nr_pages = q->max_phys_segments;
 	if (nr_pages > q->max_hw_segments)
@@ -446,7 +446,7 @@ int bio_add_page(struct bio *bio, struct
 		 unsigned int offset)
 {
 	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	return __bio_add_page(q, bio, page, len, offset, q->max_sectors);
+	return __bio_add_page(q, bio, page, len, offset, q->max_hw_sectors);
 }
 
 struct bio_map_data {


* Re: [PATCH] Direct I/O bio size regression
  2006-04-24  6:14 [PATCH] Direct I/O bio size regression David Chinner
@ 2006-04-24  7:02 ` Jens Axboe
  2006-04-24  9:05   ` Jens Axboe
From: Jens Axboe @ 2006-04-24  7:02 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel

On Mon, Apr 24 2006, David Chinner wrote:
> The change introduced here in 2.6.15:
> 
> http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=defd94b75409b983f94548ea2f52ff5787ddb848
> 
> sets the request queue max_sectors value unconditionally to 1024 sectors in
> blk_queue_max_sectors(), even if the underlying hardware can support a larger
> number of sectors.
> 
> Hence when building direct I/O bios, we have the situation where:
> 
> 	- dio_new_bio() artificially limits the bio vector size to
> 	  1024 sectors / page size, because bio_get_nr_vecs()
> 	  uses q->max_sectors to size the new bio; and
> 	- dio_bio_add_page() limits the total bio size to 1024
> 	  sectors because bio_add_page() now uses q->max_sectors
> 	  to limit the size of the bio.
> 	  
> Therefore, we can't build direct I/Os larger than 1024 sectors (512 KB)
> even if the hardware supports larger I/Os.  This is a regression, as before
> this mod we were able to issue direct I/Os limited by either the
> maximum number of vectors in a bio or the hardware limits.
> 
> The patch below (against 2.6.16) allows direct I/O to build bios as
> large as the underlying hardware will allow.
> 
> Signed-off-by: Dave Chinner <dgc@sgi.com>
> ---
>  bio.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> Index: 2.6.x-xfs-new/fs/bio.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/bio.c	2006-02-06 11:57:50.000000000 +1100
> +++ 2.6.x-xfs-new/fs/bio.c	2006-04-24 15:46:16.849484424 +1000
> @@ -304,7 +304,7 @@ int bio_get_nr_vecs(struct block_device 
>  	request_queue_t *q = bdev_get_queue(bdev);
>  	int nr_pages;
>  
> -	nr_pages = ((q->max_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> +	nr_pages = ((q->max_hw_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
>  	if (nr_pages > q->max_phys_segments)
>  		nr_pages = q->max_phys_segments;
>  	if (nr_pages > q->max_hw_segments)
> @@ -446,7 +446,7 @@ int bio_add_page(struct bio *bio, struct
>  		 unsigned int offset)
>  {
>  	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> -	return __bio_add_page(q, bio, page, len, offset, q->max_sectors);
> +	return __bio_add_page(q, bio, page, len, offset, q->max_hw_sectors);
>  }
>  
>  struct bio_map_data {

Clearly correct, I'll make sure this gets merged right away.

-- 
Jens Axboe



* Re: [PATCH] Direct I/O bio size regression
  2006-04-24  7:02 ` Jens Axboe
@ 2006-04-24  9:05   ` Jens Axboe
  2006-04-24 14:56     ` David Chinner
From: Jens Axboe @ 2006-04-24  9:05 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel

On Mon, Apr 24 2006, Jens Axboe wrote:
> > Index: 2.6.x-xfs-new/fs/bio.c
> > ===================================================================
> > --- 2.6.x-xfs-new.orig/fs/bio.c	2006-02-06 11:57:50.000000000 +1100
> > +++ 2.6.x-xfs-new/fs/bio.c	2006-04-24 15:46:16.849484424 +1000
> > @@ -304,7 +304,7 @@ int bio_get_nr_vecs(struct block_device 
> >  	request_queue_t *q = bdev_get_queue(bdev);
> >  	int nr_pages;
> >  
> > -	nr_pages = ((q->max_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > +	nr_pages = ((q->max_hw_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> >  	if (nr_pages > q->max_phys_segments)
> >  		nr_pages = q->max_phys_segments;
> >  	if (nr_pages > q->max_hw_segments)
> > @@ -446,7 +446,7 @@ int bio_add_page(struct bio *bio, struct
> >  		 unsigned int offset)
> >  {
> >  	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> > -	return __bio_add_page(q, bio, page, len, offset, q->max_sectors);
> > +	return __bio_add_page(q, bio, page, len, offset, q->max_hw_sectors);
> >  }
> >  
> >  struct bio_map_data {
> 
> Clearly correct, I'll make sure this gets merged right away.

Spoke too soon... The last part is actually on purpose, to prevent
really huge requests as part of normal file system IO. That's why we
have a bio_add_pc_page(). The first hunk may cause things to not work
optimally then if we don't apply the last hunk.

The best approach is probably to tune max_sectors on the system itself.
That's why it is exposed, after all.

-- 
Jens Axboe



* Re: [PATCH] Direct I/O bio size regression
  2006-04-24  9:05   ` Jens Axboe
@ 2006-04-24 14:56     ` David Chinner
  2006-04-24 18:47       ` Jens Axboe
From: David Chinner @ 2006-04-24 14:56 UTC (permalink / raw)
  To: Jens Axboe; +Cc: David Chinner, linux-kernel

On Mon, Apr 24, 2006 at 11:05:08AM +0200, Jens Axboe wrote:
> On Mon, Apr 24 2006, Jens Axboe wrote:
> > > Index: 2.6.x-xfs-new/fs/bio.c
> > > ===================================================================
> > > --- 2.6.x-xfs-new.orig/fs/bio.c	2006-02-06 11:57:50.000000000 +1100
> > > +++ 2.6.x-xfs-new/fs/bio.c	2006-04-24 15:46:16.849484424 +1000
> > > @@ -304,7 +304,7 @@ int bio_get_nr_vecs(struct block_device 
> > >  	request_queue_t *q = bdev_get_queue(bdev);
> > >  	int nr_pages;
> > >  
> > > -	nr_pages = ((q->max_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > +	nr_pages = ((q->max_hw_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > >  	if (nr_pages > q->max_phys_segments)
> > >  		nr_pages = q->max_phys_segments;
> > >  	if (nr_pages > q->max_hw_segments)
> > > @@ -446,7 +446,7 @@ int bio_add_page(struct bio *bio, struct
> > >  		 unsigned int offset)
> > >  {
> > >  	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> > > -	return __bio_add_page(q, bio, page, len, offset, q->max_sectors);
> > > +	return __bio_add_page(q, bio, page, len, offset, q->max_hw_sectors);
> > >  }
> > >  
> > >  struct bio_map_data {
> > 
> > Clearly correct, I'll make sure this gets merged right away.
> 
> Spoke too soon... The last part is actually on purpose, to prevent
> really huge requests as part of normal file system IO.

I don't understand why this was considered necessary. It
doesn't appear to be explained in any of the code so can you
explain the problem that large filesystem I/Os pose to the block
layer? We _need_ to be able to drive really huge requests from the
filesystem down to the disks, especially for direct I/O.....

FWIW, we've just got XFS to the point where we could issue large
I/Os (up to 8MB on 16k pages) with a default configuration kernel
and filesystem using md+dm on an Altix. That makes an artificial
512KB filesystem I/O size limit a pretty major step backwards in
terms of performance for default configs.....

> That's why we
> have a bio_add_pc_page(). The first hunk may cause things to not work
> optimally then if we don't apply the last hunk.

bio_add_pc_page() requires a request queue to be passed to it.  It's
called only from scsi layers in the context of mapping pages into a
bio from sg_io(). The comment for bio_add_pc_page() says for use
with REQ_PC queues only, and that appears to only be used by ide-cd
cdroms. Is that comment correct?

Also, it seems to me that using bio_add_pc_page() in a filesystem
or in the generic direct i/o code seems like a gross layering
violation to me because they are supposed to know nothing about
request queues.

> The best approach is probably to tune max_sectors on the system itself.
> That's why it is exposed, after all.

You mean /sys/block/sd*/queue/max_sectors_kb?

Cheers,

Dave.

-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group


* Re: [PATCH] Direct I/O bio size regression
  2006-04-24 14:56     ` David Chinner
@ 2006-04-24 18:47       ` Jens Axboe
  2006-04-26  2:30         ` David Chinner
From: Jens Axboe @ 2006-04-24 18:47 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel

On Tue, Apr 25 2006, David Chinner wrote:
> On Mon, Apr 24, 2006 at 11:05:08AM +0200, Jens Axboe wrote:
> > On Mon, Apr 24 2006, Jens Axboe wrote:
> > > > Index: 2.6.x-xfs-new/fs/bio.c
> > > > ===================================================================
> > > > --- 2.6.x-xfs-new.orig/fs/bio.c	2006-02-06 11:57:50.000000000 +1100
> > > > +++ 2.6.x-xfs-new/fs/bio.c	2006-04-24 15:46:16.849484424 +1000
> > > > @@ -304,7 +304,7 @@ int bio_get_nr_vecs(struct block_device 
> > > >  	request_queue_t *q = bdev_get_queue(bdev);
> > > >  	int nr_pages;
> > > >  
> > > > -	nr_pages = ((q->max_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > > +	nr_pages = ((q->max_hw_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > >  	if (nr_pages > q->max_phys_segments)
> > > >  		nr_pages = q->max_phys_segments;
> > > >  	if (nr_pages > q->max_hw_segments)
> > > > @@ -446,7 +446,7 @@ int bio_add_page(struct bio *bio, struct
> > > >  		 unsigned int offset)
> > > >  {
> > > >  	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> > > > -	return __bio_add_page(q, bio, page, len, offset, q->max_sectors);
> > > > +	return __bio_add_page(q, bio, page, len, offset, q->max_hw_sectors);
> > > >  }
> > > >  
> > > >  struct bio_map_data {
> > > 
> > > Clearly correct, I'll make sure this gets merged right away.
> > 
> > Spoke too soon... The last part is actually on purpose, to prevent
> > really huge requests as part of normal file system IO.
> 
> I don't understand why this was considered necessary. It
> doesn't appear to be explained in any of the code so can you
> explain the problem that large filesystem I/Os pose to the block
> layer? We _need_ to be able to drive really huge requests from the
> filesystem down to the disks, especially for direct I/O.....
> 
> FWIW, we've just got XFS to the point where we could issue large
> I/Os (up to 8MB on 16k pages) with a default configuration kernel
> and filesystem using md+dm on an Altix. That makes an artificial
> 512KB filesystem I/O size limit a pretty major step backwards in
> terms of performance for default configs.....

The change was needed to safely split max_sectors into two sane parts:

- The soft value, ->max_sectors, that holds a sane default of maximum io
  size. The main issue we want to prevent is filling the queue with huge
  amounts of io, both from a pinning POV but also from user latency
  reasons.

- The hard value, ->max_hw_sectors. Previously, there was no real clear
  definition of what ->max_sectors was supposed to do. We couldn't
  increase it to fit the hardware limits of most hardware, because that
  would hurt us latency/memory wise.
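
For reference, the capping this describes looks roughly like the sketch
below. It's a from-memory approximation of the 2.6.15-era
blk_queue_max_sectors(), with the queue struct pared down to the two
fields that matter here, so treat the details as assumptions:

/* From-memory sketch, not a verbatim copy of the kernel code. */
#define BLK_DEF_MAX_SECTORS	1024	/* 512 KB soft default */

struct request_queue_stub {
	unsigned short max_sectors;	/* soft limit, what the fs sees */
	unsigned short max_hw_sectors;	/* hard limit, what the hw can do */
};

static void blk_queue_max_sectors_sketch(struct request_queue_stub *q,
					 unsigned short max_sectors)
{
	if (max_sectors > BLK_DEF_MAX_SECTORS) {
		/* remember the hardware limit, default the soft one lower */
		q->max_hw_sectors = max_sectors;
		q->max_sectors = BLK_DEF_MAX_SECTORS;
	} else
		q->max_sectors = q->max_hw_sectors = max_sectors;
}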

> > That's why we
> > have a bio_add_pc_page(). The first hunk may cause things to not work
> > optimally then if we don't apply the last hunk.
> 
> bio_add_pc_page() requires a request queue to be passed to it.  It's
> called only from scsi layers in the context of mapping pages into a
> bio from sg_io(). The comment for bio_add_pc_page() says for use
> with REQ_PC queues only, and that appears to only be used by ide-cd
> cdroms. Is that comment correct?

It's used for any SG_IO path, so that is not at all restricted to
ide-cd. It covers all block devices.

> Also, it seems to me that using bio_add_pc_page() in a filesystem
> or in the generic direct i/o code seems like a gross layering
> violation to me because they are supposed to know nothing about
> request queues.

I'm not suggesting you do that at all. You should not have to change
your file system. See below.

> > The best approach is probably to tune max_sectors on the system itself.
> > That's why it is exposed, after all.
> 
> You mean /sys/block/sd*/queue/max_sectors_kb?

Exactly. Your max_hw_sectors_kb should already be correct, if not then
that is a driver issue that needs to be fixed. And that's not a new
issue, it was always so. You can then increase max_sectors_kb to any
value as long as it's less than max_hw_sectors_kb, and your filesystem
will happily build you ios as large as you need (equiv to what your
patch would have accomplished).
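
To illustrate (this is my sketch, not something shipped anywhere; the
device name "sda" and copying the hard limit straight into the soft one
are placeholders), a small C helper doing that tuning via sysfs might
look like:

/* Hedged sketch: raise max_sectors_kb to the hardware limit for one
 * device. Paths follow the sysfs attributes discussed in this thread. */
#include <stdio.h>

int main(void)
{
	unsigned int hw_kb;
	FILE *f = fopen("/sys/block/sda/queue/max_hw_sectors_kb", "r");

	if (!f || fscanf(f, "%u", &hw_kb) != 1) {
		perror("max_hw_sectors_kb");
		return 1;
	}
	fclose(f);

	f = fopen("/sys/block/sda/queue/max_sectors_kb", "w");
	if (!f) {
		perror("max_sectors_kb");
		return 1;
	}
	fprintf(f, "%u\n", hw_kb);	/* soft limit must not exceed hard */
	fclose(f);
	return 0;
}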

-- 
Jens Axboe



* Re: [PATCH] Direct I/O bio size regression
  2006-04-24 18:47       ` Jens Axboe
@ 2006-04-26  2:30         ` David Chinner
  2006-04-26  5:28           ` Jens Axboe
  2006-05-07 16:25           ` Lee Revell
From: David Chinner @ 2006-04-26  2:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: David Chinner, linux-kernel

On Mon, Apr 24, 2006 at 08:47:30PM +0200, Jens Axboe wrote:
> On Tue, Apr 25 2006, David Chinner wrote:
> > On Mon, Apr 24, 2006 at 11:05:08AM +0200, Jens Axboe wrote:
> > > 
> > > Spoke too soon... The last part is actually on purpose, to prevent
> > > really huge requests as part of normal file system IO.
> > 
> > I don't understand why this was considered necessary. It
> > doesn't appear to be explained in any of the code so can you
> > explain the problem that large filesystem I/Os pose to the block
> > layer? We _need_ to be able to drive really huge requests from the
> > filesystem down to the disks, especially for direct I/O.....
> > 
> > FWIW, we've just got XFS to the point where we could issue large
> > I/Os (up to 8MB on 16k pages) with a default configuration kernel
> > and filesystem using md+dm on an Altix. That makes an artificial
> > 512KB filesystem I/O size limit a pretty major step backwards in
> > terms of performance for default configs.....
> 
> The change was needed to safely split max_sectors into two sane parts:
> 
> - The soft value, ->max_sectors, that holds a sane default of maximum io
>   size. The main issue we want to prevent is filling the queue with huge
>   amounts of io, both from a pinning POV but also from user latency
>   reasons.

Got any data that you can share with us?

Wrt latency, is the problem to do with large requests causing short
term latency? I thought that latency minimisation is the job of the
I/O scheduler, so if this is the case, doesn't this indicate a
deficiency of the I/O scheduler? e.g. the I/O scheduler could split
large requests to reduce latency, just like you merge adjacent
requests to reduce the number of I/Os and keep overall latency
low...

And as to the pinning problem - if you have a problem with too much
memory in the I/O queues, then the I/O queues are too deep or they
need to be throttled based on the amount of data in them as well as
the number of queued requests.  It's the method or configuration of
the I/O scheduler being used to throttle requests that is deficient
here, not the fact that a filesystem is building large I/Os.

It seems to me that you've crippled the block layer to solve very
specific problems that most people don't see. I haven't seen pinning
problems since the cfq request queue depth was reduced from 8192 to
128 and all the I/O latency problems I see are to do with multiple
small I/Os being issued rather than a single large I/O....

> - The hard value, ->max_hw_sectors. Previously, there was no real clear
>   definition of what ->max_sectors was supposed to do. We couldn't
>   increase it to fit the hardware limits of most hardware, because that
>   would hurt us latency/memory wise.

But we did have max_sectors = max_hw_sectors and I can't say that
I've seen any evidence that it hurt us latency/memory wise.

> > > The best approach is probably to tune max_sectors on the system itself.
> > > That's why it is exposed, after all.
> > 
> > You mean /sys/block/sd*/queue/max_sectors_kb?
> 
> Exactly.

Not happy. Now, instead of having a default config that works just fine,
we've got to change the config of every block device on every boot
on every machine we sell. This is a big deal when you're talking
about machines with _thousands_ of block devices on them all needing
to have their defaults changed.

BTW, can you point me to the discussion(s) that lead to this mod so
I can catch up on this quickly? I can't find anything on lkml or
linux-fsdevel about it....

Cheers,

Dave.
-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group


* Re: [PATCH] Direct I/O bio size regression
  2006-04-26  2:30         ` David Chinner
@ 2006-04-26  5:28           ` Jens Axboe
  2006-04-26 15:41             ` David Chinner
  2006-05-07 16:25           ` Lee Revell
From: Jens Axboe @ 2006-04-26  5:28 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel

On Wed, Apr 26 2006, David Chinner wrote:
> On Mon, Apr 24, 2006 at 08:47:30PM +0200, Jens Axboe wrote:
> > On Tue, Apr 25 2006, David Chinner wrote:
> > > On Mon, Apr 24, 2006 at 11:05:08AM +0200, Jens Axboe wrote:
> > > > 
> > > > Spoke too soon... The last part is actually on purpose, to prevent
> > > > really huge requests as part of normal file system IO.
> > > 
> > > I don't understand why this was considered necessary. It
> > > doesn't appear to be explained in any of the code so can you
> > > explain the problem that large filesystem I/Os pose to the block
> > > layer? We _need_ to be able to drive really huge requests from the
> > > filesystem down to the disks, especially for direct I/O.....
> > > 
> > > FWIW, we've just got XFS to the point where we could issue large
> > > I/Os (up to 8MB on 16k pages) with a default configuration kernel
> > > and filesystem using md+dm on an Altix. That makes an artificial
> > > 512KB filesystem I/O size limit a pretty major step backwards in
> > > terms of performance for default configs.....
> > 
> > The change was needed to safely split max_sectors into two sane parts:
> > 
> > - The soft value, ->max_sectors, that holds a sane default of maximum io
> >   size. The main issue we want to prevent is filling the queue with huge
> >   amounts of io, both from a pinning POV but also from user latency
> >   reasons.
> 
> Got any data that you can share with us?
> 
> Wrt latency, is the problem to do with large requests causing short
> term latency? I thought that latency minimisation is the job of the
> I/O scheduler, so if this is the case, doesn't this indicate a
> deficiency of the I/O scheduler? e.g. the I/O scheduler could split
> large requests to reduce latency, just like you merge adjacent
> requests to reduce the number of I/Os and keep overall latency
> low...

What would be the point of allowing you to build these large ios only to
split them up again? It's not only painfully inefficient, it's also
tricky to do since it requires extra allocations, and there's no good
place to do it.

> And as to the pinning problem - if you have a problem with too much
> memory in the I/O queues, then the I/O queues are too deep or they
> need to be throttled based on the amount of data in them as well as
> the number of queued requests.  It's the method or configuration of
> the I/O scheduler being used to throttle requests that is deficient
> here, not the fact that a filesystem is building large I/Os.
> 
> It seems to me that you've crippled the block layer to solve very
> specific problems that most people don't see. I haven't seen pinning
> problems since the cfq request queue depth was reduced from 8192 to
> 128 and all the I/O latency problems I see are to do with multiple
> small I/Os being issued rather than a single large I/O....

I haven't crippled anything, in fact it's a lot more flexible now. I
don't know why you are whining, you have the exact same possibilities to
do large ios as you did before. Up max_sectors_kb.

8192 requests was nasty. And guess what, any recent ide or sata drive
should have 32768 as max_sectors_kb value. Multiply that by 128 * 2
(nr_requests * 2) and you have 8 times as much memory pinned in the
queue as 8192 requests did for IDE.
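
Working that arithmetic through (the ~128 KB per request for the old
8192-request IDE setup is my assumption to make the comparison come out,
not a figure from this thread):

/* Back-of-envelope check of the pinning comparison above. */
#include <stdio.h>

int main(void)
{
	/* 32768 KB per request * 128 nr_requests * 2 */
	unsigned long long sata_kb = 32768ULL * 128 * 2;
	/* 8192 requests at an assumed ~128 KB each */
	unsigned long long ide_kb = 8192ULL * 128;

	printf("SATA worst case: %llu MB\n", sata_kb >> 10);	/* 8192 MB */
	printf("IDE worst case:  %llu MB\n", ide_kb >> 10);	/* 1024 MB */
	printf("ratio: %llux\n", sata_kb / ide_kb);		/* 8x */
	return 0;
}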

> > - The hard value, ->max_hw_sectors. Previously, there was no real clear
> >   definition of what ->max_sectors was supposed to do. We couldn't
> >   increase it to fit the hardware limits of most hardware, because that
> >   would hurt us latency/memory wise.
> 
> But we did have max_sectors = max_hw_sectors and I can't say that
> I've seen any evidence that it hurt us latency/memory wise.

Well good for you.

> > > > The best approach is probably to tune max_sectors on the system itself.
> > > > That's why it is exposed, after all.
> > > 
> > > You mean /sys/block/sd*/queue/max_sectors_kb?
> > 
> > Exactly.
> 
> Not happy. Now, instead of having a default config that works just fine,
> we've got to change the config of every block device on every boot
> on every machine we sell. This is a big deal when you're talking
> about machines with _thousands_ of block devices on them all needing
> to have their defaults changed.

Oh please, it's a simple operation. I doubt you put monkeys in front of
the machines doing this manually.

> BTW, can you point me to the discussion(s) that lead to this mod so
> I can catch up on this quickly? I can't find anything on lkml or
> linux-fsdevel about it....

See the postings from Mike Christie that led to the patches containing
this. WRT the max_sectors/max_hw_sectors splitup, see discussions from
the -RT people on long completion run times on large requests.

-- 
Jens Axboe



* Re: [PATCH] Direct I/O bio size regression
  2006-04-26  5:28           ` Jens Axboe
@ 2006-04-26 15:41             ` David Chinner
  2006-04-26 17:55               ` Jens Axboe
From: David Chinner @ 2006-04-26 15:41 UTC (permalink / raw)
  To: Jens Axboe; +Cc: David Chinner, linux-kernel

On Wed, Apr 26, 2006 at 07:28:47AM +0200, Jens Axboe wrote:
> On Wed, Apr 26 2006, David Chinner wrote:
> > On Mon, Apr 24, 2006 at 08:47:30PM +0200, Jens Axboe wrote:
> > > The change was needed to safely split max_sectors into two sane parts:
> > > 
> > > - The soft value, ->max_sectors, that holds a sane default of maximum io
> > >   size. The main issue we want to prevent is filling the queue with huge
> > >   amounts of io, both from a pinning POV but also from user latency
> > >   reasons.
> > 
> > Got any data that you can share with us?
> > 
> > Wrt latency, is the problem to do with large requests causing short
> > term latency? I thought that latency minimisation is the job of the
> > I/O scheduler, so if this is the case, doesn't this indicate a
> > deficiency of the I/O scheduler? e.g. the I/O scheduler could split
> > large requests to reduce latency, just like you merge adjacent
> > requests to reduce the number of I/Os and keep overall latency
> > low...
> 
> What would be the point of allowing you to build these large ios only to
> split them up again?

Filesystems have good reasons to issue large I/Os.  Large I/Os for
XFS mean less locking and filesystem structure traversal (i.e CPU
usage) for a given load, we execute fewer and larger allocations, we
get better parallelism and scalability, less transactions are
required which means less log writes to the device, etc. There's a
few second order filesystem effects that I can think of like this
that result in reduced filesystem I/O load on the block device...

AFAICT, the latency issue you talk about is not a filesystem issue
but a block layer issue, so my question is why you saw this as
something the filesystems were doing wrong rather than a deficiency
with the I/O scheduler. I'm not advocating you do what I said,
I was illustrating a possible alternate approach to reducing
latency in the block layer when the filesystem issues large I/Os.

FWIW, I'd much prefer to do 2 concurrent 2MB I/Os than issue
2x(4x512KB) I/Os and hope that we don't get 8 seeks instead of 2.
Larger I/Os give far more consistent performance than the equivalent
throughput in small I/Os. On loads where latency matters,
consistent, predictable latency is preferable even if it means a
higher baseline latency.

> it's also
> > tricky to do since it requires extra allocations, and there's no good
> > place to do it.

Allocation fails - means OOM, latency is already shot to pieces and
you can simply ship it to disk as it stands. And i'd think that
the place to split it would be in the merge function. My guess is
that dm and md would provide plenty of examples of what to do ;)

Like I said though, it was simply an example....

> > And as to the pinning problem - if you have a problem with too much
> > memory in the I/O queues, then the I/O queues are too deep or they
> > need to be throttled based on the amount of data in them as well as
> > the number of queued requests.  It's the method or configuration of
> > the I/O scheduler being used to throttle requests that is deficient
> > here, not the fact that a filesystem is building large I/Os.
> > 
> > It seems to me that you've crippled the block layer to solve very
> > specific problems that most people don't see. I haven't seen pinning
> > problems since the cfq request queue depth was reduced from 8192 to
> > 128 and all the I/O latency problems I see are to do with multiple
> > small I/Os being issued rather than a single large I/O....
> 
> I haven't crippled anything, in fact it's a lot more flexible now. I
> don't know why you are whining, you have the exact same possibilities to
> do large ios as you did before. Up max_sectors_kb.
> 
> 8192 requests was nasty.

I still have the scars.....

> And guess what, any recent ide or sata drive
> should have 32768 as max_sectors_kb value.

Please correct me if I'm wrong - my understanding is that a max_sectors_kb
this large is mostly irrelevant, because the size of a single I/O Linux
can support is bounded by:

#define BIO_MAX_PAGES           (256)

On a 4k page machine, we've already got a maximum of 1MB for an I/O.
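
Spelled out (an illustration only, assuming BIO_MAX_PAGES of 256 as
quoted above):

/* Largest single bio on a 4 KB page machine, per the figures above. */
#include <stdio.h>

#define BIO_MAX_PAGES	256

int main(void)
{
	unsigned long page_size = 4096;
	unsigned long max_bio_kb = BIO_MAX_PAGES * page_size / 1024;

	printf("max single bio: %lu KB\n", max_bio_kb);	/* 1024 KB = 1 MB */
	return 0;
}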

And looking at max_hw_segments or max_phys_segments, they both restrict the
size of the bio. Seems that most devices set them to 128 or 256 as well.

That means that on a 4k page machine 512k or 1MB are the most common largest
possible I/O sizes and hence the maximum amount of memory pinned is 32 or 64x
smaller than your original claim.

Also, if that is correct, then capping max_sectors to 512KB can't do much
for reducing I/O latency or the amount of pinned memory on these systems
because they couldn't issue I/O much larger than this.

So I'm still not understanding the rationale behind the new default setting
of max_sectors....

Cheers,

Dave.
-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group


* Re: [PATCH] Direct I/O bio size regression
  2006-04-26 15:41             ` David Chinner
@ 2006-04-26 17:55               ` Jens Axboe
From: Jens Axboe @ 2006-04-26 17:55 UTC (permalink / raw)
  To: David Chinner; +Cc: linux-kernel

On Thu, Apr 27 2006, David Chinner wrote:
> On Wed, Apr 26, 2006 at 07:28:47AM +0200, Jens Axboe wrote:
> > On Wed, Apr 26 2006, David Chinner wrote:
> > > On Mon, Apr 24, 2006 at 08:47:30PM +0200, Jens Axboe wrote:
> > > > The change was needed to safely split max_sectors into two sane parts:
> > > > 
> > > > - The soft value, ->max_sectors, that holds a sane default of maximum io
> > > >   size. The main issue we want to prevent is filling the queue with huge
> > > >   amounts of io, both from a pinning POV but also from user latency
> > > >   reasons.
> > > 
> > > Got any data that you can share with us?
> > > 
> > > Wrt latency, is the problem to do with large requests causing short
> > > term latency? I thought that latency minimisation is the job of the
> > > I/O scheduler, so if this is the case, doesn't this indicate a
> > > deficiency of the I/O scheduler? e.g. the I/O scheduler could split
> > > large requests to reduce latency, just like you merge adjacent
> > > requests to reduce the number of I/Os and keep overall latency
> > > low...
> > 
> > What would be the point of allowing you to build these large ios only to
> > split them up again?
> 
> Filesystems have good reasons to issue large I/Os.  Large I/Os for
> XFS mean less locking and filesystem structure traversal (i.e CPU
> usage) for a given load, we execute fewer and larger allocations, we
> get better parallelism and scalability, less transactions are
> required which means less log writes to the device, etc. There's a
> few second order filesystem effects that I can think of like this
> that result in reduced filesystem I/O load on the block device...

It's pretty clear that where you are coming from - the big server camp.
And yes, for those cases you pretty much always want huge ios. And yes,
you can get those, just set the exposed variable and be done with it. I
don't know why you are still making an issue of this or debating it...

> AFAICT, the latency issue you talk about is not a filesystem issue
> but a block layer issue, so my question is why you saw this as
> something the filesystems were doing wrong rather than a deficiency
> with the I/O scheduler. I'm not advocating you do what I said,
> I was illustrating a possible alternate approach to reducing
> latency in the block layer when the filesystem issues large I/Os.

It's actually a collaboration between the block layer / io scheduler and
the vm. The vm to some extent relies on the block layer throttling to
not have gigabytes of io in progress. I'm not too fond of that, but
that's the way it still is.

> FWIW, I'd much prefer to do 2 concurrent 2MB I/Os than issue
> 2x(4x512KB) I/Os and hope that we don't get 8 seeks instead of 2.
> Larger I/Os give far more consistent performance than the equivalent
> throughput in small I/Os. On loads where latency matters,
> consistent, predictable latency is preferable even if it means a
> higher baseline latency.

I agree. For most people it doesn't matter though, and the usecs spent
in completion is more important as it gives them skipless audio and
whatnot.

> > it's also
> > tricky to do since it requires extra allocations and no good place to do
> > it.
> 
> Allocation fails - means OOM, latency is already shot to pieces and
> you can simply ship it to disk as it stands. And I'd think that
> the place to split it would be in the merge function. My guess is
> that dm and md would provide plenty of examples of what to do ;)
> 
> Like I said though, it was simply an example....

It was a joke, there's no way on earth I'd ever add something like this.
It's pointless. Splitting is actually pretty tricky to get right (I'm
sure if you'd ever looked at dm splitting you would not be making such
suggestions). And since we are building these large ios in the first
place, the way to go is naturally to _disallow_ building something we
would split up later.

Take a look at how the buildup works, dm/md have hooks to say yay or nay
to adding another page to a bio.

> > And guess what, any recent ide or sata drive
> > should have 32768 as max_sectors_kb value.
> 
> Please correct me if I'm wrong - my understanding is that a max_sectors_kb
> this large is mostly irrelevant, because the size of a single I/O Linux
> can support is bounded by:
> 
> #define BIO_MAX_PAGES           (256)
> 
> On a 4k page machine, we've already got a maximum of 1MB for an I/O.

Not quite true. It's the largest size of a bio, but you can have
multiple bios tied to a request.

> And looking at max_hw_segments or max_phys_segments, they both
> restrict the size of the bio. Seems that most devices set them to 128
> or 256 as well.

We can never exceed the driver given value of course. If the hardware
can do more than it advertises, then it would be a good idea to adjust
those values. Don't forget that a segment may be larger than a page.

> That means that on a 4k page machine 512k or 1MB are the most common
> largest possible I/O sizes and hence the maximum amount of memory
> pinned is 32 or 64x smaller than your original claim.
>
> Also, if that is correct, then capping max_sectors to 512KB can't do much
> for reducing I/O latency or the amount of pinned memory on these systems
> because they couldn't issue I/O much larger than this.

Most drivers don't set a true max_hw_sectors yet, what I stated was for
what SATA (or lba48 on IDE, but that is limited by 256 segments anyway)
can support. The reason they don't set it yet is because this soft/hard
value split up is still quite fresh and drivers were told in the past to
not set huge max_sectors values because of this.

> So I'm still not understanding the rationale behind the new default setting
> of max_sectors....

Sorry I'm not making myself clearer. You still seem to not understand
exactly how the request buildup or limitations work, perhaps you should
start there. As to the problem in general, seems to me you are making a
big deal out of a small problem - a problem that can easily be rectified
in user space by just setting the right value.

-- 
Jens Axboe



* Re: [PATCH] Direct I/O bio size regression
  2006-04-26  2:30         ` David Chinner
  2006-04-26  5:28           ` Jens Axboe
@ 2006-05-07 16:25           ` Lee Revell
From: Lee Revell @ 2006-05-07 16:25 UTC (permalink / raw)
  To: David Chinner; +Cc: Jens Axboe, linux-kernel

On Wed, 2006-04-26 at 12:30 +1000, David Chinner wrote:
> Got any data that you can share with us?
> 

The thread was from July 2004 and was called:

Re: [linux-audio-dev] Re: [announce] [patch] Voluntary Kernel Preemption Patch

Also some info in thread:

Re: [patch] voluntary-preempt-2.6.8-rc2-M5

> Wrt latency, is the problem to do with large requests causing short
> term latency? I thought that latency minimisation is the job of the
> I/O scheduler, so if this is the case, doesn't this indicate a
> deficiency of the I/O scheduler? e.g. the I/O scheduler could split
> large requests to reduce latency, just like you merge adjacent
> requests to reduce the number of I/Os and keep overall latency
> low...
> 

I think you are talking past each other - Jens is referring to scheduler
latency, not IO latency.

Lee



* Re: [PATCH] Direct I/O bio size regression
  2006-04-25  7:52     ` Nick Piggin
@ 2006-04-25 10:45       ` Al Boldi
From: Al Boldi @ 2006-04-25 10:45 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Jens Axboe, linux-kernel, David Chinner

Nick Piggin wrote:
> Al Boldi wrote:
> > Jens Axboe wrote:
> >>On Mon, Apr 24 2006, Al Boldi wrote:
> >>>On my system max_hw_sectors_kb is fixed at 1024, and max_sectors_kb
> >>>defaults to 512, which leads to terribly fluctuating thruput.
> >>>
> >>>Setting max_sectors_kb = max_hw_sectors_kb makes things even worse.
> >>>
> >>>Tuning max_sectors_kb to ~192 only stabilizes this situation.
> >>
> >>That sounds pretty strange. Do you have a test case?
> >
> > I would think that, if you could get your hands on some hw that defaults
> > to the same values, you may easily see the same problem by doing this:
> >
> > 1. # vmstat 1 (or some other bio mon)
> > 2. < change vt >
> > 3. # cat /dev/hda > /dev/null &
> > 4. # cat /dev/hda > /dev/null
> > Let this second cat run for a sec, then ^C.
> > Depending on your hw specifics the bio should either go up or down by a
> > factor of 2 (on my system 25mb/s-48mb/s).  You may have to repeat step 4
> > a few times to aggravate the situation.
> >
> > Note that this is not specific to cat, but can also be observed during
> > normal random disk access, although not in a controlled manner.
>
> *random* disk access?
>
> What io scheduler are you using? Can you try with as?

Same w/ deadline, as, and cfq.

Thanks!

--
Al



* Re: [PATCH] Direct I/O bio size regression
  2006-04-24 20:59   ` Al Boldi
@ 2006-04-25  7:52     ` Nick Piggin
  2006-04-25 10:45       ` Al Boldi
From: Nick Piggin @ 2006-04-25  7:52 UTC (permalink / raw)
  To: Al Boldi; +Cc: Jens Axboe, linux-kernel, David Chinner

Al Boldi wrote:
> Jens Axboe wrote:
> 
>>On Mon, Apr 24 2006, Al Boldi wrote:
>>
>>>On my system max_hw_sectors_kb is fixed at 1024, and max_sectors_kb
>>>defaults to 512, which leads to terribly fluctuating thruput.
>>>
>>>Setting max_sectors_kb = max_hw_sectors_kb makes things even worse.
>>>
>>>Tuning max_sectors_kb to ~192 only stabilizes this situation.
>>
>>That sounds pretty strange. Do you have a test case?
> 
> 
> I would think that, if you could get your hands on some hw that defaults to 
> the same values, you may easily see the same problem by doing this:
> 
> 1. # vmstat 1 (or some other bio mon)
> 2. < change vt >
> 3. # cat /dev/hda > /dev/null &
> 4. # cat /dev/hda > /dev/null
> Let this second cat run for a sec, then ^C.
> Depending on your hw specifics the bio should either go up or down by a 
> factor of 2 (on my system 25mb/s-48mb/s).  You may have to repeat step 4 a 
> few times to aggravate the situation.
> 
> Note that this is not specific to cat, but can also be observed during normal 
> random disk access, although not in a controlled manner.

*random* disk access?

What io scheduler are you using? Can you try with as?

-- 
SUSE Labs, Novell Inc.


* Re: [PATCH] Direct I/O bio size regression
  2006-04-24 19:49 ` Jens Axboe
@ 2006-04-24 20:59   ` Al Boldi
  2006-04-25  7:52     ` Nick Piggin
From: Al Boldi @ 2006-04-24 20:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, David Chinner

Jens Axboe wrote:
> On Mon, Apr 24 2006, Al Boldi wrote:
> > On my system max_hw_sectors_kb is fixed at 1024, and max_sectors_kb
> > defaults to 512, which leads to terribly fluctuating thruput.
> >
> > Setting max_sectors_kb = max_hw_sectors_kb makes things even worse.
> >
> > Tuning max_sectors_kb to ~192 only stabilizes this situation.
>
> That sounds pretty strange. Do you have a test case?

I would think that, if you could get your hands on some hw that defaults to 
the same values, you may easily see the same problem by doing this:

1. # vmstat 1 (or some other bio mon)
2. < change vt >
3. # cat /dev/hda > /dev/null &
4. # cat /dev/hda > /dev/null
Let this second cat run for a sec, then ^C.
Depending on your hw specifics the bio should either go up or down by a 
factor of 2 (on my system 25mb/s-48mb/s).  You may have to repeat step 4 a 
few times to aggravate the situation.

Note that this is not specific to cat, but can also be observed during normal 
random disk access, although not in a controlled manner.

Setting max_sectors_kb to ~192 seems to inhibit this problem.

Thanks!

--
Al

* Re: [PATCH] Direct I/O bio size regression
  2006-04-24 17:06 Al Boldi
@ 2006-04-24 19:49 ` Jens Axboe
  2006-04-24 20:59   ` Al Boldi
From: Jens Axboe @ 2006-04-24 19:49 UTC (permalink / raw)
  To: Al Boldi; +Cc: linux-kernel, David Chinner

On Mon, Apr 24 2006, Al Boldi wrote:
> David Chinner wrote:
> > On Mon, Apr 24, 2006 at 11:05:08AM +0200, Jens Axboe wrote:
> > > On Mon, Apr 24 2006, Jens Axboe wrote:
> > > > > Index: 2.6.x-xfs-new/fs/bio.c
> > > > > ===================================================================
> > > > > --- 2.6.x-xfs-new.orig/fs/bio.c	2006-02-06 11:57:50.000000000 +1100
> > > > > +++ 2.6.x-xfs-new/fs/bio.c	2006-04-24 15:46:16.849484424 +1000
> > > > > @@ -304,7 +304,7 @@ int bio_get_nr_vecs(struct block_device 
> > > > >  	request_queue_t *q = bdev_get_queue(bdev);
> > > > >  	int nr_pages;
> > > > > 
> > > > > -	nr_pages = ((q->max_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > > > +	nr_pages = ((q->max_hw_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > > >  	if (nr_pages > q->max_phys_segments)
> > > > >  		nr_pages = q->max_phys_segments;
> > > > >  	if (nr_pages > q->max_hw_segments)
> > > > > @@ -446,7 +446,7 @@ int bio_add_page(struct bio *bio, struct
> > > > >  		 unsigned int offset)
> > > > >  {
> > > > >  	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> > > > > -	return __bio_add_page(q, bio, page, len, offset, q->max_sectors);
> > > > > +	return __bio_add_page(q, bio, page, len, offset, q->max_hw_sectors);
> > > > >  }
> > > > > 
> > > > >  struct bio_map_data {
> > > >
> > > > Clearly correct, I'll make sure this gets merged right away.
> > >
> > > Spoke too soon... The last part is actually on purpose, to prevent
> > > really huge requests as part of normal file system IO.
> >
> > I don't understand why this was considered necessary. It
> > doesn't appear to be explained in any of the code so can you
> > explain the problem that large filesystem I/Os pose to the block
> > layer? We _need_ to be able to drive really huge requests from the
> > filesystem down to the disks, especially for direct I/O.....
> > FWIW, we've just got XFS to the point where we could issue large
> > I/Os (up to 8MB on 16k pages) with a default configuration kernel
> > and filesystem using md+dm on an Altix. That makes an artificial
> > 512KB filesystem I/O size limit a pretty major step backwards in
> > terms of performance for default configs.....
> >
> > > That's why we
> > > have a bio_add_pc_page(). The first hunk may cause things to not work
> > > optimally then if we don't apply the last hunk.
> >
> > bio_add_pc_page() requires a request queue to be passed to it.  It's
> > called only from scsi layers in the context of mapping pages into a
> > bio from sg_io(). The comment for bio_add_pc_page() says for use
> > with REQ_PC queues only, and that appears to only be used by ide-cd
> > cdroms. Is that comment correct?
> >
> > Also, it seems to me that using bio_add_pc_page() in a filesystem
> > or in the generic direct i/o code seems like a gross layering
> > violation to me because they are supposed to know nothing about
> > request queues.
> >
> > > The best approach is probably to tune max_sectors on the system itself.
> > > That's why it is exposed, after all.
> >
> > You mean /sys/block/sd*/queue/max_sectors_kb?
> 
> On my system max_hw_sectors_kb is fixed at 1024, and max_sectors_kb defaults 
> to 512, which leads to terribly fluctuating thruput.
> 
> Setting max_sectors_kb = max_hw_sectors_kb makes things even worse.
> 
> Tuning max_sectors_kb to ~192 only stabilizes this situation.

That sounds pretty strange. Do you have a test case?

-- 
Jens Axboe



* Re: [PATCH] Direct I/O bio size regression
@ 2006-04-24 17:06 Al Boldi
  2006-04-24 19:49 ` Jens Axboe
From: Al Boldi @ 2006-04-24 17:06 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, David Chinner

David Chinner wrote:
> On Mon, Apr 24, 2006 at 11:05:08AM +0200, Jens Axboe wrote:
> > On Mon, Apr 24 2006, Jens Axboe wrote:
> > > > Index: 2.6.x-xfs-new/fs/bio.c
> > > > ===================================================================
> > > > --- 2.6.x-xfs-new.orig/fs/bio.c	2006-02-06 11:57:50.000000000 +1100
> > > > +++ 2.6.x-xfs-new/fs/bio.c	2006-04-24 15:46:16.849484424 +1000
> > > > @@ -304,7 +304,7 @@ int bio_get_nr_vecs(struct block_device 
> > > >  	request_queue_t *q = bdev_get_queue(bdev);
> > > >  	int nr_pages;
> > > > 
> > > > -	nr_pages = ((q->max_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > > +	nr_pages = ((q->max_hw_sectors << 9) + PAGE_SIZE - 1) >> PAGE_SHIFT;
> > > >  	if (nr_pages > q->max_phys_segments)
> > > >  		nr_pages = q->max_phys_segments;
> > > >  	if (nr_pages > q->max_hw_segments)
> > > > @@ -446,7 +446,7 @@ int bio_add_page(struct bio *bio, struct
> > > >  		 unsigned int offset)
> > > >  {
> > > >  	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> > > > -	return __bio_add_page(q, bio, page, len, offset, q->max_sectors);
> > > > +	return __bio_add_page(q, bio, page, len, offset, q->max_hw_sectors);
> > > >  }
> > > > 
> > > >  struct bio_map_data {
> > >
> > > Clearly correct, I'll make sure this gets merged right away.
> >
> > Spoke too soon... The last part is actually on purpose, to prevent
> > really huge requests as part of normal file system IO.
>
> I don't understand why this was considered necessary. It
> doesn't appear to be explained in any of the code so can you
> explain the problem that large filesystem I/Os pose to the block
> layer? We _need_ to be able to drive really huge requests from the
> filesystem down to the disks, especially for direct I/O.....
> FWIW, we've just got XFS to the point where we could issue large
> I/Os (up to 8MB on 16k pages) with a default configuration kernel
> and filesystem using md+dm on an Altix. That makes an artificial
> 512KB filesystem I/O size limit a pretty major step backwards in
> terms of performance for default configs.....
>
> > That's why we
> > have a bio_add_pc_page(). The first hunk may cause things to not work
> > optimally then if we don't apply the last hunk.
>
> bio_add_pc_page() requires a request queue to be passed to it.  It's
> called only from scsi layers in the context of mapping pages into a
> bio from sg_io(). The comment for bio_add_pc_page() says for use
> with REQ_PC queues only, and that appears to only be used by ide-cd
> cdroms. Is that comment correct?
>
> Also, it seems to me that using bio_add_pc_page() in a filesystem
> or in the generic direct i/o code seems like a gross layering
> violation to me because they are supposed to know nothing about
> request queues.
>
> > The best approach is probably to tune max_sectors on the system itself.
> > That's why it is exposed, after all.
>
> You mean /sys/block/sd*/queue/max_sectors_kb?

On my system max_hw_sectors_kb is fixed at 1024, and max_sectors_kb defaults 
to 512, which leads to terribly fluctuating thruput.

Setting max_sectors_kb = max_hw_sectors_kb makes things even worse.

Tuning max_sectors_kb to ~192 only stabilizes this situation.

Would you think that this points to some underlying bio/queue problem?

Thanks!

--
Al


