* Poor read performance when a sequential write is present
From: chen, xiangping @ 2002-05-23 14:20 UTC
  To: linux-kernel

Hi,

I ran an I/O test with one sequential read and one sequential write
to different files. I expected roughly similar throughput for the read
and the write, but the read appeared to be blocked until the write
finished. After the write process finished, the read process slowly
picked up speed. Does the Linux buffer cache favor writes? How can
I tune it?


Thanks,

Xiangping Chen


* Re: Poor read performance when a sequential write is present
From: Andrew Morton @ 2002-05-23 19:51 UTC
  To: chen, xiangping; +Cc: linux-kernel

"chen, xiangping" wrote:
> 
> Hi,
> 
> I ran an I/O test with one sequential read and one sequential write
> to different files. I expected roughly similar throughput for the read
> and the write, but the read appeared to be blocked until the write
> finished. After the write process finished, the read process slowly
> picked up speed. Does the Linux buffer cache favor writes? How can
> I tune it?
> 

Reads and writes are very different beasts - writes deal with
the past and have good knowledge of what to do.  But reads
must predict the future.

You need to do two things:

1: Configure the device for a really big readahead window.

   Configuring readahead in 2.4 is a pig.  Try one of the
   following:

     echo file_readahead:N > /proc/ide/hda/settings   (N is in kilobytes)
     blockdev --setra M /dev/hda                      (M is in 512-byte sectors)
     echo K > /proc/sys/vm/max-readahead              (K is in pages - 4k on x86)

   You'll find that one of these makes a difference.
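
   For example, to check the setting before and after changing it
   (assuming 2.4-era util-linux; the device name is just an example):

     blockdev --getra /dev/hda        (prints current readahead, in 512-byte sectors)
     blockdev --setra 512 /dev/hda    (asks for a 256 KB window)
     blockdev --getra /dev/hda        (should now print 512)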

2: Apply http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.19-pre5/read-latency2.patch
   which will prevent reads from being penalised by writes.
   Or use a -ac kernel, which already has this patch.
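
   Applying it is the usual drill (shown for illustration; adjust the
   tree path to taste):

     cd linux-2.4.19-pre5
     patch -p1 < read-latency2.patch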

-


* Re: Poor read performance when a sequential write is present
From: Giuliano Pochini @ 2002-05-24  8:59 UTC
  To: linux-kernel; +Cc: chen, xiangping, Andrew Morton


>> I ran an I/O test with one sequential read and one sequential write
>> to different files. I expected roughly similar throughput for the read
>> and the write, but the read appeared to be blocked until the write
>> finished. After the write process finished, the read process slowly
>> picked up speed. Does the Linux buffer cache favor writes? How can
>> I tune it?
> [...]
> 2: Apply http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.19-pre5/read-latency2.patch

Hmmm, someone wrote a patch to fix another related problem: the fact
that multiple readers read at a very different speed. It's not unusual
that one reader gets stuck until all other have finished. I don't
remember who wrote that patch, sorry.


Bye.



* Re: Poor read performance when a sequential write is present
From: Andrew Morton @ 2002-05-24  9:26 UTC
  To: Giuliano Pochini; +Cc: linux-kernel, chen, xiangping

Giuliano Pochini wrote:
> 
> >> I ran an I/O test with one sequential read and one sequential write
> >> to different files. I expected roughly similar throughput for the read
> >> and the write, but the read appeared to be blocked until the write
> >> finished. After the write process finished, the read process slowly
> >> picked up speed. Does the Linux buffer cache favor writes? How can
> >> I tune it?
> > [...]
> > 2: Apply http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.19-pre5/read-latency2.patch
> 
> Hmmm, someone wrote a patch to fix another related problem: the fact
> that multiple readers read at a very different speed. It's not unusual
> that one reader gets stuck until all other have finished. I don't
> remember who wrote that patch, sorry.

Oh absolutely.   That's the reason why 2.4 is beating 2.5 at tiobench with
more than one thread.  2.5 is alternating fairly between threads and 2.4
is not.  So 2.4 seeks less.

I've been testing this extensively on 2.5 + multipage BIO I/O, and when you
increase readahead from 32 pages (two BIOs) to 64 pages (4 BIOs), 2.5 goes
from perfect to horrid - each thread grabs the disk head and performs many,
many megabytes of reads before any other thread gets a share.  Net effect is
that the tiobench numbers are great, but any operation which involves
reading the disk sees 30- or 60-second latencies.

Interestingly, it seems specific to IDE.  SCSI behaves well.

I have tons of traces and debug code - I'll bug Jens about this in a week or
so.

-


* Re: Poor read performance when a sequential write is present
From: William Lee Irwin III @ 2002-05-24  9:46 UTC
  To: Andrew Morton; +Cc: Giuliano Pochini, linux-kernel, chen, xiangping

On Fri, May 24, 2002 at 02:26:48AM -0700, Andrew Morton wrote:
> Oh absolutely.   That's the reason why 2.4 is beating 2.5 at tiobench with
> more than one thread.  2.5 is alternating fairly between threads and 2.4
> is not.  So 2.4 seeks less.

In one sense or another some sort of graceful transition to unfair
behavior could be considered a kind of thrashing control; how meaningful
that is in the context of disk I/O is a question I can't answer directly,
though. Do you have any comments on this potential strategic unfairness?


On Fri, May 24, 2002 at 02:26:48AM -0700, Andrew Morton wrote:
> I've been testing this extensively on 2.5 + multipage BIO I/O, and when you
> increase readahead from 32 pages (two BIOs) to 64 pages (4 BIOs), 2.5 goes
> from perfect to horrid - each thread grabs the disk head and performs many,
> many megabytes of reads before any other thread gets a share.  Net effect is
> that the tiobench numbers are great, but any operation which involves
> reading the disk sees 30- or 60-second latencies.
> Interestingly, it seems specific to IDE.  SCSI behaves well.
> I have tons of traces and debug code - I'll bug Jens about this in a week or
> so.

What kinds of phenomena appear to be associated with IDE's latencies?
I recall some comments from prior IDE maintainers on poor interactions
between generic disk I/O layers and IDE drivers, particularly with
respect to small transactions being given to the drivers to perform.
Are these comments still relevant, or is this of a different nature?


Cheers,
Bill


* Re: Poor read performance when a sequential write is present
From: Andrew Morton @ 2002-05-24 10:04 UTC
  To: William Lee Irwin III; +Cc: Giuliano Pochini, linux-kernel, chen, xiangping

William Lee Irwin III wrote:
> 
> On Fri, May 24, 2002 at 02:26:48AM -0700, Andrew Morton wrote:
> > Oh absolutely.   That's the reason why 2.4 is beating 2.5 at tiobench with
> > more than one thread.  2.5 is alternating fairly between threads and 2.4
> > is not.  So 2.4 seeks less.
> 
> In one sense or another some sort of graceful transition to unfair
> behavior could be considered a kind of thrashing control; how meaningful
> that is in the context of disk I/O is a question I can't answer directly,
> though. Do you have any comments on this potential strategic unfairness?

Well, we already have a control for strategic unfairness: the "read passovers"
mechanism.  It sets a finite limit on the number of times a request can
be walked past by the merging algorithm.  Once that counter has
expired, the request effectively becomes a merging barrier, and it will
propagate to the head of the queue as fast as the disk can retire the
reads ahead of it.
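
In sketch form (illustrative only - made-up names and types, not the
real 2.4 elevator code):

	/* Each queued request carries a "passovers" budget. */
	struct sketch_req {
		long sector;		/* starting sector */
		int passovers;		/* times it may still be walked past */
	};

	/*
	 * Scan backwards from the tail for a sorted insertion point for a
	 * new request at new_sector.  Every request we walk past pays one
	 * passover; a request whose budget has run out acts as a barrier,
	 * so the scan stops and the new request queues up behind it.
	 */
	int find_insertion_point(struct sketch_req *q, int nr, long new_sector)
	{
		int i;

		for (i = nr - 1; i >= 0; i--) {
			if (q[i].passovers-- <= 0)
				break;		/* barrier: can't pass this one */
			if (q[i].sector <= new_sector)
				return i;	/* sorted spot: insert after q[i] */
		}
		return i;	/* insert just behind the barrier (or at the head) */
	}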

I don't have a problem if the `read latency' tunable (and this
algorithm) cause a single thread to hog the disk head across
multiple successive readahead windows.  That's probably a good thing,
and it's tunable.

But it seems to not be working right.

And there's no userspace tunable at this time.

> On Fri, May 24, 2002 at 02:26:48AM -0700, Andrew Morton wrote:
> > I've been testing this extensively on 2.5 + multipage BIO I/O, and when you
> > increase readahead from 32 pages (two BIOs) to 64 pages (4 BIOs), 2.5 goes
> > from perfect to horrid - each thread grabs the disk head and performs many,
> > many megabytes of reads before any other thread gets a share.  Net effect is
> > that the tiobench numbers are great, but any operation which involves
> > reading the disk sees 30- or 60-second latencies.
> > Interestingly, it seems specific to IDE.  SCSI behaves well.
> > I have tons of traces and debug code - I'll bug Jens about this in a week or
> > so.
> 
> What kinds of phenomena appear to be associated with IDE's latencies?
> I recall some comments from prior IDE maintainers on poor interactions
> between generic disk I/O layers and IDE drivers, particularly with
> respect to small transactions being given to the drivers to perform.
> Are these comments still relevant, or is this of a different nature?

I assume that there's a difference in the way in which the generic layer
treats queueing for IDE devices.  In 2.4, IDE devices are `head active',
so the request at the head of the queue is under I/O.  But SCSI isn't
head-active.  Requests get removed from the head of the queue prior to
being serviced.  At least, that's how I think it goes.  I also believe that
the 2.4 elevator does not look at the active request at the head when making
merging decisions.

But in 2.5, head-activeness went away and as far as I know, IDE and SCSI are
treated the same.  Odd.

-


* Re: Poor read performance when a sequential write is present
From: Jens Axboe @ 2002-05-27  8:06 UTC
  To: Andrew Morton
  Cc: William Lee Irwin III, Giuliano Pochini, linux-kernel, chen, xiangping

On Fri, May 24 2002, Andrew Morton wrote:
> > What kinds of phenomena appear to be associated with IDE's latencies?
> > I recall some comments from prior IDE maintainers on poor interactions
> > between generic disk I/O layers and IDE drivers, particularly with
> > respect to small transactions being given to the drivers to perform.
> > Are these comments still relevant, or is this of a different nature?
> 
> I assume that there's a difference in the way in which the generic layer
> treats queueing for IDE devices.  In 2.4, IDE devices are `head active',
> so the request at the head of the queue is under I/O.  But SCSI isn't
> head-active.  Requests get removed from the head of the queue prior to
> being serviced.  At least, that's how I think it goes.  I also believe that

That's correct for IDE when the queue is unplugged (if plugged, first
request is ok to touch).

> the 2.4 elevator does not look at the active request at the head when making
> merging decisions.

When unplugged, right.

> But in 2.5, head-activeness went away and as far as I know, IDE and SCSI are
> treated the same.  Odd.

It didn't really go away, it just gets handled automatically now.
elv_next_request() marks the request as started, in which case the i/o
scheduler won't consider it for merging etc. SCSI removes the request
directly after it has been marked started, while IDE leaves it on the
queue until it completes. For IDE TCQ, the behaviour is the same as with
SCSI.
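
In sketch form (made-up names, not the actual 2.5 code):

	#define REQ_STARTED	(1 << 0)

	struct sketch_req {
		unsigned long flags;
		/* ... */
	};

	/* The driver fetches the next request; from this point on the
	 * i/o scheduler must leave it alone. */
	struct sketch_req *sketch_next_request(struct sketch_req *head)
	{
		head->flags |= REQ_STARTED;
		return head;
	}

	/* Merge scan: a started request is never a merge candidate.
	 * SCSI also dequeues it at this point; IDE leaves it on the
	 * queue until completion, but the flag protects it either way. */
	int sketch_may_merge(struct sketch_req *rq)
	{
		return !(rq->flags & REQ_STARTED);
	}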

-- 
Jens Axboe



* Re: Poor read performance when a sequential write is present
From: Andrew Morton @ 2002-05-27  8:22 UTC
  To: Jens Axboe
  Cc: William Lee Irwin III, Giuliano Pochini, linux-kernel, chen, xiangping

Jens Axboe wrote:
> 
> ...
> > But in 2.5, head-activeness went away and as far as I know, IDE and SCSI are
> > treated the same.  Odd.
> 
> It didn't really go away, it just gets handled automatically now.
> elv_next_request() marks the request as started, in which case the i/o
> scheduler won't consider it for merging etc. SCSI removes the request
> directly after it has been marked started, while IDE leaves it on the
> queue until it completes. For IDE TCQ, the behaviour is the same as with
> SCSI.

It won't consider the active request at the head of the queue for 
merging (making the request larger).  But it _could_ consider the
request when making decisions about insertion (adding a new request
at the head of the queue because it's close-on-disk to the active
one).   Does it do that?

-


* Re: Poor read performance when a sequential write is present
From: Jens Axboe @ 2002-05-27  8:54 UTC
  To: Andrew Morton
  Cc: William Lee Irwin III, Giuliano Pochini, linux-kernel, chen, xiangping

On Mon, May 27 2002, Andrew Morton wrote:
> Jens Axboe wrote:
> > 
> > ...
> > > But in 2.5, head-activeness went away and as far as I know, IDE and SCSI are
> > > treated the same.  Odd.
> > 
> > It didn't really go away, it just gets handled automatically now.
> > elv_next_request() marks the request as started, in which case the i/o
> > scheduler won't consider it for merging etc. SCSI removes the request
> > directly after it has been marked started, while IDE leaves it on the
> > queue until it completes. For IDE TCQ, the behaviour is the same as with
> > SCSI.
> 
> It won't consider the active request at the head of the queue for 
> merging (making the request larger).  But it _could_ consider the
> request when making decisions about insertion (adding a new request
> at the head of the queue because it's close-on-disk to the active
> one).   Does it do that?

Only when the front request isn't active is it safe to consider
insertion in front of it. 2.5 does exactly that, because it knows whether
the request has been started, while 2.4 has to guess by looking at the
head-active flag and the plug status.

If the request is started, we will only consider placing in front of the
2nd request, not after the 1st. We could consider in between the 1st and
2nd; that should be perfectly safe - just move the barrier and started
test down after the insert test. *req is the insert-after point.

diff -Nru a/drivers/block/elevator.c b/drivers/block/elevator.c
--- a/drivers/block/elevator.c	Mon May 27 10:53:53 2002
+++ b/drivers/block/elevator.c	Mon May 27 10:53:53 2002
@@ -174,9 +174,6 @@
 	while ((entry = entry->prev) != &q->queue_head) {
 		__rq = list_entry_rq(entry);
 
-		if (__rq->flags & (REQ_BARRIER | REQ_STARTED))
-			break;
-
 		/*
 		 * simply "aging" of requests in queue
 		 */
@@ -189,6 +186,9 @@
 
 		if (!*req && bio_rq_in_between(bio, __rq, &q->queue_head))
 			*req = __rq;
+
+		if (__rq->flags & (REQ_BARRIER | REQ_STARTED))
+			break;
 
 		if ((ret = elv_try_merge(__rq, bio))) {
 			if (ret == ELEVATOR_FRONT_MERGE)

-- 
Jens Axboe



* Re: Poor read performance when a sequential write is present
From: Andrew Morton @ 2002-05-27  9:35 UTC
  To: Jens Axboe
  Cc: William Lee Irwin III, Giuliano Pochini, linux-kernel, chen, xiangping

Jens Axboe wrote:
> 
> On Mon, May 27 2002, Andrew Morton wrote:
> > Jens Axboe wrote:
> > >
> > > ...
> > > > But in 2.5, head-activeness went away and as far as I know, IDE and SCSI are
> > > > treated the same.  Odd.
> > >
> > > It didn't really go away, it just gets handled automatically now.
> > > elv_next_request() marks the request as started, in which case the i/o
> > > scheduler won't consider it for merging etc. SCSI removes the request
> > > directly after it has been marked started, while IDE leaves it on the
> > > queue until it completes. For IDE TCQ, the behaviour is the same as with
> > > SCSI.
> >
> > It won't consider the active request at the head of the queue for
> > merging (making the request larger).  But it _could_ consider the
> > request when making decisions about insertion (adding a new request
> > at the head of the queue because it's close-on-disk to the active
> > one).   Does it do that?
> 
> Only when the front request isn't active is it safe to consider
> insertion in front of it. 2.5 does exactly that, because it knows whether
> the request has been started, while 2.4 has to guess by looking at the
> head-active flag and the plug status.
> 
> If the request is started, we will only consider placing in front of the
> 2nd request, not after the 1st. We could consider in between the 1st and
> 2nd; that should be perfectly safe - just move the barrier and started
> test down after the insert test. *req is the insert-after point.

Makes sense.  I suspect it may even worsen the problem I observed
with the mpage code.  Set the readahead to 256k with `blockdev --setra 512'
and then run tiobench.  The read latencies are massive - one thread
gets hold of the disk head and hogs it for 30-60 seconds.

The readahead code has a sort of double-window design.  The idea is that
if the disk does 50 megs/sec and your application processes data at
49 megs/sec, the application will never block on I/O.  At 256k readahead,
the readahead code will be laying out four BIOs at a time.  It's probable
that the application is actually submitting BIOs for a new readahead
window before all of the BIOs for the old one are complete.  So it's performing
merging against its own reads.
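
As a rough worked example (numbers picked purely for illustration):

	window size      = 256 KB
	disk rate        = 50 MB/s  ->  256 KB read in     ~5.1 ms
	application rate = 49 MB/s  ->  256 KB consumed in ~5.2 ms

While the application chews through window N, the kernel is already
reading window N+1; since 5.1 ms < 5.2 ms, each read completes just
before the application needs the data, so it never blocks.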

Given all this, what I would expect to see is for thread "A" to capture
the disk head for some period of time, until eventually one of thread "B"'s
requests expires its latency.  Then thread "B" gets to hog the disk head.
That's reasonable behaviour,  but the latencies are *enormous*.  Almost
like the latency stuff isn't working.  But it sure looks OK.

Not super-high priority at this time.  I'll play with it some more.
(Some userspace tunables for the elevator would be nice.  Hint. ;))

hmm.  Actually the code looks a bit odd:

                if (elv_linus_sequence(__rq)-- <= 0)
                        break;
                if (!(__rq->flags & REQ_CMD))
                        continue;
                if (elv_linus_sequence(__rq) < bio_sectors(bio))
                        break;

The first decrement is saying that elv_linus_sequence is in units of
requests, but the comparison (and the later `-= bio_sectors()') seems
to be saying it's in units of sectors.

I think calculating the latency in terms of requests makes more sense - just
ignore the actual size of those requests (or weight it down in some manner).
But I don't immediately see what the above code is up to?

-


* Re: Poor read performance when a sequential write is present
From: Jens Axboe @ 2002-05-28  9:25 UTC
  To: Andrew Morton
  Cc: William Lee Irwin III, Giuliano Pochini, linux-kernel, chen, xiangping

On Mon, May 27 2002, Andrew Morton wrote:
> > On Mon, May 27 2002, Andrew Morton wrote:
> > > Jens Axboe wrote:
> > > >
> > > > ...
> > > > > But in 2.5, head-activeness went away and as far as I know, IDE and SCSI are
> > > > > treated the same.  Odd.
> > > >
> > > > It didn't really go away, it just gets handled automatically now.
> > > > elv_next_request() marks the request as started, in which case the i/o
> > > > scheduler won't consider it for merging etc. SCSI removes the request
> > > > directly after it has been marked started, while IDE leaves it on the
> > > > queue until it completes. For IDE TCQ, the behaviour is the same as with
> > > > SCSI.
> > >
> > > It won't consider the active request at the head of the queue for
> > > merging (making the request larger).  But it _could_ consider the
> > > request when making decisions about insertion (adding a new request
> > > at the head of the queue because it's close-on-disk to the active
> > > one).   Does it do that?
> > 
> > Only when the front request isn't active is it safe to consider
> > insertion in front of it. 2.5 does exactly that, because it knows whether
> > the request has been started, while 2.4 has to guess by looking at the
> > head-active flag and the plug status.
> > 
> > If the request is started, we will only consider placing in front of the
> > 2nd request, not after the 1st. We could consider in between the 1st and
> > 2nd; that should be perfectly safe - just move the barrier and started
> > test down after the insert test. *req is the insert-after point.
> 
> Makes sense.  I suspect it may even worsen the problem I observed
> with the mpage code.  Set the readahead to 256k with `blockdev --setra 512'
> and then run tiobench.  The read latencies are massive - one thread
> gets hold of the disk head and hogs it for 30-60 seconds.
> 
> The readahead code has a sort of double-window design.  The idea is that
> if the disk does 50 megs/sec and your application processes data at
> 49 megs/sec, the application will never block on I/O.  At 256k readahead,
> the readahead code will be laying out four BIOs at a time.  It's probable
> that the application is actually submitting BIOs for a new readahead
> window before all of the BIOs for the old one are complete.  So it's
> performing merging against its own reads.
> 
> Given all this, what I would expect to see is for thread "A" to capture
> the disk head for some period of time, until eventually one of thread "B"'s
> requests expires its latency.  Then thread "B" gets to hog the disk head.
> That's reasonable behaviour,  but the latencies are *enormous*.  Almost
> like the latency stuff isn't working.  But it sure looks OK.

I'm still waiting for some free time to implement some nicer i/o scheduling
algorithms; I'd be sad to see elevator_linus be the default for 2.6. For
now it's just receiving the odd fix here and there, which does make small
improvements.

> Not super-high priority at this time.  I'll play with it some more.
> (Some userspace tunables for the elevator would be nice.  Hint. ;))

Agreed :-)

> hmm.  Actually the code looks a bit odd:
> 
>                 if (elv_linus_sequence(__rq)-- <= 0)
>                         break;
>                 if (!(__rq->flags & REQ_CMD))
>                         continue;
>                 if (elv_linus_sequence(__rq) < bio_sectors(bio))
>                         break;
> 
> The first decrement is saying that elv_linus_sequence is in units of
> requests, but the comparison (and the later `-= bio_sectors()') seems
> to be saying it's in units of sectors.

Well, it really is in units of sectors in 2.5; the first decrement is a
scan-aging measure.

> I think calculating the latency in terms of requests makes more sense - just
> ignore the actual size of those requests (or weight it down in some manner).
> But I don't immediately see what the above code is up to?

That might make more sense, but again it's not likely to make
elevator_linus too tolerable anyway. You can easily change the
read/write initial sequences to be well over twice what they are now,
and just account seeks. The end result would be very similar, though :-)

-- 
Jens Axboe



* Re: Poor read performance when a sequential write is present
From: Jens Axboe @ 2002-05-28  9:36 UTC
  To: Andrew Morton
  Cc: William Lee Irwin III, Giuliano Pochini, linux-kernel, chen, xiangping

On Tue, May 28 2002, Jens Axboe wrote:
> > hmm.  Actually the code looks a bit odd:
> > 
> >                 if (elv_linus_sequence(__rq)-- <= 0)
> >                         break;
> >                 if (!(__rq->flags & REQ_CMD))
> >                         continue;
> >                 if (elv_linus_sequence(__rq) < bio_sectors(bio))
> >                         break;
> > 
> > The first decrement is saying that elv_linus_sequence is in units of
> > requests, but the comparison (and the later `-= bio_sectors()') seems
> > to be saying it's in units of sectors.
> 
> Well, it really is in units of sectors in 2.5; the first decrement is a
> scan-aging measure.

Something like this makes more sense.

diff -Nru a/drivers/block/elevator.c b/drivers/block/elevator.c
--- a/drivers/block/elevator.c	Tue May 28 11:33:38 2002
+++ b/drivers/block/elevator.c	Tue May 28 11:33:38 2002
@@ -174,21 +174,8 @@
 	while ((entry = entry->prev) != &q->queue_head) {
 		__rq = list_entry_rq(entry);
 
-		if (__rq->flags & (REQ_BARRIER | REQ_STARTED))
-			break;
-
-		/*
-		 * simply "aging" of requests in queue
-		 */
-		if (elv_linus_sequence(__rq)-- <= 0)
-			break;
 		if (!(__rq->flags & REQ_CMD))
 			continue;
-		if (elv_linus_sequence(__rq) < bio_sectors(bio))
-			break;
-
-		if (!*req && bio_rq_in_between(bio, __rq, &q->queue_head))
-			*req = __rq;
 
 		if ((ret = elv_try_merge(__rq, bio))) {
 			if (ret == ELEVATOR_FRONT_MERGE)
@@ -197,6 +184,15 @@
 			q->last_merge = &__rq->queuelist;
 			break;
 		}
+
+		if (elv_linus_sequence(__rq) < bio_sectors(bio))
+			break;
+
+		if (!*req && bio_rq_in_between(bio, __rq, &q->queue_head))
+			*req = __rq;
+
+		if (__rq->flags & (REQ_BARRIER | REQ_STARTED))
+			break;
 	}
 
 	return ret;

which basically only accounts seeks (the sequence is still in sectors, but
that doesn't matter). We will always try to merge (don't worry,
rq_mergeable() will check the barrier and started bits); the sequence check
is postponed until right before the insertion check.

-- 
Jens Axboe


