linux-kernel.vger.kernel.org archive mirror
* [Regression] High latency when doing large I/O
@ 2009-01-17  0:44 Mathieu Desnoyers
  2009-01-17 16:26 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers
  0 siblings, 1 reply; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-17  0:44 UTC (permalink / raw)
  To: Jens Axboe, Andrea Arcangeli, akpm, Ben Gamari, ltt-dev
  Cc: linux-kernel, Jens Axboe

Hi,

A long-standing I/O regression (since 2.6.18, still there today) has hit
Slashdot recently:
http://bugzilla.kernel.org/show_bug.cgi?id=12309
http://it.slashdot.org/article.pl?sid=09/01/15/049201

I've taken a trace reproducing the wrong behavior on my machine and I
think it's getting us somewhere.

LTTng 0.83, kernel 2.6.28
Machine: Intel Xeon E5405, dual quad-core, 16GB RAM
(I just created a new block-trace.c LTTng probe, which is not released yet.
It basically replaces blktrace.)


echo 3 > /proc/sys/vm/drop_caches

lttctl -C -w /tmp/trace -o channel.mm.bufnum=8 -o channel.block.bufnum=64 trace

dd if=/dev/zero of=/tmp/newfile bs=1M count=1M
cp -ax music /tmp   (copying 1.1GB of mp3)

ls  (takes 15 seconds to get the directory listing !)

lttctl -D trace

I looked at the trace (especially around the ls invocation), and bash is
waiting for a few seconds on I/O in the exec system call (to exec ls).

While this happens, we have dd doing lots and lots of bio_queue. There
is a bio_backmerge after each bio_queue event. This is reasonable,
because dd is writing to a contiguous file.

However, I wonder if this might be the actual problem. dd owns the head
request in the elevator request queue. It is progressing steadily by
plugging/unplugging the device periodically and gets its work done.
However, because requests are being dequeued at the same rate that others
are being merged, I suspect it stays at the top of the queue and does not
let the other, unrelated requests run.

There is a test in blk-merge.c which makes sure that merged requests
do not get bigger than a certain size. However, if the request is
steadily dequeued, I think this test does not really accomplish anything.

If you are interested in looking at the trace I've taken, I could
provide it.

Does that make sense ?

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-17  0:44 [Regression] High latency when doing large I/O Mathieu Desnoyers
@ 2009-01-17 16:26 ` Mathieu Desnoyers
  2009-01-17 16:50   ` Leon Woestenberg
                     ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-17 16:26 UTC (permalink / raw)
  To: Jens Axboe, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds
  Cc: linux-kernel, ltt-dev

A long-standing I/O regression (since 2.6.18, still there today) has hit
Slashdot recently:
http://bugzilla.kernel.org/show_bug.cgi?id=12309
http://it.slashdot.org/article.pl?sid=09/01/15/049201

I've taken a trace reproducing the wrong behavior on my machine and I
think it's getting us somewhere.

LTTng 0.83, kernel 2.6.28
Machine: Intel Xeon E5405, dual quad-core, 16GB RAM
(I just created a new block-trace.c LTTng probe, which is not released yet.
It basically replaces blktrace.)


echo 3 > /proc/sys/vm/drop_caches

lttctl -C -w /tmp/trace -o channel.mm.bufnum=8 -o channel.block.bufnum=64 trace

dd if=/dev/zero of=/tmp/newfile bs=1M count=1M
cp -ax music /tmp   (copying 1.1GB of mp3)

ls  (takes 15 seconds to get the directory listing !)

lttctl -D trace

I looked at the trace (especially around the ls invocation), and bash is
waiting for a few seconds on I/O in the exec system call (to exec ls).

While this happens, we have dd doing lots and lots of bio_queue. There
is a bio_backmerge after each bio_queue event. This is reasonable,
because dd is writing to a contiguous file.

However, I wonder if this might be the actual problem. dd owns the head
request in the elevator request queue. It is progressing steadily by
plugging/unplugging the device periodically and gets its work done.
However, because requests are being dequeued at the same rate that others
are being merged, I suspect it stays at the top of the queue and does not
let the other, unrelated requests run.

There is a test in blk-merge.c which makes sure that merged requests
do not get bigger than a certain size. However, if the request is
steadily dequeued, I think this test does not really accomplish anything.


This patch implements a basic test to make sure we never merge more than 128
requests into the same request if it is the "last_merge" request. I have not
been able to trigger the problem again with the fix applied. It might not be in
a perfect state: there may be better solutions to the problem, but I think it
helps point out where the culprit lies.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jens Axboe <axboe@kernel.dk>
CC: Andrea Arcangeli <andrea@suse.de>
CC: akpm@linux-foundation.org
CC: Ingo Molnar <mingo@elte.hu>
CC: Linus Torvalds <torvalds@linux-foundation.org>
---
 block/blk-merge.c      |   12 +++++++++---
 block/elevator.c       |   31 ++++++++++++++++++++++++++++---
 include/linux/blkdev.h |    1 +
 3 files changed, 38 insertions(+), 6 deletions(-)

Index: linux-2.6-lttng/include/linux/blkdev.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/blkdev.h	2009-01-17 09:49:54.000000000 -0500
+++ linux-2.6-lttng/include/linux/blkdev.h	2009-01-17 09:50:29.000000000 -0500
@@ -313,6 +313,7 @@ struct request_queue
 	 */
 	struct list_head	queue_head;
 	struct request		*last_merge;
+	int			nr_cached_merge;
 	elevator_t		*elevator;
 
 	/*
Index: linux-2.6-lttng/block/elevator.c
===================================================================
--- linux-2.6-lttng.orig/block/elevator.c	2009-01-17 09:49:54.000000000 -0500
+++ linux-2.6-lttng/block/elevator.c	2009-01-17 11:07:12.000000000 -0500
@@ -255,6 +255,7 @@ int elevator_init(struct request_queue *
 
 	INIT_LIST_HEAD(&q->queue_head);
 	q->last_merge = NULL;
+	q->nr_cached_merge = 0;
 	q->end_sector = 0;
 	q->boundary_rq = NULL;
 
@@ -438,8 +439,10 @@ void elv_dispatch_sort(struct request_qu
 	struct list_head *entry;
 	int stop_flags;
 
-	if (q->last_merge == rq)
+	if (q->last_merge == rq) {
 		q->last_merge = NULL;
+		q->nr_cached_merge = 0;
+	}
 
 	elv_rqhash_del(q, rq);
 
@@ -478,8 +481,10 @@ EXPORT_SYMBOL(elv_dispatch_sort);
  */
 void elv_dispatch_add_tail(struct request_queue *q, struct request *rq)
 {
-	if (q->last_merge == rq)
+	if (q->last_merge == rq) {
 		q->last_merge = NULL;
+		q->nr_cached_merge = 0;
+	}
 
 	elv_rqhash_del(q, rq);
 
@@ -498,6 +503,16 @@ int elv_merge(struct request_queue *q, s
 	int ret;
 
 	/*
+	 * Make sure we don't starve other requests by merging too many cached
+	 * requests together.
+	 */
+	if (q->nr_cached_merge >= BLKDEV_MAX_RQ) {
+		q->last_merge = NULL;
+		q->nr_cached_merge = 0;
+		return ELEVATOR_NO_MERGE;
+	}
+
+	/*
 	 * First try one-hit cache.
 	 */
 	if (q->last_merge) {
@@ -536,6 +551,10 @@ void elv_merged_request(struct request_q
 	if (type == ELEVATOR_BACK_MERGE)
 		elv_rqhash_reposition(q, rq);
 
+	if (q->last_merge != rq)
+		q->nr_cached_merge = 0;
+	else
+		q->nr_cached_merge++;
 	q->last_merge = rq;
 }
 
@@ -551,6 +570,10 @@ void elv_merge_requests(struct request_q
 	elv_rqhash_del(q, next);
 
 	q->nr_sorted--;
+	if (q->last_merge != rq)
+		q->nr_cached_merge = 0;
+	else
+		q->nr_cached_merge++;
 	q->last_merge = rq;
 }
 
@@ -626,8 +649,10 @@ void elv_insert(struct request_queue *q,
 		q->nr_sorted++;
 		if (rq_mergeable(rq)) {
 			elv_rqhash_add(q, rq);
-			if (!q->last_merge)
+			if (!q->last_merge) {
+				q->nr_cached_merge = 1;
 				q->last_merge = rq;
+			}
 		}
 
 		/*
Index: linux-2.6-lttng/block/blk-merge.c
===================================================================
--- linux-2.6-lttng.orig/block/blk-merge.c	2009-01-17 09:49:54.000000000 -0500
+++ linux-2.6-lttng/block/blk-merge.c	2009-01-17 09:50:29.000000000 -0500
@@ -231,8 +231,10 @@ static inline int ll_new_hw_segment(stru
 	if (req->nr_phys_segments + nr_phys_segs > q->max_hw_segments
 	    || req->nr_phys_segments + nr_phys_segs > q->max_phys_segments) {
 		req->cmd_flags |= REQ_NOMERGE;
-		if (req == q->last_merge)
+		if (req == q->last_merge) {
 			q->last_merge = NULL;
+			q->nr_cached_merge = 0;
+		}
 		return 0;
 	}
 
@@ -256,8 +258,10 @@ int ll_back_merge_fn(struct request_queu
 
 	if (req->nr_sectors + bio_sectors(bio) > max_sectors) {
 		req->cmd_flags |= REQ_NOMERGE;
-		if (req == q->last_merge)
+		if (req == q->last_merge) {
 			q->last_merge = NULL;
+			q->nr_cached_merge = 0;
+		}
 		return 0;
 	}
 	if (!bio_flagged(req->biotail, BIO_SEG_VALID))
@@ -281,8 +285,10 @@ int ll_front_merge_fn(struct request_que
 
 	if (req->nr_sectors + bio_sectors(bio) > max_sectors) {
 		req->cmd_flags |= REQ_NOMERGE;
-		if (req == q->last_merge)
+		if (req == q->last_merge) {
 			q->last_merge = NULL;
+			q->nr_cached_merge = 0;
+		}
 		return 0;
 	}
 	if (!bio_flagged(bio, BIO_SEG_VALID))

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-17 16:26 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers
@ 2009-01-17 16:50   ` Leon Woestenberg
  2009-01-17 17:15     ` Mathieu Desnoyers
  2009-01-17 19:04   ` Jens Axboe
  2009-01-17 20:03   ` Ben Gamari
  2 siblings, 1 reply; 39+ messages in thread
From: Leon Woestenberg @ 2009-01-17 16:50 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jens Axboe, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
	linux-kernel, ltt-dev, Thomas Gleixner

Hello Mathieu et al,

On Sat, Jan 17, 2009 at 5:26 PM, Mathieu Desnoyers
<mathieu.desnoyers@polymtl.ca> wrote:
> A long standing I/O regression (since 2.6.18, still there today) has hit
> Slashdot recently :
> http://bugzilla.kernel.org/show_bug.cgi?id=12309

Are you sure you are solving the *actual* problem?

The bugzilla entry shows a bisect attempt that leads to a patch
involving negative clock jumps.
http://bugzilla.kernel.org/show_bug.cgi?id=12309#c29

with a corrected link to the bisect patch:
http://bugzilla.kernel.org/show_bug.cgi?id=12309#c30

Wouldn't a negative clock jump be very influential to the
(time-driven) I/O schedulers and be a more probable cause?

Regards,
-- 
Leon

p.s. Added Thomas to the CC list as his name is on the patch Signed-off-by list.


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-17 16:50   ` Leon Woestenberg
@ 2009-01-17 17:15     ` Mathieu Desnoyers
  0 siblings, 0 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-17 17:15 UTC (permalink / raw)
  To: Leon Woestenberg
  Cc: Jens Axboe, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
	linux-kernel, ltt-dev, Thomas Gleixner

* Leon Woestenberg (leon.woestenberg@gmail.com) wrote:
> Hello Mathieu et al,
> 
> On Sat, Jan 17, 2009 at 5:26 PM, Mathieu Desnoyers
> <mathieu.desnoyers@polymtl.ca> wrote:
> > A long standing I/O regression (since 2.6.18, still there today) has hit
> > Slashdot recently :
> > http://bugzilla.kernel.org/show_bug.cgi?id=12309
> 
> Are you sure you are solving the *actual* problem?
> 
> The bugzilla entry shows a bisect attempt that leads to a patch
> involving negative clock jumps.
> http://bugzilla.kernel.org/show_bug.cgi?id=12309#c29
> 
> with a corrected link to the bisect patch:
> http://bugzilla.kernel.org/show_bug.cgi?id=12309#c30
> 
> Wouldn't a negative clock jump be very influential to the
> (time-driven) I/O schedulers and be a more probable cause?
> 

When a merge is done, the lower of the two timestamps (the existing
request's and the new request's) is kept as the start_time value for the
merged request we end up with. In this case, that would probably make
that request stay on top of the queue even when unrelated interactive I/O
requests come in.

I suspect that this negative clock jump could have hidden the problem by
making the start time of the interactive request lower than the start
time of the merged request.

Mathieu

> Regards,
> -- 
> Leon
> 
> p.s. Added Thomas to the CC list as his name is on the patch Signed-off-by list.

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-17 16:26 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers
  2009-01-17 16:50   ` Leon Woestenberg
@ 2009-01-17 19:04   ` Jens Axboe
  2009-01-18 21:12     ` Mathieu Desnoyers
  2009-01-19 15:45     ` Nikanth K
  2009-01-17 20:03   ` Ben Gamari
  2 siblings, 2 replies; 39+ messages in thread
From: Jens Axboe @ 2009-01-17 19:04 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
	linux-kernel, ltt-dev

On Sat, Jan 17 2009, Mathieu Desnoyers wrote:
> A long standing I/O regression (since 2.6.18, still there today) has hit
> Slashdot recently :
> http://bugzilla.kernel.org/show_bug.cgi?id=12309
> http://it.slashdot.org/article.pl?sid=09/01/15/049201
> 
> I've taken a trace reproducing the wrong behavior on my machine and I
> think it's getting us somewhere.
> 
> LTTng 0.83, kernel 2.6.28
> Machine : Intel Xeon E5405 dual quad-core, 16GB ram
> (just created a new block-trace.c LTTng probe which is not released yet.
> It basically replaces blktrace)
> 
> 
> echo 3 > /proc/sys/vm/drop_caches
> 
> lttctl -C -w /tmp/trace -o channel.mm.bufnum=8 -o channel.block.bufnum=64 trace
> 
> dd if=/dev/zero of=/tmp/newfile bs=1M count=1M
> cp -ax music /tmp   (copying 1.1GB of mp3)
> 
> ls  (takes 15 seconds to get the directory listing !)
> 
> lttctl -D trace
> 
> I looked at the trace (especially at the ls surroundings), and bash is
> waiting for a few seconds for I/O in the exec system call (to exec ls).
> 
> While this happens, we have dd doing lots and lots of bio_queue. There
> is a bio_backmerge after each bio_queue event. This is reasonable,
> because dd is writing to a contiguous file.
> 
> However, I wonder if this is not the actual problem. We have dd which
> has the head request in the elevator request queue. It is progressing
> steadily by plugging/unplugging the device periodically and gets its
> work done. However, because requests are being dequeued at the same
> rate others are being merged, I suspect it stays at the top of the queue
> and does not let the other unrelated requests run.
> 
> There is a test in the blk-merge.c which makes sure that merged requests
> do not get bigger than a certain size. However, if the request is
> steadily dequeued, I think this test is not doing anything.
> 
> 
> This patch implements a basic test to make sure we never merge more
> than 128 requests into the same request if it is the "last_merge"
> request. I have not been able to trigger the problem again with the
> fix applied. It might not be in a perfect state : there may be better
> solutions to the problem, but I think it helps pointing out where the
> culprit lays.

To be painfully honest, I have no idea what you are attempting to solve
with this patch. First of all, Linux has always merged any request
possible. The one-hit cache is just that, a one hit cache frontend for
merging. We'll be hitting the merge hash and doing the same merge if it
fails. Since we even cap the size of the request, the merging is also
bounded.

Furthermore, the request being merged is not considered for IO yet. It
has not been dispatched by the io scheduler. IOW, I'm surprised your
patch makes any difference at all. Especially with your 128 limit, since
4kb x 128 is 512kb, which is the default max merge size anyway. These
sorts of test cases tend to be very sensitive and exhibit different
behaviour across runs, so call me a bit skeptical and consider that an
encouragement to do more directed testing. You could use fio for
instance. Have two jobs in your job file. One is a dd type process that
just writes a huge file, the other job starts eg 10 seconds later and
does a 4kb read of a file.
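
A minimal job file along those lines could look roughly like this (a
sketch only; the job names, sizes and the 10 second start delay are just
placeholders):

[writer]
rw=write
size=10240m
blocksize=1024k
direct=0

[reader]
rw=read
size=4k
blocksize=4k
direct=0
startdelay=10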

As a quick test, could you try and increase the slice_idle to eg 20ms?
Sometimes I've seen timing being slightly off, which makes us miss the
sync window for the ls (in your case) process. Then you get a mix of
async and sync IO all the time, which very much slows down the sync
process.
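
There is no need to rebuild for that quick test; the cfq sysfs knob can
be changed at runtime (assuming sda is the drive and cfq is the active
scheduler; the value is in milliseconds):

cat /sys/block/sda/queue/iosched/slice_idle
echo 20 > /sys/block/sda/queue/iosched/slice_idle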

-- 
Jens Axboe



* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-17 16:26 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers
  2009-01-17 16:50   ` Leon Woestenberg
  2009-01-17 19:04   ` Jens Axboe
@ 2009-01-17 20:03   ` Ben Gamari
  2 siblings, 0 replies; 39+ messages in thread
From: Ben Gamari @ 2009-01-17 20:03 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jens Axboe, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
	linux-kernel, ltt-dev

On Sat, 2009-01-17 at 11:26 -0500, Mathieu Desnoyers wrote:
> This patch implements a basic test to make sure we never merge more than 128
> requests into the same request if it is the "last_merge" request. I have not
> been able to trigger the problem again with the fix applied. It might not be in
> a perfect state : there may be better solutions to the problem, but I think it
> helps pointing out where the culprit lays.

Unfortunately, it seems like the patch hasn't really fixed much. After
porting it forward to Linus' master, I haven't observed any difference
in real-world use cases (e.g. desktop use while building a
kernel).

Given Jens's remarks, I suppose this isn't too surprising. Does anyone
else with greater familiarity with the block I/O subsystem have any more
ideas about the source of the slowdown? It seems like the recent patches
incorporating blktrace support into ftrace could be helpful for further
data collection, correct?
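
For reference, once those patches land, usage should be roughly along
these lines (paths are from memory and may differ; debugfs must be
mounted):

mount -t debugfs none /sys/kernel/debug
echo blk > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/block/sda/trace/enable
cat /sys/kernel/debug/tracing/trace_pipe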

- Ben





* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-17 19:04   ` Jens Axboe
@ 2009-01-18 21:12     ` Mathieu Desnoyers
  2009-01-18 21:27       ` Mathieu Desnoyers
  2009-01-19 18:26       ` Jens Axboe
  2009-01-19 15:45     ` Nikanth K
  1 sibling, 2 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-18 21:12 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
	linux-kernel, ltt-dev

* Jens Axboe (jens.axboe@oracle.com) wrote:
> On Sat, Jan 17 2009, Mathieu Desnoyers wrote:
> > A long standing I/O regression (since 2.6.18, still there today) has hit
> > Slashdot recently :
> > http://bugzilla.kernel.org/show_bug.cgi?id=12309
> > http://it.slashdot.org/article.pl?sid=09/01/15/049201
> > 
> > I've taken a trace reproducing the wrong behavior on my machine and I
> > think it's getting us somewhere.
> > 
> > LTTng 0.83, kernel 2.6.28
> > Machine : Intel Xeon E5405 dual quad-core, 16GB ram
> > (just created a new block-trace.c LTTng probe which is not released yet.
> > It basically replaces blktrace)
> > 
> > 
> > echo 3 > /proc/sys/vm/drop_caches
> > 
> > lttctl -C -w /tmp/trace -o channel.mm.bufnum=8 -o channel.block.bufnum=64 trace
> > 
> > dd if=/dev/zero of=/tmp/newfile bs=1M count=1M
> > cp -ax music /tmp   (copying 1.1GB of mp3)
> > 
> > ls  (takes 15 seconds to get the directory listing !)
> > 
> > lttctl -D trace
> > 
> > I looked at the trace (especially at the ls surroundings), and bash is
> > waiting for a few seconds for I/O in the exec system call (to exec ls).
> > 
> > While this happens, we have dd doing lots and lots of bio_queue. There
> > is a bio_backmerge after each bio_queue event. This is reasonable,
> > because dd is writing to a contiguous file.
> > 
> > However, I wonder if this is not the actual problem. We have dd which
> > has the head request in the elevator request queue. It is progressing
> > steadily by plugging/unplugging the device periodically and gets its
> > work done. However, because requests are being dequeued at the same
> > rate others are being merged, I suspect it stays at the top of the queue
> > and does not let the other unrelated requests run.
> > 
> > There is a test in the blk-merge.c which makes sure that merged requests
> > do not get bigger than a certain size. However, if the request is
> > steadily dequeued, I think this test is not doing anything.
> > 
> > 
> > This patch implements a basic test to make sure we never merge more
> > than 128 requests into the same request if it is the "last_merge"
> > request. I have not been able to trigger the problem again with the
> > fix applied. It might not be in a perfect state : there may be better
> > solutions to the problem, but I think it helps pointing out where the
> > culprit lays.
> 
> To be painfully honest, I have no idea what you are attempting to solve
> with this patch. First of all, Linux has always merged any request
> possible. The one-hit cache is just that, a one hit cache frontend for
> merging. We'll be hitting the merge hash and doing the same merge if it
> fails. Since we even cap the size of the request, the merging is also
> bounded.
> 

Hi Jens,

I was mostly trying to poke around and try to figure out what was going
on in the I/O elevator. Sorry if my first attempts did not make much
sense. Following your advice, I've looked more deeply into the test
cases.

> Furthermore, the request being merged is not considered for IO yet. It
> has not been dispatched by the io scheduler. IOW, I'm surprised your
> patch makes any difference at all. Especially with your 128 limit, since
> 4kbx128kb is 512kb which is the default max merge size anyway. These
> sort of test cases tend to be very sensitive and exhibit different
> behaviour for many runs, so call me a bit skeptical and consider that an
> enouragement to do more directed testing. You could use fio for
> instance. Have two jobs in your job file. One is a dd type process that
> just writes a huge file, the other job starts eg 10 seconds later and
> does a 4kb read of a file.
> 

I looked at the "ls" behavior (while doing a dd) within my LTTng trace
to create a fio job file. That behavior is appended below as "Part
1 - ls I/O behavior". Note that the original "ls" test case was done
with the anticipatory I/O scheduler, which was active by default on my
debian system with a custom vanilla 2.6.28 kernel. Also note that I am
running this on a raid-1, but I have experienced the same problem on a
standard partition I created on the same machine.

I created the fio job file appended as "Part 2 - dd+ls fio job file". It
consists of one dd-like job and many small jobs reading as much data as
ls did. I used the small test script to batch-run this ("Part 3 - batch
test").

The results for the ls-like jobs are interesting :

I/O scheduler        runt-min (msec)   runt-max (msec)
noop                       41             10563
anticipatory               63              8185
deadline                   52             33387
cfq                        43              1420


> As a quick test, could you try and increase the slice_idle to eg 20ms?
> Sometimes I've seen timing being slightly off, which makes us miss the
> sync window for the ls (in your case) process. Then you get a mix of
> async and sync IO all the time, which very much slows down the sync
> process.
> 

Just to confirm, the quick test you are talking about would be:

---
 block/cfq-iosched.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6-lttng/block/cfq-iosched.c
===================================================================
--- linux-2.6-lttng.orig/block/cfq-iosched.c	2009-01-18 15:17:32.000000000 -0500
+++ linux-2.6-lttng/block/cfq-iosched.c	2009-01-18 15:46:38.000000000 -0500
@@ -26,7 +26,7 @@ static const int cfq_back_penalty = 2;
 static const int cfq_slice_sync = HZ / 10;
 static int cfq_slice_async = HZ / 25;
 static const int cfq_slice_async_rq = 2;
-static int cfq_slice_idle = HZ / 125;
+static int cfq_slice_idle = 20;
 
 /*
  * offset from end of service tree


It does not make much difference with the standard cfq test :

I/O scheduler        runt-min (msec)   runt-max (msec)
cfq (standard)             43              1420
cfq (20ms slice_idle)      31              1573


So, I guess a 1.5s delay to run ls on a directory when the cache is cold
with the cfq I/O scheduler is somewhat acceptable, but I doubt the 8, 10
and 33s response times for the anticipatory, noop and deadline I/O
schedulers are. I wonder why on earth the anticipatory I/O scheduler is
activated by default with my kernel, given that it results in such poor
interactive behavior when doing large I/O?

Thanks for the advice,

Mathieu



* Part 1 - ls I/O behavior

lttv -m textDump -t /traces/block-backmerge \
     -e "state.pid=4145&event.subname=bio_queue"

block.bio_queue: 662.707321959 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 327680048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 662.707331445 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 349175018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.968214766 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 327696968, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.968222110 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 349191938, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.971662800 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697032, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.971670417 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192002, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.971684184 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697040, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.971689854 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192010, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.971695762 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.971701135 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.971706301 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697056, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.971711698 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192026, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.971723359 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697064, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.971729035 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192034, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.999391873 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697072, size = 53248, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 662.999397864 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192042, size = 53248, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 670.809328737 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, TRAP { sector = 327697000, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 670.809337500 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, TRAP { sector = 349191970, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 671.161036834 (/traces/block-backmerge/block_5), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 360714880, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 671.161047247 (/traces/block-backmerge/block_5), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 382209850, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 671.653601399 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 360712184, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 671.653611077 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 382207154, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }


* Part 2 - dd+ls fio job file (test.job5)

[job1]
rw=write
size=10240m
direct=0
blocksize=1024k

[global]
rw=randread
size=96k
filesize=30m
direct=0
bsrange=4k-52k

[file1]
startdelay=0

[file2]
startdelay=4

[file3]
startdelay=8

[file4]
startdelay=12

[file5]
startdelay=16

[file6]
startdelay=20

[file7]
startdelay=24

[file8]
startdelay=28

[file9]
startdelay=32

[file10]
startdelay=36

[file11]
startdelay=40

[file12]
startdelay=44

[file13]
startdelay=48

[file14]
startdelay=52

[file15]
startdelay=56

[file16]
startdelay=60

[file17]
startdelay=64

[file18]
startdelay=68

[file19]
startdelay=72

[file20]
startdelay=76

[file21]
startdelay=80

[file22]
startdelay=84

[file23]
startdelay=88

[file24]
startdelay=92

[file25]
startdelay=96

[file26]
startdelay=100

[file27]
startdelay=104

[file28]
startdelay=108

[file29]
startdelay=112

[file30]
startdelay=116

[file31]
startdelay=120

[file32]
startdelay=124

[file33]
startdelay=128

[file34]
startdelay=132

[file35]
startdelay=134

[file36]
startdelay=138

[file37]
startdelay=142

[file38]
startdelay=146

[file39]
startdelay=150

[file40]
startdelay=200

[file41]
startdelay=260


* Part 3 - batch test (do-tests.sh)

#!/bin/sh

TESTS="anticipatory noop deadline cfq"

for TEST in ${TESTS}; do 
	echo "Running ${TEST}"
	
	rm -f file*.0 job*.0

	echo ${TEST} > /sys/block/sda/queue/scheduler
	echo ${TEST} > /sys/block/sdb/queue/scheduler
	sync
	echo 3 > /proc/sys/vm/drop_caches
	sleep 5
	
	./fio test.job5 --output test.result.${TEST}
done


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-18 21:12     ` Mathieu Desnoyers
@ 2009-01-18 21:27       ` Mathieu Desnoyers
  2009-01-19 18:26       ` Jens Axboe
  1 sibling, 0 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-18 21:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
	linux-kernel, ltt-dev

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * Jens Axboe (jens.axboe@oracle.com) wrote:
> > On Sat, Jan 17 2009, Mathieu Desnoyers wrote:
> > > A long standing I/O regression (since 2.6.18, still there today) has hit
> > > Slashdot recently :
> > > http://bugzilla.kernel.org/show_bug.cgi?id=12309
> > > http://it.slashdot.org/article.pl?sid=09/01/15/049201
> > > 
> > > I've taken a trace reproducing the wrong behavior on my machine and I
> > > think it's getting us somewhere.
> > > 
> > > LTTng 0.83, kernel 2.6.28
> > > Machine : Intel Xeon E5405 dual quad-core, 16GB ram
> > > (just created a new block-trace.c LTTng probe which is not released yet.
> > > It basically replaces blktrace)
> > > 
> > > 
> > > echo 3 > /proc/sys/vm/drop_caches
> > > 
> > > lttctl -C -w /tmp/trace -o channel.mm.bufnum=8 -o channel.block.bufnum=64 trace
> > > 
> > > dd if=/dev/zero of=/tmp/newfile bs=1M count=1M
> > > cp -ax music /tmp   (copying 1.1GB of mp3)
> > > 
> > > ls  (takes 15 seconds to get the directory listing !)
> > > 
> > > lttctl -D trace
> > > 
> > > I looked at the trace (especially at the ls surroundings), and bash is
> > > waiting for a few seconds for I/O in the exec system call (to exec ls).
> > > 
> > > While this happens, we have dd doing lots and lots of bio_queue. There
> > > is a bio_backmerge after each bio_queue event. This is reasonable,
> > > because dd is writing to a contiguous file.
> > > 
> > > However, I wonder if this is not the actual problem. We have dd which
> > > has the head request in the elevator request queue. It is progressing
> > > steadily by plugging/unplugging the device periodically and gets its
> > > work done. However, because requests are being dequeued at the same
> > > rate others are being merged, I suspect it stays at the top of the queue
> > > and does not let the other unrelated requests run.
> > > 
> > > There is a test in the blk-merge.c which makes sure that merged requests
> > > do not get bigger than a certain size. However, if the request is
> > > steadily dequeued, I think this test is not doing anything.
> > > 
> > > 
> > > This patch implements a basic test to make sure we never merge more
> > > than 128 requests into the same request if it is the "last_merge"
> > > request. I have not been able to trigger the problem again with the
> > > fix applied. It might not be in a perfect state : there may be better
> > > solutions to the problem, but I think it helps pointing out where the
> > > culprit lays.
> > 
> > To be painfully honest, I have no idea what you are attempting to solve
> > with this patch. First of all, Linux has always merged any request
> > possible. The one-hit cache is just that, a one hit cache frontend for
> > merging. We'll be hitting the merge hash and doing the same merge if it
> > fails. Since we even cap the size of the request, the merging is also
> > bounded.
> > 
> 
> Hi Jens,
> 
> I was mostly trying to poke around and try to figure out what was going
> on in the I/O elevator. Sorry if my first attempts did not make much
> sense. Following your advice, I've looked more deeply into the test
> cases.
> 
> > Furthermore, the request being merged is not considered for IO yet. It
> > has not been dispatched by the io scheduler. IOW, I'm surprised your
> > patch makes any difference at all. Especially with your 128 limit, since
> > 4kbx128kb is 512kb which is the default max merge size anyway. These
> > sort of test cases tend to be very sensitive and exhibit different
> > behaviour for many runs, so call me a bit skeptical and consider that an
> > enouragement to do more directed testing. You could use fio for
> > instance. Have two jobs in your job file. One is a dd type process that
> > just writes a huge file, the other job starts eg 10 seconds later and
> > does a 4kb read of a file.
> > 
> 
> I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> to create a fio job file.  The said behavior is appended below as "Part
> 1 - ls I/O behavior". Note that the original "ls" test case was done
> with the anticipatory I/O scheduler, which was active by default on my
> debian system with custom vanilla 2.6.28 kernel. Also note that I am
> running this on a raid-1, but have experienced the same problem on a
> standard partition I created on the same machine.
> 
> I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> consists of one dd-like job and many small jobs reading as many data as
> ls did. I used the small test script to batch run this ("Part 3 - batch
> test").
> 
> The results for the ls-like jobs are interesting :
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> noop                       41             10563
> anticipatory               63              8185
> deadline                   52             33387
> cfq                        43              1420
> 
> 
> > As a quick test, could you try and increase the slice_idle to eg 20ms?
> > Sometimes I've seen timing being slightly off, which makes us miss the
> > sync window for the ls (in your case) process. Then you get a mix of
> > async and sync IO all the time, which very much slows down the sync
> > process.
> > 
> 
> Just to confirm, the quick test you are taking about would be :
> 
> ---
>  block/cfq-iosched.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6-lttng/block/cfq-iosched.c
> ===================================================================
> --- linux-2.6-lttng.orig/block/cfq-iosched.c	2009-01-18 15:17:32.000000000 -0500
> +++ linux-2.6-lttng/block/cfq-iosched.c	2009-01-18 15:46:38.000000000 -0500
> @@ -26,7 +26,7 @@ static const int cfq_back_penalty = 2;
>  static const int cfq_slice_sync = HZ / 10;
>  static int cfq_slice_async = HZ / 25;
>  static const int cfq_slice_async_rq = 2;
> -static int cfq_slice_idle = HZ / 125;
> +static int cfq_slice_idle = 20;
>  
>  /*
>   * offset from end of service tree
> 
> 
> It does not make much difference with the standard cfq test :
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> cfq (standard)             43              1420
> cfq (20ms slice_idle)      31              1573
> 
> 
> So, I guess 1.5s delay to run ls on a directory when the cache is cold
> with a cfq I/O scheduler is somewhat acceptable, but I doubt the 8, 10
> and 33s response times for the anticipatory, noop and deadline I/O
> schedulers are. I wonder why on earth is the anticipatory I/O scheduler
> activated by default with my kernel given it results in so poor
> interactive behavior when doing large I/O ?
> 

I found out why : I had an old pre-2.6.18 .config hanging around in
/boot on _many_ of my systems and upgraded to a newer vanilla kernel
using these defaults. make oldconfig left
CONFIG_DEFAULT_IOSCHED="anticipatory".

Changing to CONFIG_DEFAULT_IOSCHED="cfq" makes everything run better
under heavy I/O. I bet I'm not the only one in this situation.
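
For anyone else in the same situation, the compiled-in default can also
be overridden without rebuilding, either at runtime (sda is just an
example here) or with the elevator= boot parameter:

cat /sys/block/sda/queue/scheduler        # the active one is in brackets
echo cfq > /sys/block/sda/queue/scheduler

# or boot with: elevator=cfq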

Mathieu


> Thanks for the advices,
> 
> Mathieu
> 
> 
> 
> * Part 1 - ls I/O behavior
> 
> lttv -m textDump -t /traces/block-backmerge \
>      -e "state.pid=4145&event.subname=bio_queue"
> 
> block.bio_queue: 662.707321959 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 327680048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
> block.bio_queue: 662.707331445 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 349175018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.968214766 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 327696968, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.968222110 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, SYSCALL { sector = 349191938, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.971662800 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697032, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.971670417 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192002, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.971684184 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697040, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.971689854 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192010, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.971695762 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.971701135 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.971706301 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697056, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.971711698 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192026, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.971723359 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697064, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.971729035 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192034, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.999391873 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 327697072, size = 53248, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 662.999397864 (/traces/block-backmerge/block_2), 4145, 4145, bash, , 4063, 0x0, TRAP { sector = 349192042, size = 53248, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 670.809328737 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, TRAP { sector = 327697000, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 670.809337500 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, TRAP { sector = 349191970, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 671.161036834 (/traces/block-backmerge/block_5), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 360714880, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
> block.bio_queue: 671.161047247 (/traces/block-backmerge/block_5), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 382209850, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> block.bio_queue: 671.653601399 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 360712184, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
> block.bio_queue: 671.653611077 (/traces/block-backmerge/block_7), 4145, 4145, /bin/ls, , 4063, 0x0, SYSCALL { sector = 382207154, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
> 
> 
> * Part 2 - dd+ls fio job file (test.job5)
> 
> [job1]
> rw=write
> size=10240m
> direct=0
> blocksize=1024k
> 
> [global]
> rw=randread
> size=96k
> filesize=30m
> direct=0
> bsrange=4k-52k
> 
> [file1]
> startdelay=0
> 
> [file2]
> startdelay=4
> 
> [file3]
> startdelay=8
> 
> [file4]
> startdelay=12
> 
> [file5]
> startdelay=16
> 
> [file6]
> startdelay=20
> 
> [file7]
> startdelay=24
> 
> [file8]
> startdelay=28
> 
> [file9]
> startdelay=32
> 
> [file10]
> startdelay=36
> 
> [file11]
> startdelay=40
> 
> [file12]
> startdelay=44
> 
> [file13]
> startdelay=48
> 
> [file14]
> startdelay=52
> 
> [file15]
> startdelay=56
> 
> [file16]
> startdelay=60
> 
> [file17]
> startdelay=64
> 
> [file18]
> startdelay=68
> 
> [file19]
> startdelay=72
> 
> [file20]
> startdelay=76
> 
> [file21]
> startdelay=80
> 
> [file22]
> startdelay=84
> 
> [file23]
> startdelay=88
> 
> [file24]
> startdelay=92
> 
> [file25]
> startdelay=96
> 
> [file26]
> startdelay=100
> 
> [file27]
> startdelay=104
> 
> [file28]
> startdelay=108
> 
> [file29]
> startdelay=112
> 
> [file30]
> startdelay=116
> 
> [file31]
> startdelay=120
> 
> [file32]
> startdelay=124
> 
> [file33]
> startdelay=128
> 
> [file34]
> startdelay=132
> 
> [file35]
> startdelay=134
> 
> [file36]
> startdelay=138
> 
> [file37]
> startdelay=142
> 
> [file38]
> startdelay=146
> 
> [file39]
> startdelay=150
> 
> [file40]
> startdelay=200
> 
> [file41]
> startdelay=260
> 
> 
> * Part 3 - batch test (do-tests.sh)
> 
> #!/bin/sh
> 
> TESTS="anticipatory noop deadline cfq"
> 
> for TEST in ${TESTS}; do 
> 	echo "Running ${TEST}"
> 	
> 	rm -f file*.0 job*.0
> 
> 	echo ${TEST} > /sys/block/sda/queue/scheduler
> 	echo ${TEST} > /sys/block/sdb/queue/scheduler
> 	sync
> 	echo 3 > /proc/sys/vm/drop_caches
> 	sleep 5
> 	
> 	./fio test.job5 --output test.result.${TEST}
> done
> 
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-17 19:04   ` Jens Axboe
  2009-01-18 21:12     ` Mathieu Desnoyers
@ 2009-01-19 15:45     ` Nikanth K
  2009-01-19 18:23       ` Jens Axboe
  1 sibling, 1 reply; 39+ messages in thread
From: Nikanth K @ 2009-01-19 15:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Mathieu Desnoyers, Andrea Arcangeli, akpm, Ingo Molnar,
	Linus Torvalds, linux-kernel, ltt-dev

On Sun, Jan 18, 2009 at 12:34 AM, Jens Axboe <jens.axboe@oracle.com> wrote:

>
> As a quick test, could you try and increase the slice_idle to eg 20ms?
> Sometimes I've seen timing being slightly off, which makes us miss the
> sync window for the ls (in your case) process. Then you get a mix of
> async and sync IO all the time, which very much slows down the sync
> process.
>

Do you mean to say that 'ls' could not submit another request until
the previous sync request completes, but its idle window gets disabled
as it takes way too long to complete during heavy load? But when there
are requests in the driver, won't the idling be disabled anyway? Or did
you mean to increase slice_sync?

Thanks
Nikanth


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-19 15:45     ` Nikanth K
@ 2009-01-19 18:23       ` Jens Axboe
  0 siblings, 0 replies; 39+ messages in thread
From: Jens Axboe @ 2009-01-19 18:23 UTC (permalink / raw)
  To: Nikanth K
  Cc: Mathieu Desnoyers, Andrea Arcangeli, akpm, Ingo Molnar,
	Linus Torvalds, linux-kernel, ltt-dev

On Mon, Jan 19 2009, Nikanth K wrote:
> On Sun, Jan 18, 2009 at 12:34 AM, Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> >
> > As a quick test, could you try and increase the slice_idle to eg 20ms?
> > Sometimes I've seen timing being slightly off, which makes us miss the
> > sync window for the ls (in your case) process. Then you get a mix of
> > async and sync IO all the time, which very much slows down the sync
> > process.
> >
> 
> Do you mean to say that 'ls' could not submit another request until
> the previous sync request completes, but its idle window gets disabled
> as it takes way too long to complete during heavy load? But when there

'ls' would never submit a new request before the previous one completes,
such is the nature of sync processes. That's the whole reason we have
the idle window.

> are requests in the driver, wont the idling be disabled anyway? Or did
> you mean to increase slice_sync?

No, idling is on a per-cfqq (process) basis. I did not mean to increase
slice_sync; that won't help at all. It's the window between submissions
of requests that I wanted to test with a larger value, but apparently that
wasn't the issue here.

-- 
Jens Axboe



* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-18 21:12     ` Mathieu Desnoyers
  2009-01-18 21:27       ` Mathieu Desnoyers
@ 2009-01-19 18:26       ` Jens Axboe
  2009-01-20  2:10         ` Mathieu Desnoyers
  1 sibling, 1 reply; 39+ messages in thread
From: Jens Axboe @ 2009-01-19 18:26 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
	linux-kernel, ltt-dev

On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> to create a fio job file.  The said behavior is appended below as "Part
> 1 - ls I/O behavior". Note that the original "ls" test case was done
> with the anticipatory I/O scheduler, which was active by default on my
> debian system with custom vanilla 2.6.28 kernel. Also note that I am
> running this on a raid-1, but have experienced the same problem on a
> standard partition I created on the same machine.
> 
> I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> consists of one dd-like job and many small jobs reading as many data as
> ls did. I used the small test script to batch run this ("Part 3 - batch
> test").
> 
> The results for the ls-like jobs are interesting :
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> noop                       41             10563
> anticipatory               63              8185
> deadline                   52             33387
> cfq                        43              1420

Do you have queuing enabled on your drives? You can check that in
/sys/block/sdX/device/queue_depth. Try setting it to 1 on each drive and
retesting all schedulers; that would be good for comparison.
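
For example (repeated for each disk in the array, and assuming the
driver allows changing it):

cat /sys/block/sda/device/queue_depth
echo 1 > /sys/block/sda/device/queue_depth
echo 1 > /sys/block/sdb/device/queue_depth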

raid personalities or dm complicate matters, since they introduce a
disconnect between 'ls' and the io scheduler at the bottom...

> > As a quick test, could you try and increase the slice_idle to eg 20ms?
> > Sometimes I've seen timing being slightly off, which makes us miss the
> > sync window for the ls (in your case) process. Then you get a mix of
> > async and sync IO all the time, which very much slows down the sync
> > process.
> > 
> 
> Just to confirm, the quick test you are taking about would be :
> 
> ---
>  block/cfq-iosched.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-2.6-lttng/block/cfq-iosched.c
> ===================================================================
> --- linux-2.6-lttng.orig/block/cfq-iosched.c	2009-01-18 15:17:32.000000000 -0500
> +++ linux-2.6-lttng/block/cfq-iosched.c	2009-01-18 15:46:38.000000000 -0500
> @@ -26,7 +26,7 @@ static const int cfq_back_penalty = 2;
>  static const int cfq_slice_sync = HZ / 10;
>  static int cfq_slice_async = HZ / 25;
>  static const int cfq_slice_async_rq = 2;
> -static int cfq_slice_idle = HZ / 125;
> +static int cfq_slice_idle = 20;
>  
>  /*
>   * offset from end of service tree
> 
> 
> It does not make much difference with the standard cfq test :
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> cfq (standard)             43              1420
> cfq (20ms slice_idle)      31              1573

OK, that's good at least!

> So, I guess 1.5s delay to run ls on a directory when the cache is cold
> with a cfq I/O scheduler is somewhat acceptable, but I doubt the 8, 10
> and 33s response times for the anticipatory, noop and deadline I/O
> schedulers are. I wonder why on earth is the anticipatory I/O scheduler
> activated by default with my kernel given it results in so poor
> interactive behavior when doing large I/O ?

I see you already found out why :-)

-- 
Jens Axboe



* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-19 18:26       ` Jens Axboe
@ 2009-01-20  2:10         ` Mathieu Desnoyers
  2009-01-20  7:37           ` Jens Axboe
  0 siblings, 1 reply; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-20  2:10 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
	linux-kernel, ltt-dev

* Jens Axboe (jens.axboe@oracle.com) wrote:
> On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > to create a fio job file.  The said behavior is appended below as "Part
> > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > with the anticipatory I/O scheduler, which was active by default on my
> > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > running this on a raid-1, but have experienced the same problem on a
> > standard partition I created on the same machine.
> > 
> > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > consists of one dd-like job and many small jobs reading as many data as
> > ls did. I used the small test script to batch run this ("Part 3 - batch
> > test").
> > 
> > The results for the ls-like jobs are interesting :
> > 
> > I/O scheduler        runt-min (msec)   runt-max (msec)
> > noop                       41             10563
> > anticipatory               63              8185
> > deadline                   52             33387
> > cfq                        43              1420
> 

Extra note: I have HZ=250 on my system. Changing it to 100 or 1000 did
not make much difference (I also tried with NO_HZ enabled).

> Do you have queuing enabled on your drives? You can check that in
> /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> schedulers, would be good for comparison.
> 

Here are the tests with a queue_depth of 1 :

I/O scheduler        runt-min (msec)   runt-max (msec)
noop                       43             38235
anticipatory               44              8728
deadline                   51             19751
cfq                        48               427


Overall, I wouldn't say it makes much difference.


> raid personalities or dm complicates matters, since it introduces a
> disconnect between 'ls' and the io scheduler at the bottom...
> 

Yes, ideally I should re-run those directly on the disk partitions.
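
One way to do that with the same job file would be fio's directory=
option under [global], pointing at a filesystem mounted directly on a
plain (non-raid) partition; the mount point here is just a placeholder:

[global]
directory=/mnt/plain-partition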

I am also tempted to create a fio job file which acts like an ssh server
receiving a connection after it has been pruned from the cache while the
system is doing heavy I/O. "ssh", in this case, seems to be doing much
more I/O than a simple "ls", and I think we might want to see if cfq
behaves correctly in such a case. Most of this I/O comes from page
faults (identified as traps in the trace), probably because the ssh
executable has been thrown out of the cache by

echo 3 > /proc/sys/vm/drop_caches

The behavior of an incoming ssh connection after clearing the cache is
appended below (Part 1 - LTTng trace for incoming ssh connection). The
job file created (Part 2) reads, for each job, a 2MB file with random
reads of between 4k and 44k each. The results are very interesting for cfq:

I/O scheduler        runt-min (msec)   runt-max (msec)
noop                       586           110242
anticipatory               531            26942
deadline                   561           108772
cfq                        523            28216

So, basically, with ssh out of the cache it can take 28s to answer an
incoming ssh connection, even with the cfq scheduler. This is not exactly
what I would call acceptable latency.
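
For reference, each of the ssh-like jobs in that file looks roughly like
this (a sketch based on the description above; the exact values are in
the Part 2 job file):

[sshjob]
rw=randread
filesize=2m
size=2m
bsrange=4k-44k
direct=0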

Mathieu


* Part 1 - LTTng trace for incoming ssh connection


block.bio_queue: 14270.987362011 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 12312, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14270.987370577 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 21507282, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14271.002701211 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 376717312, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14271.002708852 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 398212282, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14271.994249134 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 376762504, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14271.994258500 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, sshd, , 4159, 0x0, SYSCALL { sector = 398257474, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.005047300 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 186581088, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.005054182 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 208076058, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.197046688 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 186581680, size = 45056, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.197056120 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 208076650, size = 45056, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.214463959 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 376983192, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.214469777 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 398478162, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.358980449 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 376983312, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.358986893 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, sshd, , 4159, 0x0, TRAP { sector = 398478282, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366179882 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504036296, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366188841 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525531266, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366228133 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504037392, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366233770 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525532362, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366245471 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504070144, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366250460 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525565114, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366258431 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172624, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366263414 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667594, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366271329 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172640, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366275709 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667610, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366305707 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172664, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366311569 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667634, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366320581 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172680, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366327005 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667650, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366334928 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172688, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366339671 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667658, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366351578 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172696, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.366356064 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667666, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.394371136 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172704, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.394378840 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667674, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.394396826 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 504172744, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.394402397 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525667714, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.504393076 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 376762496, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14272.504399733 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 398257466, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.651642743 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376819168, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.651650198 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398314138, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.651668568 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376819192, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.651673473 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398314162, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.813095173 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376930384, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.813103780 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398425354, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.818773204 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376983360, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.818779958 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398478330, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.867827280 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376871792, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.867834786 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398366762, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.867857878 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376871816, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14272.867863845 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398366786, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.000933599 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376871832, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.000941927 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398366802, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.000962547 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376871856, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.000967971 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398366826, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.000988999 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376871896, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.000994441 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398366866, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.016781818 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557798168, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.016787698 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579293138, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.027449494 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557798264, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.027455846 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579293234, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.079950572 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557801192, size = 69632, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.079957430 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579296162, size = 69632, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.087728033 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557800984, size = 106496, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.087734033 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579295954, size = 106496, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.205730103 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376977904, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.205735312 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472874, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.213716615 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557596672, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14273.213725447 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579091642, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.376105867 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557632888, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14273.376113769 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579127858, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390329162 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744176, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390338057 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239146, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390366345 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744184, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390371136 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239154, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390384775 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744192, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390389617 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239162, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390402469 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744200, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390407113 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239170, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390420125 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744208, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390424982 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239178, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390432638 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744216, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390436805 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239186, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390462732 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744224, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.390467689 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239194, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.548801789 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744232, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.548812506 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239202, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.548844346 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557744256, size = 32768, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.548850571 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579239226, size = 32768, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555483129 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978008, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555489558 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472978, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555502566 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978016, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555507462 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472986, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555513691 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978024, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555518362 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472994, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555522790 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978032, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555527365 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473002, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555531940 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978040, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555536359 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473010, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555540953 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555545306 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555549707 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978056, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555554228 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473026, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555565226 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978064, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.555583185 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473034, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.556111195 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978072, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.556116436 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473042, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.556132550 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978104, size = 24576, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.556137395 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473074, size = 24576, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.557633755 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979192, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.557639746 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474162, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.557651417 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979240, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.557655782 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474210, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.558790122 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978680, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.558797670 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473650, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.558810157 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978688, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.558815023 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473658, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.558826051 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978736, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.558830869 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473706, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.559618325 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978744, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.559624455 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473714, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.559648476 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978760, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.559653673 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473730, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.560470401 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557632776, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14273.560475954 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579127746, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.564633093 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557647824, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.564639949 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579142794, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.570412202 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557647944, size = 36864, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.570417494 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579142914, size = 36864, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.570432050 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557648024, size = 28672, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.570436544 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579142994, size = 28672, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.573250317 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557648112, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.573255825 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579143082, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.573813668 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557648208, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.573819380 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579143178, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.574357597 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557649240, size = 69632, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.574363720 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579144210, size = 69632, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.579745509 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557632816, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14273.579750936 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579127786, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.580137575 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557649536, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.580143137 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579144506, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.581782686 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557649648, size = 28672, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.581787972 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579144618, size = 28672, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.581798890 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557649712, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.581803213 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579144682, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.583373838 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376980416, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.583379589 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398475386, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.592597554 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376982864, size = 77824, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.592603461 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398477834, size = 77824, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.605484632 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 557649424, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.605490392 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 579144394, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.606285537 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376766472, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.606292749 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398261442, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.618255248 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503841136, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14273.618262031 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525336106, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.766848612 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957088, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14273.766854819 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452058, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.779173851 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503857536, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.779179020 (/traces/trace-slow-ssh-pid-5555/block_3), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525352506, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.956064108 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 383516688, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14273.956073127 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 405011658, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14273.963661833 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504172672, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14273.963667482 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525667642, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.105890774 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503857200, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.105897887 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525352170, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.114466614 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 639844352, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14274.114471721 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 661339322, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.194546003 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503857392, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.194551112 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525352362, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.195244833 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376978584, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.195250131 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398473554, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.342679172 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376977824, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.342686069 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472794, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.342702066 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376977864, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.342706689 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398472834, size = 12288, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.514308041 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979128, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.514316219 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474098, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.514332549 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979144, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.514337418 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474114, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.514354278 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979160, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.514358806 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474130, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.514371841 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376979176, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.514376353 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398474146, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.671607720 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 110366736, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.671614533 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 131861706, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.688855653 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503841144, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14274.688861789 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525336114, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.710775517 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957224, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14274.710783249 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452194, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.711178453 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504036272, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14274.711185887 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525531242, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14275.753947620 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 557727992, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14275.753956191 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579222962, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14275.891101527 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 558242792, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14275.891109390 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 579737762, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.054306664 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566165504, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14276.054312781 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587660474, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.202061219 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169560, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.202067900 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664530, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.343169743 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169656, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.343177097 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664626, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.435036005 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566171584, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.435042329 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587666554, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.587967625 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170576, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.587975446 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665546, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.714877542 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566171080, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.714885441 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587666050, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.885331923 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170824, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14276.885338400 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665794, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.041004774 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170696, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.041011242 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665666, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.090024321 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170760, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.090030807 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665730, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.139160617 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170792, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.139166503 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665762, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.146527238 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170808, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.146532806 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665778, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.147041642 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170816, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.147046664 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665786, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.147056378 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566170832, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.147060909 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587665802, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.149654636 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504086544, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.149661995 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525581514, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.299441568 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566165512, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.299449098 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587660482, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.316058849 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566165608, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.316064702 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587660578, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.316655231 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566167536, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.316661231 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587662506, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.319198772 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566168544, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.319204644 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587663514, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.325427594 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.325432190 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.327980237 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169296, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.327985268 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664266, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.329234978 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169168, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.329239811 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664138, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.330769742 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169104, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.330775631 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664074, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.331300113 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169136, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.331305777 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664106, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.331634685 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169120, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.331640664 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664090, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.332191280 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169112, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.332198036 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664082, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.332857870 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 641990688, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14277.332863016 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 663485658, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.339925356 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504086552, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.339930549 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525581522, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.350000251 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503840960, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14277.350007112 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525335930, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.360440736 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503844888, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.360446037 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525339858, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.417649469 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503841152, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14277.417655383 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525336122, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418058555 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957240, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418063403 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452210, size = 16384, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418555076 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957272, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418560377 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452242, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418570217 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957280, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418574897 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452250, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418581063 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957288, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418585764 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452258, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418590078 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957296, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418594614 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452266, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418598451 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957304, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418602756 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452274, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418606908 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957312, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418611238 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452282, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418615216 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957320, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418619527 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452290, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418623322 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957328, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418627663 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452298, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418836246 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957336, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.418841193 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452306, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.419381341 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957344, size = 65536, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.419386225 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452314, size = 65536, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.419849133 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503957472, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.419853747 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525452442, size = 20480, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.576690908 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 110510128, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.576698949 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 132005098, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.588845789 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503988328, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.588852656 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525483298, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.601952879 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503873536, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14277.601959539 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525368506, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.060232543 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376983048, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.060241912 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398478018, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.064129159 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503857272, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14278.064138655 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525352242, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.071310370 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504037776, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.071330264 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525532746, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.080891196 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503939072, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.080897109 (/traces/trace-slow-ssh-pid-5555/block_1), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525434042, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.084320641 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376947512, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.084328574 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398442482, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.084343616 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376947552, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.084348755 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398442522, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.084358266 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376947568, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.084363390 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398442538, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.084378252 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376947576, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.084383308 (/traces/trace-slow-ssh-pid-5555/block_0), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398442546, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.096592889 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376947584, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.096599909 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398442554, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.096953622 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 376946984, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.096958890 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 398441954, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.101879473 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503955464, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.101885305 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525450434, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.118154240 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503971864, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.118162137 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525466834, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.126133387 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503988608, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.126139687 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525483578, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.136351623 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503857280, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.136357399 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525352250, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.138499766 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169080, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.138506375 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664050, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.139160026 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169064, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.139165315 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664034, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.139782848 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169072, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.139788161 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664042, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.139799535 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 566169088, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.139804017 (/traces/trace-slow-ssh-pid-5555/block_5), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 587664058, size = 8192, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.141005857 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503841632, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14278.141012172 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525336602, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.149367501 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 503956240, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.149373775 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525451210, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.155173707 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 315408384, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14278.155179359 (/traces/trace-slow-ssh-pid-5555/block_6), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 336903354, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.169842985 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 483393984, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14278.169849091 (/traces/trace-slow-ssh-pid-5555/block_7), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504888954, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.180896269 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 483400808, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14278.180903577 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 504895778, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.184431117 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 483795656, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.184437162 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 505290626, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.209624125 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 503923064, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.209631628 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, TRAP { sector = 525418034, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.221083451 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 503873552, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.221090019 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 525368522, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.318767351 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 640040968, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 10, not_uptodate = 0 }
block.bio_queue: 14278.318773435 (/traces/trace-slow-ssh-pid-5555/block_4), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 661535938, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.325009226 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 641367208, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.325014566 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 662862178, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.330573352 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 641367216, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }
block.bio_queue: 14278.330579649 (/traces/trace-slow-ssh-pid-5555/block_2), 5555, 5555, /usr/sbin/sshd, , 4159, 0x0, SYSCALL { sector = 662862186, size = 4096, rw(FAILFAST_DRIVER,FAILFAST_TRANSPORT, = 0, not_uptodate = 0 }


* Part 2 - ssh connexion job file (test.job.ssh)

[job1]
rw=write
size=10240m
direct=0
blocksize=1024k

[global]
rw=randread
size=2048k
filesize=30m
direct=0
bsrange=4k-44k

[file1]
startdelay=0

[file2]
startdelay=4

[file3]
startdelay=8

[file4]
startdelay=12

[file5]
startdelay=16

[file6]
startdelay=20

[file7]
startdelay=24

[file8]
startdelay=28

[file9]
startdelay=32

[file10]
startdelay=36

[file11]
startdelay=40

[file12]
startdelay=44

[file13]
startdelay=48

[file14]
startdelay=52

[file15]
startdelay=56

[file16]
startdelay=60

[file17]
startdelay=64

[file18]
startdelay=68

[file19]
startdelay=72

[file20]
startdelay=76

[file21]
startdelay=80

[file22]
startdelay=84

[file23]
startdelay=88

[file24]
startdelay=92

[file25]
startdelay=96

[file26]
startdelay=100

[file27]
startdelay=104

[file28]
startdelay=108

[file29]
startdelay=112

[file30]
startdelay=116

[file31]
startdelay=120

[file32]
startdelay=124

[file33]
startdelay=128

[file34]
startdelay=132

[file35]
startdelay=134

[file36]
startdelay=138

[file37]
startdelay=142

[file38]
startdelay=146

[file39]
startdelay=150

[file40]
startdelay=200

[file41]
startdelay=260
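
For reference, a job file like this one can be batch-run across the four I/O
schedulers with a small shell loop. The sketch below is only an assumed
example (device name, job file name and output filtering are placeholders),
not the actual "Part 3 - batch test" script referenced earlier in the thread:

#!/bin/sh
# Run the fio job once per I/O scheduler and show the per-job runtimes
# ("runt=") that fio reports. sda and test.job.ssh are example names.
DEV=sda
JOB=test.job.ssh
for sched in noop anticipatory deadline cfq; do
        echo "$sched" > /sys/block/$DEV/queue/scheduler
        echo 3 > /proc/sys/vm/drop_caches
        fio "$JOB" > "result-$sched.txt"
        grep "runt=" "result-$sched.txt"
done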

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20  2:10         ` Mathieu Desnoyers
@ 2009-01-20  7:37           ` Jens Axboe
  2009-01-20 12:28             ` Jens Axboe
                               ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Jens Axboe @ 2009-01-20  7:37 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
	linux-kernel, ltt-dev

On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> * Jens Axboe (jens.axboe@oracle.com) wrote:
> > On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > > to create a fio job file.  The said behavior is appended below as "Part
> > > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > > with the anticipatory I/O scheduler, which was active by default on my
> > > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > > running this on a raid-1, but have experienced the same problem on a
> > > standard partition I created on the same machine.
> > > 
> > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > > consists of one dd-like job and many small jobs reading as many data as
> > > ls did. I used the small test script to batch run this ("Part 3 - batch
> > > test").
> > > 
> > > The results for the ls-like jobs are interesting :
> > > 
> > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > noop                       41             10563
> > > anticipatory               63              8185
> > > deadline                   52             33387
> > > cfq                        43              1420
> > 
> 
> Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did
> not make much difference (also tried with NO_HZ enabled).
> 
> > Do you have queuing enabled on your drives? You can check that in
> > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> > schedulers, would be good for comparison.
> > 
> 
> Here are the tests with a queue_depth of 1 :
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> noop                       43             38235
> anticipatory               44              8728
> deadline                   51             19751
> cfq                        48               427
> 
> 
> Overall, I wouldn't say it makes much difference.

0,5 seconds vs 1,5 seconds isn't much of a difference?

> > raid personalities or dm complicates matters, since it introduces a
> > disconnect between 'ls' and the io scheduler at the bottom...
> > 
> 
> Yes, ideally I should re-run those directly on the disk partitions.

At least for comparison.

> I am also tempted to create a fio job file which acts like a ssh server
> receiving a connexion after it has been pruned from the cache while the
> system is doing heavy I/O. "ssh", in this case, seems to be doing much
> more I/O than a simple "ls", and I think we might want to see if cfq
> behaves correctly in such case. Most of this I/O is coming from page
> faults (identified as traps in the trace) probably because the ssh
> executable has been thrown out of the cache by
> 
> echo 3 > /proc/sys/vm/drop_caches
> 
> The behavior of an incoming ssh connexion after clearing the cache is
> appended below (Part 1 - LTTng trace for incoming ssh connexion). The
> job file created (Part 2) reads, for each job, a 2MB file with random
> reads each between 4k-44k. The results are very interesting for cfq :
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> noop                       586           110242
> anticipatory               531            26942
> deadline                   561           108772
> cfq                        523            28216
> 
> So, basically, ssh being out of the cache can take 28s to answer an
> incoming ssh connexion even with the cfq scheduler. This is not exactly
> what I would call an acceptable latency.

At some point, you have to stop and consider what is acceptable
performance for a given IO pattern. Your ssh test case is purely random
IO, and neither CFQ nor AS would do any idling for that. We can make
this test case faster for sure, the hard part is making sure that we
don't regress on async throughput at the same time.

Also remember that with your raid1, it's not entirely reasonable to
blame all performance issues on the IO scheduler as per my previous
mail. It would be a lot more fair to view the disk numbers individually.

Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set
to 1 as well?

However, I think we should be doing somewhat better at this test case.
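
For reference, these CFQ parameters (together with the queue_depth setting
mentioned earlier) are plain sysfs attributes; a minimal sketch, assuming the
drive is sda:

echo 1 > /sys/block/sda/queue/iosched/quantum
echo 1 > /sys/block/sda/queue/iosched/slice_async_rq
echo 1 > /sys/block/sda/device/queue_depth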

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20  7:37           ` Jens Axboe
@ 2009-01-20 12:28             ` Jens Axboe
  2009-01-20 14:22               ` [ltt-dev] " Mathieu Desnoyers
                                 ` (2 more replies)
  2009-01-20 13:45             ` [ltt-dev] " Mathieu Desnoyers
  2009-01-20 20:22             ` Ben Gamari
  2 siblings, 3 replies; 39+ messages in thread
From: Jens Axboe @ 2009-01-20 12:28 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev

On Tue, Jan 20 2009, Jens Axboe wrote:
> On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > > > to create a fio job file.  The said behavior is appended below as "Part
> > > > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > > > with the anticipatory I/O scheduler, which was active by default on my
> > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > > > running this on a raid-1, but have experienced the same problem on a
> > > > standard partition I created on the same machine.
> > > > 
> > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > > > consists of one dd-like job and many small jobs reading as many data as
> > > > ls did. I used the small test script to batch run this ("Part 3 - batch
> > > > test").
> > > > 
> > > > The results for the ls-like jobs are interesting :
> > > > 
> > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > noop                       41             10563
> > > > anticipatory               63              8185
> > > > deadline                   52             33387
> > > > cfq                        43              1420
> > > 
> > 
> > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did
> > not make much difference (also tried with NO_HZ enabled).
> > 
> > > Do you have queuing enabled on your drives? You can check that in
> > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> > > schedulers, would be good for comparison.
> > > 
> > 
> > Here are the tests with a queue_depth of 1 :
> > 
> > I/O scheduler        runt-min (msec)   runt-max (msec)
> > noop                       43             38235
> > anticipatory               44              8728
> > deadline                   51             19751
> > cfq                        48               427
> > 
> > 
> > Overall, I wouldn't say it makes much difference.
> 
> 0,5 seconds vs 1,5 seconds isn't much of a difference?
> 
> > > raid personalities or dm complicates matters, since it introduces a
> > > disconnect between 'ls' and the io scheduler at the bottom...
> > > 
> > 
> > Yes, ideally I should re-run those directly on the disk partitions.
> 
> At least for comparison.
> 
> > I am also tempted to create a fio job file which acts like a ssh server
> > receiving a connexion after it has been pruned from the cache while the
> > system is doing heavy I/O. "ssh", in this case, seems to be doing much
> > more I/O than a simple "ls", and I think we might want to see if cfq
> > behaves correctly in such case. Most of this I/O is coming from page
> > faults (identified as traps in the trace) probably because the ssh
> > executable has been thrown out of the cache by
> > 
> > echo 3 > /proc/sys/vm/drop_caches
> > 
> > The behavior of an incoming ssh connexion after clearing the cache is
> > appended below (Part 1 - LTTng trace for incoming ssh connexion). The
> > job file created (Part 2) reads, for each job, a 2MB file with random
> > reads each between 4k-44k. The results are very interesting for cfq :
> > 
> > I/O scheduler        runt-min (msec)   runt-max (msec)
> > noop                       586           110242
> > anticipatory               531            26942
> > deadline                   561           108772
> > cfq                        523            28216
> > 
> > So, basically, ssh being out of the cache can take 28s to answer an
> > incoming ssh connexion even with the cfq scheduler. This is not exactly
> > what I would call an acceptable latency.
> 
> At some point, you have to stop and consider what is acceptable
> performance for a given IO pattern. Your ssh test case is purely random
> IO, and neither CFQ nor AS would do any idling for that. We can make
> this test case faster for sure, the hard part is making sure that we
> don't regress on async throughput at the same time.
> 
> Also remember that with your raid1, it's not entirely reasonable to
> blame all performance issues on the IO scheduler as per my previous
> mail. It would be a lot more fair to view the disk numbers individually.
> 
> Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set
> to 1 as well?
> 
> However, I think we should be doing somewhat better at this test case.

Mathieu, does this improve anything for you?

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index e8525fa..a556512 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1765,6 +1765,32 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 }
 
 /*
+ * Pull dispatched requests from 'cfqq' back into the scheduler
+ */
+static void cfq_pull_dispatched_requests(struct cfq_data *cfqd,
+					 struct cfq_queue *cfqq)
+{
+	struct request_queue *q = cfqd->queue;
+	struct request *rq, *tmp;
+
+	list_for_each_entry_safe(rq, tmp, &q->queue_head, queuelist) {
+		if ((rq->cmd_flags & REQ_STARTED) || RQ_CFQQ(rq) != cfqq)
+			continue;
+
+		/*
+		 * Pull off the dispatch list and put it back into the cfqq
+		 */
+		list_del(&rq->queuelist);
+		cfqq->dispatched--;
+		if (cfq_cfqq_sync(cfqq))
+			cfqd->sync_flight--;
+
+		list_add_tail(&rq->queuelist, &cfqq->fifo);
+		cfq_add_rq_rb(rq);
+	}
+}
+
+/*
  * Check if new_cfqq should preempt the currently active queue. Return 0 for
  * no or if we aren't sure, a 1 will cause a preempt.
  */
@@ -1820,8 +1846,14 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
  */
 static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
+	struct cfq_queue *old_cfqq = cfqd->active_queue;
+
 	cfq_log_cfqq(cfqd, cfqq, "preempt");
-	cfq_slice_expired(cfqd, 1);
+
+	if (old_cfqq) {
+		__cfq_slice_expired(cfqd, old_cfqq, 1);
+		cfq_pull_dispatched_requests(cfqd, old_cfqq);
+	}
 
 	/*
 	 * Put the new queue at the front of the of the current list,

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20  7:37           ` Jens Axboe
  2009-01-20 12:28             ` Jens Axboe
@ 2009-01-20 13:45             ` Mathieu Desnoyers
  2009-01-20 20:22             ` Ben Gamari
  2 siblings, 0 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-20 13:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-kernel, ltt-dev, Andrea Arcangeli, akpm, Linus Torvalds,
	Ingo Molnar

* Jens Axboe (jens.axboe@oracle.com) wrote:
> On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > > > to create a fio job file.  The said behavior is appended below as "Part
> > > > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > > > with the anticipatory I/O scheduler, which was active by default on my
> > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > > > running this on a raid-1, but have experienced the same problem on a
> > > > standard partition I created on the same machine.
> > > > 
> > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > > > consists of one dd-like job and many small jobs reading as many data as
> > > > ls did. I used the small test script to batch run this ("Part 3 - batch
> > > > test").
> > > > 
> > > > The results for the ls-like jobs are interesting :
> > > > 
> > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > noop                       41             10563
> > > > anticipatory               63              8185
> > > > deadline                   52             33387
> > > > cfq                        43              1420
> > > 
> > 
> > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did
> > not make much difference (also tried with NO_HZ enabled).
> > 
> > > Do you have queuing enabled on your drives? You can check that in
> > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> > > schedulers, would be good for comparison.
> > > 
> > 
> > Here are the tests with a queue_depth of 1 :
> > 
> > I/O scheduler        runt-min (msec)   runt-max (msec)
> > noop                       43             38235
> > anticipatory               44              8728
> > deadline                   51             19751
> > cfq                        48               427
> > 
> > 
> > Overall, I wouldn't say it makes much difference.
> 
> 0,5 seconds vs 1,5 seconds isn't much of a difference?
> 

A threefold difference, yes; that's significant, but not in terms of
usability in that specific case.

> > > raid personalities or dm complicates matters, since it introduces a
> > > disconnect between 'ls' and the io scheduler at the bottom...
> > > 
> > 
> > Yes, ideally I should re-run those directly on the disk partitions.
> 
> At least for comparison.
> 

Here it is. ssh test done on /dev/sda directly

queue_depth=31 (default)
/sys/block/sda/queue/iosched/slice_async_rq = 2 (default)
/sys/block/sda/queue/iosched/quantum = 4 (default)

I/O scheduler        runt-min (msec)   runt-max (msec)
noop                      612            205684
anticipatory              562              5555 
deadline                  505            113153          
cfq                       523              6637

> > I am also tempted to create a fio job file which acts like a ssh server
> > receiving a connexion after it has been pruned from the cache while the
> > system if doing heavy I/O. "ssh", in this case, seems to be doing much
> > more I/O than a simple "ls", and I think we might want to see if cfq
> > behaves correctly in such case. Most of this I/O is coming from page
> > faults (identified as traps in the trace) probably because the ssh
> > executable has been thrown out of the cache by
> > 
> > echo 3 > /proc/sys/vm/drop_caches
> > 
> > The behavior of an incoming ssh connexion after clearing the cache is
> > appended below (Part 1 - LTTng trace for incoming ssh connexion). The
> > job file created (Part 2) reads, for each job, a 2MB file with random
> > reads each between 4k-44k. The results are very interesting for cfq :
> > 
> > I/O scheduler        runt-min (msec)   runt-max (msec)
> > noop                       586           110242
> > anticipatory               531            26942
> > deadline                   561           108772
> > cfq                        523            28216
> > 
> > So, basically, ssh being out of the cache can take 28s to answer an
> > incoming ssh connexion even with the cfq scheduler. This is not exactly
> > what I would call an acceptable latency.
> 
> At some point, you have to stop and consider what is acceptable
> performance for a given IO pattern. Your ssh test case is purely random
> IO, and neither CFQ nor AS would do any idling for that. We can make
> this test case faster for sure, the hard part is making sure that we
> don't regress on async throughput at the same time.
> 
> Also remember that with your raid1, it's not entirely reasonable to
> blame all performance issues on the IO scheduler as per my previous
> mail. It would be a lot more fair to view the disk numbers individually.
> 
> Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set
> to 1 as well?
> 

Sure, ssh test done on /dev/sda

queue_depth=31 (default)
/sys/block/sda/queue/iosched/slice_async_rq = 1
/sys/block/sda/queue/iosched/quantum = 1

I/O scheduler        runt-min (msec)   runt-max (msec)
cfq (default)             523              6637
cfq (s_rq=1,q=1)          503              6743

It did not make much of a difference.

Mathieu


> However, I think we should be doing somewhat better at this test case.
> 
> -- 
> Jens Axboe
> 
> 
> _______________________________________________
> ltt-dev mailing list
> ltt-dev@lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20 12:28             ` Jens Axboe
@ 2009-01-20 14:22               ` Mathieu Desnoyers
  2009-01-20 14:24                 ` Jens Axboe
  2009-01-20 23:27               ` Mathieu Desnoyers
  2009-02-02  2:08               ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers
  2 siblings, 1 reply; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-20 14:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel

* Jens Axboe (jens.axboe@oracle.com) wrote:
> On Tue, Jan 20 2009, Jens Axboe wrote:
> > On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> > > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > > > > to create a fio job file.  The said behavior is appended below as "Part
> > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > > > > with the anticipatory I/O scheduler, which was active by default on my
> > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > > > > running this on a raid-1, but have experienced the same problem on a
> > > > > standard partition I created on the same machine.
> > > > > 
> > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > > > > consists of one dd-like job and many small jobs reading as many data as
> > > > > ls did. I used the small test script to batch run this ("Part 3 - batch
> > > > > test").
> > > > > 
> > > > > The results for the ls-like jobs are interesting :
> > > > > 
> > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > noop                       41             10563
> > > > > anticipatory               63              8185
> > > > > deadline                   52             33387
> > > > > cfq                        43              1420
> > > > 
> > > 
> > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did
> > > not make much difference (also tried with NO_HZ enabled).
> > > 
> > > > Do you have queuing enabled on your drives? You can check that in
> > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> > > > schedulers, would be good for comparison.
> > > > 
> > > 
> > > Here are the tests with a queue_depth of 1 :
> > > 
> > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > noop                       43             38235
> > > anticipatory               44              8728
> > > deadline                   51             19751
> > > cfq                        48               427
> > > 
> > > 
> > > Overall, I wouldn't say it makes much difference.
> > 
> > 0,5 seconds vs 1,5 seconds isn't much of a difference?
> > 
> > > > raid personalities or dm complicates matters, since it introduces a
> > > > disconnect between 'ls' and the io scheduler at the bottom...
> > > > 
> > > 
> > > Yes, ideally I should re-run those directly on the disk partitions.
> > 
> > At least for comparison.
> > 
> > > I am also tempted to create a fio job file which acts like a ssh server
> > > receiving a connexion after it has been pruned from the cache while the
> > > system if doing heavy I/O. "ssh", in this case, seems to be doing much
> > > more I/O than a simple "ls", and I think we might want to see if cfq
> > > behaves correctly in such case. Most of this I/O is coming from page
> > > faults (identified as traps in the trace) probably because the ssh
> > > executable has been thrown out of the cache by
> > > 
> > > echo 3 > /proc/sys/vm/drop_caches
> > > 
> > > The behavior of an incoming ssh connexion after clearing the cache is
> > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The
> > > job file created (Part 2) reads, for each job, a 2MB file with random
> > > reads each between 4k-44k. The results are very interesting for cfq :
> > > 
> > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > noop                       586           110242
> > > anticipatory               531            26942
> > > deadline                   561           108772
> > > cfq                        523            28216
> > > 
> > > So, basically, ssh being out of the cache can take 28s to answer an
> > > incoming ssh connexion even with the cfq scheduler. This is not exactly
> > > what I would call an acceptable latency.
> > 
> > At some point, you have to stop and consider what is acceptable
> > performance for a given IO pattern. Your ssh test case is purely random
> > IO, and neither CFQ nor AS would do any idling for that. We can make
> > this test case faster for sure, the hard part is making sure that we
> > don't regress on async throughput at the same time.
> > 
> > Also remember that with your raid1, it's not entirely reasonable to
> > blaim all performance issues on the IO scheduler as per my previous
> > mail. It would be a lot more fair to view the disk numbers individually.
> > 
> > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set
> > to 1 as well?
> > 
> > However, I think we should be doing somewhat better at this test case.
> 
> Mathieu, does this improve anything for you?
> 

I got this message when running with your patch applied :
cfq: forced dispatching is broken (nr_sorted=4294967275), please report this
(message appeared 10 times in a job run)

Here is the result :

ssh test done on /dev/sda directly

queue_depth=31 (default)
/sys/block/sda/queue/iosched/slice_async_rq = 2 (default)
/sys/block/sda/queue/iosched/quantum = 4 (default)

I/O scheduler        runt-min (msec)   runt-max (msec)
cfq (default)             523              6637
cfq (patched)             564              7195

Pretty much the same.

Here is the test done on raid1 :
queue_depth=31 (default)
/sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default)
/sys/block/sd{a,b}/queue/iosched/quantum = 4 (default)

I/O scheduler        runt-min (msec)   runt-max (msec)
cfq (default, raid1)       523            28216
cfq (patched, raid1)       540            16454

With a worst case of nearly the same order of magnitude.

Mathieu


> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index e8525fa..a556512 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -1765,6 +1765,32 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>  }
>  
>  /*
> + * Pull dispatched requests from 'cfqq' back into the scheduler
> + */
> +static void cfq_pull_dispatched_requests(struct cfq_data *cfqd,
> +					 struct cfq_queue *cfqq)
> +{
> +	struct request_queue *q = cfqd->queue;
> +	struct request *rq, *tmp;
> +
> +	list_for_each_entry_safe(rq, tmp, &q->queue_head, queuelist) {
> +		if ((rq->cmd_flags & REQ_STARTED) || RQ_CFQQ(rq) != cfqq)
> +			continue;
> +
> +		/*
> +		 * Pull off the dispatch list and put it back into the cfqq
> +		 */
> +		list_del(&rq->queuelist);
> +		cfqq->dispatched--;
> +		if (cfq_cfqq_sync(cfqq))
> +			cfqd->sync_flight--;
> +
> +		list_add_tail(&rq->queuelist, &cfqq->fifo);
> +		cfq_add_rq_rb(rq);
> +	}
> +}
> +
> +/*
>   * Check if new_cfqq should preempt the currently active queue. Return 0 for
>   * no or if we aren't sure, a 1 will cause a preempt.
>   */
> @@ -1820,8 +1846,14 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
>   */
>  static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>  {
> +	struct cfq_queue *old_cfqq = cfqd->active_queue;
> +
>  	cfq_log_cfqq(cfqd, cfqq, "preempt");
> -	cfq_slice_expired(cfqd, 1);
> +
> +	if (old_cfqq) {
> +		__cfq_slice_expired(cfqd, old_cfqq, 1);
> +		cfq_pull_dispatched_requests(cfqd, old_cfqq);
> +	}
>  
>  	/*
>  	 * Put the new queue at the front of the of the current list,
> 
> -- 
> Jens Axboe
> 
> 
> _______________________________________________
> ltt-dev mailing list
> ltt-dev@lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20 14:22               ` [ltt-dev] " Mathieu Desnoyers
@ 2009-01-20 14:24                 ` Jens Axboe
  2009-01-20 15:42                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 39+ messages in thread
From: Jens Axboe @ 2009-01-20 14:24 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel

On Tue, Jan 20 2009, Mathieu Desnoyers wrote:
> * Jens Axboe (jens.axboe@oracle.com) wrote:
> > On Tue, Jan 20 2009, Jens Axboe wrote:
> > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> > > > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > > > > > to create a fio job file.  The said behavior is appended below as "Part
> > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > > > > > with the anticipatory I/O scheduler, which was active by default on my
> > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > > > > > running this on a raid-1, but have experienced the same problem on a
> > > > > > standard partition I created on the same machine.
> > > > > > 
> > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > > > > > consists of one dd-like job and many small jobs reading as many data as
> > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch
> > > > > > test").
> > > > > > 
> > > > > > The results for the ls-like jobs are interesting :
> > > > > > 
> > > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > > noop                       41             10563
> > > > > > anticipatory               63              8185
> > > > > > deadline                   52             33387
> > > > > > cfq                        43              1420
> > > > > 
> > > > 
> > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did
> > > > not make much difference (also tried with NO_HZ enabled).
> > > > 
> > > > > Do you have queuing enabled on your drives? You can check that in
> > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> > > > > schedulers, would be good for comparison.
> > > > > 
> > > > 
> > > > Here are the tests with a queue_depth of 1 :
> > > > 
> > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > noop                       43             38235
> > > > anticipatory               44              8728
> > > > deadline                   51             19751
> > > > cfq                        48               427
> > > > 
> > > > 
> > > > Overall, I wouldn't say it makes much difference.
> > > 
> > > 0,5 seconds vs 1,5 seconds isn't much of a difference?
> > > 
> > > > > raid personalities or dm complicates matters, since it introduces a
> > > > > disconnect between 'ls' and the io scheduler at the bottom...
> > > > > 
> > > > 
> > > > Yes, ideally I should re-run those directly on the disk partitions.
> > > 
> > > At least for comparison.
> > > 
> > > > I am also tempted to create a fio job file which acts like a ssh server
> > > > receiving a connexion after it has been pruned from the cache while the
> > > > system if doing heavy I/O. "ssh", in this case, seems to be doing much
> > > > more I/O than a simple "ls", and I think we might want to see if cfq
> > > > behaves correctly in such case. Most of this I/O is coming from page
> > > > faults (identified as traps in the trace) probably because the ssh
> > > > executable has been thrown out of the cache by
> > > > 
> > > > echo 3 > /proc/sys/vm/drop_caches
> > > > 
> > > > The behavior of an incoming ssh connexion after clearing the cache is
> > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The
> > > > job file created (Part 2) reads, for each job, a 2MB file with random
> > > > reads each between 4k-44k. The results are very interesting for cfq :
> > > > 
> > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > noop                       586           110242
> > > > anticipatory               531            26942
> > > > deadline                   561           108772
> > > > cfq                        523            28216
> > > > 
> > > > So, basically, ssh being out of the cache can take 28s to answer an
> > > > incoming ssh connexion even with the cfq scheduler. This is not exactly
> > > > what I would call an acceptable latency.
> > > 
> > > At some point, you have to stop and consider what is acceptable
> > > performance for a given IO pattern. Your ssh test case is purely random
> > > IO, and neither CFQ nor AS would do any idling for that. We can make
> > > this test case faster for sure, the hard part is making sure that we
> > > don't regress on async throughput at the same time.
> > > 
> > > Also remember that with your raid1, it's not entirely reasonable to
> > > blaim all performance issues on the IO scheduler as per my previous
> > > mail. It would be a lot more fair to view the disk numbers individually.
> > > 
> > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set
> > > to 1 as well?
> > > 
> > > However, I think we should be doing somewhat better at this test case.
> > 
> > Mathieu, does this improve anything for you?
> > 
> 
> I got this message when running with your patch applied :
> cfq: forced dispatching is broken (nr_sorted=4294967275), please report this
> (message appeared 10 times in a job run)

Woops, missed a sort inc. Updated version below, or just ignore the
warning.

> Here is the result :
> 
> ssh test done on /dev/sda directly
> 
> queue_depth=31 (default)
> /sys/block/sda/queue/iosched/slice_async_rq = 2 (default)
> /sys/block/sda/queue/iosched/quantum = 4 (default)
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> cfq (default)             523              6637
> cfq (patched)             564              7195
> 
> Pretty much the same.

Can you retry with depth=1 as well? There's not much to rip back out, if
everything is immediately sent to the device.

> 
> Here is the test done on raid1 :
> queue_depth=31 (default)
> /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default)
> /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default)
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> cfq (default, raid1)       523            28216
> cfq (patched, raid1)       540            16454
> 
> With nearly same order of magnitude worse-case.

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index e8525fa..30714de 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1765,6 +1765,36 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 }
 
 /*
+ * Pull dispatched requests from 'cfqq' back into the scheduler
+ */
+static void cfq_pull_dispatched_requests(struct cfq_data *cfqd,
+					 struct cfq_queue *cfqq)
+{
+	struct request_queue *q = cfqd->queue;
+	struct request *rq;
+
+	list_for_each_entry_reverse(rq, &q->queue_head, queuelist) {
+		if (rq->cmd_flags & REQ_STARTED)
+			break;
+
+		if (RQ_CFQQ(rq) != cfqq)
+			continue;
+
+		/*
+		 * Pull off the dispatch list and put it back into the cfqq
+		 */
+		list_del(&rq->queuelist);
+		cfqq->dispatched--;
+		if (cfq_cfqq_sync(cfqq))
+			cfqd->sync_flight--;
+
+		cfq_add_rq_rb(rq);
+		q->nr_sorted++;
+		list_add_tail(&rq->queuelist, &cfqq->fifo);
+	}
+}
+
+/*
  * Check if new_cfqq should preempt the currently active queue. Return 0 for
  * no or if we aren't sure, a 1 will cause a preempt.
  */
@@ -1820,8 +1850,14 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
  */
 static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
+	struct cfq_queue *old_cfqq = cfqd->active_queue;
+
 	cfq_log_cfqq(cfqd, cfqq, "preempt");
-	cfq_slice_expired(cfqd, 1);
+
+	if (old_cfqq) {
+		__cfq_slice_expired(cfqd, old_cfqq, 1);
+		cfq_pull_dispatched_requests(cfqd, old_cfqq);
+	}
 
 	/*
 	 * Put the new queue at the front of the of the current list,

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20 14:24                 ` Jens Axboe
@ 2009-01-20 15:42                   ` Mathieu Desnoyers
  2009-01-20 23:06                     ` Mathieu Desnoyers
  0 siblings, 1 reply; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-20 15:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel

* Jens Axboe (jens.axboe@oracle.com) wrote:
> On Tue, Jan 20 2009, Mathieu Desnoyers wrote:
> > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > On Tue, Jan 20 2009, Jens Axboe wrote:
> > > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> > > > > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > > > > > > to create a fio job file.  The said behavior is appended below as "Part
> > > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > > > > > > with the anticipatory I/O scheduler, which was active by default on my
> > > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > > > > > > running this on a raid-1, but have experienced the same problem on a
> > > > > > > standard partition I created on the same machine.
> > > > > > > 
> > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > > > > > > consists of one dd-like job and many small jobs reading as many data as
> > > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch
> > > > > > > test").
> > > > > > > 
> > > > > > > The results for the ls-like jobs are interesting :
> > > > > > > 
> > > > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > > > noop                       41             10563
> > > > > > > anticipatory               63              8185
> > > > > > > deadline                   52             33387
> > > > > > > cfq                        43              1420
> > > > > > 
> > > > > 
> > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did
> > > > > not make much difference (also tried with NO_HZ enabled).
> > > > > 
> > > > > > Do you have queuing enabled on your drives? You can check that in
> > > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> > > > > > schedulers, would be good for comparison.
> > > > > > 
> > > > > 
> > > > > Here are the tests with a queue_depth of 1 :
> > > > > 
> > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > noop                       43             38235
> > > > > anticipatory               44              8728
> > > > > deadline                   51             19751
> > > > > cfq                        48               427
> > > > > 
> > > > > 
> > > > > Overall, I wouldn't say it makes much difference.
> > > > 
> > > > 0,5 seconds vs 1,5 seconds isn't much of a difference?
> > > > 
> > > > > > raid personalities or dm complicates matters, since it introduces a
> > > > > > disconnect between 'ls' and the io scheduler at the bottom...
> > > > > > 
> > > > > 
> > > > > Yes, ideally I should re-run those directly on the disk partitions.
> > > > 
> > > > At least for comparison.
> > > > 
> > > > > I am also tempted to create a fio job file which acts like a ssh server
> > > > > receiving a connexion after it has been pruned from the cache while the
> > > > > system is doing heavy I/O. "ssh", in this case, seems to be doing much
> > > > > more I/O than a simple "ls", and I think we might want to see if cfq
> > > > > behaves correctly in such case. Most of this I/O is coming from page
> > > > > faults (identified as traps in the trace) probably because the ssh
> > > > > executable has been thrown out of the cache by
> > > > > 
> > > > > echo 3 > /proc/sys/vm/drop_caches
> > > > > 
> > > > > The behavior of an incoming ssh connexion after clearing the cache is
> > > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The
> > > > > job file created (Part 2) reads, for each job, a 2MB file with random
> > > > > reads each between 4k-44k. The results are very interesting for cfq :
> > > > > 
> > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > noop                       586           110242
> > > > > anticipatory               531            26942
> > > > > deadline                   561           108772
> > > > > cfq                        523            28216
> > > > > 
> > > > > So, basically, ssh being out of the cache can take 28s to answer an
> > > > > incoming ssh connexion even with the cfq scheduler. This is not exactly
> > > > > what I would call an acceptable latency.
> > > > 
> > > > At some point, you have to stop and consider what is acceptable
> > > > performance for a given IO pattern. Your ssh test case is purely random
> > > > IO, and neither CFQ nor AS would do any idling for that. We can make
> > > > this test case faster for sure, the hard part is making sure that we
> > > > don't regress on async throughput at the same time.
> > > > 
> > > > Also remember that with your raid1, it's not entirely reasonable to
> > > > blaim all performance issues on the IO scheduler as per my previous
> > > > mail. It would be a lot more fair to view the disk numbers individually.
> > > > 
> > > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set
> > > > to 1 as well?
> > > > 
> > > > However, I think we should be doing somewhat better at this test case.
> > > 
> > > Mathieu, does this improve anything for you?
> > > 
> > 
> > I got this message when running with your patch applied :
> > cfq: forced dispatching is broken (nr_sorted=4294967275), please report this
> > (message appeared 10 times in a job run)
> 
> Woops, missed a sort inc. Updated version below, or just ignore the
> warning.
> 
> > Here is the result :
> > 
> > ssh test done on /dev/sda directly
> > 
> > queue_depth=31 (default)
> > /sys/block/sda/queue/iosched/slice_async_rq = 2 (default)
> > /sys/block/sda/queue/iosched/quantum = 4 (default)
> > 
> > I/O scheduler        runt-min (msec)   runt-max (msec)
> > cfq (default)             523              6637
> > cfq (patched)             564              7195
> > 
> > Pretty much the same.
> 
> Can you retry with depth=1 as well? There's not much to rip back out, if
> everything is immediately sent to the device.
> 

echo 1 > /sys/block/sda/queue/iosched/quantum 
echo 1 > /sys/block/sda/queue/iosched/slice_async_rq
echo 1 > /sys/block/sda/device/queue_depth

ssh test done on /dev/sda directly

oops, something wrong in the new patch ?


[  302.077063] BUG: unable to handle kernel paging request at 00000008
[  302.078732] IP: [<ffffffff8040a1e5>] cfq_remove_request+0x35/0x1d0           
[  302.078732] PGD 43ac76067 PUD 43b1f3067 PMD 0                                
[  302.078732] Oops: 0002 [#1] PREEMPT SMP                                      
[  302.078732] LTT NESTING LEVEL : 0                                            
[  302.078732] last sysfs file: /sys/block/sda/stat                             
[  302.078732] Dumping ftrace buffer:                                           
[  302.078732]    (ftrace buffer empty)                                         
[  302.078732] CPU 0                                                            
[  302.078732] Modules linked in: e1000e loop ltt_tracer ltt_trace_control ltt_e
[  302.078732] Pid: 3748, comm: cron Not tainted 2.6.28 #53                     
[  302.078732] RIP: 0010:[<ffffffff8040a1e5>]  [<ffffffff8040a1e5>] cfq_remove_0
[  302.078732] RSP: 0018:ffff8804388a38a8  EFLAGS: 00010087                     
[  302.078732] RAX: 0000000000200200 RBX: ffff880437d92000 RCX: 000000002bcde392
[  302.078732] RDX: 0000000000100100 RSI: ffff880437d92fd0 RDI: ffff880437d92fd0
[  302.078732] RBP: ffff8804388a38d8 R08: ffff88043e8ce608 R09: 000000002bcdb78a
[  302.078732] R10: 000000002bcdbb8a R11: 0000000000000808 R12: ffff88043e8ce5d8
[  302.078732] R13: ffff880437d92fd0 R14: ffff88043e433800 R15: ffff88043e8ce5d8
[  302.078732] FS:  00007fd9637ea780(0000) GS:ffffffff808de7c0(0000) knlGS:00000
[  302.078732] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b                
[  302.078732] CR2: 0000000000100108 CR3: 000000043ad52000 CR4: 00000000000006e0
[  302.078732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  302.078732] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  302.078732] Process cron (pid: 3748, threadinfo ffff8804388a2000, task ffff8)
[  302.078732] Stack:                                                           
[  302.078732]  ffff88043e8ce5e8 ffff880437d92fd0 ffff88043e8ce5d8 ffff88043d550
[  302.078732]  ffff88043e433800 ffff88043e433800 ffff8804388a3908 ffffffff8040d
[  302.078732]  ffff88043e8ce5d8 ffff88043e433800 ffff880437d92fd0 ffff88043e8c8
[  302.078732] Call Trace:                                                      
[  302.078732]  [<ffffffff8040a3bd>] cfq_dispatch_insert+0x3d/0x70              
[  302.078732]  [<ffffffff8040a43c>] cfq_add_rq_rb+0x4c/0xb0                    
[  302.078732]  [<ffffffff8040ab6f>] cfq_insert_request+0x24f/0x420             
[  302.078732]  [<ffffffff803fac30>] elv_insert+0x160/0x2f0                     
[  302.078732]  [<ffffffff803fae3b>] __elv_add_request+0x7b/0xd0                
[  302.078732]  [<ffffffff803fe02d>] __make_request+0xfd/0x4f0                  
[  302.078732]  [<ffffffff803fc39c>] generic_make_request+0x40c/0x550           
[  302.078732]  [<ffffffff8029ccab>] ? mempool_alloc+0x5b/0x150                 
[  302.078732]  [<ffffffff802f54c8>] ? __find_get_block+0xc8/0x210              
[  302.078732]  [<ffffffff803fc582>] submit_bio+0xa2/0x150                      
[  302.078732]  [<ffffffff802fa75e>] ? bio_alloc_bioset+0x5e/0x100              
[  302.078732]  [<ffffffff802f4d26>] submit_bh+0xf6/0x130                       
[  302.078732]  [<ffffffff8032fbc4>] __ext3_get_inode_loc+0x224/0x340           
[  302.078732]  [<ffffffff8032fd40>] ext3_iget+0x60/0x420                       
[  302.078732]  [<ffffffff80336e68>] ext3_lookup+0xa8/0x100                     
[  302.078732]  [<ffffffff802e3d46>] ? d_alloc+0x186/0x1f0                      
[  302.078732]  [<ffffffff802d92a6>] do_lookup+0x206/0x260                      
[  302.078732]  [<ffffffff802db4f6>] __link_path_walk+0x756/0xfe0               
[  302.078732]  [<ffffffff80262cd4>] ? get_lock_stats+0x34/0x70                 
[  302.078732]  [<ffffffff802dc16b>] ? do_path_lookup+0x9b/0x200                
[  302.078732]  [<ffffffff802dbf9e>] path_walk+0x6e/0xe0                        
[  302.078732]  [<ffffffff802dc176>] do_path_lookup+0xa6/0x200                  
[  302.078732]  [<ffffffff802dad36>] ? getname+0x1c6/0x230                      
[  302.078732]  [<ffffffff802dd02b>] user_path_at+0x7b/0xb0                     
[  302.078732]  [<ffffffff8067d3a7>] ? _spin_unlock_irqrestore+0x47/0x80        
[  302.078732]  [<ffffffff80259ad3>] ? hrtimer_try_to_cancel+0x53/0xb0          
[  302.078732]  [<ffffffff80259b52>] ? hrtimer_cancel+0x22/0x30                 
[  302.078732]  [<ffffffff802d414d>] vfs_stat_fd+0x2d/0x60                      
[  302.078732]  [<ffffffff802d422c>] sys_newstat+0x2c/0x50                      
[  302.078732]  [<ffffffff80265901>] ? trace_hardirqs_on_caller+0x1b1/0x210     
[  302.078732]  [<ffffffff8067cd0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f        
[  302.078732]  [<ffffffff8020c5db>] system_call_fastpath+0x16/0x1b             
[  302.078732] Code: 41 54 53 48 83 ec 08 0f 1f 44 00 00 4c 8b bf c0 00 00 00 4 
[  302.078732] RIP  [<ffffffff8040a1e5>] cfq_remove_request+0x35/0x1d0          
[  302.078732]  RSP <ffff8804388a38a8>                                          
[  302.078732] CR2: 0000000000100108                                            
[  302.078732] ---[ end trace 925e67a354a83fdc ]---                             
[  302.078732] note: cron[3748] exited with preempt_count 1                    



> > 
> > Here is the test done on raid1 :
> > queue_depth=31 (default)
> > /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default)
> > /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default)
> > 
> > I/O scheduler        runt-min (msec)   runt-max (msec)
> > cfq (default, raid1)       523            28216
> > cfq (patched, raid1)       540            16454
> > 
> > With nearly same order of magnitude worse-case.
> 
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index e8525fa..30714de 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -1765,6 +1765,36 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>  }
>  
>  /*
> + * Pull dispatched requests from 'cfqq' back into the scheduler
> + */
> +static void cfq_pull_dispatched_requests(struct cfq_data *cfqd,
> +					 struct cfq_queue *cfqq)
> +{
> +	struct request_queue *q = cfqd->queue;
> +	struct request *rq;
> +
> +	list_for_each_entry_reverse(rq, &q->queue_head, queuelist) {
> +		if (rq->cmd_flags & REQ_STARTED)
> +			break;
> +
> +		if (RQ_CFQQ(rq) != cfqq)
> +			continue;
> +
> +		/*
> +		 * Pull off the dispatch list and put it back into the cfqq
> +		 */
> +		list_del(&rq->queuelist);
> +		cfqq->dispatched--;
> +		if (cfq_cfqq_sync(cfqq))
> +			cfqd->sync_flight--;
> +
> +		cfq_add_rq_rb(rq);
> +		q->nr_sorted++;
> +		list_add_tail(&rq->queuelist, &cfqq->fifo);
> +	}
> +}
> +
> +/*
>   * Check if new_cfqq should preempt the currently active queue. Return 0 for
>   * no or if we aren't sure, a 1 will cause a preempt.
>   */
> @@ -1820,8 +1850,14 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
>   */
>  static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>  {
> +	struct cfq_queue *old_cfqq = cfqd->active_queue;
> +
>  	cfq_log_cfqq(cfqd, cfqq, "preempt");
> -	cfq_slice_expired(cfqd, 1);
> +
> +	if (old_cfqq) {
> +		__cfq_slice_expired(cfqd, old_cfqq, 1);
> +		cfq_pull_dispatched_requests(cfqd, old_cfqq);
> +	}
>  
>  	/*
>  	 * Put the new queue at the front of the of the current list,
> 
> -- 
> Jens Axboe
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20  7:37           ` Jens Axboe
  2009-01-20 12:28             ` Jens Axboe
  2009-01-20 13:45             ` [ltt-dev] " Mathieu Desnoyers
@ 2009-01-20 20:22             ` Ben Gamari
  2009-01-20 22:23               ` Ben Gamari
  2009-01-22  2:35               ` Ben Gamari
  2 siblings, 2 replies; 39+ messages in thread
From: Ben Gamari @ 2009-01-20 20:22 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Mathieu Desnoyers, Andrea Arcangeli, akpm, Ingo Molnar,
	Linus Torvalds, linux-kernel, ltt-dev

On Tue, Jan 20, 2009 at 2:37 AM, Jens Axboe <jens.axboe@oracle.com> wrote:
> On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
>> * Jens Axboe (jens.axboe@oracle.com) wrote:
>> Yes, ideally I should re-run those directly on the disk partitions.
>
> At least for comparison.
>

I just completed my own set of benchmarks using the fio job file
Mathieu provided. This was on a 2.5 inch 7200 RPM SATA partition
formatted as ext3. As you can see, I tested all of the available
schedulers with both queuing enabled and disabled. I'll test Jens'
patch soon. Would a blktrace of the fio run help? Let me know if
there's any other benchmarking or profiling that could be done.
Thanks,

- Ben


			mint		maxt
==========================================================
queue_depth=31:
anticipatory		35 msec		11036 msec
cfq			37 msec		3350 msec
deadline		36 msec		18144 msec
noop			39 msec		41512 msec

==========================================================
queue_depth=1:
anticipatory		45 msec		9561 msec
cfq			28 msec		3974 msec
deadline		47 msec		16802 msec
noop			35 msec		38173 msec


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20 20:22             ` Ben Gamari
@ 2009-01-20 22:23               ` Ben Gamari
  2009-01-20 23:05                 ` Mathieu Desnoyers
  2009-01-22  2:35               ` Ben Gamari
  1 sibling, 1 reply; 39+ messages in thread
From: Ben Gamari @ 2009-01-20 22:23 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Mathieu Desnoyers, Andrea Arcangeli, akpm, Ingo Molnar,
	Linus Torvalds, linux-kernel, ltt-dev

The kernel build finally finished. Unfortunately, it crashes quickly
after booting with moderate disk IO, bringing down the entire machine.
For this reason, I haven't been able to complete a fio benchmark.
Jens, what do you think about this backtrace?

- Ben


BUG: unable to handle kernel paging request at 0000000008
IP: [<ffffffff811c4b2d>] cfq_remove_request+0xb0/0x1da
PGD b2902067 PUD b292e067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0t
CPU 0
Modules linked in: aes_x86_64 aes_generic i915 drm i2c_algo_bit rfcomm bridge s]
Pid: 3903, comm: evolution Not tainted 2.6.29-rc2ben #16
RIP: 0010:[<ffffffff811c4b2d>]  [<ffffffff811c4b2d>] cfq_remove_request+0xb0/0xa
RSP: 0018:ffff8800bb853758  EFLAGS: 00010006
RAX: 0000000000200200 RBX: ffff8800b28f3420 RCX: 0000000009deabeb
RDX: 0000000000100100 RSI: ffff8800b010afd0 RDI: ffff8800b010afd0
RBP: ffff8800bb853788 R08: ffff88011fc08250 R09: 000000000cf8b20b
R10: 0000000009e15923 R11: ffff8800b28f3420 R12: ffff8800b010afd0
R13: ffff8800b010afd0 R14: ffff88011d4e8000 R15: ffff88011fc08220
FS:  00007f4b1ef407e0(0000) GS:ffffffff817e7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000100108 CR3: 00000000b284b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process evolution (pid: 3903, threadinfo ffff8800bb852000, task ffff8800da0c2de)
Stack:
 ffffffff811ccc19 ffff88011fc08220 ffff8800b010afd0 ffff88011d572000
 ffff88011d4e8000 ffff88011d572000 ffff8800bb8537b8 ffffffff811c4ca8
 ffff88011fc08220 ffff88011d572000 ffff8800b010afd0 ffff88011fc08250
Call Trace:
 [<ffffffff811ccc19>] ? rb_insert_color+0xbd/0xe6
 [<ffffffff811c4ca8>] cfq_dispatch_insert+0x51/0x72
 [<ffffffff811c4d0d>] cfq_add_rq_rb+0x44/0xcf
 [<ffffffff811c5519>] cfq_insert_request+0x34d/0x3d1
 [<ffffffff811b6d81>] elv_insert+0x1a9/0x250
 [<ffffffff811b6ec3>] __elv_add_request+0x9b/0xa4
 [<ffffffff811b9769>] __make_request+0x3c4/0x446
 [<ffffffff811b7f53>] generic_make_request+0x2bf/0x309
 [<ffffffff811b8068>] submit_bio+0xcb/0xd4
 [<ffffffff810f170b>] submit_bh+0x115/0x138
 [<ffffffff810f31f7>] ll_rw_block+0xa5/0xf4
 [<ffffffff810f3886>] __block_prepare_write+0x277/0x306
 [<ffffffff8112c759>] ? ext3_get_block+0x0/0x101
 [<ffffffff810f3a7e>] block_write_begin+0x8b/0xdd
 [<ffffffff8112bd66>] ext3_write_begin+0xee/0x1c0
 [<ffffffff8112c759>] ? ext3_get_block+0x0/0x101
 [<ffffffff8109f3be>] generic_file_buffered_write+0x12e/0x2e4
 [<ffffffff8109f973>] __generic_file_aio_write_nolock+0x263/0x297
 [<ffffffff810e4470>] ? touch_atime+0xdf/0x101
 [<ffffffff8109feaa>] ? generic_file_aio_read+0x503/0x59c
 [<ffffffff810a01ed>] generic_file_aio_write+0x6c/0xc8
 [<ffffffff81128c72>] ext3_file_write+0x23/0xa5
 [<ffffffff810d2d77>] do_sync_write+0xec/0x132
 [<ffffffff8105da1c>] ? autoremove_wake_function+0x0/0x3d
 [<ffffffff8119c880>] ? selinux_file_permission+0x40/0xcb
 [<ffffffff8119c902>] ? selinux_file_permission+0xc2/0xcb
 [<ffffffff81194cc4>] ? security_file_permission+0x16/0x18
 [<ffffffff810d3693>] vfs_write+0xb0/0x10a
 [<ffffffff810d37bb>] sys_write+0x4c/0x74
 [<ffffffff810114aa>] system_call_fastpath+0x16/0x1b
Code: 48 85 c0 74 0c 4c 39 e0 48 8d b0 60 ff ff ff 75 02 31 f6 48 8b 7d d0 48 8
RIP  [<ffffffff811c4b2d>] cfq_remove_request+0xb0/0x1da
 RSP <ffff8800bb853758>
CR2: 0000000000100108
---[ end trace 6c5ef63f7957c4cf ]---




On Tue, Jan 20, 2009 at 3:22 PM, Ben Gamari <bgamari@gmail.com> wrote:
> On Tue, Jan 20, 2009 at 2:37 AM, Jens Axboe <jens.axboe@oracle.com> wrote:
>> On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
>>> * Jens Axboe (jens.axboe@oracle.com) wrote:
>>> Yes, ideally I should re-run those directly on the disk partitions.
>>
>> At least for comparison.
>>
>
> I just completed my own set of benchmarks using the fio job file
> Mathieu provided. This was on a 2.5 inch 7200 RPM SATA partition
> formatted as ext3. As you can see, I tested all of the available
> schedulers with both queuing enabled and disabled. I'll test the Jens'
> patch soon. Would a blktrace of the fio run help? Let me know if
> there's any other benchmarking or profiling that could be done.
> Thanks,
>
> - Ben
>
>
>                        mint            maxt
> ==========================================================
> queue_depth=31:
> anticipatory            35 msec         11036 msec
> cfq                     37 msec         3350 msec
> deadline                36 msec         18144 msec
> noop                    39 msec         41512 msec
>
> ==========================================================
> queue_depth=1:
> anticipatory            45 msec         9561 msec
> cfq                     28 msec         3974 msec
> deadline                47 msec         16802 msec
> noop                    35 msec         38173 msec
>


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20 22:23               ` Ben Gamari
@ 2009-01-20 23:05                 ` Mathieu Desnoyers
  0 siblings, 0 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-20 23:05 UTC (permalink / raw)
  To: Ben Gamari
  Cc: Jens Axboe, Andrea Arcangeli, akpm, Ingo Molnar, Linus Torvalds,
	linux-kernel, ltt-dev

* Ben Gamari (bgamari@gmail.com) wrote:
> The kernel build finally finished. Unfortunately, it crashes quickly
> after booting with moderate disk IO, bringing down the entire machine.
> For this reason, I haven't been able to complete a fio benchmark.
> Jens, what do you think about this backtrace?
> 


Hi Ben,

Try this new patch I just made; it solves the problem for me. Jens'
version does a list_del() inside a list iteration that is not safe
against removal.
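
To illustrate the failure mode, here is a minimal userspace sketch
(made-up struct node / unlink_node() helpers, not the kernel
<linux/list.h> API): a plain reverse walk reads n->prev after the
current entry has already been unlinked and poisoned, which seems
consistent with the 0x00100100/0x00200200 poison-looking values in the
oops above, while the "_safe" variant simply remembers the previous
element before deleting.

#include <stdio.h>
#include <stdlib.h>

struct node {
	int nr;
	struct node *prev, *next;	/* circular list with a dummy head */
};

static void unlink_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;	/* "poison", like list_del() */
}

int main(void)
{
	struct node head = { .nr = -1, .prev = &head, .next = &head };
	struct node *n, *prev;
	int i;

	for (i = 0; i < 4; i++) {
		struct node *new = malloc(sizeof(*new));

		new->nr = i;
		new->prev = head.prev;	/* add at tail */
		new->next = &head;
		head.prev->next = new;
		head.prev = new;
	}

	/*
	 * UNSAFE pattern (what the first patch effectively did):
	 *
	 *	for (n = head.prev; n != &head; n = n->prev)
	 *		unlink_node(n);	// next step reads the poisoned n->prev
	 *
	 * SAFE pattern: remember the previous element before unlinking,
	 * which is what list_for_each_entry_safe_reverse() does with its
	 * extra cursor variable.
	 */
	for (n = head.prev, prev = n->prev; n != &head;
	     n = prev, prev = n->prev) {
		printf("removing %d\n", n->nr);
		unlink_node(n);
		free(n);
	}
	return 0;
}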

Mathieu

Fixes cfq iosched test patch

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 block/cfq-iosched.c |   38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/block/cfq-iosched.c
===================================================================
--- linux-2.6-lttng.orig/block/cfq-iosched.c	2009-01-20 10:31:46.000000000 -0500
+++ linux-2.6-lttng/block/cfq-iosched.c	2009-01-20 17:41:06.000000000 -0500
@@ -1761,6 +1761,36 @@ cfq_update_idle_window(struct cfq_data *
 }
 
 /*
+ * Pull dispatched requests from 'cfqq' back into the scheduler
+ */
+static void cfq_pull_dispatched_requests(struct cfq_data *cfqd,
+					 struct cfq_queue *cfqq)
+{
+	struct request_queue *q = cfqd->queue;
+	struct request *rq, *tmp;
+
+	list_for_each_entry_safe_reverse(rq, tmp, &q->queue_head, queuelist) {
+		if (rq->cmd_flags & REQ_STARTED)
+			break;
+
+		if (RQ_CFQQ(rq) != cfqq)
+			continue;
+
+		/*
+		 * Pull off the dispatch list and put it back into the cfqq
+		 */
+		list_del(&rq->queuelist);
+		cfqq->dispatched--;
+		if (cfq_cfqq_sync(cfqq))
+			cfqd->sync_flight--;
+
+		cfq_add_rq_rb(rq);
+		q->nr_sorted++;
+		list_add_tail(&rq->queuelist, &cfqq->fifo);
+	}
+}
+
+/*
  * Check if new_cfqq should preempt the currently active queue. Return 0 for
  * no or if we aren't sure, a 1 will cause a preempt.
  */
@@ -1816,8 +1846,14 @@ cfq_should_preempt(struct cfq_data *cfqd
  */
 static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
+	struct cfq_queue *old_cfqq = cfqd->active_queue;
+
 	cfq_log_cfqq(cfqd, cfqq, "preempt");
-	cfq_slice_expired(cfqd, 1);
+
+	if (old_cfqq) {
+		__cfq_slice_expired(cfqd, old_cfqq, 1);
+		cfq_pull_dispatched_requests(cfqd, old_cfqq);
+	}
 
 	/*
 	 * Put the new queue at the front of the of the current list,

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20 15:42                   ` Mathieu Desnoyers
@ 2009-01-20 23:06                     ` Mathieu Desnoyers
  0 siblings, 0 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-20 23:06 UTC (permalink / raw)
  To: Jens Axboe; +Cc: akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel

* Mathieu Desnoyers (compudj@krystal.dyndns.org) wrote:
> * Jens Axboe (jens.axboe@oracle.com) wrote:
> > On Tue, Jan 20 2009, Mathieu Desnoyers wrote:
> > > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > > On Tue, Jan 20 2009, Jens Axboe wrote:
> > > > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> > > > > > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > > > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > > > > > > > to create a fio job file.  The said behavior is appended below as "Part
> > > > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > > > > > > > with the anticipatory I/O scheduler, which was active by default on my
> > > > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > > > > > > > running this on a raid-1, but have experienced the same problem on a
> > > > > > > > standard partition I created on the same machine.
> > > > > > > > 
> > > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > > > > > > > consists of one dd-like job and many small jobs reading as many data as
> > > > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch
> > > > > > > > test").
> > > > > > > > 
> > > > > > > > The results for the ls-like jobs are interesting :
> > > > > > > > 
> > > > > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > > > > noop                       41             10563
> > > > > > > > anticipatory               63              8185
> > > > > > > > deadline                   52             33387
> > > > > > > > cfq                        43              1420
> > > > > > > 
> > > > > > 
> > > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did
> > > > > > not make much difference (also tried with NO_HZ enabled).
> > > > > > 
> > > > > > > Do you have queuing enabled on your drives? You can check that in
> > > > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> > > > > > > schedulers, would be good for comparison.
> > > > > > > 
> > > > > > 
> > > > > > Here are the tests with a queue_depth of 1 :
> > > > > > 
> > > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > > noop                       43             38235
> > > > > > anticipatory               44              8728
> > > > > > deadline                   51             19751
> > > > > > cfq                        48               427
> > > > > > 
> > > > > > 
> > > > > > Overall, I wouldn't say it makes much difference.
> > > > > 
> > > > > 0,5 seconds vs 1,5 seconds isn't much of a difference?
> > > > > 
> > > > > > > raid personalities or dm complicates matters, since it introduces a
> > > > > > > disconnect between 'ls' and the io scheduler at the bottom...
> > > > > > > 
> > > > > > 
> > > > > > Yes, ideally I should re-run those directly on the disk partitions.
> > > > > 
> > > > > At least for comparison.
> > > > > 
> > > > > > I am also tempted to create a fio job file which acts like a ssh server
> > > > > > receiving a connexion after it has been pruned from the cache while the
> > > > > > system if doing heavy I/O. "ssh", in this case, seems to be doing much
> > > > > > more I/O than a simple "ls", and I think we might want to see if cfq
> > > > > > behaves correctly in such case. Most of this I/O is coming from page
> > > > > > faults (identified as traps in the trace) probably because the ssh
> > > > > > executable has been thrown out of the cache by
> > > > > > 
> > > > > > echo 3 > /proc/sys/vm/drop_caches
> > > > > > 
> > > > > > The behavior of an incoming ssh connexion after clearing the cache is
> > > > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The
> > > > > > job file created (Part 2) reads, for each job, a 2MB file with random
> > > > > > reads each between 4k-44k. The results are very interesting for cfq :
> > > > > > 
> > > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > > noop                       586           110242
> > > > > > anticipatory               531            26942
> > > > > > deadline                   561           108772
> > > > > > cfq                        523            28216
> > > > > > 
> > > > > > So, basically, ssh being out of the cache can take 28s to answer an
> > > > > > incoming ssh connexion even with the cfq scheduler. This is not exactly
> > > > > > what I would call an acceptable latency.
> > > > > 
> > > > > At some point, you have to stop and consider what is acceptable
> > > > > performance for a given IO pattern. Your ssh test case is purely random
> > > > > IO, and neither CFQ nor AS would do any idling for that. We can make
> > > > > this test case faster for sure, the hard part is making sure that we
> > > > > don't regress on async throughput at the same time.
> > > > > 
> > > > > Also remember that with your raid1, it's not entirely reasonable to
> > > > > blaim all performance issues on the IO scheduler as per my previous
> > > > > mail. It would be a lot more fair to view the disk numbers individually.
> > > > > 
> > > > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set
> > > > > to 1 as well?
> > > > > 
> > > > > However, I think we should be doing somewhat better at this test case.
> > > > 
> > > > Mathieu, does this improve anything for you?
> > > > 
> > > 
> > > I got this message when running with your patch applied :
> > > cfq: forced dispatching is broken (nr_sorted=4294967275), please report this
> > > (message appeared 10 times in a job run)
> > 
> > Woops, missed a sort inc. Updated version below, or just ignore the
> > warning.
> > 
> > > Here is the result :
> > > 
> > > ssh test done on /dev/sda directly
> > > 
> > > queue_depth=31 (default)
> > > /sys/block/sda/queue/iosched/slice_async_rq = 2 (default)
> > > /sys/block/sda/queue/iosched/quantum = 4 (default)
> > > 
> > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > cfq (default)             523              6637
> > > cfq (patched)             564              7195
> > > 
> > > Pretty much the same.
> > 
> > Can you retry with depth=1 as well? There's not much to rip back out, if
> > everything is immediately sent to the device.
> > 
> 
> echo 1 > /sys/block/sda/queue/iosched/quantum 
> echo 1 > /sys/block/sda/queue/iosched/slice_async_rq
> echo 1 > /sys/block/sda/device/queue_depth
> 
> ssh test done on /dev/sda directly
> 
> oops, something wrong in the new patch ?
> 

[...]

Don't waste time looking into this; here is the fixed version (the
list_del was being done in an iteration that was not removal-safe).

Mathieu


Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 block/cfq-iosched.c |   38 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 37 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/block/cfq-iosched.c
===================================================================
--- linux-2.6-lttng.orig/block/cfq-iosched.c	2009-01-20 10:31:46.000000000 -0500
+++ linux-2.6-lttng/block/cfq-iosched.c	2009-01-20 17:41:06.000000000 -0500
@@ -1761,6 +1761,36 @@ cfq_update_idle_window(struct cfq_data *
 }
 
 /*
+ * Pull dispatched requests from 'cfqq' back into the scheduler
+ */
+static void cfq_pull_dispatched_requests(struct cfq_data *cfqd,
+					 struct cfq_queue *cfqq)
+{
+	struct request_queue *q = cfqd->queue;
+	struct request *rq, *tmp;
+
+	list_for_each_entry_safe_reverse(rq, tmp, &q->queue_head, queuelist) {
+		if (rq->cmd_flags & REQ_STARTED)
+			break;
+
+		if (RQ_CFQQ(rq) != cfqq)
+			continue;
+
+		/*
+		 * Pull off the dispatch list and put it back into the cfqq
+		 */
+		list_del(&rq->queuelist);
+		cfqq->dispatched--;
+		if (cfq_cfqq_sync(cfqq))
+			cfqd->sync_flight--;
+
+		cfq_add_rq_rb(rq);
+		q->nr_sorted++;
+		list_add_tail(&rq->queuelist, &cfqq->fifo);
+	}
+}
+
+/*
  * Check if new_cfqq should preempt the currently active queue. Return 0 for
  * no or if we aren't sure, a 1 will cause a preempt.
  */
@@ -1816,8 +1846,14 @@ cfq_should_preempt(struct cfq_data *cfqd
  */
 static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
+	struct cfq_queue *old_cfqq = cfqd->active_queue;
+
 	cfq_log_cfqq(cfqd, cfqq, "preempt");
-	cfq_slice_expired(cfqd, 1);
+
+	if (old_cfqq) {
+		__cfq_slice_expired(cfqd, old_cfqq, 1);
+		cfq_pull_dispatched_requests(cfqd, old_cfqq);
+	}
 
 	/*
 	 * Put the new queue at the front of the of the current list,

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20 12:28             ` Jens Axboe
  2009-01-20 14:22               ` [ltt-dev] " Mathieu Desnoyers
@ 2009-01-20 23:27               ` Mathieu Desnoyers
  2009-01-21  0:25                 ` Mathieu Desnoyers
  2009-01-23  3:21                 ` [ltt-dev] " KOSAKI Motohiro
  2009-02-02  2:08               ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers
  2 siblings, 2 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-20 23:27 UTC (permalink / raw)
  To: Jens Axboe; +Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev

* Jens Axboe (jens.axboe@oracle.com) wrote:
> On Tue, Jan 20 2009, Jens Axboe wrote:
> > On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> > > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > > > > to create a fio job file.  The said behavior is appended below as "Part
> > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > > > > with the anticipatory I/O scheduler, which was active by default on my
> > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > > > > running this on a raid-1, but have experienced the same problem on a
> > > > > standard partition I created on the same machine.
> > > > > 
> > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > > > > consists of one dd-like job and many small jobs reading as many data as
> > > > > ls did. I used the small test script to batch run this ("Part 3 - batch
> > > > > test").
> > > > > 
> > > > > The results for the ls-like jobs are interesting :
> > > > > 
> > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > noop                       41             10563
> > > > > anticipatory               63              8185
> > > > > deadline                   52             33387
> > > > > cfq                        43              1420
> > > > 
> > > 
> > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did
> > > not make much difference (also tried with NO_HZ enabled).
> > > 
> > > > Do you have queuing enabled on your drives? You can check that in
> > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> > > > schedulers, would be good for comparison.
> > > > 
> > > 
> > > Here are the tests with a queue_depth of 1 :
> > > 
> > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > noop                       43             38235
> > > anticipatory               44              8728
> > > deadline                   51             19751
> > > cfq                        48               427
> > > 
> > > 
> > > Overall, I wouldn't say it makes much difference.
> > 
> > 0,5 seconds vs 1,5 seconds isn't much of a difference?
> > 
> > > > raid personalities or dm complicates matters, since it introduces a
> > > > disconnect between 'ls' and the io scheduler at the bottom...
> > > > 
> > > 
> > > Yes, ideally I should re-run those directly on the disk partitions.
> > 
> > At least for comparison.
> > 
> > > I am also tempted to create a fio job file which acts like a ssh server
> > > receiving a connexion after it has been pruned from the cache while the
> > > system if doing heavy I/O. "ssh", in this case, seems to be doing much
> > > more I/O than a simple "ls", and I think we might want to see if cfq
> > > behaves correctly in such case. Most of this I/O is coming from page
> > > faults (identified as traps in the trace) probably because the ssh
> > > executable has been thrown out of the cache by
> > > 
> > > echo 3 > /proc/sys/vm/drop_caches
> > > 
> > > The behavior of an incoming ssh connexion after clearing the cache is
> > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The
> > > job file created (Part 2) reads, for each job, a 2MB file with random
> > > reads each between 4k-44k. The results are very interesting for cfq :
> > > 
> > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > noop                       586           110242
> > > anticipatory               531            26942
> > > deadline                   561           108772
> > > cfq                        523            28216
> > > 
> > > So, basically, ssh being out of the cache can take 28s to answer an
> > > incoming ssh connexion even with the cfq scheduler. This is not exactly
> > > what I would call an acceptable latency.
> > 
> > At some point, you have to stop and consider what is acceptable
> > performance for a given IO pattern. Your ssh test case is purely random
> > IO, and neither CFQ nor AS would do any idling for that. We can make
> > this test case faster for sure, the hard part is making sure that we
> > don't regress on async throughput at the same time.
> > 
> > Also remember that with your raid1, it's not entirely reasonable to
> > blaim all performance issues on the IO scheduler as per my previous
> > mail. It would be a lot more fair to view the disk numbers individually.
> > 
> > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set
> > to 1 as well?
> > 
> > However, I think we should be doing somewhat better at this test case.
> 
> Mathieu, does this improve anything for you?
> 

So, I ran the tests with my corrected patch, and the results are very
good!

"incoming ssh connexion" test

"config 2.6.28 cfq"
Linux 2.6.28
/sys/block/sd{a,b}/device/queue_depth = 31 (default)
/sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default)
/sys/block/sd{a,b}/queue/iosched/quantum = 4 (default)

"config 2.6.28.1-patch1"
Linux 2.6.28.1
Corrected cfq patch applied
echo 1 > /sys/block/sd{a,b}/device/queue_depth
echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq
echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum

On /dev/sda :

I/O scheduler        runt-min (msec)   runt-max (msec)
cfq (2.6.28 cfq)          523             6637
cfq (2.6.28.1-patch1)     579             2082

On raid1 :

I/O scheduler        runt-min (msec)   runt-max (msec)
cfq (2.6.28 cfq)           523            28216
cfq (2.6.28.1-patch1)      517             3086

It looks like we are getting somewhere :) Are there any specific
queue_depth, slice_async_rq or quantum variations you would like me to
test?

For reference, I attach my ssh-like job file (again) to this mail.

Mathieu


[job1]
rw=write
size=10240m
direct=0
blocksize=1024k

[global]
rw=randread
size=2048k
filesize=30m
direct=0
bsrange=4k-44k

[file1]
startdelay=0

[file2]
startdelay=4

[file3]
startdelay=8

[file4]
startdelay=12

[file5]
startdelay=16

[file6]
startdelay=20

[file7]
startdelay=24

[file8]
startdelay=28

[file9]
startdelay=32

[file10]
startdelay=36

[file11]
startdelay=40

[file12]
startdelay=44

[file13]
startdelay=48

[file14]
startdelay=52

[file15]
startdelay=56

[file16]
startdelay=60

[file17]
startdelay=64

[file18]
startdelay=68

[file19]
startdelay=72

[file20]
startdelay=76

[file21]
startdelay=80

[file22]
startdelay=84

[file23]
startdelay=88

[file24]
startdelay=92

[file25]
startdelay=96

[file26]
startdelay=100

[file27]
startdelay=104

[file28]
startdelay=108

[file29]
startdelay=112

[file30]
startdelay=116

[file31]
startdelay=120

[file32]
startdelay=124

[file33]
startdelay=128

[file34]
startdelay=132

[file35]
startdelay=134

[file36]
startdelay=138

[file37]
startdelay=142

[file38]
startdelay=146

[file39]
startdelay=150

[file40]
startdelay=200

[file41]
startdelay=260

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20 23:27               ` Mathieu Desnoyers
@ 2009-01-21  0:25                 ` Mathieu Desnoyers
  2009-01-21  4:38                   ` Ben Gamari
  2009-01-22 22:59                   ` Mathieu Desnoyers
  2009-01-23  3:21                 ` [ltt-dev] " KOSAKI Motohiro
  1 sibling, 2 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-21  0:25 UTC (permalink / raw)
  To: Jens Axboe; +Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * Jens Axboe (jens.axboe@oracle.com) wrote:
> > On Tue, Jan 20 2009, Jens Axboe wrote:
> > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> > > > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > > > > > to create a fio job file.  The said behavior is appended below as "Part
> > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > > > > > with the anticipatory I/O scheduler, which was active by default on my
> > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > > > > > running this on a raid-1, but have experienced the same problem on a
> > > > > > standard partition I created on the same machine.
> > > > > > 
> > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > > > > > consists of one dd-like job and many small jobs reading as many data as
> > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch
> > > > > > test").
> > > > > > 
> > > > > > The results for the ls-like jobs are interesting :
> > > > > > 
> > > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > > noop                       41             10563
> > > > > > anticipatory               63              8185
> > > > > > deadline                   52             33387
> > > > > > cfq                        43              1420
> > > > > 
> > > > 
> > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did
> > > > not make much difference (also tried with NO_HZ enabled).
> > > > 
> > > > > Do you have queuing enabled on your drives? You can check that in
> > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> > > > > schedulers, would be good for comparison.
> > > > > 
> > > > 
> > > > Here are the tests with a queue_depth of 1 :
> > > > 
> > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > noop                       43             38235
> > > > anticipatory               44              8728
> > > > deadline                   51             19751
> > > > cfq                        48               427
> > > > 
> > > > 
> > > > Overall, I wouldn't say it makes much difference.
> > > 
> > > 0,5 seconds vs 1,5 seconds isn't much of a difference?
> > > 
> > > > > raid personalities or dm complicates matters, since it introduces a
> > > > > disconnect between 'ls' and the io scheduler at the bottom...
> > > > > 
> > > > 
> > > > Yes, ideally I should re-run those directly on the disk partitions.
> > > 
> > > At least for comparison.
> > > 
> > > > I am also tempted to create a fio job file which acts like a ssh server
> > > > receiving a connexion after it has been pruned from the cache while the
> > > > system if doing heavy I/O. "ssh", in this case, seems to be doing much
> > > > more I/O than a simple "ls", and I think we might want to see if cfq
> > > > behaves correctly in such case. Most of this I/O is coming from page
> > > > faults (identified as traps in the trace) probably because the ssh
> > > > executable has been thrown out of the cache by
> > > > 
> > > > echo 3 > /proc/sys/vm/drop_caches
> > > > 
> > > > The behavior of an incoming ssh connexion after clearing the cache is
> > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The
> > > > job file created (Part 2) reads, for each job, a 2MB file with random
> > > > reads each between 4k-44k. The results are very interesting for cfq :
> > > > 
> > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > noop                       586           110242
> > > > anticipatory               531            26942
> > > > deadline                   561           108772
> > > > cfq                        523            28216
> > > > 
> > > > So, basically, ssh being out of the cache can take 28s to answer an
> > > > incoming ssh connexion even with the cfq scheduler. This is not exactly
> > > > what I would call an acceptable latency.
> > > 
> > > At some point, you have to stop and consider what is acceptable
> > > performance for a given IO pattern. Your ssh test case is purely random
> > > IO, and neither CFQ nor AS would do any idling for that. We can make
> > > this test case faster for sure, the hard part is making sure that we
> > > don't regress on async throughput at the same time.
> > > 
> > > Also remember that with your raid1, it's not entirely reasonable to
> > > blaim all performance issues on the IO scheduler as per my previous
> > > mail. It would be a lot more fair to view the disk numbers individually.
> > > 
> > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set
> > > to 1 as well?
> > > 
> > > However, I think we should be doing somewhat better at this test case.
> > 
> > Mathieu, does this improve anything for you?
> > 
> 
> So, I ran the tests with my corrected patch, and the results are very
> good !
> 
> "incoming ssh connexion" test
> 
> "config 2.6.28 cfq"
> Linux 2.6.28
> /sys/block/sd{a,b}/device/queue_depth = 31 (default)
> /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default)
> /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default)
> 
> "config 2.6.28.1-patch1"
> Linux 2.6.28.1
> Corrected cfq patch applied
> echo 1 > /sys/block/sd{a,b}/device/queue_depth
> echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq
> echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum
> 
> On /dev/sda :
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> cfq (2.6.28 cfq)          523             6637
> cfq (2.6.28.1-patch1)     579             2082
> 
> On raid1 :
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> cfq (2.6.28 cfq)           523            28216

As a side note: I'd like to have my results confirmed by others. I just
found out that my two Seagate drives (ST3500320AS) are on the "defect"
list: that model is known to stall for about 30s under workloads such
as "video streaming".
(http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=storage&articleId=9126280&taxonomyId=19&intsrc=kc_top)
(http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931)

Therefore, I would not base any decision on drives with such known-bad
firmware. But the latest results we've got are definitely interesting.

I'll upgrade my firmware as soon as Seagate puts the update back online
so I can re-run more tests.

Mathieu

> cfq (2.6.28.1-patch1)      517             3086
> 
> It looks like we are getting somewhere :) Are there any specific
> queue_depth, slice_async_rq, quantum variations you would like to be
> tested ?
> 
> For reference, I attach my ssh-like job file (again) to this mail.
> 
> Mathieu
> 
> 
> [job1]
> rw=write
> size=10240m
> direct=0
> blocksize=1024k
> 
> [global]
> rw=randread
> size=2048k
> filesize=30m
> direct=0
> bsrange=4k-44k
> 
> [file1]
> startdelay=0
> 
> [file2]
> startdelay=4
> 
> [file3]
> startdelay=8
> 
> [file4]
> startdelay=12
> 
> [file5]
> startdelay=16
> 
> [file6]
> startdelay=20
> 
> [file7]
> startdelay=24
> 
> [file8]
> startdelay=28
> 
> [file9]
> startdelay=32
> 
> [file10]
> startdelay=36
> 
> [file11]
> startdelay=40
> 
> [file12]
> startdelay=44
> 
> [file13]
> startdelay=48
> 
> [file14]
> startdelay=52
> 
> [file15]
> startdelay=56
> 
> [file16]
> startdelay=60
> 
> [file17]
> startdelay=64
> 
> [file18]
> startdelay=68
> 
> [file19]
> startdelay=72
> 
> [file20]
> startdelay=76
> 
> [file21]
> startdelay=80
> 
> [file22]
> startdelay=84
> 
> [file23]
> startdelay=88
> 
> [file24]
> startdelay=92
> 
> [file25]
> startdelay=96
> 
> [file26]
> startdelay=100
> 
> [file27]
> startdelay=104
> 
> [file28]
> startdelay=108
> 
> [file29]
> startdelay=112
> 
> [file30]
> startdelay=116
> 
> [file31]
> startdelay=120
> 
> [file32]
> startdelay=124
> 
> [file33]
> startdelay=128
> 
> [file34]
> startdelay=132
> 
> [file35]
> startdelay=134
> 
> [file36]
> startdelay=138
> 
> [file37]
> startdelay=142
> 
> [file38]
> startdelay=146
> 
> [file39]
> startdelay=150
> 
> [file40]
> startdelay=200
> 
> [file41]
> startdelay=260
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-21  0:25                 ` Mathieu Desnoyers
@ 2009-01-21  4:38                   ` Ben Gamari
  2009-01-21  4:54                     ` [ltt-dev] " Mathieu Desnoyers
  2009-01-22 22:59                   ` Mathieu Desnoyers
  1 sibling, 1 reply; 39+ messages in thread
From: Ben Gamari @ 2009-01-21  4:38 UTC (permalink / raw)
  To: Mathieu Desnoyers, Jens Axboe
  Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev

On Tue, Jan 20, 2009 at 7:25 PM, Mathieu Desnoyers
<mathieu.desnoyers@polymtl.ca> wrote:
> * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
>
> As a side-note : I'd like to have my results confirmed by others.

Well, I think the (fixed) patch did help to some degree (I haven't
done fio benchmarks to compare against yet). Unfortunately, the I/O
wait time problem still remains. I have been waiting 3 minutes now for
evolution to start with 88% I/O wait time yet no visible signs of
progress. I've confirmed I'm using the CFQ scheduler, so that's not
the problem.

Also, Jens, I'd just like to point out that the problem is
reproducible across all schedulers. Does your patch seek to tackle a
problem specific to the CFQ scheduler, leaving the I/O wait issue for
later? Just wondering.

I'll post some benchmarks numbers once I have them. Thanks,

- Ben


* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-21  4:38                   ` Ben Gamari
@ 2009-01-21  4:54                     ` Mathieu Desnoyers
  2009-01-21  6:17                       ` Ben Gamari
  0 siblings, 1 reply; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-21  4:54 UTC (permalink / raw)
  To: Ben Gamari
  Cc: Jens Axboe, akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel

* Ben Gamari (bgamari@gmail.com) wrote:
> On Tue, Jan 20, 2009 at 7:25 PM, Mathieu Desnoyers
> <mathieu.desnoyers@polymtl.ca> wrote:
> > * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> >
> > As a side-note : I'd like to have my results confirmed by others.
> 
> Well, I think the (fixed) patch did help to some degree (I haven't
> done fio benchmarks to compare against yet). Unfortunately, the I/O
> wait time problem still remains. I have been waiting 3 minutes now for
> evolution to start with 88% I/O wait time yet no visible signs of
> progress. I've confirmed I'm using the CFQ scheduler, so that's not
> the problem.
> 

Did you also 

echo 1 > /sys/block/sd{a,b}/device/queue_depth
echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq
echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum

(replacing sd{a,b} with your actual drives)?

These settings seem to have been part of what helped (along with the
patch).

And hopefully you don't have a recent Seagate hard drive like mine? :-)

So your test case is:
- start a large dd with 1M block size
- time evolution

?

Mathieu

> Also, Jens, I'd just like to point out that the problem is
> reproducible across all schedulers. Does your patch seek to tackle a
> problem specific to the CFQ scheduler, leaving the I/O wait issue for
> later? Just wondering.
> 
> I'll post some benchmarks numbers once I have them. Thanks,
> 
> - Ben
> 
> _______________________________________________
> ltt-dev mailing list
> ltt-dev@lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-21  4:54                     ` [ltt-dev] " Mathieu Desnoyers
@ 2009-01-21  6:17                       ` Ben Gamari
  0 siblings, 0 replies; 39+ messages in thread
From: Ben Gamari @ 2009-01-21  6:17 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Jens Axboe, akpm, ltt-dev, Linus Torvalds, Ingo Molnar, linux-kernel

On Tue, 2009-01-20 at 23:54 -0500, Mathieu Desnoyers wrote:
> * Ben Gamari (bgamari@gmail.com) wrote:
> > On Tue, Jan 20, 2009 at 7:25 PM, Mathieu Desnoyers
> > <mathieu.desnoyers@polymtl.ca> wrote:
> > > * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> > >
> > > As a side-note : I'd like to have my results confirmed by others.
> > 
> > Well, I think the (fixed) patch did help to some degree (I haven't
> > done fio benchmarks to compare against yet). Unfortunately, the I/O
> > wait time problem still remains. I have been waiting 3 minutes now for
> > evolution to start with 88% I/O wait time yet no visible signs of
> > progress. I've confirmed I'm using the CFQ scheduler, so that's not
> > the problem.
> > 
> 
> Did you also 
> 
> echo 1 > /sys/block/sd{a,b}/device/queue_depth
I have been using this in some of my measurements (this is recorded, of
course).

> echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq
> echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum
I haven't been doing this, although I will collect a data set with these
parameters set. That will let me compare their effect against the
default configuration.

> 
> (replacing sd{a,b} with your actual drives) ?
> 
> It seems to have been part of the factors that helped (along with the
> patch).
> 
> And hopefully you don't have a recent Seagate hard drive like me ? :-)
Thankfully, no.

> 
> So you test case is :
> - start a large dd with 1M block size
> - time evolution
> 
I've been using evolution to get a rough idea of the performance of the
configurations but not as a benchmark per se. I have some pretty
good-sized maildirs, so launching evolution for the first time can be
quite a task, IO-wise. Also, switching between folders used to be quite
time consuming. It seems like the patch did help a bit on this front
though.

For a quantitative benchmark I've been using the fio job that you posted
earlier. I've been collecting results and should have a pretty good data
set soon.

I'll send out a compilation of all the data I've collected as soon as
I've finished.

- Ben




* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20 20:22             ` Ben Gamari
  2009-01-20 22:23               ` Ben Gamari
@ 2009-01-22  2:35               ` Ben Gamari
  1 sibling, 0 replies; 39+ messages in thread
From: Ben Gamari @ 2009-01-22  2:35 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Mathieu Desnoyers, Andrea Arcangeli, akpm, Ingo Molnar,
	Linus Torvalds, linux-kernel, ltt-dev

I'm not sure how useful this is, but I just completed another set of
benchmarks using Jens' patch and a variety of device parameters; I
figured it might help quantify the differences between those settings.
Let me know if
there's any other benchmarking or testing that I can do. Thanks,

- Ben


			mint		maxt
==========================================================
queue_depth=1, slice_async_rq=1, quantum=1, patched
anticipatory		25 msec		4410 msec
cfq			27 msec		1466 msec
deadline		36 msec		10735 msec
noop			48 msec		37439 msec
==========================================================
queue_depth=1, slice_async_rq=1, quantum=4, patched
anticipatory		38 msec		3579 msec
cfq			35 msec		822 msec
deadline		37 msec		10072 msec
noop			32 msec		45535 msec
==========================================================
queue_depth=1, slice_async_rq=2, quantum=1, patched
anticipatory		33 msec		4480 msec
cfq			28 msec		353 msec
deadline		30 msec		6738 msec
noop			36 msec		39691 msec
==========================================================
queue_depth=1, slice_async_rq=2, quantum=4, patched
anticipatory		40 msec		4498 msec
cfq			35 msec		1395 msec
deadline		41 msec		6877 msec
noop			38 msec		46410 msec
==========================================================
queue_depth=31, slice_async_rq=1, quantum=1, patched
anticipatory		31 msec		6011 msec
cfq			36 msec		4575 msec
deadline		41 msec		18599 msec
noop			38 msec		46347 msec
==========================================================
queue_depth=31, slice_async_rq=2, quantum=1, patched
anticipatory		30 msec		9985 msec
cfq			33 msec		4200 msec
deadline		38 msec		22285 msec
noop			25 msec		40245 msec
==========================================================
queue_depth=31, slice_async_rq=2, quantum=4, patched
anticipatory		30 msec		12197 msec
cfq			30 msec		3457 msec
deadline		35 msec		18969 msec
noop			34 msec		42803 msec



On Tue, 2009-01-20 at 15:22 -0500, Ben Gamari wrote:
> On Tue, Jan 20, 2009 at 2:37 AM, Jens Axboe <jens.axboe@oracle.com> wrote:
> > On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> >> * Jens Axboe (jens.axboe@oracle.com) wrote:
> >> Yes, ideally I should re-run those directly on the disk partitions.
> >
> > At least for comparison.
> >
> 
> I just completed my own set of benchmarks using the fio job file
> Mathieu provided. This was on a 2.5 inch 7200 RPM SATA partition
> formatted as ext3. As you can see, I tested all of the available
> schedulers with both queuing enabled and disabled. I'll test the Jens'
> patch soon. Would a blktrace of the fio run help? Let me know if
> there's any other benchmarking or profiling that could be done.
> Thanks,
> 
> - Ben
> 
> 
> 			mint		maxt
> ==========================================================
> queue_depth=31:
> anticipatory		35 msec		11036 msec
> cfq			37 msec		3350 msec
> deadline		36 msec		18144 msec
> noop			39 msec		41512 msec
> 
> ==========================================================
> queue_depth=1:
> anticipatory		45 msec		9561 msec
> cfq			28 msec		3974 msec
> deadline		47 msec		16802 msec
> noop			35 msec		38173 msec



* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-21  0:25                 ` Mathieu Desnoyers
  2009-01-21  4:38                   ` Ben Gamari
@ 2009-01-22 22:59                   ` Mathieu Desnoyers
  1 sibling, 0 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-22 22:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > On Tue, Jan 20 2009, Jens Axboe wrote:
> > > > On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> > > > > * Jens Axboe (jens.axboe@oracle.com) wrote:
> > > > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > > > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > > > > > > to create a fio job file.  The said behavior is appended below as "Part
> > > > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > > > > > > with the anticipatory I/O scheduler, which was active by default on my
> > > > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > > > > > > running this on a raid-1, but have experienced the same problem on a
> > > > > > > standard partition I created on the same machine.
> > > > > > > 
> > > > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > > > > > > consists of one dd-like job and many small jobs reading as many data as
> > > > > > > ls did. I used the small test script to batch run this ("Part 3 - batch
> > > > > > > test").
> > > > > > > 
> > > > > > > The results for the ls-like jobs are interesting :
> > > > > > > 
> > > > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > > > noop                       41             10563
> > > > > > > anticipatory               63              8185
> > > > > > > deadline                   52             33387
> > > > > > > cfq                        43              1420
> > > > > > 
> > > > > 
> > > > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did
> > > > > not make much difference (also tried with NO_HZ enabled).
> > > > > 
> > > > > > Do you have queuing enabled on your drives? You can check that in
> > > > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> > > > > > schedulers, would be good for comparison.
> > > > > > 
> > > > > 
> > > > > Here are the tests with a queue_depth of 1 :
> > > > > 
> > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > noop                       43             38235
> > > > > anticipatory               44              8728
> > > > > deadline                   51             19751
> > > > > cfq                        48               427
> > > > > 
> > > > > 
> > > > > Overall, I wouldn't say it makes much difference.
> > > > 
> > > > 0,5 seconds vs 1,5 seconds isn't much of a difference?
> > > > 
> > > > > > raid personalities or dm complicates matters, since it introduces a
> > > > > > disconnect between 'ls' and the io scheduler at the bottom...
> > > > > > 
> > > > > 
> > > > > Yes, ideally I should re-run those directly on the disk partitions.
> > > > 
> > > > At least for comparison.
> > > > 
> > > > > I am also tempted to create a fio job file which acts like an ssh server
> > > > > receiving a connexion after it has been pruned from the cache while the
> > > > > system is doing heavy I/O. "ssh", in this case, seems to be doing much
> > > > > more I/O than a simple "ls", and I think we might want to see if cfq
> > > > > behaves correctly in such case. Most of this I/O is coming from page
> > > > > faults (identified as traps in the trace) probably because the ssh
> > > > > executable has been thrown out of the cache by
> > > > > 
> > > > > echo 3 > /proc/sys/vm/drop_caches
> > > > > 
> > > > > The behavior of an incoming ssh connexion after clearing the cache is
> > > > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The
> > > > > job file created (Part 2) reads, for each job, a 2MB file with random
> > > > > reads each between 4k-44k. The results are very interesting for cfq :
> > > > > 
> > > > > I/O scheduler        runt-min (msec)   runt-max (msec)
> > > > > noop                       586           110242
> > > > > anticipatory               531            26942
> > > > > deadline                   561           108772
> > > > > cfq                        523            28216
> > > > > 
> > > > > So, basically, ssh being out of the cache can take 28s to answer an
> > > > > incoming ssh connexion even with the cfq scheduler. This is not exactly
> > > > > what I would call an acceptable latency.
> > > > 
> > > > At some point, you have to stop and consider what is acceptable
> > > > performance for a given IO pattern. Your ssh test case is purely random
> > > > IO, and neither CFQ nor AS would do any idling for that. We can make
> > > > this test case faster for sure, the hard part is making sure that we
> > > > don't regress on async throughput at the same time.
> > > > 
> > > > Also remember that with your raid1, it's not entirely reasonable to
> > > > blame all performance issues on the IO scheduler as per my previous
> > > > mail. It would be a lot more fair to view the disk numbers individually.
> > > > 
> > > > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set
> > > > to 1 as well?
> > > > 
> > > > However, I think we should be doing somewhat better at this test case.
> > > 
> > > Mathieu, does this improve anything for you?
> > > 
> > 
> > So, I ran the tests with my corrected patch, and the results are very
> > good !
> > 
> > "incoming ssh connexion" test
> > 
> > "config 2.6.28 cfq"
> > Linux 2.6.28
> > /sys/block/sd{a,b}/device/queue_depth = 31 (default)
> > /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default)
> > /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default)
> > 
> > "config 2.6.28.1-patch1"
> > Linux 2.6.28.1
> > Corrected cfq patch applied
> > echo 1 > /sys/block/sd{a,b}/device/queue_depth
> > echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq
> > echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum
> > 
> > On /dev/sda :
> > 
> > I/O scheduler        runt-min (msec)   runt-max (msec)
> > cfq (2.6.28 cfq)          523             6637
> > cfq (2.6.28.1-patch1)     579             2082
> > 
> > On raid1 :
> > 
> > I/O scheduler        runt-min (msec)   runt-max (msec)
> > cfq (2.6.28 cfq)           523            28216
> 
> As a side-note : I'd like to have my results confirmed by others. I just
> found out that my 2 Seagate drives (ST3500320AS) are on the "defect" list
> of drives known to stall for about 30s when doing
> "video streaming".
> (http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=storage&articleId=9126280&taxonomyId=19&intsrc=kc_top)
> (http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=207931)
> 
> Therefore, I would not take any decision based on such known bad
> firmware. But the last results we've got are definitely interesting.
> 
> I'll upgrade my firmware as soon as Seagate puts it back online so I can
> re-run more tests.
> 

After firmware upgrade :

"incoming ssh connexion" test
(ran the job file 2-3 times to get correct runt-max results)

"config 2.6.28.1 dfl"
Linux 2.6.28.1
/sys/block/sd{a,b}/device/queue_depth = 31 (default)
/sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default)
/sys/block/sd{a,b}/queue/iosched/quantum = 4 (default)

"config 2.6.28.1 1"
Linux 2.6.28.1
echo 1 > /sys/block/sd{a,b}/device/queue_depth
echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq
echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum

"config 2.6.28.1-patch dfl"
Linux 2.6.28.1
Corrected cfq patch applied
/sys/block/sd{a,b}/device/queue_depth = 31 (default)
/sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default)
/sys/block/sd{a,b}/queue/iosched/quantum = 4 (default)

"config 2.6.28.1-patch 1"
Linux 2.6.28.1
Corrected cfq patch applied
echo 1 > /sys/block/sd{a,b}/device/queue_depth
echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq
echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum

On /dev/sda :

I/O scheduler        runt-min (msec) runt-avg (msec)  runt-max (msec)
cfq (2.6.28.1 dfl)        560            4134.04         12125
cfq (2.6.28.1-patch dfl)  508            4329.75          9625
cfq (2.6.28.1 1)          535            1068.46          2622
cfq (2.6.28.1-patch 1)    511            2239.87          4117

On /dev/md1 (raid1) :

I/O scheduler        runt-min (msec) runt-avg (msec)  runt-max (msec)
cfq (2.6.28.1 dfl)        507            4053.19         26265
cfq (2.6.28.1-patch dfl)  532            3991.75         18567
cfq (2.6.28.1 1)          510            1900.14         27410
cfq (2.6.28.1-patch 1)    539            2112.60         22859


A fio output taken from the raid1 cfq (2.6.28.1-patch 1) run looks like
the following. It's a bit strange that readers started earlier seem to
complete only _after_ more recent readers have.

Excerpt (full output appended after email) :

Jobs: 2 (f=1): [W________________rPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 6 (f=2): [W________________rrrrrPPPPPPPPPPPPPPPPPPPP] [0.0% done] [   560/
Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [  1512/
Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 6 (f=2): [W________________rrrr_rPPPPPPPPPPPPPPPPPPP] [0.0% done] [   144/
Jobs: 5 (f=1): [W________________rrrr__PPPPPPPPPPPPPPPPPPP] [0.0% done] [  1932/
Jobs: 5 (f=1): [W________________rrrr__PPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrr__IPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 6 (f=2): [W________________rrrr__rPPPPPPPPPPPPPPPPPP] [0.0% done] [   608/
Jobs: 6 (f=2): [W________________rrrr__rPPPPPPPPPPPPPPPPPP] [0.0% done] [  1052/
Jobs: 5 (f=1): [W________________rrrr___PPPPPPPPPPPPPPPPPP] [0.0% done] [   388/
Jobs: 5 (f=1): [W________________rrrr___IPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrr____PPPPPPPPPPPPPPPPP] [0.0% done] [  2076/
Jobs: 5 (f=5): [W________________rrrr____PPPPPPPPPPPPPPPPP] [49.0% done] [  2936
Jobs: 2 (f=2): [W_________________r______PPPPPPPPPPPPPPPPP] [50.8% done] [  5192
Jobs: 2 (f=2): [W________________________rPPPPPPPPPPPPPPPP] [16.0% done] [   104

Given the numbers I get, I see that the runt-max numbers do not appear to
be this high on every job file run, which makes it difficult to compare
them (since you never know whether you've hit the worst case yet). This
could be related to raid1, because I've seen this both with and without
your patch applied, and it only seems to appear on raid1 executions.

However, the patch you sent does not seem to improve the behavior. It
actually makes the average and max latency worse in almost every case.
Changing the queue_depth, slice_async_rq and quantum parameters clearly
helps reduce both avg and max latency.

Mathieu


Full output :

Running cfq
Starting 42 processes

Jobs: 1 (f=1): [W_PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [5.7% done] [     0/
Jobs: 1 (f=1): [W_PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [7.7% done] [     0/
Jobs: 1 (f=1): [W_IPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [9.6% done] [     0/
Jobs: 2 (f=2): [W_rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [11.1% done] [   979
Jobs: 1 (f=1): [W__PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [10.9% done] [  1098
Jobs: 1 (f=1): [W__PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [12.9% done] [     0
Jobs: 2 (f=2): [W__rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [112.5% done] [     
Jobs: 2 (f=2): [W__rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [16.1% done] [  1160
Jobs: 2 (f=1): [W__rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [15.9% done] [   888
Jobs: 2 (f=1): [W__rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [16.0% done] [     0
Jobs: 3 (f=2): [W__rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=2): [W__rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=2): [W__rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=2): [W__rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=3): [W___rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [16.7% done] [   660
Jobs: 2 (f=2): [W____rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [18.0% done] [  2064
Jobs: 1 (f=1): [W_____PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [19.4% done] [  1392
Jobs: 1 (f=1): [W_____PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [20.6% done] [     0
Jobs: 2 (f=2): [W_____rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [105.0% done] [     
Jobs: 2 (f=2): [W_____rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [110.0% done] [     
Jobs: 2 (f=2): [W_____rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [115.0% done] [     
Jobs: 2 (f=2): [W_____rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [120.0% done] [     
Jobs: 3 (f=3): [W_____rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [104.2% done] [     
Jobs: 3 (f=3): [W_____rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [108.3% done] [     
Jobs: 3 (f=3): [W_____rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [112.5% done] [     
Jobs: 3 (f=3): [W_____rrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [116.7% done] [     
Jobs: 4 (f=4): [W_____rrrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [103.6% done] [     
Jobs: 4 (f=4): [W_____rrrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [107.1% done] [     
Jobs: 4 (f=4): [W_____rrrPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [9.8% done] [   280/
Jobs: 3 (f=3): [W_____r_rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.0% done] [  3624
Jobs: 2 (f=2): [W________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.0% done] [  2744
Jobs: 1 (f=1): [W_________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.0% done] [  1620
Jobs: 1 (f=1): [W_________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.0% done] [     0
Jobs: 1 (f=1): [W_________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.3% done] [     0
Jobs: 2 (f=2): [W_________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.9% done] [   116
Jobs: 1 (f=1): [W__________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [34.9% done] [  1944
Jobs: 1 (f=1): [W__________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [35.8% done] [     0
Jobs: 1 (f=1): [W__________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [36.4% done] [     0
Jobs: 2 (f=2): [W__________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [36.9% done] [   228
Jobs: 2 (f=2): [W__________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [37.2% done] [  1420
Jobs: 1 (f=1): [W___________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [37.7% done] [   400
Jobs: 1 (f=1): [W___________PPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [39.1% done] [     0
Jobs: 2 (f=2): [W___________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [39.3% done] [   268
Jobs: 2 (f=2): [W___________rPPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [39.5% done] [   944
Jobs: 1 (f=1): [W____________PPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [40.3% done] [   848
Jobs: 1 (f=1): [W____________IPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [40.5% done] [     0
Jobs: 2 (f=2): [W____________rPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [41.0% done] [   400
Jobs: 2 (f=2): [W____________rPPPPPPPPPPPPPPPPPPPPPPPPPPPP] [41.1% done] [  1208
Jobs: 1 (f=1): [W_____________PPPPPPPPPPPPPPPPPPPPPPPPPPPP] [41.9% done] [   456
Jobs: 2 (f=2): [W_____________rPPPPPPPPPPPPPPPPPPPPPPPPPPP] [101.9% done] [     
Jobs: 2 (f=2): [W_____________rPPPPPPPPPPPPPPPPPPPPPPPPPPP] [42.9% done] [   380
Jobs: 2 (f=2): [W_____________rPPPPPPPPPPPPPPPPPPPPPPPPPPP] [43.3% done] [   760
Jobs: 1 (f=1): [W______________PPPPPPPPPPPPPPPPPPPPPPPPPPP] [43.8% done] [   912
Jobs: 2 (f=2): [W______________rPPPPPPPPPPPPPPPPPPPPPPPPPP] [44.2% done] [    44
Jobs: 2 (f=2): [W______________rPPPPPPPPPPPPPPPPPPPPPPPPPP] [44.6% done] [  1020
Jobs: 1 (f=1): [W_______________PPPPPPPPPPPPPPPPPPPPPPPPPP] [45.4% done] [  1008
Jobs: 1 (f=1): [W_______________PPPPPPPPPPPPPPPPPPPPPPPPPP] [46.2% done] [     0
Jobs: 2 (f=2): [W_______________rPPPPPPPPPPPPPPPPPPPPPPPPP] [46.6% done] [    52
Jobs: 2 (f=2): [W_______________rPPPPPPPPPPPPPPPPPPPPPPPPP] [47.0% done] [  1248
Jobs: 1 (f=1): [W________________PPPPPPPPPPPPPPPPPPPPPPPPP] [47.4% done] [   760
Jobs: 1 (f=1): [W________________PPPPPPPPPPPPPPPPPPPPPPPPP] [48.1% done] [     0
Jobs: 2 (f=1): [W________________rPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 2 (f=1): [W________________rPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 2 (f=1): [W________________rPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 2 (f=1): [W________________rPPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=1): [W________________rrPPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 4 (f=1): [W________________rrrPPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrrPPPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 6 (f=2): [W________________rrrrrPPPPPPPPPPPPPPPPPPPP] [0.0% done] [   560/
Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [  1512/
Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrr_PPPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 6 (f=2): [W________________rrrr_rPPPPPPPPPPPPPPPPPPP] [0.0% done] [   144/
Jobs: 5 (f=1): [W________________rrrr__PPPPPPPPPPPPPPPPPPP] [0.0% done] [  1932/
Jobs: 5 (f=1): [W________________rrrr__PPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrr__IPPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 6 (f=2): [W________________rrrr__rPPPPPPPPPPPPPPPPPP] [0.0% done] [   608/
Jobs: 6 (f=2): [W________________rrrr__rPPPPPPPPPPPPPPPPPP] [0.0% done] [  1052/
Jobs: 5 (f=1): [W________________rrrr___PPPPPPPPPPPPPPPPPP] [0.0% done] [   388/
Jobs: 5 (f=1): [W________________rrrr___IPPPPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 5 (f=1): [W________________rrrr____PPPPPPPPPPPPPPPPP] [0.0% done] [  2076/
Jobs: 5 (f=5): [W________________rrrr____PPPPPPPPPPPPPPPPP] [49.0% done] [  2936
Jobs: 2 (f=2): [W_________________r______PPPPPPPPPPPPPPPPP] [50.8% done] [  5192
Jobs: 2 (f=2): [W________________________rPPPPPPPPPPPPPPPP] [16.0% done] [   104
Jobs: 2 (f=2): [W________________________rPPPPPPPPPPPPPPPP] [54.7% done] [  1052
Jobs: 1 (f=1): [W_________________________PPPPPPPPPPPPPPPP] [56.6% done] [  1016
Jobs: 1 (f=1): [W_________________________PPPPPPPPPPPPPPPP] [58.1% done] [     0
Jobs: 2 (f=2): [W_________________________rPPPPPPPPPPPPPPP] [59.8% done] [    52
Jobs: 2 (f=2): [W_________________________rPPPPPPPPPPPPPPP] [61.1% done] [  1372
Jobs: 1 (f=1): [W__________________________PPPPPPPPPPPPPPP] [63.2% done] [   652
Jobs: 1 (f=1): [W__________________________PPPPPPPPPPPPPPP] [65.0% done] [     0
Jobs: 2 (f=1): [W__________________________rPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 2 (f=1): [W__________________________rPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 2 (f=1): [W__________________________rPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 2 (f=1): [W__________________________rPPPPPPPPPPPPPP] [0.0% done] [     0/
Jobs: 3 (f=3): [W__________________________rrPPPPPPPPPPPPP] [67.3% done] [  1224
Jobs: 2 (f=2): [W___________________________rPPPPPPPPPPPPP] [68.8% done] [  2124
Jobs: 1 (f=1): [W____________________________PPPPPPPPPPPPP] [69.8% done] [   780
Jobs: 1 (f=1): [W____________________________PPPPPPPPPPPPP] [71.3% done] [     0
Jobs: 2 (f=2): [W____________________________rPPPPPPPPPPPP] [72.9% done] [    84
Jobs: 2 (f=2): [W____________________________rPPPPPPPPPPPP] [73.1% done] [  1312
Jobs: 1 (f=1): [W_____________________________PPPPPPPPPPPP] [73.2% done] [   688
Jobs: 1 (f=1): [W_____________________________PPPPPPPPPPPP] [73.9% done] [     0
Jobs: 2 (f=2): [W_____________________________rPPPPPPPPPPP] [73.6% done] [   476
Jobs: 1 (f=1): [W_____________________________EPPPPPPPPPPP] [73.8% done] [  1608
Jobs: 1 (f=1): [W______________________________PPPPPPPPPPP] [73.9% done] [     0
Jobs: 1 (f=1): [W______________________________PPPPPPPPPPP] [74.1% done] [     0
Jobs: 2 (f=2): [W______________________________rPPPPPPPPPP] [74.7% done] [   228
Jobs: 2 (f=2): [W______________________________rPPPPPPPPPP] [74.8% done] [  1564
Jobs: 1 (f=1): [W_______________________________PPPPPPPPPP] [75.5% done] [   264
Jobs: 1 (f=1): [W_______________________________PPPPPPPPPP] [76.1% done] [     0
Jobs: 2 (f=2): [W_______________________________rPPPPPPPPP] [76.2% done] [   516
Jobs: 1 (f=1): [W________________________________PPPPPPPPP] [75.9% done] [  1532
Jobs: 1 (f=1): [W________________________________PPPPPPPPP] [76.0% done] [     0
Jobs: 1 (f=1): [W________________________________PPPPPPPPP] [76.2% done] [     0
Jobs: 2 (f=2): [W________________________________rPPPPPPPP] [76.5% done] [   768
Jobs: 1 (f=1): [W_________________________________PPPPPPPP] [76.6% done] [  1316
Jobs: 1 (f=1): [W_________________________________PPPPPPPP] [76.7% done] [     0
Jobs: 1 (f=1): [W_________________________________IPPPPPPP] [77.8% done] [     0
Jobs: 2 (f=2): [W_________________________________rPPPPPPP] [77.9% done] [   604
Jobs: 1 (f=1): [W__________________________________IPPPPPP] [78.0% done] [  1444
Jobs: 2 (f=2): [W__________________________________rPPPPPP] [78.2% done] [  1145
Jobs: 1 (f=1): [W___________________________________PPPPPP] [78.3% done] [   932
Jobs: 1 (f=1): [W___________________________________PPPPPP] [79.3% done] [     0
Jobs: 2 (f=2): [W___________________________________rPPPPP] [100.7% done] [     
Jobs: 2 (f=2): [W___________________________________rPPPPP] [80.0% done] [  1012
Jobs: 1 (f=1): [W____________________________________PPPPP] [80.6% done] [  1072
Jobs: 1 (f=1): [W____________________________________PPPPP] [81.6% done] [     0
Jobs: 2 (f=2): [W____________________________________rPPPP] [72.2% done] [    36
Jobs: 2 (f=2): [W____________________________________rPPPP] [82.3% done] [   956
Jobs: 1 (f=1): [W_____________________________________PPPP] [82.9% done] [  1076
Jobs: 1 (f=1): [W_____________________________________PPPP] [83.4% done] [     0
Jobs: 2 (f=2): [W_____________________________________rPPP] [78.2% done] [    48
Jobs: 2 (f=2): [W_____________________________________rPPP] [84.6% done] [  1060
Jobs: 1 (f=1): [W______________________________________PPP] [85.1% done] [   956
Jobs: 1 (f=1): [W______________________________________PPP] [85.7% done] [     0
Jobs: 2 (f=2): [W______________________________________rPP] [86.3% done] [    96
Jobs: 2 (f=2): [W______________________________________rPP] [86.4% done] [   756
Jobs: 1 (f=1): [W_______________________________________PP] [86.9% done] [  1212
Jobs: 1 (f=1): [W_______________________________________PP] [87.5% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [88.6% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [89.1% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [90.2% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [90.8% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [91.4% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [92.5% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [93.1% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [93.6% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [94.2% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [95.3% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [95.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [97.1% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [97.7% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.2% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.8% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.8% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [98.9% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [99.0% done] [     0
Jobs: 1 (f=1): [W_______________________________________PP] [99.0% done] [     0
Jobs: 0 (f=0) [eta 00m:02s]



Mathieu

> Mathieu
> 
> > cfq (2.6.28.1-patch1)      517             3086
> > 
> > It looks like we are getting somewhere :) Are there any specific
> > queue_depth, slice_async_rq, quantum variations you would like to be
> > tested ?
> > 
> > For reference, I attach my ssh-like job file (again) to this mail.
> > 
> > Mathieu
> > 
> > 
> > [job1]
> > rw=write
> > size=10240m
> > direct=0
> > blocksize=1024k
> > 
> > [global]
> > rw=randread
> > size=2048k
> > filesize=30m
> > direct=0
> > bsrange=4k-44k
> > 
> > [file1]
> > startdelay=0
> > 
> > [file2]
> > startdelay=4
> > 
> > [file3]
> > startdelay=8
> > 
> > [file4]
> > startdelay=12
> > 
> > [file5]
> > startdelay=16
> > 
> > [file6]
> > startdelay=20
> > 
> > [file7]
> > startdelay=24
> > 
> > [file8]
> > startdelay=28
> > 
> > [file9]
> > startdelay=32
> > 
> > [file10]
> > startdelay=36
> > 
> > [file11]
> > startdelay=40
> > 
> > [file12]
> > startdelay=44
> > 
> > [file13]
> > startdelay=48
> > 
> > [file14]
> > startdelay=52
> > 
> > [file15]
> > startdelay=56
> > 
> > [file16]
> > startdelay=60
> > 
> > [file17]
> > startdelay=64
> > 
> > [file18]
> > startdelay=68
> > 
> > [file19]
> > startdelay=72
> > 
> > [file20]
> > startdelay=76
> > 
> > [file21]
> > startdelay=80
> > 
> > [file22]
> > startdelay=84
> > 
> > [file23]
> > startdelay=88
> > 
> > [file24]
> > startdelay=92
> > 
> > [file25]
> > startdelay=96
> > 
> > [file26]
> > startdelay=100
> > 
> > [file27]
> > startdelay=104
> > 
> > [file28]
> > startdelay=108
> > 
> > [file29]
> > startdelay=112
> > 
> > [file30]
> > startdelay=116
> > 
> > [file31]
> > startdelay=120
> > 
> > [file32]
> > startdelay=124
> > 
> > [file33]
> > startdelay=128
> > 
> > [file34]
> > startdelay=132
> > 
> > [file35]
> > startdelay=134
> > 
> > [file36]
> > startdelay=138
> > 
> > [file37]
> > startdelay=142
> > 
> > [file38]
> > startdelay=146
> > 
> > [file39]
> > startdelay=150
> > 
> > [file40]
> > startdelay=200
> > 
> > [file41]
> > startdelay=260
> > 
> > -- 
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20 23:27               ` Mathieu Desnoyers
  2009-01-21  0:25                 ` Mathieu Desnoyers
@ 2009-01-23  3:21                 ` KOSAKI Motohiro
  2009-01-23  4:03                   ` Mathieu Desnoyers
  2009-02-10  3:36                   ` [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O Mathieu Desnoyers
  1 sibling, 2 replies; 39+ messages in thread
From: KOSAKI Motohiro @ 2009-01-23  3:21 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: kosaki.motohiro, Jens Axboe, akpm, ltt-dev, Linus Torvalds,
	Ingo Molnar, linux-kernel

> So, I ran the tests with my corrected patch, and the results are very
> good !
> 
> "incoming ssh connexion" test
> 
> "config 2.6.28 cfq"
> Linux 2.6.28
> /sys/block/sd{a,b}/device/queue_depth = 31 (default)
> /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default)
> /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default)
> 
> "config 2.6.28.1-patch1"
> Linux 2.6.28.1
> Corrected cfq patch applied
> echo 1 > /sys/block/sd{a,b}/device/queue_depth
> echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq
> echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum
> 
> On /dev/sda :
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> cfq (2.6.28 cfq)          523             6637
> cfq (2.6.28.1-patch1)     579             2082
> 
> On raid1 :
> 
> I/O scheduler        runt-min (msec)   runt-max (msec)
> cfq (2.6.28 cfq)           523            28216
> cfq (2.6.28.1-patch1)      517             3086

Congratulations.
In university machine rooms (at least at universities in Japan),
parallel ssh workloads frequently happen.

I like this patch :)



> 
> It looks like we are getting somewhere :) Are there any specific
> queue_depth, slice_async_rq, quantum variations you would like to be
> tested ?
> 
> For reference, I attach my ssh-like job file (again) to this mail.
> 
> Mathieu
> 
> 
> [job1]
> rw=write
> size=10240m
> direct=0
> blocksize=1024k
> 
> [global]
> rw=randread
> size=2048k
> filesize=30m
> direct=0
> bsrange=4k-44k
> 
> [file1]
> startdelay=0
> 
> [file2]
> startdelay=4
> 
> [file3]
> startdelay=8
> 
> [file4]
> startdelay=12
> 
> [file5]
> startdelay=16
> 
> [file6]
> startdelay=20
> 
> [file7]
> startdelay=24
> 
> [file8]
> startdelay=28
> 
> [file9]
> startdelay=32
> 
> [file10]
> startdelay=36
> 
> [file11]
> startdelay=40
> 
> [file12]
> startdelay=44
> 
> [file13]
> startdelay=48
> 
> [file14]
> startdelay=52
> 
> [file15]
> startdelay=56
> 
> [file16]
> startdelay=60
> 
> [file17]
> startdelay=64
> 
> [file18]
> startdelay=68
> 
> [file19]
> startdelay=72
> 
> [file20]
> startdelay=76
> 
> [file21]
> startdelay=80
> 
> [file22]
> startdelay=84
> 
> [file23]
> startdelay=88
> 
> [file24]
> startdelay=92
> 
> [file25]
> startdelay=96
> 
> [file26]
> startdelay=100
> 
> [file27]
> startdelay=104
> 
> [file28]
> startdelay=108
> 
> [file29]
> startdelay=112
> 
> [file30]
> startdelay=116
> 
> [file31]
> startdelay=120
> 
> [file32]
> startdelay=124
> 
> [file33]
> startdelay=128
> 
> [file34]
> startdelay=132
> 
> [file35]
> startdelay=134
> 
> [file36]
> startdelay=138
> 
> [file37]
> startdelay=142
> 
> [file38]
> startdelay=146
> 
> [file39]
> startdelay=150
> 
> [file40]
> startdelay=200
> 
> [file41]
> startdelay=260
> 
> -- 
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> 
> _______________________________________________
> ltt-dev mailing list
> ltt-dev@lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [ltt-dev] [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-23  3:21                 ` [ltt-dev] " KOSAKI Motohiro
@ 2009-01-23  4:03                   ` Mathieu Desnoyers
  2009-02-10  3:36                   ` [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O Mathieu Desnoyers
  1 sibling, 0 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-01-23  4:03 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-kernel, ltt-dev, Jens Axboe, akpm, Linus Torvalds, Ingo Molnar

* KOSAKI Motohiro (kosaki.motohiro@jp.fujitsu.com) wrote:
> > So, I ran the tests with my corrected patch, and the results are very
> > good !
> > 
> > "incoming ssh connexion" test
> > 
> > "config 2.6.28 cfq"
> > Linux 2.6.28
> > /sys/block/sd{a,b}/device/queue_depth = 31 (default)
> > /sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default)
> > /sys/block/sd{a,b}/queue/iosched/quantum = 4 (default)
> > 
> > "config 2.6.28.1-patch1"
> > Linux 2.6.28.1
> > Corrected cfq patch applied
> > echo 1 > /sys/block/sd{a,b}/device/queue_depth
> > echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq
> > echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum
> > 
> > On /dev/sda :
> > 
> > I/O scheduler        runt-min (msec)   runt-max (msec)
> > cfq (2.6.28 cfq)          523             6637
> > cfq (2.6.28.1-patch1)     579             2082
> > 
> > On raid1 :
> > 
> > I/O scheduler        runt-min (msec)   runt-max (msec)
> > cfq (2.6.28 cfq)           523            28216
> > cfq (2.6.28.1-patch1)      517             3086
> 
> Congratulations.
> In university machine rooms (at least at universities in Japan),
> parallel ssh workloads frequently happen.
> 
> I like this patch :)
> 

Please see my posts from today with numbers taken after my Seagate
firmware upgrade. The runt-max case is pretty hard to trigger reliably
and I had to do a few runs to hit the problem. The latest tests are
better; e.g. the 3086 msec result is only that low because the problem
was not hit in that run.

But the 
echo 1 > /sys/block/sd{a,b}/device/queue_depth
echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq
echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum

settings are definitely helping a lot, as my latest numbers also show.
The patch, OTOH, degraded performance rather than improving it.

Mathieu

> 
> 
> > 
> > It looks like we are getting somewhere :) Are there any specific
> > queue_depth, slice_async_rq, quantum variations you would like to be
> > tested ?
> > 
> > For reference, I attach my ssh-like job file (again) to this mail.
> > 
> > Mathieu
> > 
> > 
> > [job1]
> > rw=write
> > size=10240m
> > direct=0
> > blocksize=1024k
> > 
> > [global]
> > rw=randread
> > size=2048k
> > filesize=30m
> > direct=0
> > bsrange=4k-44k
> > 
> > [file1]
> > startdelay=0
> > 
> > [file2]
> > startdelay=4
> > 
> > [file3]
> > startdelay=8
> > 
> > [file4]
> > startdelay=12
> > 
> > [file5]
> > startdelay=16
> > 
> > [file6]
> > startdelay=20
> > 
> > [file7]
> > startdelay=24
> > 
> > [file8]
> > startdelay=28
> > 
> > [file9]
> > startdelay=32
> > 
> > [file10]
> > startdelay=36
> > 
> > [file11]
> > startdelay=40
> > 
> > [file12]
> > startdelay=44
> > 
> > [file13]
> > startdelay=48
> > 
> > [file14]
> > startdelay=52
> > 
> > [file15]
> > startdelay=56
> > 
> > [file16]
> > startdelay=60
> > 
> > [file17]
> > startdelay=64
> > 
> > [file18]
> > startdelay=68
> > 
> > [file19]
> > startdelay=72
> > 
> > [file20]
> > startdelay=76
> > 
> > [file21]
> > startdelay=80
> > 
> > [file22]
> > startdelay=84
> > 
> > [file23]
> > startdelay=88
> > 
> > [file24]
> > startdelay=92
> > 
> > [file25]
> > startdelay=96
> > 
> > [file26]
> > startdelay=100
> > 
> > [file27]
> > startdelay=104
> > 
> > [file28]
> > startdelay=108
> > 
> > [file29]
> > startdelay=112
> > 
> > [file30]
> > startdelay=116
> > 
> > [file31]
> > startdelay=120
> > 
> > [file32]
> > startdelay=124
> > 
> > [file33]
> > startdelay=128
> > 
> > [file34]
> > startdelay=132
> > 
> > [file35]
> > startdelay=134
> > 
> > [file36]
> > startdelay=138
> > 
> > [file37]
> > startdelay=142
> > 
> > [file38]
> > startdelay=146
> > 
> > [file39]
> > startdelay=150
> > 
> > [file40]
> > startdelay=200
> > 
> > [file41]
> > startdelay=260
> > 
> > -- 
> > Mathieu Desnoyers
> > OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
> > 
> > _______________________________________________
> > ltt-dev mailing list
> > ltt-dev@lists.casi.polymtl.ca
> > http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
> 
> 
> 
> 
> _______________________________________________
> ltt-dev mailing list
> ltt-dev@lists.casi.polymtl.ca
> http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-01-20 12:28             ` Jens Axboe
  2009-01-20 14:22               ` [ltt-dev] " Mathieu Desnoyers
  2009-01-20 23:27               ` Mathieu Desnoyers
@ 2009-02-02  2:08               ` Mathieu Desnoyers
  2009-02-02 11:26                 ` Jens Axboe
  2 siblings, 1 reply; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-02-02  2:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev

Hi Jens,

I tried your patch at

http://bugzilla.kernel.org/attachment.cgi?id=20001

On a 2.6.29-rc3 kernel. I get the following OOPS just after I start
running the fio test. It happens after a few

cfq: moving ffff88043d4b42e0 to dispatch                         
cfq: moving ffff88043d4b4170 to dispatch                        

messages (~20).

Here is the oops :

------------[ cut here ]------------
kernel BUG at block/cfq-iosched.c:650!
invalid opcode: 0000 [#1] PREEMPT SMP
LTT NESTING LEVEL : 0
last sysfs file: /sys/block/sda/stat
CPU 2
Modules linked in: loop ltt_tracer ltt_trace_control ltt_userspa]
Pid: 2934, comm: kjournald Not tainted 2.6.29-rc3 #3
RIP: 0010:[<ffffffff80419c2b>]  [<ffffffff80419c2b>] cfq_remove_0
RSP: 0018:ffff88043b167c20  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88043fd9e088 RCX: 0000000000000001
RDX: 0000000000000010 RSI: ffff88043887b590 RDI: ffff88043887b590
RBP: ffff88043b167c50 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88043fd9e088
R13: ffff88043887b590 R14: ffff88043fc40200 R15: ffff88043fd9e088
FS:  0000000000000000(0000) GS:ffff88043e81a080(0000) knlGS:00000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f2a5f98b8c0 CR3: 000000043e8c4000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kjournald (pid: 2934, threadinfo ffff88043b166000, task )
Stack:
 000000000000003b ffff88043887b590 ffff88043fd9e088 ffff88043e5a0
 ffff88043fc40200 ffff88002809ed50 ffff88043b167c80 ffffffff8041d
 0000000000000001 ffff88043887b590 ffffe2001b805138 ffff88043e5a0
Call Trace:
 [<ffffffff80419e4d>] cfq_dispatch_insert+0x3d/0x70
 [<ffffffff80419f2f>] cfq_wait_on_page+0xaf/0xc0
 [<ffffffff804098ed>] elv_wait_on_page+0x1d/0x20
 [<ffffffff8040d207>] blk_backing_dev_wop+0x17/0x50
 [<ffffffff80301872>] sync_buffer+0x52/0x80
 [<ffffffff806a33b2>] __wait_on_bit+0x62/0x90
 [<ffffffff80301820>] ? sync_buffer+0x0/0x80
 [<ffffffff80301820>] ? sync_buffer+0x0/0x80
 [<ffffffff806a3459>] out_of_line_wait_on_bit+0x79/0x90
 [<ffffffff8025a8a0>] ? wake_bit_function+0x0/0x50
 [<ffffffff80301769>] __wait_on_buffer+0xf9/0x130
 [<ffffffff80379acd>] journal_commit_transaction+0x72d/0x1650
 [<ffffffff806a5c87>] ? _spin_unlock_irqrestore+0x47/0x80
 [<ffffffff8024dd2f>] ? try_to_del_timer_sync+0x5f/0x70
 [<ffffffff8037e488>] kjournald+0xe8/0x250
 [<ffffffff8025a860>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8037e3a0>] ? kjournald+0x0/0x250
 [<ffffffff8025a38e>] kthread+0x4e/0x90
 [<ffffffff8025a340>] ? kthread+0x0/0x90
 [<ffffffff8020db2a>] child_rip+0xa/0x20
 [<ffffffff8020d480>] ? restore_args+0x0/0x30
 [<ffffffff8025a340>] ? kthread+0x0/0x90
 [<ffffffff8020db20>] ? child_rip+0x0/0x20
Code: 4d 89 6d 00 49 8b 9d c0 00 00 00 41 8b 45 48 4c 8b 73 08 2
RIP  [<ffffffff80419c2b>] cfq_remove_request+0x6b/0x250
 RSP <ffff88043b167c20>
---[ end trace eab134a8bd405d05 ]---

It seems that either the cfqq->queued[sync] counter should be
incremented/decremented in the new cfq_wait_on_page, or the change in
request type (sync vs !sync) is not being taken care of correctly. I have
not looked at the code enough to find out exactly what is happening, but
I thought you might have an idea of the cause.
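
For what it's worth, here is a user-space toy model of the kind of
accounting mismatch I suspect. This is *not* the real cfq-iosched.c code;
the names and the "promote to sync" step are purely hypothetical, it only
illustrates how moving a request between the sync/async buckets without
touching the counters would trip a BUG_ON(!cfqq->queued[sync])-style check:

#include <assert.h>
#include <stdio.h>

struct toy_cfqq {
	int queued[2];			/* [0] = async, [1] = sync */
};

struct toy_rq {
	int sync;			/* 1 if the request is sync */
};

static void toy_add_request(struct toy_cfqq *cfqq, struct toy_rq *rq)
{
	cfqq->queued[rq->sync]++;
}

static void toy_remove_request(struct toy_cfqq *cfqq, struct toy_rq *rq)
{
	/* mirrors the BUG_ON(!cfqq->queued[sync])-style sanity check */
	assert(cfqq->queued[rq->sync]);
	cfqq->queued[rq->sync]--;
}

int main(void)
{
	struct toy_cfqq cfqq = { { 0, 0 } };
	struct toy_rq rq = { 0 };	/* queued as async writeout */

	toy_add_request(&cfqq, &rq);	/* queued[0] == 1 */

	/* a hypothetical wait-on-page path promotes the request to sync
	 * without updating the counters... */
	rq.sync = 1;

	/* ...so the removal path decrements the wrong bucket and trips */
	toy_remove_request(&cfqq, &rq);
	printf("never reached\n");
	return 0;
}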

Thanks,

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-02-02  2:08               ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers
@ 2009-02-02 11:26                 ` Jens Axboe
  2009-02-03  0:46                   ` Mathieu Desnoyers
  0 siblings, 1 reply; 39+ messages in thread
From: Jens Axboe @ 2009-02-02 11:26 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev

On Sun, Feb 01 2009, Mathieu Desnoyers wrote:
> Hi Jens,
> 
> I tried your patch at
> 
> http://bugzilla.kernel.org/attachment.cgi?id=20001
> 
> On a 2.6.29-rc3 kernel. I get the following OOPS just after I start
> running the fio test. It happens after a few
> 
> cfq: moving ffff88043d4b42e0 to dispatch                         
> cfq: moving ffff88043d4b4170 to dispatch                        
> 
> messages (~20).
> 
> Here is the oops :
> 
> ------------[ cut here ]------------
> kernel BUG at block/cfq-iosched.c:650!
> invalid opcode: 0000 [#1] PREEMPT SMP
> LTT NESTING LEVEL : 0
> last sysfs file: /sys/block/sda/stat
> CPU 2
> Modules linked in: loop ltt_tracer ltt_trace_control ltt_userspa]
> Pid: 2934, comm: kjournald Not tainted 2.6.29-rc3 #3
> RIP: 0010:[<ffffffff80419c2b>]  [<ffffffff80419c2b>] cfq_remove_0
> RSP: 0018:ffff88043b167c20  EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ffff88043fd9e088 RCX: 0000000000000001
> RDX: 0000000000000010 RSI: ffff88043887b590 RDI: ffff88043887b590
> RBP: ffff88043b167c50 R08: 0000000000000002 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff88043fd9e088
> R13: ffff88043887b590 R14: ffff88043fc40200 R15: ffff88043fd9e088
> FS:  0000000000000000(0000) GS:ffff88043e81a080(0000) knlGS:00000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00007f2a5f98b8c0 CR3: 000000043e8c4000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process kjournald (pid: 2934, threadinfo ffff88043b166000, task )
> Stack:
>  000000000000003b ffff88043887b590 ffff88043fd9e088 ffff88043e5a0
>  ffff88043fc40200 ffff88002809ed50 ffff88043b167c80 ffffffff8041d
>  0000000000000001 ffff88043887b590 ffffe2001b805138 ffff88043e5a0
> Call Trace:
>  [<ffffffff80419e4d>] cfq_dispatch_insert+0x3d/0x70
>  [<ffffffff80419f2f>] cfq_wait_on_page+0xaf/0xc0
>  [<ffffffff804098ed>] elv_wait_on_page+0x1d/0x20
>  [<ffffffff8040d207>] blk_backing_dev_wop+0x17/0x50
>  [<ffffffff80301872>] sync_buffer+0x52/0x80
>  [<ffffffff806a33b2>] __wait_on_bit+0x62/0x90
>  [<ffffffff80301820>] ? sync_buffer+0x0/0x80
>  [<ffffffff80301820>] ? sync_buffer+0x0/0x80
>  [<ffffffff806a3459>] out_of_line_wait_on_bit+0x79/0x90
>  [<ffffffff8025a8a0>] ? wake_bit_function+0x0/0x50
>  [<ffffffff80301769>] __wait_on_buffer+0xf9/0x130
>  [<ffffffff80379acd>] journal_commit_transaction+0x72d/0x1650
>  [<ffffffff806a5c87>] ? _spin_unlock_irqrestore+0x47/0x80
>  [<ffffffff8024dd2f>] ? try_to_del_timer_sync+0x5f/0x70
>  [<ffffffff8037e488>] kjournald+0xe8/0x250
>  [<ffffffff8025a860>] ? autoremove_wake_function+0x0/0x40
>  [<ffffffff8037e3a0>] ? kjournald+0x0/0x250
>  [<ffffffff8025a38e>] kthread+0x4e/0x90
>  [<ffffffff8025a340>] ? kthread+0x0/0x90
>  [<ffffffff8020db2a>] child_rip+0xa/0x20
>  [<ffffffff8020d480>] ? restore_args+0x0/0x30
>  [<ffffffff8025a340>] ? kthread+0x0/0x90
>  [<ffffffff8020db20>] ? child_rip+0x0/0x20
> Code: 4d 89 6d 00 49 8b 9d c0 00 00 00 41 8b 45 48 4c 8b 73 08 2
> RIP  [<ffffffff80419c2b>] cfq_remove_request+0x6b/0x250
>  RSP <ffff88043b167c20>
> ---[ end trace eab134a8bd405d05 ]---
> 
> It seems that either the cfqq->queued[sync] counter should be
> incremented/decremented in the new cfq_wait_on_page, or the change in
> request type (sync vs !sync) is not being taken care of correctly. I have
> not looked at the code enough to find out exactly what is happening, but
> I thought you might have an idea of the cause.

Just ignore the patch for now, I'm not going to be spending more time on
it. It was just an attempt at a quick test, I don't think this approach
is very feasible since it doesn't appear to be the root of the problem.
In any case, were we to continue on this path, the accounting logic in
CFQ would have to be adjusted for this new behaviour. Otherwise there's
a big risk of giving great preference to async writeout once things get
tight.

It's also working around the real problem for this specific issue, which
is that you just don't want to have sync apps blocked waiting for async
writeout in the first place.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
  2009-02-02 11:26                 ` Jens Axboe
@ 2009-02-03  0:46                   ` Mathieu Desnoyers
  0 siblings, 0 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-02-03  0:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: akpm, Ingo Molnar, Linus Torvalds, linux-kernel, ltt-dev

* Jens Axboe (jens.axboe@oracle.com) wrote:
> It's also working around the real problem for this specific issue, which
> is that you just don't want to have sync apps blocked waiting for async
> writeout in the first place.
> 

Maybe I could help to identify criteria for such sync requests which
are treated as async. From a newcomer's look at the situation, I would
assume that :

- Small I/O requests
- I/O requests caused by major page faults, except those caused by
  access to mmapped files which result in large consecutive file
  reads/writes.

Should never *ever* fall into the async I/O request path. Am I correct ?
If yes, then I could trigger some tracing test cases and identify the
faulty scenarios with LTTng. Maybe the solution does not sit only within
the block I/O layer :

I guess we would also have to find out what is considered a "large" and
a "small" I/O request. I think using open() flags to specify if
I/O is expected to be synchronous or asynchronous for a particular file
would be a good start (AFAIK, only O_DIRECT seems to be close to this,
but it also has the side-effect of not using any kernel buffering, which
I am not sure is wanted in every case). If this implies adding new
flags to open(), then supporting older apps could be done by heuristics
on the size of the requests. New applications which have very specific
needs (e.g. large synchronous I/O) could be tuned with the new flags.
Any small request coming from the page fault handler would be treated as
synchronous. Requests coming from the page fault handler on a
particular mmapped file would behave following the sync/async flags of
the associated open(). If no flag is specified, the heuristic would
apply to the resulting merged requests from the page fault handler.
Therefore, large consecutive reads of mmapped files would fall in the
"async" category by default. mmap of shared libraries and memory mapping
done by exec() should clearly specify the "sync" flag, because those
accesses *will* cause delays when the application needs to be executed.
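
To make the heuristic I have in mind a bit more concrete, here is a rough
user-space sketch. The O_IOSYNC_HINT/O_IOASYNC_HINT flags and the 128kB
"small request" cutoff are completely made up for the sake of
illustration; nothing of this exists in the kernel today:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define O_IOSYNC_HINT	0x10000000	/* hypothetical open() flag */
#define O_IOASYNC_HINT	0x20000000	/* hypothetical open() flag */
#define SMALL_IO_BYTES	(128 * 1024)	/* hypothetical "small" cutoff */

enum io_class { IO_CLASS_SYNC, IO_CLASS_ASYNC };

static enum io_class classify_io(size_t bytes, bool from_page_fault,
				 bool large_seq_mmap, unsigned int open_flags)
{
	/* explicit hints from the application always win */
	if (open_flags & O_IOSYNC_HINT)
		return IO_CLASS_SYNC;
	if (open_flags & O_IOASYNC_HINT)
		return IO_CLASS_ASYNC;

	/*
	 * Page faults (exec, shared libraries, random mmap access) block
	 * the task directly, so treat them as sync -- except large
	 * consecutive reads/writes over an mmapped file.
	 */
	if (from_page_fault)
		return large_seq_mmap ? IO_CLASS_ASYNC : IO_CLASS_SYNC;

	/* fall back on request size for unmodified applications */
	return bytes <= SMALL_IO_BYTES ? IO_CLASS_SYNC : IO_CLASS_ASYNC;
}

int main(void)
{
	/* ssh-like page fault doing small random reads -> sync */
	printf("%d\n", classify_io(16 * 1024, true, false, 0) == IO_CLASS_SYNC);
	/* dd-like streaming write -> async */
	printf("%d\n", classify_io(1024 * 1024, false, false, 0) == IO_CLASS_ASYNC);
	return 0;
}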

Hopefully what I am saying here makes sense. If you have links to some
background information that would give me a better understanding of how
async vs sync requests are handled by CFQ, I would greatly appreciate it.

Best regards,

Mathieu




-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O
  2009-01-23  3:21                 ` [ltt-dev] " KOSAKI Motohiro
  2009-01-23  4:03                   ` Mathieu Desnoyers
@ 2009-02-10  3:36                   ` Mathieu Desnoyers
  2009-02-10  3:55                     ` Nick Piggin
  2009-02-10  5:23                     ` Linus Torvalds
  1 sibling, 2 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-02-10  3:36 UTC (permalink / raw)
  To: KOSAKI Motohiro, Jens Axboe, akpm, Peter Zijlstra,
	Linus Torvalds, Ingo Molnar, thomas.pi, Yuriy Lalym
  Cc: ltt-dev, linux-kernel, linux-mm

Related to :
http://bugzilla.kernel.org/show_bug.cgi?id=12309

Very annoying I/O latencies (20-30 seconds) have been occurring under heavy I/O
since ~2.6.18.

Yuriy Lalym noticed that the oom killer was eventually called. So I took a look
at /proc/meminfo and noticed that under my test case (fio job created from a
LTTng block I/O trace, reproducing dd writing to a 20GB file and ssh sessions
being opened), the Inactive(file) value increased, and the total memory consumed
increased until only 80kB (out of 16GB) were left.

So I first used cgroups to limit the memory usable by fio (or dd). This seems to
fix the problem.

Thomas noted that there seems to be a problem with pages being passed to the
block I/O elevator not being counted as dirty. I looked at
clear_page_dirty_for_io and noticed that page_mkclean clears the dirty bit and
then set_page_dirty(page) is called on the page. This calls
mm/page-writeback.c:set_page_dirty(). I assume that the
mapping->a_ops->set_page_dirty is NULL, so it calls
buffer.c:__set_page_dirty_buffers(). This calls set_buffer_dirty(bh).

So we come back in clear_page_dirty_for_io where we decrement the dirty
accounting. This is a problem, because we assume that the block layer will
re-increment it when it gets the page, but because the buffer is marked as
dirty, this won't happen.

So this patch fixes this behavior by only decrementing the page accounting
_after_ the block I/O writepage has been done.

The effect on my workload is that the memory stops being completely filled by
page cache under heavy I/O. The vfs_cache_pressure value seems to work again.

However, this does not fully solve the high latency issue : when there are
enough vfs pages in cache that the pages are being written directly to disk
rather than left in the page cache, the CFQ I/O scheduler does not seem to be
able to correctly prioritize I/O requests. I think this might be because when
this high pressure point is reached, all tasks are blocked in the same way when
they try to add pages to the page cache, independently of their I/O priority.
Any idea on how to fix this is welcome.

Related commits :
commit 7658cc289288b8ae7dd2c2224549a048431222b3
Author: Linus Torvalds <torvalds@macmini.osdl.org>
Date:   Fri Dec 29 10:00:58 2006 -0800
    VM: Fix nasty and subtle race in shared mmap'ed page writeback

commit 8c08540f8755c451d8b96ea14cfe796bc3cd712d
Author: Andrew Morton <akpm@osdl.org>
Date:   Sun Dec 10 02:19:24 2006 -0800
    [PATCH] clean up __set_page_dirty_nobuffers()

Both were merged Dec 2006, which is between kernel v2.6.19 and v2.6.20-rc3.

This patch applies on 2.6.29-rc3.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Jens Axboe <jens.axboe@oracle.com>
CC: akpm@linux-foundation.org
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: thomas.pi@arcor.dea
CC: Yuriy Lalym <ylalym@gmail.com>
---
 mm/page-writeback.c |   33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

Index: linux-2.6-lttng/mm/page-writeback.c
===================================================================
--- linux-2.6-lttng.orig/mm/page-writeback.c	2009-02-09 20:18:41.000000000 -0500
+++ linux-2.6-lttng/mm/page-writeback.c	2009-02-09 20:42:39.000000000 -0500
@@ -945,6 +945,7 @@ int write_cache_pages(struct address_spa
 	int cycled;
 	int range_whole = 0;
 	long nr_to_write = wbc->nr_to_write;
+	int lazyaccounting;
 
 	if (wbc->nonblocking && bdi_write_congested(bdi)) {
 		wbc->encountered_congestion = 1;
@@ -1028,10 +1029,18 @@ continue_unlock:
 			}
 
 			BUG_ON(PageWriteback(page));
-			if (!clear_page_dirty_for_io(page))
+			lazyaccounting = clear_page_dirty_for_io(page);
+			if (!lazyaccounting)
 				goto continue_unlock;
 
 			ret = (*writepage)(page, wbc, data);
+
+			if (lazyaccounting == 2) {
+				dec_zone_page_state(page, NR_FILE_DIRTY);
+				dec_bdi_stat(mapping->backing_dev_info,
+						BDI_RECLAIMABLE);
+			}
+
 			if (unlikely(ret)) {
 				if (ret == AOP_WRITEPAGE_ACTIVATE) {
 					unlock_page(page);
@@ -1149,6 +1158,7 @@ int write_one_page(struct page *page, in
 {
 	struct address_space *mapping = page->mapping;
 	int ret = 0;
+	int lazyaccounting;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
 		.nr_to_write = 1,
@@ -1159,7 +1169,8 @@ int write_one_page(struct page *page, in
 	if (wait)
 		wait_on_page_writeback(page);
 
-	if (clear_page_dirty_for_io(page)) {
+	lazyaccounting = clear_page_dirty_for_io(page);
+	if (lazyaccounting) {
 		page_cache_get(page);
 		ret = mapping->a_ops->writepage(page, &wbc);
 		if (ret == 0 && wait) {
@@ -1167,6 +1178,11 @@ int write_one_page(struct page *page, in
 			if (PageError(page))
 				ret = -EIO;
 		}
+		if (lazyaccounting == 2) {
+			dec_zone_page_state(page, NR_FILE_DIRTY);
+			dec_bdi_stat(mapping->backing_dev_info,
+					BDI_RECLAIMABLE);
+		}
 		page_cache_release(page);
 	} else {
 		unlock_page(page);
@@ -1312,6 +1328,11 @@ EXPORT_SYMBOL(set_page_dirty_lock);
  *
  * This incoherency between the page's dirty flag and radix-tree tag is
  * unfortunate, but it only exists while the page is locked.
+ *
+ * Return values :
+ * 0 : page is not dirty
+ * 1 : page is dirty, no lazy accounting update needs to be performed
+ * 2 : page is dirty *and* a lazy accounting update must still be performed
  */
 int clear_page_dirty_for_io(struct page *page)
 {
@@ -1358,12 +1379,8 @@ int clear_page_dirty_for_io(struct page 
 		 * the desired exclusion. See mm/memory.c:do_wp_page()
 		 * for more comments.
 		 */
-		if (TestClearPageDirty(page)) {
-			dec_zone_page_state(page, NR_FILE_DIRTY);
-			dec_bdi_stat(mapping->backing_dev_info,
-					BDI_RECLAIMABLE);
-			return 1;
-		}
+		if (TestClearPageDirty(page))
+			return 2;
 		return 0;
 	}
 	return TestClearPageDirty(page);

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O
  2009-02-10  3:36                   ` [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O Mathieu Desnoyers
@ 2009-02-10  3:55                     ` Nick Piggin
  2009-02-10  5:23                     ` Linus Torvalds
  1 sibling, 0 replies; 39+ messages in thread
From: Nick Piggin @ 2009-02-10  3:55 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: KOSAKI Motohiro, Jens Axboe, akpm, Peter Zijlstra,
	Linus Torvalds, Ingo Molnar, thomas.pi, Yuriy Lalym, ltt-dev,
	linux-kernel, linux-mm

On Tuesday 10 February 2009 14:36:53 Mathieu Desnoyers wrote:
> Related to :
> http://bugzilla.kernel.org/show_bug.cgi?id=12309
>
> Very annoying I/O latencies (20-30 seconds) have been occurring under heavy I/O
> since ~2.6.18.
>
> Yuriy Lalym noticed that the oom killer was eventually called. So I took a
> look at /proc/meminfo and noticed that under my test case (fio job created
> from a LTTng block I/O trace, reproducing dd writing to a 20GB file and ssh
> sessions being opened), the Inactive(file) value increased, and the total
> memory consumed increased until only 80kB (out of 16GB) were left.
>
> So I first used cgroups to limit the memory usable by fio (or dd). This
> seems to fix the problem.
>
> Thomas noted that there seems to be a problem with pages being passed to
> the block I/O elevator not being counted as dirty. I looked at
> clear_page_dirty_for_io and noticed that page_mkclean clears the dirty bit
> and then set_page_dirty(page) is called on the page. This calls
> mm/page-writeback.c:set_page_dirty(). I assume that the
> mapping->a_ops->set_page_dirty is NULL, so it calls
> buffer.c:__set_page_dirty_buffers(). This calls set_buffer_dirty(bh).
>
> So we come back to clear_page_dirty_for_io, where we decrement the dirty
> accounting. This is a problem, because we assume that the block layer will
> re-increment it when it gets the page, but because the buffer is marked as
> dirty, this won't happen.
>
> So this patch fixes this behavior by only decrementing the page accounting
> _after_ the block I/O writepage has been done.
>
> The effect on my workload is that the memory stops being completely filled
> by page cache under heavy I/O. The vfs_cache_pressure value seems to work
> again.

I don't think we're supposed to assume the block layer will re-increment
the dirty count? It should be all in the VM. And the VM should increment
writeback count before sending it to the block device, and dirty page
throttling also takes into account the number of writeback pages, so it
should not be allowed to fill up memory with dirty pages even if the
block device queue size is unlimited.
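
A minimal sketch of the ordering being described (simplified; submit_page_io()
is just a placeholder for whatever hands the page to the block layer, not a
real kernel function):

	if (clear_page_dirty_for_io(page)) {
		set_page_writeback(page);	/* now counted in NR_WRITEBACK */
		submit_page_io(page);		/* hand the page to the block layer */
	}

	/* ... and in the I/O completion path: */
	end_page_writeback(page);		/* drops the NR_WRITEBACK count */

Since the throttling code looks at NR_WRITEBACK as well as NR_FILE_DIRTY, the
page stays visible to it for the whole window.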


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O
  2009-02-10  3:36                   ` [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O Mathieu Desnoyers
  2009-02-10  3:55                     ` Nick Piggin
@ 2009-02-10  5:23                     ` Linus Torvalds
  2009-02-10  5:56                       ` Nick Piggin
  2009-02-10  6:12                       ` Mathieu Desnoyers
  1 sibling, 2 replies; 39+ messages in thread
From: Linus Torvalds @ 2009-02-10  5:23 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: KOSAKI Motohiro, Jens Axboe, akpm, Peter Zijlstra, Ingo Molnar,
	thomas.pi, Yuriy Lalym, ltt-dev, linux-kernel, linux-mm



On Mon, 9 Feb 2009, Mathieu Desnoyers wrote:
> 
> So this patch fixes this behavior by only decrementing the page accounting
> _after_ the block I/O writepage has been done.

This makes no sense, really.

Or rather, I don't mind the notion of updating the counters only after IO 
per se, and _that_ part of it probably makes sense. But why is it that you
then fix up only two of the call-sites? There are a lot more call-sites than
that for this function.

So if this really makes a big difference, that's an interesting starting 
point for discussion, but I don't see how this particular patch could 
possibly be the right thing to do.

			Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O
  2009-02-10  5:23                     ` Linus Torvalds
@ 2009-02-10  5:56                       ` Nick Piggin
  2009-02-10  6:12                       ` Mathieu Desnoyers
  1 sibling, 0 replies; 39+ messages in thread
From: Nick Piggin @ 2009-02-10  5:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mathieu Desnoyers, KOSAKI Motohiro, Jens Axboe, akpm,
	Peter Zijlstra, Ingo Molnar, thomas.pi, Yuriy Lalym, ltt-dev,
	linux-kernel, linux-mm

On Tuesday 10 February 2009 16:23:56 Linus Torvalds wrote:
> On Mon, 9 Feb 2009, Mathieu Desnoyers wrote:
> > So this patch fixes this behavior by only decrementing the page
> > accounting _after_ the block I/O writepage has been done.
>
> This makes no sense, really.
>
> Or rather, I don't mind the notion of updating the counters only after IO
> per se, and _that_ part of it probably makes sense. But why is it that you
> then fix up only two of the call-sites? There are a lot more call-sites than
> that for this function.

Well if you do that, then I'd think you also have to change some
calculations that today use dirty+writeback.

In some ways it does make sense, but OTOH it is natural in the
pagecache, since writeback accounting was introduced to treat writeback
as basically equivalent to dirty. So writeback && !dirty pages shouldn't
cause things to blow up, or if they do then hopefully it is a simple
bug somewhere.
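
For instance, the global throttling check is roughly of this form (from
memory; the real code in balance_dirty_pages() is per-bdi and more involved,
and dirty_thresh here is the global dirty limit):

	unsigned long nr_reclaimable;

	nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
				global_page_state(NR_UNSTABLE_NFS);

	if (nr_reclaimable + global_page_state(NR_WRITEBACK) <= dirty_thresh)
		break;	/* under the global limit, stop throttling */

With the dirty decrement deferred, a page under I/O is counted as both dirty
and writeback for a while, so sums like this one would double-count it.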


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O
  2009-02-10  5:23                     ` Linus Torvalds
  2009-02-10  5:56                       ` Nick Piggin
@ 2009-02-10  6:12                       ` Mathieu Desnoyers
  1 sibling, 0 replies; 39+ messages in thread
From: Mathieu Desnoyers @ 2009-02-10  6:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: KOSAKI Motohiro, Jens Axboe, akpm, Peter Zijlstra, Ingo Molnar,
	thomas.pi, Yuriy Lalym, ltt-dev, linux-kernel, linux-mm

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> 
> 
> On Mon, 9 Feb 2009, Mathieu Desnoyers wrote:
> > 
> > So this patch fixes this behavior by only decrementing the page accounting
> > _after_ the block I/O writepage has been done.
> 
> This makes no sense, really.
> 
> Or rather, I don't mind the notion of updating the counters only after IO 
> per se, and _that_ part of it probably makes sense. But why is it that you 
> then fix up only two of the call-sites? There are a lot more call-sites than 
> that for this function. 
> 
> So if this really makes a big difference, that's an interesting starting 
> point for discussion, but I don't see how this particular patch could 
> possibly be the right thing to do.
> 

Yes, you are right. Looking in more detail at /proc/meminfo under the
workload, I notice this :

MemTotal:       16028812 kB
MemFree:        13651440 kB
Buffers:            8944 kB
Cached:          2209456 kB   <--- increments up to ~16GB

        cached = global_page_state(NR_FILE_PAGES) -
                        total_swapcache_pages - i.bufferram;

SwapCached:            0 kB
Active:            34668 kB
Inactive:        2200668 kB   <--- also

                K(pages[LRU_INACTIVE_ANON] + pages[LRU_INACTIVE_FILE]),

Active(anon):      17136 kB
Inactive(anon):        0 kB
Active(file):      17532 kB
Inactive(file):  2200668 kB   <--- also

                K(pages[LRU_INACTIVE_FILE]),

Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      19535024 kB
SwapFree:       19535024 kB
Dirty:           1159036 kB
Writeback:             0 kB  <--- stays close to 0
AnonPages:         17060 kB
Mapped:             9476 kB
Slab:              96188 kB
SReclaimable:      79776 kB
SUnreclaim:        16412 kB
PageTables:         3364 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    27549428 kB
Committed_AS:      54292 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        9960 kB
VmallocChunk:   34359727667 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7552 kB
DirectMap2M:    16769024 kB

So I think simply subtracting K(pages[LRU_INACTIVE_FILE]) from
avail_dirty in clip_bdi_dirty_limit(), and taking it into account in
balance_dirty_pages() and throttle_vm_writeout(), would probably make my
problem go away, but I would like to understand exactly why this is
needed, and whether I would need to consider other types of page counts
that may have been forgotten.
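
Something like the following untested sketch is what I have in mind for
clip_bdi_dirty_limit(), with the same subtraction mirrored in the two other
paths (exact placement still to be determined) :

	unsigned long inactive_file = global_page_state(NR_INACTIVE_FILE);

	/* do not count inactive file pages as dirtyable headroom */
	if (avail_dirty > inactive_file)
		avail_dirty -= inactive_file;
	else
		avail_dirty = 0;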

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2009-02-10  6:12 UTC | newest]

Thread overview: 39+ messages
2009-01-17  0:44 [Regression] High latency when doing large I/O Mathieu Desnoyers
2009-01-17 16:26 ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers
2009-01-17 16:50   ` Leon Woestenberg
2009-01-17 17:15     ` Mathieu Desnoyers
2009-01-17 19:04   ` Jens Axboe
2009-01-18 21:12     ` Mathieu Desnoyers
2009-01-18 21:27       ` Mathieu Desnoyers
2009-01-19 18:26       ` Jens Axboe
2009-01-20  2:10         ` Mathieu Desnoyers
2009-01-20  7:37           ` Jens Axboe
2009-01-20 12:28             ` Jens Axboe
2009-01-20 14:22               ` [ltt-dev] " Mathieu Desnoyers
2009-01-20 14:24                 ` Jens Axboe
2009-01-20 15:42                   ` Mathieu Desnoyers
2009-01-20 23:06                     ` Mathieu Desnoyers
2009-01-20 23:27               ` Mathieu Desnoyers
2009-01-21  0:25                 ` Mathieu Desnoyers
2009-01-21  4:38                   ` Ben Gamari
2009-01-21  4:54                     ` [ltt-dev] " Mathieu Desnoyers
2009-01-21  6:17                       ` Ben Gamari
2009-01-22 22:59                   ` Mathieu Desnoyers
2009-01-23  3:21                 ` [ltt-dev] " KOSAKI Motohiro
2009-01-23  4:03                   ` Mathieu Desnoyers
2009-02-10  3:36                   ` [PATCH] mm fix page writeback accounting to fix oom condition under heavy I/O Mathieu Desnoyers
2009-02-10  3:55                     ` Nick Piggin
2009-02-10  5:23                     ` Linus Torvalds
2009-02-10  5:56                       ` Nick Piggin
2009-02-10  6:12                       ` Mathieu Desnoyers
2009-02-02  2:08               ` [RFC PATCH] block: Fix bio merge induced high I/O latency Mathieu Desnoyers
2009-02-02 11:26                 ` Jens Axboe
2009-02-03  0:46                   ` Mathieu Desnoyers
2009-01-20 13:45             ` [ltt-dev] " Mathieu Desnoyers
2009-01-20 20:22             ` Ben Gamari
2009-01-20 22:23               ` Ben Gamari
2009-01-20 23:05                 ` Mathieu Desnoyers
2009-01-22  2:35               ` Ben Gamari
2009-01-19 15:45     ` Nikanth K
2009-01-19 18:23       ` Jens Axboe
2009-01-17 20:03   ` Ben Gamari
