linux-kernel.vger.kernel.org archive mirror
From: Chris Mason <mason@suse.com>
To: Jens Axboe <axboe@suse.de>
Cc: Andrea Arcangeli <andrea@suse.de>, Andrew Morton <akpm@osdl.org>,
	marcelo@conectiva.com.br, linux-kernel@vger.kernel.org
Subject: Re: RFC on io-stalls patch
Date: 16 Jul 2003 13:06:21 -0400	[thread overview]
Message-ID: <1058375181.4016.147.camel@tiny.suse.com> (raw)
In-Reply-To: <1058278688.4012.57.camel@tiny.suse.com>

[-- Attachment #1: Type: text/plain, Size: 4890 bytes --]

OK, I grabbed Jens' patch from yesterday and made one small change to
limit the total amount of I/O allowed past blk_oversized_batch.  The new
patch is attached as axboe-2l.diff.

We should probably drop the BH_Sync stuff; I need more time to make sure
it is worthwhile.  Also, the patch still needs to be changed to calculate
rlim once instead of in every get_request call, but I wanted to get
some numbers out first.
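As a rough idea of what caching rlim might look like (the read_rlim field
and helper name below are invented for illustration and are not part of
the attached patch), the limit would be computed wherever q->nr_requests
is set, and get_request() would then only compare against it:

/*
 * Illustrative sketch only: precompute the read-request limit instead of
 * recomputing it on every get_request() call.  The read_rlim field and
 * this helper are hypothetical names, not part of the attached patch.
 */
static inline void blk_set_read_rlim(request_queue_t *q)
{
	int rlim = q->nr_requests >> 5;		/* 1/32 of the request pool */

	if (rlim < 4)
		rlim = 4;
	q->read_rlim = rlim;	/* refresh whenever q->nr_requests changes */
}

/* get_request() would then just test rl->pending[READ] > q->read_rlim */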

Starting with yesterday's simple script that combines rio write loads and
tiobench (8 tiotest threads):

2.4.22-axboe2 is Jens' patch (which adds q->pending[rw]); 2.4.22-axboe-2l is
the patch attached here.

Sequential Reads
                 File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier       Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.4.21           1792  4096    8    8.15 3.344%     3.564     9557.91   0.00022  0.00000   244
2.4.22-pre5      1792  4096    8    5.15 2.168%     5.964     2781.71   0.00022  0.00000   237
2.4.22-axboe2    1792  4096    8    5.07 2.119%     6.105     1095.73   0.00000  0.00000   239
2.4.22-axboe-2l  1792  4096    8    5.14 2.211%     5.998      970.74   0.00000  0.00000   232

Random Reads
                 File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier       Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.4.21           1792  4096    8    0.83 0.318%    28.295      529.56   0.00000  0.00000   260
2.4.22-pre5      1792  4096    8    0.53 0.203%    45.613      770.82   0.00000  0.00000   260
2.4.22-axboe2    1792  4096    8    0.66 0.503%    41.791      637.23   0.00000  0.00000   130
2.4.22-axboe-2l  1792  4096    8    0.64 0.163%    46.923      742.38   0.00000  0.00000   391

Sequential Writes
                 File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier       Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.4.21           1792  4096    8   15.51 11.40%     1.243    11776.20   0.01460  0.00109   136
2.4.22-pre5      1792  4096    8   11.99 9.960%     1.695     4628.58   0.00371  0.00000   120
2.4.22-axboe2    1792  4096    8   11.75 9.324%     1.755     2406.28   0.00065  0.00000   126
2.4.22-axboe-2l  1792  4096    8   11.73 9.320%     1.752     2924.61   0.00109  0.00000   126

Random Writes
                 File  Blk   Num                   Avg      Maximum      Lat%     Lat%    CPU
Identifier       Size  Size  Thr   Rate  (CPU%)  Latency    Latency      >2s      >10s    Eff
---------------- ------ ----- ---  ------ ------ --------- -----------  -------- -------- -----
2.4.21           1792  4096    8    0.41 0.318%     0.033       18.81   0.00000  0.00000   130
2.4.22-pre5      1792  4096    8    0.57 1.234%     0.041       37.63   0.00000  0.00000    46
2.4.22-axboe2    1792  4096    8    0.64 1.435%     0.040       29.32   0.00000  0.00000    45
2.4.22-axboe-2l  1792  4096    8    0.64 1.355%     0.033       20.79   0.00000  0.00000    47

So elevator-low-latency helped the max sequential read and write
latencies, while the averages were not improved.  Proper standard
deviation reporting would help here, but tiobench's latency-percentage
columns give at least some indication.  In practice, those max latencies
are noticeable under interactive loads.
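For what it's worth, the bookkeeping for a standard deviation column is
cheap.  A minimal user-space sketch (placeholder sample data, not tied to
tiobench or rio internals; build with -lm) could be:

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* placeholder latency samples, in milliseconds */
	double lat[] = { 3.2, 5.9, 970.7, 4.1, 12.5 };
	int i, n = sizeof(lat) / sizeof(lat[0]);
	double sum = 0.0, sumsq = 0.0, max = 0.0, avg, stddev;
	int over2s = 0, over10s = 0;

	for (i = 0; i < n; i++) {
		sum += lat[i];
		sumsq += lat[i] * lat[i];
		if (lat[i] > max)
			max = lat[i];
		if (lat[i] > 2000.0)
			over2s++;
		if (lat[i] > 10000.0)
			over10s++;
	}
	avg = sum / n;
	stddev = sqrt(sumsq / n - avg * avg);	/* population std deviation */
	printf("avg %.3f stddev %.3f max %.2f >2s %.5f%% >10s %.5f%%\n",
	       avg, stddev, max, 100.0 * over2s / n, 100.0 * over10s / n);
	return 0;
}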

Why did the max sequential write latencies go down with -axboe?  My guess
is that we read the ext2 indirect blocks faster, but that's only a guess.

For a little variety, I measured some read latencies in the face of a
dbench 50 workload (after adding some latency stats to rio).  These were
done on SCSI because my IDE drive wasn't big enough.  You're looking at
the last line from dbench (worthless) and the read throughput/latency
stats from rio (rio was killed right after dbench finished).

2.4.21          Throughput 100.617 MB/sec (NB=125.771 MB/sec  1006.17 MBit/sec)
2.4.22-pre5     Throughput 111.563 MB/sec (NB=139.454 MB/sec  1115.63 MBit/sec)
2.4.21-axboe-2l Throughput 126.298 MB/sec (NB=157.873 MB/sec  1262.98 MBit/sec)
2.4.21          reads: 34.160 MB, 0.512 MB/s
2.4.22-pre5     reads: 34.254 MB, 0.569 MB/s
2.4.21-axboe-2l reads: 47.733 MB, 0.896 MB/s
2.4.21          read latency min 0.00 avg 7.54 max 5742.93 > 2s 0.00% > 10s 0.00%
2.4.22-pre5     read latency min 0.00 avg 6.74 max 7225.02 > 2s 0.00% > 10s 0.00%
2.4.21-axboe-2l read latency min 0.00 avg 4.32 max 2128.92 > 2s 0.00% > 10s 0.00%

Both the dbench throughput numbers and the rio stats were fairly stable
across runs.  As the number of dbench procs increases, vanilla 2.4.21
does worse on latency.
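For reference, the sort of wrapper that produces per-read latency numbers
like the ones above (purely illustrative, not the actual change made to
rio) is just a gettimeofday() pair around each read:

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Illustrative only: time a single read() and return the latency in
 * milliseconds.  The real rio instrumentation is not shown here.
 */
static double timed_read(int fd, void *buf, size_t len)
{
	struct timeval start, end;

	gettimeofday(&start, NULL);
	if (read(fd, buf, len) < 0)
		perror("read");
	gettimeofday(&end, NULL);
	return (end.tv_sec - start.tv_sec) * 1000.0 +
	       (end.tv_usec - start.tv_usec) / 1000.0;
}

/* the caller would fold each returned latency into min/avg/max and the
 * >2s / >10s counters, as in the std deviation sketch above */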

-chris


[-- Attachment #2: axboe-2l.diff --]
[-- Type: text/plain, Size: 4398 bytes --]

===== drivers/block/ll_rw_blk.c 1.47 vs edited =====
--- 1.47/drivers/block/ll_rw_blk.c	Fri Jul 11 04:30:54 2003
+++ edited/drivers/block/ll_rw_blk.c	Wed Jul 16 12:27:24 2003
@@ -458,6 +458,7 @@
   
  	INIT_LIST_HEAD(&q->rq.free);
 	q->rq.count = 0;
+	q->rq.pending[READ] = q->rq.pending[WRITE] = 0;
 	q->nr_requests = 0;
 
 	si_meminfo(&si);
@@ -549,18 +550,35 @@
 static struct request *get_request(request_queue_t *q, int rw)
 {
 	struct request *rq = NULL;
-	struct request_list *rl;
+	struct request_list *rl = &q->rq;
 
-	rl = &q->rq;
-	if (!list_empty(&rl->free) && !blk_oversized_queue(q)) {
+	if (blk_oversized_queue(q)) {
+		int rlim = q->nr_requests >> 5;
+
+		if (rlim < 4)
+			rlim = 4;
+
+		/*
+		 * if its a write, or we have more than a handful of reads
+		 * pending, bail out
+		 */
+		if ((rw == WRITE) || (rw == READ && rl->pending[READ] > rlim))
+			return NULL;
+		if (blk_oversized_queue_reads(q))
+			return NULL;
+	}
+	
+	if (!list_empty(&rl->free)) {
 		rq = blkdev_free_rq(&rl->free);
 		list_del(&rq->queue);
 		rl->count--;
+		rl->pending[rw]++;
 		rq->rq_status = RQ_ACTIVE;
 		rq->cmd = rw;
 		rq->special = NULL;
 		rq->q = q;
 	}
+
 	return rq;
 }
 
@@ -866,13 +884,25 @@
 	 * assume it has free buffers and check waiters
 	 */
 	if (q) {
+		struct request_list *rl = &q->rq;
 		int oversized_batch = 0;
 
 		if (q->can_throttle)
 			oversized_batch = blk_oversized_queue_batch(q);
-		q->rq.count++;
-		list_add(&req->queue, &q->rq.free);
-		if (q->rq.count >= q->batch_requests && !oversized_batch) {
+		rl->count++;
+		/*
+		 * paranoia check
+		 */
+		if (req->cmd == READ || req->cmd == WRITE)
+			rl->pending[req->cmd]--;
+		if (rl->pending[READ] > q->nr_requests)
+			printk("reads: %u\n", rl->pending[READ]);
+		if (rl->pending[WRITE] > q->nr_requests)
+			printk("writes: %u\n", rl->pending[WRITE]);
+		if (rl->pending[READ] + rl->pending[WRITE] > q->nr_requests)
+			printk("too many: %u + %u > %u\n", rl->pending[READ], rl->pending[WRITE], q->nr_requests);
+		list_add(&req->queue, &rl->free);
+		if (rl->count >= q->batch_requests && !oversized_batch) {
 			smp_mb();
 			if (waitqueue_active(&q->wait_for_requests))
 				wake_up(&q->wait_for_requests);
@@ -947,7 +977,7 @@
 static int __make_request(request_queue_t * q, int rw,
 				  struct buffer_head * bh)
 {
-	unsigned int sector, count;
+	unsigned int sector, count, sync;
 	int max_segments = MAX_SEGMENTS;
 	struct request * req, *freereq = NULL;
 	int rw_ahead, max_sectors, el_ret;
@@ -958,6 +988,7 @@
 
 	count = bh->b_size >> 9;
 	sector = bh->b_rsector;
+	sync = test_and_clear_bit(BH_Sync, &bh->b_state);
 
 	rw_ahead = 0;	/* normal case; gets changed below for READA */
 	switch (rw) {
@@ -1124,6 +1155,8 @@
 		blkdev_release_request(freereq);
 	if (should_wake)
 		get_request_wait_wakeup(q, rw);
+	if (sync)
+		__generic_unplug_device(q);
 	spin_unlock_irq(&io_request_lock);
 	return 0;
 end_io:
===== fs/buffer.c 1.89 vs edited =====
--- 1.89/fs/buffer.c	Tue Jun 24 17:11:29 2003
+++ edited/fs/buffer.c	Tue Jul 15 15:31:07 2003
@@ -1142,6 +1142,7 @@
 	bh = getblk(dev, block, size);
 	if (buffer_uptodate(bh))
 		return bh;
+	set_bit(BH_Sync, &bh->b_state);
 	ll_rw_block(READ, 1, &bh);
 	wait_on_buffer(bh);
 	if (buffer_uptodate(bh))
===== include/linux/blkdev.h 1.24 vs edited =====
--- 1.24/include/linux/blkdev.h	Fri Jul  4 13:18:12 2003
+++ edited/include/linux/blkdev.h	Tue Jul 15 16:32:32 2003
@@ -66,6 +66,7 @@
 
 struct request_list {
 	unsigned int count;
+	unsigned int pending[2];
 	struct list_head free;
 };
 
@@ -292,6 +293,13 @@
 {
 	if (q->can_throttle)
 		return atomic_read(&q->nr_sectors) > q->max_queue_sectors;
+	return q->rq.count == 0;
+}
+
+static inline int blk_oversized_queue_reads(request_queue_t * q)
+{
+	if (q->can_throttle)
+		return atomic_read(&q->nr_sectors) > q->max_queue_sectors + q->batch_sectors;
 	return q->rq.count == 0;
 }
 
===== include/linux/fs.h 1.82 vs edited =====
--- 1.82/include/linux/fs.h	Wed Jul  9 14:42:38 2003
+++ edited/include/linux/fs.h	Tue Jul 15 15:31:07 2003
@@ -221,6 +221,7 @@
 	BH_Launder,	/* 1 if we can throttle on this buffer */
 	BH_Attached,	/* 1 if b_inode_buffers is linked into a list */
 	BH_JBD,		/* 1 if it has an attached journal_head */
+	BH_Sync,	/* 1 if the buffer is a sync read */
 
 	BH_PrivateStart,/* not a state bit, but the first bit available
 			 * for private allocation by other entities


Thread overview: 68+ messages
2003-07-08 20:06 RFC on io-stalls patch Marcelo Tosatti
2003-07-10 13:57 ` Jens Axboe
2003-07-11 14:13   ` Chris Mason
2003-07-12  0:20     ` Nick Piggin
2003-07-12 18:37       ` Chris Mason
2003-07-12  7:37     ` Jens Axboe
2003-07-12  7:48       ` Jens Axboe
2003-07-12 18:32       ` Chris Mason
2003-07-13  0:33         ` Andrea Arcangeli
2003-07-13  9:01         ` Jens Axboe
2003-07-13 16:20           ` Chris Mason
2003-07-13 16:45             ` Jeff Garzik
2003-07-13 19:33               ` Andrea Arcangeli
2003-07-13 17:47             ` Jens Axboe
2003-07-13 19:35               ` Andrea Arcangeli
2003-07-14  0:36                 ` Chris Mason
2003-07-13 19:19           ` Andrea Arcangeli
2003-07-14  5:49             ` Jens Axboe
2003-07-14 12:23               ` Marcelo Tosatti
2003-07-14 13:12                 ` Jens Axboe
2003-07-14 19:51                   ` Jens Axboe
2003-07-14 20:09                     ` Chris Mason
2003-07-14 20:19                       ` Andrea Arcangeli
2003-07-14 21:24                         ` Chris Mason
2003-07-15  5:46                       ` Jens Axboe
2003-07-14 20:09                     ` Marcelo Tosatti
2003-07-14 20:24                       ` Andrea Arcangeli
2003-07-14 20:34                         ` Chris Mason
2003-07-15  5:35                           ` Jens Axboe
     [not found]                           ` <20030714224528.GU16313@dualathlon.random>
2003-07-15  5:40                             ` Jens Axboe
     [not found]                             ` <1058229360.13317.364.camel@tiny.suse.com>
2003-07-15  5:43                               ` Jens Axboe
     [not found]                               ` <20030714175238.3eaddd9a.akpm@osdl.org>
     [not found]                                 ` <20030715020706.GC16313@dualathlon.random>
2003-07-15  5:45                                   ` Jens Axboe
2003-07-15  6:01                                     ` Andrea Arcangeli
2003-07-15  6:08                                       ` Jens Axboe
2003-07-15  7:03                                         ` Andrea Arcangeli
2003-07-15  8:28                                           ` Jens Axboe
2003-07-15  9:12                                             ` Chris Mason
2003-07-15  9:17                                               ` Jens Axboe
2003-07-15  9:18                                                 ` Jens Axboe
2003-07-15  9:30                                                   ` Chris Mason
2003-07-15 10:03                                                   ` Andrea Arcangeli
2003-07-15 10:11                                                     ` Jens Axboe
2003-07-15 14:18                                                 ` Chris Mason
2003-07-15 14:29                                                   ` Jens Axboe
2003-07-16 17:06                                                   ` Chris Mason [this message]
2003-07-15  9:22                                               ` Chris Mason
2003-07-15  9:59                                               ` Andrea Arcangeli
2003-07-15  9:48                                             ` Andrea Arcangeli
2003-07-14 20:16                     ` Andrea Arcangeli
2003-07-14 20:17                       ` Marcelo Tosatti
2003-07-14 20:27                         ` Andrea Arcangeli
2003-07-15  5:26                       ` Jens Axboe
2003-07-15  5:48                         ` Andrea Arcangeli
2003-07-15  6:01                           ` Jens Axboe
2003-07-15  6:33                             ` Andrea Arcangeli
2003-07-15 11:22                         ` Alan Cox
2003-07-15 11:27                           ` Jens Axboe
2003-07-16 12:43                             ` Andrea Arcangeli
2003-07-16 12:46                               ` Jens Axboe
2003-07-16 12:59                                 ` Andrea Arcangeli
2003-07-16 13:04                                   ` Jens Axboe
2003-07-16 13:11                                     ` Andrea Arcangeli
2003-07-16 13:21                                       ` Jens Axboe
2003-07-16 13:44                                         ` Andrea Arcangeli
2003-07-16 14:00                                           ` Jens Axboe
2003-07-16 14:24                                             ` Andrea Arcangeli
2003-07-16 16:49                                     ` Andrew Morton
2003-07-15 18:47 Shane Shrybman
