From: Chris Mason <mason@suse.com>
To: Nick Piggin <piggin@cyberone.com.au>
Cc: Andrea Arcangeli <andrea@suse.de>,
Marc-Christian Petersen <m.c.p@wolk-project.de>,
Jens Axboe <axboe@suse.de>,
Marcelo Tosatti <marcelo@conectiva.com.br>,
Georg Nikodym <georgn@somanetworks.com>,
lkml <linux-kernel@vger.kernel.org>,
Matthias Mueller <matthias.mueller@rz.uni-karlsruhe.de>
Subject: Re: [PATCH] io stalls
Date: 12 Jun 2003 12:06:43 -0400
Message-ID: <1055434002.23697.431.camel@tiny.suse.com>
In-Reply-To: <3EE7F18C.3010502@cyberone.com.au>
[-- Attachment #1: Type: text/plain, Size: 1782 bytes --]
On Wed, 2003-06-11 at 23:20, Nick Piggin wrote:
>
> I think the cpu utilization gain of waking a number of tasks
> at once would be outweighed by advantage of waking 1 task
> and not putting it to sleep again for a number of requests.
> You obviously are not claiming concurrency improvements, as
> your method would also increase contention on the io lock
> (or the queue lock in 2.5).
I've been trying variations on this for a few days. None have been
thrilling, but the end result is better dbench and iozone throughput
overall. For the 20 writer iozone test, rc7 got an average throughput
of 3MB/s, and yesterday's latency patch got 500k/s or so. Ouch.
This gets us up to 1.2MB/s. I'm keeping yesterday's
get_request_wait_wakeup, which wakes up a waiter instead of unplugging.

The basic idea here is that after a process is woken up and grabs a
request, he becomes the batch owner. Batch owners get to ignore the
q->full flag for either 1/5 second or 32 requests, whichever comes
first. The timer part is an attempt at preventing memory pressure
writers (who go 1 req at a time) from holding onto batch ownership for
too long. Latency stats after dbench 50:
device 08:01: num_req 120077, total jiffies waited 663231
65538 forced to wait
1 min wait, 175 max wait
10 average wait
65296 < 100, 242 < 200, 0 < 300, 0 < 400, 0 < 500
0 waits longer than 500 jiffies
Good system-wide latency comes from fair waiting, but it also comes from
how fast we can run write_some_buffers(), since that is the unit of
throttling. Hopefully this patch decreases the time it takes for
write_some_buffers() compared with the past latency patches, or gives
someone else a better idea ;-)
Attached is an incremental over yesterday's io-stalls-5.diff.
-chris
[-- Attachment #2: io-stalls-6-inc.diff --]
[-- Type: text/plain, Size: 3421 bytes --]
diff -u edited/drivers/block/ll_rw_blk.c edited/drivers/block/ll_rw_blk.c
--- edited/drivers/block/ll_rw_blk.c Wed Jun 11 13:36:10 2003
+++ edited/drivers/block/ll_rw_blk.c Thu Jun 12 11:53:03 2003
@@ -437,6 +437,12 @@
 	nr_requests = 128;
 	if (megs < 32)
 		nr_requests /= 2;
+	q->batch_owner[0] = NULL;
+	q->batch_owner[1] = NULL;
+	q->batch_remaining[0] = 0;
+	q->batch_remaining[1] = 0;
+	q->batch_jiffies[0] = 0;
+	q->batch_jiffies[1] = 0;
 	blk_grow_request_list(q, nr_requests);
 	init_waitqueue_head(&q->wait_for_requests[0]);
@@ -558,6 +564,31 @@
 	blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH);
 }
+#define BATCH_JIFFIES (HZ/5)
+static void check_batch_owner(request_queue_t *q, int rw)
+{
+	if (q->batch_owner[rw] != current)
+		return;
+	if (--q->batch_remaining[rw] > 0 &&
+	    jiffies - q->batch_jiffies[rw] < BATCH_JIFFIES) {
+		return;
+	}
+	q->batch_owner[rw] = NULL;
+}
+
+static void set_batch_owner(request_queue_t *q, int rw)
+{
+	struct task_struct *tsk = current;
+	if (q->batch_owner[rw] == tsk)
+		return;
+	if (q->batch_owner[rw] &&
+	    jiffies - q->batch_jiffies[rw] < BATCH_JIFFIES)
+		return;
+	q->batch_jiffies[rw] = jiffies;
+	q->batch_owner[rw] = current;
+	q->batch_remaining[rw] = q->batch_requests;
+}
+
 #define blkdev_free_rq(list) list_entry((list)->next, struct request, queue);
 /*
  * Get a free request. io_request_lock must be held and interrupts
@@ -587,9 +618,13 @@
  */
 static inline struct request *get_request(request_queue_t *q, int rw)
 {
-	if (queue_full(q, rw))
+	struct request *rq;
+	if (queue_full(q, rw) && q->batch_owner[rw] != current)
 		return NULL;
-	return __get_request(q, rw);
+	rq = __get_request(q, rw);
+	if (rq)
+		check_batch_owner(q, rw);
+	return rq;
 }
 /*
@@ -657,9 +692,9 @@
 	add_wait_queue_exclusive(&q->wait_for_requests[rw], &wait);
+	spin_lock_irq(&io_request_lock);
 	do {
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		spin_lock_irq(&io_request_lock);
 		if (queue_full(q, rw) || q->rq[rw].count == 0) {
 			if (q->rq[rw].count == 0)
 				__generic_unplug_device(q);
@@ -668,8 +703,9 @@
 			spin_lock_irq(&io_request_lock);
 		}
 		rq = __get_request(q, rw);
-		spin_unlock_irq(&io_request_lock);
 	} while (rq == NULL);
+	set_batch_owner(q, rw);
+	spin_unlock_irq(&io_request_lock);
 	remove_wait_queue(&q->wait_for_requests[rw], &wait);
 	current->state = TASK_RUNNING;
@@ -1010,6 +1046,7 @@
 	struct list_head *head, *insert_here;
 	int latency;
 	elevator_t *elevator = &q->elevator;
+	int need_unplug = 0;
 	count = bh->b_size >> 9;
 	sector = bh->b_rsector;
@@ -1145,8 +1182,8 @@
 				spin_unlock_irq(&io_request_lock);
 				freereq = __get_request_wait(q, rw);
 				head = &q->queue_head;
+				need_unplug = 1;
 				spin_lock_irq(&io_request_lock);
-				get_request_wait_wakeup(q, rw);
 				goto again;
 			}
 		}
@@ -1174,6 +1211,8 @@
 out:
 	if (freereq)
 		blkdev_release_request(freereq);
+	if (need_unplug)
+		get_request_wait_wakeup(q, rw);
 	spin_unlock_irq(&io_request_lock);
 	return 0;
 end_io:
diff -u edited/include/linux/blkdev.h edited/include/linux/blkdev.h
--- edited/include/linux/blkdev.h Wed Jun 11 09:56:55 2003
+++ edited/include/linux/blkdev.h Thu Jun 12 09:44:26 2003
@@ -92,6 +92,10 @@
 	 */
 	int batch_requests;
+	struct task_struct *batch_owner[2];
+	int batch_remaining[2];
+	unsigned long batch_jiffies[2];
+
 	/*
 	 * Together with queue_head for cacheline sharing
 	 */