From: Andrea Arcangeli <andrea@suse.de>
To: Chris Mason <mason@suse.com>
Cc: Jens Axboe <axboe@suse.de>,
	Marcelo Tosatti <marcelo@conectiva.com.br>,
	lkml <linux-kernel@vger.kernel.org>,
	"Stephen C. Tweedie" <sct@redhat.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Jeff Garzik <jgarzik@pobox.com>, Andrew Morton <akpm@digeo.com>,
	Alexander Viro <viro@math.psu.edu>
Subject: Re: RFC on io-stalls patch
Date: Sun, 13 Jul 2003 02:33:07 +0200
Message-ID: <20030713003307.GH16313@dualathlon.random>
In-Reply-To: <1058034751.13318.95.camel@tiny.suse.com>

Hello,

On Sat, Jul 12, 2003 at 02:32:32PM -0400, Chris Mason wrote:
> Anyway, if you've got doubts about the current patch, I'd be happy to
> run a specific benchmark you think will show the bad read
> characteristics.

The main reason I dropped the two queues in elevator-lowlatency is that
the ability to call schedule just once with reads, only for the I/O
completion, was a minor thing compared to having to wait for a potential
32M queue to be flushed to disk before being able to read a byte from
disk. So I didn't want to complicate the code with minor things while
fixing what I considered the major issue (i.e. the overkill queue size
with contiguous writes, and the too small queue size while seeking).

I already attacked that problem years ago with max_bomb_segments
(the dead ioctl ;). You know, at that time I attacked the problem from
the wrong side: rather than limiting the mbytes in the queue like
elevator-lowlatency does, I enforced a max limit on the single request
size, because I had no idea how critical it is to get 512k
requests for each SG DMA operation on any actual storage (the same
mistake that the anticipatory scheduler developers made when they
thought the anticipatory scheduler could in any way obviate the need for
very aggressive readahead).  elevator-lowlatency is the max_bomb thing
done in a way that doesn't hurt contiguous I/O throughput at all, with
very similar latency benefits. Furthermore, elevator-lowlatency allowed
me to grow the number of requests a lot without killing a box with
gigabytes-large I/O queues, so now in the presence of heavily-seeking
512-byte requests (the opposite of 512k per request with contiguous I/O)
many more requests can sit in the elevator, in turn allowing more
aggressive reordering of requests during seeking. That should result in
a performance improvement when seeking (besides the fairness improvement
under a flood of contiguous I/O).
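
As an illustration of limiting the mbytes in the queue rather than the
size of a single request, here is a minimal sketch. It is not the
elevator-lowlatency code: the struct, the function names and the 4M
budget used in the note below are all hypothetical, chosen only to show
how one sector budget admits few large requests but many small ones.

```c
#include <assert.h>

/* Hypothetical model, not kernel code: the queue caps the total number
 * of 512-byte sectors queued, not the number or size of requests. */
struct sketch_queue {
	unsigned long budget_sectors;	/* assumed cap on queued sectors */
	unsigned long queued_sectors;	/* sectors currently queued */
	unsigned long nr_requests;	/* requests currently queued */
};

/* Returns 1 if the request fits in the budget, 0 if the submitter
 * would have to wait for some I/O to complete first. */
static int sketch_enqueue(struct sketch_queue *q, unsigned long sectors)
{
	if (q->queued_sectors + sectors > q->budget_sectors)
		return 0;
	q->queued_sectors += sectors;
	q->nr_requests++;
	return 1;
}

/* Fill an empty queue with equally sized requests and report how many
 * of them were admitted before the budget ran out. */
static unsigned long sketch_fill(unsigned long budget_sectors,
				 unsigned long sectors_per_request)
{
	struct sketch_queue q = { budget_sectors, 0, 0 };

	assert(sectors_per_request > 0);	/* otherwise the loop never ends */
	while (sketch_enqueue(&q, sectors_per_request))
		;
	return q.nr_requests;
}
```

With an assumed budget of 4M (8192 sectors), sketch_fill(8192, 1024)
admits 8 requests of 512k each, while sketch_fill(8192, 1) admits 8192
requests of 512 bytes each: the contiguous case keeps its full-size
requests, and the seeking case gets many more requests to reorder.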

Having two queues could allow a reader to sleep just once, while this
way it also has to wait for batch_sectors before being able to queue. So
I think all it could do is basically save some cpu (one less schedule),
and I doubt it would be noticeable.

And in general I don't much like assumptions that consider reads
different from writes; writes can be latency critical too with fsync
(especially with journaling).  In fact, if it was just the sync-reads
that we cared about I think read-latency2 from Andrew would already work
well, and it's much less invasive, but I consider that a dirty hack
compared to elevator-lowlatency, which fixes stuff for sync writes too,
as well as sync reads, without assuming special workloads. read-latency2
basically makes a very hardcoded assumption that writes aren't latency
critical, and it tries to hide the effect of the overkill size of the
queue by simply putting all reads near the head of the queue, no matter
if the queue size is 1m or 10m or 100m.  The whole point of
elevator-lowlatency is not to add read-hacks that assume only reads are
latency critical. And of course an overkill queue size isn't good for
the VM system either, since that memory is unfreeable and it could make
it much harder for the VM to be nice with apps allocating ram during
write throttling etc.
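
The asymmetry described above can be put in numbers with a toy model.
This is only a sketch under simplifying assumptions: "read-first" models
reads as going straight to the head of an unbounded queue, which is
cruder than what read-latency2 really does, and "bounded" models nothing
more than a cap on the queued data. All names are hypothetical.

```c
/* Hypothetical direction tag for a newly submitted sync request. */
enum sketch_dir { SKETCH_READ, SKETCH_WRITE };

/* Read-first policy: a read jumps ahead of everything already queued,
 * a write waits behind the whole queue, however large it has grown.
 * Returns the kbytes that must hit the disk before the new request. */
static unsigned long sketch_ahead_read_first(unsigned long queued_kb,
					     enum sketch_dir dir)
{
	return dir == SKETCH_READ ? 0 : queued_kb;
}

/* Bounded policy: reads and writes are treated alike, but the queue can
 * never hold more than budget_kb, so that is the worst case for both. */
static unsigned long sketch_ahead_bounded(unsigned long queued_kb,
					  unsigned long budget_kb,
					  enum sketch_dir dir)
{
	(void)dir;	/* direction deliberately ignored */
	return queued_kb < budget_kb ? queued_kb : budget_kb;
}
```

With 100m (102400 kbytes) of writes pending, the read-first model puts
0 kbytes ahead of a sync read but all 102400 kbytes ahead of an fsync
write; with an assumed 4m budget the bounded model puts at most 4096
kbytes ahead of either one.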

Overall I'm not against resurrecting the specialized read queue; after
all, writes also get a special queue (so one could claim that's an
optimization for sync-writes, not sync-reads, it's very symmetric ;), but
conceptually I don't find it very worthwhile. So I would prefer to have
a benchmark, as you suggested, before complicating things in mainline
(and as you can see, it's now a bit more tricky to retain the two
queues ;).

Andrea
