From: Andrew Morton <akpm@zip.com.au>
To: Neil Brown <neilb@cse.unsw.edu.au>
Cc: Linus Torvalds <torvalds@transmeta.com>,
Kernel Mailing List <linux-kernel@vger.kernel.org>,
Andrea Arcangeli <andrea@suse.de>
Subject: Re: 2.4.14-pre6
Date: Thu, 01 Nov 2001 12:55:41 -0800
Message-ID: <3BE1B6CD.7DA43A6C@zip.com.au>
In-Reply-To: message from Linus Torvalds on Wednesday October 31, <Pine.LNX.4.33.0110310809200.32460-100000@penguin.transmeta.com> <15329.8658.642254.284398@notabene.cse.unsw.edu.au>
Neil Brown wrote:
>
> ...
> What I would like is that as soon as a buffer was marked "dirty", it
> would get passed down to the driver (or at least to the
> block-device-layer) with something like
> submit_bh(WRITEA, bh);
> i.e. a write ahead. (or is it write-behind...)
> The device handler (the elevator algorithm for normal disks, other
> code for other devices) could keep them ordered in whatever way it
> chooses, and feed them into the queues at some appropriate time.
>
Sounds sensible to me.
In many ways, it's similar to the current scheme when it's used
with an enormous request queue - all writeable blocks in the
system are candidates for request merging. But your proposal
is a whole lot smarter.
In particular, the current kupdate scheme of writing the
dirty block list out in six chunks, five seconds apart
does potentially miss out on a very large number of merging
opportunities. Your proposal would fix that.
Another potential microoptimisation would be to write out
clean blocks if that helps merging. So if we see a write
for blocks 1,2,3,5,6,7 and block 4 is known to be in memory,
then write it out too. I suspect this would be a win for
ATA but a loss for SCSI. Not sure.
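To make the 1,2,3,5,6,7 case concrete, here is a minimal userspace sketch in C. It is not kernel code: count_runs(), fill_single_gaps(), and block_4_cached() are hypothetical names invented for this illustration, and "one contiguous run equals one merged request" is a simplifying assumption about how the elevator would treat the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for "is this block cached and up to date?". */
typedef bool (*in_memory_fn)(unsigned long block);

/* Example predicate for the scenario in the text: only block 4 is cached. */
bool block_4_cached(unsigned long block)
{
    return block == 4;
}

/*
 * Count contiguous runs in a sorted, duplicate-free block list.
 * In this toy model each run stands for one merged write request.
 */
size_t count_runs(const unsigned long *blocks, size_t n)
{
    size_t runs = 0;

    for (size_t i = 0; i < n; i++)
        if (i == 0 || blocks[i] != blocks[i - 1] + 1)
            runs++;
    return runs;
}

/*
 * Copy the sorted, duplicate-free dirty list into out. Whenever exactly
 * one block separates two dirty neighbours and that block is in memory,
 * add the clean block as well so the two runs become one.
 * out must have room for at least 2 * n - 1 entries.
 * Returns the number of blocks stored in out.
 */
size_t fill_single_gaps(const unsigned long *dirty, size_t n,
                        in_memory_fn in_memory,
                        unsigned long *out, size_t out_cap)
{
    size_t len = 0;

    assert(n == 0 || out_cap >= 2 * n - 1);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && dirty[i] == dirty[i - 1] + 2 &&
            in_memory(dirty[i] - 1))
            out[len++] = dirty[i] - 1;  /* clean block bridging the gap */
        out[len++] = dirty[i];
    }
    return len;
}
```

Calling fill_single_gaps() on {1, 2, 3, 5, 6, 7} with block_4_cached as the predicate yields seven blocks, and count_runs() drops from two runs to one. Whether saving a request is worth writing the extra block is exactly the ATA versus SCSI question, which the sketch does not try to answer.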
But I have a gut feel that all this is in the noise floor,
compared to The Big Problem. It's just a matter of identifying
and fixing TBP. Fixing the fdatasync() thing didn't help,
because ext2_write_inode() for a new file has to read the
inode block from disk. Fixing that, by doing an async preread
of the inode's block in ext2_new_inode(), didn't help either,
I suspect because my working set was so large that the VM
tossed out my preread before I got to use it. A few more days
of poking are needed.
Oh. I have a gripe concerning prune_icache(). The design
idea behind keventd is that it's a "process context bottom
half handler". It's used for things like cardbus hotplug
interrupt handlers, handling tty hangups, etc. It should
probably run SCHED_FIFO.
Using keventd to synchronously flush large amounts of
data out to disk constitutes gross abuse - it's being blocked
from performing its designed duties for many seconds. Can we
please not do that? We already have kswapd, kupdate, bdflush,
which should be sufficient.