From: Andrew Morton <akpm@osdl.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
David Miller <davem@davemloft.net>,
nickpiggin@yahoo.com.au, kenneth.w.chen@intel.com,
guichaz@yahoo.fr, hugh@veritas.com,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
ranma@tdiedrich.de, gordonfarquharson@gmail.com,
a.p.zijlstra@chello.nl, tbm@cyrius.com, arjan@infradead.org,
andrei.popa@i-neo.ro
Subject: Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)
Date: Fri, 29 Dec 2006 15:51:18 -0800
Message-ID: <20061229155118.3feb0c17.akpm@osdl.org>
In-Reply-To: <Pine.LNX.4.64.0612291431200.4473@woody.osdl.org>

On Fri, 29 Dec 2006 14:42:51 -0800 (PST)
Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Fri, 29 Dec 2006, Andrew Morton wrote:
> >
> > - The above change means that we do extra writeout. If a page is dirtied
> > once, kjournald will write it and then pdflush will come along and
> > needlessly write it again.
>
> There's zero extra writeout for any flushing that flushes BY PAGES.
>
> Only broken flushers that flush by buffer heads (which really really
> really shouldn't be done any more: welcome to the 21st century) will cause
> extra writeouts. And those extra writeouts are obviously required for all
> the dirty state to actually hit the disk - which is the point of the
> patch.
>
> So they're not "extra" - they are "required for correct working".

They're extra. As in "can be optimised away".

> But I can't stress the fact enough that people SHOULD NOT do writeback by
> buffer heads. The buffer head has been purely an "IO entity" for the last
> several years now, and it's not a cache entity.

The buffer_head is not an IO container. It is the kernel's core
representation of a disk block. Usually (but not always) it is backed by
some memory which is in pagecache. We can feed buffer_heads into IO
containers via submit_bh(), but that's far from the only thing we use
buffer_heads for. We should have done s/buffer_head/block/g years ago.

JBD implements physical block-based journalling, so it is 100% appropriate
that JBD deal with these disk blocks using their buffer_head
representation.

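To make the "a buffer_head represents a disk block, usually backed by
pagecache memory" view concrete, here is a minimal userspace sketch in C.
It is a toy model, not kernel code: toy_page, toy_block and
toy_block_for() are hypothetical names, the 4096/1024 sizes are arbitrary,
and it assumes the blocks under one page are contiguous on disk, which
real filesystems do not guarantee.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE  4096   /* assumed page size, for illustration only */
#define TOY_BLOCK_SIZE 1024   /* assumed filesystem block size */

/* Toy stand-in for a pagecache page: just the memory. */
struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
};

/* Toy stand-in for a buffer_head: names a disk block and the memory backing it. */
struct toy_block {
    unsigned long long blocknr;  /* which block on disk */
    size_t size;                 /* block size in bytes */
    unsigned char *data;         /* points into the page's memory */
};

/*
 * Describe the index'th block of a page. Assumes (hypothetically) that the
 * page's blocks are laid out contiguously on disk starting at first_blocknr.
 */
static struct toy_block toy_block_for(struct toy_page *page, size_t index,
                                      unsigned long long first_blocknr)
{
    assert(page != NULL);
    assert(index < TOY_PAGE_SIZE / TOY_BLOCK_SIZE);

    struct toy_block blk = {
        .blocknr = first_blocknr + index,
        .size    = TOY_BLOCK_SIZE,
        .data    = page->data + index * TOY_BLOCK_SIZE,
    };
    return blk;
}
```

The point of the sketch is only that the block descriptor and the page
share the same memory: the block is a view onto part of the page, tagged
with an on-disk location.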
That being said, ordered-data mode isn't really part of the JBD journalling
system at all (the data doesn't get journalled!) - ordered-mode is an
add-on to the JBD journal to make the metadata which we're about to journal
point at more-likely-to-be-correct data.

JBD's ordered-mode writeback is just a sync and I see no conceptual
problems with killing its old buffer_head based sync and moving it into the
21st century.

> Anybody who does writeback
> by buffer heads is basically bypassing the real cache (the page cache),
> and that's why all the problems happen.
>
> I think ext3 is terminally crap by now. It still uses buffer heads in
> places where it really really shouldn't,

The ordered-data mode flush: sure. The rest of JBD's use of buffer_heads
is quite appropriate.

> and as a result, things like
> directory accesses are simply slower than they should be. Sadly, I don't
> think ext4 is going to fix any of this, either.

I thought I fixed the performance problem?
Somewhat nastily, but as ext3 directories are metadata it is appropriate
that modifications to them be done in terms of buffer_heads (ie: blocks).

> It's all just too inherently wrongly designed around the buffer head
> (which was correct in 1995, but hasn't been correct for a long time in the
> kernel any more).
>
> > - Poor old IO accounting broke again.
>
> No. That's why I used "set_page_dirty()" and did it that strange ugly way
> ("set page dirty, even though it's already dirty, and even though the very
> next thing we will do is TestClearPageDirty???").

nfs_set_page_dirty() and reiserfs_set_page_dirty() should now bail if
PageDirty() to avoid needless work.

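The "bail if already dirty" suggestion can be sketched in userspace C as
below. This is a toy model under stated assumptions, not the NFS or
reiserfs code: toy_page, its expensive_calls counter and
toy_fs_set_page_dirty() are hypothetical names, and the counter merely
stands in for whatever per-filesystem work a real hook would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; not the kernel's definition. */
struct toy_page {
    bool dirty;            /* models the page's dirty flag */
    int  expensive_calls;  /* counts how often the slow path ran */
};

/*
 * Hypothetical filesystem-specific set_page_dirty hook.
 * Returns 1 if the page was newly dirtied, 0 if it was already dirty.
 */
static int toy_fs_set_page_dirty(struct toy_page *page)
{
    assert(page != NULL);

    if (page->dirty)
        return 0;              /* already dirty: skip the needless work */

    page->expensive_calls++;   /* stands in for per-filesystem bookkeeping */
    page->dirty = true;
    return 1;
}
```

With this shape, calling the hook again on a page that is already dirty
costs one flag test and nothing more.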
> > - For a long time I've wanted to nuke the current ext3/jbd ordered-data
> > implementation altogether, and just make kjournald call into the
> > standard writeback code to do a standard suberblock->inodes->pages walk.
>
> I really would like to see less of the buffer-head-based stuff, and yes,
> more of the normal inode page walking. I don't think you can "order"
> accesses within a page anyway, exactly because of memory mapping issues,
> so any page ordering is not about buffer heads on the page itself, it
> should be purely about metadata.

In this context ext3's "ordered" mode means "sync the file contents before
journalling the metadata which points at it".

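The ordering described above (file data first, then the metadata that
points at it) can be illustrated with a small userspace C sketch. It is a
toy model only: toy_log, toy_record() and toy_ordered_commit() are
hypothetical names, and an in-memory event list stands in for real disk
writes and journal commits.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_EVENTS 8

enum toy_event {
    TOY_DATA_WRITTEN,         /* a dirty data page reached the disk */
    TOY_METADATA_JOURNALLED   /* the metadata was written to the journal */
};

/* Records the order in which things happened. */
struct toy_log {
    enum toy_event events[TOY_MAX_EVENTS];
    size_t count;
};

static void toy_record(struct toy_log *log, enum toy_event ev)
{
    assert(log != NULL);
    assert(log->count < TOY_MAX_EVENTS);
    log->events[log->count++] = ev;
}

/*
 * Hypothetical "ordered" commit: sync every dirty data page first, and only
 * then journal the metadata that points at that data.
 */
static void toy_ordered_commit(struct toy_log *log, size_t dirty_data_pages)
{
    for (size_t i = 0; i < dirty_data_pages; i++)
        toy_record(log, TOY_DATA_WRITTEN);
    toy_record(log, TOY_METADATA_JOURNALLED);
}
```

The invariant the sketch captures is that the metadata event is always
the last one recorded for a commit.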
> > - It's pretty obnoxious that the VM now sets a clean page "dirty" and
> > then proceeds to modify its contents. It would be nice to stop doing
> > that.
>
> No. I think this really is the fundamental confusion people had. People
> thought that setting the page dirty meant that it was no longer being
> modified.

No. Setting a page (or bh, or inode) dirty means "this is known to have
been modified". ie: this cached entity is now out of sync with backing
store.

Ho hum. I don't care much, really. But then, I understand how all this
stuff works. Try explaining to someone the relationship between
pte-dirtiness, page-dirtiness, radix-tree-dirtiness and
buffer_head-dirtiness.
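
One way to picture the four kinds of dirtiness named above is the
userspace C sketch below. It is a simplified toy model, not kernel code:
toy_page and both functions are hypothetical names, it assumes four
blocks per page, and it assumes a buffer-based set_page_dirty that marks
every block under the page dirty. Real kernels differ in detail and have
changed over time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BLOCKS_PER_PAGE 4   /* assumed, for illustration only */

struct toy_page {
    bool pte_dirty;                      /* a write went through a mapping */
    bool page_dirty;                     /* models the page's dirty flag */
    bool tree_tag_dirty;                 /* models the dirty tag in the mapping's radix tree */
    bool bh_dirty[TOY_BLOCKS_PER_PAGE];  /* one dirty bit per block (buffer_head) */
};

/* Models a buffer-based set_page_dirty: dirty every block, the page, and the tag. */
static void toy_set_page_dirty(struct toy_page *page)
{
    assert(page != NULL);
    for (int i = 0; i < TOY_BLOCKS_PER_PAGE; i++)
        page->bh_dirty[i] = true;
    page->page_dirty = true;
    page->tree_tag_dirty = true;   /* lets writeback find the page later */
}

/* Models cleaning the pte: its dirty bit is transferred to the page. */
static void toy_transfer_pte_dirty(struct toy_page *page)
{
    assert(page != NULL);
    if (page->pte_dirty) {
        page->pte_dirty = false;
        toy_set_page_dirty(page);
    }
}
```

In this model the pte bit records that a write happened, and the other
three bits record, at different granularities, that cached state is out
of sync with backing store.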