From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Hugh Dickins <hugh@veritas.com>
Cc: Martin Michlmayr <tbm@cyrius.com>,
Marc Haber <mh+linux-kernel@zugschlus.de>,
Jan Kara <jack@suse.cz>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: 2.6.19 file content corruption on ext3
Date: Sat, 16 Dec 2006 22:29:41 +0100 [thread overview]
Message-ID: <1166304581.10372.18.camel@twins> (raw)
In-Reply-To: <Pine.LNX.4.64.0612161909460.25272@blonde.wat.veritas.com>
On Sat, 2006-12-16 at 19:18 +0000, Hugh Dickins wrote:
> On Sat, 16 Dec 2006, Martin Michlmayr wrote:
> > * Marc Haber <mh+linux-kernel@zugschlus.de> [2006-12-09 10:26]:
> > > Unfortunately, I am lacking the knowledge needed to do this in an
> > > informed way. I am neither familiar enough with git nor do I possess
> > > the necessary C powers.
> >
> > I wonder if what you're seein is related to
> > http://lkml.org/lkml/2006/12/16/73
> >
> > You said that you don't see any corruption with 2.6.18. Can you try
> > to apply the patch from
> > http://www2.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d08b3851da41d0ee60851f2c75b118e1f7a5fc89
> > to 2.6.18 to see if the corruption shows up?
>
> I did wonder about the very first hunk of Peter's patch, where the
> mapping->private_lock is unlocked earlier now in try_to_free_buffers,
> before the clear_page_dirty. I'm not at all familiar with that area,
> I wonder if Jan has looked at that change, and might be able to say
> whether it's good or not (earlier he worried about his JBD changes,
> but they wouldn't be implicated if just 2.6.18+Peter's gives trouble).
fs/buffers.c:2775
/*
* try_to_free_buffers() checks if all the buffers on this particular page
* are unused, and releases them if so.
*
* Exclusion against try_to_free_buffers may be obtained by either
* locking the page or by holding its mapping's private_lock.
*
* If the page is dirty but all the buffers are clean then we need to
* be sure to mark the page clean as well. This is because the page
* may be against a block device, and a later reattachment of buffers
* to a dirty page will set *all* buffers dirty. Which would corrupt
* filesystem data on the same device.
*
* The same applies to regular filesystem pages: if all the buffers are
* clean then we set the page clean and proceed. To do that, we require
* total exclusion from __set_page_dirty_buffers(). That is obtained with
* private_lock.
*
* try_to_free_buffers() is non-blocking.
*/
Note the 3th paragraph. Would I have opened up a race by moving that
unlock upwards, such that it is possible to re-attach buffers to the
page before having it marked clean; which according to this text will
mark those buffers dirty and cause data corruption?
Hmm, how to go about something like this:
---
Moving the cleaning of the page out from under the private_lock opened
up a window where newly attached buffer might still see the page dirty
status and were thus marked (incorrectly) dirty themselves; resulting in
filesystem data corruption.
Close this by moving the cleaning of the page inside of the private_lock
scope again. However it is not possible to call page_mkclean() from
within the private_lock (this violates locking order); thus introduce a
variant of test_clear_page_dirty() that does not call page_mkclean() and
call it ourselves when we did do clean the page and call it outside of
the private_lock.
This is still safe because the page is still locked by means of
PG_locked.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
fs/buffer.c | 11 +++++++++--
include/linux/page-flags.h | 1 +
mm/page-writeback.c | 10 ++++++++--
3 files changed, 18 insertions(+), 4 deletions(-)
Index: linux-2.6-git/fs/buffer.c
===================================================================
--- linux-2.6-git.orig/fs/buffer.c 2006-12-16 22:18:24.000000000 +0100
+++ linux-2.6-git/fs/buffer.c 2006-12-16 22:22:17.000000000 +0100
@@ -42,6 +42,7 @@
#include <linux/bitops.h>
#include <linux/mpage.h>
#include <linux/bit_spinlock.h>
+#include <linux/rmap.h>
static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
static void invalidate_bh_lrus(void);
@@ -2832,6 +2833,7 @@ int try_to_free_buffers(struct page *pag
struct address_space * const mapping = page->mapping;
struct buffer_head *buffers_to_free = NULL;
int ret = 0;
+ int must_clean = 0;
BUG_ON(!PageLocked(page));
if (PageWriteback(page))
@@ -2844,7 +2846,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
- spin_unlock(&mapping->private_lock);
if (ret) {
/*
* If the filesystem writes its buffers by hand (eg ext3)
@@ -2858,9 +2859,15 @@ int try_to_free_buffers(struct page *pag
* the page's buffers clean. We discover that here and clean
* the page also.
*/
- if (test_clear_page_dirty(page))
+ if (__test_clear_page_dirty(page, 0)) {
task_io_account_cancelled_write(PAGE_CACHE_SIZE);
+ if (mapping_cap_account_dirty(mapping))
+ must_clean = 1;
+ }
}
+ spin_unlock(&mapping->private_lock);
+ if (must_clean)
+ page_mkclean(page);
out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
Index: linux-2.6-git/include/linux/page-flags.h
===================================================================
--- linux-2.6-git.orig/include/linux/page-flags.h 2006-12-16 22:19:56.000000000 +0100
+++ linux-2.6-git/include/linux/page-flags.h 2006-12-16 22:20:07.000000000 +0100
@@ -253,6 +253,7 @@ static inline void SetPageUptodate(struc
struct page; /* forward declaration */
+int __test_clear_page_dirty(struct page *page, int do_clean);
int test_clear_page_dirty(struct page *page);
int test_clear_page_writeback(struct page *page);
int test_set_page_writeback(struct page *page);
Index: linux-2.6-git/mm/page-writeback.c
===================================================================
--- linux-2.6-git.orig/mm/page-writeback.c 2006-12-16 22:18:18.000000000 +0100
+++ linux-2.6-git/mm/page-writeback.c 2006-12-16 22:19:42.000000000 +0100
@@ -854,7 +854,7 @@ EXPORT_SYMBOL(set_page_dirty_lock);
* Clear a page's dirty flag, while caring for dirty memory accounting.
* Returns true if the page was previously dirty.
*/
-int test_clear_page_dirty(struct page *page)
+int __test_clear_page_dirty(struct page *page, int do_clean)
{
struct address_space *mapping = page_mapping(page);
unsigned long flags;
@@ -872,7 +872,8 @@ int test_clear_page_dirty(struct page *p
* page is locked, which pins the address_space
*/
if (mapping_cap_account_dirty(mapping)) {
- page_mkclean(page);
+ if (do_clean)
+ page_mkclean(page);
dec_zone_page_state(page, NR_FILE_DIRTY);
}
return 1;
@@ -880,6 +881,11 @@ int test_clear_page_dirty(struct page *p
write_unlock_irqrestore(&mapping->tree_lock, flags);
return 0;
}
+
+int test_clear_page_dirty(struct page *page)
+{
+ return __test_clear_page_dirty(page, 1);
+}
EXPORT_SYMBOL(test_clear_page_dirty);
/*
next prev parent reply other threads:[~2006-12-16 21:30 UTC|newest]
Thread overview: 154+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-12-07 15:57 2.6.19 file content corruption on ext3 Marc Haber
2006-12-07 16:50 ` Phillip Susi
2006-12-08 1:38 ` Fernando Luis Vázquez Cao
2006-12-08 16:42 ` Marc Haber
2006-12-09 10:47 ` Jan Kara
2006-12-11 19:07 ` Marc Haber
2006-12-14 12:03 ` Jan Kara
2006-12-15 9:30 ` Marc Haber
2006-12-16 8:29 ` Marc Haber
2006-12-09 23:46 ` Mike Galbraith
2006-12-11 9:31 ` Marc Haber
2006-12-09 9:26 ` Marc Haber
2006-12-16 18:43 ` Martin Michlmayr
2006-12-16 19:18 ` Hugh Dickins
2006-12-16 21:29 ` Peter Zijlstra [this message]
2006-12-16 23:08 ` Hugh Dickins
2006-12-17 13:52 ` Jan Kara
2006-12-22 17:05 ` Marc Haber
2006-12-16 18:31 ` Florian Weimer
2006-12-17 11:52 ` Andrew Morton
2006-12-22 13:30 ` Daniel Drake
2006-12-22 17:03 ` Marc Haber
2006-12-17 0:13 Andrei Popa
2006-12-17 12:06 ` Andrew Morton
2006-12-17 12:19 ` Marc Haber
2006-12-17 12:32 ` Andrei Popa
2006-12-17 13:39 ` Andrei Popa
2006-12-17 23:40 ` Andrew Morton
2006-12-18 1:02 ` Linus Torvalds
2006-12-18 1:22 ` Linus Torvalds
2006-12-18 1:29 ` Linus Torvalds
2006-12-18 1:57 ` Linus Torvalds
2006-12-18 4:51 ` Nick Piggin
2006-12-18 5:43 ` Andrew Morton
2006-12-18 7:22 ` Nick Piggin
2006-12-18 9:18 ` Andrew Morton
2006-12-18 9:26 ` Andrei Popa
2006-12-18 9:42 ` Nick Piggin
2006-12-19 8:51 ` Marc Haber
2006-12-19 9:28 ` Martin Michlmayr
2006-12-28 18:05 ` Marc Haber
2006-12-28 19:00 ` Linus Torvalds
2006-12-28 19:05 ` Petri Kaukasoina
2006-12-28 19:21 ` Linus Torvalds
2006-12-28 19:39 ` Dave Jones
2006-12-28 20:10 ` Arjan van de Ven
2006-12-29 9:23 ` maximilian attems
2006-12-29 15:02 ` Dave Jones
2006-12-29 18:52 ` maximilian attems
2006-12-29 19:14 ` Dave Jones
2006-12-28 21:24 ` Linus Torvalds
2006-12-28 21:36 ` Russell King
2006-12-28 22:37 ` Linus Torvalds
2006-12-28 22:50 ` David Miller
2006-12-28 23:01 ` Linus Torvalds
2006-12-29 1:38 ` Linus Torvalds
2006-12-29 1:59 ` Andrew Morton
2006-12-28 23:36 ` Anton Altaparmakov
2006-12-28 23:54 ` Linus Torvalds
2006-12-29 17:49 ` Guillaume Chazarain
2006-12-18 5:50 ` Linus Torvalds
2006-12-18 7:16 ` Andrew Morton
2006-12-18 7:17 ` Andrew Morton
2006-12-18 9:30 ` Nick Piggin
2006-12-18 7:30 ` Nick Piggin
2006-12-18 9:19 ` Andrei Popa
2006-12-18 9:38 ` Andrew Morton
2006-12-18 10:00 ` Andrei Popa
2006-12-18 10:11 ` Peter Zijlstra
2006-12-18 10:49 ` Andrei Popa
2006-12-18 15:24 ` Gene Heskett
2006-12-18 15:32 ` Peter Zijlstra
2006-12-18 15:47 ` Gene Heskett
2006-12-18 16:55 ` Peter Zijlstra
2006-12-18 18:03 ` Linus Torvalds
2006-12-18 18:24 ` Peter Zijlstra
2006-12-18 18:35 ` Linus Torvalds
2006-12-18 19:04 ` Andrei Popa
2006-12-18 19:10 ` Peter Zijlstra
2006-12-18 19:18 ` Linus Torvalds
2006-12-18 19:44 ` Andrei Popa
2006-12-18 20:14 ` Linus Torvalds
2006-12-18 20:41 ` Linus Torvalds
2006-12-18 21:11 ` Andrei Popa
2006-12-18 22:00 ` Alessandro Suardi
2006-12-18 22:45 ` Linus Torvalds
2006-12-19 0:13 ` Andrei Popa
2006-12-19 0:29 ` Linus Torvalds
2006-12-18 22:32 ` Linus Torvalds
2006-12-18 23:48 ` Andrei Popa
2006-12-19 0:04 ` Linus Torvalds
2006-12-19 0:29 ` Andrei Popa
2006-12-19 0:57 ` Linus Torvalds
2006-12-19 1:21 ` Andrew Morton
2006-12-19 1:44 ` Andrei Popa
2006-12-19 1:54 ` Andrew Morton
2006-12-19 2:04 ` Andrei Popa
2006-12-19 8:05 ` Andrei Popa
2006-12-19 8:24 ` Andrew Morton
2006-12-19 8:34 ` Pekka Enberg
2006-12-19 9:13 ` Marc Haber
2006-12-19 1:50 ` Andrei Popa
2006-12-19 1:03 ` Gene Heskett
2006-12-18 22:34 ` Gene Heskett
2006-12-22 17:27 ` Linus Torvalds
2006-12-18 21:43 ` Andrew Morton
2006-12-18 21:49 ` Peter Zijlstra
2006-12-19 23:42 ` Peter Zijlstra
2006-12-20 0:23 ` Linus Torvalds
2006-12-20 9:01 ` Peter Zijlstra
2006-12-20 9:12 ` Peter Zijlstra
2006-12-20 9:39 ` Arjan van de Ven
2006-12-20 14:27 ` Martin Schwidefsky
2006-12-20 9:32 ` Peter Zijlstra
2006-12-20 14:15 ` Andrei Popa
2006-12-20 14:23 ` Peter Zijlstra
2006-12-20 16:30 ` Andrei Popa
2006-12-20 16:36 ` Peter Zijlstra
2006-12-19 7:38 ` Peter Zijlstra
2006-12-19 4:36 ` Nick Piggin
2006-12-19 6:34 ` Linus Torvalds
2006-12-19 6:51 ` Nick Piggin
2006-12-19 7:26 ` Linus Torvalds
2006-12-19 8:04 ` Linus Torvalds
2006-12-19 9:00 ` Peter Zijlstra
2006-12-19 9:05 ` Peter Zijlstra
[not found] ` <4587B762.2030603@yahoo.com.au>
2006-12-19 10:32 ` Andrew Morton
2006-12-19 10:42 ` Nick Piggin
2006-12-19 10:47 ` Andrew Morton
2006-12-19 10:52 ` Peter Zijlstra
2006-12-19 10:58 ` Nick Piggin
2006-12-19 11:51 ` Peter Zijlstra
2006-12-19 10:55 ` Nick Piggin
2006-12-19 16:51 ` Linus Torvalds
2006-12-19 17:43 ` Linus Torvalds
2006-12-19 18:59 ` Linus Torvalds
2006-12-19 21:30 ` Peter Zijlstra
2006-12-19 22:51 ` Linus Torvalds
2006-12-19 22:58 ` Andrew Morton
2006-12-19 23:06 ` Peter Zijlstra
2006-12-19 23:07 ` Peter Zijlstra
2006-12-20 0:03 ` Linus Torvalds
2006-12-20 0:18 ` Andrew Morton
2006-12-20 18:02 ` Stephen Clark
2006-12-20 5:56 ` Jari Sundell
2006-12-19 21:56 ` Florian Weimer
2006-12-21 13:03 ` Peter Zijlstra
2006-12-21 20:40 ` Andrew Morton
2006-12-19 20:03 ` dean gaudet
2006-12-19 7:22 ` Peter Zijlstra
2006-12-19 7:59 ` Nick Piggin
2006-12-19 8:14 ` Linus Torvalds
2006-12-19 9:40 ` Nick Piggin
2006-12-19 16:46 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1166304581.10372.18.camel@twins \
--to=a.p.zijlstra@chello.nl \
--cc=hugh@veritas.com \
--cc=jack@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mh+linux-kernel@zugschlus.de \
--cc=tbm@cyrius.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).