From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752317AbWL2MXe (ORCPT ); Fri, 29 Dec 2006 07:23:34 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752326AbWL2MXe (ORCPT ); Fri, 29 Dec 2006 07:23:34 -0500 Received: from mx2.mail.elte.hu ([157.181.151.9]:57246 "EHLO mx2.mail.elte.hu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752317AbWL2MXd (ORCPT ); Fri, 29 Dec 2006 07:23:33 -0500 Date: Fri, 29 Dec 2006 13:19:46 +0100 From: Ingo Molnar To: Linus Torvalds Cc: Segher Boessenkool , David Miller , nickpiggin@yahoo.com.au, kenneth.w.chen@intel.com, guichaz@yahoo.fr, hugh@veritas.com, linux-kernel@vger.kernel.org, ranma@tdiedrich.de, gordonfarquharson@gmail.com, akpm@osdl.org, a.p.zijlstra@chello.nl, tbm@cyrius.com, arjan@infradead.org, andrei.popa@i-neo.ro Subject: [patch] fix data corruption bug in __block_write_full_page() Message-ID: <20061229121946.GA17837@elte.hu> References: <20061228114517.3315aee7.akpm@osdl.org> <20061228.143815.41633302.davem@davemloft.net> <3d6d8711f7b892a11801d43c5996ebdf@kernel.crashing.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.2i X-ELTE-VirusStatus: clean X-ELTE-SpamScore: -2.6 X-ELTE-SpamLevel: X-ELTE-SpamCheck: no X-ELTE-SpamVersion: ELTE 2.0 X-ELTE-SpamCheck-Details: score=-2.6 required=5.9 tests=BAYES_00 autolearn=no SpamAssassin version=3.0.3 -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000] Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org * Linus Torvalds wrote: > I do have a few interesting details from the trace I haven't really > analyzed yet. Here's the trace for events on one of the pages that was > corrupted. Note how the events are numbered (there were 171640 events > total), so the thing you see is just a small set of events from the > whole big trace, but it's the ones that talk about _that_ particular > page. i've extended the tracer in -rt to trace all relevant pagetable, pagecache, buffer-cache and IO events and coupled the tracer to your test.c code. The corruption happens here: test-2126 0.... 3756170us+: trace_page (cf20ebd8 b6a2c000 0) pdflush-2006 0.... 6432909us+: trace_page (cf20ebd8 b6a2c000 4200420) test-2126 0.... 8135596us+: trace_page (cf20ebd8 b6a2c000 4200420) test-2126 0D... 9012933us+: do_page_fault (8048900 4 b6a2c000) test-2126 0.... 9023278us+: trace_page (cf262f24 b6a2c000 0) test-2126 0.... 9023305us > sys_prctl (000000d8 b6a2c000 000000ac) address 0xb6a2c000 is the one that shows the corruption. Now, this address is mapped to page cf262f24 when the bug happened, but it had page 0xcf20ebd8 mapped to it 3 seconds ago, which has this history: test-2126 0.... 3756413us+: trace_page (cf20ebd8 0 0) test-2126 0.... 3756469us+: trace_page (cf20ebd8 0 0) test-2126 0.... 3757341us+: trace_page (cf20ebd8 10 0) IRQ-14-402 0.... 3759332us+: trace_page (cf20ebd8 ffffffff 0) IRQ-14-402 0.... 3759376us+: trace_page (cf20ebd8 ffffffff 0) test-2126 0.... 5104662us+: trace_page (cf20ebd8 b6a2c400 0) test-2126 0.... 5104687us+: trace_page (cf20ebd8 1 0) pdflush-2006 0.... 6432909us+: trace_page (cf20ebd8 b6a2c000 4200420) pdflush-2006 0.... 6432952us+: trace_page (cf20ebd8 ffffffff 4200420) pdflush-2006 0.... 6432986us+: trace_page (cf20ebd8 1 4200420) pdflush-2006 0.... 6433022us+: trace_page (cf20ebd8 4096 4200420) pdflush-2006 0.... 6433061us+: trace_page (cf20ebd8 0 4200420) pdflush-2006 0.... 6433112us+: trace_page (cf20ebd8 0 4200420) pdflush-2006 0.... 6433154us+: trace_page (cf20ebd8 0 4200420) pdflush-2006 0.... 6433303us+: trace_page (cf20ebd8 11 4200420) pdflush-2006 0.... 6433343us+: trace_page (cf20ebd8 13 4200420) pdflush-2006 0.... 6433382us+: trace_page (cf20ebd8 14 4200420) pdflush-2006 0.... 6433421us+: trace_page (cf20ebd8 15 4200420) pdflush-2006 0.... 6433460us+: trace_page (cf20ebd8 ffffffff 4200420) pdflush-2006 0.... 6433504us+: trace_page (cf20ebd8 ffffffff 4200420) test-2126 0.... 8135596us+: trace_page (cf20ebd8 b6a2c000 4200420) in particular timestamp 6433421us is interesting: pdflush-2006 0.... 6433504us+: trace_page (cf20ebd8 ffffffff 4200420) pdflush-2006 0.... 6433526us : trace_page()<-test_clear_page_writeback()<-end_page_writeback()<-__block_write_full_page() pdflush-2006 0.... 6433526us+: block_write_full_page()<-ext3_ordered_writepage()<-generic_writepages()<-(-1)() i.e. the page got its pending writeback cancelled in block_write_full_page(), without any IRQ#14 activity whatsoever! That looks quite suspect. It is this piece of code in __block_write_full_page(): /* * The page was marked dirty, but the buffers were * clean. Someone wrote them back by hand with * ll_rw_block/submit_bh. A rare case. */ .... if (uptodate) SetPageUptodate(page); end_page_writeback(page); A 'rare case' ... hm. So i tried a quick workaround below, just to keep us from marking the page clean, to see whether the corruption goes away - and i was unable to trigger the corruption after half an hour of testing, while before it triggered within 10 seconds! now this patch is only an ugly hack, but the bug definitely seems to be related to buffer management, as you suspected. Ingo --- fs/buffer.c | 1 + 1 file changed, 1 insertion(+) Index: linux/fs/buffer.c =================================================================== --- linux.orig/fs/buffer.c +++ linux/fs/buffer.c @@ -1702,6 +1702,7 @@ done: } while (bh != head); if (uptodate) SetPageUptodate(page); + set_page_dirty(page); end_page_writeback(page); /* * The page and buffer_heads can be released at any time from