From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758007Ab0KPLNp (ORCPT );
	Tue, 16 Nov 2010 06:13:45 -0500
Received: from smtp-out-143.synserver.de ([212.40.180.143]:1036 "HELO
	smtp-out-142.synserver.de" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with SMTP id S1757318Ab0KPLNn (ORCPT );
	Tue, 16 Nov 2010 06:13:43 -0500
X-SynServer-TrustedSrc: 1
X-SynServer-AuthUser: markus@trippelsdorf.de
X-SynServer-PPID: 7564
Date: Tue, 16 Nov 2010 12:13:39 +0100
From: Markus Trippelsdorf
To: Hugh Dickins
Cc: Christoph Lameter , linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, "Theodore Ts'o" ,
	linux-ext4@vger.kernel.org
Subject: [PATCH] ext4 Fix setting random pages PageUptodate
Message-ID: <20101116111339.GA1544@arch.trippelsdorf.de>
References: <20101110152519.GA1626@arch.trippelsdorf.de>
	<20101110154057.GA2191@arch.trippelsdorf.de>
	<20101112122003.GA1572@arch.trippelsdorf.de>
	<20101115123846.GA30047@arch.trippelsdorf.de>
	<20101115195439.GA1569@arch.trippelsdorf.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2010.11.15 at 13:23 -0800, Hugh Dickins wrote:
> On Mon, Nov 15, 2010 at 11:54 AM, Markus Trippelsdorf
> wrote:
> > On 2010.11.15 at 13:38 +0100, Markus Trippelsdorf wrote:
> >> On 2010.11.12 at 13:20 +0100, Markus Trippelsdorf wrote:
> >> >
> >> > Yes. Fortunately the BUG is gone since I pulled the upcoming drm fixes
> >>
> >> No. I happend again today (with those fixes already applied):
> >>
> >> BUG: Bad page state in process knode  pfn:7f0a8
> >> page:ffffea0001bca4c0 count:0 mapcount:0 mapping:          (null) index:0x0
> >> page flags: 0x4000000000000008(uptodate)
> >> Pid: 18310, comm: knode Not tainted 2.6.37-rc1-00549-gae712bf-dirty #16
> >> Call Trace:
> >>  [] ? bad_page+0x92/0xe0
> >>  [] ? get_page_from_freelist+0x4b0/0x570
> >>  [] ? apic_timer_interrupt+0xe/0x20
> >>  [] ? __alloc_pages_nodemask+0x113/0x6b0
> >>  [] ? file_read_actor+0xc4/0x190
> >>  [] ? generic_file_aio_read+0x560/0x6b0
> >>  [] ? handle_mm_fault+0x6bd/0x970
> >>  [] ? do_page_fault+0x120/0x410
> >>  [] ? do_brk+0x275/0x360
> >>  [] ? page_fault+0x1f/0x30
> >> Disabling lock debugging due to kernel taint
> >
> > And another one. But this time it seems to point to ext4:
> >
> > BUG: Bad page state in process rm  pfn:52e54
> > page:ffffea0001222260 count:0 mapcount:0 mapping:          (null) index:0x0
> > page flags: 0x4000000000000008(uptodate)
> > Pid: 2084, comm: rm Not tainted 2.6.37-rc1-00549-gae712bf-dirty #23
> > Call Trace:
> >  [] ? bad_page+0x92/0xe0
> >  [] ? get_page_from_freelist+0x4b0/0x570
> >  [] ? ext4_ext_put_in_cache+0x46/0x90
> >  [] ? __alloc_pages_nodemask+0x113/0x6b0
> >  [] ? number.clone.2+0x2b7/0x2f0
> >  [] ? find_get_page+0x75/0xb0
> >  [] ? find_or_create_page+0x51/0xb0
> >  [] ? __getblk+0xd7/0x260
> >  [] ? ext4_getblk+0x8f/0x1e0
> >  [] ? ext4_bread+0xd/0x70
> >  [] ? htree_dirblock_to_tree+0x34/0x190
> >  [] ? ext4_htree_fill_tree+0x9f/0x250
> >  [] ? do_filp_open+0x12d/0x5e0
> >  [] ? ext4_readdir+0x14d/0x5a0
> >  [] ? filldir+0x0/0xd0
> >  [] ? vfs_readdir+0xa8/0xd0
> >  [] ? filldir+0x0/0xd0
> >  [] ? sys_getdents+0x81/0xf0
> >  [] ? system_call_fastpath+0x16/0x1b
> > Disabling lock debugging due to kernel taint
> >
> > I don't know. Could a possible bug in linux/fs/ext4/page-io.c be
> > responsible for something like this?
>
> I do think you're right: every one of your "Bad page state" reports
> has been complaining only about the PageUptodate bit being set, and
> that SetPageUpdate() in ext4_end_bio() does look suspicious, coming
> after the put_page().
>
> The more suspicious given that other races have been noticed in
> precisely that area, and fixed with put_io_page() in the current git
> tree.
>
> Perhaps that fixes your problem, but my guess would be not: I suspect
> the "if (!partial_write) SetPageUpdate(page);" should be done before
> the block (or put_io_page) which does the put_page().

ext4_end_bio calls put_page and kmem_cache_free before calling
SetPageUptodate(). This can result in setting the PageUptodate bit on
random pages and causes the following BUG:

 BUG: Bad page state in process rm  pfn:52e54
 page:ffffea0001222260 count:0 mapcount:0 mapping: (null) index:0x0
 arch kernel: page flags: 0x4000000000000008(uptodate)

Fix the problem by moving put_io_page() after the SetPageUptodate() call.

Thanks to Hugh Dickins for analyzing this problem.

Reported-and-tested-by: Markus Trippelsdorf
Signed-off-by: Markus Trippelsdorf
---
 fs/ext4/page-io.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 7f5451c..beacce1 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -237,8 +237,6 @@ static void ext4_end_bio(struct bio *bio, int error)
 			} while (bh != head);
 		}
 
-		put_io_page(io_end->pages[i]);
-
 		/*
 		 * If this is a partial write which happened to make
 		 * all buffers uptodate then we can optimize away a
@@ -248,6 +246,8 @@ static void ext4_end_bio(struct bio *bio, int error)
 		 */
 		if (!partial_write)
 			SetPageUptodate(page);
+
+		put_io_page(io_end->pages[i]);
 	}
 	io_end->num_io_pages = 0;
 	inode = io_end->inode;
-- 
Markus
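
For readers following the ordering argument, here is a minimal userspace sketch of why the flag has to be set before the reference is dropped. It is not kernel code: struct toy_page, toy_put_page(), toy_set_uptodate() and the two end_io_*() helpers are hypothetical names made up for this illustration, and the "freed" marker only models what the page allocator might do once the last reference goes away.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: a refcount plus one flag. */
struct toy_page {
	int refcount;
	bool uptodate;
	bool freed;	/* models "returned to the allocator" */
};

/* Drop one reference; the page counts as freed when it hits zero. */
static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/*
 * Set the flag and report whether that was safe, i.e. whether the
 * page had not been released yet at the time of the write.
 */
static bool toy_set_uptodate(struct toy_page *p)
{
	bool safe = !p->freed;

	p->uptodate = true;
	return safe;
}

/* Ordering before the patch: drop the reference, then set the flag. */
static bool end_io_put_first(struct toy_page *p)
{
	toy_put_page(p);
	return toy_set_uptodate(p);
}

/* Ordering after the patch: set the flag, then drop the reference. */
static bool end_io_flag_first(struct toy_page *p)
{
	bool safe = toy_set_uptodate(p);

	toy_put_page(p);
	return safe;
}
```

With a page that holds a single reference, end_io_put_first() returns false because the flag lands on a page that has already been released, while end_io_flag_first() returns true. In the real kernel the released page may already belong to someone else, which would match the stray uptodate bit in the reports above.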