From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <20101115195439.GA1569@arch.trippelsdorf.de>
References: <20101110152519.GA1626@arch.trippelsdorf.de>
 <20101110154057.GA2191@arch.trippelsdorf.de>
 <20101112122003.GA1572@arch.trippelsdorf.de>
 <20101115123846.GA30047@arch.trippelsdorf.de>
 <20101115195439.GA1569@arch.trippelsdorf.de>
Date: Mon, 15 Nov 2010 13:23:41 -0800
Subject: Re: BUG: Bad page state in process (current git)
From: Hugh Dickins
To: Markus Trippelsdorf
Cc: Christoph Lameter, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 "Theodore Ts'o", linux-ext4@vger.kernel.org
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Nov 15, 2010 at 11:54 AM, Markus Trippelsdorf wrote:
> On 2010.11.15 at 13:38 +0100, Markus Trippelsdorf wrote:
>> On 2010.11.12 at 13:20 +0100, Markus Trippelsdorf wrote:
>> >
>> > Yes. Fortunately the BUG is gone since I pulled the upcoming drm fixes
>>
>> No. It happened again today (with those fixes already applied):
>>
>> BUG: Bad page state in process knode  pfn:7f0a8
>> page:ffffea0001bca4c0 count:0 mapcount:0 mapping:          (null) index:0x0
>> page flags: 0x4000000000000008(uptodate)
>> Pid: 18310, comm: knode Not tainted 2.6.37-rc1-00549-gae712bf-dirty #16
>> Call Trace:
>>  [] ? bad_page+0x92/0xe0
>>  [] ? get_page_from_freelist+0x4b0/0x570
>>  [] ? apic_timer_interrupt+0xe/0x20
>>  [] ? __alloc_pages_nodemask+0x113/0x6b0
>>  [] ? file_read_actor+0xc4/0x190
>>  [] ? generic_file_aio_read+0x560/0x6b0
>>  [] ? handle_mm_fault+0x6bd/0x970
>>  [] ? do_page_fault+0x120/0x410
>>  [] ? do_brk+0x275/0x360
>>  [] ? page_fault+0x1f/0x30
>> Disabling lock debugging due to kernel taint
>
> And another one. But this time it seems to point to ext4:
>
> BUG: Bad page state in process rm  pfn:52e54
> page:ffffea0001222260 count:0 mapcount:0 mapping:          (null) index:0x0
> page flags: 0x4000000000000008(uptodate)
> Pid: 2084, comm: rm Not tainted 2.6.37-rc1-00549-gae712bf-dirty #23
> Call Trace:
>  [] ? bad_page+0x92/0xe0
>  [] ? get_page_from_freelist+0x4b0/0x570
>  [] ? ext4_ext_put_in_cache+0x46/0x90
>  [] ? __alloc_pages_nodemask+0x113/0x6b0
>  [] ? number.clone.2+0x2b7/0x2f0
>  [] ? find_get_page+0x75/0xb0
>  [] ? find_or_create_page+0x51/0xb0
>  [] ? __getblk+0xd7/0x260
>  [] ? ext4_getblk+0x8f/0x1e0
>  [] ? ext4_bread+0xd/0x70
>  [] ? htree_dirblock_to_tree+0x34/0x190
>  [] ? ext4_htree_fill_tree+0x9f/0x250
>  [] ? do_filp_open+0x12d/0x5e0
>  [] ? ext4_readdir+0x14d/0x5a0
>  [] ? filldir+0x0/0xd0
>  [] ? vfs_readdir+0xa8/0xd0
>  [] ? filldir+0x0/0xd0
>  [] ? sys_getdents+0x81/0xf0
>  [] ? system_call_fastpath+0x16/0x1b
> Disabling lock debugging due to kernel taint
>
> I don't know. Could a possible bug in linux/fs/ext4/page-io.c be
> responsible for something like this?

I do think you're right: every one of your "Bad page state" reports has
been complaining only about the PageUptodate bit being set, and that
SetPageUptodate() in ext4_end_bio() does look suspicious, coming after
the put_page().

The more suspicious given that other races have been noticed in
precisely that area, and fixed with put_io_page() in the current git
tree. Perhaps that fixes your problem, but my guess would be not: I
suspect the "if (!partial_write) SetPageUptodate(page);" should be done
before the block (or put_io_page) which does the put_page().

Hugh