From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Date: Tue, 5 Jan 2016 12:13:46 +0100
From: Jan Kara
Subject: Re: [PATCH v6 4/7] dax: add support for fsync/msync
Message-ID: <20160105111346.GC2724@quack.suse.cz>
References: <1450899560-26708-1-git-send-email-ross.zwisler@linux.intel.com>
 <1450899560-26708-5-git-send-email-ross.zwisler@linux.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
To: Dan Williams
Cc: Ross Zwisler, "linux-kernel@vger.kernel.org", "H. Peter Anvin",
 "J. Bruce Fields", Theodore Ts'o, Alexander Viro, Andreas Dilger,
 Dave Chinner, Ingo Molnar, Jan Kara, Jeff Layton, Matthew Wilcox,
 Thomas Gleixner, linux-ext4, linux-fsdevel, Linux MM,
 "linux-nvdimm@lists.01.org", X86 ML, XFS Developers, Andrew Morton,
 Matthew Wilcox, Dave Hansen
List-ID:

On Sun 03-01-16 10:13:06, Dan Williams wrote:
> On Wed, Dec 23, 2015 at 11:39 AM, Ross Zwisler
> wrote:
> > To properly handle fsync/msync in an efficient way DAX needs to track dirty
> > pages so it is able to flush them durably to media on demand.
> >
> > The tracking of dirty pages is done via the radix tree in struct
> > address_space. This radix tree is already used by the page writeback
> > infrastructure for tracking dirty pages associated with an open file, and
> > it already has support for exceptional (non struct page*) entries. We
> > build upon these features to add exceptional entries to the radix tree for
> > DAX dirty PMD or PTE pages at fault time.
> >
> > Signed-off-by: Ross Zwisler
>
> I'm hitting the following report with the ndctl dax test [1] on
> next-20151231. I bisected it to
> commit 3cb108f941de "dax-add-support-for-fsync-sync-v6". I'll take a
> closer look tomorrow, but in case someone can beat me to it, here's
> the back-trace:
>
> ------------[ cut here ]------------
> kernel BUG at fs/inode.c:497!

I suppose this is the check that mapping->nr_exceptional is zero, isn't it?
Hum, I don't see how that could happen given we call
truncate_inode_pages_final() just before the clear_inode() call which
removes all the exceptional entries from the radix tree. And there's not
much room for a race during umount... Does the radix tree really contain
any entry or is it an accounting bug?

								Honza

> [..]
> CPU: 1 PID: 3001 Comm: umount Tainted: G O 4.4.0-rc7+ #2412
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> task: ffff8800da2a5a00 ti: ffff880307794000 task.ti: ffff880307794000
> RIP: 0010:[] [] clear_inode+0x71/0x80
> RSP: 0018:ffff880307797d50 EFLAGS: 00010002
> RAX: ffff8800da2a5a00 RBX: ffff8800ca2e7328 RCX: ffff8800da2a5a28
> RDX: 0000000000000001 RSI: 0000000000000005 RDI: ffff8800ca2e7530
> RBP: ffff880307797d60 R08: ffffffff82900ae0 R09: 0000000000000000
> R10: ffff8800ca2e7548 R11: 0000000000000000 R12: ffff8800ca2e7530
> R13: ffff8800ca2e7328 R14: ffff8800da2e88d0 R15: ffff8800da2e88d0
> FS: 00007f2b22f4a880(0000) GS:ffff88031fc40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00005648abd933e8 CR3: 000000007f3fc000 CR4: 00000000000006e0
> Stack:
> ffff8800ca2e7328 ffff8800ca2e7000 ffff880307797d88 ffffffffa01c18af
> ffff8800ca2e7328 ffff8800ca2e74d0 ffffffffa01ec740 ffff880307797db0
> ffffffff81281038 ffff8800ca2e74c0 ffff880307797e00 ffff8800ca2e7328
> Call Trace:
> [] xfs_fs_evict_inode+0x5f/0x110 [xfs]
> [] evict+0xb8/0x180
> [] dispose_list+0x3b/0x50
> [] evict_inodes+0x144/0x170
> [] generic_shutdown_super+0x3f/0xf0
> [] kill_block_super+0x27/0x70
> [] deactivate_locked_super+0x43/0x70
> [] deactivate_super+0x5c/0x60
> [] cleanup_mnt+0x3f/0x90
> [] __cleanup_mnt+0x12/0x20
> [] task_work_run+0x76/0x90
> [] syscall_return_slowpath+0x20a/0x280
> [] int_ret_from_sys_call+0x25/0x9f
> Code: 48 8d 93 30 03 00 00 48 39 c2 75 23 48 8b 83 d0 00 00 00 a8 20
> 74 1a a8 40 75 18 48 c7 8
> 3 d0 00 00 00 60 00 00 00 5b 41 5c 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f
> 0b 0f 1f 44 00 00 0f 1f
> 44 00 00 55
> RIP [] clear_inode+0x71/0x80
> RSP
> ---[ end trace 3b1d8898a94a4fc1 ]---
>
> [1]: git://git@github.com:pmem/ndctl.git pending
> make TESTS="test/dax.sh" check
>
-- 
Jan Kara
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
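
The question raised in the reply above (is an entry really left in the radix tree, or is only the counter wrong?) can be illustrated with a toy model. This is a sketch in Python, not kernel code: `Mapping`, `add_entry`, `truncate_final` and `diagnose` are hypothetical names invented for illustration, and the dict only stands in for the radix tree in struct address_space. It assumes the failing check amounts to "nr_exceptional must be zero once everything has been truncated", as the reply supposes.

```python
from dataclasses import dataclass, field

PAGE = "page"
EXCEPTIONAL = "exceptional"


@dataclass
class Mapping:
    """Toy stand-in for struct address_space (hypothetical model)."""
    tree: dict = field(default_factory=dict)  # index -> PAGE or EXCEPTIONAL
    nrpages: int = 0
    nr_exceptional: int = 0


def add_entry(mapping, index, kind):
    """Insert an entry and bump the counter that matches its kind."""
    if index in mapping.tree:
        raise ValueError(f"index {index} already present")
    mapping.tree[index] = kind
    if kind == EXCEPTIONAL:
        mapping.nr_exceptional += 1
    else:
        mapping.nrpages += 1


def truncate_final(mapping):
    """Remove every entry, decrementing the counter for each one removed."""
    for index in list(mapping.tree):
        kind = mapping.tree.pop(index)
        if kind == EXCEPTIONAL:
            mapping.nr_exceptional -= 1
        else:
            mapping.nrpages -= 1


def diagnose(mapping):
    """Split a failing nr_exceptional check into the two cases asked about."""
    if mapping.nr_exceptional == 0:
        return "clean"  # the check would pass
    left = sum(1 for kind in mapping.tree.values() if kind == EXCEPTIONAL)
    return "entry left in tree" if left else "accounting bug"


if __name__ == "__main__":
    # Expected ordering: truncate first, then check.
    ok = Mapping()
    add_entry(ok, 0, EXCEPTIONAL)
    add_entry(ok, 1, PAGE)
    truncate_final(ok)
    print("normal eviction:", diagnose(ok))

    # Case 1: something inserts an entry after the final truncate.
    late = Mapping()
    truncate_final(late)
    add_entry(late, 7, EXCEPTIONAL)
    print("late insert:", diagnose(late))

    # Case 2: the tree is empty but the counter was bumped once too often.
    miscounted = Mapping()
    add_entry(miscounted, 0, EXCEPTIONAL)
    miscounted.nr_exceptional += 1  # simulated double count
    truncate_final(miscounted)
    print("double count:", diagnose(miscounted))
```

In this model the two cases are told apart by walking the tree and comparing what is found with the counter, which is the kind of information the reply asks for. Which case applies to the reported BUG is not settled by this message.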