From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Tue, 17 Nov 2015 10:12:22 +1100
From: Dave Chinner
Subject: Re: [PATCH v2 11/11] xfs: add support for DAX fsync/msync
Message-ID: <20151116231222.GY19199@dastard>
References: <1447459610-14259-1-git-send-email-ross.zwisler@linux.intel.com>
 <1447459610-14259-12-git-send-email-ross.zwisler@linux.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1447459610-14259-12-git-send-email-ross.zwisler@linux.intel.com>
Sender: owner-linux-mm@kvack.org
To: Ross Zwisler
Cc: linux-kernel@vger.kernel.org, "H. Peter Anvin", "J. Bruce Fields",
 Theodore Ts'o, Alexander Viro, Andreas Dilger, Dan Williams,
 Ingo Molnar, Jan Kara, Jeff Layton, Matthew Wilcox, Thomas Gleixner,
 linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-nvdimm@lists.01.org, x86@kernel.org,
 xfs@oss.sgi.com, Andrew Morton, Matthew Wilcox, Dave Hansen
List-ID:

On Fri, Nov 13, 2015 at 05:06:50PM -0700, Ross Zwisler wrote:
> To properly support the new DAX fsync/msync infrastructure filesystems
> need to call dax_pfn_mkwrite() so that DAX can properly track when a user
> write faults on a previously cleaned address.  They also need to call
> dax_fsync() in the filesystem fsync() path.  This dax_fsync() call uses
> addresses retrieved from get_block() so it needs to be ordered with
> respect to truncate.  This is accomplished by using the same locking that
> was set up for DAX page faults.
>
> Signed-off-by: Ross Zwisler
> ---
>  fs/xfs/xfs_file.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 39743ef..2b490a1 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -209,7 +209,8 @@ xfs_file_fsync(
>  	loff_t			end,
>  	int			datasync)
>  {
> -	struct inode		*inode = file->f_mapping->host;
> +	struct address_space	*mapping = file->f_mapping;
> +	struct inode		*inode = mapping->host;
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	struct xfs_mount	*mp = ip->i_mount;
>  	int			error = 0;
> @@ -218,7 +219,13 @@ xfs_file_fsync(
>
>  	trace_xfs_file_fsync(ip);
>
> -	error = filemap_write_and_wait_range(inode->i_mapping, start, end);
> +	if (dax_mapping(mapping)) {
> +		xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> +		dax_fsync(mapping, start, end);
> +		xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> +	}
> +
> +	error = filemap_write_and_wait_range(mapping, start, end);

Ok, I don't understand a couple of things here.

Firstly, if it's a DAX mapping, why are we still calling
filemap_write_and_wait_range() after the dax_fsync() call that has
already written back all the dirty cachelines?

Secondly, exactly what is the XFS_MMAPLOCK_SHARED lock supposed to
be doing here? I don't see where dax_fsync() has any callouts to
get_block(), so the comment "needs to be ordered with respect to
truncate" doesn't make any obvious sense. If we have a racing
truncate removing entries from the radix tree, then thanks to the
mapping tree lock we'll either find an entry we need to write back,
or we won't find any entry at all, right?

Lastly, this flushing really needs to be inside
filemap_write_and_wait_range(), because we call the writeback code
from many more places than just fsync to ensure ordering of various
operations such that files are in known state before proceeding
(e.g. hole punch).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
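
For readers following the third point of the review, here is a minimal
userspace sketch of the idea, not kernel code and not taken from the
thread. Every name in it (model_mapping, model_write_and_wait_range and
so on) is hypothetical. It only illustrates why putting the DAX flush
inside the common writeback entry point covers callers other than
fsync, such as hole punch. It also assumes that a DAX mapping has
nothing for the page cache path to write, which is the premise of the
first question above rather than something the thread settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model: none of these names exist in the kernel. */
struct model_mapping {
	bool is_dax;       /* stands in for dax_mapping(mapping) */
	int  dirty;        /* dirty entries (DAX) or dirty pages (page cache) */
	int  dax_flushes;  /* times the DAX flush path ran */
	int  page_writes;  /* times the page cache writeback path ran */
};

/* Stand-in for flushing the dirty cachelines of a DAX mapping. */
void model_dax_flush(struct model_mapping *m)
{
	assert(m != NULL);
	m->dax_flushes++;
	m->dirty = 0;
}

/* Stand-in for ordinary page cache writeback. */
void model_page_writeback(struct model_mapping *m)
{
	assert(m != NULL);
	m->page_writes++;
	m->dirty = 0;
}

/*
 * Models the common writeback entry point. Assumption: a DAX mapping
 * has nothing for the page cache path to do, so exactly one path runs.
 */
int model_write_and_wait_range(struct model_mapping *m)
{
	if (m->is_dax)
		model_dax_flush(m);
	else
		model_page_writeback(m);
	return 0;
}

/* Two of the callers that need data stable before they proceed. */
int model_fsync(struct model_mapping *m)
{
	return model_write_and_wait_range(m);
}

int model_punch_hole(struct model_mapping *m)
{
	/* Flush first so the file is in a known state; the punch is elided. */
	return model_write_and_wait_range(m);
}
```

In this model, calling model_punch_hole() on a mapping with is_dax set
runs the DAX flush even though the fsync path was never involved. With
the flush open-coded in the fsync handler only, as in the quoted patch,
that caller would skip it.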