From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Date: Tue, 9 Feb 2016 10:43:53 +0100
From: Jan Kara
Subject: Re: [PATCH 2/2] dax: move writeback calls into the filesystems
Message-ID: <20160209094353.GF9451@quack.suse.cz>
References: <1454829553-29499-1-git-send-email-ross.zwisler@linux.intel.com>
 <1454829553-29499-3-git-send-email-ross.zwisler@linux.intel.com>
 <20160207215047.GJ31407@dastard>
 <20160208201808.GK27429@dastard>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: owner-linux-mm@kvack.org
To: Dan Williams
Cc: Dave Chinner, Ross Zwisler, "linux-kernel@vger.kernel.org",
 Theodore Ts'o, Alexander Viro, Andreas Dilger, Andrew Morton, Jan Kara,
 Matthew Wilcox, linux-ext4, linux-fsdevel, Linux MM,
 "linux-nvdimm@lists.01.org", XFS Developers, jmoyer
List-ID:

On Mon 08-02-16 12:55:24, Dan Williams wrote:
> On Mon, Feb 8, 2016 at 12:18 PM, Dave Chinner wrote:
> [..]
> >> Setting aside the current block zeroing problem you seem to be assuming
> >> that DAX will always be faster and that may not be true at a media
> >> level. Waiting years for some applications to determine if DAX makes
> >> sense for their use case seems completely reasonable. In the meantime
> >> the apps that are already making these changes want to know that a DAX
> >> mapping request has not silently dropped back to page cache. They
> >> also want to know if they successfully jumped through all the hoops to
> >> get a larger than pte mapping.
> >>
> >> I agree it is useful to be able to force DAX on an unmodified
> >> application to see what happens, and it follows that if those
> >> applications want to run in that mode they will need functional
> >> fsync()...
> >>
> >> I would feel better if we were talking about specific applications and
> >> performance numbers to know if forcing DAX on applications is a debug
> >> facility or a production level capability.
> >> You seem to have already made that determination and I'm curious what
> >> I'm missing.
> >
> > I'm not setting any policy here at all. This whole argument is
> > based around the DAX mount option doing "global fs enable or
> > silently turning it off" and the application not knowing about that.
> >
> > The whole point of having a persistent per-inode DAX flag is that
> > it is a policy mechanism, not a policy. The application can, if it
> > is DAX aware, directly control whether DAX is used on a file or not.
> > The application can even query and clear that persistent inode flag
> > if it is configured not to (or cannot) use DAX.
> >
> > If the filesystem cannot support DAX, then we can error out attempts
> > to set the DAX flag and then the app knows DAX is not available,
> > i.e. the attempt to set policy failed. If the flag is set, then the
> > inode will *always* use DAX - there is no "fall back to page cache"
> > when DAX is enabled.
> >
> > If the application is not DAX aware, then the admin can control the
> > DAX policy by manipulating these flags themselves, and hence control
> > whether DAX is used by the application or not.
> >
> > If you think I'm dictating policy for DAX users and applications,
> > then you haven't understood anything I've previously said about why
> > the DAX mount option needs to die before any of this is considered
> > production ready. DAX is not an opaque "all or nothing" option. XFS
> > will provide apps and admins with fine-grained, persistent,
> > discoverable policy flags to allow admins and applications to set
> > DAX policies however they see fit. This simply cannot be done if the
> > only knob you have is a mount option that may or may not stick.
>
> I agree the mount option needs to die, and I fully grok the reasoning.
> What I'm concerned with is that a system using fully-DAX-aware
> applications is forced to incur the overhead of maintaining *sync
> semantics, periodic sync(2) in particular, even if it is not relying
> on those semantics.

Let me somewhat correct this: IMO the hard requirement is maintaining
sync(2) semantics. Periodic writeback does not have any hard durability
guarantees and we are free to ignore such requests in ->writepages() (that
function has enough information in the writeback_control structure to
differentiate between periodic writeback and data integrity sync) if we
decide it is useful. Actually, we could do that even for 4.5.

								Honza
-- 
Jan Kara
SUSE Labs, CR
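To make the distinction Jan describes concrete, here is a minimal userspace sketch, assuming a simplified model of the kernel's struct writeback_control. In the kernel, fsync() and the data-integrity pass of sync(2) request writeback with sync_mode set to WB_SYNC_ALL, while periodic (kupdate) and background flushing use WB_SYNC_NONE with for_kupdate or for_background set. The names struct wbc_model and dax_should_flush() are hypothetical and used for illustration only; this is not the code from the patch under discussion or from any in-tree ->writepages() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the kernel's enum writeback_sync_modes. */
enum writeback_sync_modes {
	WB_SYNC_NONE,	/* periodic/background: no durability promise */
	WB_SYNC_ALL,	/* data integrity: fsync(), sync(2) */
};

/*
 * Hypothetical subset of struct writeback_control: only the fields
 * needed to tell periodic or background writeback apart from an
 * integrity sync.
 */
struct wbc_model {
	enum writeback_sync_modes sync_mode;
	unsigned for_kupdate:1;		/* periodic (dirty expiry) writeback */
	unsigned for_background:1;	/* background (dirty threshold) writeback */
};

/*
 * Hypothetical policy for a DAX ->writepages(): flush CPU caches only
 * when the caller needs durability. Periodic and background writeback
 * carry no hard durability guarantee, so they can be skipped.
 */
bool dax_should_flush(const struct wbc_model *wbc)
{
	/* Defensive check: only the two modeled modes are valid. */
	assert(wbc->sync_mode == WB_SYNC_NONE ||
	       wbc->sync_mode == WB_SYNC_ALL);
	return wbc->sync_mode == WB_SYNC_ALL;
}
```

A DAX ->writepages() following this policy would return early whenever dax_should_flush() is false, so periodic writeback would cost nothing, while fsync() and sync(2) would still reach the flush path and keep their semantics.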