From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
MIME-Version: 1.0
In-Reply-To: <20171013065716.GB26461@lst.de>
References: <150776922692.9144.16963640112710410217.stgit@dwillia2-desk3.amr.corp.intel.com>
 <20171012142319.GA11254@lst.de>
 <20171013065716.GB26461@lst.de>
From: Dan Williams
Date: Fri, 13 Oct 2017 08:14:55 -0700
Message-ID:
Subject: Re: [PATCH v9 0/6] MAP_DIRECT for DAX userspace flush
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org
To: Christoph Hellwig
Cc: "linux-nvdimm@lists.01.org" , linux-xfs@vger.kernel.org,
 Jan Kara , Arnd Bergmann , "Darrick J. Wong" , Linux API ,
 Dave Chinner , "J. Bruce Fields" , Linux MM , Jeff Moyer ,
 Al Viro , Andy Lutomirski , Ross Zwisler , linux-fsdevel ,
 Jeff Layton , Linus Torvalds , Andrew Morton , Jason Gunthorpe
List-ID:

On Thu, Oct 12, 2017 at 11:57 PM, Christoph Hellwig wrote:
> On Thu, Oct 12, 2017 at 10:41:39AM -0700, Dan Williams wrote:
>> So, you're jumping into this review at v9 where I've split the patches
>> that take an initial MAP_DIRECT lease out from the patches that take
>> FL_LAYOUT leases at memory registration time. You can see a previous
>> attempt in "[PATCH v8 00/14] MAP_DIRECT for DAX RDMA and userspace
>> flush" which should be in your inbox.
>
> The point is that your problem has absolutely nothing to do with mmap,
> and all with get_user_pages.
>
> get_user_pages on DAX doesn't give the same guarantees as on pagecache
> or anonymous memory, and that is the problem we need to fix. In fact
> I'm pretty sure if we try hard enough (and we might have to try
> very hard) we can see the same problem with plain direct I/O and without
> any RDMA involved, e.g. do a larger direct I/O write to memory that is
> mmap()ed from a DAX file, then truncate the DAX file and reallocate
> the blocks, and we might corrupt that new file. We'll probably need
> a special setup where there is little other chance but to reallocate
> those used blocks.

I'll take a harder look at this...

> So what we need to do first is to fix get_user_pages vs unmapping
> DAX mmap()ed blocks, be that from a hole punch, truncate, COW
> operation, etc.
>
> Then we need to look into the special case of a long-living non-transient
> get_user_pages that RDMA does - we can't just reject any truncate or
> other operation for that, so that's where something like my layout
> lease suggestion comes into play - but the call that should get the
> lease is not the mmap - it's the memory registration call that does
> the get_user_pages.

Yes, mmap is not the place to get the lease for a later
get_user_pages, and my patches do take an additional lease at
get_user_pages / MR init time. However, the mmap call has the
file descriptor for SIGIO, and the MR-init call does not. If we delay
all of the setup to MR time then we need to invent a notification
scheme specific to RDMA, which seems like a waste to me when we can
generically signal an event on the fd for any event that affects any
of the vmas on the file. The FL_LAYOUT lease impacts the entire file,
so as far as I can see delaying the notification until MR-init is too
late, too granular, and too RDMA specific.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
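
For readers unfamiliar with fd-based lease notification, the existing
fcntl(2) lease interface gives a rough picture of the model discussed
above: the lease is taken on a file descriptor, it covers the whole
file, and the holder is signalled when a conflicting operation breaks
it. The sketch below is only an analogy under that assumption. It uses
the long-standing F_SETLEASE / F_SETSIG commands, not the MAP_DIRECT
or FL_LAYOUT patches, and the helper names (take_lease_with_signal,
release_lease, current_lease) are made up for illustration.

```python
import fcntl
import os
import signal
import tempfile


def take_lease_with_signal(fd, signo=signal.SIGIO):
    """Take a write lease on fd and ask for signo on a lease break.

    Illustrative helper only: it uses the existing fcntl(2) lease
    commands, not anything from the MAP_DIRECT series. Raises OSError
    if the kernel or filesystem refuses the lease.
    """
    # Choose the signal delivered to the lease holder when another
    # open() or truncate() conflicts with the lease.
    fcntl.fcntl(fd, fcntl.F_SETSIG, signo)
    # The lease applies to the whole file, not to a byte range.
    fcntl.fcntl(fd, fcntl.F_SETLEASE, fcntl.F_WRLCK)


def release_lease(fd):
    """Drop any lease held through fd."""
    fcntl.fcntl(fd, fcntl.F_SETLEASE, fcntl.F_UNLCK)


def current_lease(fd):
    """Return F_RDLCK, F_WRLCK or F_UNLCK for the lease held on fd."""
    return fcntl.fcntl(fd, fcntl.F_GETLEASE)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        take_lease_with_signal(fd)
        print("write lease held:", current_lease(fd) == fcntl.F_WRLCK)
        release_lease(fd)
    except OSError as err:
        # Some filesystems and sandboxes do not allow leases at all.
        print("lease not available here:", err)
    finally:
        os.close(fd)
        os.unlink(path)
```

The point of the analogy is that the notification hangs off the file
descriptor, so it is available to any holder of the fd and is not tied
to an RDMA-specific registration call.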