From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v9 0/6] MAP_DIRECT for DAX userspace flush
References: <150776922692.9144.16963640112710410217.stgit@dwillia2-desk3.amr.corp.intel.com>
 <20171012142319.GA11254@lst.de>
 <20171013065716.GB26461@lst.de>
 <20171013163822.GA17411@obsidianresearch.com>
 <20171013173145.GA18702@obsidianresearch.com>
 <20171016072644.GB28270@lst.de>
From: Sagi Grimberg
Message-ID: <27694a5e-ec3a-0a68-b053-c138e0c91446@grimberg.me>
Date: Mon, 16 Oct 2017 15:07:28 +0300
MIME-Version: 1.0
In-Reply-To: <20171016072644.GB28270@lst.de>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Christoph Hellwig, Jason Gunthorpe
Cc: "J. Bruce Fields", Jan Kara, Andrew Morton, Arnd Bergmann,
 "linux-nvdimm@lists.01.org", Linux API, "Darrick J. Wong", Dave Chinner,
 linux-xfs@vger.kernel.org, Linux MM, Jeff Layton, Al Viro,
 Andy Lutomirski, linux-fsdevel, Linus Torvalds

>> I don't think that really represents how lots of apps actually use
>> RDMA.
>>
>> RDMA is often buried down in the software stack (eg in a MPI), and by
>> the time a mapping gets used for RDMA transfer the link between the
>> FD, mmap and the MR is totally opaque.
>>
>> Having a MR specific notification means the low level RDMA libraries
>> have a chance to deal with everything for the app.
>>
>> Eg consider a HPC app using MPI that uses some DAX aware library to
>> get DAX backed mmap's. It then passes memory in those mmaps to the
>> MPI library to do transfers. The MPI creates the MR on demand.
>>
>
> I suspect one of the more interesting use cases might be a file server,
> for which that's not the case. But otherwise I agree with the above,
> and also think that notifying the MR handle is the only way to go for
> another very important reason: fencing. What if the application/library
> does not react on the notification?
> With a per-MR notification we
> can unregister the MR in kernel space and have a rock solid fencing
> mechanism. And that is the most important bit here.

I agree we must deregister the MR in kernel space. As said, I think
it's perfectly reasonable to let user-space see error completions and
provide a query mechanism at MR granularity (unfortunately this will
probably need driver assistance, as the drivers know how their device
reports access violations at MR granularity).
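
To make the flow above concrete, here is a minimal, purely illustrative
sketch in C. It is a toy model, not the verbs API and not anything from
the patch series: every type and function name below (struct mr,
struct wc, kernel_revoke_mr, post_and_complete, query_mr_revoked) is a
hypothetical stand-in. It only shows the ordering being argued for: the
kernel fences the MR on its own, later work requests against that MR
complete in error, and user space can then ask, per MR, whether it was
revoked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical completion status codes; not the real ibv_wc_status values. */
enum wc_status { WC_SUCCESS, WC_MR_ACCESS_ERR };

/* Hypothetical memory region: a key plus a "fenced by the kernel" flag. */
struct mr {
    uint32_t lkey;
    bool revoked;
};

/* Hypothetical work completion as seen by user space. */
struct wc {
    enum wc_status status;
    uint32_t lkey; /* key of the MR the request referenced */
};

/*
 * Kernel side (assumed): deregister/fence the MR unconditionally, so the
 * fence does not depend on the application reacting to a notification.
 */
void kernel_revoke_mr(struct mr *mr)
{
    assert(mr != NULL);
    mr->revoked = true;
}

/*
 * Device side (assumed): a request that references a revoked MR finishes
 * with an error completion instead of touching memory.
 */
struct wc post_and_complete(const struct mr *mr)
{
    assert(mr != NULL);
    struct wc wc = {
        .status = mr->revoked ? WC_MR_ACCESS_ERR : WC_SUCCESS,
        .lkey = mr->lkey,
    };
    return wc;
}

/*
 * Query at MR granularity (assumed): lets a library find out which MR was
 * fenced after it sees an error completion.
 */
bool query_mr_revoked(const struct mr *mr)
{
    assert(mr != NULL);
    return mr->revoked;
}
```

In this model a library that sees WC_MR_ACCESS_ERR would call
query_mr_revoked() on its MRs to find the one that was fenced. How a
real device reports the violation is driver specific, which is why the
query would need driver assistance.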