From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from quartz.orcorp.ca (quartz.orcorp.ca [184.70.90.242])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by ml01.01.org (Postfix) with ESMTPS id 8D9B121F38846
	for ; Fri, 13 Oct 2017 08:00:46 -0700 (PDT)
Date: Fri, 13 Oct 2017 09:03:48 -0600
From: Jason Gunthorpe
Subject: Re: [PATCH v7 07/12] dma-mapping: introduce dma_has_iommu()
Message-ID: <20171013150348.GA11257@obsidianresearch.com>
References: <20171009191820.GD15336@obsidianresearch.com>
	<20171010172516.GA29915@obsidianresearch.com>
	<20171010180512.GA31734@obsidianresearch.com>
	<20171012182712.GA5772@obsidianresearch.com>
	<20171013065047.GA26461@lst.de>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20171013065047.GA26461@lst.de>
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: linux-nvdimm-bounces@lists.01.org
Sender: "Linux-nvdimm"
To: Christoph Hellwig
Cc: Jan Kara, Ashok Raj, "Darrick J. Wong", linux-rdma@vger.kernel.org,
	Greg Kroah-Hartman, Joerg Roedel, "linux-nvdimm@lists.01.org",
	Dave Chinner, Robin Murphy, linux-xfs@vger.kernel.org, Linux MM,
	Linux API, linux-fsdevel, David Woodhouse, Marek Szyprowski
List-ID:

On Fri, Oct 13, 2017 at 08:50:47AM +0200, Christoph Hellwig wrote:
> > However, chatting this over with a few more people I have an alternate
> > solution that effectively behaves the same as how non-ODP hardware
> > handles this case of hole punch / truncation today. So, today if this
> > scenario happens on a page-cache backed mapping, the file blocks are
> > unmapped and the RDMA continues into pinned pages that are no longer
> > part of the file. We can achieve the same thing with the iommu, just
> > re-target the I/O into memory that isn't part of the file.
> > That way hardware does not see I/O errors and the DAX data consistency
> > model is no worse than the page-cache case.
>
> Yikes.

Well, as much as you say Yikes, Dan is correct, this does match the
semantics RDMA MRs already have. They become non-coherent if their
underlying object is changed, and there are many ways to get there.

I've never thought about it, but it does sound like ftruncate,
fallocate, etc. on a normal file would break the MR coherency too?

There have been efforts in the past, driven by the MPI people, to
create, essentially, something like a 'lease-break' SIGIO. Except it was
intended to be general, and was meant to solve all the problems related
to MR de-coherence. This was complicated and never became acceptable to
mainline.

Instead ODP was developed, and ODP actually solves all the problems
sanely.

Thinking about it some more, and with your other comments on
get_user_pages in this thread, I tend to agree. It doesn't make sense
to develop a user space lease break API for MRs that is a DAX-specific
feature.

Along the same lines, it also doesn't make sense to force-invalidate
MRs linked to DAX regions, while leaving MRs linked to other regions
that have the same problem alone.

If you want to make non-ODP MRs work better, then you need to have a
general overall solution to tell userspace when the MR becomes (or I
guess, is becoming) non-coherent, that covers all the cases that break
MR coherence, not just via DAX.

Otherwise, I think Dan is right, keeping the current semantic of
having MRs just do something wrong, but not corrupt memory, when they
lose coherence, is broadly consistent with how non-ODP MRs work today.
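
To make the "do something wrong, but not corrupt memory" semantic
concrete, here is a rough toy model in Python. It is only a sketch and
not kernel or verbs code: the names Page, ToyFile, ToyMR, punch_hole,
rdma_write and is_coherent are all made up for illustration, and the
model assumes a non-ODP MR simply holds references to the pages it
pinned at registration time.

```python
class Page:
    """Toy stand-in for one physical page of memory (hypothetical)."""
    def __init__(self, data=b""):
        self.data = data


class ToyFile:
    """Toy file: maps block index -> Page, like a page-cache backed file."""
    def __init__(self, nblocks):
        self.blocks = {i: Page(b"old") for i in range(nblocks)}

    def punch_hole(self, index):
        # The file drops its reference to the page. The page itself stays
        # alive for as long as something else (the MR) still pins it.
        self.blocks.pop(index, None)

    def read(self, index):
        page = self.blocks.get(index)
        # In this toy model a hole reads back as a single zero byte.
        return page.data if page is not None else b"\0"


class ToyMR:
    """Toy non-ODP memory region: pins pages once, at registration time."""
    def __init__(self, file, indices):
        self.pinned = {i: file.blocks[i] for i in indices}

    def rdma_write(self, index, data):
        # Always lands in the pinned page, whether or not the file still
        # uses that page. No error is reported either way.
        self.pinned[index].data = data

    def is_coherent(self, file):
        # Coherent only while every pinned page is still the file's page.
        return all(file.blocks.get(i) is page
                   for i, page in self.pinned.items())


f = ToyFile(2)
mr = ToyMR(f, [0, 1])

mr.rdma_write(0, b"new")        # block 0 is still backed by the pinned page
print(f.read(0), mr.is_coherent(f))

f.punch_hole(1)                 # the file lets go of block 1
mr.rdma_write(1, b"stray")      # the write still succeeds
print(f.read(1), mr.pinned[1].data, mr.is_coherent(f))
```

After the hole punch the write goes into a page that only the MR still
references, so the file never sees it, nothing else is overwritten, and
the MR has silently lost coherence. An ODP MR would instead be
invalidated and re-fault the mapping, which this toy model does not
attempt to show.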

Jason