From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jason Gunthorpe
Subject: Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
Date: Wed, 6 Feb 2019 16:21:30 -0700
Message-ID: <20190206232130.GK12227@ziepe.ca>
References: <20190205175059.GB21617@iweiny-DESK2.sc.intel.com>
 <20190206095000.GA12006@quack2.suse.cz>
 <20190206173114.GB12227@ziepe.ca>
 <20190206175233.GN21860@bombadil.infradead.org>
 <47820c4d696aee41225854071ec73373a273fd4a.camel@redhat.com>
 <01000168c43d594c-7979fcf8-b9c1-4bda-b29a-500efe001d66-000000@email.amazonses.com>
 <20190206210356.GZ6173@dastard>
 <20190206220828.GJ12227@ziepe.ca>
 <0c868bc615a60c44d618fb0183fcbe0c418c7c83.camel@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
To: Dan Williams
Cc: Doug Ledford, Dave Chinner, Christopher Lameter, Matthew Wilcox,
 Jan Kara, Ira Weiny, lsf-pc@lists.linux-foundation.org, linux-rdma,
 Linux MM, Linux Kernel Mailing List, John Hubbard, Jerome Glisse,
 Michal Hocko
List-Id: linux-rdma@vger.kernel.org

On Wed, Feb 06, 2019 at 02:44:45PM -0800, Dan Williams wrote:

> > Do they need to stick with xfs?
>
> Can you clarify the motivation for that question? This problem exists
> for any filesystem that implements an mmap that where the physical
> page backing the mapping is identical to the physical storage location
> for the file data.

.. and needs to dynamically change that mapping. Which is not really
something inherent to the general idea of a filesystem. A file system
that had *strictly static* block assignments would work fine.

Not all filesystems even implement hole punch. Not all filesystems
implement reflink. ftruncate doesn't *have* to instantly return the
free blocks to the allocation pool.

ie this is not a DAX & RDMA issue but an XFS & RDMA issue.
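
For concreteness, the three operations named above correspond to the
following userspace calls on Linux. This is only an illustrative sketch:
the wrapper names (shrink_file, punch_hole, reflink_file) are made up
for this example, while the syscalls and flags underneath are the
standard ones. Each wrapper returns 0 on success or a negative errno.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* needed for the fallocate() declaration */
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>    /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <linux/fs.h>        /* FICLONE */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: shrink the file. The filesystem may hand the
 * blocks past new_size back to its allocator right away. */
int shrink_file(int fd, off_t new_size)
{
    return ftruncate(fd, new_size) == 0 ? 0 : -errno;
}

/* Hypothetical wrapper: punch a hole. Blocks backing [off, off + len)
 * are released while the file size stays the same. */
int punch_hole(int fd, off_t off, off_t len)
{
    int rc = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                       off, len);
    return rc == 0 ? 0 : -errno;
}

/* Hypothetical wrapper: reflink. dst_fd ends up sharing src_fd's
 * blocks, so a later write to either file has to COW to a new block. */
int reflink_file(int dst_fd, int src_fd)
{
    return ioctl(dst_fd, FICLONE, src_fd) == 0 ? 0 : -errno;
}
```

Each of these can change which physical block backs a given file
offset. On a filesystem that does not implement the operation the call
simply fails (typically with EOPNOTSUPP), and the mapping stays put.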

Replacing XFS is probably not reasonable, but I wonder if an XFS--
operating mode could exist that had enough features removed to be
safe?

Ie turn off REFLINK. Change the semantics of ftruncate to be more like
ETXTBSY. Turn off hole punch.

> > Are they really trying to do COW backed mappings for the RDMA
> > targets? Or do they want a COW backed FS but are perfectly happy
> > if the specific RDMA targets are *not* COW and are statically
> > allocated?
>
> I would expect the COW to be broken at registration time. Only ODP
> could possibly support reflink + RDMA. So I think this devolves the
> problem back to just the "what to do about truncate/punch-hole"
> problem in the specific case of non-ODP hardware combined with the
> Filesystem-DAX facility.

Usually the problem with COW is that you make a READ RDMA MR on a
COW'd file, and some other thread breaks the COW..

This probably becomes a problem if the same process that has the MR
triggers a COW break (ie by writing to the CPU mmap). This would cause
the page to be reassigned but the MR would not be updated, which is
not what the app expects.

WRITE is simpler, once the COW is broken during GUP, the pages cannot
be COW'd again until the DMA pin is released. So new reflinks would be
blocked during the DMA pin period.

To fix READ you'd have to treat it like WRITE and break the COW at
GUP.

Jason