From: Jason Gunthorpe <jgg@ziepe.ca>
To: Dave Chinner <david@fromorbit.com>
Cc: Christopher Lameter <cl@linux.com>,
	Doug Ledford <dledford@redhat.com>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	Ira Weiny <ira.weiny@intel.com>,
	lsf-pc@lists.linux-foundation.org, linux-rdma@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	John Hubbard <jhubbard@nvidia.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [LSF/MM TOPIC] Discuss least bad options for resolving longterm-GUP usage by RDMA
Date: Wed, 6 Feb 2019 15:08:28 -0700
Message-ID: <20190206220828.GJ12227@ziepe.ca>
In-Reply-To: <20190206210356.GZ6173@dastard>

On Thu, Feb 07, 2019 at 08:03:56AM +1100, Dave Chinner wrote:
> On Wed, Feb 06, 2019 at 07:16:21PM +0000, Christopher Lameter wrote:
> > On Wed, 6 Feb 2019, Doug Ledford wrote:
> > 
> > > > Most of the cases we want revoke for are things like truncate().
> > > > Shouldn't happen with a sane system, but we're trying to avoid users
> > > > doing awful things like being able to DMA to pages that are now part of
> > > > a different file.
> > >
> > > Why is the solution revoke then?  Is there something besides truncate
> > > that we have to worry about?  I ask because EBUSY is not currently
> > > listed as a return value of truncate, so extending the API to include
> > > EBUSY to mean "this file has pinned pages that can not be freed" is not
> > > (or should not be) totally out of the question.
> > >
> > > Admittedly, I'm coming in late to this conversation, but did I miss the
> > > portion where that alternative was ruled out?
> > 
> > Coming in late here too, but isn't the only DAX case that we are concerned
> > about where there was an mmap with the O_DAX option to do direct write
> > though? If we only allow this use case then we may not have to worry about
> > long term GUP because DAX mapped files will stay in the physical location
> > regardless.
> 
> No, that is not guaranteed. Soon as we have reflink support on XFS,
> writes will physically move the data to a new physical location.
> This is non-negotiable, and cannot be blocked forever by a gup
> pin.
> 
> IOWs, DAX on RDMA requires a) page fault capable hardware so that
> the filesystem can move data physically on write access, and b)
> revokable file leases so that the filesystem can kick userspace out
> of the way when it needs to.

Why do we need both? You want to have leases for normal CPU mmaps too?

> Truncate is a red herring. It's definitely a case for revokable
> leases, but it's the rare case rather than the one we actually care
> about. We really care about making copy-on-write capable filesystems like
> XFS work with DAX (we've got people asking for it to be supported
> yesterday!), and that means DAX+RDMA needs to work with storage that
> can change physical location at any time.

Then we must continue to ban longterm pins with DAX.

Nobody is going to want to deploy a system where revoke can happen at
any time and, if you don't respond fast enough, either your system locks
up with some kind of FS meltdown or your process gets SIGKILL.

I don't really see a reason to invest so much design work into
something that isn't production worthy.

It *almost* made sense with ftruncate, because you could architect to
avoid ftruncate. But just about any FS op might reallocate? Naw.

Dave, you said the FS is responsible for arbitrating access to the
physical pages.

Is it possible to have a filesystem for DAX that is more suited to
this environment? I.e., one designed to not require block reallocation
(no COW, no reflinks, a different approach to ftruncate, etc.)?

> And that's the real problem we need to solve here. RDMA has no trust
> model other than "I'm userspace, I pinned you, trust me!". That's
> not good enough for FS-DAX+RDMA....

It is baked into the silicon, and I don't see much motion on this
front right now. My best hope is that IOMMU PASID will get widely
deployed and RDMA silicon will arrive that can use it. Seems to be
years away, if at all.

At least we have one chip design that can work in a page faulting mode.

Jason

