Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Jerome Glisse <jglisse@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	john.hubbard@gmail.com, Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@kernel.org>,
	Christopher Lameter <cl@linux.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.cz>,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Christian Benvenuti <benve@cisco.com>,
	Dennis Dalessandro <dennis.dalessandro@intel.com>,
	Doug Ledford <dledford@redhat.com>,
	Mike Marciniszyn <mike.marciniszyn@intel.com>
Subject: Re: [PATCH 0/4] get_user_pages*() and RDMA: first steps
Date: Mon, 1 Oct 2018 05:47:57 -0700
Message-ID: <20181001124757.GA26218@infradead.org> (raw)
In-Reply-To: <20181001061127.GQ31060@dastard>

On Mon, Oct 01, 2018 at 04:11:27PM +1000, Dave Chinner wrote:
> This reminds me so much of Linux mmap() in the mid-2000s - mmap()
> worked for ext3 without being aware of page faults,

And "worked" still is a bit of a stretch, as soon as you'd get
ENOSPC it would still blow up badly.  Which probably makes it an
even better analogy to the current case.

> RDMA does not call ->page_mkwrite on clean file backed pages before it
> writes to them and calls set_page_dirty(), and hence RDMA to file
> backed pages is completely unreliable. I'm not sure this can be
> solved without having page fault capable RDMA hardware....

We can always software prefault at gup time.  And also remember that
while RDMA might be the case at least some people care about here it
really isn't different from any of the other gup + I/O cases, including
doing direct I/O to a mmap area.  The only difference in the various
cases is how long the area should be pinned down - some users like RDMA
want a long term mapping, while others like direct I/O just need a short
transient one.

> We could address these use-after-free situations via forcing RDMA to
> use file layout leases and revoke the lease when we need to modify
> the backing store on leased files. However, this doesn't solve the
> need for filesystems to receive write fault notifications via
> ->page_mkwrite.

Exactly.   We need three things here:

 - notification to the filesystem that a page is (possibly) beeing
   written to
 - a way to to block fs operations while the pages are pinned
 - a way to distinguish between short and long term mappings,
   and only allow long terms mappings if they can be broken
   using something like leases

I'm also pretty sure we already explained this a long time ago when the
issue came up last year, so I'm not sure why this is even still
contentious.

  reply index

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-28  5:39 john.hubbard
2018-09-28  5:39 ` [PATCH 1/4] mm: get_user_pages: consolidate error handling john.hubbard
2018-09-28  5:39 ` [PATCH 3/4] infiniband/mm: convert to the new put_user_page() call john.hubbard
2018-09-28 15:39   ` Jason Gunthorpe
2018-09-29  3:12     ` John Hubbard
2018-09-29 16:21       ` Matthew Wilcox
2018-09-29 19:19         ` Jason Gunthorpe
2018-10-01 12:50         ` Christoph Hellwig
2018-10-01 15:29           ` Matthew Wilcox
2018-10-01 15:51             ` Christoph Hellwig
2018-10-01 14:35       ` Dennis Dalessandro
2018-10-03  5:40         ` John Hubbard
2018-10-03 16:27       ` Jan Kara
2018-10-03 23:19         ` John Hubbard
2018-09-28  5:39 ` [PATCH 2/4] mm: introduce put_user_page(), placeholder version john.hubbard
2018-10-03 16:22   ` Jan Kara
2018-10-03 23:23     ` John Hubbard
2018-09-28  5:39 ` [PATCH 4/4] goldfish_pipe/mm: convert to the new release_user_pages() call john.hubbard
2018-09-28 15:29 ` [PATCH 0/4] get_user_pages*() and RDMA: first steps Jerome Glisse
2018-09-28 19:06   ` John Hubbard
2018-09-28 21:49     ` Jerome Glisse
2018-09-29  2:28       ` John Hubbard
2018-09-29  8:46         ` Jerome Glisse
2018-10-01  6:11           ` Dave Chinner
2018-10-01 12:47             ` Christoph Hellwig [this message]
2018-10-02  1:14               ` Dave Chinner
2018-10-03 16:21                 ` Jan Kara
2018-10-01 15:31             ` Jason Gunthorpe
2018-10-03 16:08           ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181001124757.GA26218@infradead.org \
    --to=hch@infradead.org \
    --cc=benve@cisco.com \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=dennis.dalessandro@intel.com \
    --cc=dledford@redhat.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=john.hubbard@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mike.marciniszyn@intel.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git