LKML Archive on lore.kernel.org
 help / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Jan Kara <jack@suse.cz>, John Hubbard <jhubbard@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>,
	John Hubbard <john.hubbard@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	tom@talpey.com, Al Viro <viro@zeniv.linux.org.uk>,
	benve@cisco.com, Christoph Hellwig <hch@infradead.org>,
	Christopher Lameter <cl@linux.com>,
	"Dalessandro, Dennis" <dennis.dalessandro@intel.com>,
	Doug Ledford <dledford@redhat.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Michal Hocko <mhocko@kernel.org>,
	Mike Marciniszyn <mike.marciniszyn@intel.com>,
	rcampbell@nvidia.com,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"Weiny, Ira" <ira.weiny@intel.com>
Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
Date: Wed, 12 Dec 2018 13:40:50 -0800
Message-ID: <CAPcyv4gJHeFjEgna1S-2uE4KxkSUgkc=e=2E5oqfoirec84C-w@mail.gmail.com> (raw)
In-Reply-To: <20181212213005.GE5037@redhat.com>

On Wed, Dec 12, 2018 at 1:30 PM Jerome Glisse <jglisse@redhat.com> wrote:
>
> On Wed, Dec 12, 2018 at 08:27:35AM -0800, Dan Williams wrote:
> > On Wed, Dec 12, 2018 at 7:03 AM Jerome Glisse <jglisse@redhat.com> wrote:
> > >
> > > On Mon, Dec 10, 2018 at 11:28:46AM +0100, Jan Kara wrote:
> > > > On Fri 07-12-18 21:24:46, Jerome Glisse wrote:
> > > > > Another crazy idea, why not treating GUP as another mapping of the page
> > > > > and caller of GUP would have to provide either a fake anon_vma struct or
> > > > > a fake vma struct (or both for PRIVATE mapping of a file where you can
> > > > > have a mix of both private and file page thus only if it is a read only
> > > > > GUP) that would get added to the list of existing mapping.
> > > > >
> > > > > So the flow would be:
> > > > >     somefunction_thatuse_gup()
> > > > >     {
> > > > >         ...
> > > > >         GUP(_fast)(vma, ..., fake_anon, fake_vma);
> > > > >         ...
> > > > >     }
> > > > >
> > > > >     GUP(vma, ..., fake_anon, fake_vma)
> > > > >     {
> > > > >         if (vma->flags == ANON) {
> > > > >             // Add the fake anon vma to the anon vma chain as a child
> > > > >             // of current vma
> > > > >         } else {
> > > > >             // Add the fake vma to the mapping tree
> > > > >         }
> > > > >
> > > > >         // The existing GUP except that now it inc mapcount and not
> > > > >         // refcount
> > > > >         GUP_old(..., &nanonymous, &nfiles);
> > > > >
> > > > >         atomic_add(&fake_anon->refcount, nanonymous);
> > > > >         atomic_add(&fake_vma->refcount, nfiles);
> > > > >
> > > > >         return nanonymous + nfiles;
> > > > >     }
> > > >
> > > > Thanks for your idea! This is actually something like I was suggesting back
> > > > at LSF/MM in Deer Valley. There were two downsides to this I remember
> > > > people pointing out:
> > > >
> > > > 1) This cannot really work with __get_user_pages_fast(). You're not allowed
> > > > to get necessary locks to insert new entry into the VMA tree in that
> > > > context. So essentially we'd loose get_user_pages_fast() functionality.
> > > >
> > > > 2) The overhead e.g. for direct IO may be noticeable. You need to allocate
> > > > the fake tracking VMA, get VMA interval tree lock, insert into the tree.
> > > > Then on IO completion you need to queue work to unpin the pages again as you
> > > > cannot remove the fake VMA directly from interrupt context where the IO is
> > > > completed.
> > > >
> > > > You are right that the cost could be amortized if gup() is called for
> > > > multiple consecutive pages however for small IOs there's no help...
> > > >
> > > > So this approach doesn't look like a win to me over using counter in struct
> > > > page and I'd rather try looking into squeezing HMM public page usage of
> > > > struct page so that we can fit that gup counter there as well. I know that
> > > > it may be easier said than done...
> > >
> > > So i want back to the drawing board and first i would like to ascertain
> > > that we all agree on what the objectives are:
> > >
> > >     [O1] Avoid write back from a page still being written by either a
> > >          device or some direct I/O or any other existing user of GUP.
> > >          This would avoid possible file system corruption.
> > >
> > >     [O2] Avoid crash when set_page_dirty() is call on a page that is
> > >          considered clean by core mm (buffer head have been remove and
> > >          with some file system this turns into an ugly mess).
> > >
> > >     [O3] DAX and the device block problems, ie with DAX the page map in
> > >          userspace is the same as the block (persistent memory) and no
> > >          filesystem nor block device understand page as block or pinned
> > >          block.
> > >
> > > For [O3] i don't think any pin count would help in anyway. I believe
> > > that the current long term GUP API that does not allow GUP of DAX is
> > > the only sane solution for now.
> >
> > No, that's not a sane solution, it's an emergency hack.
> >
> > > The real fix would be to teach file-
> > > system about DAX/pinned block so that a pinned block is not reuse
> > > by filesystem.
> >
> > We already have taught filesystems about pinned dax pages, see
> > dax_layout_busy_page(). As much as possible I want to eliminate the
> > concept of "dax pages" as a special case that gets sprinkled
> > throughout the mm.
>
> So thinking on O3 issues what about leveraging the recent change i
> did to mmu notifier. Add a event for truncate or any other file
> event that need to invalidate the file->page for a range of offset.
>
> Add mmu notifier listener to GUP user (except direct I/O) so that
> they invalidate they hardware mapping or switch the hardware mapping
> to use a crappy page. When such event happens what ever user do to
> the page through that driver is broken anyway. So it is better to
> be loud about it then trying to make it pass under the radar.
>
> This will put the burden on broken user and allow you to properly
> recycle your DAX page.
>
> Think of it as revoke through mmu notifier.
>
> So patchset would be:
>     enum mmu_notifier_event {
> +       MMU_NOTIFY_TRUNCATE,
>     };
>
> +   Change truncate code path to emit MMU_NOTIFY_TRUNCATE
>
> Then for each user of GUP (except direct I/O or other very short
> term GUP):
>
>     Patch 1: register mmu notifier
>     Patch 2: listen to MMU_NOTIFY_TRUNCATE and MMU_NOTIFY_UNMAP
>              when that happens update the device page table or
>              usage to point to a crappy page and do put_user_page
>              on all previously held page
>
> So this would solve the revoke side of thing without adding a burden
> on GUP user like direct I/O. Many existing user of GUP already do
> listen to mmu notifier and already behave properly. It is just about
> making every body list to that. Then we can even add the mmu notifier
> pointer as argument to GUP just to make sure no new user of GUP forget
> about registering a notifier (argument as a teaching guide not as a
> something actively use).
>
>
> So does that sounds like a plan to solve your concern with long term
> GUP user ? This does not depend on DAX or anything it would apply to
> any file back pages.

Almost, we need some safety around assuming that DMA is complete the
page, so the notification would need to go all to way to userspace
with something like a file lease notification. It would also need to
be backstopped by an IOMMU in the case where the hardware does not /
can not stop in-flight DMA.

  reply index

Thread overview: 179+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-04  0:17 [PATCH 0/2] put_user_page*(): start converting the call sites john.hubbard
2018-12-04  0:17 ` [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions john.hubbard
2018-12-04  7:53   ` Mike Rapoport
2018-12-05  1:40     ` John Hubbard
2018-12-04 20:28   ` Dan Williams
2018-12-04 21:56     ` John Hubbard
2018-12-04 23:03       ` Dan Williams
2018-12-05  0:36         ` Jerome Glisse
2018-12-05  0:40           ` Dan Williams
2018-12-05  0:59             ` John Hubbard
2018-12-05  0:58         ` John Hubbard
2018-12-05  1:00           ` Dan Williams
2018-12-05  1:15           ` Matthew Wilcox
2018-12-05  1:44             ` Jerome Glisse
2018-12-05  1:57               ` John Hubbard
2018-12-07  2:45                 ` John Hubbard
2018-12-07 19:16                   ` Jerome Glisse
2018-12-07 19:26                     ` Dan Williams
2018-12-07 19:40                       ` Jerome Glisse
2018-12-08  0:52                     ` John Hubbard
2018-12-08  2:24                       ` Jerome Glisse
2018-12-10 10:28                         ` Jan Kara
2018-12-12 15:03                           ` Jerome Glisse
2018-12-12 16:27                             ` Dan Williams
2018-12-12 17:02                               ` Jerome Glisse
2018-12-12 17:49                                 ` Dan Williams
2018-12-12 19:07                                   ` John Hubbard
2018-12-12 21:30                               ` Jerome Glisse
2018-12-12 21:40                                 ` Dan Williams [this message]
2018-12-12 21:53                                   ` Jerome Glisse
2018-12-12 22:11                                     ` Matthew Wilcox
2018-12-12 22:16                                       ` Jerome Glisse
2018-12-12 23:37                                     ` Jason Gunthorpe
2018-12-12 23:46                                       ` John Hubbard
2018-12-12 23:54                                       ` Dan Williams
2018-12-13  0:01                                       ` Jerome Glisse
2018-12-13  0:18                                         ` Dan Williams
2018-12-13  0:44                                           ` Jerome Glisse
2018-12-13  3:26                                             ` Jason Gunthorpe
2018-12-13  3:20                                         ` Jason Gunthorpe
2018-12-13 12:43                                           ` Jerome Glisse
2018-12-13 13:40                                             ` Tom Talpey
2018-12-13 14:18                                               ` Jerome Glisse
2018-12-13 14:51                                                 ` Tom Talpey
2018-12-13 15:18                                                   ` Jerome Glisse
2018-12-13 18:12                                                     ` Tom Talpey
2018-12-13 19:18                                                       ` Jerome Glisse
2018-12-14 10:41                                             ` Jan Kara
2018-12-14 15:25                                               ` Jerome Glisse
2018-12-12 21:56                                 ` John Hubbard
2018-12-12 22:04                                   ` Jerome Glisse
2018-12-12 22:11                                     ` John Hubbard
2018-12-12 22:14                                       ` Jerome Glisse
2018-12-12 22:17                                         ` John Hubbard
2018-12-12 21:46                             ` Dave Chinner
2018-12-12 21:59                               ` Jerome Glisse
2018-12-13  0:51                                 ` Dave Chinner
2018-12-13  2:02                                   ` Jerome Glisse
2018-12-13 15:56                                     ` Christopher Lameter
2018-12-13 16:02                                       ` Jerome Glisse
2018-12-14  6:00                                     ` Dave Chinner
2018-12-14 15:13                                       ` Jerome Glisse
2018-12-14  3:52                                   ` John Hubbard
2018-12-14  5:21                                     ` Dan Williams
2018-12-14  6:11                                       ` John Hubbard
2018-12-14 15:20                                         ` Jerome Glisse
2018-12-14 19:38                                         ` Dan Williams
2018-12-14 19:48                                           ` Matthew Wilcox
2018-12-14 19:53                                             ` Dave Hansen
2018-12-14 20:03                                               ` Matthew Wilcox
2018-12-14 20:17                                                 ` Dan Williams
2018-12-14 20:29                                                   ` Matthew Wilcox
2018-12-15  0:41                                                 ` John Hubbard
2018-12-17  8:56                                           ` Jan Kara
2018-12-17 18:28                                             ` Dan Williams
2018-12-14 15:43                               ` Jan Kara
2018-12-16 21:58                                 ` Dave Chinner
2018-12-17 18:11                                   ` Jerome Glisse
2018-12-17 18:34                                     ` Matthew Wilcox
2018-12-17 19:48                                       ` Jerome Glisse
2018-12-17 19:51                                         ` Matthew Wilcox
2018-12-17 19:54                                           ` Jerome Glisse
2018-12-17 19:59                                             ` Matthew Wilcox
2018-12-17 20:55                                               ` Jerome Glisse
2018-12-17 21:03                                                 ` Matthew Wilcox
2018-12-17 21:15                                                   ` Jerome Glisse
2018-12-18  1:09                                       ` Dave Chinner
2018-12-18  6:12                                       ` Darrick J. Wong
2018-12-18  9:30                                       ` Jan Kara
2018-12-18 23:29                                         ` John Hubbard
2018-12-19  2:07                                           ` Jerome Glisse
2018-12-19 11:08                                             ` Jan Kara
2018-12-20 10:54                                               ` John Hubbard
2018-12-20 16:50                                                 ` Jerome Glisse
2018-12-20 16:57                                                   ` Dan Williams
2018-12-20 16:49                                               ` Jerome Glisse
2019-01-03  1:55                                               ` Jerome Glisse
2019-01-03  3:27                                                 ` John Hubbard
2019-01-03 14:57                                                   ` Jerome Glisse
2019-01-03  9:26                                                 ` Jan Kara
2019-01-03 14:44                                                   ` Jerome Glisse
2019-01-11  2:59                                                     ` John Hubbard
2019-01-11 16:51                                                       ` Jerome Glisse
2019-01-12  1:04                                                         ` John Hubbard
2019-01-12  2:02                                                           ` Jerome Glisse
2019-01-12  2:38                                                             ` John Hubbard
2019-01-12  2:46                                                               ` Jerome Glisse
2019-01-12  3:06                                                                 ` John Hubbard
2019-01-12  3:25                                                                   ` Jerome Glisse
2019-01-12 20:46                                                                     ` John Hubbard
2019-01-14 14:54                                                                   ` Jan Kara
2019-01-14 17:21                                                                     ` Jerome Glisse
2019-01-14 19:09                                                                       ` John Hubbard
2019-01-15  8:34                                                                         ` Jan Kara
2019-01-15 21:39                                                                           ` John Hubbard
2019-01-15  8:07                                                                       ` Jan Kara
2019-01-15 17:15                                                                         ` Jerome Glisse
2019-01-15 21:56                                                                           ` John Hubbard
2019-01-15 22:12                                                                             ` Jerome Glisse
2019-01-16  0:44                                                                               ` John Hubbard
2019-01-16  1:56                                                                                 ` Jerome Glisse
2019-01-16  2:01                                                                                   ` Dan Williams
2019-01-16  2:23                                                                                     ` Jerome Glisse
2019-01-16  4:34                                                                                       ` Dave Chinner
2019-01-16 14:50                                                                                         ` Jerome Glisse
2019-01-16 22:51                                                                                           ` Dave Chinner
2019-01-16 11:38                                                                         ` Jan Kara
2019-01-16 13:08                                                                           ` Jerome Glisse
2019-01-17  5:42                                                                             ` John Hubbard
2019-01-17 15:21                                                                               ` Jerome Glisse
2019-01-18  0:16                                                                                 ` Dave Chinner
2019-01-18  1:59                                                                                   ` Jerome Glisse
2019-01-17  9:30                                                                             ` Jan Kara
2019-01-17 15:17                                                                               ` Jerome Glisse
2019-01-22 15:24                                                                                 ` Jan Kara
2019-01-22 16:46                                                                                   ` Jerome Glisse
2019-01-23 18:02                                                                                     ` Jan Kara
2019-01-23 19:04                                                                                       ` Jerome Glisse
2019-01-29  0:22                                                                                         ` John Hubbard
2019-01-29  1:23                                                                                           ` Jerome Glisse
2019-01-29  6:41                                                                                             ` John Hubbard
2019-01-29 10:12                                                                                               ` Jan Kara
2019-01-30  2:21                                                                                                 ` John Hubbard
2019-01-17  5:25                                                                         ` John Hubbard
2019-01-17  9:04                                                                           ` Jan Kara
2019-01-12  3:14                                                               ` Jerome Glisse
2018-12-18 10:33                                   ` Jan Kara
2018-12-18 23:42                                     ` Dave Chinner
2018-12-19  3:03                                       ` Jason Gunthorpe
2018-12-19  5:26                                         ` Dan Williams
2018-12-19 11:19                                           ` Jan Kara
2018-12-19 10:28                                         ` Dave Chinner
2018-12-19 11:35                                           ` Jan Kara
2018-12-19 16:56                                             ` Jason Gunthorpe
2018-12-19 22:33                                             ` Dave Chinner
2018-12-20  9:07                                               ` Jan Kara
2018-12-20 16:54                                               ` Jerome Glisse
2018-12-19 13:24                                       ` Jan Kara
2018-12-08  5:18                       ` Matthew Wilcox
2018-12-12 19:13                         ` John Hubbard
2018-12-08  7:16                       ` Dan Williams
2018-12-08 16:33                         ` Jerome Glisse
2018-12-08 16:48                           ` Christoph Hellwig
2018-12-08 17:47                             ` Jerome Glisse
2018-12-08 18:26                               ` Christoph Hellwig
2018-12-08 18:45                                 ` Jerome Glisse
2018-12-08 18:09                             ` Dan Williams
2018-12-08 18:12                               ` Christoph Hellwig
2018-12-11  6:18                               ` Dave Chinner
2018-12-05  5:52             ` Dan Williams
2018-12-05 11:16       ` Jan Kara
2018-12-04  0:17 ` [PATCH 2/2] infiniband/mm: convert put_page() to put_user_page*() john.hubbard
2018-12-04 17:10 ` [PATCH 0/2] put_user_page*(): start converting the call sites David Laight
2018-12-05  1:05   ` John Hubbard
2018-12-05 14:08     ` David Laight
2018-12-28  8:37       ` Pavel Machek
2019-02-08  7:56 [PATCH 0/2] mm: put_user_page() call site conversion first john.hubbard
2019-02-08  7:56 ` [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions john.hubbard
2019-02-08 10:32   ` Mike Rapoport
2019-02-08 20:44     ` John Hubbard

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4gJHeFjEgna1S-2uE4KxkSUgkc=e=2E5oqfoirec84C-w@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=benve@cisco.com \
    --cc=cl@linux.com \
    --cc=dennis.dalessandro@intel.com \
    --cc=dledford@redhat.com \
    --cc=hch@infradead.org \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=john.hubbard@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.marciniszyn@intel.com \
    --cc=rcampbell@nvidia.com \
    --cc=tom@talpey.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox