Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Jerome Glisse <jglisse@redhat.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	John Hubbard <john.hubbard@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	tom@talpey.com, Al Viro <viro@zeniv.linux.org.uk>,
	benve@cisco.com, Christoph Hellwig <hch@infradead.org>,
	Christopher Lameter <cl@linux.com>,
	"Dalessandro, Dennis" <dennis.dalessandro@intel.com>,
	Doug Ledford <dledford@redhat.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Michal Hocko <mhocko@kernel.org>,
	Mike Marciniszyn <mike.marciniszyn@intel.com>,
	rcampbell@nvidia.com,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
Date: Wed, 12 Dec 2018 17:04:19 -0500
Message-ID: <20181212220418.GH5037@redhat.com> (raw)
In-Reply-To: <514cc9e1-dc4d-b979-c6bc-88ac503c098d@nvidia.com>

On Wed, Dec 12, 2018 at 01:56:00PM -0800, John Hubbard wrote:
> On 12/12/18 1:30 PM, Jerome Glisse wrote:
> > On Wed, Dec 12, 2018 at 08:27:35AM -0800, Dan Williams wrote:
> >> On Wed, Dec 12, 2018 at 7:03 AM Jerome Glisse <jglisse@redhat.com> wrote:
> >>>
> >>> On Mon, Dec 10, 2018 at 11:28:46AM +0100, Jan Kara wrote:
> >>>> On Fri 07-12-18 21:24:46, Jerome Glisse wrote:
> >>>>> Another crazy idea, why not treating GUP as another mapping of the page
> >>>>> and caller of GUP would have to provide either a fake anon_vma struct or
> >>>>> a fake vma struct (or both for PRIVATE mapping of a file where you can
> >>>>> have a mix of both private and file page thus only if it is a read only
> >>>>> GUP) that would get added to the list of existing mapping.
> >>>>>
> >>>>> So the flow would be:
> >>>>>     somefunction_thatuse_gup()
> >>>>>     {
> >>>>>         ...
> >>>>>         GUP(_fast)(vma, ..., fake_anon, fake_vma);
> >>>>>         ...
> >>>>>     }
> >>>>>
> >>>>>     GUP(vma, ..., fake_anon, fake_vma)
> >>>>>     {
> >>>>>         if (vma->flags == ANON) {
> >>>>>             // Add the fake anon vma to the anon vma chain as a child
> >>>>>             // of current vma
> >>>>>         } else {
> >>>>>             // Add the fake vma to the mapping tree
> >>>>>         }
> >>>>>
> >>>>>         // The existing GUP except that now it inc mapcount and not
> >>>>>         // refcount
> >>>>>         GUP_old(..., &nanonymous, &nfiles);
> >>>>>
> >>>>>         atomic_add(&fake_anon->refcount, nanonymous);
> >>>>>         atomic_add(&fake_vma->refcount, nfiles);
> >>>>>
> >>>>>         return nanonymous + nfiles;
> >>>>>     }
> >>>>
> >>>> Thanks for your idea! This is actually something like I was suggesting back
> >>>> at LSF/MM in Deer Valley. There were two downsides to this I remember
> >>>> people pointing out:
> >>>>
> >>>> 1) This cannot really work with __get_user_pages_fast(). You're not allowed
> >>>> to get necessary locks to insert new entry into the VMA tree in that
> >>>> context. So essentially we'd loose get_user_pages_fast() functionality.
> >>>>
> >>>> 2) The overhead e.g. for direct IO may be noticeable. You need to allocate
> >>>> the fake tracking VMA, get VMA interval tree lock, insert into the tree.
> >>>> Then on IO completion you need to queue work to unpin the pages again as you
> >>>> cannot remove the fake VMA directly from interrupt context where the IO is
> >>>> completed.
> >>>>
> >>>> You are right that the cost could be amortized if gup() is called for
> >>>> multiple consecutive pages however for small IOs there's no help...
> >>>>
> >>>> So this approach doesn't look like a win to me over using counter in struct
> >>>> page and I'd rather try looking into squeezing HMM public page usage of
> >>>> struct page so that we can fit that gup counter there as well. I know that
> >>>> it may be easier said than done...
> >>>
> >>> So i want back to the drawing board and first i would like to ascertain
> >>> that we all agree on what the objectives are:
> >>>
> >>>     [O1] Avoid write back from a page still being written by either a
> >>>          device or some direct I/O or any other existing user of GUP.
> >>>          This would avoid possible file system corruption.
> >>>
> >>>     [O2] Avoid crash when set_page_dirty() is call on a page that is
> >>>          considered clean by core mm (buffer head have been remove and
> >>>          with some file system this turns into an ugly mess).
> >>>
> >>>     [O3] DAX and the device block problems, ie with DAX the page map in
> >>>          userspace is the same as the block (persistent memory) and no
> >>>          filesystem nor block device understand page as block or pinned
> >>>          block.
> >>>
> >>> For [O3] i don't think any pin count would help in anyway. I believe
> >>> that the current long term GUP API that does not allow GUP of DAX is
> >>> the only sane solution for now.
> >>
> >> No, that's not a sane solution, it's an emergency hack.
> >>
> >>> The real fix would be to teach file-
> >>> system about DAX/pinned block so that a pinned block is not reuse
> >>> by filesystem.
> >>
> >> We already have taught filesystems about pinned dax pages, see
> >> dax_layout_busy_page(). As much as possible I want to eliminate the
> >> concept of "dax pages" as a special case that gets sprinkled
> >> throughout the mm.
> > 
> > So thinking on O3 issues what about leveraging the recent change i
> > did to mmu notifier. Add a event for truncate or any other file
> > event that need to invalidate the file->page for a range of offset.
> > 
> > Add mmu notifier listener to GUP user (except direct I/O) so that
> > they invalidate they hardware mapping or switch the hardware mapping
> > to use a crappy page. When such event happens what ever user do to
> > the page through that driver is broken anyway. So it is better to
> > be loud about it then trying to make it pass under the radar.
> > 
> > This will put the burden on broken user and allow you to properly
> > recycle your DAX page.
> > 
> > Think of it as revoke through mmu notifier.
> > 
> > So patchset would be:
> >     enum mmu_notifier_event {
> > +       MMU_NOTIFY_TRUNCATE,
> >     };
> > 
> > +   Change truncate code path to emit MMU_NOTIFY_TRUNCATE
> > 
> 
> That part looks good.
> 
> > Then for each user of GUP (except direct I/O or other very short
> > term GUP):
> 
> but, why is there a difference between how we handle long- and
> short-term callers? Aren't we just leaving a harder-to-reproduce race
> condition, if we ignore the short-term gup callers?
> 
> So, how does activity (including direct IO and other short-term callers)
> get quiesced (stopped, and guaranteed not to restart or continue), so 
> that truncate or umount can continue on?

The fs would delay block reuse to after refcount is gone so it would
wait for that. It is ok to do that only for short term user in case of
direct I/O this should really not happen as it means that the application
is doing something really stupid. So the waiting on short term user
would be a rare event.


> >     Patch 1: register mmu notifier
> >     Patch 2: listen to MMU_NOTIFY_TRUNCATE and MMU_NOTIFY_UNMAP
> >              when that happens update the device page table or
> >              usage to point to a crappy page and do put_user_page
> >              on all previously held page
> 
> Minor point, this sequence should be done within a wrapper around existing 
> get_user_pages(), such as get_user_pages_revokable() or something.

No we want to teach everyone to abide by the rules, if we add yet another
GUP function prototype people will use the one where they don;t have to
say they abide by the rules. It is time we advertise the fact that GUP
should not be use willy nilly for anything without worrying about the
implication it has :)

So i would rather see a consolidation in the number of GUP prototype we
have than yet another one.

Cheers,
J�r�me

  reply index

Thread overview: 213+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-04  0:17 [PATCH 0/2] put_user_page*(): start converting the call sites john.hubbard
2018-12-04  0:17 ` [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions john.hubbard
2018-12-04  7:53   ` Mike Rapoport
2018-12-05  1:40     ` John Hubbard
2018-12-04 20:28   ` Dan Williams
2018-12-04 21:56     ` John Hubbard
2018-12-04 23:03       ` Dan Williams
2018-12-05  0:36         ` Jerome Glisse
2018-12-05  0:40           ` Dan Williams
2018-12-05  0:59             ` John Hubbard
2018-12-05  0:58         ` John Hubbard
2018-12-05  1:00           ` Dan Williams
2018-12-05  1:15           ` Matthew Wilcox
2018-12-05  1:44             ` Jerome Glisse
2018-12-05  1:57               ` John Hubbard
2018-12-07  2:45                 ` John Hubbard
2018-12-07 19:16                   ` Jerome Glisse
2018-12-07 19:26                     ` Dan Williams
2018-12-07 19:40                       ` Jerome Glisse
2018-12-08  0:52                     ` John Hubbard
2018-12-08  2:24                       ` Jerome Glisse
2018-12-10 10:28                         ` Jan Kara
2018-12-12 15:03                           ` Jerome Glisse
2018-12-12 16:27                             ` Dan Williams
2018-12-12 17:02                               ` Jerome Glisse
2018-12-12 17:49                                 ` Dan Williams
2018-12-12 19:07                                   ` John Hubbard
2018-12-12 21:30                               ` Jerome Glisse
2018-12-12 21:40                                 ` Dan Williams
2018-12-12 21:53                                   ` Jerome Glisse
2018-12-12 22:11                                     ` Matthew Wilcox
2018-12-12 22:16                                       ` Jerome Glisse
2018-12-12 23:37                                     ` Jason Gunthorpe
2018-12-12 23:46                                       ` John Hubbard
2018-12-12 23:54                                       ` Dan Williams
2018-12-13  0:01                                       ` Jerome Glisse
2018-12-13  0:18                                         ` Dan Williams
2018-12-13  0:44                                           ` Jerome Glisse
2018-12-13  3:26                                             ` Jason Gunthorpe
2018-12-13  3:20                                         ` Jason Gunthorpe
2018-12-13 12:43                                           ` Jerome Glisse
2018-12-13 13:40                                             ` Tom Talpey
2018-12-13 14:18                                               ` Jerome Glisse
2018-12-13 14:51                                                 ` Tom Talpey
2018-12-13 15:18                                                   ` Jerome Glisse
2018-12-13 18:12                                                     ` Tom Talpey
2018-12-13 19:18                                                       ` Jerome Glisse
2018-12-14 10:41                                             ` Jan Kara
2018-12-14 15:25                                               ` Jerome Glisse
2018-12-12 21:56                                 ` John Hubbard
2018-12-12 22:04                                   ` Jerome Glisse [this message]
2018-12-12 22:11                                     ` John Hubbard
2018-12-12 22:14                                       ` Jerome Glisse
2018-12-12 22:17                                         ` John Hubbard
2018-12-12 21:46                             ` Dave Chinner
2018-12-12 21:59                               ` Jerome Glisse
2018-12-13  0:51                                 ` Dave Chinner
2018-12-13  2:02                                   ` Jerome Glisse
2018-12-13 15:56                                     ` Christopher Lameter
2018-12-13 16:02                                       ` Jerome Glisse
2018-12-14  6:00                                     ` Dave Chinner
2018-12-14 15:13                                       ` Jerome Glisse
2018-12-14  3:52                                   ` John Hubbard
2018-12-14  5:21                                     ` Dan Williams
2018-12-14  6:11                                       ` John Hubbard
2018-12-14 15:20                                         ` Jerome Glisse
2018-12-14 19:38                                         ` Dan Williams
2018-12-14 19:48                                           ` Matthew Wilcox
2018-12-14 19:53                                             ` Dave Hansen
2018-12-14 20:03                                               ` Matthew Wilcox
2018-12-14 20:17                                                 ` Dan Williams
2018-12-14 20:29                                                   ` Matthew Wilcox
2018-12-15  0:41                                                 ` John Hubbard
2018-12-17  8:56                                           ` Jan Kara
2018-12-17 18:28                                             ` Dan Williams
2018-12-14 15:43                               ` Jan Kara
2018-12-16 21:58                                 ` Dave Chinner
2018-12-17 18:11                                   ` Jerome Glisse
2018-12-17 18:34                                     ` Matthew Wilcox
2018-12-17 19:48                                       ` Jerome Glisse
2018-12-17 19:51                                         ` Matthew Wilcox
2018-12-17 19:54                                           ` Jerome Glisse
2018-12-17 19:59                                             ` Matthew Wilcox
2018-12-17 20:55                                               ` Jerome Glisse
2018-12-17 21:03                                                 ` Matthew Wilcox
2018-12-17 21:15                                                   ` Jerome Glisse
2018-12-18  1:09                                       ` Dave Chinner
2018-12-18  6:12                                       ` Darrick J. Wong
2018-12-18  9:30                                       ` Jan Kara
2018-12-18 23:29                                         ` John Hubbard
2018-12-19  2:07                                           ` Jerome Glisse
2018-12-19 11:08                                             ` Jan Kara
2018-12-20 10:54                                               ` John Hubbard
2018-12-20 16:50                                                 ` Jerome Glisse
2018-12-20 16:57                                                   ` Dan Williams
2018-12-20 16:49                                               ` Jerome Glisse
2019-01-03  1:55                                               ` Jerome Glisse
2019-01-03  3:27                                                 ` John Hubbard
2019-01-03 14:57                                                   ` Jerome Glisse
2019-01-03  9:26                                                 ` Jan Kara
2019-01-03 14:44                                                   ` Jerome Glisse
2019-01-11  2:59                                                     ` John Hubbard
2019-01-11  2:59                                                       ` John Hubbard
2019-01-11 16:51                                                       ` Jerome Glisse
2019-01-11 16:51                                                         ` Jerome Glisse
2019-01-12  1:04                                                         ` John Hubbard
2019-01-12  1:04                                                           ` John Hubbard
2019-01-12  2:02                                                           ` Jerome Glisse
2019-01-12  2:02                                                             ` Jerome Glisse
2019-01-12  2:38                                                             ` John Hubbard
2019-01-12  2:38                                                               ` John Hubbard
2019-01-12  2:46                                                               ` Jerome Glisse
2019-01-12  2:46                                                                 ` Jerome Glisse
2019-01-12  3:06                                                                 ` John Hubbard
2019-01-12  3:06                                                                   ` John Hubbard
2019-01-12  3:25                                                                   ` Jerome Glisse
2019-01-12  3:25                                                                     ` Jerome Glisse
2019-01-12 20:46                                                                     ` John Hubbard
2019-01-12 20:46                                                                       ` John Hubbard
2019-01-14 14:54                                                                   ` Jan Kara
2019-01-14 14:54                                                                     ` Jan Kara
2019-01-14 17:21                                                                     ` Jerome Glisse
2019-01-14 17:21                                                                       ` Jerome Glisse
2019-01-14 19:09                                                                       ` John Hubbard
2019-01-14 19:09                                                                         ` John Hubbard
2019-01-15  8:34                                                                         ` Jan Kara
2019-01-15  8:34                                                                           ` Jan Kara
2019-01-15 21:39                                                                           ` John Hubbard
2019-01-15 21:39                                                                             ` John Hubbard
2019-01-15  8:07                                                                       ` Jan Kara
2019-01-15  8:07                                                                         ` Jan Kara
2019-01-15 17:15                                                                         ` Jerome Glisse
2019-01-15 17:15                                                                           ` Jerome Glisse
2019-01-15 21:56                                                                           ` John Hubbard
2019-01-15 21:56                                                                             ` John Hubbard
2019-01-15 22:12                                                                             ` Jerome Glisse
2019-01-15 22:12                                                                               ` Jerome Glisse
2019-01-16  0:44                                                                               ` John Hubbard
2019-01-16  0:44                                                                                 ` John Hubbard
2019-01-16  1:56                                                                                 ` Jerome Glisse
2019-01-16  1:56                                                                                   ` Jerome Glisse
2019-01-16  2:01                                                                                   ` Dan Williams
2019-01-16  2:01                                                                                     ` Dan Williams
2019-01-16  2:23                                                                                     ` Jerome Glisse
2019-01-16  2:23                                                                                       ` Jerome Glisse
2019-01-16  4:34                                                                                       ` Dave Chinner
2019-01-16  4:34                                                                                         ` Dave Chinner
2019-01-16 14:50                                                                                         ` Jerome Glisse
2019-01-16 14:50                                                                                           ` Jerome Glisse
2019-01-16 22:51                                                                                           ` Dave Chinner
2019-01-16 22:51                                                                                             ` Dave Chinner
2019-01-16 11:38                                                                         ` Jan Kara
2019-01-16 11:38                                                                           ` Jan Kara
2019-01-16 13:08                                                                           ` Jerome Glisse
2019-01-16 13:08                                                                             ` Jerome Glisse
2019-01-17  5:42                                                                             ` John Hubbard
2019-01-17  5:42                                                                               ` John Hubbard
2019-01-17 15:21                                                                               ` Jerome Glisse
2019-01-17 15:21                                                                                 ` Jerome Glisse
2019-01-18  0:16                                                                                 ` Dave Chinner
2019-01-18  1:59                                                                                   ` Jerome Glisse
2019-01-17  9:30                                                                             ` Jan Kara
2019-01-17  9:30                                                                               ` Jan Kara
2019-01-17 15:17                                                                               ` Jerome Glisse
2019-01-17 15:17                                                                                 ` Jerome Glisse
2019-01-22 15:24                                                                                 ` Jan Kara
2019-01-22 16:46                                                                                   ` Jerome Glisse
2019-01-23 18:02                                                                                     ` Jan Kara
2019-01-23 19:04                                                                                       ` Jerome Glisse
2019-01-29  0:22                                                                                         ` John Hubbard
2019-01-29  1:23                                                                                           ` Jerome Glisse
2019-01-29  6:41                                                                                             ` John Hubbard
2019-01-29 10:12                                                                                               ` Jan Kara
2019-01-30  2:21                                                                                                 ` John Hubbard
2019-01-17  5:25                                                                         ` John Hubbard
2019-01-17  5:25                                                                           ` John Hubbard
2019-01-17  9:04                                                                           ` Jan Kara
2019-01-17  9:04                                                                             ` Jan Kara
2019-01-12  3:14                                                               ` Jerome Glisse
2019-01-12  3:14                                                                 ` Jerome Glisse
2018-12-18 10:33                                   ` Jan Kara
2018-12-18 23:42                                     ` Dave Chinner
2018-12-19  3:03                                       ` Jason Gunthorpe
2018-12-19  5:26                                         ` Dan Williams
2018-12-19 11:19                                           ` Jan Kara
2018-12-19 10:28                                         ` Dave Chinner
2018-12-19 11:35                                           ` Jan Kara
2018-12-19 16:56                                             ` Jason Gunthorpe
2018-12-19 22:33                                             ` Dave Chinner
2018-12-20  9:07                                               ` Jan Kara
2018-12-20 16:54                                               ` Jerome Glisse
2018-12-19 13:24                                       ` Jan Kara
2018-12-08  5:18                       ` Matthew Wilcox
2018-12-12 19:13                         ` John Hubbard
2018-12-08  7:16                       ` Dan Williams
2018-12-08 16:33                         ` Jerome Glisse
2018-12-08 16:48                           ` Christoph Hellwig
2018-12-08 17:47                             ` Jerome Glisse
2018-12-08 18:26                               ` Christoph Hellwig
2018-12-08 18:45                                 ` Jerome Glisse
2018-12-08 18:09                             ` Dan Williams
2018-12-08 18:12                               ` Christoph Hellwig
2018-12-11  6:18                               ` Dave Chinner
2018-12-05  5:52             ` Dan Williams
2018-12-05 11:16       ` Jan Kara
2018-12-04  0:17 ` [PATCH 2/2] infiniband/mm: convert put_page() to put_user_page*() john.hubbard
2018-12-04 17:10 ` [PATCH 0/2] put_user_page*(): start converting the call sites David Laight
2018-12-05  1:05   ` John Hubbard
2018-12-05 14:08     ` David Laight
2018-12-28  8:37       ` Pavel Machek
2019-02-08  7:56 [PATCH 0/2] mm: put_user_page() call site conversion first john.hubbard
2019-02-08  7:56 ` [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions john.hubbard
2019-02-08 10:32   ` Mike Rapoport
2019-02-08 20:44     ` John Hubbard

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181212220418.GH5037@redhat.com \
    --to=jglisse@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=benve@cisco.com \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=dennis.dalessandro@intel.com \
    --cc=dledford@redhat.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=john.hubbard@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.marciniszyn@intel.com \
    --cc=rcampbell@nvidia.com \
    --cc=tom@talpey.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org linux-fsdevel@archiver.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox