Linux-RDMA Archive on lore.kernel.org
 help / color / Atom feed
From: Jerome Glisse <jglisse@redhat.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Björn Töpel" <bjorn.topel@intel.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Dave Chinner" <david@fromorbit.com>,
	"David Airlie" <airlied@linux.ie>,
	"David S . Miller" <davem@davemloft.net>,
	"Ira Weiny" <ira.weiny@intel.com>, "Jan Kara" <jack@suse.cz>,
	"Jason Gunthorpe" <jgg@ziepe.ca>, "Jens Axboe" <axboe@kernel.dk>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Mauro Carvalho Chehab" <mchehab@kernel.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Michal Hocko" <mhocko@suse.com>,
	"Mike Kravetz" <mike.kravetz@oracle.com>,
	"Paul Mackerras" <paulus@samba.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	bpf@vger.kernel.org, dri-devel@lists.freedesktop.org,
	kvm@vger.kernel.org, linux-block@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
	linux-rdma@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	netdev@vger.kernel.org, linux-mm@kvack.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 05/18] mm/gup: introduce pin_user_pages*() and FOLL_PIN
Date: Mon, 4 Nov 2019 15:31:53 -0500
Message-ID: <20191104203153.GB7731@redhat.com> (raw)
In-Reply-To: <25ec4bc0-caaa-2a01-2ae7-2d79663a40e1@nvidia.com>

On Mon, Nov 04, 2019 at 12:09:05PM -0800, John Hubbard wrote:
> Jason, a question for you at the bottom.
> 
> On 11/4/19 11:52 AM, Jerome Glisse wrote:
> ...
> >> CASE 3: ODP
> >> -----------
> >> RDMA hardware with page faulting support. Here, a well-written driver doesn't
> > 
> > CASE3: Hardware with page fault support
> > ---------------------------------------
> > 
> > Here, a well-written ....
> > 
> 
> Ah, OK. So just drop the first sentence, yes.
> 
> ...
> >>>>>> +	 */
> >>>>>> +	gup_flags |= FOLL_REMOTE | FOLL_PIN;
> >>>>>
> >>>>> Wouldn't it be better to not add pin_longterm_pages_remote() until
> >>>>> it can be properly implemented ?
> >>>>>
> >>>>
> >>>> Well, the problem is that I need each call site that requires FOLL_PIN
> >>>> to use a proper wrapper. It's the FOLL_PIN that is the focus here, because
> >>>> there is a hard, bright rule, which is: if and only if a caller sets
> >>>> FOLL_PIN, then the dma-page tracking happens, and put_user_page() must
> >>>> be called.
> >>>>
> >>>> So this leaves me with only two reasonable choices:
> >>>>
> >>>> a) Convert the call site as above: pin_longterm_pages_remote(), which sets
> >>>> FOLL_PIN (the key point!), and leaves the FOLL_LONGTERM situation exactly
> >>>> as it has been so far. When the FOLL_LONGTERM situation is fixed, the call
> >>>> site *might* not need any changes to adopt the working gup.c code.
> >>>>
> >>>> b) Convert the call site to pin_user_pages_remote(), which also sets
> >>>> FOLL_PIN, and also leaves the FOLL_LONGTERM situation exactly as before.
> >>>> There would also be a comment at the call site, to the effect of, "this
> >>>> is the wrong call to make: it really requires FOLL_LONGTERM behavior".
> >>>>
> >>>> When the FOLL_LONGTERM situation is fixed, the call site will need to be
> >>>> changed to pin_longterm_pages_remote().
> >>>>
> >>>> So you can probably see why I picked (a).
> >>>
> >>> But right now nobody has FOLL_LONGTERM and FOLL_REMOTE. So you should
> >>> never have the need for pin_longterm_pages_remote(). My fear is that
> >>> longterm has implication and it would be better to not drop this implication
> >>> by adding a wrapper that does not do what the name says.
> >>>
> >>> So do not introduce pin_longterm_pages_remote() until its first user
> >>> happens. This is option c)
> >>>
> >>
> >> Almost forgot, though: there is already another user: Infiniband:
> >>
> >> drivers/infiniband/core/umem_odp.c:646:         npages = pin_longterm_pages_remote(owning_process, owning_mm,
> > 
> > odp do not need that, i thought the HMM convertion was already upstream
> > but seems not, in any case odp do not need the longterm case it only
> > so best is to revert that user to gup_fast or something until it get
> > converted to HMM.
> > 
> 
> Note for Jason: the (a) or (b) items are talking about the vfio case, which is
> one of the two call sites that now use pin_longterm_pages_remote(), and the
> other one is infiniband:
> 
> drivers/infiniband/core/umem_odp.c:646:         npages = pin_longterm_pages_remote(owning_process, owning_mm,
> drivers/vfio/vfio_iommu_type1.c:353:            ret = pin_longterm_pages_remote(NULL, mm, vaddr, 1,

vfio should be reverted until it can be properly implemented.
The issue is that when you fix the implementation you might
break vfio existing user and thus regress the kernel from user
point of view. So i rather have the change to vfio reverted,
i believe it was not well understood when it got upstream,
between in my 5.4 tree it is still gup_remote not longterm.


> Jerome, Jason: I really don't want to revert the put_page() to put_user_page() 
> conversions that are already throughout the IB driver--pointless churn, right?
> I'd rather either delete them in Jason's tree, or go with what I have here
> while waiting for the deletion.
> 
> Maybe we should just settle on (a) or (b), so that the IB driver ends up with
> the wrapper functions? In fact, if it's getting deleted, then I'd prefer leaving
> it at (a), since that's simple...
> 
> Jason should weigh in on how he wants this to go, with respect to branching
> and merging, since it sounds like that will conflict with the hmm branch 
> (ha, I'm overdue in reviewing his mmu notifier series, that's what I get for
> being late).
> 
> thanks,
> 
> John Hubbard
> NVIDIA


  parent reply index

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-03 21:17 [PATCH v2 00/18] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM John Hubbard
2019-11-03 21:17 ` [PATCH v2 01/18] mm/gup: pass flags arg to __gup_device_* functions John Hubbard
2019-11-04 16:39   ` Jerome Glisse
2019-11-03 21:17 ` [PATCH v2 02/18] mm/gup: factor out duplicate code from four routines John Hubbard
2019-11-04 16:51   ` Jerome Glisse
2019-11-03 21:17 ` [PATCH v2 03/18] goldish_pipe: rename local pin_user_pages() routine John Hubbard
2019-11-04 16:52   ` Jerome Glisse
2019-11-03 21:17 ` [PATCH v2 04/18] media/v4l2-core: set pages dirty upon releasing DMA buffers John Hubbard
2019-11-10 10:10   ` Hans Verkuil
2019-11-11 21:46     ` John Hubbard
2019-11-03 21:18 ` [PATCH v2 05/18] mm/gup: introduce pin_user_pages*() and FOLL_PIN John Hubbard
2019-11-04 17:33   ` Jerome Glisse
2019-11-04 19:04     ` John Hubbard
2019-11-04 19:18       ` Jerome Glisse
2019-11-04 19:30         ` John Hubbard
2019-11-04 19:52           ` Jerome Glisse
2019-11-04 20:09             ` John Hubbard
2019-11-04 20:31               ` Jason Gunthorpe
2019-11-04 20:40                 ` John Hubbard
2019-11-04 20:31               ` Jerome Glisse [this message]
2019-11-04 20:37                 ` Jason Gunthorpe
2019-11-04 20:57                   ` John Hubbard
2019-11-04 21:15                     ` Jason Gunthorpe
2019-11-04 21:34                       ` John Hubbard
2019-11-04 20:33   ` David Rientjes
2019-11-04 20:48     ` Jerome Glisse
2019-11-05 13:10   ` Mike Rapoport
2019-11-05 19:00     ` John Hubbard
2019-11-07  2:25       ` Ira Weiny
2019-11-07  8:07       ` Mike Rapoport
2019-11-03 21:18 ` [PATCH v2 06/18] goldish_pipe: convert to pin_user_pages() and put_user_page() John Hubbard
2019-11-03 21:18 ` [PATCH v2 07/18] infiniband: set FOLL_PIN, FOLL_LONGTERM via pin_longterm_pages*() John Hubbard
2019-11-04 20:33   ` Jason Gunthorpe
2019-11-04 20:48     ` John Hubbard
2019-11-04 20:57       ` Jason Gunthorpe
2019-11-04 22:03         ` John Hubbard
2019-11-05  2:32           ` Jason Gunthorpe
2019-11-07  2:26         ` Ira Weiny
2019-11-03 21:18 ` [PATCH v2 08/18] mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote() John Hubbard
2019-11-04 17:41   ` Jerome Glisse
2019-11-03 21:18 ` [PATCH v2 09/18] drm/via: set FOLL_PIN via pin_user_pages_fast() John Hubbard
2019-11-04 17:44   ` Jerome Glisse
2019-11-04 18:22     ` John Hubbard
2019-11-03 21:18 ` [PATCH v2 10/18] fs/io_uring: set FOLL_PIN via pin_user_pages() John Hubbard
2019-11-03 21:18 ` [PATCH v2 11/18] net/xdp: " John Hubbard
2019-11-03 21:18 ` [PATCH v2 12/18] mm/gup: track FOLL_PIN pages John Hubbard
2019-11-04 18:52   ` Jerome Glisse
2019-11-04 22:49     ` John Hubbard
2019-11-04 23:49       ` Jerome Glisse
2019-11-05  0:18         ` John Hubbard
2019-11-03 21:18 ` [PATCH v2 13/18] media/v4l2-core: pin_longterm_pages (FOLL_PIN) and put_user_page() conversion John Hubbard
2019-11-10 10:11   ` Hans Verkuil
2019-11-03 21:18 ` [PATCH v2 14/18] vfio, mm: " John Hubbard
2019-11-03 21:18 ` [PATCH v2 15/18] powerpc: book3s64: convert to pin_longterm_pages() and put_user_page() John Hubbard
2019-11-03 21:18 ` [PATCH v2 16/18] mm/gup_benchmark: support pin_user_pages() and related calls John Hubbard
2019-11-03 21:18 ` [PATCH v2 17/18] selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN coverage John Hubbard
2019-11-03 21:18 ` [PATCH v2 18/18] mm/gup: remove support for gup(FOLL_LONGTERM) John Hubbard

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191104203153.GB7731@redhat.com \
    --to=jglisse@redhat.com \
    --cc=airlied@linux.ie \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=bjorn.topel@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=dan.j.williams@intel.com \
    --cc=daniel@ffwll.ch \
    --cc=davem@davemloft.net \
    --cc=david@fromorbit.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@infradead.org \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=magnus.karlsson@intel.com \
    --cc=mchehab@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=mpe@ellerman.id.au \
    --cc=netdev@vger.kernel.org \
    --cc=paulus@samba.org \
    --cc=shuah@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-RDMA Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rdma/0 linux-rdma/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rdma linux-rdma/ https://lore.kernel.org/linux-rdma \
		linux-rdma@vger.kernel.org
	public-inbox-index linux-rdma

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rdma


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git