Linux-RDMA Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v2 00/18] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM
@ 2019-11-03 21:17 John Hubbard
  2019-11-03 21:17 ` [PATCH v2 01/18] mm/gup: pass flags arg to __gup_device_* functions John Hubbard
                   ` (17 more replies)
  0 siblings, 18 replies; 57+ messages in thread
From: John Hubbard @ 2019-11-03 21:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Alex Williamson, Benjamin Herrenschmidt,
	Björn Töpel, Christoph Hellwig, Dan Williams,
	Daniel Vetter, Dave Chinner, David Airlie, David S . Miller,
	Ira Weiny, Jan Kara, Jason Gunthorpe, Jens Axboe,
	Jonathan Corbet, Jérôme Glisse, Magnus Karlsson,
	Mauro Carvalho Chehab, Michael Ellerman, Michal Hocko,
	Mike Kravetz, Paul Mackerras, Shuah Khan, Vlastimil Babka, bpf,
	dri-devel, kvm, linux-block, linux-doc, linux-fsdevel,
	linux-kselftest, linux-media, linux-rdma, linuxppc-dev, netdev,
	linux-mm, LKML, John Hubbard

Hi,

Changes since v1:

* Changed the function signature of __huge_pt_done() from int to void.
* Renamed __remove_refs_from_head() to put_compound_head().
* Improved the comment documentation in mm.h and gup.c
* Merged Documentation/vm/pin_user_pages.rst into the "introduce
  FOLL_PIN" patch.
* Fixed Documentation/vm/pin_user_pages.rst:
     * Fixed up a TODO about DAX.
     * 31, not 32 bits total are available for counting
* Deleted some stale comments from the commit description of the
  VFIO patch.
* Added Reviewed-by tags from Ira Weiny and Jens Axboe, and Acked-by
  from Björn Töpel.

======================================================================
Original cover letter (edited to fix up the patch description numbers)

This applies cleanly to linux-next and mmotm, and also to linux.git if
linux-next's commit 20cac10710c9 ("mm/gup_benchmark: fix MAP_HUGETLB
case") is first applied there.

This provides tracking of dma-pinned pages. This is a prerequisite to
solving the larger problem of proper interactions between file-backed
pages, and [R]DMA activities, as discussed in [1], [2], [3], and in
a remarkable number of email threads since about 2017. :)

A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.

I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:

    get_user_pages() (sets FOLL_GET)
    put_page()

to this:
    pin_user_pages() (sets FOLL_PIN)
    put_user_page()

Because there are interdependencies with FOLL_LONGTERM, a similar
conversion as for FOLL_PIN, was applied. The change was from this:

    get_user_pages(FOLL_LONGTERM) (also sets FOLL_GET)
    put_page()

to this:
    pin_longterm_pages() (sets FOLL_PIN | FOLL_LONGTERM)
    put_user_page()

============================================================
Patch summary:

* Patches 1-4: refactoring and preparatory cleanup, independent fixes
    (Patch 4: V4L2-core bug fix (can be separately applied))

* Patch 5: introduce pin_user_pages(), FOLL_PIN, but no functional
           changes yet
* Patches 6-11: Convert existing put_user_page() callers, to use the
                new pin*()
* Patch 12: Activate tracking of FOLL_PIN pages.
* Patches 13-15: convert FOLL_LONGTERM callers
* Patches: 16-17: gup_benchmark and run_vmtests support
* Patch 18: enforce FOLL_LONGTERM as a gup-internal (only) flag

============================================================
Testing:

* I've done some overall kernel testing (LTP, and a few other goodies),
  and some directed testing to exercise some of the changes. And as you
  can see, gup_benchmark is enhanced to exercise this. Basically, I've been
  able to runtime test the core get_user_pages() and pin_user_pages() and
  related routines, but not so much on several of the call sites--but those
  are generally just a couple of lines changed, each.

  Not much of the kernel is actually using this, which on one hand
  reduces risk quite a lot. But on the other hand, testing coverage
  is low. So I'd love it if, in particular, the Infiniband and PowerPC
  folks could do a smoke test of this series for me.

  Also, my runtime testing for the call sites so far is very weak:

    * io_uring: Some directed tests from liburing exercise this, and they pass.
    * process_vm_access.c: A small directed test passes.
    * gup_benchmark: the enhanced version hits the new gup.c code, and passes.
    * infiniband (still only have crude "IB pingpong" working, on a
                  good day: it's not exercising my conversions at runtime...)
    * VFIO: compiles (I'm vowing to set up a run time test soon, but it's
                      not ready just yet)
    * powerpc: it compiles...
    * drm/via: compiles...
    * goldfish: compiles...
    * net/xdp: compiles...
    * media/v4l2: compiles...

============================================================
Next:

* Get the block/bio_vec sites converted to use pin_user_pages().

* Work with Ira and Dave Chinner to weave this together with the
  layout lease stuff.

============================================================

[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/

John Hubbard (18):
  mm/gup: pass flags arg to __gup_device_* functions
  mm/gup: factor out duplicate code from four routines
  goldish_pipe: rename local pin_user_pages() routine
  media/v4l2-core: set pages dirty upon releasing DMA buffers
  mm/gup: introduce pin_user_pages*() and FOLL_PIN
  goldish_pipe: convert to pin_user_pages() and put_user_page()
  infiniband: set FOLL_PIN, FOLL_LONGTERM via pin_longterm_pages*()
  mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
  drm/via: set FOLL_PIN via pin_user_pages_fast()
  fs/io_uring: set FOLL_PIN via pin_user_pages()
  net/xdp: set FOLL_PIN via pin_user_pages()
  mm/gup: track FOLL_PIN pages
  media/v4l2-core: pin_longterm_pages (FOLL_PIN) and put_user_page()
    conversion
  vfio, mm: pin_longterm_pages (FOLL_PIN) and put_user_page() conversion
  powerpc: book3s64: convert to pin_longterm_pages() and put_user_page()
  mm/gup_benchmark: support pin_user_pages() and related calls
  selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
    coverage
  mm/gup: remove support for gup(FOLL_LONGTERM)

 Documentation/vm/index.rst                  |   1 +
 Documentation/vm/pin_user_pages.rst         | 212 +++++++
 arch/powerpc/mm/book3s64/iommu_api.c        |  15 +-
 drivers/gpu/drm/via/via_dmablit.c           |   2 +-
 drivers/infiniband/core/umem.c              |   5 +-
 drivers/infiniband/core/umem_odp.c          |  10 +-
 drivers/infiniband/hw/hfi1/user_pages.c     |   4 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c |   3 +-
 drivers/infiniband/hw/qib/qib_user_pages.c  |   8 +-
 drivers/infiniband/hw/qib/qib_user_sdma.c   |   2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c    |   9 +-
 drivers/infiniband/sw/siw/siw_mem.c         |   5 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c   |  10 +-
 drivers/platform/goldfish/goldfish_pipe.c   |  35 +-
 drivers/vfio/vfio_iommu_type1.c             |  15 +-
 fs/io_uring.c                               |   5 +-
 include/linux/mm.h                          | 142 ++++-
 include/linux/mmzone.h                      |   2 +
 include/linux/page_ref.h                    |  10 +
 mm/gup.c                                    | 594 ++++++++++++++++----
 mm/gup_benchmark.c                          |  81 ++-
 mm/huge_memory.c                            |  32 +-
 mm/hugetlb.c                                |  28 +-
 mm/memremap.c                               |   4 +-
 mm/process_vm_access.c                      |  28 +-
 mm/vmstat.c                                 |   2 +
 net/xdp/xdp_umem.c                          |   4 +-
 tools/testing/selftests/vm/gup_benchmark.c  |  28 +-
 tools/testing/selftests/vm/run_vmtests      |  22 +
 29 files changed, 1054 insertions(+), 264 deletions(-)
 create mode 100644 Documentation/vm/pin_user_pages.rst

-- 
2.23.0


^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, back to index

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-03 21:17 [PATCH v2 00/18] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM John Hubbard
2019-11-03 21:17 ` [PATCH v2 01/18] mm/gup: pass flags arg to __gup_device_* functions John Hubbard
2019-11-04 16:39   ` Jerome Glisse
2019-11-03 21:17 ` [PATCH v2 02/18] mm/gup: factor out duplicate code from four routines John Hubbard
2019-11-04 16:51   ` Jerome Glisse
2019-11-03 21:17 ` [PATCH v2 03/18] goldish_pipe: rename local pin_user_pages() routine John Hubbard
2019-11-04 16:52   ` Jerome Glisse
2019-11-03 21:17 ` [PATCH v2 04/18] media/v4l2-core: set pages dirty upon releasing DMA buffers John Hubbard
2019-11-10 10:10   ` Hans Verkuil
2019-11-11 21:46     ` John Hubbard
2019-11-03 21:18 ` [PATCH v2 05/18] mm/gup: introduce pin_user_pages*() and FOLL_PIN John Hubbard
2019-11-04 17:33   ` Jerome Glisse
2019-11-04 19:04     ` John Hubbard
2019-11-04 19:18       ` Jerome Glisse
2019-11-04 19:30         ` John Hubbard
2019-11-04 19:52           ` Jerome Glisse
2019-11-04 20:09             ` John Hubbard
2019-11-04 20:31               ` Jason Gunthorpe
2019-11-04 20:40                 ` John Hubbard
2019-11-04 20:31               ` Jerome Glisse
2019-11-04 20:37                 ` Jason Gunthorpe
2019-11-04 20:57                   ` John Hubbard
2019-11-04 21:15                     ` Jason Gunthorpe
2019-11-04 21:34                       ` John Hubbard
2019-11-04 20:33   ` David Rientjes
2019-11-04 20:48     ` Jerome Glisse
2019-11-05 13:10   ` Mike Rapoport
2019-11-05 19:00     ` John Hubbard
2019-11-07  2:25       ` Ira Weiny
2019-11-07  8:07       ` Mike Rapoport
2019-11-03 21:18 ` [PATCH v2 06/18] goldish_pipe: convert to pin_user_pages() and put_user_page() John Hubbard
2019-11-03 21:18 ` [PATCH v2 07/18] infiniband: set FOLL_PIN, FOLL_LONGTERM via pin_longterm_pages*() John Hubbard
2019-11-04 20:33   ` Jason Gunthorpe
2019-11-04 20:48     ` John Hubbard
2019-11-04 20:57       ` Jason Gunthorpe
2019-11-04 22:03         ` John Hubbard
2019-11-05  2:32           ` Jason Gunthorpe
2019-11-07  2:26         ` Ira Weiny
2019-11-03 21:18 ` [PATCH v2 08/18] mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote() John Hubbard
2019-11-04 17:41   ` Jerome Glisse
2019-11-03 21:18 ` [PATCH v2 09/18] drm/via: set FOLL_PIN via pin_user_pages_fast() John Hubbard
2019-11-04 17:44   ` Jerome Glisse
2019-11-04 18:22     ` John Hubbard
2019-11-03 21:18 ` [PATCH v2 10/18] fs/io_uring: set FOLL_PIN via pin_user_pages() John Hubbard
2019-11-03 21:18 ` [PATCH v2 11/18] net/xdp: " John Hubbard
2019-11-03 21:18 ` [PATCH v2 12/18] mm/gup: track FOLL_PIN pages John Hubbard
2019-11-04 18:52   ` Jerome Glisse
2019-11-04 22:49     ` John Hubbard
2019-11-04 23:49       ` Jerome Glisse
2019-11-05  0:18         ` John Hubbard
2019-11-03 21:18 ` [PATCH v2 13/18] media/v4l2-core: pin_longterm_pages (FOLL_PIN) and put_user_page() conversion John Hubbard
2019-11-10 10:11   ` Hans Verkuil
2019-11-03 21:18 ` [PATCH v2 14/18] vfio, mm: " John Hubbard
2019-11-03 21:18 ` [PATCH v2 15/18] powerpc: book3s64: convert to pin_longterm_pages() and put_user_page() John Hubbard
2019-11-03 21:18 ` [PATCH v2 16/18] mm/gup_benchmark: support pin_user_pages() and related calls John Hubbard
2019-11-03 21:18 ` [PATCH v2 17/18] selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN coverage John Hubbard
2019-11-03 21:18 ` [PATCH v2 18/18] mm/gup: remove support for gup(FOLL_LONGTERM) John Hubbard

Linux-RDMA Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rdma/0 linux-rdma/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rdma linux-rdma/ https://lore.kernel.org/linux-rdma \
		linux-rdma@vger.kernel.org
	public-inbox-index linux-rdma

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rdma


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git