linux-kselftest.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: "John Hubbard" <jhubbard@nvidia.com>,
	"Leon Romanovsky" <leon@kernel.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Björn Töpel" <bjorn.topel@intel.com>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Dave Chinner" <david@fromorbit.com>,
	"David Airlie" <airlied@linux.ie>,
	"David S . Miller" <davem@davemloft.net>,
	"Ira Weiny" <ira.weiny@intel.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>, "Jens Axboe" <axboe@kernel.dk>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Magnus Karlsson" <magnus.karlsson@intel.com>,
	"Mauro Carvalho Chehab" <mchehab@kernel.org>,
	"Michael Ellerman" <mpe@ellerman.id.au>,
	"Michal Hocko" <mhocko@suse.com>,
	"Mike Kravetz" <mike.kravetz@oracle.com>,
	"Paul Mackerras" <paulus@samba.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	bpf@vger.kernel.org,
	"Maling list - DRI developers" <dri-devel@lists.freedesktop.org>,
	"KVM list" <kvm@vger.kernel.org>,
	linux-block@vger.kernel.org,
	"Linux Doc Mailing List" <linux-doc@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kselftest@vger.kernel.org,
	"Linux-media@vger.kernel.org" <linux-media@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Netdev <netdev@vger.kernel.org>, "Linux MM" <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Maor Gottlieb" <maorg@mellanox.com>
Subject: Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN
Date: Fri, 20 Dec 2019 16:33:10 -0800	[thread overview]
Message-ID: <CAPcyv4gYnXE-y_aGehazzF-Kej5ibSfqvE2hTnjKJD68bm8ANg@mail.gmail.com> (raw)
In-Reply-To: <20191220092154.GA10068@quack2.suse.cz>

On Fri, Dec 20, 2019 at 1:22 AM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 19-12-19 12:30:31, John Hubbard wrote:
> > On 12/19/19 5:26 AM, Leon Romanovsky wrote:
> > > On Mon, Dec 16, 2019 at 02:25:12PM -0800, John Hubbard wrote:
> > > > Hi,
> > > >
> > > > This implements an API naming change (put_user_page*() -->
> > > > unpin_user_page*()), and also implements tracking of FOLL_PIN pages. It
> > > > extends that tracking to a few select subsystems. More subsystems will
> > > > be added in follow up work.
> > >
> > > Hi John,
> > >
> > > The patchset generates kernel panics in our IB testing. In our tests, we
> > > allocated single memory block and registered multiple MRs using the single
> > > block.
> > >
> > > The possible bad flow is:
> > >   ib_umem_geti() ->
> > >    pin_user_pages_fast(FOLL_WRITE) ->
> > >     internal_get_user_pages_fast(FOLL_WRITE) ->
> > >      gup_pgd_range() ->
> > >       gup_huge_pd() ->
> > >        gup_hugepte() ->
> > >         try_grab_compound_head() ->
> >
> > Hi Leon,
> >
> > Thanks very much for the detailed report! So we're overflowing...
> >
> > At first look, this seems likely to be hitting a weak point in the
> > GUP_PIN_COUNTING_BIAS-based design, one that I believed could be deferred
> > (there's a writeup in Documentation/core-api/pin_user_page.rst, lines
> > 99-121). Basically it's pretty easy to overflow the page->_refcount
> > with huge pages if the pages have a *lot* of subpages.
> >
> > We can only do about 7 pins on 1GB huge pages that use 4KB subpages.
> > Do you have any idea how many pins (repeated pins on the same page, which
> > it sounds like you have) might be involved in your test case,
> > and the huge page and system page sizes? That would allow calculating
> > if we're likely overflowing for that reason.
> >
> > So, ideas and next steps:
> >
> > 1. Assuming that you *are* hitting this, I think I may have to fall back to
> > implementing the "deferred" part of this design, as part of this series, after
> > all. That means:
> >
> >   For the pin/unpin calls at least, stop treating all pages as if they are
> >   a cluster of PAGE_SIZE pages; instead, retrieve a huge page as one page.
> >   That's not how it works now, and the need to hand back a huge array of
> >   subpages is part of the problem. This affects the callers too, so it's not
> >   a super quick change to make. (I was really hoping not to have to do this
> >   yet.)
>
> Does that mean that you would need to make all GUP users huge page aware?
> Otherwise I don't see how what you suggest would work... And I don't think
> making all GUP users huge page aware is realistic (effort-wise) or even
> wanted (maintenance overhead in all those places).
>
> I believe there might be also a different solution for this: For
> transparent huge pages, we could find a space in 'struct page' of the
> second page in the huge page for proper pin counter and just account pins
> there so we'd have full width of 32-bits for it.

That would require THP accounting for dax pages. It is something that
was probably going to be needed, but this would seem to force the
issue.

  parent reply	other threads:[~2019-12-21  0:33 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-16 22:25 [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN John Hubbard
2019-12-16 22:25 ` [PATCH v11 01/25] mm/gup: factor out duplicate code from four routines John Hubbard
2019-12-18 15:52   ` Kirill A. Shutemov
2019-12-18 22:15     ` John Hubbard
2019-12-18 22:45       ` Kirill A. Shutemov
2019-12-16 22:25 ` [PATCH v11 02/25] mm/gup: move try_get_compound_head() to top, fix minor issues John Hubbard
2019-12-16 22:25 ` [PATCH v11 03/25] mm: Cleanup __put_devmap_managed_page() vs ->page_free() John Hubbard
2019-12-16 22:25 ` [PATCH v11 04/25] mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages John Hubbard
2019-12-18 16:04   ` Kirill A. Shutemov
2019-12-19  0:32     ` John Hubbard
2019-12-19  0:40     ` [PATCH v12] " John Hubbard
2019-12-19  5:27   ` [PATCH v11 04/25] " Dan Williams
2019-12-19  5:48     ` John Hubbard
2019-12-19  6:52       ` Dan Williams
2019-12-19  7:33         ` John Hubbard
2019-12-16 22:25 ` [PATCH v11 05/25] goldish_pipe: rename local pin_user_pages() routine John Hubbard
2019-12-16 22:25 ` [PATCH v11 06/25] mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM John Hubbard
2019-12-18 16:19   ` Kirill A. Shutemov
2019-12-18 22:15     ` John Hubbard
2019-12-16 22:25 ` [PATCH v11 07/25] vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call John Hubbard
2019-12-16 22:25 ` [PATCH v11 08/25] mm/gup: allow FOLL_FORCE for get_user_pages_fast() John Hubbard
2019-12-16 22:25 ` [PATCH v11 09/25] IB/umem: use get_user_pages_fast() to pin DMA pages John Hubbard
2019-12-16 22:25 ` [PATCH v11 10/25] mm/gup: introduce pin_user_pages*() and FOLL_PIN John Hubbard
2019-12-16 22:25 ` [PATCH v11 11/25] goldish_pipe: convert to pin_user_pages() and put_user_page() John Hubbard
2019-12-16 22:25 ` [PATCH v11 12/25] IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP John Hubbard
2019-12-16 22:25 ` [PATCH v11 13/25] mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote() John Hubbard
2019-12-16 22:25 ` [PATCH v11 14/25] drm/via: set FOLL_PIN via pin_user_pages_fast() John Hubbard
2019-12-16 22:25 ` [PATCH v11 15/25] fs/io_uring: set FOLL_PIN via pin_user_pages() John Hubbard
2019-12-16 22:25 ` [PATCH v11 16/25] net/xdp: " John Hubbard
2019-12-16 22:25 ` [PATCH v11 17/25] media/v4l2-core: set pages dirty upon releasing DMA buffers John Hubbard
2019-12-16 22:25 ` [PATCH v11 18/25] media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversion John Hubbard
2019-12-16 22:25 ` [PATCH v11 19/25] vfio, mm: " John Hubbard
2019-12-16 22:25 ` [PATCH v11 20/25] powerpc: book3s64: convert to pin_user_pages() and put_user_page() John Hubbard
2019-12-16 22:25 ` [PATCH v11 21/25] mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1" John Hubbard
2019-12-16 22:25 ` [PATCH v11 22/25] mm, tree-wide: rename put_user_page*() to unpin_user_page*() John Hubbard
2019-12-16 22:25 ` [PATCH v11 23/25] mm/gup: track FOLL_PIN pages John Hubbard
2019-12-17 14:19   ` [PATCH v12 " John Hubbard
2019-12-16 22:25 ` [PATCH v11 24/25] mm/gup_benchmark: support pin_user_pages() and related calls John Hubbard
2019-12-16 22:25 ` [PATCH v11 25/25] selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN coverage John Hubbard
2019-12-17  7:39 ` [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN Jan Kara
2019-12-19 13:26 ` Leon Romanovsky
2019-12-19 20:30   ` John Hubbard
2019-12-19 21:07     ` Jason Gunthorpe
2019-12-19 21:13       ` John Hubbard
2019-12-20 13:34         ` Jason Gunthorpe
2019-12-21  0:32           ` Dan Williams
2019-12-23 18:24             ` Jason Gunthorpe
2019-12-19 22:58       ` John Hubbard
2019-12-20 18:48         ` Leon Romanovsky
2019-12-20 23:13           ` John Hubbard
2019-12-20 18:29       ` Leon Romanovsky
2019-12-20 23:54         ` John Hubbard
2019-12-21 10:08           ` Leon Romanovsky
2019-12-21 23:59             ` John Hubbard
2019-12-22 13:23           ` Leon Romanovsky
2019-12-25  2:03             ` John Hubbard
2019-12-25  5:26               ` Leon Romanovsky
2019-12-27 21:56                 ` John Hubbard
2019-12-29  4:33                   ` John Hubbard
2020-01-06  9:01                     ` Jan Kara
2020-01-07  1:26                       ` John Hubbard
2019-12-20  9:21     ` Jan Kara
2019-12-21  0:02       ` John Hubbard
2019-12-21  0:33       ` Dan Williams [this message]
2019-12-21  0:41         ` John Hubbard
2019-12-21  0:51           ` Dan Williams
2019-12-21  0:53             ` John Hubbard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPcyv4gYnXE-y_aGehazzF-Kej5ibSfqvE2hTnjKJD68bm8ANg@mail.gmail.com \
    --to=dan.j.williams@intel.com \
    --cc=airlied@linux.ie \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=bjorn.topel@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=daniel@ffwll.ch \
    --cc=davem@davemloft.net \
    --cc=david@fromorbit.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@infradead.org \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=magnus.karlsson@intel.com \
    --cc=maorg@mellanox.com \
    --cc=mchehab@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=mpe@ellerman.id.au \
    --cc=netdev@vger.kernel.org \
    --cc=paulus@samba.org \
    --cc=shuah@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).