linux-fsdevel.vger.kernel.org archive mirror
From: Ira Weiny <ira.weiny@intel.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: "Jan Kara" <jack@suse.cz>, "John Hubbard" <jhubbard@nvidia.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	john.hubbard@gmail.com,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Christoph Hellwig" <hch@infradead.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Dave Chinner" <david@fromorbit.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	amd-gfx@lists.freedesktop.org, ceph-devel@vger.kernel.org,
	devel@driverdev.osuosl.org, devel@lists.orangefs.org,
	dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-block@vger.kernel.org, linux-crypto@vger.kernel.org,
	linux-fbdev@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-media@vger.kernel.org, linux-mm@kvack.org,
	linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-rpi-kernel@lists.infradead.org, linux-xfs@vger.kernel.org,
	netdev@vger.kernel.org, rds-devel@oss.oracle.com,
	sparclinux@vger.kernel.org, x86@kernel.org,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
Date: Wed, 7 Aug 2019 19:36:37 -0700
Message-ID: <20190808023637.GA1508@iweiny-DESK2.sc.intel.com>
In-Reply-To: <20190807084649.GQ11812@dhcp22.suse.cz>

On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
> On Wed 07-08-19 10:37:26, Jan Kara wrote:
> > On Fri 02-08-19 12:14:09, John Hubbard wrote:
> > > On 8/2/19 7:52 AM, Jan Kara wrote:
> > > > On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
> > > > > On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
> > > > > > On Fri 02-08-19 11:12:44, Michal Hocko wrote:
> > > > > > > On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
> > > > > > > [...]
> > > > > > > > 2) Convert all of the call sites for get_user_pages*(), to
> > > > > > > > invoke put_user_page*(), instead of put_page(). This involves dozens of
> > > > > > > > call sites, and will take some time.
> > > > > > > 
> > > > > > > How do we make sure this is the case and it will remain the case in the
> > > > > > > future? There must be some automagic to enforce/check that. It is simply
> > > > > > > not manageable to do it every now and then because then 3) will simply
> > > > > > > be never safe.
> > > > > > > 
> > > > > > > > Have you considered coccinelle or some other scripted way to do the
> > > > > > > transition? I have no idea how to deal with future changes that would
> > > > > > > break the balance though.
> > > 
> > > Hi Michal,
> > > 
> > > Yes, I've thought about it, and coccinelle falls a bit short (it's not smart
> > > enough to know which put_page()'s to convert). However, there is a debug
> > > option planned: a yet-to-be-posted commit [1] uses struct page extensions
> > > (obviously protected by CONFIG_DEBUG_GET_USER_PAGES_REFERENCES) to add
> > > a redundant counter. That allows:
> > > 
> > > void __put_page(struct page *page)
> > > {
> > > 	...
> > > 	/* Someone called put_page() instead of put_user_page() */
> > > 	WARN_ON_ONCE(atomic_read(&page_ext->pin_count) > 0);
> > > 
> > > > > > 
> > > > > > Yeah, that's why I've been suggesting at LSF/MM that we may need to create
> > > > > > a gup wrapper - say vaddr_pin_pages() - and track which sites dropping
> > > > > > references got converted by using this wrapper instead of gup. The
> > > > > > counterpart would then be more logically named as unpin_page() or whatever
> > > > > > instead of put_user_page().  Sure this is not completely foolproof (you can
> > > > > > create new callsite using vaddr_pin_pages() and then just drop refs using
> > > > > > put_page()) but I suppose it would be a high enough barrier for missed
> > > > > > conversions... Thoughts?
> > > 
> > > The debug option above is still a bit simplistic in its implementation
> > > (and maybe not taking full advantage of the data it has), but I think
> > > it's preferable, because it monitors the "core" and WARNs.
> > > 
> > > Instead of the wrapper, I'm thinking: documentation and the passage of
> > > time, plus the debug option (perhaps enhanced--probably once I post it
> > > someone will notice opportunities), yes?
> > 
> > So I think your debug option and my suggested renaming serve a bit
> > different purposes (and thus both make sense). If you do the renaming, you
> > can just grep to see unconverted sites. Also when someone merges new GUP
> > user (unaware of the new rules) while you switch GUP to use pins instead of
> > ordinary references, you'll get compilation error in case of renaming
> > instead of hard to debug refcount leak without the renaming. And such
> > conflict is almost bound to happen given the size of GUP patch set... Also
> > the renaming serves against the "coding inertia" - i.e., GUP is around for
> > ages so people just use it without checking any documentation or comments.
> > After switching how GUP works, what used to be correct isn't anymore so
> > renaming the function serves as a warning that something has really
> > changed.
> 
> Fully agreed!

OK.  Prior to this I've been basing all my work for the RDMA/FS DAX stuff on
John's put_user_pages()...  (Including when I proposed failing truncate with a
lease in June [1])

However, based on the suggestions in that thread it became clear that a new
interface was going to need to be added to pass in the "RDMA file" information
to GUP to associate file pins with the correct processes...

I have many drawings on my whiteboard with "a whole lot of lines" on them to
make sure that if a process opens a file, mmaps it, pins it with RDMA, _closes_
it, and munmaps it, the resulting file pin can still be traced back to the
RDMA context and all the processes which may have access to it....  No matter
where the original context may have come from.  I believe I have accomplished
that.

Before I go on, I would like to say that the "imbalance" of get_user_pages()
and put_page() bothers me from a purist standpoint...  However, since this
discussion cropped up I went ahead and ported my work to Linus' current master
(5.3-rc3+) and in doing so I only had to steal a bit of John's code...  Sorry
John...  :-(

I don't have the commit messages all cleaned up and I know there may be some
discussion on these new interfaces but I wanted to throw this series out there
because I think it may be what Jan and Michal are driving at (or at least in
that direction).

Right now only RDMA and DAX FS's are supported.  Other users of GUP will still
fail on a DAX file and regular files will still be at risk.[2]

I've pushed this work (based 5.3-rc3+ (33920f1ec5bf)) here[3]:

https://github.com/weiny2/linux-kernel/tree/linus-rdmafsdax-b0-v3

I think the most relevant patch to this conversation is:

https://github.com/weiny2/linux-kernel/commit/5d377653ba5cf11c3b716f904b057bee6641aaf6

I stole Jan's suggestion for a name as the name I used while prototyping was
pretty bad...  So Thanks Jan...  ;-)
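
For anyone skimming the thread, the pairing being discussed (a pin taken
through a dedicated wrapper, released through a matching unpin call, with a
redundant debug counter catching a plain put on a pinned page) can be modelled
in a few lines of userspace C.  This is only a toy sketch: struct toy_page and
every toy_* name below are made up for illustration and are not the kernel
structures, the code in the branch above, or John's debug patch.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for struct page; hypothetical, not the kernel structure. */
struct toy_page {
	int refcount;	/* ordinary references */
	int pin_count;	/* redundant debug counter for GUP-style pins */
};

/* Number of times a pinned page was released with the wrong call. */
int toy_warnings;

/* Models a vaddr_pin_pages()-style wrapper: take a reference and a pin. */
void toy_pin_page(struct toy_page *page)
{
	page->refcount++;
	page->pin_count++;
}

/* The matching release: drops both the pin and the reference. */
void toy_unpin_page(struct toy_page *page)
{
	assert(page->pin_count > 0 && page->refcount > 0);
	page->pin_count--;
	page->refcount--;
}

/*
 * Ordinary release.  If the last reference goes away while a pin is still
 * recorded, someone released a pinned page with put instead of unpin.
 */
void toy_put_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0 && page->pin_count > 0) {
		toy_warnings++;
		fprintf(stderr, "toy: put on a page that is still pinned\n");
	}
}
```

A balanced toy_pin_page()/toy_unpin_page() pair leaves toy_warnings untouched,
while toy_pin_page() followed by toy_put_page() bumps it.  As Jan notes, a
check like this only fires if the code path is actually exercised, whereas the
distinct function names make unconverted sites visible with grep.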

Also thanks to John for his contribution on some of this.  I'm still tweaking
put_user_pages under the hood on the DAX path.

Ira

[1] https://lwn.net/Articles/790544/

[2] I've been looking into how to support io_uring next but I've had some issues
getting a test program to actually call GUP in that code path...  :-(

[3] If it would be easier I can just throw an RFC on the list but right now the
cover letter and some of the commit messages are full of the old stuff and
various ideas I have had...

> 
> > Your refcount debug patches are good to catch bugs in the conversions done
> > but that requires you to be able to exercise the code path in the first
> > place which may require particular HW or so, and you also have to enable
> > the debug option which means you already aim at verifying the GUP
> > references are treated properly.
> > 
> > 								Honza
> > 
> > -- 
> > Jan Kara <jack@suse.com>
> > SUSE Labs, CR
> 
> -- 
> Michal Hocko
> SUSE Labs
