All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>, Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	john.hubbard@gmail.com, Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@kernel.org>,
	Christopher Lameter <cl@linux.com>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
	Jerome Glisse <jglisse@redhat.com>,
	Christoph Hellwig <hch@infradead.org>,
	Ralph Campbell <rcampbell@nvidia.com>
Subject: Re: [PATCH v4 2/3] mm: introduce put_user_page*(), placeholder versions
Date: Thu, 18 Oct 2018 12:19:51 +0200	[thread overview]
Message-ID: <20181018101951.GO23493@quack2.suse.cz> (raw)
In-Reply-To: <b9899626-9033-348b-6f07-dc90bcd8a468@nvidia.com>

On Thu 11-10-18 20:53:34, John Hubbard wrote:
> On 10/11/18 6:23 PM, John Hubbard wrote:
> > On 10/11/18 6:20 AM, Jason Gunthorpe wrote:
> >> On Thu, Oct 11, 2018 at 10:49:29AM +0200, Jan Kara wrote:
> >>
> >>>> This is a real worry.  If someone uses a mistaken put_page() then how
> >>>> will that bug manifest at runtime?  Under what set of circumstances
> >>>> will the kernel trigger the bug?
> >>>
> >>> At runtime such bug will manifest as a page that can never be evicted from
> >>> memory. We could warn in put_page() if page reference count drops below
> >>> bare minimum for given user pin count which would be able to catch some
> >>> issues but it won't be 100% reliable. So at this point I'm more leaning
> >>> towards making get_user_pages() return a different type than just
> >>> struct page * to make it much harder for refcount to go wrong...
> >>
> >> At least for the infiniband code being used as an example here we take
> >> the struct page from get_user_pages, then stick it in a sgl, and at
> >> put_page time we get the page back out of the sgl via sg_page()
> >>
> >> So type safety will not help this case... I wonder how many other
> >> users are similar? I think this is a pretty reasonable flow for DMA
> >> with user pages.
> >>
> > 
> > That is true. The infiniband code, fortunately, never mixes the two page
> > types into the same pool (or sg list), so it's actually an easier example
> > than some other subsystems. But, yes, type safety doesn't help there. I can 
> > take a moment to look around at the other areas, to quantify how much a type
> > safety change might help.
> > 
> > Back to page flags again, out of desperation:
> > 
> > How much do we know about the page types that all of these subsystems
> > use? In other words, can we, for example, use bit 1 of page->lru.next (see [1]
> > for context) as the "dma-pinned" page flag, while tracking pages within parts 
> > of the kernel that call a mix of alloc_pages, get_user_pages, and other allocators? 
> > In order for that to work, page->index, page->private, and bit 1 of page->mapping
> > must not be used. I doubt that this is always going to hold, but...does it?
> > 
> 
> Oops, pardon me, please ignore that nonsense about page->index and page->private
> and page->mapping, that's actually fine (I was seeing "union", where "struct" was
> written--too much staring at this code). 
> 
> So actually, I think maybe we can just use bit 1 in page->lru.next to sort out
> which pages are dma-pinned, in the calling code, just like we're going to do
> in writeback situations. This should also allow run-time checking that Andrew was 
> hoping for:
> 
>     put_user_page(): assert that the page is dma-pinned
>     put_page(): assert that the page is *not* dma-pinned
> 
> ...both of which depend on that bit being, essentially, available as sort
> of a general page flag. And in fact, if it's not, then the whole approach
> is dead anyway.

Well, put_page() cannot assert page is not dma-pinned as someone can still
to get_page(), put_page() on dma-pinned page and that must not barf. But
put_page() could assert that if the page is pinned, refcount is >=
pincount. That will detect leaked pin references relatively quickly.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2018-10-18 10:19 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-08 21:16 [PATCH v4 0/3] get_user_pages*() and RDMA: first steps john.hubbard
2018-10-08 21:16 ` [PATCH v4 1/3] mm: get_user_pages: consolidate error handling john.hubbard
2018-10-09  0:05   ` Andrew Morton
2018-10-08 21:16 ` [PATCH v4 2/3] mm: introduce put_user_page*(), placeholder versions john.hubbard
2018-10-09  0:14   ` Andrew Morton
2018-10-09  8:30     ` Jan Kara
2018-10-09 23:20       ` Andrew Morton
2018-10-10  0:32         ` John Hubbard
2018-10-10  0:32           ` John Hubbard
2018-10-10 23:43           ` Andrew Morton
2018-10-10 23:43             ` Andrew Morton
2018-10-10  0:42     ` John Hubbard
2018-10-10  0:42       ` John Hubbard
2018-10-10  8:59       ` Jan Kara
2018-10-10 23:23         ` John Hubbard
2018-10-10 23:23           ` John Hubbard
2018-10-11  8:42           ` Jan Kara
2018-10-10 23:45       ` Andrew Morton
2018-10-10 23:45         ` Andrew Morton
2018-10-11  8:49         ` Jan Kara
2018-10-11 13:20           ` Jason Gunthorpe
2018-10-12  1:23             ` John Hubbard
2018-10-12  1:23               ` John Hubbard
2018-10-12  3:53               ` John Hubbard
2018-10-12  3:53                 ` John Hubbard
2018-10-18 10:19                 ` Jan Kara [this message]
2018-11-05  7:25                   ` John Hubbard
2018-11-05  7:25                     ` John Hubbard
2018-10-22 19:43               ` Jason Gunthorpe
2018-11-05  7:17                 ` John Hubbard
2018-11-05  7:17                   ` John Hubbard
2018-11-05  8:37                   ` Jan Kara
2018-10-08 21:16 ` [PATCH v4 3/3] infiniband/mm: convert put_page() to put_user_page*() john.hubbard
2018-10-09  9:52   ` kbuild test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181018101951.GO23493@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@infradead.org \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=john.hubbard@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=rcampbell@nvidia.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.