linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: <akpm@linux-foundation.org>, Jan Kara <jack@suse.cz>,
	Christoph Hellwig <hch@lst.de>,
	"Darrick J. Wong" <djwong@kernel.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>, <linux-mm@kvack.org>,
	<nvdimm@lists.linux.dev>, <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 00/13] Fix the DAX-gup mistake
Date: Tue, 6 Sep 2022 10:23:41 -0700	[thread overview]
Message-ID: <6317821d1c465_166f29417@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <YxdFmXi/Zdr8Zi3q@nvidia.com>

Jason Gunthorpe wrote:
> On Sat, Sep 03, 2022 at 07:16:00PM -0700, Dan Williams wrote:
> > tl;dr: Move the pin of 'struct dev_pagemap' instances from gup-time to
> > map time, move the unpin of 'struct dev_pagemap' to truncate_inode_pages()
> > for fsdax and devdax inodes, and use page_maybe_dma_pinned() to
> > determine when filesystems can safely truncate DAX mappings vs DMA.
> > 
> > The longer story is that DAX has caused friction with folio development
> > and other device-memory use cases due to its hack of using a
> > page-reference count of 1 to indicate that the page is DMA idle. That
> > situation arose from the mistake of not managing DAX page reference
> > counts at map time. The lack of page reference counting at map time grew
> > organically from the original DAX experiment of attempting to manage DAX
> > mappings without page structures. The page lock, dirty tracking and
> > other entry management was supported sans pages. However, the page
> > support was then bolted on incrementally so solve problems with gup,
> > memory-failure, and all the other kernel services that are missing when
> > a pfn does not have an associated page structure.
> > 
> > Since then John has led an effort to account for when a page is pinned
> > for DMA vs other sources that elevate the reference count. The
> > page_maybe_dma_pinned() helper slots in seamlessly to replace the need
> > to track transitions to page->_refount == 1.
> > 
> > The larger change in this set comes from Jason's observation that
> > inserting DAX mappings without any reference taken is a bug. So
> > dax_insert_entry(), that fsdax uses, is updated to take 'struct
> > dev_pagemap' references, and devdax is updated to reuse the same.
> 
> It wasn't pagemap references that were the problem, it was struct page
> references.
> 
> pagemap is just something that should be ref'd in the background, as
> long as a struct page has a positive reference the pagemap should be
> considered referenced, IMHO free_zone_device_page() should be dealing
> with this - put the pagemap after calling page_free().

Yes.

I think I got caught admiring the solution of the
page_maybe_dma_pinned() conversion for replacing the ugly observation of
the 2 -> 1 refcount transition, and then introduced this breakage. I
will rework this to catch the 0 to 1 transition of the refcount for
incrementing and use free_zone_device_page() to drop the pgmap
reference.

> Pagemap is protecting page->pgmap from UAF so we must ensure we hold
> it when we do pgmap->ops
> 
> That should be the only put, and it should pair with the only get
> which happens when the driver takes a 0 refcount page out of its free
> list and makes it have a refcount of 1.
> 
> > page mapping helpers. One of the immediate hurdles is the usage of
> > pmd_devmap() to distinguish large page mappings that are not transparent
> > huge pages.
> 
> And this is because the struct page refcounting is not right :|
> 
> I had thought the progression would be to make fsdax use compound
> folios, install compound folios in the PMD, remove all the special
> case refcounting for DAX from the pagetable code, then address the
> pgmap issue from the basis of working page->refcount, eg by putting a
> pgmap put in right after the op->page_free call.

As far as I can see as long as the pgmap is managed at map and
free_zone_device_page() time then the large folio conversion can come
later.

> Can we continue to have the weird page->refcount behavior and still
> change the other things?

No at a minimum the pgmap vs page->refcount problem needs to be solved
first.


  reply	other threads:[~2022-09-06 17:24 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-04  2:16 [PATCH 00/13] Fix the DAX-gup mistake Dan Williams
2022-09-04  2:16 ` [PATCH 01/13] fsdax: Rename "busy page" to "pinned page" Dan Williams
2022-09-04  2:16 ` [PATCH 02/13] fsdax: Use page_maybe_dma_pinned() for DAX vs DMA collisions Dan Williams
2022-09-06 12:07   ` Jason Gunthorpe
2022-09-04  2:16 ` [PATCH 03/13] fsdax: Delete put_devmap_managed_page_refs() Dan Williams
2022-09-04  2:16 ` [PATCH 04/13] fsdax: Update dax_insert_entry() calling convention to return an error Dan Williams
2022-09-04  2:16 ` [PATCH 05/13] fsdax: Cleanup dax_associate_entry() Dan Williams
2022-09-04  2:16 ` [PATCH 06/13] fsdax: Rework dax_insert_entry() calling convention Dan Williams
2022-09-04  2:16 ` [PATCH 07/13] fsdax: Manage pgmap references at entry insertion and deletion Dan Williams
2022-09-06 12:30   ` Jason Gunthorpe
2022-09-04  2:16 ` [PATCH 08/13] devdax: Minor warning fixups Dan Williams
2022-09-04  2:16 ` [PATCH 09/13] devdax: Move address_space helpers to the DAX core Dan Williams
2022-09-04  2:16 ` [PATCH 10/13] dax: Prep dax_{associate, disassociate}_entry() for compound pages Dan Williams
2022-09-04  2:17 ` [PATCH 11/13] devdax: add PUD support to the DAX mapping infrastructure Dan Williams
2022-09-04  2:17 ` [PATCH 12/13] devdax: Use dax_insert_entry() + dax_delete_mapping_entry() Dan Williams
2022-09-04  2:17 ` [PATCH 13/13] mm/gup: Drop DAX pgmap accounting Dan Williams
2022-09-06 13:05 ` [PATCH 00/13] Fix the DAX-gup mistake Jason Gunthorpe
2022-09-06 17:23   ` Dan Williams [this message]
2022-09-06 17:29     ` Jason Gunthorpe
2022-09-06 18:37       ` Dan Williams
2022-09-06 18:49         ` Jason Gunthorpe
2022-09-06 19:41           ` Dan Williams
2022-09-07  0:54             ` Dan Williams
2022-09-07 12:58               ` Jason Gunthorpe
2022-09-07 17:10                 ` Dan Williams
2022-09-07 18:43                   ` Dan Williams
2022-09-07 19:30                     ` Jason Gunthorpe
2022-09-07 20:45                       ` Dan Williams
2022-09-08 18:49                         ` Jason Gunthorpe
2022-09-08 19:27                           ` Dan Williams
2022-09-09 11:53                             ` Jason Gunthorpe
2022-09-09 17:52                               ` Dan Williams
2022-09-09 18:11                             ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6317821d1c465_166f29417@dwillia2-xfh.jf.intel.com.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).