linux-mm.kvack.org archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Dan Williams <dan.j.williams@intel.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Cc: <akpm@linux-foundation.org>, Jan Kara <jack@suse.cz>,
	Christoph Hellwig <hch@lst.de>,
	"Darrick J. Wong" <djwong@kernel.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>, <linux-mm@kvack.org>,
	<nvdimm@lists.linux.dev>, <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 00/13] Fix the DAX-gup mistake
Date: Tue, 6 Sep 2022 17:54:54 -0700
Message-ID: <6317ebde620ec_166f29466@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <6317a26d3e1ed_166f2946e@dwillia2-xfh.jf.intel.com.notmuch>

Dan Williams wrote:
> Jason Gunthorpe wrote:
> > On Tue, Sep 06, 2022 at 11:37:36AM -0700, Dan Williams wrote:
> > > Jason Gunthorpe wrote:
> > > > On Tue, Sep 06, 2022 at 10:23:41AM -0700, Dan Williams wrote:
> > > > 
> > > > > > Can we continue to have the weird page->refcount behavior and still
> > > > > > change the other things?
> > > > > 
> > > > > No, at a minimum the pgmap vs page->refcount problem needs to be
> > > > > solved first.
> > > > 
> > > > So who will do the put page after the PTE/PMD's are cleared out? In
> > > > the normal case the tlb flusher does it integrated into zap..
> > > 
> > > AFAICS the zap manages the _mapcount not _refcount. Are you talking
> > > about page_remove_rmap() or some other reference count drop?
> > 
> > No, page refcount.
> > 
> > __tlb_remove_page() eventually causes a put_page() via
> > tlb_batch_pages_flush() calling free_pages_and_swap_cache()
> > 
> > Eg:
> > 
> >  *  MMU_GATHER_NO_GATHER
> >  *
> >  *  If the option is set the mmu_gather will not track individual pages for
> >  *  delayed page free anymore. A platform that enables the option needs to
> >  *  provide its own implementation of the __tlb_remove_page_size() function to
> >  *  free pages.
> 
> Ok, yes, that is a vm_normal_page() mechanism, which I was going to defer
> since it is incremental to the _refcount handling fix. This set maintains
> that DAX pages are still !vm_normal_page().
> 
> > > > Can we safely have the put page in the fsdax side after the zap?
> > > 
> > > The _refcount lifetime runs from insert_page() to
> > > truncate_inode_pages(); for DAX those points are dax_insert_entry()
> > > and dax_delete_mapping_entry().
> > 
> > As long as we all understand the page doesn't become re-allocatable
> > until the refcount reaches 0 and the free op is called it may be OK!
> 
> Yes, but this does mean that page_maybe_dma_pinned() is not sufficient to
> determine when the filesystem can safely reuse the page; it really needs
> to wait for the reference count to drop to 0, similar to how it waits
> for the page-idle condition today.

This gets tricky with break_layouts(). For example, xfs_break_layouts()
wants to ensure that the page is gup-idle while holding the mmap lock.
If the page is not gup-idle, it needs to drop locks and retry. It is
possible that the path to drop a page reference also needs to acquire
filesystem locks; consider odd cases like DMA from one offset to another
in the same file. So waiting with filesystem locks held is off the
table, which also means that deferring the wait until
dax_delete_mapping_entry() time is off the table, since that runs with
those locks held.
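To illustrate the drop-locks-and-retry rule, here is a minimal user-space
sketch, not the xfs_break_layouts() implementation. All names (struct
sketch_page, struct sketch_inode, fs_lock, sketch_break_layouts(),
sketch_drop_ref_needs_fs_lock()) are hypothetical; a pthread mutex stands
in for a filesystem lock and a C11 atomic stands in for the page
reference count, with 1 assumed to mean "mapped, no external references".
The example reference-drop path takes fs_lock itself, so waiting with
fs_lock held would self-deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a DAX page. */
struct sketch_page {
	atomic_int refcount;	/* assumed: 1 == mapped, no gup/DMA refs */
};

/* Hypothetical stand-in for an inode and one of its filesystem locks. */
struct sketch_inode {
	pthread_mutex_t fs_lock;
};

/* Invoked with fs_lock dropped; the callee may need fs_lock itself. */
typedef void (*sketch_wait_fn)(struct sketch_inode *, struct sketch_page *);

static bool sketch_page_idle(struct sketch_page *page)
{
	/* The 2 -> 1 transition has happened: no external references. */
	return atomic_load(&page->refcount) == 1;
}

/*
 * Called and returns with fs_lock held. Never waits with fs_lock held:
 * drop it, let reference holders make progress, retake it, recheck.
 * Returns the number of retries.
 */
static int sketch_break_layouts(struct sketch_inode *inode,
				struct sketch_page *page,
				sketch_wait_fn wait)
{
	int retries = 0;

	while (!sketch_page_idle(page)) {
		pthread_mutex_unlock(&inode->fs_lock);
		wait(inode, page);
		pthread_mutex_lock(&inode->fs_lock);
		retries++;	/* state may have changed while unlocked */
	}
	return retries;
}

/*
 * Example reference-drop path that needs the filesystem lock, as in
 * DMA between two offsets of the same file. With a default mutex this
 * would deadlock if the waiter still held fs_lock.
 */
static void sketch_drop_ref_needs_fs_lock(struct sketch_inode *inode,
					  struct sketch_page *page)
{
	pthread_mutex_lock(&inode->fs_lock);
	atomic_fetch_sub(&page->refcount, 1);
	pthread_mutex_unlock(&inode->fs_lock);
}
```

Starting from a refcount of 3 with fs_lock held, calling
sketch_break_layouts() with sketch_drop_ref_needs_fs_lock() as the wait
callback retries twice and returns with the refcount at 1 and fs_lock
held again. A real implementation would sleep on a wakeup rather than
call the drop path directly.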

That means that even after the conversion to make DAX page references
0-based it will still be the case that filesystem code will be waiting
for the 2 -> 1 transition to indicate "mapped DAX page has no more
external references".

Then dax_delete_mapping_entry() can validate that it is performing the
1 -> 0 transition, since no more references should have been taken
while holding filesystem locks.
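The 0-based lifecycle described above can be modeled roughly as follows.
This is a hypothetical user-space sketch, not kernel code: the sketch_*
helpers only mirror the roles of dax_insert_entry() (0 -> 1), gup-style
external references (1 -> 2 and back), the "no external references"
check that filesystem code waits on (2 -> 1 done), and
dax_delete_mapping_entry() validating that it performs exactly 1 -> 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a 0-based DAX page reference count. */
struct sketch_page {
	atomic_int refcount;	/* 0 == free, reusable by the filesystem */
};

/* Role of mapping-entry insertion: 0 -> 1. */
static void sketch_insert_entry(struct sketch_page *page)
{
	int expected = 0;
	bool ok = atomic_compare_exchange_strong(&page->refcount,
						 &expected, 1);
	assert(ok && "page must be free before it is mapped");
	(void)ok;
}

/* External reference, e.g. taken for DMA: 1 -> 2 (or higher). */
static void sketch_get_ref(struct sketch_page *page)
{
	int old = atomic_fetch_add(&page->refcount, 1);
	assert(old >= 1 && "only mapped pages gain external references");
	(void)old;
}

/* Dropping an external reference never drops the mapping's reference. */
static void sketch_put_ref(struct sketch_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);
	assert(old >= 2 && "external put must not drop the mapping ref");
	(void)old;
}

/* What filesystem code waits for: the 2 -> 1 transition has happened. */
static bool sketch_no_external_refs(struct sketch_page *page)
{
	return atomic_load(&page->refcount) == 1;
}

/*
 * Role of entry deletion, called with filesystem locks held after the
 * wait above succeeded. Succeeds only for an exact 1 -> 0 transition;
 * returns false, leaving the count untouched, if a reference appeared.
 */
static bool sketch_delete_entry(struct sketch_page *page)
{
	int expected = 1;
	return atomic_compare_exchange_strong(&page->refcount, &expected, 0);
}
```

The compare-exchange in sketch_delete_entry() is what turns "no more
references should have been taken" into a checkable assertion: any
unexpected reference makes the 1 -> 0 exchange fail instead of freeing
a page that is still in use.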



Thread overview: 33+ messages
2022-09-04  2:16 [PATCH 00/13] Fix the DAX-gup mistake Dan Williams
2022-09-04  2:16 ` [PATCH 01/13] fsdax: Rename "busy page" to "pinned page" Dan Williams
2022-09-04  2:16 ` [PATCH 02/13] fsdax: Use page_maybe_dma_pinned() for DAX vs DMA collisions Dan Williams
2022-09-06 12:07   ` Jason Gunthorpe
2022-09-04  2:16 ` [PATCH 03/13] fsdax: Delete put_devmap_managed_page_refs() Dan Williams
2022-09-04  2:16 ` [PATCH 04/13] fsdax: Update dax_insert_entry() calling convention to return an error Dan Williams
2022-09-04  2:16 ` [PATCH 05/13] fsdax: Cleanup dax_associate_entry() Dan Williams
2022-09-04  2:16 ` [PATCH 06/13] fsdax: Rework dax_insert_entry() calling convention Dan Williams
2022-09-04  2:16 ` [PATCH 07/13] fsdax: Manage pgmap references at entry insertion and deletion Dan Williams
2022-09-06 12:30   ` Jason Gunthorpe
2022-09-04  2:16 ` [PATCH 08/13] devdax: Minor warning fixups Dan Williams
2022-09-04  2:16 ` [PATCH 09/13] devdax: Move address_space helpers to the DAX core Dan Williams
2022-09-04  2:16 ` [PATCH 10/13] dax: Prep dax_{associate, disassociate}_entry() for compound pages Dan Williams
2022-09-04  2:17 ` [PATCH 11/13] devdax: add PUD support to the DAX mapping infrastructure Dan Williams
2022-09-04  2:17 ` [PATCH 12/13] devdax: Use dax_insert_entry() + dax_delete_mapping_entry() Dan Williams
2022-09-04  2:17 ` [PATCH 13/13] mm/gup: Drop DAX pgmap accounting Dan Williams
2022-09-06 13:05 ` [PATCH 00/13] Fix the DAX-gup mistake Jason Gunthorpe
2022-09-06 17:23   ` Dan Williams
2022-09-06 17:29     ` Jason Gunthorpe
2022-09-06 18:37       ` Dan Williams
2022-09-06 18:49         ` Jason Gunthorpe
2022-09-06 19:41           ` Dan Williams
2022-09-07  0:54             ` Dan Williams [this message]
2022-09-07 12:58               ` Jason Gunthorpe
2022-09-07 17:10                 ` Dan Williams
2022-09-07 18:43                   ` Dan Williams
2022-09-07 19:30                     ` Jason Gunthorpe
2022-09-07 20:45                       ` Dan Williams
2022-09-08 18:49                         ` Jason Gunthorpe
2022-09-08 19:27                           ` Dan Williams
2022-09-09 11:53                             ` Jason Gunthorpe
2022-09-09 17:52                               ` Dan Williams
2022-09-09 18:11                             ` Matthew Wilcox
