From: Dan Williams <dan.j.williams@intel.com>
To: Dave Chinner <david@fromorbit.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>, <akpm@linux-foundation.org>,
	"Matthew Wilcox" <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	John Hubbard <jhubbard@nvidia.com>,
	<linux-fsdevel@vger.kernel.org>, <nvdimm@lists.linux.dev>,
	<linux-xfs@vger.kernel.org>, <linux-mm@kvack.org>,
	<linux-ext4@vger.kernel.org>
Subject: Re: [PATCH v2 10/18] fsdax: Manage pgmap references at entry insertion and deletion
Date: Thu, 22 Sep 2022 19:01:56 -0700
Message-ID: <632d13949e113_4a6742947c@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <20220923013634.GY3600936@dread.disaster.area>

Dave Chinner wrote:
> On Thu, Sep 22, 2022 at 02:54:42PM -0700, Dan Williams wrote:
> > Jason Gunthorpe wrote:
> > > On Wed, Sep 21, 2022 at 07:17:40PM -0700, Dan Williams wrote:
> > > > Jason Gunthorpe wrote:
> > > > > On Wed, Sep 21, 2022 at 05:14:34PM -0700, Dan Williams wrote:
> > > > > 
> > > > > > > Indeed, you could reasonably put such a liveness test at the moment
> > > > > > > every driver takes a 0 refcount struct page and turns it into a 1
> > > > > > > refcount struct page.
> > > > > > 
> > > > > > I could do it with a flag, but the reason to have pgmap->ref managed at
> > > > > > the page->_refcount 0 -> 1 and 1 -> 0 transitions is so at the end of
> > > > > > time memunmap_pages() can look at the one counter rather than scanning
> > > > > > and rescanning all the pages to see when they go to final idle.
> > > > > 
> > > > > That makes some sense too, but the logical way to do that is to put some
> > > > > counter along the page_free() path, and establish a 'make a page not
> > > > > free' path that does the other side.
> > > > > 
> > > > > ie it should not be in DAX code, it should be all in common pgmap
> > > > > code. The pgmap should never be freed while any page->refcount != 0
> > > > > and that should be an intrinsic property of pgmap, not relying on
> > > > > external parties.
> > > > 
> > > > I just do not know where to put such intrinsics since there is nothing
> > > > today that requires going through the pgmap object to discover the pfn
> > > > and 'allocate' the page.
> > > 
> > > I think that is just a new API that wraps setting refcount = 1, the
> > > percpu refcount and maybe building appropriate compound pages too.
> > > 
> > > Eg maybe something like:
> > > 
> > >   struct folio *pgmap_alloc_folios(pgmap, start, length)
> > > 
> > > And you get back maximally sized allocated folios with refcount = 1
> > > that span the requested range.
> > > 
> > > > In other words make dax_direct_access() the 'allocation' event that pins
> > > > the pgmap? I might be speaking a foreign language if you're not familiar
> > > > with the relationship of 'struct dax_device' to 'struct dev_pagemap'
> > > > instances. This is not the first time I have considered making them one
> > > > > and the same.
> > > 
> > > I don't know enough about dax, so yes very foreign :)
> > > 
> > > I'm thinking broadly about how to make pgmap usable to all the other
> > > drivers in a safe and robust way that makes some kind of logical sense.
> > 
> > I think the API should be pgmap_folio_get() because, at least for DAX,
> > the memory is already allocated. The 'allocator' for fsdax is the
> > filesystem block allocator, and pgmap_folio_get() grants access to a
> 
> No, the "allocator" for fsdax is the inode iomap interface, not the
> filesystem block allocator. The filesystem block allocator is only
> involved in iomapping if we have to allocate a new mapping for a
> given file offset.
> 
> A better name for this is "arbiter", not allocator.  To get an
> active mapping of the DAX pages backing a file, we need to ask the
> inode iomap subsystem to *map a file offset* and it will return
> kaddr and/or pfns for the backing store the file offset maps to.
> 
> IOWs, for FSDAX, access to the backing store (i.e. the physical pages) is
> arbitrated by the *inode*, not the filesystem allocator or the dax
> device. Hence if a subsystem needs to pin the backing store for some
> use, it must first ensure that it holds an inode reference (direct
> or indirect) for that range of the backing store that spans the
> life of the pin. When the pin is done, it can tear down the mappings
> it was using and then the inode reference can be released.
> 
> This ensures that any racing unlink of the inode will not result in
> the backing store being freed from under the application that has a
> pin. It will prevent the inode from being reclaimed and so
> potentially accessing stale or freed in-memory structures. And it
> will prevent the filesystem from being unmounted while the
> application using FSDAX access is still actively using that
> functionality even if it's already closed all its fds....

Sounds so simple when you put it that way. I'll give it a shot and stop
the gymnastics of trying to get in front of truncate_inode_pages_final()
with a 'dax break layouts'; instead, just hold it off until final unpin.
