All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>, <linux-mm@kvack.org>,
	Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	John Hubbard <jhubbard@nvidia.com>, <david@fromorbit.com>,
	<nvdimm@lists.linux.dev>, <akpm@linux-foundation.org>,
	<linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v3 10/25] fsdax: Introduce pgmap_request_folios()
Date: Mon, 17 Oct 2022 13:51:18 -0700	[thread overview]
Message-ID: <634dc046bada2_4da329484@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <Y0229P6z0E9Niw+9@nvidia.com>

Jason Gunthorpe wrote:
> On Mon, Oct 17, 2022 at 01:06:25PM -0700, Dan Williams wrote:
> > Alistair Popple wrote:
> > > 
> > > Dan Williams <dan.j.williams@intel.com> writes:
> > > 
> > > [...]
> > > 
> > > > +/**
> > > > + * pgmap_request_folios - activate an contiguous span of folios in @pgmap
> > > > + * @pgmap: host page map for the folio array
> > > > + * @folio: start of the folio list, all subsequent folios have same folio_size()
> > > > + *
> > > > + * Caller is responsible for @pgmap remaining live for the duration of
> > > > + * this call. Caller is also responsible for not racing requests for the
> > > > + * same folios.
> > > > + */
> > > > +bool pgmap_request_folios(struct dev_pagemap *pgmap, struct folio *folio,
> > > > +			  int nr_folios)
> > > 
> > > All callers of pgmap_request_folios() I could see call this with
> > > nr_folios == 1 and pgmap == folio_pgmap(folio). Could we remove the
> > > redundant arguments and simplify it to
> > > pgmap_request_folios(struct folio *folio)?
> > 
> > The rationale for the @pgmap argument being separate is to make clear
> > that the caller must assert that pgmap is alive before passing it to
> > pgmap_request_folios(), and that @pgmap is constant for the span of
> > folios requested.
> 
> The api is kind of weird to take in a folio - it should take in the
> offset in bytes from the start of the pgmap and "create" usable
> non-free folios.

Plumbing that is non-trivial. Recall that in the DAX case the filesystem
can be sitting on top of a stack of device-mapper-dax devices routing to
more than one pgmap, and the pgmap can be made up of discontiguous
ranges. The filesystem knows the offset into the dax-device and then
asks that device to translate that to a pfn. From pfn the optional (see
CONFIG_FS_DAX_LIMITED) pgmap can be retrieved. Once DAX knows what pfn
it wants, while holding a lock that prevents the pgmap from entering its
dying state, it can request to pin that pfn.

I arrived at the interface being folio based because the order of the
pin(s) also needs to be conveyed and the order should be consistent.

> A good design is that nothing has a struct folio * unless it can also
> prove it has a non-zero refcount on it, and that is simply impossible
> for any caller at this point.

I agree that's a good design, and I think it is a bug to make this
request on anything but a zero-refcount folio, but plumbing pgmap
offsets to replace pfn_to_page() would require dax_direct_access() to
shift from a pfn lookup to a pgmap + offset lookup.

CONFIG_FS_DAX_LIMITED is something that needs to be deleted, but for now
it supports DAX without a pgmap, where "limited" is just meant for XIP
(eXecute-In-Place), no GUP support. Once that goes then
dax_direct_access() could evolve into an interface that assumes a pgmap
is present. Until then I think pgmap_request_folios() at least puts out
the immediate fire of making people contend with oddly refcounted DAX
pages.

  reply	other threads:[~2022-10-17 20:51 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-14 23:56 [PATCH v3 00/25] Fix the DAX-gup mistake Dan Williams
2022-10-14 23:57 ` [PATCH v3 01/25] fsdax: Wait on @page not @page->_refcount Dan Williams
2022-10-14 23:57 ` [PATCH v3 02/25] fsdax: Use dax_page_idle() to document DAX busy page checking Dan Williams
2022-10-14 23:57 ` [PATCH v3 03/25] fsdax: Include unmapped inodes for page-idle detection Dan Williams
2022-10-14 23:57 ` [PATCH v3 04/25] fsdax: Introduce dax_zap_mappings() Dan Williams
2022-11-02 13:04   ` Aneesh Kumar K.V
2022-10-14 23:57 ` [PATCH v3 05/25] fsdax: Wait for pinned pages during truncate_inode_pages_final() Dan Williams
2022-10-14 23:57 ` [PATCH v3 06/25] fsdax: Validate DAX layouts broken before truncate Dan Williams
2022-10-14 23:57 ` [PATCH v3 07/25] fsdax: Hold dax lock over mapping insertion Dan Williams
2022-10-17 19:31   ` Jason Gunthorpe
2022-10-17 20:17     ` Dan Williams
2022-10-18  5:26       ` Christoph Hellwig
2022-10-18 17:30         ` Dan Williams
2022-10-14 23:57 ` [PATCH v3 08/25] fsdax: Update dax_insert_entry() calling convention to return an error Dan Williams
2022-10-14 23:57 ` [PATCH v3 09/25] fsdax: Rework for_each_mapped_pfn() to dax_for_each_folio() Dan Williams
2022-10-14 23:57 ` [PATCH v3 10/25] fsdax: Introduce pgmap_request_folios() Dan Williams
2022-10-17  6:31   ` Alistair Popple
2022-10-17 20:06     ` Dan Williams
2022-10-17 20:11       ` Jason Gunthorpe
2022-10-17 20:51         ` Dan Williams [this message]
2022-10-17 23:57           ` Jason Gunthorpe
2022-10-18  0:19             ` Dan Williams
2022-10-17 19:41   ` Jason Gunthorpe
2022-10-14 23:58 ` [PATCH v3 11/25] fsdax: Rework dax_insert_entry() calling convention Dan Williams
2022-10-14 23:58 ` [PATCH v3 12/25] fsdax: Cleanup dax_associate_entry() Dan Williams
2022-10-14 23:58 ` [PATCH v3 13/25] devdax: Minor warning fixups Dan Williams
2022-10-14 23:58 ` [PATCH v3 14/25] devdax: Fix sparse lock imbalance warning Dan Williams
2022-10-14 23:58 ` [PATCH v3 15/25] libnvdimm/pmem: Support pmem block devices without dax Dan Williams
2022-10-14 23:58 ` [PATCH v3 16/25] devdax: Move address_space helpers to the DAX core Dan Williams
2022-10-14 23:58 ` [PATCH v3 17/25] devdax: Sparse fixes for xarray locking Dan Williams
2022-10-14 23:58 ` [PATCH v3 18/25] devdax: Sparse fixes for vmfault_t / dax-entry conversions Dan Williams
2022-10-14 23:58 ` [PATCH v3 19/25] devdax: Sparse fixes for vm_fault_t in tracepoints Dan Williams
2022-10-14 23:58 ` [PATCH v3 20/25] devdax: add PUD support to the DAX mapping infrastructure Dan Williams
2022-10-14 23:59 ` [PATCH v3 21/25] devdax: Use dax_insert_entry() + dax_delete_mapping_entry() Dan Williams
2022-10-14 23:59 ` [PATCH v3 22/25] mm/memremap_pages: Replace zone_device_page_init() with pgmap_request_folios() Dan Williams
2022-10-17 19:17   ` Lyude Paul
2022-10-14 23:59 ` [PATCH v3 23/25] mm/memremap_pages: Initialize all ZONE_DEVICE pages to start at refcount 0 Dan Williams
2022-10-17  7:04   ` Alistair Popple
2022-10-17 19:48   ` Jason Gunthorpe
2022-10-14 23:59 ` [PATCH v3 24/25] mm/meremap_pages: Delete put_devmap_managed_page_refs() Dan Williams
2022-10-17  7:08   ` Alistair Popple
2022-10-14 23:59 ` [PATCH v3 25/25] mm/gup: Drop DAX pgmap accounting Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=634dc046bada2_4da329484@dwillia2-xfh.jf.intel.com.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.