linux-pci.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-mm@kvack.org, "Christoph Hellwig" <hch@lst.de>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Christian König" <christian.koenig@amd.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Don Dutile" <ddutile@redhat.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Minturn Dave B" <dave.b.minturn@intel.com>,
	"Jason Ekstrand" <jason@jlekstrand.net>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Xiong Jianxin" <jianxin.xiong@intel.com>,
	"Bjorn Helgaas" <helgaas@kernel.org>,
	"Ira Weiny" <ira.weiny@intel.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Martin Oliveira" <martin.oliveira@eideticom.com>,
	"Chaitanya Kulkarni" <ckulkarnilinux@gmail.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Stephen Bates" <sbates@raithlin.com>
Subject: Re: [PATCH v10 1/8] mm: introduce FOLL_PCI_P2PDMA to gate getting PCI P2PDMA pages
Date: Fri, 23 Sep 2022 16:53:27 -0300
Message-ID: <Yy4Ot5MoOhsgYLTQ@ziepe.ca>
In-Reply-To: <64f8da81-7803-4db4-73da-a158295cbc9c@deltatee.com>

On Fri, Sep 23, 2022 at 01:08:31PM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2022-09-23 12:13, Jason Gunthorpe wrote:
> > On Thu, Sep 22, 2022 at 10:39:19AM -0600, Logan Gunthorpe wrote:
> >> GUP Callers that expect PCI P2PDMA pages can now set FOLL_PCI_P2PDMA to
> >> allow obtaining P2PDMA pages. If GUP is called without the flag and a
> >> P2PDMA page is found, it will return an error.
> >>
> >> FOLL_PCI_P2PDMA cannot be set if FOLL_LONGTERM is set.
> > 
> > What is causing this? It is really troublesome, I would like to fix
> > it. eg I would like to have P2PDMA pages in VFIO iommu page tables and
> > in RDMA MR's - both require longterm.
> 
> You had said it was required if we were relying on unmap_mapping_range()...

Ah.. Ok.  Dan and I have been talking about this a lot, and it turns
out the DAX approach of unmap_mapping_range() still has problems,
really the same problem as FOLL_LONGTERM:

https://lore.kernel.org/all/Yy2pC%2FupZNEkVmc5@nvidia.com/

ie nothing actually waits for the page refs to go to zero during
memunmap_pages(). (indeed they are not actually zero because currently
they are instantly reset to 1 if they become zero)

The current design requires that the pgmap user hold the pgmap_ref in
a way that it remains elevated until page_free() is called for every
page that was ever used.
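
To make that rule concrete, here is a minimal userspace sketch, not kernel code. Every toy_* name is hypothetical and only the shape of the rule is modelled: the pgmap reference goes up on a page's first use and comes down only in the stand-in for page_free(), so teardown cannot proceed while any used page is still outstanding.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of these names are kernel APIs. */
struct toy_pgmap {
	long ref;		/* stands in for the pgmap ref */
};

struct toy_page {
	struct toy_pgmap *pgmap;
	bool in_use;		/* true from first use until toy_page_free() */
};

/* Handing a page out pins the pgmap for as long as the page is live. */
static void toy_page_use(struct toy_page *page)
{
	if (!page->in_use) {
		page->in_use = true;
		page->pgmap->ref++;
	}
}

/* Models the page_free() callback: only here is the pgmap ref dropped. */
static void toy_page_free(struct toy_page *page)
{
	assert(page->in_use && page->pgmap->ref > 0);
	page->in_use = false;
	page->pgmap->ref--;
}

/* Teardown may only proceed once every used page has been freed. */
static bool toy_pgmap_can_teardown(const struct toy_pgmap *pgmap)
{
	return pgmap->ref == 0;
}
```

With two pages in use, toy_pgmap_can_teardown() stays false until toy_page_free() has run for both of them.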

I'm encouraging Dan to work on better infrastructure in pgmap core
because every pgmap implementation has this issue currently.

For that reason it is probably not so relevant to this series.

Perhaps just clarify in the commit message that the FOLL_LONGTERM
restriction mirrors what DAX does, until the pgmap page refcounts are
fixed.
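
For reference, the gating rules from the patch description can be sketched as a small userspace check. This is an illustration under assumptions, not the code from the series: the TOY_* bit values, the function name and the choice of -EINVAL are all placeholders.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative bit values only; the real FOLL_* values live in the kernel. */
#define TOY_FOLL_LONGTERM	(1u << 0)
#define TOY_FOLL_PCI_P2PDMA	(1u << 1)

/*
 * Returns 0 if the page may be handed to the caller, or a negative errno.
 * The specific errno is a placeholder, not what the series returns.
 */
static int toy_gup_check(unsigned int flags, bool page_is_p2pdma)
{
	/* FOLL_PCI_P2PDMA cannot be combined with FOLL_LONGTERM. */
	if ((flags & TOY_FOLL_PCI_P2PDMA) && (flags & TOY_FOLL_LONGTERM))
		return -EINVAL;

	/* A P2PDMA page found without the opt-in flag is an error. */
	if (page_is_p2pdma && !(flags & TOY_FOLL_PCI_P2PDMA))
		return -EINVAL;

	return 0;
}
```

The first check is the one that would go away once longterm pins of P2PDMA pages become safe.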

> > Is it just because ZONE_DEVICE was created for DAX and carried that
> > revocable assumption over? Does anything in your series require
> > revocable?
> 
> We still rely on unmap_mapping_range() indirectly in the unbind
> path. So I expect if something takes a LONGTERM mapping that would
> block until whatever process holds the pin releases it. That's less
> than ideal and I'm not sure what can be done about it.

We could improve the blocking with some kind of FOLL_LONGTERM notifier
thingy, eg after the unmap_mapping_range() broadcast that a range of
PFNs is going away, and FOLL_LONGTERM users can do a revoke if they
support it. It is rare enough that we don't necessarily need to optimize
this a lot, and blocking unbind until some FDs close is annoying, not
critical. (eg you already can't unmount a filesystem to unbind the
device on the nvme while FS FDs are open)
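
A rough userspace sketch of what such a notifier could look like, purely to illustrate the idea. Nothing like this exists in the series, and every toy_* name and the struct layout are invented for the example: each longterm user registers a PFN range and an optional revoke callback, and the broadcast returns how many overlapping pins are left for unbind to block on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a FOLL_LONGTERM user; not a kernel structure. */
struct toy_longterm_user {
	unsigned long first_pfn;	/* pinned PFN range, inclusive */
	unsigned long last_pfn;
	bool pinned;
	/* Optional: drop the pin early. NULL means revoke is unsupported. */
	void (*revoke)(struct toy_longterm_user *user);
};

static void toy_revoke(struct toy_longterm_user *user)
{
	user->pinned = false;
}

/*
 * Broadcast that [first_pfn, last_pfn] is going away. Users that support
 * revoke drop their pins; the return value is the number of overlapping
 * pins left, i.e. what unbind would still have to block on.
 */
static size_t toy_notify_range_going_away(struct toy_longterm_user *users,
					  size_t nr_users,
					  unsigned long first_pfn,
					  unsigned long last_pfn)
{
	size_t remaining = 0;

	assert(first_pfn <= last_pfn);

	for (size_t i = 0; i < nr_users; i++) {
		struct toy_longterm_user *u = &users[i];

		/* Skip users that are unpinned or do not overlap the range. */
		if (!u->pinned || u->last_pfn < first_pfn || u->first_pfn > last_pfn)
			continue;
		if (u->revoke)
			u->revoke(u);
		if (u->pinned)
			remaining++;
	}
	return remaining;
}
```

A return value of zero would mean unbind can proceed right away; anything else is the "wait for some FDs to close" case.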

Jason
