linux-block.vger.kernel.org archive mirror
From: "Christian König" <christian.koenig@amd.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Logan Gunthorpe <logang@deltatee.com>,
	Alistair Popple <apopple@nvidia.com>,
	Felix Kuehling <Felix.Kuehling@amd.com>,
	Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-mm@kvack.org, iommu@lists.linux-foundation.org,
	Stephen Bates <sbates@raithlin.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Don Dutile <ddutile@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Jakowski Andrzej <andrzej.jakowski@intel.com>,
	Minturn Dave B <dave.b.minturn@intel.com>,
	Jason Ekstrand <jason@jlekstrand.net>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Xiong Jianxin <jianxin.xiong@intel.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Ira Weiny <ira.weiny@intel.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Martin Oliveira <martin.oliveira@eideticom.com>,
	Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Subject: Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()
Date: Mon, 4 Oct 2021 15:22:22 +0200	[thread overview]
Message-ID: <1e219386-7547-4f42-d090-2afd62a268d7@amd.com> (raw)
In-Reply-To: <20211004131102.GU3544071@ziepe.ca>

On 04.10.21 at 15:11, Jason Gunthorpe wrote:
> On Mon, Oct 04, 2021 at 08:58:35AM +0200, Christian König wrote:
>> I'm not following this discussion too closely, but I try to look into it
>> from time to time.
>>
>> On 01.10.21 at 19:45, Jason Gunthorpe wrote:
>>> On Fri, Oct 01, 2021 at 11:01:49AM -0600, Logan Gunthorpe wrote:
>>>
>>>> In device-dax, the refcount is only used to prevent the device, and
>>>> therefore the pages, from going away on device unbind. Pages cannot be
>>>> recycled, as you say, as they are mapped linearly within the device. The
>>>> address space invalidation is done only when the device is unbound.
>>> By address space invalidation I mean invalidation of the VMA that is
>>> pointing to those pages.
>>>
>>> device-dax may not have an issue with use-after-VMA-invalidation by
>>> its very nature, since every PFN always points to the same
>>> thing. fsdax and this p2p stuff are different though.
>>>
>>>> Before the invalidation, an active flag is cleared to ensure no new
>>>> mappings can be created while the unmap is proceeding.
>>>> unmap_mapping_range() should sequence itself with the TLB flush and
>>> AFAIK unmap_mapping_range() kicks off the TLB flush and then
>>> returns. It doesn't always wait for the flush to fully finish. I.e. some
>>> cases use RCU to lock the page table against GUP fast and so the
>>> put_page() doesn't happen until the call_rcu completes - after a grace
>>> period. The unmap_mapping_range() does not wait for grace periods.
>> Wow, wait a second. That is quite a bummer. At least in all GEM/TTM based
>> graphics drivers that could potentially cause a lot of trouble.
>>
>> I've just double checked and we certainly have the assumption that when
>> unmap_mapping_range() returns the PTE is gone and the TLB flush has
>> completed, in quite a number of places.
>>
>> Do you have more information when and why that can happen?
> There are two things to keep in mind, flushing the PTEs from the HW
> and serializing against gup_fast.
>
> If you start at unmap_mapping_range() the page is eventually
> discovered in zap_pte_range() and the PTE cleared. It is then passed
> into __tlb_remove_page(), which puts it on the batch->pages list.
>
> The page free happens in tlb_batch_pages_flush() via
> free_pages_and_swap_cache()
>
> The tlb_batch_pages_flush() happens via zap_page_range() ->
> tlb_finish_mmu(), presumably after the HW has wiped the TLBs on all
> CPUs. On x86 this is done with an IPI, which also serializes gup fast,
> so OK.
>
> The interesting case is CONFIG_MMU_GATHER_RCU_TABLE_FREE which doesn't
> rely on IPIs anymore to synchronize with gup-fast.
>
> In this configuration it means when unmap_mapping_range() returns the
> TLB will have been flushed, but no serialization with GUP fast was
> done.
>
> This is OK if the GUP fast cannot return the page at all. I assume
> this generally describes the DRM cases?

Yes, exactly that. GUP is completely forbidden for such mappings.

But what about accesses by other CPUs? In other words, our use case is
like the following:

1. We find that we need exclusive access to the higher-level object a
page belongs to.

2. The lock of the higher-level object is taken. The same lock is also
taken in the fault handler for the VMA, which inserts the PTE in the
first place.

3. unmap_mapping_range() is called for the range of the object; the
expectation is that when that function returns, only the kernel can have
a mapping of the pages backing the object.

4. The kernel has exclusive access to the pages and we know that 
userspace can't mess with them any more.

That use case is completely unrelated to GUP, and if this doesn't work
we have quite a problem.

I should probably note that we recently switched from VM_MIXEDMAP to
using VM_PFNMAP because the former didn't prevent GUP on all
architectures.

Christian.

> However, if the GUP fast can return the page then something,
> somewhere, needs to serialize the page free with the RCU as the GUP
> fast can be observing the old PTE before it was zap'd, until the RCU
> grace period expires.
>
> Relying on the page ref being !0 to protect GUP fast is not safe
> because the page ref can be incr'd immediately upon page re-use.
>
> Interestingly I looked around for this on PPC and I only found RCU
> delayed freeing of the page table level, not RCU delayed freeing of
> pages themselves.. I wonder if it was missed?
>
> There is a path on PPC (tlb_remove_table_sync_one) that triggers an
> IPI but it looks like an exception, and we wouldn't need the RCU at
> all if we used IPI to serialize GUP fast...
>
> It makes logical sense if the RCU also frees the pages on
> CONFIG_MMU_GATHER_RCU_TABLE_FREE so anything returnable by GUP fast
> must be refcounted and freed by tlb_batch_pages_flush(), not by the
> caller of unmap_mapping_range().
>
> If we expect to allow the caller of unmap_mapping_range() to free, then
> CONFIG_MMU_GATHER_RCU_TABLE_FREE can't really exist; we always need to
> trigger a serializing IPI during tlb_batch_pages_flush().
>
> AFAICT, at least
>
> Jason


