dri-devel.lists.freedesktop.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "Leon Romanovsky" <leon@kernel.org>,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Oded Gabbay" <ogabbay@kernel.org>,
	"Cornelia Huck" <cohuck@redhat.com>,
	dri-devel@lists.freedesktop.org,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	linaro-mm-sig@lists.linaro.org,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Maor Gottlieb" <maorg@nvidia.com>,
	"Christian König" <christian.koenig@amd.com>,
	linux-media@vger.kernel.org
Subject: Re: [PATCH v2 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf
Date: Wed, 7 Sep 2022 13:12:52 -0300	[thread overview]
Message-ID: <YxjDBOIavc79ZByZ@nvidia.com> (raw)
In-Reply-To: <Yxi5h09JAzIo4Kh8@infradead.org>

On Wed, Sep 07, 2022 at 08:32:23AM -0700, Christoph Hellwig wrote:
> On Wed, Sep 07, 2022 at 12:23:28PM -0300, Jason Gunthorpe wrote:
> >  2) DMABUF abuses dma_map_resource() for P2P and thus doesn't work in
> >     certain special cases.
> 
> Not just certain special cases, but one of the main use cases.
> Basically P2P can happen in two ways:
>
>  a) through a PCIe switch, or
>  b) through connected root ports

Yes, we tested both, both work.

> The open coded version here only supports a), and only supports it when
> there is no offset between the 'physical' address of the BAR as seen by
> the PCIe endpoint and the address Linux uses.  x86 usually (always?)
> doesn't have an offset there, but other architectures often do.

The PCI offset is some embedded thing - I've never seen it in a server
platform.

Frankly, it is just bad SOC design and there is good reason why a
non-zero offset needs to be avoided. As soon as you create aliases
between the address spaces you invite trouble. IIRC a SOC I once used
put the memory at 0 -> 4G and the only PCI aperture at 4G ->
4G+N. However, this design requires 64 bit PCI support, which at the
time the platform didn't have. So they used the PCI offset to hackily
alias the aperture over the DDR. I don't remember if they threw out a
bit of DDR to resolve the alias, or if they just didn't support PCI
switches.

In any case, it is a complete mess. You either drastically limit your
BAR size, don't support PCI switches, or lose a lot of DDR.

I also seem to remember that the IOMMU and the PCI offset don't play
nicely together - so for the VFIO use case, where the IOMMU is present,
I'm pretty sure we can very safely assume a 0 offset. That seems
confirmed by the fact that VFIO has never handled the PCI offset in its
own P2P path, and P2P works fine in VMs across a wide range of
platforms.

That said, I agree we should really have APIs that support this
properly, and dma_map_resource is certainly technically wrong.

So, would you be OK with this series if I try to make a dma_map_p2p()
that resolves the offset issue?

> Last but not least I don't really see how the code would even work
> when an IOMMU is used, as dma_map_resource will return an IOVA that
> is only understood by the IOMMU itself, and not the other endpoint.

I don't understand this.

__iommu_dma_map() will put the given phys into the iommu_domain
associated with 'dev' and return the IOVA it picked.

Here 'dev' is the importing device, it is the device that will issue
the DMA:

+       dma_addr = dma_map_resource(
+               attachment->dev,
+               pci_resource_start(priv->vdev->pdev, priv->index) +
+                       priv->offset,
+               priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);

eg attachment->dev is the PCI device of the RDMA device, not the VFIO
device.

'phys' is the CPU physical address of the PCI BAR page, which with a 0
PCI offset is the right thing to program into the IO page table.

> How was this code even tested?

It was tested on a few platforms. Like I said above, the cases where
it doesn't work are special, largely embedded, and not anything we
have in our labs - AFAIK.

Jason


Thread overview: 23+ messages
2022-08-31 23:12 [PATCH v2 0/4] Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
2022-08-31 23:12 ` [PATCH v2 1/4] dma-buf: Add dma_buf_try_get() Jason Gunthorpe
2022-09-01  7:55   ` Christian König
2022-09-06 16:44     ` Jason Gunthorpe
2022-09-06 17:52       ` Christian König
2022-08-31 23:12 ` [PATCH v2 2/4] vfio: Add vfio_device_get() Jason Gunthorpe
2022-08-31 23:12 ` [PATCH v2 3/4] vfio_pci: Do not open code pci_try_reset_function() Jason Gunthorpe
2022-08-31 23:12 ` [PATCH v2 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
     [not found]   ` <YxcYGzPv022G2vLm@infradead.org>
2022-09-06 10:38     ` Christian König
2022-09-06 11:48       ` Jason Gunthorpe
2022-09-06 12:34         ` Oded Gabbay
2022-09-06 17:59           ` Jason Gunthorpe
2022-09-06 19:44             ` Oded Gabbay
     [not found]         ` <YxiJJYtWgh1l0wxg@infradead.org>
2022-09-07 12:33           ` Jason Gunthorpe
     [not found]             ` <Yxiq5sjf/qA7xS8A@infradead.org>
2022-09-07 14:46               ` Oded Gabbay
2022-09-07 15:23               ` Jason Gunthorpe
     [not found]                 ` <Yxi5h09JAzIo4Kh8@infradead.org>
2022-09-07 16:12                   ` Jason Gunthorpe [this message]
     [not found]                     ` <Yxs+k6psNfBLDqdv@infradead.org>
2022-09-09 14:09                       ` Jason Gunthorpe
2022-09-07 16:31                 ` Robin Murphy
2022-09-07 16:47                   ` Jason Gunthorpe
2022-09-07 17:03               ` Dan Williams
2022-09-07 15:01             ` Robin Murphy
     [not found]       ` <YxiIkh/yeWQkZ54x@infradead.org>
2022-09-07 15:08         ` Christian König
