From: Jason Gunthorpe <jgg@nvidia.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "Leon Romanovsky" <leon@kernel.org>,
kvm@vger.kernel.org, linux-rdma@vger.kernel.org,
"Daniel Vetter" <daniel.vetter@ffwll.ch>,
"Oded Gabbay" <ogabbay@kernel.org>,
"Cornelia Huck" <cohuck@redhat.com>,
dri-devel@lists.freedesktop.org,
"Sumit Semwal" <sumit.semwal@linaro.org>,
linaro-mm-sig@lists.linaro.org,
"Alex Williamson" <alex.williamson@redhat.com>,
"Dan Williams" <dan.j.williams@intel.com>,
"Maor Gottlieb" <maorg@nvidia.com>,
"Christian König" <christian.koenig@amd.com>,
linux-media@vger.kernel.org
Subject: Re: [PATCH v2 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf
Date: Wed, 7 Sep 2022 13:12:52 -0300 [thread overview]
Message-ID: <YxjDBOIavc79ZByZ@nvidia.com> (raw)
In-Reply-To: <Yxi5h09JAzIo4Kh8@infradead.org>
On Wed, Sep 07, 2022 at 08:32:23AM -0700, Christoph Hellwig wrote:
> On Wed, Sep 07, 2022 at 12:23:28PM -0300, Jason Gunthorpe wrote:
> > 2) DMABUF abuses dma_map_resource() for P2P and thus doesn't work in
> > certain special cases.
>
> Not just certain special cases, but one of the main use cases.
> Basically P2P can happen in two ways:
>
> a) through a PCIe switch, or
> b) through connected root ports
Yes, we tested both, both work.
> The open code version here only supports a), only supports it when there
> is no offset between the 'phyiscal' address of the BAR seen PCIe
> endpoint and the Linux way. x86 usually (always?) doesn't have an
> offset there, but other architectures often do.
The PCI offset is some embedded thing - I've never seen it in a server
platform.
Frankly, it is just bad SOC design and there is good reason why
non-zero needs to be avoided. As soon as you create aliases between
the address spaces you invite trouble. IIRC a SOC I used once put the
memory at 0 -> 4G then put the only PCI aperture at 4g ->
4g+N. However this design requires 64 bit PCI support, which at the
time, the platform didn't have. So they used PCI offset to hackily
alias the aperture over the DDR. I don't remember if they threw out a
bit of DDR to resolve the alias, or if they just didn't support PCI
switches.
In any case, it is a complete mess. You either drastically limit your
BAR size, don't support PCI switches or loose a lot of DDR.
I also seem to remember that iommu and PCI offset don't play nice
together - so for the VFIO use case where the iommu is present I'm
pretty sure we can very safely assume 0 offset. That seems confirmed
by the fact that VFIO has never handled PCI offset in its own P2P path
and P2P works fine in VMs across a wide range of platforms.
That said, I agree we should really have APIs that support this
properly, and dma_map_resource is certainly technically wrong.
So, would you be OK with this series if I try to make a dma_map_p2p()
that resolves the offset issue?
> Last but not least I don't really see how the code would even work
> when an IOMMU is used, as dma_map_resource will return an IOVA that
> is only understood by the IOMMU itself, and not the other endpoint.
I don't understand this.
__iommu_dma_map() will put the given phys into the iommu_domain
associated with 'dev' and return the IOVA it picked.
Here 'dev' is the importing device, it is the device that will issue
the DMA:
+ dma_addr = dma_map_resource(
+ attachment->dev,
+ pci_resource_start(priv->vdev->pdev, priv->index) +
+ priv->offset,
+ priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
eg attachment->dev is the PCI device of the RDMA device, not the VFIO
device.
'phys' is the CPU physical of the PCI BAR page, which with 0 PCI
offset is the right thing to program into the IO page table.
> How was this code even tested?
It was tested on a few platforms, like I said above, the cases where
it doesn't work are special, largely embedded, and not anything we
have in our labs - AFAIK.
Jason
next prev parent reply other threads:[~2022-09-07 16:12 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-31 23:12 [PATCH v2 0/4] Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
2022-08-31 23:12 ` [PATCH v2 1/4] dma-buf: Add dma_buf_try_get() Jason Gunthorpe
2022-09-01 7:55 ` Christian König
2022-09-06 16:44 ` Jason Gunthorpe
2022-09-06 17:52 ` Christian König
2022-08-31 23:12 ` [PATCH v2 2/4] vfio: Add vfio_device_get() Jason Gunthorpe
2022-08-31 23:12 ` [PATCH v2 3/4] vfio_pci: Do not open code pci_try_reset_function() Jason Gunthorpe
2022-08-31 23:12 ` [PATCH v2 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf Jason Gunthorpe
[not found] ` <YxcYGzPv022G2vLm@infradead.org>
2022-09-06 10:38 ` Christian König
2022-09-06 11:48 ` Jason Gunthorpe
2022-09-06 12:34 ` Oded Gabbay
2022-09-06 17:59 ` Jason Gunthorpe
2022-09-06 19:44 ` Oded Gabbay
[not found] ` <YxiJJYtWgh1l0wxg@infradead.org>
2022-09-07 12:33 ` Jason Gunthorpe
[not found] ` <Yxiq5sjf/qA7xS8A@infradead.org>
2022-09-07 14:46 ` Oded Gabbay
2022-09-07 15:23 ` Jason Gunthorpe
[not found] ` <Yxi5h09JAzIo4Kh8@infradead.org>
2022-09-07 16:12 ` Jason Gunthorpe [this message]
[not found] ` <Yxs+k6psNfBLDqdv@infradead.org>
2022-09-09 14:09 ` Jason Gunthorpe
2022-09-07 16:31 ` Robin Murphy
2022-09-07 16:47 ` Jason Gunthorpe
2022-09-07 17:03 ` Dan Williams
2022-09-07 15:01 ` Robin Murphy
[not found] ` <YxiIkh/yeWQkZ54x@infradead.org>
2022-09-07 15:08 ` Christian König
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YxjDBOIavc79ZByZ@nvidia.com \
--to=jgg@nvidia.com \
--cc=alex.williamson@redhat.com \
--cc=christian.koenig@amd.com \
--cc=cohuck@redhat.com \
--cc=dan.j.williams@intel.com \
--cc=daniel.vetter@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=hch@infradead.org \
--cc=kvm@vger.kernel.org \
--cc=leon@kernel.org \
--cc=linaro-mm-sig@lists.linaro.org \
--cc=linux-media@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=maorg@nvidia.com \
--cc=ogabbay@kernel.org \
--cc=sumit.semwal@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).