From: Jason Gunthorpe <jgg@nvidia.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "Leon Romanovsky" <leon@kernel.org>,
	kvm@vger.kernel.org, linux-rdma@vger.kernel.org,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Oded Gabbay" <ogabbay@kernel.org>,
	"Cornelia Huck" <cohuck@redhat.com>,
	dri-devel@lists.freedesktop.org,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	linaro-mm-sig@lists.linaro.org,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Maor Gottlieb" <maorg@nvidia.com>,
	"Christian König" <christian.koenig@amd.com>,
	linux-media@vger.kernel.org
Subject: Re: [PATCH v2 4/4] vfio/pci: Allow MMIO regions to be exported through dma-buf
Date: Wed, 7 Sep 2022 12:23:28 -0300	[thread overview]
Message-ID: <Yxi3cFfs0SA4XWJw@nvidia.com> (raw)
In-Reply-To: <Yxiq5sjf/qA7xS8A@infradead.org>

On Wed, Sep 07, 2022 at 07:29:58AM -0700, Christoph Hellwig wrote:
> On Wed, Sep 07, 2022 at 09:33:11AM -0300, Jason Gunthorpe wrote:
> > Yes, you said that, and I said that when the AMD driver first merged
> > it - but it went in anyhow and now people are using it in a bunch of
> > places.
> 
> drm folks made up their own weird rules; if they internally stick
> to them they have to live with them, given that they ignore review
> comments.  But it violates the scatterlist API and has no business
> anywhere else in the kernel.  And yes, there probably is a reason or
> two why the drm code is unusually error prone.

That may be, but it creates problems if DRM gets to do some crazy
thing and nobody else can.

So, we have two issues here:

 1) DMABUF abuses the scatter list, but this is very constrained: we
    have this weird special "DMABUF scatterlist" that is only touched
    by DMABUF importers, and the importers signal that they understand
    the format with a flag (see the importer sketch after this list).
    This is ugly and it would be nice to clean it up into a DMA-mapped
    address list of some sort.

    I spent a lot of time a few years ago removing driver touches of
    the SGL and preparing the RDMA stack for this kind of change, at
    least.

 2) DMABUF abuses dma_map_resource() for P2P and thus doesn't work in
    certain special cases.

    Rather than jump to ZONE_DEVICE and map_sgl I would like to
    continue to support non-struct page mapping. So, I would suggest
    adding a dma_map_p2p() that can cover the special cases, taking
    the two struct devices as arguments along with a physical
    PFN/size, and doing the same logic we do under the ZONE_DEVICE
    stuff (a sketch follows after the next paragraph).

    Drivers can choose whether they want to pay the memory cost of
    ZONE_DEVICE and get a faster dma_map, or get a slower dma_map and
    save the memory.
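
As a minimal sketch of the flag from item 1, assuming today's
dma_buf_attach_ops interface (the my_* names are made up, locking is
simplified and error handling trimmed):

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>
#include <linux/err.h>

static void my_move_notify(struct dma_buf_attachment *attach)
{
	/* invalidate any cached mapping; remap on next use */
}

static const struct dma_buf_attach_ops my_attach_ops = {
	.allow_peer2peer = true, /* we understand the P2P format */
	.move_notify = my_move_notify,
};

static struct sg_table *my_import(struct dma_buf *dmabuf,
				  struct device *dev)
{
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	attach = dma_buf_dynamic_attach(dmabuf, dev, &my_attach_ops, NULL);
	if (IS_ERR(attach))
		return ERR_CAST(attach);

	dma_resv_lock(dmabuf->resv, NULL);
	/* sgt may carry DMA addresses with no struct page behind them */
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	dma_resv_unlock(dmabuf->resv);

	return sgt;
}

An exporter of page-less MMIO can refuse any attachment that does not
set allow_peer2peer, which is how the format stays contained.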

I still think we can address them incrementally - but the
dma_map_p2p() might be small enough to sort out right away, if you are
OK with it.
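
To make that concrete, a hypothetical dma_map_p2p() might look
roughly like the below; nothing here exists today, and p2p_route() /
pci_p2p_bus_addr() are stand-ins for the routing logic the
ZONE_DEVICE P2PDMA path already has:

#include <linux/dma-mapping.h>
#include <linux/pfn.h>

/* stand-ins, these would come from the P2PDMA core */
enum p2p_route { P2P_ROUTE_BUS, P2P_ROUTE_HOST_BRIDGE, P2P_ROUTE_NONE };
enum p2p_route p2p_route(struct device *a, struct device *b);
dma_addr_t pci_p2p_bus_addr(struct device *provider, unsigned long pfn);

dma_addr_t dma_map_p2p(struct device *initiator, struct device *provider,
		       unsigned long pfn, size_t size,
		       enum dma_data_direction dir)
{
	switch (p2p_route(initiator, provider)) {
	case P2P_ROUTE_BUS:
		/* same PCI switch: DMA directly to the bus address */
		return pci_p2p_bus_addr(provider, pfn);
	case P2P_ROUTE_HOST_BRIDGE:
		/*
		 * Through the host bridge: IOMMU-map the MMIO physical
		 * address, which is what dma_map_resource() does today.
		 */
		return dma_map_resource(initiator, PFN_PHYS(pfn), size,
					dir, 0);
	default:
		return DMA_MAPPING_ERROR;
	}
}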

> > > Why would small BARs be problematic for the pages?  The pages are more
> > > a problem for gigantic BARs due to the memory overhead.
> > 
> > How do I get a struct page * for a 4k BAR in vfio?
> 
> I guess we have different definitions of small then :)
> 
> But unless my understanding of the code is out of date,
> memremap_pages just requires the (virtual) start address to be 2MB
> aligned, not the size.  Adding Dan for comments.

Don't we need the virtual start address to line up with the physical
PFN for everything to work properly, e.g. pfn_to_page?

And we can't over-allocate, because another driver might also want to
use ZONE_DEVICE pages for its BAR, which would then create a
collision.

So, at least as is, the memmap stuff seems unable to support the case
we have with VFIO.
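
For reference, the existing route to struct pages for a BAR is
roughly the below; this is the path whose alignment constraints are
in question, not something vfio does today:

#include <linux/pci-p2pdma.h>

/*
 * pci_p2pdma_add_resource() wraps memremap_pages() with
 * MEMORY_DEVICE_PCI_P2PDMA. The memmap (sub)section alignment
 * requirement on the physical range is what makes this awkward for a
 * tiny BAR such as a 4k doorbell region.
 */
static int back_bar_with_pages(struct pci_dev *pdev, int bar)
{
	return pci_p2pdma_add_resource(pdev, bar,
				       pci_resource_len(pdev, bar), 0);
}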

> That being said, what is the point of mapping say a 4k BAR for p2p?
> You're not going to save a measurable amount of CPU overhead if that
> is the only place you transfer to.

For the purpose this series is chasing, it is doorbell rings. The
actual data transfer may still bounce through CPU memory (if a CMB is
not available), but the latency reduction of directly signaling the
peer device that the transfer is ready is the key objective.

Bouncing an interrupt through the CPU to cause it to do a writel() is
very time consuming, especially on slow ARM devices, while we have
adequate memory bandwidth for the data transfer.
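
To illustrate, this is the CPU-bounced path being avoided; struct
my_dev and its fields are made up:

#include <linux/interrupt.h>
#include <linux/io.h>

struct my_dev {
	void __iomem *peer_doorbell; /* mapping of the peer's doorbell BAR */
};

/*
 * The peer raises an interrupt, the CPU wakes up, and only then rings
 * the other device's doorbell. With the doorbell BAR exported as a
 * dma-buf, the first device writes the doorbell directly over PCIe
 * P2P and this whole round trip disappears.
 */
static irqreturn_t transfer_ready_irq(int irq, void *data)
{
	struct my_dev *mdev = data;

	writel(1, mdev->peer_doorbell); /* CPU-mediated doorbell ring */
	return IRQ_HANDLED;
}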

When I look at iommufd, it is about generality and compatibility. We
have no knowledge of what the guest will do, so regardless of BAR size
we have to create P2P IOMMU mappings for every kind of PCI BAR. That
is what vfio is currently doing.

Jason

