linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paul Cercueil <paul@crapouillou.net>
To: "Andrew Davis" <afd@ti.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Sumit Semwal" <sumit.semwal@linaro.org>
Cc: "Daniel Vetter" <daniel@ffwll.ch>,
	"Michael Hennerich" <Michael.Hennerich@analog.com>,
	linux-doc@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, "Nuno Sá" <noname.nuno@gmail.com>,
	"Jonathan Cameron" <jic23@kernel.org>,
	linux-media@vger.kernel.org
Subject: Re: [Linaro-mm-sig] [PATCH v5 1/6] dma-buf: Add dma_buf_{begin,end}_access()
Date: Wed, 24 Jan 2024 16:52:05 +0100	[thread overview]
Message-ID: <4819fa1e6b739380abc669a96dcea90ccc07e774.camel@crapouillou.net> (raw)
In-Reply-To: <715efa1f-c3a4-4952-b72c-ca7f466e3ccb@ti.com>

Hi Andrew,

Le mercredi 24 janvier 2024 à 09:38 -0600, Andrew Davis a écrit :
> On 1/24/24 4:58 AM, Paul Cercueil wrote:
> > Hi Christian,
> > 
> > Le mardi 23 janvier 2024 à 14:28 +0100, Christian König a écrit :
> > >   Am 23.01.24 um 14:02 schrieb Paul Cercueil:
> > >   
> > > > [SNIP]
> > > >   
> > > > >   
> > > > > >    
> > > > > > >   
> > > > > > > That an exporter has to call extra functions to access
> > > > > > > his
> > > > > > > own
> > > > > > > buffers
> > > > > > > is a complete no-go for the design since this forces
> > > > > > > exporters
> > > > > > > into
> > > > > > > doing extra steps for allowing importers to access their
> > > > > > > data.
> > > > > > >   
> > > > > >   
> > > > > > Then what about we add these dma_buf_{begin,end}_access(),
> > > > > > with
> > > > > > only
> > > > > > implementations for "dumb" exporters e.g. udmabuf or the
> > > > > > dmabuf
> > > > > > heaps?
> > > > > > And only importers (who cache the mapping and actually care
> > > > > > about
> > > > > > non-
> > > > > > coherency) would have to call these.
> > > > > >   
> > > > >   
> > > > > No, the problem is still that you would have to change all
> > > > > importers
> > > > > to
> > > > > mandatory use dma_buf_begin/end.
> > > > > 
> > > > > But going a step back caching the mapping is irrelevant for
> > > > > coherency.
> > > > > Even if you don't cache the mapping you don't get coherency.
> > > > >   
> > > >   
> > > > You actually do - at least with udmabuf, as in that case
> > > > dma_buf_map_attachment() / dma_buf_unmap_attachment() will
> > > > handle
> > > > cache
> > > > coherency when the SGs are mapped/unmapped.
> > > >   
> > >   
> > >   Well I just double checked the source in 6.7.1 and I can't see
> > > udmabuf doing anything for cache coherency in map/unmap.
> > >   
> > >   All it does is calling dma_map_sgtable() and
> > > dma_unmap_sgtable() to
> > > create and destroy the SG table and those are not supposed to
> > > sync
> > > anything to the CPU cache.
> > >   
> > >   In other words drivers usually use DMA_ATTR_SKIP_CPU_SYNC here,
> > > it's
> > > just that this is missing from udmabuf.
> > 
> > Ok.
> >   
> > > >   
> > > > The problem was then that dma_buf_unmap_attachment cannot be
> > > > called
> > > > before the dma_fence is signaled, and calling it after is
> > > > already
> > > > too
> > > > late (because the fence would be signaled before the data is
> > > > sync'd).
> > > >   
> > >   
> > >   Well what sync are you talking about? CPU sync? In DMA-buf that
> > > is
> > > handled differently.
> > >   
> > >   For importers it's mandatory that they can be coherent with the
> > > exporter. That usually means they can snoop the CPU cache if the
> > > exporter can snoop the CPU cache.
> > 
> > I seem to have such a system where one device can snoop the CPU
> > cache
> > and the other cannot. Therefore if I want to support it properly, I
> > do
> > need cache flush/sync. I don't actually try to access the data
> > using
> > the CPU (and when I do, I call the sync start/end ioctls).
> > 
> 
> If you don't access the data using the CPU, then how did the data
> end up in the CPU caches? If you have a device that can write-
> allocate
> into your CPU cache, but some other device in the system cannot snoop
> that data back out then that is just broken and those devices cannot
> reasonably share buffers..

I think that's what happens, yes.

> Now we do have systems where some hardware can snoop CPU(or L3)
> caches
> and others cannot, but they will not *allocate* into those caches
> (unless they also have the ability to sync them without CPU in the
> loop).
> 
> Your problem may be if you are still using udmabuf driver as your
> DMA-BUF exporter, which as said before is broken (and I just sent
> some
> patches with a few fixes just for you :)). For udmabuf, data starts
> in the CPU domain (in caches) and is only ever synced for the CPU,
> not for attached devices. So in this case the writing device might
> update those cache lines but a non-snooping reader would never see
> those updates.

I tried today with the system dma-heap, and the behaviour was the same.
Adding an implementation of .dma_buf_begin/end_access() to it made it
work there too.

> I'm not saying there isn't a need for these new {begin,end}_access()
> functions. I can think of a few interesting usecases, but as you
> say below that would be good to work out in a different series.

Yep, but it's a can of worms I'd rather not open if I can avoid it :)

> Andrew

Cheers,
-Paul

> 
> > 
> > >   For exporters you can implement the begin/end CPU access
> > > functions
> > > which allows you to implement something even if your exporting
> > > device
> > > can't snoop the CPU cache.
> > 
> > That only works if the importers call the begin_cpu_access() /
> > end_cpu_access(), which they don't.
> > 
> >   
> > > > Daniel / Sima suggested then that I cache the mapping and add
> > > > new
> > > > functions to ensure cache coherency, which is what these
> > > > patches
> > > > are
> > > > about.
> > > >   
> > >   
> > >   Yeah, I've now catched up on the latest mail. Sorry I haven't
> > > seen
> > > that before.
> > >   
> > >   
> > > >   
> > > > 
> > > >   
> > > > >   
> > > > > In other words exporters are not require to call sync_to_cpu
> > > > > or
> > > > > sync_to_device when you create a mapping.
> > > > > 
> > > > > What exactly is your use case here? And why does coherency
> > > > > matters?
> > > > >   
> > > >   
> > > > My use-case is, I create DMABUFs with udmabuf, that I attach to
> > > > USB/functionfs with the interface introduced by this patchset.
> > > > I
> > > > attach
> > > > them to IIO with a similar interface (being upstreamed in
> > > > parallel),
> > > > and transfer data from USB to IIO and vice-versa in a zero-copy
> > > > fashion.
> > > > 
> > > > This works perfectly fine as long as the USB and IIO hardware
> > > > are
> > > > coherent between themselves, which is the case on most of our
> > > > boards.
> > > > However I do have a board (with a Xilinx Ultrascale SoC) where
> > > > it
> > > > is
> > > > not the case, and cache flushes/sync are needed. So I was
> > > > trying to
> > > > rework these new interfaces to work on that system too.
> > > >   
> > >   
> > >   Yeah, that sounds strongly like one of the use cases we have
> > > rejected so far.
> > >   
> > >   
> > >   
> > > >   
> > > > If this really is a no-no, then I am fine with the assumption
> > > > that
> > > > devices sharing a DMABUF must be coherent between themselves;
> > > > but
> > > > that's something that should probably be enforced rather than
> > > > assumed.
> > > > 
> > > > (and I *think* there is a way to force coherency in the
> > > > Ultrascale's
> > > > interconnect - we're investigating it)
> > > >   
> > >   
> > >   What you can do is that instead of using udmabuf or dma-heaps
> > > is
> > > that the device which can't provide coherency act as exporters of
> > > the
> > > buffers.
> > >   
> > >   The exporter is allowed to call sync_for_cpu/sync_for_device on
> > > it's
> > > own buffers and also gets begin/end CPU access notfications. So
> > > you
> > > can then handle coherency between the exporter and the CPU.
> > 
> > But again that would only work if the importers would call
> > begin_cpu_access() / end_cpu_access(), which they don't, because
> > they
> > don't actually access the data using the CPU.
> > 
> > Unless you mean that the exporter can call
> > sync_for_cpu/sync_for_device
> > before/after every single DMA transfer so that the data appears
> > coherent to the importers, without them having to call
> > begin_cpu_access() / end_cpu_access().
> > 
> > In which case - this would still demultiply the complexity; my USB-
> > functionfs interface here (and IIO interface in the separate
> > patchset)
> > are not device-specific, so I'd rather keep them importers.
> >   
> > >   If you really don't have coherency between devices then that
> > > would
> > > be a really new use case and we would need much more agreement on
> > > how
> > > to do this.
> > 
> > [snip]
> > 
> > Agreed. Desiging a good generic solution would be better.
> > 
> > With that said...
> > 
> > Let's keep it out of this USB-functionfs interface for now. The
> > interface does work perfectly fine on platforms that don't have
> > coherency problems. The coherency issue in itself really is a
> > tangential issue.
> > 
> > So I will send a v6 where I don't try to force the cache coherency
> > -
> > and instead assume that the attached devices are coherent between
> > themselves.
> > 
> > But it would be even better to have a way to detect non-coherency
> > and
> > return an error on attach.
> > 
> > Cheers,
> > -Paul


  reply	other threads:[~2024-01-24 15:52 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-19 14:13 [PATCH v5 0/6] usb: gadget: functionfs: DMABUF import interface Paul Cercueil
2024-01-19 14:13 ` [PATCH v5 1/6] dma-buf: Add dma_buf_{begin,end}_access() Paul Cercueil
2024-01-20 20:20   ` kernel test robot
2024-01-22 10:35   ` [Linaro-mm-sig] " Christian König
2024-01-22 11:01     ` Paul Cercueil
2024-01-22 13:41       ` Christian König
2024-01-23 10:10         ` Paul Cercueil
2024-01-23 11:52           ` Christian König
2024-01-23 13:02             ` Paul Cercueil
     [not found]               ` <577501f9-9d1c-4f8d-9882-7c71090e5ef3@amd.com>
2024-01-24 10:58                 ` Paul Cercueil
2024-01-24 15:38                   ` Andrew Davis
2024-01-24 15:52                     ` Paul Cercueil [this message]
     [not found]                   ` <2ac7562c-d221-409a-bfee-1b3cfcc0f1c6@amd.com>
2024-01-25 18:01                     ` Daniel Vetter
2024-01-26 16:42                       ` Christian König
2024-01-30  9:01                         ` [Linaro-mm-sig] " Daniel Vetter
     [not found]                           ` <a2346244-e22b-4ff6-b6cd-1da7138725ae@amd.com>
2024-01-30  9:48                             ` Paul Cercueil
2024-01-30 10:40                               ` Daniel Vetter
2024-01-30 13:09                                 ` Christian König
2024-01-31  9:07                                   ` Daniel Vetter
2024-02-06 13:28                                     ` Christian König
2024-02-06 13:57                                       ` Daniel Vetter
2024-01-30 10:38                             ` Daniel Vetter
2024-01-25 18:10   ` Daniel Vetter
2024-01-19 14:13 ` [PATCH v5 2/6] dma-buf: udmabuf: Implement .{begin,end}_access Paul Cercueil
2024-02-07 17:10   ` [Linaro-mm-sig] " Daniel Vetter
2024-01-19 14:13 ` [PATCH v5 3/6] usb: gadget: Support already-mapped DMA SGs Paul Cercueil
2024-01-19 14:14 ` [PATCH v5 4/6] usb: gadget: functionfs: Factorize wait-for-endpoint code Paul Cercueil
2024-01-19 14:14 ` [PATCH v5 5/6] usb: gadget: functionfs: Add DMABUF import interface Paul Cercueil
2024-01-19 14:14 ` [PATCH v5 6/6] Documentation: usb: Document FunctionFS DMABUF API Paul Cercueil

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4819fa1e6b739380abc669a96dcea90ccc07e774.camel@crapouillou.net \
    --to=paul@crapouillou.net \
    --cc=Michael.Hennerich@analog.com \
    --cc=afd@ti.com \
    --cc=christian.koenig@amd.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=corbet@lwn.net \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=jic23@kernel.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=noname.nuno@gmail.com \
    --cc=sumit.semwal@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).