* IIO, dmabuf, io_uring
@ 2021-08-13 11:41 Paul Cercueil
  2021-08-13 17:20 ` Pavel Begunkov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Paul Cercueil @ 2021-08-13 11:41 UTC
  To: Jonathan Cameron, Sumit Semwal, Christian König, Christoph Hellwig
  Cc: linux-iio, io-uring, linux-media, linux-kernel,
	Michael Hennerich, Alexandru Ardelean

Hi,

A few months ago we (ADI) tried to upstream the interface we use with 
our high-speed ADCs and DACs. It is a system of custom ioctls on the 
IIO device node to dequeue and enqueue buffers (allocated with 
dma_alloc_coherent), which can then be mmap'd by userspace applications. 
Anyway, it was ultimately denied entry [1]; this API was okay in ~2014 
when it was designed, but it feels like re-inventing the wheel in 2021.

Back to the drawing board, then: we'd like to design something that we 
can actually upstream. This high-speed interface looks awfully similar 
to DMABUF, so we may try to implement a DMABUF interface for IIO, 
unless someone has a better idea.

Our first use case is that we want userspace applications to be able 
to dequeue buffers of samples (from ADCs) and/or enqueue buffers of 
samples (for DACs), and to be able to manipulate them as mmap'd 
buffers. With a DMABUF interface, I guess the userspace application 
would dequeue a DMA buffer from the driver, mmap it, read/write the 
data, unmap it, then enqueue it to the IIO driver again so that it can 
be disposed of. Does that sound sane?
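
In code, I picture the userspace side looking something like the 
sketch below. To be clear, this is pure speculation: the 
IIO_DEQUEUE_DMABUF / IIO_ENQUEUE_DMABUF ioctls and process_samples() 
are made-up names; nothing like this exists today.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical ioctls -- not a real IIO API. */
#define IIO_DEQUEUE_DMABUF _IOR('i', 0x90, int)
#define IIO_ENQUEUE_DMABUF _IOW('i', 0x91, int)

void process_samples(void *data, size_t len); /* application-defined */

void capture_loop(const char *devpath, size_t buf_size)
{
	int dev_fd = open(devpath, O_RDWR);

	for (;;) {
		int buf_fd;

		/* Dequeue a filled buffer; the driver would hand
		 * back a dmabuf fd. */
		if (ioctl(dev_fd, IIO_DEQUEUE_DMABUF, &buf_fd) < 0)
			break;

		void *data = mmap(NULL, buf_size,
				  PROT_READ | PROT_WRITE,
				  MAP_SHARED, buf_fd, 0);
		process_samples(data, buf_size);
		munmap(data, buf_size);

		/* Give it back so the driver can reuse it. */
		ioctl(dev_fd, IIO_ENQUEUE_DMABUF, &buf_fd);
		close(buf_fd);
	}

	close(dev_fd);
}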

Our second use case is - and that's where things get tricky - to be 
able to stream the samples to another computer for processing, over 
Ethernet or USB. Our typical setup is a high-speed ADC/DAC on a dev 
board with an FPGA and a weak soft-core or low-power CPU; processing 
the data in situ is not an option. Copying the data from one buffer to 
another is not an option either (way too slow), so we absolutely want 
zero-copy.

The usual userspace zero-copy techniques (vmsplice+splice, 
MSG_ZEROCOPY, etc.) don't really work with mmap'd kernel buffers 
allocated for DMA [2] and/or have a huge overhead, so the way I see 
it, we would also need DMABUF support in both the Ethernet stack and 
the USB (functionfs) stack. However, as far as I understand, DMABUF is 
mostly a DRM/V4L2 thing, so I am really not sure we have the right 
idea here.
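
For reference, the snippet below is what the MSG_ZEROCOPY dance looks 
like. The calls themselves are the standard API; the problem is that 
the kernel must be able to pin the pages behind 'buf', which is 
exactly what falls apart when 'buf' is an mmap'd dma_alloc_coherent 
buffer [2].

#include <sys/socket.h>

int send_zerocopy(int sock_fd, const void *buf, size_t len)
{
	int one = 1;

	/* Opt in once per socket. */
	if (setsockopt(sock_fd, SOL_SOCKET, SO_ZEROCOPY,
		       &one, sizeof(one)))
		return -1;

	/* The kernel pins the pages of 'buf' instead of copying them.
	 * 'buf' must stay untouched until the completion notification
	 * arrives on the socket error queue (recvmsg + MSG_ERRQUEUE). */
	return send(sock_fd, buf, len, MSG_ZEROCOPY);
}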

And finally, there is the new kid in town, io_uring. I am not very 
literate on the topic, but it does not seem to be able to handle DMA 
buffers (yet?). The idea that we could dequeue a buffer of samples 
from the IIO device and send it over the network in a single syscall 
is appealing, though.
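
Just to illustrate what "a single syscall" could look like with 
liburing: the io_uring calls below are real; it's only the origin of 
'buf' (ideally a buffer dequeued from the IIO device) that is the 
missing piece.

#include <liburing.h>

int send_async(int sock_fd, const void *buf, size_t len)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int ret;

	io_uring_queue_init(8, &ring, 0);

	/* Queue the send request. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_send(sqe, sock_fd, buf, len, 0);

	io_uring_submit(&ring);		/* the one syscall */

	/* Reap the completion. */
	io_uring_wait_cqe(&ring, &cqe);
	ret = cqe->res;
	io_uring_cqe_seen(&ring, cqe);

	io_uring_queue_exit(&ring);
	return ret;
}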

Any thoughts? Feedback would be greatly appreciated.

Cheers,
-Paul

[1]: 
https://lore.kernel.org/linux-iio/20210217073638.21681-1-alexandru.ardelean@analog.com/T/#m6b853addb77959c55e078fbb06828db33d4bf3d7
[2]: 
https://newbedev.com/zero-copy-user-space-tcp-send-of-dma-mmap-coherent-mapped-memory




* Re: IIO, dmabuf, io_uring
  2021-08-13 11:41 IIO, dmabuf, io_uring Paul Cercueil
@ 2021-08-13 17:20 ` Pavel Begunkov
  2021-08-16  9:20   ` Paul Cercueil
  2021-08-14  7:30 ` Christoph Hellwig
  2021-08-15 18:02 ` Christian König
  2 siblings, 1 reply; 7+ messages in thread
From: Pavel Begunkov @ 2021-08-13 17:20 UTC
  To: Paul Cercueil, Jonathan Cameron, Sumit Semwal,
	Christian König, Christoph Hellwig
  Cc: linux-iio, io-uring, linux-media, linux-kernel,
	Michael Hennerich, Alexandru Ardelean

Hi Paul,

On 8/13/21 12:41 PM, Paul Cercueil wrote:
> Hi,
> 
> A few months ago we (ADI) tried to upstream the interface we use with our high-speed ADCs and DACs. It is a system with custom ioctls on the iio device node to dequeue and enqueue buffers (allocated with dma_alloc_coherent), that can then be mmap'd by userspace applications. Anyway, it was ultimately denied entry [1]; this API was okay in ~2014 when it was designed but it feels like re-inventing the wheel in 2021.
> 
> Back to the drawing table, and we'd like to design something that we can actually upstream. This high-speed interface looks awfully similar to DMABUF, so we may try to implement a DMABUF interface for IIO, unless someone has a better idea.
> 
> Our first usecase is, we want userspace applications to be able to dequeue buffers of samples (from ADCs), and/or enqueue buffers of samples (for DACs), and to be able to manipulate them (mmapped buffers). With a DMABUF interface, I guess the userspace application would dequeue a dma buffer from the driver, mmap it, read/write the data, unmap it, then enqueue it to the IIO driver again so that it can be disposed of. Does that sound sane?
> 
> Our second usecase is - and that's where things get tricky - to be able to stream the samples to another computer for processing, over Ethernet or USB. Our typical setup is a high-speed ADC/DAC on a dev board with a FPGA and a weak soft-core or low-power CPU; processing the data in-situ is not an option. Copying the data from one buffer to another is not an option either (way too slow), so we absolutely want zero-copy.
> 
> Usual userspace zero-copy techniques (vmsplice+splice, MSG_ZEROCOPY etc) don't really work with mmapped kernel buffers allocated for DMA [2] and/or have a huge overhead, so the way I see it, we would also need DMABUF support in both the Ethernet stack and USB (functionfs) stack. However, as far as I understood, DMABUF is mostly a DRM/V4L2 thing, so I am really not sure we have the right idea here.
> 
> And finally, there is the new kid in town, io_uring. I am not very literate about the topic, but it does not seem to be able to handle DMA buffers (yet?). The idea that we could dequeue a buffer of samples from the IIO device and send it over the network in one single syscall is appealing, though.

You might be interested in looking up zctap, previously a.k.a. netgpu.

For io_uring, it's work in progress as well.

> 
> Any thoughts? Feedback would be greatly appreciated.
> 
> Cheers,
> -Paul
> 
> [1]: https://lore.kernel.org/linux-iio/20210217073638.21681-1-alexandru.ardelean@analog.com/T/#m6b853addb77959c55e078fbb06828db33d4bf3d7
> [2]: https://newbedev.com/zero-copy-user-space-tcp-send-of-dma-mmap-coherent-mapped-memory

-- 
Pavel Begunkov


* Re: IIO, dmabuf, io_uring
  2021-08-13 11:41 IIO, dmabuf, io_uring Paul Cercueil
  2021-08-13 17:20 ` Pavel Begunkov
@ 2021-08-14  7:30 ` Christoph Hellwig
  2021-08-16  9:27   ` Paul Cercueil
  2021-08-16 15:01   ` [Linaro-mm-sig] " Daniel Vetter
  2021-08-15 18:02 ` Christian König
  2 siblings, 2 replies; 7+ messages in thread
From: Christoph Hellwig @ 2021-08-14  7:30 UTC
  To: Paul Cercueil
  Cc: Jonathan Cameron, Sumit Semwal, Christian König,
	Christoph Hellwig, linux-iio, io-uring, linux-media,
	linux-kernel, Michael Hennerich, Alexandru Ardelean, dri-devel,
	linaro-mm-sig

On Fri, Aug 13, 2021 at 01:41:26PM +0200, Paul Cercueil wrote:
> Hi,
>
> A few months ago we (ADI) tried to upstream the interface we use with our 
> high-speed ADCs and DACs. It is a system with custom ioctls on the iio 
> device node to dequeue and enqueue buffers (allocated with 
> dma_alloc_coherent), that can then be mmap'd by userspace applications. 
> Anyway, it was ultimately denied entry [1]; this API was okay in ~2014 when 
> it was designed but it feels like re-inventing the wheel in 2021.
>
> Back to the drawing table, and we'd like to design something that we can 
> actually upstream. This high-speed interface looks awfully similar to 
> DMABUF, so we may try to implement a DMABUF interface for IIO, unless 
> someone has a better idea.

To me this does sound a lot like a dma buf use case.  The interesting
question to me is how to signal arrival of new data, or readiness to
consume more data.  I suspect that people that are actually using
dmabuf heavily at the moment (dri/media folks) might be able to chime
in a little more on that.

> Our first usecase is, we want userspace applications to be able to dequeue 
> buffers of samples (from ADCs), and/or enqueue buffers of samples (for 
> DACs), and to be able to manipulate them (mmapped buffers). With a DMABUF 
> interface, I guess the userspace application would dequeue a dma buffer 
> from the driver, mmap it, read/write the data, unmap it, then enqueue it to 
> the IIO driver again so that it can be disposed of. Does that sound sane?
>
> Our second usecase is - and that's where things get tricky - to be able to 
> stream the samples to another computer for processing, over Ethernet or 
> USB. Our typical setup is a high-speed ADC/DAC on a dev board with a FPGA 
> and a weak soft-core or low-power CPU; processing the data in-situ is not 
> an option. Copying the data from one buffer to another is not an option 
> either (way too slow), so we absolutely want zero-copy.
>
> Usual userspace zero-copy techniques (vmsplice+splice, MSG_ZEROCOPY etc) 
> don't really work with mmapped kernel buffers allocated for DMA [2] and/or 
> have a huge overhead, so the way I see it, we would also need DMABUF 
> support in both the Ethernet stack and USB (functionfs) stack. However, as 
> far as I understood, DMABUF is mostly a DRM/V4L2 thing, so I am really not 
> sure we have the right idea here.
>
> And finally, there is the new kid in town, io_uring. I am not very literate 
> about the topic, but it does not seem to be able to handle DMA buffers 
> (yet?). The idea that we could dequeue a buffer of samples from the IIO 
> device and send it over the network in one single syscall is appealing, 
> though.

Think of io_uring really just as an async syscall layer.  It doesn't
replace DMA buffers, but can be used as a different and for some
workloads more efficient way to dispatch syscalls.


* Re: IIO, dmabuf, io_uring
  2021-08-13 11:41 IIO, dmabuf, io_uring Paul Cercueil
  2021-08-13 17:20 ` Pavel Begunkov
  2021-08-14  7:30 ` Christoph Hellwig
@ 2021-08-15 18:02 ` Christian König
  2 siblings, 0 replies; 7+ messages in thread
From: Christian König @ 2021-08-15 18:02 UTC
  To: Paul Cercueil, Jonathan Cameron, Sumit Semwal, Christoph Hellwig
  Cc: linux-iio, io-uring, linux-media, linux-kernel,
	Michael Hennerich, Alexandru Ardelean

Hi Paul,

On 13.08.21 13:41, Paul Cercueil wrote:
> Hi,
>
> A few months ago we (ADI) tried to upstream the interface we use with 
> our high-speed ADCs and DACs. It is a system with custom ioctls on the 
> iio device node to dequeue and enqueue buffers (allocated with 
> dma_alloc_coherent), that can then be mmap'd by userspace 
> applications. Anyway, it was ultimately denied entry [1]; this API was 
> okay in ~2014 when it was designed but it feels like re-inventing the 
> wheel in 2021.
>
> Back to the drawing table, and we'd like to design something that we 
> can actually upstream. This high-speed interface looks awfully similar 
> to DMABUF, so we may try to implement a DMABUF interface for IIO, 
> unless someone has a better idea.

Yeah, that sounds a lot like a DMABUF use case.

>
> Our first usecase is, we want userspace applications to be able to 
> dequeue buffers of samples (from ADCs), and/or enqueue buffers of 
> samples (for DACs), and to be able to manipulate them (mmapped 
> buffers). With a DMABUF interface, I guess the userspace application 
> would dequeue a dma buffer from the driver, mmap it, read/write the 
> data, unmap it, then enqueue it to the IIO driver again so that it can 
> be disposed of. Does that sound sane?

Well, it's pretty close. Doing the map/unmap dance on every buffer is 
usually a bad idea, since flushing the CPU TLB all the time totally 
kills your performance.

What you do instead is implement the CPU synchronization callbacks in 
your DMA-BUF implementation and flush caches as necessary.
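
A minimal sketch of what I mean - begin_cpu_access/end_cpu_access are 
the real dma_buf_ops hooks, while everything named iio_* here is a 
made-up placeholder:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>

/* Hypothetical per-buffer state; how IIO tracks this is up to you. */
struct iio_dmabuf_priv {
	struct device *dev;
	struct sg_table *sgt;
};

static int iio_dmabuf_begin_cpu_access(struct dma_buf *dbuf,
				       enum dma_data_direction dir)
{
	struct iio_dmabuf_priv *priv = dbuf->priv;

	/* Make the device-written data visible to the CPU. */
	dma_sync_sg_for_cpu(priv->dev, priv->sgt->sgl,
			    priv->sgt->nents, dir);
	return 0;
}

static int iio_dmabuf_end_cpu_access(struct dma_buf *dbuf,
				     enum dma_data_direction dir)
{
	struct iio_dmabuf_priv *priv = dbuf->priv;

	/* Hand the buffer back to the device. */
	dma_sync_sg_for_device(priv->dev, priv->sgt->sgl,
			       priv->sgt->nents, dir);
	return 0;
}

static const struct dma_buf_ops iio_dmabuf_ops = {
	/* .map_dma_buf, .unmap_dma_buf, .mmap, .release, ... */
	.begin_cpu_access = iio_dmabuf_begin_cpu_access,
	.end_cpu_access = iio_dmabuf_end_cpu_access,
};

Userspace then brackets its CPU accesses with the DMA_BUF_IOCTL_SYNC 
ioctl (DMA_BUF_SYNC_START/DMA_BUF_SYNC_END) instead of mapping and 
unmapping the buffer every time.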

>
> Our second usecase is - and that's where things get tricky - to be 
> able to stream the samples to another computer for processing, over 
> Ethernet or USB. Our typical setup is a high-speed ADC/DAC on a dev 
> board with a FPGA and a weak soft-core or low-power CPU; processing 
> the data in-situ is not an option. Copying the data from one buffer to 
> another is not an option either (way too slow), so we absolutely want 
> zero-copy.
>
> Usual userspace zero-copy techniques (vmsplice+splice, MSG_ZEROCOPY 
> etc) don't really work with mmapped kernel buffers allocated for DMA 
> [2] and/or have a huge overhead, so the way I see it, we would also 
> need DMABUF support in both the Ethernet stack and USB (functionfs) 
> stack. However, as far as I understood, DMABUF is mostly a DRM/V4L2 
> thing, so I am really not sure we have the right idea here.

Well, there are two possibilities here: either implement DMA-BUF 
support in the Ethernet/USB subsystems, or get splice() working 
efficiently with DMA-BUF mappings.

The first one is certainly a lot of work, and I have no idea if the 
second is even doable - and if it is, whether it can be done in a 
non-hacky way that you can get upstream.

>
> And finally, there is the new kid in town, io_uring. I am not very 
> literate about the topic, but it does not seem to be able to handle 
> DMA buffers (yet?). The idea that we could dequeue a buffer of samples 
> from the IIO device and send it over the network in one single syscall 
> is appealing, though.

As far as I know this is orthogonal to DMA-BUF. Christoph's answer 
sounds like my understanding is correct, but there are certainly 
people who know this better than I do.

Regards,
Christian.

>
> Any thoughts? Feedback would be greatly appreciated.
>
> Cheers,
> -Paul
>
> [1]: 
> https://lore.kernel.org/linux-iio/20210217073638.21681-1-alexandru.ardelean@analog.com/T/#m6b853addb77959c55e078fbb06828db33d4bf3d7
> [2]: 
> https://newbedev.com/zero-copy-user-space-tcp-send-of-dma-mmap-coherent-mapped-memory
>
>



* Re: IIO, dmabuf, io_uring
  2021-08-13 17:20 ` Pavel Begunkov
@ 2021-08-16  9:20   ` Paul Cercueil
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Cercueil @ 2021-08-16  9:20 UTC
  To: Pavel Begunkov, Jonathan Lemon
  Cc: Jonathan Cameron, Sumit Semwal, Christian König,
	Christoph Hellwig, linux-iio, io-uring, linux-media,
	linux-kernel, Michael Hennerich, Alexandru Ardelean

Hi,

On Fri, Aug 13 2021 at 18:20:19 +0100, Pavel Begunkov 
<asml.silence@gmail.com> wrote:
> Hi Paul,
> 
> On 8/13/21 12:41 PM, Paul Cercueil wrote:
>>  Hi,
>> 
>>  A few months ago we (ADI) tried to upstream the interface we use 
>> with our high-speed ADCs and DACs. It is a system with custom ioctls 
>> on the iio device node to dequeue and enqueue buffers (allocated 
>> with dma_alloc_coherent), that can then be mmap'd by userspace 
>> applications. Anyway, it was ultimately denied entry [1]; this API 
>> was okay in ~2014 when it was designed but it feels like 
>> re-inventing the wheel in 2021.
>> 
>>  Back to the drawing table, and we'd like to design something that 
>> we can actually upstream. This high-speed interface looks awfully 
>> similar to DMABUF, so we may try to implement a DMABUF interface for 
>> IIO, unless someone has a better idea.
>> 
>>  Our first usecase is, we want userspace applications to be able to 
>> dequeue buffers of samples (from ADCs), and/or enqueue buffers of 
>> samples (for DACs), and to be able to manipulate them (mmapped 
>> buffers). With a DMABUF interface, I guess the userspace application 
>> would dequeue a dma buffer from the driver, mmap it, read/write the 
>> data, unmap it, then enqueue it to the IIO driver again so that it 
>> can be disposed of. Does that sound sane?
>> 
>>  Our second usecase is - and that's where things get tricky - to be 
>> able to stream the samples to another computer for processing, over 
>> Ethernet or USB. Our typical setup is a high-speed ADC/DAC on a dev 
>> board with a FPGA and a weak soft-core or low-power CPU; processing 
>> the data in-situ is not an option. Copying the data from one buffer 
>> to another is not an option either (way too slow), so we absolutely 
>> want zero-copy.
>> 
>>  Usual userspace zero-copy techniques (vmsplice+splice, MSG_ZEROCOPY 
>> etc) don't really work with mmapped kernel buffers allocated for DMA 
>> [2] and/or have a huge overhead, so the way I see it, we would also 
>> need DMABUF support in both the Ethernet stack and USB (functionfs) 
>> stack. However, as far as I understood, DMABUF is mostly a DRM/V4L2 
>> thing, so I am really not sure we have the right idea here.
>> 
>>  And finally, there is the new kid in town, io_uring. I am not very 
>> literate about the topic, but it does not seem to be able to handle 
>> DMA buffers (yet?). The idea that we could dequeue a buffer of 
>> samples from the IIO device and send it over the network in one 
>> single syscall is appealing, though.
> 
> You might be interested to look up zctap, previously a.k.a netgpu.

CCing Jonathan (Lemon) then.

Jonathan: Am I right that zctap supports importing/exporting dmabufs? 
Because that would solve half of my problem.

Cheers,
-Paul

> For io_uring, it's work in progress as well.
> 
>> 
>>  Any thoughts? Feedback would be greatly appreciated.
>> 
>>  Cheers,
>>  -Paul
>> 
>>  [1]: 
>> https://lore.kernel.org/linux-iio/20210217073638.21681-1-alexandru.ardelean@analog.com/T/#m6b853addb77959c55e078fbb06828db33d4bf3d7
>>  [2]: 
>> https://newbedev.com/zero-copy-user-space-tcp-send-of-dma-mmap-coherent-mapped-memory
> 
> --
> Pavel Begunkov




* Re: IIO, dmabuf, io_uring
  2021-08-14  7:30 ` Christoph Hellwig
@ 2021-08-16  9:27   ` Paul Cercueil
  2021-08-16 15:01   ` [Linaro-mm-sig] " Daniel Vetter
  1 sibling, 0 replies; 7+ messages in thread
From: Paul Cercueil @ 2021-08-16  9:27 UTC
  To: Christoph Hellwig
  Cc: Jonathan Cameron, Sumit Semwal, Christian König, linux-iio,
	io-uring, linux-media, linux-kernel, Michael Hennerich,
	Alexandru Ardelean, dri-devel, linaro-mm-sig

Hi Christoph,

On Sat, Aug 14 2021 at 09:30:19 +0200, Christoph Hellwig 
<hch@lst.de> wrote:
> On Fri, Aug 13, 2021 at 01:41:26PM +0200, Paul Cercueil wrote:
>>  Hi,
>> 
>>  A few months ago we (ADI) tried to upstream the interface we use 
>> with our
>>  high-speed ADCs and DACs. It is a system with custom ioctls on the 
>> iio
>>  device node to dequeue and enqueue buffers (allocated with
>>  dma_alloc_coherent), that can then be mmap'd by userspace 
>> applications.
>>  Anyway, it was ultimately denied entry [1]; this API was okay in 
>> ~2014 when
>>  it was designed but it feels like re-inventing the wheel in 2021.
>> 
>>  Back to the drawing table, and we'd like to design something that 
>> we can
>>  actually upstream. This high-speed interface looks awfully similar 
>> to
>>  DMABUF, so we may try to implement a DMABUF interface for IIO, 
>> unless
>>  someone has a better idea.
> 
> To me this does sound a lot like a dma buf use case.  The interesting
> question to me is how to signal arrival of new data, or readyness to
> consume more data.  I suspect that people that are actually using
> dmabuf heavily at the moment (dri/media folks) might be able to chime
> in a little more on that.

Thanks for the feedback.

I haven't looked too much into how dmabuf works, but IIO device nodes 
right now expose a regular character-device interface, so I believe 
poll() flags can be used to signal arrival of new data.
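
On the consumer side I'd expect something as simple as the sketch 
below to work; dequeue_buffer() is a placeholder for whatever the 
actual dequeue mechanism ends up being.

#include <poll.h>

int dequeue_buffer(int fd);	/* hypothetical dequeue step */

int wait_for_samples(int dev_fd)
{
	struct pollfd pfd = { .fd = dev_fd, .events = POLLIN };

	/* Block until the driver flags a filled buffer. */
	if (poll(&pfd, 1, -1) < 0)
		return -1;

	return (pfd.revents & POLLIN) ? dequeue_buffer(dev_fd) : -1;
}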

>>  Our first usecase is, we want userspace applications to be able to 
>> dequeue
>>  buffers of samples (from ADCs), and/or enqueue buffers of samples 
>> (for
>>  DACs), and to be able to manipulate them (mmapped buffers). With a 
>> DMABUF
>>  interface, I guess the userspace application would dequeue a dma 
>> buffer
>>  from the driver, mmap it, read/write the data, unmap it, then 
>> enqueue it to
>>  the IIO driver again so that it can be disposed of. Does that sound 
>> sane?
>> 
>>  Our second usecase is - and that's where things get tricky - to be 
>> able to
>>  stream the samples to another computer for processing, over 
>> Ethernet or
>>  USB. Our typical setup is a high-speed ADC/DAC on a dev board with 
>> a FPGA
>>  and a weak soft-core or low-power CPU; processing the data in-situ 
>> is not
>>  an option. Copying the data from one buffer to another is not an 
>> option
>>  either (way too slow), so we absolutely want zero-copy.
>> 
>>  Usual userspace zero-copy techniques (vmsplice+splice, MSG_ZEROCOPY 
>> etc)
>>  don't really work with mmapped kernel buffers allocated for DMA [2] 
>> and/or
>>  have a huge overhead, so the way I see it, we would also need DMABUF
>>  support in both the Ethernet stack and USB (functionfs) stack. 
>> However, as
>>  far as I understood, DMABUF is mostly a DRM/V4L2 thing, so I am 
>> really not
>>  sure we have the right idea here.
>> 
>>  And finally, there is the new kid in town, io_uring. I am not very 
>> literate
>>  about the topic, but it does not seem to be able to handle DMA 
>> buffers
>>  (yet?). The idea that we could dequeue a buffer of samples from the 
>> IIO
>>  device and send it over the network in one single syscall is 
>> appealing,
>>  though.
> 
> Think of io_uring really just as an async syscall layer.  It doesn't
> replace DMA buffers, but can be used as a different and for some
> workloads more efficient way to dispatch syscalls.

That was my thought, yes. Thanks.

Cheers,
-Paul




* Re: [Linaro-mm-sig] IIO, dmabuf, io_uring
  2021-08-14  7:30 ` Christoph Hellwig
  2021-08-16  9:27   ` Paul Cercueil
@ 2021-08-16 15:01   ` Daniel Vetter
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Vetter @ 2021-08-16 15:01 UTC
  To: Christoph Hellwig
  Cc: Paul Cercueil, Michael Hennerich, Christian König,
	linux-iio, linux-kernel, dri-devel, linaro-mm-sig,
	Alexandru Ardelean, io-uring, Jonathan Cameron, linux-media

On Sat, Aug 14, 2021 at 09:30:19AM +0200, Christoph Hellwig wrote:
> On Fri, Aug 13, 2021 at 01:41:26PM +0200, Paul Cercueil wrote:
> > Hi,
> >
> > A few months ago we (ADI) tried to upstream the interface we use with our 
> > high-speed ADCs and DACs. It is a system with custom ioctls on the iio 
> > device node to dequeue and enqueue buffers (allocated with 
> > dma_alloc_coherent), that can then be mmap'd by userspace applications. 
> > Anyway, it was ultimately denied entry [1]; this API was okay in ~2014 when 
> > it was designed but it feels like re-inventing the wheel in 2021.
> >
> > Back to the drawing table, and we'd like to design something that we can 
> > actually upstream. This high-speed interface looks awfully similar to 
> > DMABUF, so we may try to implement a DMABUF interface for IIO, unless 
> > someone has a better idea.
> 
> To me this does sound a lot like a dma buf use case.  The interesting
> question to me is how to signal arrival of new data, or readyness to
> consume more data.  I suspect that people that are actually using
> dmabuf heavily at the moment (dri/media folks) might be able to chime
> in a little more on that.

One option is to just block in userspace (on poll, or an ioctl, or
whatever) and then latch the next stage in the pipeline. That's what media
does right now (because the dma-fence proposal never got anywhere).

In drm we use dma_fences to tie up the stages, and the current
recommendation for uapi is to use the drm_syncobj container (not the
sync_file container, which was a bit of an awkward iteration on that
problem). With that you can tie together all the pipeline stages within
the kernel (and at least sometimes directly in hw).
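
From userspace that ends up looking roughly like the sketch below. The
drmSyncobj* calls are the actual libdrm API; how an IIO producer would
get hold of the syncobj and signal it is exactly the part that would
need to be designed.

#include <stdint.h>
#include <xf86drm.h>	/* libdrm */

/* Create a syncobj and block until the producer signals it. */
int wait_for_producer(int drm_fd)
{
	uint32_t syncobj;

	if (drmSyncobjCreate(drm_fd, 0, &syncobj))
		return -1;

	/* ... export it with drmSyncobjHandleToFD() and hand the fd
	 * to the producer, which attaches and signals its fence ... */

	return drmSyncobjWait(drm_fd, &syncobj, 1, INT64_MAX,
			      DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL, NULL);
}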

The downside is (well, imo it's not a downside, but some people see it
as one) that once you use dma-fence, dri-devel folks really consider
your stuff a gpu driver and expect all the gpu driver review/merge
criteria to be fulfilled. Specifically about the userspace side too:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements

At least one driver is trying to play some very clever games here, and
that's not a solid way to make friends ...
-Daniel

> 
> > Our first usecase is, we want userspace applications to be able to dequeue 
> > buffers of samples (from ADCs), and/or enqueue buffers of samples (for 
> > DACs), and to be able to manipulate them (mmapped buffers). With a DMABUF 
> > interface, I guess the userspace application would dequeue a dma buffer 
> > from the driver, mmap it, read/write the data, unmap it, then enqueue it to 
> > the IIO driver again so that it can be disposed of. Does that sound sane?
> >
> > Our second usecase is - and that's where things get tricky - to be able to 
> > stream the samples to another computer for processing, over Ethernet or 
> > USB. Our typical setup is a high-speed ADC/DAC on a dev board with a FPGA 
> > and a weak soft-core or low-power CPU; processing the data in-situ is not 
> > an option. Copying the data from one buffer to another is not an option 
> > either (way too slow), so we absolutely want zero-copy.
> >
> > Usual userspace zero-copy techniques (vmsplice+splice, MSG_ZEROCOPY etc) 
> > don't really work with mmapped kernel buffers allocated for DMA [2] and/or 
> > have a huge overhead, so the way I see it, we would also need DMABUF 
> > support in both the Ethernet stack and USB (functionfs) stack. However, as 
> > far as I understood, DMABUF is mostly a DRM/V4L2 thing, so I am really not 
> > sure we have the right idea here.
> >
> > And finally, there is the new kid in town, io_uring. I am not very literate 
> > about the topic, but it does not seem to be able to handle DMA buffers 
> > (yet?). The idea that we could dequeue a buffer of samples from the IIO 
> > device and send it over the network in one single syscall is appealing, 
> > though.
> 
> Think of io_uring really just as an async syscall layer.  It doesn't
> replace DMA buffers, but can be used as a different and for some
> workloads more efficient way to dispatch syscalls.
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/linaro-mm-sig

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

