* Doing DMA from peripheral to userland memory
@ 2021-08-27  9:29 François Legal
  2021-08-27 13:01 ` Jan Kiszka
  2021-08-27 13:01 ` Philippe Gerum
  0 siblings, 2 replies; 12+ messages in thread
From: François Legal @ 2021-08-27  9:29 UTC (permalink / raw)
  To: xenomai

Hello,

Working on a Zynq-7000 target (ARM Cortex-A9), we have a peripheral that generates loads of data (many kilobytes per millisecond).

We would like to move that data directly from the peripheral memory (the OCM of the SoC) to our RT application's user memory using DMA.

For one part of the data, we would like the DMA to de-interlace that data while moving it. We figured out that the PL330 peripheral on the SoC should be able to do it; however, we would like, as much as possible, to keep one or two channels of the PL330 available for plain Linux non-RT use (via dmaengine).

My first attempt would be to extend the dmaengine API with an RT API, then implement the RT API calls in the PL330 driver.
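To make the idea concrete, here is a rough sketch of what such an extension could look like; the rt_dma_* names and the RTDM event plumbing are hypothetical, not an existing API:

/* Hypothetical sketch of an RT extension to the dmaengine API. */
#include <linux/dmaengine.h>
#include <rtdm/driver.h>

struct rt_dma_chan {
	struct dma_chan *chan;	/* dmaengine channel reserved for RT use */
	rtdm_event_t done;	/* signaled by the RT-safe DMA IRQ handler */
};

/* Submit a prepared descriptor from primary (RT) context; the PL330
 * driver side must avoid Linux locks and allocations on this path. */
int rt_dma_submit(struct rt_dma_chan *rtc,
		  struct dma_async_tx_descriptor *desc);

/* Block the RT task until the controller signals completion. */
static inline int rt_dma_wait(struct rt_dma_chan *rtc)
{
	return rtdm_event_wait(&rtc->done);
}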

What do you think of this approach, and is it achievable at all (DMA directly to userland memory, and/or having some DMA channels used by Xenomai and others by Linux)?

Thanks in advance

François




* Re: Doing DMA from peripheral to userland memory
  2021-08-27  9:29 Doing DMA from peripheral to userland memory François Legal
@ 2021-08-27 13:01 ` Jan Kiszka
  2021-08-27 13:01 ` Philippe Gerum
  1 sibling, 0 replies; 12+ messages in thread
From: Jan Kiszka @ 2021-08-27 13:01 UTC (permalink / raw)
  To: François Legal, xenomai

On 27.08.21 11:29, François Legal via Xenomai wrote:
> Hello,
> 
> Working on a Zynq-7000 target (ARM Cortex-A9), we have a peripheral that generates loads of data (many kilobytes per millisecond).
> 
> We would like to move that data directly from the peripheral memory (the OCM of the SoC) to our RT application's user memory using DMA.
> 
> For one part of the data, we would like the DMA to de-interlace that data while moving it. We figured out that the PL330 peripheral on the SoC should be able to do it; however, we would like, as much as possible, to keep one or two channels of the PL330 available for plain Linux non-RT use (via dmaengine).
> 
> My first attempt would be to extend the dmaengine API with an RT API, then implement the RT API calls in the PL330 driver.
> 
> What do you think of this approach, and is it achievable at all (DMA directly to userland memory, and/or having some DMA channels used by Xenomai and others by Linux)?
> 

IIRC, the topic of DMA also popped up in the context of SPI support in
the past. In any case, you likely want to check how Dovetail and EVL /
Xenomai 4 may already be addressing this, to get an idea of potential
solution patterns.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: Doing DMA from peripheral to userland memory
  2021-08-27  9:29 Doing DMA from peripheral to userland memory François Legal
  2021-08-27 13:01 ` Jan Kiszka
@ 2021-08-27 13:01 ` Philippe Gerum
  2021-08-27 13:44   ` François Legal
  1 sibling, 1 reply; 12+ messages in thread
From: Philippe Gerum @ 2021-08-27 13:01 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai


François Legal via Xenomai <xenomai@xenomai.org> writes:

> Hello,
>
> Working on a Zynq-7000 target (ARM Cortex-A9), we have a peripheral that generates loads of data (many kilobytes per millisecond).
>
> We would like to move that data directly from the peripheral memory (the OCM of the SoC) to our RT application's user memory using DMA.
>
> For one part of the data, we would like the DMA to de-interlace that data while moving it. We figured out that the PL330 peripheral on the SoC should be able to do it; however, we would like, as much as possible, to keep one or two channels of the PL330 available for plain Linux non-RT use (via dmaengine).
>
> My first attempt would be to extend the dmaengine API with an RT API, then implement the RT API calls in the PL330 driver.
>
> What do you think of this approach, and is it achievable at all (DMA directly to userland memory, and/or having some DMA channels used by Xenomai and others by Linux)?
>
> Thanks in advance
>
> François

As a starting point, you may want to have a look at this document:
https://evlproject.org/core/oob-drivers/dma/

This is part of the EVL core documentation, but this is actually a
Dovetail feature.

-- 
Philippe.



* Re: Doing DMA from peripheral to userland memory
  2021-08-27 13:01 ` Philippe Gerum
@ 2021-08-27 13:44   ` François Legal
  2021-08-27 13:54     ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread
From: François Legal @ 2021-08-27 13:44 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On Friday, August 27, 2021 at 15:01 CEST, Philippe Gerum <rpm@xenomai.org> wrote:

>
> François Legal via Xenomai <xenomai@xenomai.org> writes:
>
> > [...]
>
> As a starting point, you may want to have a look at this document:
> https://evlproject.org/core/oob-drivers/dma/
>
> This is part of the EVL core documentation, but this is actually a
> Dovetail feature.
>

Well, that's pretty much what I want to do, so it is very good news that this is already available going forward. However, I need it through the I-pipe right now, but I guess the process stays the same (patching the dmaengine API and the DMA engine driver).

I would guess the modifications to the DMA engine driver would then be easily ported to Dovetail?

François

> --
> Philippe.




* Re: Doing DMA from peripheral to userland memory
  2021-08-27 13:44   ` François Legal
@ 2021-08-27 13:54     ` Philippe Gerum
  2021-08-27 14:09       ` François Legal
  0 siblings, 1 reply; 12+ messages in thread
From: Philippe Gerum @ 2021-08-27 13:54 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai


François Legal <devel@thom.fr.eu.org> writes:

> [...]
>
> Well, that's pretty much what I want to do, so it is very good news that this is already available going forward. However, I need it through the I-pipe right now, but I guess the process stays the same (patching the dmaengine API and the DMA engine driver).
>
> I would guess the modifications to the DMA engine driver would then be easily ported to Dovetail?
>

Since they should follow the same pattern used for the controllers
Dovetail currently supports, I think so. You should be able to simplify
the code when porting it to Dovetail, actually.

-- 
Philippe.



* Re: Doing DMA from peripheral to userland memory
  2021-08-27 13:54     ` Philippe Gerum
@ 2021-08-27 14:09       ` François Legal
  2021-08-27 14:36         ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread
From: François Legal @ 2021-08-27 14:09 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On Friday, August 27, 2021 at 15:54 CEST, Philippe Gerum <rpm@xenomai.org> wrote:

>
> François Legal <devel@thom.fr.eu.org> writes:
>
> > [...]
>
> Since they should follow the same pattern used for the controllers
> Dovetail currently supports, I think so. You should be able to simplify
> the code when porting it to Dovetail, actually.
>

That's what I thought. Thanks a lot.

So now, regarding the "to userland memory" aspect: I guess that to make this happen I will somehow have to change the PTE flags to make these pages non-cacheable (using dma_map_page maybe), but I wonder whether I have to map the userland pages to kernel space, and whether or not I have to pin the userland pages in memory (I believe mlockall in the userland process does that already)?

François
> --
> Philippe.




* Re: Doing DMA from peripheral to userland memory
  2021-08-27 14:09       ` François Legal
@ 2021-08-27 14:36         ` Philippe Gerum
  2021-08-31  9:36           ` François Legal
  0 siblings, 1 reply; 12+ messages in thread
From: Philippe Gerum @ 2021-08-27 14:36 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai


François Legal <devel@thom.fr.eu.org> writes:

> [...]
>
> That's what I thought. Thanks a lot.
>
> So now, regarding the "to userland memory" aspect: I guess that to make this happen I will somehow have to change the PTE flags to make these pages non-cacheable (using dma_map_page maybe), but I wonder whether I have to map the userland pages to kernel space, and whether or not I have to pin the userland pages in memory (I believe mlockall in the userland process does that already)?
>

The out-of-band SPI support available from EVL illustrates a possible
implementation. This code [2] implements what is described in this page
[1].

[1] https://evlproject.org/core/oob-drivers/spi/
[2] https://source.denx.de/Xenomai/xenomai4/linux-evl/-/blob/0969ccef9a5318244e484e847dab52999f6fec5c/drivers/spi/spi.c#L4259

-- 
Philippe.



* Re: Doing DMA from peripheral to userland memory
  2021-08-27 14:36         ` Philippe Gerum
@ 2021-08-31  9:36           ` François Legal
  2021-08-31 17:37             ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread
From: François Legal @ 2021-08-31  9:36 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On Friday, August 27, 2021 at 16:36 CEST, Philippe Gerum <rpm@xenomai.org> wrote:

>
> François Legal <devel@thom.fr.eu.org> writes:
>
> > [...]
>
> The out-of-band SPI support available from EVL illustrates a possible
> implementation. This code [2] implements what is described in this page
> [1].
>

Thanks for the example. I think what I'm trying to do is a little different from this, however.
For the record, this is what I do (and it seems to be working):
- as soon as userland buffers are allocated, tell the driver to pin the userland buffer pages in memory (with get_user_pages_fast). I'm not sure this is required, as I think mlockall in the app already takes care of that.
- whenever I need to transfer data to the userland buffer, have the driver DMA-map those userland pages (with dma_map_page), then program the DMA controller with the physical address of these pages.
et voilà

This seems to work correctly and repeatably so far.
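A minimal sketch of those two steps (assuming a recent kernel's get_user_pages_fast signature; NR_PAGES, the device pointer and all error handling are placeholders):

#include <linux/mm.h>
#include <linux/dma-mapping.h>

#define NR_PAGES 16	/* example buffer size, in pages */
static struct page *pages[NR_PAGES];

/* Step 1, at buffer allocation time: pin the userland pages. */
static int pin_user_buffer(unsigned long uaddr)
{
	return get_user_pages_fast(uaddr, NR_PAGES, FOLL_WRITE, pages);
}

/* Step 2, per transfer: map one page and return the bus address to
 * program into the DMA controller. */
static dma_addr_t map_for_dma(struct device *dev, int i)
{
	return dma_map_page(dev, pages[i], 0, PAGE_SIZE, DMA_FROM_DEVICE);
}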

François

> [1] https://evlproject.org/core/oob-drivers/spi/
> [2] https://source.denx.de/Xenomai/xenomai4/linux-evl/-/blob/0969ccef9a5318244e484e847dab52999f6fec5c/drivers/spi/spi.c#L4259
>
> --
> Philippe.




* Re: Doing DMA from peripheral to userland memory
  2021-08-31  9:36           ` François Legal
@ 2021-08-31 17:37             ` Philippe Gerum
  2021-09-01  8:24               ` François Legal
  0 siblings, 1 reply; 12+ messages in thread
From: Philippe Gerum @ 2021-08-31 17:37 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai


François Legal <devel@thom.fr.eu.org> writes:

> [...]
>
> Thanks for the example. I think what I'm trying to do is a little different from this, however.
> For the record, this is what I do (and it seems to be working):
> - as soon as userland buffers are allocated, tell the driver to pin the userland buffer pages in memory (with get_user_pages_fast). I'm not sure this is required, as I think mlockall in the app already takes care of that.
> - whenever I need to transfer data to the userland buffer, have the driver DMA-map those userland pages (with dma_map_page), then program the DMA controller with the physical address of these pages.
> et voilà
>
> This seems to work correctly and repeatably so far.
>

Are transfers controlled from the real-time stage, and if so, how do you
deal with cache maintenance between transfers?

-- 
Philippe.



* Re: Doing DMA from peripheral to userland memory
  2021-08-31 17:37             ` Philippe Gerum
@ 2021-09-01  8:24               ` François Legal
  2021-09-02 16:41                 ` François Legal
  0 siblings, 1 reply; 12+ messages in thread
From: François Legal @ 2021-09-01  8:24 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On Tuesday, August 31, 2021 at 19:37 CEST, Philippe Gerum <rpm@xenomai.org> wrote:

>
> François Legal <devel@thom.fr.eu.org> writes:
>
> > [...]
>
> Are transfers controlled from the real-time stage, and if so, how do you
> deal with cache maintenance between transfers?

That is my next problem to fix. As long as I run the test program in the debugger, displaying the buffer filled by the DMA in GDB, everything is fine. When GDB gets out of the way, I seem to read data that got into the D-cache before the DMA did the transfer.
I tried adding a flush_dcache_range before triggering the DMA, but it did not help.

Any suggestions?

Thanks

François

>
> --
> Philippe.




* Re: Doing DMA from peripheral to userland memory
  2021-09-01  8:24               ` François Legal
@ 2021-09-02 16:41                 ` François Legal
  2021-09-02 17:12                   ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread
From: François Legal @ 2021-09-02 16:41 UTC (permalink / raw)
  To: François Legal; +Cc: Philippe Gerum, xenomai

On Wednesday, September 1, 2021 at 10:24 CEST, François Legal via Xenomai <xenomai@xenomai.org> wrote:

> > [...]
> >
> > Are transfers controlled from the real-time stage, and if so, how do you
> > deal with cache maintenance between transfers?
>
> That is my next problem to fix. As long as I run the test program in the debugger, displaying the buffer filled by the DMA in GDB, everything is fine. When GDB gets out of the way, I seem to read data that got into the D-cache before the DMA did the transfer.
> I tried adding a flush_dcache_range before triggering the DMA, but it did not help.
>
> Any suggestions?
>
> Thanks
>
> François
>

So I dug deep into the kernel cache management code for my (ARMv7) arch, but could not find an answer or a solution.
I now wonder whether this (DMA to userland memory) is possible on this arch at all, because of what is suggested in [1], even if that's a bit old.

I saw that flush_dcache_range on ARMv7 is pretty much a no-op, so I tried dmac_flush_range (which does the real thing with CP15), passing either the userland virtual address directly or first getting a kernel mapping with kmap_atomic, but that did not change anything. Most of the time, I still get the first two cache lines of data in the userland application wrong after the DMA transfer is done.

I'm not sure where to look next.

François

> >
> > --
> > Philippe.
>
>

 [1] https://groups.google.com/g/linux.kernel/c/QONWGX6WJaE




* Re: Doing DMA from peripheral to userland memory
  2021-09-02 16:41                 ` François Legal
@ 2021-09-02 17:12                   ` Philippe Gerum
  0 siblings, 0 replies; 12+ messages in thread
From: Philippe Gerum @ 2021-09-02 17:12 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai


François Legal <devel@thom.fr.eu.org> writes:

> [...]
>
> So I dug deep into the kernel cache management code for my (ARMv7) arch, but could not find an answer or a solution.
> I now wonder whether this (DMA to userland memory) is possible on this arch at all, because of what is suggested in [1], even if that's a bit old.
>
> I saw that flush_dcache_range on ARMv7 is pretty much a no-op, so I tried dmac_flush_range (which does the real thing with CP15), passing either the userland virtual address directly or first getting a kernel mapping with kmap_atomic, but that did not change anything. Most of the time, I still get the first two cache lines of data in the userland application wrong after the DMA transfer is done.
>
> I'm not sure where to look next.
>

DMA to userland memory is a non-issue in the regular in-band
context. The problem starts with cache maintenance when you want to run
these I/O requests from the oob stage, hence my previous question.

The rule of thumb is that a driver should not fiddle with the innards of
cache maintenance directly, and certainly not with flush_dcache_range()
and friends. This includes Xenomai drivers. The DMA API hides these
details in a portable way; typically, the streaming DMA API will clean
and/or invalidate the cache layers when mapping and unmapping buffers.
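
For reference, the regular in-band pattern the streaming API supports
looks like this (a sketch; dev, buf and len are placeholders):

#include <linux/dma-mapping.h>

static int receive_into(struct device *dev, void *buf, size_t len)
{
	/* Mapping performs whatever CPU cache maintenance the arch needs. */
	dma_addr_t bus = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

	if (dma_mapping_error(dev, bus))
		return -EIO;

	/* ... program the DMA controller with 'bus', wait for completion ... */

	/* Unmapping hands the buffer back to the CPU; read it only after this. */
	dma_unmap_single(dev, bus, len, DMA_FROM_DEVICE);
	return 0;
}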

Problem: we may not use the regular DMA API from oob context.  For
instance, if some IOMMU is involved, or bounce buffers of some sort
exist, or complex cache management layers in the kernel are traversed in
general (e.g. some outer L2 caches are ugly), then things might get
pretty nasty if this rule is not followed. For this reason, if using
coherent memory is practical performance-wise for the use case, then
this is a sane option for oob I/O, and you can do that as illustrated by
the example I referred to.

In this case, the kernel should allocate a suitable chunk of coherent
memory for your application to perform I/O with, rather than your
application requesting that common cached memory from its address space
be pinned and used for DMA.
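
A sketch of that driver-side arrangement (dma_dev, my_mmap and the
file_operations wiring are assumed placeholders):

#include <linux/dma-mapping.h>
#include <linux/fs.h>
#include <linux/mm.h>

static struct device *dma_dev;	/* the DMA controller's device */
static void *cpu_addr;
static dma_addr_t dma_handle;
static size_t buf_len;

static int alloc_io_buffer(size_t len)
{
	/* Coherent memory: no per-transfer cache maintenance needed. */
	cpu_addr = dma_alloc_coherent(dma_dev, len, &dma_handle, GFP_KERNEL);
	if (!cpu_addr)
		return -ENOMEM;
	buf_len = len;
	return 0;
}

/* The application mmap()s this buffer instead of handing over its own
 * cached anonymous memory. */
static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
	return dma_mmap_coherent(dma_dev, vma, cpu_addr, dma_handle, buf_len);
}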

-- 
Philippe.


