* Doing DMA from peripheral to userland memory
@ 2021-08-27  9:29 François Legal
  2021-08-27 13:01 ` Jan Kiszka
  2021-08-27 13:01 ` Philippe Gerum
  0 siblings, 2 replies; 12+ messages in thread

From: François Legal @ 2021-08-27  9:29 UTC (permalink / raw)
To: xenomai

Hello,

working on a Zynq-7000 target (ARM Cortex-A9), we have a peripheral that generates loads of data (many kilobytes per millisecond).

We would like to move that data directly from the peripheral memory (the OCM of the SoC) to our RT application's user memory using DMA.

For one part of the data, we would like the DMA to de-interlace the data while moving it. We figured out that the PL330 peripheral on the SoC should be able to do this; however, we would like, as much as possible, to retain the use of one or two channels of the PL330 for plain non-RT Linux use (via dmaengine).

My first attempt would be to enhance the dmaengine API with an RT API, then implement the RT API calls in the PL330 driver.

What do you think of this approach, and is it achievable at all (DMA directly to userland memory, and/or having some DMA channels used by Xenomai and others by Linux)?

Thanks in advance

François

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Doing DMA from peripheral to userland memory
  2021-08-27  9:29 Doing DMA from peripheral to userland memory François Legal
@ 2021-08-27 13:01 ` Jan Kiszka
  2021-08-27 13:01 ` Philippe Gerum
  1 sibling, 0 replies; 12+ messages in thread

From: Jan Kiszka @ 2021-08-27 13:01 UTC (permalink / raw)
To: François Legal, xenomai

On 27.08.21 11:29, François Legal via Xenomai wrote:
> Hello,
>
> working on a zynq7000 target (arm cortex a9), we have a peripheral that generates loads of data (many kbytes per ms).
>
> We would like to move that data, directly from the peripheral memory (the OCM of the SoC) directly to our RT application user memory using DMA.
>
> [...]
>
> What do you think of this approach, and is it achievable at all (DMA directly to user land memory and/or having DMA channels exploited by xenomai and other by linux) ?
>

IIRC, the topic of DMA also popped up in the context of SPI support in the past. In any case, you likely want to check how Dovetail and EVL / Xenomai 4 are possibly already addressing this, to get an idea of potential solution patterns.

Jan

--
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux
* Re: Doing DMA from peripheral to userland memory
  2021-08-27  9:29 Doing DMA from peripheral to userland memory François Legal
  2021-08-27 13:01 ` Jan Kiszka
@ 2021-08-27 13:01 ` Philippe Gerum
  2021-08-27 13:44   ` François Legal
  1 sibling, 1 reply; 12+ messages in thread

From: Philippe Gerum @ 2021-08-27 13:01 UTC (permalink / raw)
To: François Legal; +Cc: xenomai

François Legal via Xenomai <xenomai@xenomai.org> writes:

> Hello,
>
> working on a zynq7000 target (arm cortex a9), we have a peripheral that generates loads of data (many kbytes per ms).
>
> [...]
>
> What do you think of this approach, and is it achievable at all (DMA directly to user land memory and/or having DMA channels exploited by xenomai and other by linux) ?

As a starting point, you may want to have a look at this document:
https://evlproject.org/core/oob-drivers/dma/

This is part of the EVL core documentation, but this is actually a
Dovetail feature.

--
Philippe.
* Re: Doing DMA from peripheral to userland memory
  2021-08-27 13:01 ` Philippe Gerum
@ 2021-08-27 13:44   ` François Legal
  2021-08-27 13:54     ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread

From: François Legal @ 2021-08-27 13:44 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai

On Friday, August 27, 2021 15:01 CEST, Philippe Gerum <rpm@xenomai.org> wrote:

> François Legal via Xenomai <xenomai@xenomai.org> writes:
>
> > [...]
>
> As a starting point, you may want to have a look at this document:
> https://evlproject.org/core/oob-drivers/dma/
>
> This is part of the EVL core documentation, but this is actually a
> Dovetail feature.

Well, that's quite what I want to do, so it is very good news that this will already be available in the future. However, I need it through the I-pipe right now, but I guess the process stays the same (patching the dmaengine API and the DMA engine driver).

I would guess the modifications to the DMA engine driver would then be easily ported to Dovetail?

François

> --
> Philippe.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Doing DMA from peripheral to userland memory
  2021-08-27 13:44 ` François Legal
@ 2021-08-27 13:54   ` Philippe Gerum
  2021-08-27 14:09     ` François Legal
  0 siblings, 1 reply; 12+ messages in thread

From: Philippe Gerum @ 2021-08-27 13:54 UTC (permalink / raw)
To: François Legal; +Cc: xenomai

François Legal <devel@thom.fr.eu.org> writes:

> [...]
>
> Well, that's quite what I want to do, so this is very good news that it is already available in the future. However, I need it through the ipipe right now, but I guess the process stays the same (through patching the dmaengine API and the DMA engine driver).
>
> I would guess the modifications to the DMA engine driver would be then easily ported to dovetail ?

Since they should follow the same pattern used for the controllers Dovetail currently supports, I think so. You should be able to simplify the code when porting it to Dovetail, actually.

--
Philippe.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Doing DMA from peripheral to userland memory
  2021-08-27 13:54 ` Philippe Gerum
@ 2021-08-27 14:09   ` François Legal
  2021-08-27 14:36     ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread

From: François Legal @ 2021-08-27 14:09 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai

On Friday, August 27, 2021 15:54 CEST, Philippe Gerum <rpm@xenomai.org> wrote:

> [...]
>
> Since they should follow the same pattern used for the controllers
> Dovetail currently supports, I think so. You should be able to simplify
> the code when porting it to Dovetail actually.

That's what I thought. Thanks a lot.

So now, regarding the "to userland memory" aspect: I guess that to make this happen, I will somehow have to change the PTE flags to make these pages non-cacheable (using dma_map_page maybe), but I wonder if I have to map the userland pages into kernel space, and whether or not I have to pin the userland pages in memory (I believe mlockall in the userland process does that already)?

François

> --
> Philippe.
* Re: Doing DMA from peripheral to userland memory
  2021-08-27 14:09 ` François Legal
@ 2021-08-27 14:36   ` Philippe Gerum
  2021-08-31  9:36     ` François Legal
  0 siblings, 1 reply; 12+ messages in thread

From: Philippe Gerum @ 2021-08-27 14:36 UTC (permalink / raw)
To: François Legal; +Cc: xenomai

François Legal <devel@thom.fr.eu.org> writes:

> [...]
>
> So now, regarding the "to userland memory" aspect. I guess I will somehow have to, in order to make this happen, change the PTE flags to make these pages non cacheable (using dma_map_page maybe), but I wonder if I have to map the userland pages to kernel space and whether or not I have to pin the userland pages in memory (I believe mlockall in the userland process does that already) ?

The out-of-band SPI support available from EVL illustrates a possible implementation. This code [2] implements what is described in this page [1].

[1] https://evlproject.org/core/oob-drivers/spi/
[2] https://source.denx.de/Xenomai/xenomai4/linux-evl/-/blob/0969ccef9a5318244e484e847dab52999f6fec5c/drivers/spi/spi.c#L4259

--
Philippe.
* Re: Doing DMA from peripheral to userland memory
  2021-08-27 14:36 ` Philippe Gerum
@ 2021-08-31  9:36   ` François Legal
  2021-08-31 17:37     ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread

From: François Legal @ 2021-08-31  9:36 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai

On Friday, August 27, 2021 16:36 CEST, Philippe Gerum <rpm@xenomai.org> wrote:

> [...]
>
> The out-of-band SPI support available from EVL illustrates a possible
> implementation. This code [2] implements what is described in this page
> [1].

Thanks for the example. I think what I'm trying to do is a little different from this, however.

For the record, this is what I do (and it seems to be working):
- as soon as the userland buffers are allocated, tell the driver to pin the userland buffer pages in memory (with get_user_pages_fast). I'm not sure this is required, as I think mlockall in the app would already take care of that.
- whenever I need to transfer data to the userland buffer, instruct the driver to DMA-map those userland pages (with dma_map_page), then give the DMA controller the physical address of these pages.
Et voilà.

This seems to work correctly and repeatedly so far.

François

> [1] https://evlproject.org/core/oob-drivers/spi/
> [2] https://source.denx.de/Xenomai/xenomai4/linux-evl/-/blob/0969ccef9a5318244e484e847dab52999f6fec5c/drivers/spi/spi.c#L4259
>
> --
> Philippe.
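[Editor's note: the two steps described in this message can be sketched roughly as below. This is a hedged illustration under stated assumptions, not the poster's actual code: the function and structure names (`my_pin_user_buf`, `struct pinned_buf`) are hypothetical, error unwinding is omitted, and the `get_user_pages_fast()` signature shown is the one taking GUP flags, which varies across kernel versions.]

```c
/* Sketch only: pin a userland buffer and map its pages for device DMA.
 * Kernel context; all names below are illustrative. */
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/dma-mapping.h>

struct pinned_buf {
	struct page **pages;
	dma_addr_t *dma_addrs;
	int nr_pages;
};

static int my_pin_user_buf(struct device *dev, unsigned long uaddr,
			   size_t len, struct pinned_buf *pb)
{
	int n = DIV_ROUND_UP(len + (uaddr & ~PAGE_MASK), PAGE_SIZE);
	int i, got;

	pb->pages = kcalloc(n, sizeof(*pb->pages), GFP_KERNEL);
	pb->dma_addrs = kcalloc(n, sizeof(*pb->dma_addrs), GFP_KERNEL);
	if (!pb->pages || !pb->dma_addrs)
		return -ENOMEM;

	/* Step 1: pin the pages so they cannot be swapped out or moved
	 * while the DMA controller writes to them. */
	got = get_user_pages_fast(uaddr & PAGE_MASK, n, FOLL_WRITE, pb->pages);
	if (got != n)
		return -EFAULT;
	pb->nr_pages = n;

	/* Step 2: map each page for the device; on a non-coherent ARM
	 * this also performs the initial cache maintenance implied by
	 * the DMA_FROM_DEVICE direction. */
	for (i = 0; i < n; i++) {
		pb->dma_addrs[i] = dma_map_page(dev, pb->pages[i], 0,
						PAGE_SIZE, DMA_FROM_DEVICE);
		if (dma_mapping_error(dev, pb->dma_addrs[i]))
			return -EIO;
	}
	return 0; /* pb->dma_addrs[] can now be handed to the controller */
}
```

On teardown, each page would need `dma_unmap_page()` followed by `put_page()` (or `unpin_user_page()` on recent kernels), in that order.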
* Re: Doing DMA from peripheral to userland memory
  2021-08-31  9:36 ` François Legal
@ 2021-08-31 17:37   ` Philippe Gerum
  2021-09-01  8:24     ` François Legal
  0 siblings, 1 reply; 12+ messages in thread

From: Philippe Gerum @ 2021-08-31 17:37 UTC (permalink / raw)
To: François Legal; +Cc: xenomai

François Legal <devel@thom.fr.eu.org> writes:

> [...]
>
> Thanks for the example. I think what I'm trying to do is a little different from this however.
> For the records, this is what I do (and that seems to be working) :
> - as soon as user land buffers are allocated, tell the driver to pin the user land buffer pages in memory (with get_user_pages_fast). I'm not sure if this is required, as I think mlockall in the app would already take care of that.
> - whenever I need to transfer data to the user land buffer, instruct the driver to dma remap those user land pages (with dma_map_page), then instruct the DMA controller of the physical address of these pages.
> et voilà
>
> This seem to work correctly and repeatedly so far.

Are transfers controlled from the real-time stage, and if so, how do you deal with cache maintenance between transfers?

--
Philippe.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Doing DMA from peripheral to userland memory
  2021-08-31 17:37 ` Philippe Gerum
@ 2021-09-01  8:24   ` François Legal
  2021-09-02 16:41     ` François Legal
  0 siblings, 1 reply; 12+ messages in thread

From: François Legal @ 2021-09-01  8:24 UTC (permalink / raw)
To: Philippe Gerum; +Cc: xenomai

On Tuesday, August 31, 2021 19:37 CEST, Philippe Gerum <rpm@xenomai.org> wrote:

> [...]
>
> Are transfers controlled from the real-time stage, and if so, how do you
> deal with cache maintenance between transfers?

That is my next problem to fix. It seems that as long as I run the test program in the debugger, displaying the buffer filled by the DMA in GDB, everything is fine. When GDB gets out of the way, I seem to read data that got into the D-cache before the DMA did the transfer.

I tried adding a flush_dcache_range before triggering the DMA, but it did not help.

Any suggestion?

Thanks

François

> --
> Philippe.
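[Editor's note: the symptom described here (stale cache lines reappearing once the debugger is out of the way) is what the streaming DMA API's ownership protocol is meant to prevent: the CPU cache must be cleaned before the device writes, and invalidated again before the CPU reads, because lines can be speculatively refilled during the transfer. A sketch of that protocol, with hypothetical helper names (`start_transfer`, `wait_transfer`) standing in for the actual PL330 channel programming:]

```c
/* Sketch: per-transfer cache maintenance with dma_sync_single_*,
 * assuming the page was mapped once with dma_map_page(..., DMA_FROM_DEVICE).
 * start_transfer()/wait_transfer() are placeholders for real driver code. */
#include <linux/dma-mapping.h>

extern void start_transfer(dma_addr_t addr, size_t len); /* hypothetical */
extern void wait_transfer(void);                         /* hypothetical */

static void do_one_transfer(struct device *dev, dma_addr_t dma_addr,
			    size_t len)
{
	/* Give the buffer to the device: cleans/invalidates the CPU
	 * cache lines covering it for the DMA_FROM_DEVICE direction. */
	dma_sync_single_for_device(dev, dma_addr, len, DMA_FROM_DEVICE);

	start_transfer(dma_addr, len);	/* program the DMA channel */
	wait_transfer();		/* completion IRQ or polling */

	/* Give the buffer back to the CPU: invalidates the cache lines
	 * so subsequent CPU reads fetch the DMA-written data from RAM
	 * instead of lines speculatively loaded during the transfer. */
	dma_sync_single_for_cpu(dev, dma_addr, len, DMA_FROM_DEVICE);
}
```

Note the invalidation *after* the transfer; flushing only before triggering the DMA, as described above, leaves any lines refilled mid-transfer stale.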
* Re: Doing DMA from peripheral to userland memory
  2021-09-01  8:24             ` François Legal
@ 2021-09-02 16:41               ` François Legal
  2021-09-02 17:12                 ` Philippe Gerum
  0 siblings, 1 reply; 12+ messages in thread
From: François Legal @ 2021-09-02 16:41 UTC (permalink / raw)
To: François Legal; +Cc: Philippe Gerum, xenomai

On Wednesday, September 01, 2021 10:24 CEST, François Legal via Xenomai <xenomai@xenomai.org> wrote:

> On Tuesday, August 31, 2021 19:37 CEST, Philippe Gerum <rpm@xenomai.org> wrote:
>
> > François Legal <devel@thom.fr.eu.org> writes:
> >
> > > On Friday, August 27, 2021 16:36 CEST, Philippe Gerum <rpm@xenomai.org> wrote:
> > >
> > >> François Legal <devel@thom.fr.eu.org> writes:
> > >>
> > >> > On Friday, August 27, 2021 15:54 CEST, Philippe Gerum <rpm@xenomai.org> wrote:
> > >> >
> > >> >> François Legal <devel@thom.fr.eu.org> writes:
> > >> >>
> > >> >> > On Friday, August 27, 2021 15:01 CEST, Philippe Gerum <rpm@xenomai.org> wrote:
> > >> >> >
> > >> >> >> François Legal via Xenomai <xenomai@xenomai.org> writes:
> > >> >> >>
> > >> >> >> > [...]
> > >> >> >>
> > >> >> >> As a starting point, you may want to have a look at this document:
> > >> >> >> https://evlproject.org/core/oob-drivers/dma/
> > >> >> >>
> > >> >> >> This is part of the EVL core documentation, but this is actually a
> > >> >> >> Dovetail feature.
> > >> >> >
> > >> >> > Well, that's quite what I want to do, so it is very good news that it is already available in the future. However, I need it through the I-pipe right now, but I guess the process stays the same (patching the dmaengine API and the DMA engine driver).
> > >> >> >
> > >> >> > I would guess the modifications to the DMA engine driver would then be easily ported to Dovetail?
> > >> >>
> > >> >> Since they should follow the same pattern used for the controllers
> > >> >> Dovetail currently supports, I think so. You should actually be able
> > >> >> to simplify the code when porting it to Dovetail.
> > >> >
> > >> > That's what I thought. Thanks a lot.
> > >> >
> > >> > So now, regarding the "to userland memory" aspect: I guess I will somehow have to change the PTE flags to make these pages non-cacheable (using dma_map_page maybe), but I wonder whether I have to map the userland pages into kernel space, and whether or not I have to pin the userland pages in memory (I believe mlockall in the userland process does that already)?
> > >>
> > >> The out-of-band SPI support available from EVL illustrates a possible
> > >> implementation. This code [2] implements what is described in this
> > >> page [1].
> > >
> > > Thanks for the example. I think what I'm trying to do is a little different from this, however.
> > > For the record, this is what I do (and it seems to be working):
> > > - as soon as the userland buffers are allocated, tell the driver to pin the userland buffer pages in memory (with get_user_pages_fast). I'm not sure this is required, as I think mlockall in the app already takes care of that.
> > > - whenever I need to transfer data to a userland buffer, instruct the driver to DMA-map those userland pages (with dma_map_page), then give the DMA controller the physical addresses of these pages.
> > > et voilà
> > >
> > > This seems to work correctly and repeatedly so far.
> >
> > Are transfers controlled from the real-time stage, and if so, how do you
> > deal with cache maintenance between transfers?
>
> That is my next problem to fix. As long as I run the test program in the debugger, displaying the buffer filled by the DMA in GDB, everything is fine. When GDB gets out of the way, I seem to read data that got into the D-cache before the DMA did the transfer.
> I tried adding a flush_dcache_range before triggering the DMA, but it did not help.
>
> Any suggestion?
>
> Thanks
>
> François

So I dug deep into the kernel cache management code for my (ARMv7) arch, but could not find an answer nor a solution.
I now wonder whether this (DMA to userland memory) is possible on this arch at all, because of what is suggested in [1], even if that discussion is a bit old.

I saw that flush_dcache_range on ARMv7 is pretty much a no-op. I tried dmac_flush_range instead (which does the real thing through CP15), passing either the userland virtual address directly, or first getting a kernel mapping with kmap_atomic, but that did not change anything: I still, most of the time, get the first two cache lines of data wrong in the userland application after the DMA transfer is done.

I'm not sure where to look next.

François

[1] https://groups.google.com/g/linux.kernel/c/QONWGX6WJaE

^ permalink raw reply	[flat|nested] 12+ messages in thread
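To make the pin-and-map flow above concrete, here is a rough kernel-side sketch of the approach described (get_user_pages_fast followed by dma_map_page). This is an illustration only, not code from the thread: the function and variable names are invented, error unwinding is abbreviated, and the exact GUP entry points vary across kernel versions (newer kernels prefer pin_user_pages_fast()).

```c
/* Sketch only: pin a userland buffer and DMA-map its pages, as described
 * in the message above. Names (my_pin_and_map) are illustrative. */
#include <linux/mm.h>
#include <linux/dma-mapping.h>

static int my_pin_and_map(struct device *dev, unsigned long uaddr,
			  size_t len, struct page **pages,
			  dma_addr_t *dma_addrs)
{
	int npages = DIV_ROUND_UP(len + offset_in_page(uaddr), PAGE_SIZE);
	int i, got;

	/* Pin the user pages so they cannot be swapped out or migrated
	 * while the device writes into them. */
	got = get_user_pages_fast(uaddr & PAGE_MASK, npages,
				  FOLL_WRITE, pages);
	if (got < npages)
		return -EFAULT;	/* (a real driver must unpin 'got' pages here) */

	/* Map each page for DMA; with the streaming API, this is also what
	 * performs the required cache maintenance for the transfer. */
	for (i = 0; i < npages; i++) {
		dma_addrs[i] = dma_map_page(dev, pages[i], 0, PAGE_SIZE,
					    DMA_FROM_DEVICE);
		if (dma_mapping_error(dev, dma_addrs[i]))
			return -EIO;
	}
	return npages;
}
```

After each transfer, the mapping would be torn down with dma_unmap_page() per page, and the pages released with put_page() (or unpin_user_pages() on newer kernels) once the buffer is freed. Note that the streaming-API cache maintenance performed here is only guaranteed from the regular in-band context, which is exactly the limitation discussed in the rest of the thread.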
* Re: Doing DMA from peripheral to userland memory
  2021-09-02 16:41 ` François Legal
@ 2021-09-02 17:12   ` Philippe Gerum
  0 siblings, 0 replies; 12+ messages in thread
From: Philippe Gerum @ 2021-09-02 17:12 UTC (permalink / raw)
To: François Legal; +Cc: xenomai

François Legal <devel@thom.fr.eu.org> writes:

> [...]
>
> So I dug deep into the kernel cache management code for my (ARMv7) arch, but could not find an answer nor a solution.
> I now wonder whether this (DMA to userland memory) is possible on this arch at all, because of what is suggested in [1], even if that discussion is a bit old.
>
> I saw that flush_dcache_range on ARMv7 is pretty much a no-op. I tried dmac_flush_range instead (which does the real thing through CP15), passing either the userland virtual address directly, or first getting a kernel mapping with kmap_atomic, but that did not change anything: I still, most of the time, get the first two cache lines of data wrong in the userland application after the DMA transfer is done.
>
> I'm not sure where to look next.
DMA to userland memory is a non-issue in the regular in-band context. The problem starts with cache maintenance when you want to run these I/O requests from the oob stage, hence my previous question.

The rule of thumb is that a driver should not fiddle with the innards of cache maintenance directly, and certainly not with flush_dcache_range() and friends. This includes Xenomai drivers. The DMA API hides these details in a portable way: typically, the streaming DMA API cleans and/or invalidates the cache layers when mapping and unmapping buffers. Problem: we may not use the regular DMA API from oob context. For instance, if some IOMMU is involved, or bounce buffers of some sort exist, or complex cache management layers in the kernel are traversed in general (e.g. some outer L2 caches are ugly), then things might get pretty nasty if this rule is not followed.

For this reason, if using coherent memory is practical performance-wise for the use case, then it is a sane option for oob I/O, and you can do that as illustrated by the example I referred to. In this case, the kernel should allocate a suitable chunk of coherent memory for your application to perform I/O with, rather than your application requesting that common cached memory from its address space be pinned and used for DMA.

--
Philippe.

^ permalink raw reply	[flat|nested] 12+ messages in thread
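The coherent-memory pattern recommended above can be sketched as follows: the driver owns the buffer and exports it to the application through its mmap handler. This is an illustration only (invented names, no relation to the EVL example referred to in the thread):

```c
/* Sketch of the coherent-memory approach: the kernel allocates
 * DMA-coherent memory and hands it to userland via mmap(). */
#include <linux/dma-mapping.h>
#include <linux/mm.h>

struct my_dma_buf {
	void *cpu_addr;		/* kernel virtual address */
	dma_addr_t dma_addr;	/* bus address programmed into the DMA controller */
	size_t size;
};

static int my_alloc_coherent(struct device *dev, struct my_dma_buf *b,
			     size_t size)
{
	b->size = PAGE_ALIGN(size);
	/* Coherent (uncached or hardware-coherent) memory: no per-transfer
	 * cache maintenance is needed, which is what makes it usable for
	 * transfers kicked from the out-of-band stage. */
	b->cpu_addr = dma_alloc_coherent(dev, b->size, &b->dma_addr,
					 GFP_KERNEL);
	return b->cpu_addr ? 0 : -ENOMEM;
}

/* In the driver's mmap file operation, map the buffer into the
 * application's address space: */
static int my_mmap(struct device *dev, struct my_dma_buf *b,
		   struct vm_area_struct *vma)
{
	return dma_mmap_coherent(dev, vma, b->cpu_addr, b->dma_addr,
				 b->size);
}
```

The application then mmap()s the device file and reads the DMA output directly from the returned region, at the cost of uncached CPU accesses on platforms without hardware coherency, which is the performance trade-off mentioned above.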
end of thread, other threads:[~2021-09-02 17:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-27  9:29 Doing DMA from peripheral to userland memory François Legal
2021-08-27 13:01 ` Jan Kiszka
2021-08-27 13:01 ` Philippe Gerum
2021-08-27 13:44   ` François Legal
2021-08-27 13:54     ` Philippe Gerum
2021-08-27 14:09       ` François Legal
2021-08-27 14:36         ` Philippe Gerum
2021-08-31  9:36           ` François Legal
2021-08-31 17:37             ` Philippe Gerum
2021-09-01  8:24               ` François Legal
2021-09-02 16:41                 ` François Legal
2021-09-02 17:12                   ` Philippe Gerum