dmaengine Archive on
From: Peter Ujfalusi <>
To: Thomas Ruf <>, Vinod Koul <>
Cc: Federico Vaga <>,
	Dave Jiang <>,
	Dan Williams <>,
	<>, <>
Subject: Re: DMA Engine: Transfer From Userspace
Date: Tue, 30 Jun 2020 15:31:11 +0300
Message-ID: <> (raw)
In-Reply-To: <>

On 29/06/2020 18.18, Thomas Ruf wrote:
>> On 26 June 2020 at 12:29 Peter Ujfalusi <> wrote:
>> On 24/06/2020 16.58, Thomas Ruf wrote:
>>>> On 24 June 2020 at 14:07 Peter Ujfalusi <> wrote:
>>>> On 24/06/2020 12.38, Vinod Koul wrote:
>>>>> On 24-06-20, 11:30, Thomas Ruf wrote:
>>>>>> To make it short - i have two questions:
>>>>>> - what are the chances to revive DMA_SG?
>>>>> 100%, if we have a in-kernel user
>>>> Most DMAs can not handle differently provisioned sg_list for src and dst.
>>>> Even if they could handle non symmetric SG setup it requires entirely
>>>> different setup (two independent channels sending the data to each
>>>> other, one reads, the other writes?).
>>> Ok, i implemented that using zynqmp_dma on a Xilinx Zynq platform (obviously ;-) and it works nicely for us.
>> I see, if the HW does not support it then something along the lines of
>> what the atc_prep_dma_sg did can be implemented for most engines.
>> In essence: create a new set of sg_list which is symmetric.
> Sorry, not sure if i understand you right?
> You suggest that in case DMA_SG gets revived we should restrict the support to symmetric sg_lists?

No, not at all. That would not make much sense.

> Just had a glance at the deleted code and the *_prep_dma_sg of these drivers had code to support asymmetric lists and by that "unaligned" memory (relative to page start):
> at_hdmac.c         
> dmaengine.c        
> dmatest.c          
> fsldma.c           
> mv_xor.c           
> nbpfaxi.c          
> ste_dma40.c        
> xgene-dma.c        
> xilinx/zynqmp_dma.c
> Why not just revive that and keep this nice functionality? ;-)

What I'm saying is that the drivers (at least at_hdmac) in essence
create aligned sg_lists out of the received non-aligned ones.
They do this w/o actually creating the sg_list itself, but that's just a
small detail.

In the longer run what might make sense is to have a helper function to
convert two non-symmetric sg_lists into two symmetric ones, so drivers
will not have to re-implement the same code and will only need to
care about symmetric sg_lists.
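
A minimal user-space sketch of what such a helper could do, assuming a
simplified segment type instead of the kernel's struct scatterlist. The
names struct seg and sg_make_symmetric() are made up for illustration
and are not existing dmaengine API. It walks both lists in lockstep and
emits chunks whose length is the smaller of what is left in the current
src and dst entries:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one sg entry: bus address plus length. */
struct seg {
	uint64_t addr;
	size_t len;
};

/*
 * Split src[] and dst[] into two output lists with the same number of
 * entries and pairwise equal lengths.
 * Returns the number of chunks written, or -1 if the output arrays are
 * too small or the two lists do not describe the same total length.
 */
static int sg_make_symmetric(const struct seg *src, size_t nsrc,
			     const struct seg *dst, size_t ndst,
			     struct seg *out_src, struct seg *out_dst,
			     size_t max_out)
{
	size_t si = 0, di = 0, soff = 0, doff = 0, n = 0;

	for (;;) {
		/* Step over entries that are fully consumed (or empty). */
		while (si < nsrc && soff == src[si].len) {
			si++;
			soff = 0;
		}
		while (di < ndst && doff == dst[di].len) {
			di++;
			doff = 0;
		}
		if (si == nsrc || di == ndst)
			break;

		size_t sleft = src[si].len - soff;
		size_t dleft = dst[di].len - doff;
		size_t len = sleft < dleft ? sleft : dleft;

		if (n == max_out)
			return -1;

		out_src[n].addr = src[si].addr + soff;
		out_src[n].len = len;
		out_dst[n].addr = dst[di].addr + doff;
		out_dst[n].len = len;
		n++;

		soff += len;
		doff += len;
	}

	/* Data left on only one side means the totals differ. */
	return (si == nsrc && di == ndst) ? (int)n : -1;
}
```

For a src list of 100 + 50 bytes and a dst list of 60 + 90 bytes this
produces three chunk pairs of 60, 40 and 50 bytes. A real helper would
also have to respect the engine's maximum segment length and alignment
constraints, which this sketch ignores.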

Note, some DMAs can actually handle non symmetric src and dst lists, but
I believe it is rare.

>> What might be plausible is to introduce hw offloading support for memcpy
>> type of operations in a similar fashion how for example crypto does it?
> Sounds good to me, my proxy driver implementation could be a good start for that, too!

It needs to find its place as well... I'm not sure where that would be.
Simple block-copy offload, sg copy offload, interleaved (frame
extraction) offload and dmabuf copy offload come to mind as candidates.

>> The issue with a user space implemented logic is that it is not portable
>> between systems with different DMAs. It might be that on one DMA the
>> setup takes longer than do a CPU copy of X bytes, on the other DMA it
>> might be significantly less or higher.
> Fully agree with that!
> I was also unsure how my approach will perform but in our case the latency was increased by ~20%, cpu load roughly stayed the same, of course this was the benchmark from user memory to user memory.
> From uncached to user memory the DMA was around 15 times faster.

It depends on the size of the transfer. Lots of small individual
transfers might be worse via DMA due to the setup time, completion
handling, etc.
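
To illustrate the size dependency, here is a rough cost model, assuming
a fixed per-transfer DMA overhead (setup plus completion handling) and
constant per-byte costs for both paths. struct copy_cost and the helper
names are hypothetical, and real numbers would have to be measured per
platform:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-platform cost parameters, in picoseconds. */
struct copy_cost {
	uint64_t dma_setup_ps;    /* fixed cost: setup + completion handling */
	uint64_t dma_ps_per_byte; /* DMA cost per byte transferred */
	uint64_t cpu_ps_per_byte; /* CPU copy cost per byte */
};

/*
 * Smallest length for which the model estimates DMA to be strictly
 * faster than a CPU copy, i.e. the smallest n with
 *   dma_setup_ps + n * dma_ps_per_byte < n * cpu_ps_per_byte.
 * Returns SIZE_MAX if DMA never wins under this model.
 */
static size_t dma_break_even(const struct copy_cost *c)
{
	if (c->cpu_ps_per_byte <= c->dma_ps_per_byte)
		return SIZE_MAX;

	uint64_t diff = c->cpu_ps_per_byte - c->dma_ps_per_byte;

	return (size_t)(c->dma_setup_ps / diff) + 1;
}

/* Decide per transfer; the caller never sees the threshold itself. */
static bool use_dma(const struct copy_cost *c, size_t len)
{
	size_t min_len = dma_break_even(c);

	return min_len != SIZE_MAX && len >= min_len;
}
```

With made-up values of 1 us setup, 100 ps/byte for DMA and 600 ps/byte
for the CPU, the break-even length is 2001 bytes. Anything shorter is
cheaper to copy with the CPU in this model.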

>> Using CPU vs DMA for a copy in certain lengths and setups should not be
>> a concern of the user space.
> Also fully agree with that!

There is one big issue with the fallback to CPU copy... If you used
DMA then you might need to do cache operations to get things in their
right place.
If you have done it with the CPU then you most likely do not need to care
about it.
Handling this should be done at a level where we are aware of which path is
taken.
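
A user-space sketch of that idea, assuming the copy routine itself
picks the path and therefore knows when cache maintenance is needed.
Everything here is hypothetical: cache_clean() and cache_invalidate()
are stubs standing in for what a driver would do with calls such as
dma_sync_single_for_device() and dma_sync_single_for_cpu(), and the
"DMA" is simulated with memcpy():

```c
#include <stddef.h>
#include <string.h>

/* Counters so the example can show which steps were executed. */
struct offload_stats {
	unsigned int cache_cleans;
	unsigned int cache_invalidates;
	unsigned int dma_copies;
	unsigned int cpu_copies;
};

/* Stub: write back CPU cache lines so the device sees current data. */
static void cache_clean(struct offload_stats *st, const void *p, size_t n)
{
	(void)p;
	(void)n;
	st->cache_cleans++;
}

/* Stub: drop stale CPU cache lines after the device wrote the buffer. */
static void cache_invalidate(struct offload_stats *st, void *p, size_t n)
{
	(void)p;
	(void)n;
	st->cache_invalidates++;
}

/* Stub: simulated DMA transfer. */
static void dma_copy_sim(struct offload_stats *st, void *dst,
			 const void *src, size_t len)
{
	memcpy(dst, src, len);
	st->dma_copies++;
}

/*
 * Copy len bytes; transfers shorter than dma_min_len use the CPU.
 * Cache maintenance is done only on the DMA path, because this is the
 * one place that knows which path was taken.
 */
static void mem2mem_copy(struct offload_stats *st, void *dst,
			 const void *src, size_t len, size_t dma_min_len)
{
	if (len < dma_min_len) {
		memcpy(dst, src, len);
		st->cpu_copies++;
		return;
	}

	cache_clean(st, src, len);
	dma_copy_sim(st, dst, src, len);
	cache_invalidate(st, dst, len);
}
```

The caller just asks for a copy and gets coherent data back either way.
Whether the threshold is 128 or 512 bytes stays an internal detail.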

>> Yes, you have a closed system with controlled parameters, but a generic
>> mem2mem_offload framework should be usable on other setups and the same
>> binary should be working on different DMAs where one is not efficient
>> for <512 bytes, the other shows benefits under 128bytes.
> Usable: of course
> "Faster": not necessarily as long as it is an option
> Thanks for your valuable input and suggestions!
> best regards,
> Thomas

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
