* dmaengine/Query: What about scatter/gather for mem to mem transfers.
@ 2011-12-08 4:10 Viresh Kumar
2011-12-08 7:37 ` Vinod Koul
0 siblings, 1 reply; 12+ messages in thread
From: Viresh Kumar @ 2011-12-08 4:10 UTC (permalink / raw)
To: Koul, Vinod, Dan Williams
Cc: linux-kernel, Shiraz HASHIM, Armando VISCONTI, Pratyush ANAND,
deepak sikri, Vipin KUMAR, Vipul Kumar SAMAR, Vincenzo FRASCINO,
Mirko GARDI, Rajeev KUMAR, Amit VIRDI, Bhupesh SHARMA
Hi Dan/Vinod
I am looking to implement a scatter/gather interface for mem to mem transfers.
But before that, I wanted to get your feedback on it.
In my case, I have memory scattered in pages, and performance is poor if
I submit transfers page by page.
If you are okay with the idea, I can implement it and submit it soon.
--
viresh
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: dmaengine/Query: What about scatter/gather for mem to mem transfers.
2011-12-08 4:10 dmaengine/Query: What about scatter/gather for mem to mem transfers Viresh Kumar
@ 2011-12-08 7:37 ` Vinod Koul
2011-12-08 7:50 ` Viresh Kumar
0 siblings, 1 reply; 12+ messages in thread
From: Vinod Koul @ 2011-12-08 7:37 UTC (permalink / raw)
To: Viresh Kumar
Cc: Dan Williams, linux-kernel, Shiraz HASHIM, Armando VISCONTI,
Pratyush ANAND, deepak sikri, Vipin KUMAR, Vipul Kumar SAMAR,
Vincenzo FRASCINO, Mirko GARDI, Rajeev KUMAR, Amit VIRDI,
Bhupesh SHARMA
On Thu, 2011-12-08 at 09:40 +0530, Viresh Kumar wrote:
> Hi Dan/Vinod
>
> I am looking to implement a scatter/gather interface for mem to mem transfers.
> But before that, I wanted to get your feedback on it.
>
> In my case, I have memory scattered in pages, and performance is poor if
> I submit transfers page by page.
>
> If you are okay with the idea, I can implement it and submit it soon.
>
You mean something like:
struct dma_async_tx_descriptor *(*device_prep_dma_sg)(
struct dma_chan *chan,
struct scatterlist *dst_sg, unsigned int dst_nents,
struct scatterlist *src_sg, unsigned int src_nents,
unsigned long flags);
It's already there; you just need to implement it in your driver :)
--
~Vinod
* Re: dmaengine/Query: What about scatter/gather for mem to mem transfers.
2011-12-08 7:37 ` Vinod Koul
@ 2011-12-08 7:50 ` Viresh Kumar
2011-12-14 6:47 ` Pratyush Anand
0 siblings, 1 reply; 12+ messages in thread
From: Viresh Kumar @ 2011-12-08 7:50 UTC (permalink / raw)
To: Vinod Koul
Cc: Dan Williams, linux-kernel, Shiraz HASHIM, Armando VISCONTI,
Pratyush ANAND, Deepak SIKRI, Vipin KUMAR, Vipul Kumar SAMAR,
Vincenzo FRASCINO, Mirko GARDI, Rajeev KUMAR, Amit VIRDI,
Bhupesh SHARMA
On 12/8/2011 1:07 PM, Vinod Koul wrote:
> You mean something like:
> struct dma_async_tx_descriptor *(*device_prep_dma_sg)(
> struct dma_chan *chan,
> struct scatterlist *dst_sg, unsigned int dst_nents,
> struct scatterlist *src_sg, unsigned int src_nents,
> unsigned long flags);
>
> It's already there; you just need to implement it in your driver :)
Ok. I didn't know that. :(
--
viresh
* Re: dmaengine/Query: What about scatter/gather for mem to mem transfers.
2011-12-08 7:50 ` Viresh Kumar
@ 2011-12-14 6:47 ` Pratyush Anand
2011-12-15 4:58 ` Pratyush Anand
0 siblings, 1 reply; 12+ messages in thread
From: Pratyush Anand @ 2011-12-14 6:47 UTC (permalink / raw)
To: Viresh KUMAR
Cc: Vinod Koul, Dan Williams, linux-kernel, Shiraz HASHIM,
Armando VISCONTI, Deepak SIKRI, Vipin KUMAR, Vipul Kumar SAMAR,
Vincenzo FRASCINO, Mirko GARDI, Rajeev KUMAR, Amit VIRDI,
Bhupesh SHARMA, linus.walleij
Hello Vinod/Dan,
Following up on the scatter/gather requirement:
We might need a generic transfer where the source and destination
addresses may overlap, and where the gap between two chunks of the source
might not be the same as the gap between two chunks of the destination.
For example,
Transfer size is 0x4000.
Our src is something like this:
0x1000 -- 0x2000
0x3000 -- 0x5000
0x6000 -- 0x7000
and dst is something like this:
0x6000 -- 0x8000
0x9000 -- 0xB000
It seems that device_prep_interleaved_dma would not be able to handle
such a transfer.
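When a source range and a destination range overlap, the copy direction matters, just as it does for memmove: if the destination starts inside the source, copying with incrementing addresses overwrites source bytes before they are read, so the copy has to run with decrementing addresses from the end. A minimal userspace sketch of that choice for one contiguous range follows; `copy_overlapping` is a hypothetical helper, not a dmaengine API, and offsets into one buffer stand in for bus addresses.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy len bytes from offset src to offset dst inside
 * one buffer, picking the direction so overlapping ranges are copied
 * correctly (the rule memmove follows).
 */
static void copy_overlapping(unsigned char *buf, size_t src, size_t dst,
                             size_t len)
{
    assert(buf != NULL);

    if (dst <= src || dst >= src + len) {
        /* dst is below src or past its end: incrementing order is safe. */
        for (size_t i = 0; i < len; i++)
            buf[dst + i] = buf[src + i];
    } else {
        /* dst starts inside [src, src + len): use decrementing order. */
        for (size_t i = len; i > 0; i--)
            buf[dst + i - 1] = buf[src + i - 1];
    }
}
```

A controller that can decrement addresses can perform the backward case directly instead of looping in software.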
So, I was thinking of adding the following flags to enum dma_ctrl_flags:
DMA_SRC_INC = (1 << 10),
DMA_SRC_DEC = (2 << 10),
DMA_SRC_FIX = (3 << 10),
DMA_DST_INC = (1 << 12),
DMA_DST_DEC = (2 << 12),
DMA_DST_FIX = (3 << 12),
We could then use these flags in device_prep_dma_sg, and this function
could be implemented for the generic case.
I think the above modifications will not affect other platforms and
should be acceptable.
What's your opinion?
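One detail to check in these values: INC, DEC and FIX are the values 1, 2 and 3 of a two-bit field, not independent bits. So 3 << 10 (DMA_SRC_FIX) has both the DMA_SRC_INC and DMA_SRC_DEC bits set, and a plain test such as `flags & DMA_SRC_INC` is also true for DMA_SRC_FIX. A sketch of decoding the field by masking first and then comparing values is shown below. The six flag values are taken from the proposal (modeled as macros here); `DMA_SRC_MASK`, `DMA_DST_MASK`, `enum addr_mode` and the helper functions are hypothetical names.

```c
#include <assert.h>

/* Values as proposed: one 2-bit field per direction. */
#define DMA_SRC_INC (1UL << 10)
#define DMA_SRC_DEC (2UL << 10)
#define DMA_SRC_FIX (3UL << 10)
#define DMA_DST_INC (1UL << 12)
#define DMA_DST_DEC (2UL << 12)
#define DMA_DST_FIX (3UL << 12)

/* Hypothetical masks covering each 2-bit field. */
#define DMA_SRC_MASK (3UL << 10)
#define DMA_DST_MASK (3UL << 12)

enum addr_mode { ADDR_NONE, ADDR_INC, ADDR_DEC, ADDR_FIX };

/* Naive bit test: also true for DMA_SRC_FIX, since 3 includes bit 1. */
static int naive_src_inc(unsigned long flags)
{
    return (flags & DMA_SRC_INC) != 0;
}

/* Extract the field first, then compare it as a value. */
static enum addr_mode src_mode(unsigned long flags)
{
    switch (flags & DMA_SRC_MASK) {
    case DMA_SRC_INC: return ADDR_INC;
    case DMA_SRC_DEC: return ADDR_DEC;
    case DMA_SRC_FIX: return ADDR_FIX;
    default:          return ADDR_NONE;
    }
}

static enum addr_mode dst_mode(unsigned long flags)
{
    switch (flags & DMA_DST_MASK) {
    case DMA_DST_INC: return ADDR_INC;
    case DMA_DST_DEC: return ADDR_DEC;
    case DMA_DST_FIX: return ADDR_FIX;
    default:          return ADDR_NONE;
    }
}
```

With this decoding, DMA_SRC_FIX is reported as a fixed source instead of matching both the INC and the DEC bit tests.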
Regards
Pratyush
On 12/8/2011 1:20 PM, Viresh KUMAR wrote:
> On 12/8/2011 1:07 PM, Vinod Koul wrote:
>> You mean something like:
>> struct dma_async_tx_descriptor *(*device_prep_dma_sg)(
>> struct dma_chan *chan,
>> struct scatterlist *dst_sg, unsigned int dst_nents,
>> struct scatterlist *src_sg, unsigned int src_nents,
>> unsigned long flags);
>>
>> It's already there; you just need to implement it in your driver :)
>
> Ok. I didn't know that. :(
>
* Re: dmaengine/Query: What about scatter/gather for mem to mem transfers.
2011-12-14 6:47 ` Pratyush Anand
@ 2011-12-15 4:58 ` Pratyush Anand
2011-12-15 5:06 ` Koul, Vinod
0 siblings, 1 reply; 12+ messages in thread
From: Pratyush Anand @ 2011-12-15 4:58 UTC (permalink / raw)
To: Vinod Koul, Dan Williams
Cc: Viresh KUMAR, linux-kernel, Shiraz HASHIM, Armando VISCONTI,
Deepak SIKRI, Vipin KUMAR, Vipul Kumar SAMAR, Vincenzo FRASCINO,
Mirko GARDI, Rajeev KUMAR, Amit VIRDI, Bhupesh SHARMA,
linus.walleij
Hello Vinod/Dan,
Please share your opinion.
Regards
Pratyush
On 12/14/2011 12:17 PM, Pratyush Anand wrote:
> Hello Vinod/Dan,
>
> Following up on the scatter/gather requirement:
> We might need a generic transfer where the source and destination
> addresses may overlap, and where the gap between two chunks of the source
> might not be the same as the gap between two chunks of the destination.
>
> For example,
> Transfer size is 0x4000.
> Our src is something like this:
> 0x1000 -- 0x2000
> 0x3000 -- 0x5000
> 0x6000 -- 0x7000
>
> and dst is something like this:
> 0x6000 -- 0x8000
> 0x9000 -- 0xB000
>
> It seems that device_prep_interleaved_dma would not be able to handle
> such a transfer.
>
> So, I was thinking of adding the following flags to enum dma_ctrl_flags:
>
> DMA_SRC_INC = (1 << 10),
> DMA_SRC_DEC = (2 << 10),
> DMA_SRC_FIX = (3 << 10),
> DMA_DST_INC = (1 << 12),
> DMA_DST_DEC = (2 << 12),
> DMA_DST_FIX = (3 << 12),
>
> We could then use these flags in device_prep_dma_sg, and this function
> could be implemented for the generic case.
>
> I think the above modifications will not affect other platforms and
> should be acceptable.
>
> What's your opinion?
>
> Regards
> Pratyush
>
> On 12/8/2011 1:20 PM, Viresh KUMAR wrote:
>> On 12/8/2011 1:07 PM, Vinod Koul wrote:
>>> You mean something like:
>>> struct dma_async_tx_descriptor *(*device_prep_dma_sg)(
>>> struct dma_chan *chan,
>>> struct scatterlist *dst_sg, unsigned int dst_nents,
>>> struct scatterlist *src_sg, unsigned int src_nents,
>>> unsigned long flags);
>>>
>>> It's already there; you just need to implement it in your driver :)
>>
>> Ok. I didn't know that. :(
>>
>
* RE: dmaengine/Query: What about scatter/gather for mem to mem transfers.
2011-12-15 4:58 ` Pratyush Anand
@ 2011-12-15 5:06 ` Koul, Vinod
2011-12-15 5:24 ` Pratyush Anand
0 siblings, 1 reply; 12+ messages in thread
From: Koul, Vinod @ 2011-12-15 5:06 UTC (permalink / raw)
To: Pratyush Anand, Williams, Dan J
Cc: Viresh KUMAR, linux-kernel, Shiraz HASHIM, Armando VISCONTI,
Deepak SIKRI, Vipin KUMAR, Vipul Kumar SAMAR, Vincenzo FRASCINO,
Mirko GARDI, Rajeev KUMAR, Amit VIRDI, Bhupesh SHARMA,
linus.walleij
>
> Hello Vinod/Dan,
>
> Please share your opinion.
I am on vacation, but will reply briefly here; more next week.
>
> Regards
> Pratyush
>
> On 12/14/2011 12:17 PM, Pratyush Anand wrote:
> > Hello Vinod/Dan,
> >
> > Following up on the scatter/gather requirement:
> > We might need a generic transfer where the source and destination
> > addresses may overlap, and where the gap between two chunks of the source
> > might not be the same as the gap between two chunks of the destination.
> >
> > For example,
> > Transfer size is 0x4000.
> > Our src is something like this:
> > 0x1000 -- 0x2000
> > 0x3000 -- 0x5000
> > 0x6000 -- 0x7000
> >
> > and dst is something like this:
> > 0x6000 -- 0x8000
> > 0x9000 -- 0xB000
So why can't it be split like:
0x1000 -- 0x2000 => 0x6000 -- 0x7000
0x3000 -- 0x4000 => 0x7000 -- 0x8000
0x4000 -- 0x5000 => 0x9000 -- 0xA000
0x6000 -- 0x7000 => 0xA000 -- 0xB000
That way the existing mechanism would work well for you.
You need to split the chunks properly, which is what the DMA would have to do anyway.
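The split above can be done mechanically: walk both lists together and cut at whichever chunk boundary comes first, so that every resulting pair has equal source and destination lengths. A minimal userspace sketch follows; `struct chunk`, `struct pair` and `split_pairs` are hypothetical names, and addresses are plain 32-bit integers rather than `dma_addr_t`. It only pairs the ranges. In this particular example the first pair writes 0x6000-0x7000, which is also the source of the last pair, so issuing the pairs in list order would overwrite that source before it is read; the sketch does nothing about such overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scatterlist entry: address + length. */
struct chunk {
    uint32_t addr;
    uint32_t len;
};

/* One equal-length copy: len bytes from src to dst. */
struct pair {
    uint32_t src;
    uint32_t dst;
    uint32_t len;
};

/*
 * Walk both lists together and emit equal-length pairs, cutting a chunk
 * whenever the other list's current chunk ends first.  Returns the number
 * of pairs written, or -1 if the totals differ or `out` is too small.
 */
static int split_pairs(const struct chunk *src, size_t ns,
                       const struct chunk *dst, size_t nd,
                       struct pair *out, size_t max_out)
{
    size_t si = 0, di = 0, n = 0;
    uint32_t soff = 0, doff = 0;   /* bytes consumed in current chunks */

    while (si < ns && di < nd) {
        /* Zero-length entries are not handled by this sketch. */
        assert(src[si].len != 0 && dst[di].len != 0);

        uint32_t srem = src[si].len - soff;
        uint32_t drem = dst[di].len - doff;
        uint32_t len = srem < drem ? srem : drem;

        if (n == max_out)
            return -1;
        out[n].src = src[si].addr + soff;
        out[n].dst = dst[di].addr + doff;
        out[n].len = len;
        n++;

        soff += len;
        doff += len;
        if (soff == src[si].len) { si++; soff = 0; }
        if (doff == dst[di].len) { di++; doff = 0; }
    }
    /* Both lists must run out at the same time (equal totals). */
    if (si != ns || di != nd)
        return -1;
    return (int)n;
}
```

Fed with the source and destination lists from Pratyush's example, `split_pairs` returns the four pairs listed above.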
--
~Vinod
* Re: dmaengine/Query: What about scatter/gather for mem to mem transfers.
2011-12-15 5:06 ` Koul, Vinod
@ 2011-12-15 5:24 ` Pratyush Anand
2011-12-15 6:56 ` Pratyush Anand
0 siblings, 1 reply; 12+ messages in thread
From: Pratyush Anand @ 2011-12-15 5:24 UTC (permalink / raw)
To: Koul, Vinod
Cc: Williams, Dan J, Viresh KUMAR, linux-kernel, Shiraz HASHIM,
Armando VISCONTI, Deepak SIKRI, Vipin KUMAR, Vipul Kumar SAMAR,
Vincenzo FRASCINO, Mirko GARDI, Rajeev KUMAR, Amit VIRDI,
Bhupesh SHARMA, linus.walleij
On 12/15/2011 10:36 AM, Koul, Vinod wrote:
>>
>> Hello Vinod/Dan,
>>
>> Please share your opinion.
> I am on vacation, but will reply briefly here; more next week.
Thanks for your quick reply.
>>
>> Regards
>> Pratyush
>>
>> On 12/14/2011 12:17 PM, Pratyush Anand wrote:
>>> Hello Vinod/Dan,
>>>
>>> Following up on the scatter/gather requirement:
>>> We might need a generic transfer where the source and destination
>>> addresses may overlap, and where the gap between two chunks of the source
>>> might not be the same as the gap between two chunks of the destination.
>>>
>>> For example,
>>> Transfer size is 0x4000.
>>> Our src is something like this:
>>> 0x1000 -- 0x2000
>>> 0x3000 -- 0x5000
>>> 0x6000 -- 0x7000
>>>
>>> and dst is something like this:
>>> 0x6000 -- 0x8000
>>> 0x9000 -- 0xB000
> So why can't it be split like:
> 0x1000 -- 0x2000 => 0x6000 -- 0x7000
> 0x3000 -- 0x4000 => 0x7000 -- 0x8000
> 0x4000 -- 0x5000 => 0x9000 -- 0xA000
> 0x6000 -- 0x7000 => 0xA000 -- 0xB000
>
> That way the existing mechanism would work well for you.
> You need to split the chunks properly, which is what the DMA would have to do anyway.
>
Yes, they can be split like this, but then the onus of splitting will fall on
the DMA user driver, and similar logic would be replicated in
several places. Therefore, I was thinking of making device_prep_dma_sg
generic by adding these flags.
Regards
Pratyush
> --
> ~Vinod
* Re: dmaengine/Query: What about scatter/gather for mem to mem transfers.
2011-12-15 5:24 ` Pratyush Anand
@ 2011-12-15 6:56 ` Pratyush Anand
2011-12-20 9:21 ` Vinod Koul
0 siblings, 1 reply; 12+ messages in thread
From: Pratyush Anand @ 2011-12-15 6:56 UTC (permalink / raw)
To: Koul, Vinod
Cc: Williams, Dan J, Viresh KUMAR, linux-kernel, Shiraz HASHIM,
Armando VISCONTI, Deepak SIKRI, Vipin KUMAR, Vipul Kumar SAMAR,
Vincenzo FRASCINO, Mirko GARDI, Rajeev KUMAR, Amit VIRDI,
Bhupesh SHARMA, linus.walleij
On 12/15/2011 10:54 AM, Pratyush Anand wrote:
> On 12/15/2011 10:36 AM, Koul, Vinod wrote:
>>>
>>> Hello Vinod/Dan,
>>>
>>> Please share your opinion.
>> I am on vacation, but will reply briefly here; more next week.
>
> Thanks for your quick reply.
>
>>>
>>> Regards
>>> Pratyush
>>>
>>> On 12/14/2011 12:17 PM, Pratyush Anand wrote:
>>>> Hello Vinod/Dan,
>>>>
>>>> Following up on the scatter/gather requirement:
>>>> We might need a generic transfer where the source and destination
>>>> addresses may overlap, and where the gap between two chunks of the source
>>>> might not be the same as the gap between two chunks of the destination.
>>>>
>>>> For example,
>>>> Transfer size is 0x4000.
>>>> Our src is something like this:
>>>> 0x1000 -- 0x2000
>>>> 0x3000 -- 0x5000
>>>> 0x6000 -- 0x7000
>>>>
>>>> and dst is something like this:
>>>> 0x6000 -- 0x8000
>>>> 0x9000 -- 0xB000
>> So why can't it be split like:
>> 0x1000 -- 0x2000 => 0x6000 -- 0x7000
>> 0x3000 -- 0x4000 => 0x7000 -- 0x8000
>> 0x4000 -- 0x5000 => 0x9000 -- 0xA000
>> 0x6000 -- 0x7000 => 0xA000 -- 0xB000
>>
>> That way the existing mechanism would work well for you.
>> You need to split the chunks properly, which is what the DMA would have to do anyway.
>>
>
> Yes, they can be split like this, but then the onus of splitting will fall on
> the DMA user driver, and similar logic would be replicated in
> several places. Therefore, I was thinking of making device_prep_dma_sg
> generic by adding these flags.
I see one more issue with using device_prep_interleaved_dma.
The src and dst buffers have been allocated in user space.
A kernel module extracts the physical addresses of these pages and
prepares an sg list, which it submits to the DMA.
These addresses would be virtually contiguous and incrementing, but I
am not sure whether they are always physically incrementing too. If they are
not guaranteed to be incrementing, then I see an issue.
Even otherwise, a situation can arise where the scattered memory is
neither consistently incrementing nor consistently decrementing within the same sg list.
>
> Regards
> Pratyush
>
>> --
>> ~Vinod
>
* Re: dmaengine/Query: What about scatter/gather for mem to mem transfers.
2011-12-15 6:56 ` Pratyush Anand
@ 2011-12-20 9:21 ` Vinod Koul
2011-12-20 10:15 ` Pratyush Anand
0 siblings, 1 reply; 12+ messages in thread
From: Vinod Koul @ 2011-12-20 9:21 UTC (permalink / raw)
To: Pratyush Anand
Cc: Williams, Dan J, Viresh KUMAR, linux-kernel, Shiraz HASHIM,
Armando VISCONTI, Deepak SIKRI, Vipin KUMAR, Vipul Kumar SAMAR,
Vincenzo FRASCINO, Mirko GARDI, Rajeev KUMAR, Amit VIRDI,
Bhupesh SHARMA, linus.walleij
On Thu, 2011-12-15 at 12:26 +0530, Pratyush Anand wrote:
> >> That way the existing mechanism would work well for you.
> >> You need to split the chunks properly, which is what the DMA would have to do anyway.
> >>
> >
> > Yes, they can be split like this, but then the onus of splitting will fall on
> > the DMA user driver, and similar logic would be replicated in
> > several places. Therefore, I was thinking of making device_prep_dma_sg
> > generic by adding these flags.
Well, I am not sure how adding flags would handle this.
There are a few things you should consider:
1) Do you have h/w support for these? If yes, then we can talk about
dmaengine APIs doing such a thing.
2) If the objective is to support such transfers from the DMA driver POV, then I
wouldn't agree, as these can easily be split into a standard DMA sg list.
From the code POV, it wouldn't hurt to create a wrapper which takes in
these non-standard sg lists and converts them to a uniform list.
Most important question:
in which practical scenario would the src and dst lengths be different?
>
> I see one more issue with using device_prep_interleaved_dma.
>
> The src and dst buffers have been allocated in user space.
> A kernel module extracts the physical addresses of these pages and
> prepares an sg list, which it submits to the DMA.
> These addresses would be virtually contiguous and incrementing, but I
> am not sure whether they are always physically incrementing too. If they are
> not guaranteed to be incrementing, then I see an issue.
>
> Even otherwise, a situation can arise where the scattered memory is
> neither consistently incrementing nor consistently decrementing within the same sg list.
What _exactly_ are you trying to do?
DMA would need buffers which are physically contiguous. Also, the user
pages can be swapped out; you would need to pin these pages.
--
~Vinod
* Re: dmaengine/Query: What about scatter/gather for mem to mem transfers.
2011-12-20 9:21 ` Vinod Koul
@ 2011-12-20 10:15 ` Pratyush Anand
2012-01-02 11:35 ` Vinod Koul
0 siblings, 1 reply; 12+ messages in thread
From: Pratyush Anand @ 2011-12-20 10:15 UTC (permalink / raw)
To: Vinod Koul
Cc: Williams, Dan J, Viresh KUMAR, linux-kernel, Shiraz HASHIM,
Armando VISCONTI, Deepak SIKRI, Vipin KUMAR, Vipul Kumar SAMAR,
Vincenzo FRASCINO, Mirko GARDI, Rajeev KUMAR, Amit VIRDI,
Bhupesh SHARMA, linus.walleij
On 12/20/2011 2:51 PM, Vinod Koul wrote:
> On Thu, 2011-12-15 at 12:26 +0530, Pratyush Anand wrote:
>>>> That way the existing mechanism would work well for you.
>>>> You need to split the chunks properly, which is what the DMA would have to do anyway.
>>>>
>>>
>>> Yes, they can be split like this, but then the onus of splitting will fall on
>>> the DMA user driver, and similar logic would be replicated in
>>> several places. Therefore, I was thinking of making device_prep_dma_sg
>>> generic by adding these flags.
> Well, I am not sure how adding flags would handle this.
device_prep_dma_sg takes flags as its last argument. The DMA user driver can pass
information about inc/dec of src/dst in these flags, which can then be used by
the DMA driver. I have put one such implementation at the end of this mail.
> There are a few things you should consider:
> 1) Do you have h/w support for these? If yes, then we can talk about
> dmaengine APIs doing such a thing.
No, there is no specific HW option.
> 2) If the objective is to support such transfers from the DMA driver POV, then I
> wouldn't agree, as these can easily be split into a standard DMA sg list.
> From the code POV, it wouldn't hurt to create a wrapper which takes in
> these non-standard sg lists and converts them to a uniform list.
Yes, a wrapper function can do that.
>
> Most important question:
> in which practical scenario would the src and dst lengths be different?
Let me explain my scenario.
The memory has been allocated with malloc in user space and pinned
using mlock. Now a src-to-dst copy (memcpy) has to be implemented using
DMA. A kernel module takes these addresses and the transfer length as input,
and extracts the physical addresses page by page. Now, it may happen that
the src pages are, say, pages 1-5, 7 and 9, while the dst pages are pages 20-26.
So here, the src sg list would have 3 nodes, while the dst list would
have only 1. One option, as you suggested earlier, is for the DMA user
driver itself to equalize the src and dst gaps and lengths and then call
device_prep_interleaved_dma. The other option, I think, is to pass these
LLIs as they are and then handle them in device_prep_dma_sg using flags.
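Building the per-list sg entries in this scenario amounts to merging physically adjacent pages. A userspace sketch is shown below; `struct page_run` and `coalesce_pfns` are hypothetical names, the page frame numbers are taken as given (obtaining them from pinned user pages is not modeled), and each run would become one sg entry. A run only grows when the next page immediately follows it, so every entry describes incrementing addresses even when the pages are not in physical order; pages that happen to be adjacent in descending order simply stay separate entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a run of physically contiguous pages (one sg entry). */
struct page_run {
    unsigned long first_pfn;   /* page frame number of the first page */
    size_t npages;             /* number of contiguous pages in the run */
};

/*
 * Merge a per-page list of page frame numbers (in virtual-address order)
 * into runs of physically contiguous, ascending pages.  `runs` must have
 * room for n entries.  Returns the number of runs written.
 */
static size_t coalesce_pfns(const unsigned long *pfn, size_t n,
                            struct page_run *runs)
{
    size_t nruns = 0;

    assert(n == 0 || (pfn != NULL && runs != NULL));
    for (size_t i = 0; i < n; i++) {
        if (nruns > 0 &&
            runs[nruns - 1].first_pfn + runs[nruns - 1].npages == pfn[i]) {
            runs[nruns - 1].npages++;        /* extends the previous run */
        } else {
            runs[nruns].first_pfn = pfn[i];  /* starts a new run */
            runs[nruns].npages = 1;
            nruns++;
        }
    }
    return nruns;
}
```

For the pages in the example, this gives 3 runs for the source (1-5, 7, 9) and 1 run for the destination (20-26).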
>
>>
>> I see one more issue with using device_prep_interleaved_dma.
>>
>> The src and dst buffers have been allocated in user space.
>> A kernel module extracts the physical addresses of these pages and
>> prepares an sg list, which it submits to the DMA.
>> These addresses would be virtually contiguous and incrementing, but I
>> am not sure whether they are always physically incrementing too. If they are
>> not guaranteed to be incrementing, then I see an issue.
>>
>> Even otherwise, a situation can arise where the scattered memory is
>> neither consistently incrementing nor consistently decrementing within the same sg list.
> What _exactly_ are you trying to do?
>
> DMA would need buffers which are physically contiguous. Also, the user
> pages can be swapped out; you would need to pin these pages.
>
But even after using mlock, will it ensure that the allocated pages are
always physically incrementing? If not, then we probably cannot use
device_prep_interleaved_dma.
Regards
Pratyush
Code which I referred to:
+static struct dma_async_tx_descriptor *
+dwc_prep_dma_sg(struct dma_chan *chan,
+ struct scatterlist *dst_sg, unsigned int dst_nents,
+ struct scatterlist *src_sg, unsigned int src_nents,
+ unsigned long flags)
+{
+ struct dw_dma_chan *dwc = to_dw_dma_chan(chan);
+ struct dw_desc *first = NULL, *desc, *prev;
+ unsigned int offset, xfer_count, xfer_bytes;
+ unsigned int best_width, width, misalignment;
+ u32 ctllo;
+ dma_addr_t src, dst;
+ unsigned int slen = 0, dlen = 0, len = 0, total_len = 0;
+ unsigned int si = 0, di = 0;
+ unsigned long def_flags = 0;
+
+ if (unlikely(src_nents == 0 || dst_nents == 0))
+ return NULL;
+
+ /*
+ * TODO: Currently We only support src and dst both as either
+ * incrementing or decrementing.
+ */
+ if ((flags & DMA_SRC_INC) && !(flags & DMA_DST_INC))
+ return NULL;
+
+ if ((flags & DMA_SRC_DEC) && !(flags & DMA_DST_DEC))
+ return NULL;
+
+ if (flags & DMA_SRC_INC)
+ def_flags |= DWC_CTLL_SRC_INC;
+ else if (flags & DMA_SRC_DEC)
+ def_flags |= DWC_CTLL_SRC_DEC;
+
+ if (flags & DMA_DST_INC)
+ def_flags |= DWC_CTLL_DST_INC;
+ else if (flags & DMA_DST_DEC)
+ def_flags |= DWC_CTLL_DST_DEC;
+
+ /* silence gcc warnings */
+ src = dst = 0;
+ prev = NULL;
+
+ while (si < src_nents || di < dst_nents) {
+ if (slen == 0) {
+ si++;
+ src = sg_dma_address(src_sg);
+ slen = sg_dma_len(src_sg);
+ /* If decrementing then start from end address */
+ if (flags & DMA_SRC_DEC)
+ src = src + slen;
+ src_sg = sg_next(src_sg);
+ if (unlikely (si < src_nents && src_sg == NULL)) {
+ dev_printk(KERN_ERR, chan2dev(chan),
+ "Invalid src scattergather list: "
+ "entry %u of %u has last flag\n",
+ si, src_nents);
+ goto err_sg_len;
+ } else if (unlikely (si == src_nents
+ && src_sg != NULL)) {
+ dev_printk(KERN_ERR, chan2dev(chan),
+ "Invalid src scattergather list: "
+ "entry %u of %u has no last flag\n",
+ si, src_nents);
+ goto err_sg_len;
+ }
+ } else {
+ if (flags & DMA_SRC_INC)
+ src += len;
+ else if (flags & DMA_SRC_DEC)
+ src -= len;
+ else
+ goto err_sg_len;
+ }
+
+ if (dlen == 0) {
+ di++;
+ dst = sg_dma_address(dst_sg);
+ dlen = sg_dma_len(dst_sg);
+ /* If decrementing then start from end address */
+ if (flags & DMA_DST_DEC)
+ dst = dst + dlen;
+ dst_sg = sg_next(dst_sg);
+ if (unlikely (di < dst_nents && dst_sg == NULL)) {
+ dev_printk(KERN_ERR, chan2dev(chan),
+ "Invalid dst scattergather list: "
+ "entry %u of %u has last flag\n",
+ di, dst_nents);
+ goto err_sg_len;
+ } else if (unlikely (di == dst_nents
+ && dst_sg != NULL)) {
+ dev_printk(KERN_ERR, chan2dev(chan),
+ "Invalid dst scattergather list: "
+ "entry %u of %u has no last flag\n",
+ di, dst_nents);
+ goto err_sg_len;
+ }
+ } else {
+ if (flags & DMA_DST_INC)
+ dst += len;
+ else if (flags & DMA_DST_DEC)
+ dst -= len;
+ else
+ goto err_sg_len;
+ }
+
+ len = min(slen, dlen);
+ slen -= len;
+ dlen -= len;
+
+ /* src-dst relative misalignment */
+ misalignment = src ^ dst;
+ best_width = __ffs(DWC_MAX_WIDTH | misalignment);
+
+ /* We want the transfer to be performed in 3 parts:
+ *
+ * - align src and dest to the best possible alignment
+ * (up to the maximum width of 8)
+ *
+ * - perform most of the transfer using the most
+ * efficient alignment
+ *
+ * - complete the transfer using whatever alignment is
+ * needed to total len bytes
+ */
+
+ /* First part of the transfer: align src and dst*/
+
+ /* misalignment of src and dst with respect to 0 */
+ misalignment = src | dst;
+ width = __ffs(DWC_MAX_WIDTH | misalignment);
+
+ /* If we're already aligned, the first part is a
+ * no-op, skip to the second */
+ if (width != best_width
+ && len >> best_width > __ffs(DWC_MAX_WIDTH)) {
+ /* align src and dst to (1 << best_width) */
+ unsigned int align = 1 << best_width;
+ xfer_bytes = (align-src) & (align-dst) & (align-1);
+ } else
+ xfer_bytes = len;
+
+ offset = 0;
+ xfer_count = xfer_bytes >> width;
+
+ if (flags & (DMA_SRC_DEC | DMA_DST_DEC)){
+ src -= (1 << width);
+ dst -= (1 << width);
+ }
+
+ do {
+ ctllo = DWC_DEFAULT_CTLLO(chan->private)
+ | DWC_CTLL_DST_WIDTH(width)
+ | DWC_CTLL_SRC_WIDTH(width)
+ | def_flags
+ | DWC_CTLL_FC_M2M;
+
+ while (xfer_count > 0) {
+ xfer_count = min_t(size_t, xfer_count,
+ DWC_MAX_COUNT);
+
+ desc = dwc_desc_get(dwc);
+ if (unlikely (desc == NULL))
+ goto err_desc_get;
+
+ if (flags & (DMA_SRC_INC | DMA_DST_INC)){
+ desc->lli.sar = src + offset;
+ desc->lli.dar = dst + offset;
+ } else {
+ desc->lli.sar = src - offset;
+ desc->lli.dar = dst - offset;
+ }
+ desc->lli.ctllo = ctllo;
+ desc->lli.ctlhi = xfer_count;
+
+ if (first == NULL) {
+ first = desc;
+ } else {
+ prev->lli.llp = desc->txd.phys;
+ dma_sync_single_for_device(
+ chan2parent(chan),
+ prev->txd.phys,
+ sizeof(prev->lli),
+ DMA_TO_DEVICE);
+ list_add_tail(&desc->desc_node,
+ &first->tx_list);
+ }
+
+ prev = desc;
+
+ offset += xfer_count << width;
+ xfer_bytes -= xfer_count << width;
+
+ xfer_count = xfer_bytes >> width;
+ }
+
+ xfer_bytes = len - offset;
+
+ /*
+ * The first part has been done. If there is
+ * some data to be transferred using
+ * best_width-aligned operations, do
+ * it. Otherwise, choose an alignment that
+ * works for all of the remaining transfer.
+ */
+ if (xfer_bytes >> best_width > 0) {
+ width = best_width;
+ } else {
+ /* misalignment of the remaining part
+ * of src, dst and len */
+ misalignment = (src+offset) | (dst+offset)
+ | xfer_bytes;
+ width = __ffs(DWC_MAX_WIDTH | misalignment);
+ }
+
+ xfer_count = xfer_bytes >> width;
+ } while (offset < len);
+
+ total_len += len;
+ if (flags & (DMA_SRC_DEC | DMA_DST_DEC)){
+ src += (1 << width);
+ dst += (1 << width);
+ }
+
+ }
+
+ if (flags & DMA_PREP_INTERRUPT)
+ /* Trigger interrupt after last block */
+ prev->lli.ctllo |= DWC_CTLL_INT_EN;
+
+ prev->lli.llp = 0;
+ dma_sync_single_for_device(chan2parent(chan),
+ prev->txd.phys, sizeof(prev->lli),
+ DMA_TO_DEVICE);
+
+ first->txd.flags = flags;
+ first->len = total_len;
+
+ return &first->txd;
+
+err_sg_len:
+err_desc_get:
+ dwc_desc_put(dwc, first);
+ return NULL;
+}
+
+static struct dma_async_tx_descriptor *
dwc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
unsigned int sg_len, enum dma_data_direction direction,
unsigned long flags)
@@ -1474,6 +1728,7 @@ static int __init dw_probe(struct platform_device *pdev)
dw->dma.device_free_chan_resources = dwc_free_chan_resources;
dw->dma.device_prep_dma_memcpy = dwc_prep_dma_memcpy;
+ dw->dma.device_prep_dma_sg = dwc_prep_dma_sg;
dw->dma.device_prep_slave_sg = dwc_prep_slave_sg;
dw->dma.device_control = dwc_control;
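The alignment step in dwc_prep_dma_sg chooses the widest transfer width that source and destination can share: the lowest set bit of `src ^ dst` is where the two addresses first differ, so `__ffs(DWC_MAX_WIDTH | (src ^ dst))` bounds their common alignment, `__ffs(DWC_MAX_WIDTH | (src | dst))` is the width both already satisfy, and `(align - src) & (align - dst) & (align - 1)` counts the leading bytes needed to reach the common alignment. The userspace model below reproduces that arithmetic, assuming DWC_MAX_WIDTH is 8 (the patch's comment speaks of a maximum width of 8); `lowest_set_bit` stands in for the kernel's `__ffs`, and the other helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: DWC_MAX_WIDTH is 8 bytes, so widths are log2 values 0..3. */
#define DWC_MAX_WIDTH 8u

/* Index of the lowest set bit, like the kernel's __ffs(); x must be non-zero. */
static unsigned int lowest_set_bit(uint32_t x)
{
    unsigned int n = 0;

    assert(x != 0);
    while (!(x & 1u)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* log2 of the widest width at which src and dst can be aligned together. */
static unsigned int best_width(uint32_t src, uint32_t dst)
{
    return lowest_set_bit(DWC_MAX_WIDTH | (src ^ dst));
}

/* log2 of the widest width both addresses already satisfy. */
static unsigned int current_width(uint32_t src, uint32_t dst)
{
    return lowest_set_bit(DWC_MAX_WIDTH | src | dst);
}

/* Leading bytes to copy before both addresses are (1 << width)-aligned. */
static uint32_t head_bytes(uint32_t src, uint32_t dst, unsigned int width)
{
    uint32_t align = 1u << width;

    return (align - src) & (align - dst) & (align - 1);
}
```

For src = 0x1004 and dst = 0x2004, both addresses are only 4-byte aligned but agree in their low three bits, so 4 leading bytes at width 2 bring both to 8-byte alignment, after which width 3 can be used for the bulk of the copy.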
* Re: dmaengine/Query: What about scatter/gather for mem to mem transfers.
2011-12-20 10:15 ` Pratyush Anand
@ 2012-01-02 11:35 ` Vinod Koul
2012-01-02 11:50 ` Pratyush Anand
0 siblings, 1 reply; 12+ messages in thread
From: Vinod Koul @ 2012-01-02 11:35 UTC (permalink / raw)
To: Pratyush Anand
Cc: Williams, Dan J, Viresh KUMAR, linux-kernel, Shiraz HASHIM,
Armando VISCONTI, Deepak SIKRI, Vipin KUMAR, Vipul Kumar SAMAR,
Vincenzo FRASCINO, Mirko GARDI, Rajeev KUMAR, Amit VIRDI,
Bhupesh SHARMA, linus.walleij
On Tue, 2011-12-20 at 15:45 +0530, Pratyush Anand wrote:
> On 12/20/2011 2:51 PM, Vinod Koul wrote:
> > On Thu, 2011-12-15 at 12:26 +0530, Pratyush Anand wrote:
> >>>> That way the existing mechanism would work well for you.
> >>>> You need to split the chunks properly, which is what the DMA would have to do anyway.
> >>>>
> >>>
> >>> Yes, they can be split like this, but then the onus of splitting will fall on
> >>> the DMA user driver, and similar logic would be replicated in
> >>> several places. Therefore, I was thinking of making device_prep_dma_sg
> >>> generic by adding these flags.
> > Well, I am not sure how adding flags would handle this.
>
> device_prep_dma_sg takes flags as its last argument. The DMA user driver can pass
> information about inc/dec of src/dst in these flags, which can then be used by
> the DMA driver. I have put one such implementation at the end of this mail.
With the split done beforehand, would we still need this, and hence the
referred implementation?
>
> > There are a few things you should consider:
> > 1) Do you have h/w support for these? If yes, then we can talk about
> > dmaengine APIs doing such a thing.
>
> No, there is no specific HW option.
So the DMA driver will _need_ to do this anyway; so do it generically in the
client driver and pass the result to your prepare function in the current way.
> > 2) If the objective is to support such transfers from the DMA driver POV, then I
> > wouldn't agree, as these can easily be split into a standard DMA sg list.
> > From the code POV, it wouldn't hurt to create a wrapper which takes in
> > these non-standard sg lists and converts them to a uniform list.
>
> Yes, a wrapper function can do that.
>
> >
> > Most important question:
> > in which practical scenario would the src and dst lengths be different?
>
> Let me explain my scenario.
>
> The memory has been allocated with malloc in user space and pinned
> using mlock. Now a src-to-dst copy (memcpy) has to be implemented using
> DMA. A kernel module takes these addresses and the transfer length as input,
> and extracts the physical addresses page by page. Now, it may happen that
> the src pages are, say, pages 1-5, 7 and 9, while the dst pages are pages 20-26.
> So here, the src sg list would have 3 nodes, while the dst list would
> have only 1. One option, as you suggested earlier, is for the DMA user
> driver itself to equalize the src and dst gaps and lengths and then call
> device_prep_interleaved_dma. The other option, I think, is to pass these
> LLIs as they are and then handle them in device_prep_dma_sg using flags.
I can't figure out why user space requires you to do the DMA. What is the
intended usage?
Another way would be to allocate kernel memory in your driver and then
mmap it to user space; that way you get contiguous memory and don't need
to resort to page transfers.
>
> >
> >>
> >> I see one more issue with using device_prep_interleaved_dma.
> >>
> >> The src and dst buffers have been allocated in user space.
> >> A kernel module extracts the physical addresses of these pages and
> >> prepares an sg list, which it submits to the DMA.
> >> These addresses would be virtually contiguous and incrementing, but I
> >> am not sure whether they are always physically incrementing too. If they are
> >> not guaranteed to be incrementing, then I see an issue.
> >>
> >> Even otherwise, a situation can arise where the scattered memory is
> >> neither consistently incrementing nor consistently decrementing within the same sg list.
> > What _exactly_ are you trying to do?
> >
> > DMA would need buffers which are physically contiguous. Also, the user
> > pages can be swapped out; you would need to pin these pages.
> >
>
> But even after using mlock, will it ensure that the allocated pages are
> always physically incrementing? If not, then we probably cannot use
> device_prep_interleaved_dma.
Userspace memory is not guaranteed to be contiguous, and it can be
swapped out, although with mlock it should be kept in RAM.
>
> Regards
> Pratyush
>
> Code which I referred to:
>
> +static struct dma_async_tx_descriptor *
> +dwc_prep_dma_sg(struct dma_chan *chan,
> + struct scatterlist *dst_sg, unsigned int dst_nents,
> + struct scatterlist *src_sg, unsigned int src_nents,
> + unsigned long flags)
> +{
The whole function looks very big. I think you should be able to split
this and the other functions in your driver, reuse a lot of common code, and
make this simpler, more modular and easier to read.
> + struct dw_dma_chan *dwc = to_dw_dma_chan(chan);
> + struct dw_desc *first = NULL, *desc, *prev;
> + unsigned int offset, xfer_count, xfer_bytes;
> + unsigned int best_width, width, misalignment;
> + u32 ctllo;
> + dma_addr_t src, dst;
> + unsigned int slen = 0, dlen = 0, len = 0, total_len = 0;
> + unsigned int si = 0, di = 0;
> + unsigned long def_flags = 0;
> +
> + if (unlikely(src_nents == 0 || dst_nents == 0))
> + return NULL;
> +
> + /*
> + * TODO: Currently We only support src and dst both as either
> + * incrementing or decrementing.
> + */
> + if ((flags & DMA_SRC_INC) && !(flags & DMA_DST_INC))
> + return NULL;
> +
> + if ((flags & DMA_SRC_DEC) && !(flags & DMA_DST_DEC))
> + return NULL;
> +
> + if (flags & DMA_SRC_INC)
> + def_flags |= DWC_CTLL_SRC_INC;
> + else if (flags & DMA_SRC_DEC)
> + def_flags |= DWC_CTLL_SRC_DEC;
> +
> + if (flags & DMA_DST_INC)
> + def_flags |= DWC_CTLL_DST_INC;
> + else if (flags & DMA_DST_DEC)
> + def_flags |= DWC_CTLL_DST_DEC;
> +
> + /* silence gcc warnings */
> + src = dst = 0;
> + prev = NULL;
> +
> + while (si < src_nents || di < dst_nents) {
> + if (slen == 0) {
> + si++;
> + src = sg_dma_address(src_sg);
> + slen = sg_dma_len(src_sg);
> + /* If decrementing then start from end address */
> + if (flags & DMA_SRC_DEC)
> + src = src + slen;
> + src_sg = sg_next(src_sg);
> + if (unlikely (si < src_nents && src_sg == NULL)) {
> + dev_printk(KERN_ERR, chan2dev(chan),
> + "Invalid src scattergather list: "
> + "entry %u of %u has last flag\n",
> + si, src_nents);
> + goto err_sg_len;
> + } else if (unlikely (si == src_nents
> + && src_sg != NULL)) {
> + dev_printk(KERN_ERR, chan2dev(chan),
> + "Invalid src scattergather list: "
> + "entry %u of %u has no last flag\n",
> + si, src_nents);
> + goto err_sg_len;
> + }
> + } else {
> + if (flags & DMA_SRC_INC)
> + src += len;
> + else if (flags & DMA_SRC_DEC)
> + src -= len;
> + else
> + goto err_sg_len;
> + }
> +
> + if (dlen == 0) {
> + di++;
> + dst = sg_dma_address(dst_sg);
> + dlen = sg_dma_len(dst_sg);
> + /* If decrementing then start from end address */
> + if (flags & DMA_DST_DEC)
> + dst = dst + dlen;
> + dst_sg = sg_next(dst_sg);
> + if (unlikely (di < dst_nents && dst_sg == NULL)) {
> + dev_printk(KERN_ERR, chan2dev(chan),
> + "Invalid dst scattergather list: "
> + "entry %u of %u has last flag\n",
> + di, dst_nents);
> + goto err_sg_len;
> + } else if (unlikely (di == dst_nents
> + && dst_sg != NULL)) {
> + dev_printk(KERN_ERR, chan2dev(chan),
> + "Invalid dst scattergather list: "
> + "entry %u of %u has no last flag\n",
> + di, dst_nents);
> + goto err_sg_len;
> + }
> + } else {
> + if (flags & DMA_DST_INC)
> + dst += len;
> + else if (flags & DMA_DST_DEC)
> + dst -= len;
> + else
> + goto err_sg_len;
> + }
> +
> + len = min(slen, dlen);
> + slen -= len;
> + dlen -= len;
> +
> + /* src-dst relative misalignment */
> + misalignment = src ^ dst;
> + best_width = __ffs(DWC_MAX_WIDTH | misalignment);
> +
> + /* We want the transfer to be performed in 3 parts:
> + *
> + * - align src and dest to the best possible alignment
> + * (up to the maximum width of 8)
> + *
> + * - perform most of the transfer using the most
> + * efficient alignment
> + *
> + * - complete the transfer using whatever alignment is
> + * needed to total len bytes
> + */
> +
> + /* First part of the transfer: align src and dst */
> +
> + /* misalignment of src and dst with respect to 0 */
> + misalignment = src | dst;
> + width = __ffs(DWC_MAX_WIDTH | misalignment);
> +
> + /* If we're already aligned, the first part is a
> + * no-op, skip to the second */
> + if (width != best_width
> + && len >> best_width > __ffs(DWC_MAX_WIDTH)) {
> + /* align src and dst to (1 << best_width) */
> + unsigned int align = 1 << best_width;
> + xfer_bytes = (align-src) & (align-dst) & (align-1);
> + } else
> + xfer_bytes = len;
> +
> + offset = 0;
> + xfer_count = xfer_bytes >> width;
> +
> + if (flags & (DMA_SRC_DEC | DMA_DST_DEC)){
> + src -= (1 << width);
> + dst -= (1 << width);
> + }
> +
> + do {
> + ctllo = DWC_DEFAULT_CTLLO(chan->private)
> + | DWC_CTLL_DST_WIDTH(width)
> + | DWC_CTLL_SRC_WIDTH(width)
> + | def_flags
> + | DWC_CTLL_FC_M2M;
> +
> + while (xfer_count > 0) {
> + xfer_count = min_t(size_t, xfer_count,
> + DWC_MAX_COUNT);
> +
> + desc = dwc_desc_get(dwc);
> + if (unlikely (desc == NULL))
> + goto err_desc_get;
> +
> + if (flags & (DMA_SRC_INC | DMA_DST_INC)){
> + desc->lli.sar = src + offset;
> + desc->lli.dar = dst + offset;
> + } else {
> + desc->lli.sar = src - offset;
> + desc->lli.dar = dst - offset;
> + }
> + desc->lli.ctllo = ctllo;
> + desc->lli.ctlhi = xfer_count;
> +
> + if (first == NULL) {
> + first = desc;
> + } else {
> + prev->lli.llp = desc->txd.phys;
> + dma_sync_single_for_device(
> + chan2parent(chan),
> + prev->txd.phys,
> + sizeof(prev->lli),
> + DMA_TO_DEVICE);
> + list_add_tail(&desc->desc_node,
> + &first->tx_list);
> + }
> +
> + prev = desc;
> +
> + offset += xfer_count << width;
> + xfer_bytes -= xfer_count << width;
> +
> + xfer_count = xfer_bytes >> width;
> + }
> +
> + xfer_bytes = len - offset;
> +
> + /*
> + * The first part has been done. If there is
> + * some data to be transferred using
> + * best_width-aligned operations, do
> + * it. Otherwise, choose an alignment that
> + * works for all of the remaining transfer.
> + */
> + if (xfer_bytes >> best_width > 0) {
> + width = best_width;
> + } else {
> + /* misalignment of the remaining part
> + * of src, dst and len */
> + misalignment = (src+offset) | (dst+offset)
> + | xfer_bytes;
> + width = __ffs(DWC_MAX_WIDTH | misalignment);
> + }
> +
> + xfer_count = xfer_bytes >> width;
> + } while (offset < len);
> +
> + total_len += len;
> + if (flags & (DMA_SRC_DEC | DMA_DST_DEC)){
> + src += (1 << width);
> + dst += (1 << width);
> + }
> +
> + }
> +
> + if (flags & DMA_PREP_INTERRUPT)
> + /* Trigger interrupt after last block */
> + prev->lli.ctllo |= DWC_CTLL_INT_EN;
> +
> + prev->lli.llp = 0;
> + dma_sync_single_for_device(chan2parent(chan),
> + prev->txd.phys, sizeof(prev->lli),
> + DMA_TO_DEVICE);
> +
> + first->txd.flags = flags;
> + first->len = total_len;
> +
> + return &first->txd;
> +
> +err_sg_len:
> +err_desc_get:
> + dwc_desc_put(dwc, first);
> + return NULL;
> +}
> +
> +static struct dma_async_tx_descriptor *
> dwc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
> unsigned int sg_len, enum dma_data_direction direction,
> unsigned long flags)
> @@ -1474,6 +1728,7 @@ static int __init dw_probe(struct platform_device *pdev)
> dw->dma.device_free_chan_resources = dwc_free_chan_resources;
>
> dw->dma.device_prep_dma_memcpy = dwc_prep_dma_memcpy;
> + dw->dma.device_prep_dma_sg = dwc_prep_dma_sg;
>
> dw->dma.device_prep_slave_sg = dwc_prep_slave_sg;
> dw->dma.device_control = dwc_control;
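The core of the quoted dwc_prep_dma_sg() is a lockstep walk over the source and destination lists that emits one chunk of min(slen, dlen) bytes at a time. Along the lines of the suggestion to split out common code, that walk could live in a small helper. This sketch assumes incrementing addresses only and uses hypothetical `struct seg`/`struct chunk` arrays in place of `struct scatterlist`; it is not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for scatterlist entries and the chunks built from them. */
struct seg { uint64_t addr; uint64_t len; };
struct chunk { uint64_t src; uint64_t dst; uint64_t len; };

/*
 * Walk src[] and dst[] in lockstep (incrementing addresses only) and emit
 * one chunk per overlap of the current entries, i.e. len = min(slen, dlen).
 * Returns the number of chunks, or -1 if one list is not fully consumed
 * (e.g. the totals differ) or out[] is too small.
 */
static long sg_pair_walk(const struct seg *src, size_t src_n,
			 const struct seg *dst, size_t dst_n,
			 struct chunk *out, size_t max_out)
{
	size_t si = 0, di = 0, n = 0;
	uint64_t soff = 0, doff = 0;	/* bytes already consumed in src[si], dst[di] */

	while (si < src_n && di < dst_n) {
		uint64_t slen = src[si].len - soff;
		uint64_t dlen = dst[di].len - doff;
		uint64_t len = slen < dlen ? slen : dlen;

		if (len > 0) {	/* zero-length entries are just skipped */
			if (n == max_out)
				return -1;
			out[n].src = src[si].addr + soff;
			out[n].dst = dst[di].addr + doff;
			out[n].len = len;
			n++;
		}

		soff += len;
		doff += len;
		if (soff == src[si].len) {	/* current src entry used up */
			si++;
			soff = 0;
		}
		if (doff == dst[di].len) {	/* current dst entry used up */
			di++;
			doff = 0;
		}
	}

	/* Both lists must be fully consumed at the same time. */
	if (si != src_n || di != dst_n)
		return -1;
	return (long)n;
}
```

With the list walk factored out like this, the per-chunk width selection and descriptor building could become separate helpers, which may also be reusable by the existing memcpy path.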
--
~Vinod
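The width selection in the quoted patch can be checked in isolation. This sketch mirrors its formulas (`src ^ dst` for the best shared width, `(align - src) & (align - dst) & (align - 1)` for the head bytes), using the GCC/Clang builtin `__builtin_ctzll` in place of the kernel's `__ffs` and an assumed 8-byte maximum width. It is simplified: it only counts head, body and tail bytes and does not choose the tail width the way the patch does. `descs_needed()` is a hypothetical helper showing how a per-descriptor item limit such as DWC_MAX_COUNT splits a run into several descriptors.

```c
#include <stdint.h>

#define MAX_WIDTH_BYTES 8u	/* assumed widest transfer unit, as in the patch comment */

struct split {
	unsigned int best_width;	/* log2 of the width used for the body */
	uint64_t head;		/* bytes moved before src and dst reach that alignment */
	uint64_t body;		/* bytes moved at (1 << best_width) per item */
	uint64_t tail;		/* leftover bytes needing a narrower width */
};

static struct split plan_widths(uint64_t src, uint64_t dst, uint64_t len)
{
	struct split s;
	/* Widest unit both addresses can share, capped at MAX_WIDTH_BYTES.
	 * The argument is never zero, so __builtin_ctzll is well defined. */
	unsigned int best = (unsigned int)__builtin_ctzll(MAX_WIDTH_BYTES | (src ^ dst));
	uint64_t align = 1ull << best;
	/* Same formula as the patch; src and dst agree modulo align here. */
	uint64_t head = (align - src) & (align - dst) & (align - 1);

	if (head > len)
		head = len;
	len -= head;

	s.best_width = best;
	s.head = head;
	s.body = len & ~(align - 1);
	s.tail = len - s.body;
	return s;
}

/* Descriptors needed when one descriptor moves at most max_count items
 * of (1 << width) bytes each (max_count must be non-zero). */
static uint64_t descs_needed(uint64_t bytes, unsigned int width, uint64_t max_count)
{
	uint64_t items = bytes >> width;

	return (items + max_count - 1) / max_count;
}
```

For src = 0x1003, dst = 0x2003 and len = 100, both addresses share 8-byte alignment after 5 head bytes, leaving 88 bytes at 8-byte width and a 7-byte tail.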
* Re: dmaengine/Query: What about scatter/gather for mem to mem transfers.
From: Pratyush Anand @ 2012-01-02 11:50 UTC
To: Vinod Koul
Cc: Williams, Dan J, Viresh KUMAR, linux-kernel, Shiraz HASHIM,
Armando VISCONTI, Deepak SIKRI, Vipin KUMAR, Vipul Kumar SAMAR,
Vincenzo FRASCINO, Mirko GARDI, Rajeev KUMAR, Amit VIRDI,
Bhupesh SHARMA, linus.walleij
On 1/2/2012 5:05 PM, Vinod Koul wrote:
> On Tue, 2011-12-20 at 15:45 +0530, Pratyush Anand wrote:
>> On 12/20/2011 2:51 PM, Vinod Koul wrote:
>>> On Thu, 2011-12-15 at 12:26 +0530, Pratyush Anand wrote:
>>>>> That way the existing mechanism would work well for you.
>>>>> You need to split the chunks properly, which is what DMA would do
>>>>> anyway.
>>>>>
>>>>
>>>> Yes, they can be split like this, but then the onus of splitting will
>>>> go to the DMA user driver, and so similar logic would be replicated in
>>>> several places. Therefore, I was thinking of making device_prep_dma_sg
>>>> generic by adding these flags.
>>> Well, I am not sure how adding flags handles this?
>>
>> device_prep_dma_sg has flags as its last argument. The DMA user driver
>> can pass information about inc/dec of src/dst in this flag, which can be
>> used by the DMA driver. I have put one such implementation at the end of
>> the mail.
> With the split done beforehand, would we still need to do this, and hence
> the referred implementation?
>
Even with the split done beforehand, some flag will be needed in this
function, because the hardware also needs to know whether src and dst are
in incrementing or decrementing order.
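The need for a direction flag can be illustrated with a small sketch: for a decrementing transfer the addresses walk downward from the end of the buffer, which is why the quoted patch first moves to the end address and then steps back by the transfer width. `enum addr_dir`, `beat_addr()` and `dirs_supported()` are hypothetical names; the patch itself encodes direction with its DMA_SRC_INC/DEC and DMA_DST_INC/DEC flag bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical direction values standing in for the patch's
 * DMA_SRC_INC/DEC and DMA_DST_INC/DEC flag bits. */
enum addr_dir { ADDR_INC, ADDR_DEC };

/*
 * Address of beat k (0-based) when the buffer [base, base + len) is
 * moved width_bytes at a time. An incrementing walk starts at base; a
 * decrementing walk starts at the last width_bytes unit and moves down.
 */
static uint64_t beat_addr(uint64_t base, uint64_t len, unsigned int width_bytes,
			  enum addr_dir dir, uint64_t k)
{
	if (dir == ADDR_INC)
		return base + k * width_bytes;
	return base + len - (k + 1) * width_bytes;
}

/* Like the patch's TODO, only accept src and dst moving the same way. */
static bool dirs_supported(enum addr_dir src, enum addr_dir dst)
{
	return src == dst;
}
```

The same chunk boundaries describe both directions; only the starting address and step sign differ, and that is exactly the information a pre-split list cannot carry on its own.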
Thread overview: 12+ messages
2011-12-08 4:10 dmaengine/Query: What about scatter/gather for mem to mem transfers Viresh Kumar
2011-12-08 7:37 ` Vinod Koul
2011-12-08 7:50 ` Viresh Kumar
2011-12-14 6:47 ` Pratyush Anand
2011-12-15 4:58 ` Pratyush Anand
2011-12-15 5:06 ` Koul, Vinod
2011-12-15 5:24 ` Pratyush Anand
2011-12-15 6:56 ` Pratyush Anand
2011-12-20 9:21 ` Vinod Koul
2011-12-20 10:15 ` Pratyush Anand
2012-01-02 11:35 ` Vinod Koul
2012-01-02 11:50 ` Pratyush Anand