All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] videobuf2-dma-sg: Minimize the number of dma segments
@ 2013-07-15  9:34 Ricardo Ribalda Delgado
  2013-07-17  8:27 ` Marek Szyprowski
  0 siblings, 1 reply; 10+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-07-15  9:34 UTC (permalink / raw)
  To: Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Mauro Carvalho Chehab, linux-media, open list
  Cc: Ricardo Ribalda Delgado

Most DMA engines have limitations regarding the number of DMA segments
(sg-buffers) that they can handle. Videobuffers can easily spread through
houndreds of pages.

In the previous aproach, the pages were allocated individually, this
could led to the creation houndreds of dma segments (sg-buffers) that
could not be handled by some DMA engines.

This patch tries to minimize the number of DMA segments by using
alloc_pages_exact. In the worst case it will behave as before, but most
of the times it will reduce the number fo dma segments

Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
---
 drivers/media/v4l2-core/videobuf2-dma-sg.c |   49 +++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c b/drivers/media/v4l2-core/videobuf2-dma-sg.c
index 16ae3dc..67a94ab 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
@@ -42,10 +42,44 @@ struct vb2_dma_sg_buf {
 
 static void vb2_dma_sg_put(void *buf_priv);
 
+static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
+		gfp_t gfp_flags)
+{
+	unsigned int last_page = 0;
+	void *vaddr = NULL;
+	unsigned int req_pages;
+
+	while (last_page < buf->sg_desc.num_pages) {
+		req_pages = buf->sg_desc.num_pages-last_page;
+		while (req_pages >= 1) {
+			vaddr = alloc_pages_exact(req_pages*PAGE_SIZE,
+				GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN);
+			if (vaddr)
+				break;
+			req_pages >>= 1;
+		}
+		if (!vaddr) {
+			while (--last_page >= 0)
+				__free_page(buf->pages[last_page]);
+			return -ENOMEM;
+		}
+		while (req_pages) {
+			buf->pages[last_page] = virt_to_page(vaddr);
+			sg_set_page(&buf->sg_desc.sglist[last_page],
+					buf->pages[last_page], PAGE_SIZE, 0);
+			vaddr += PAGE_SIZE;
+			last_page++;
+			req_pages--;
+		}
+	}
+
+	return 0;
+}
+
 static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_flags)
 {
 	struct vb2_dma_sg_buf *buf;
-	int i;
+	int ret;
 
 	buf = kzalloc(sizeof *buf, GFP_KERNEL);
 	if (!buf)
@@ -69,14 +103,9 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
 	if (!buf->pages)
 		goto fail_pages_array_alloc;
 
-	for (i = 0; i < buf->sg_desc.num_pages; ++i) {
-		buf->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO |
-					   __GFP_NOWARN | gfp_flags);
-		if (NULL == buf->pages[i])
-			goto fail_pages_alloc;
-		sg_set_page(&buf->sg_desc.sglist[i],
-			    buf->pages[i], PAGE_SIZE, 0);
-	}
+	ret = vb2_dma_sg_alloc_compacted(buf, gfp_flags);
+	if (ret)
+		goto fail_pages_alloc;
 
 	buf->handler.refcount = &buf->refcount;
 	buf->handler.put = vb2_dma_sg_put;
@@ -89,8 +118,6 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
 	return buf;
 
 fail_pages_alloc:
-	while (--i >= 0)
-		__free_page(buf->pages[i]);
 	kfree(buf->pages);
 
 fail_pages_array_alloc:
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] videobuf2-dma-sg: Minimize the number of dma segments
  2013-07-15  9:34 [PATCH] videobuf2-dma-sg: Minimize the number of dma segments Ricardo Ribalda Delgado
@ 2013-07-17  8:27 ` Marek Szyprowski
  2013-07-17  9:43   ` Ricardo Ribalda Delgado
  0 siblings, 1 reply; 10+ messages in thread
From: Marek Szyprowski @ 2013-07-17  8:27 UTC (permalink / raw)
  To: Ricardo Ribalda Delgado
  Cc: Pawel Osciak, Kyungmin Park, Mauro Carvalho Chehab, linux-media,
	open list

Hello,

On 7/15/2013 11:34 AM, Ricardo Ribalda Delgado wrote:
> Most DMA engines have limitations regarding the number of DMA segments
> (sg-buffers) that they can handle. Videobuffers can easily spread through
> houndreds of pages.
>
> In the previous aproach, the pages were allocated individually, this
> could led to the creation houndreds of dma segments (sg-buffers) that
> could not be handled by some DMA engines.
>
> This patch tries to minimize the number of DMA segments by using
> alloc_pages_exact. In the worst case it will behave as before, but most
> of the times it will reduce the number fo dma segments
>
> Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>

I like the idea, but the code doesn't seem to be correct. buf->pages 
array is
needed for vb2_dma_sg_mmap() function to map pages to userspace. However 
in your
code I don't see any place where you fill it with the pages of order higher
than 0. AFAIR vm_insert_page() can handle compound pages, but 
alloc_pages_exact()
splits such pages into a set of pages of order 0, so you need to change 
your code
to correctly fill buf->pages array.

> ---
>   drivers/media/v4l2-core/videobuf2-dma-sg.c |   49 +++++++++++++++++++++-------
>   1 file changed, 38 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c b/drivers/media/v4l2-core/videobuf2-dma-sg.c
> index 16ae3dc..67a94ab 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
> @@ -42,10 +42,44 @@ struct vb2_dma_sg_buf {
>   
>   static void vb2_dma_sg_put(void *buf_priv);
>   
> +static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
> +		gfp_t gfp_flags)
> +{
> +	unsigned int last_page = 0;
> +	void *vaddr = NULL;
> +	unsigned int req_pages;
> +
> +	while (last_page < buf->sg_desc.num_pages) {
> +		req_pages = buf->sg_desc.num_pages-last_page;
> +		while (req_pages >= 1) {
> +			vaddr = alloc_pages_exact(req_pages*PAGE_SIZE,
> +				GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN);
> +			if (vaddr)
> +				break;
> +			req_pages >>= 1;
> +		}
> +		if (!vaddr) {
> +			while (--last_page >= 0)
> +				__free_page(buf->pages[last_page]);
> +			return -ENOMEM;
> +		}
> +		while (req_pages) {
> +			buf->pages[last_page] = virt_to_page(vaddr);
> +			sg_set_page(&buf->sg_desc.sglist[last_page],
> +					buf->pages[last_page], PAGE_SIZE, 0);
> +			vaddr += PAGE_SIZE;
> +			last_page++;
> +			req_pages--;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>   static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_flags)
>   {
>   	struct vb2_dma_sg_buf *buf;
> -	int i;
> +	int ret;
>   
>   	buf = kzalloc(sizeof *buf, GFP_KERNEL);
>   	if (!buf)
> @@ -69,14 +103,9 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
>   	if (!buf->pages)
>   		goto fail_pages_array_alloc;
>   
> -	for (i = 0; i < buf->sg_desc.num_pages; ++i) {
> -		buf->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO |
> -					   __GFP_NOWARN | gfp_flags);
> -		if (NULL == buf->pages[i])
> -			goto fail_pages_alloc;
> -		sg_set_page(&buf->sg_desc.sglist[i],
> -			    buf->pages[i], PAGE_SIZE, 0);
> -	}
> +	ret = vb2_dma_sg_alloc_compacted(buf, gfp_flags);
> +	if (ret)
> +		goto fail_pages_alloc;
>   
>   	buf->handler.refcount = &buf->refcount;
>   	buf->handler.put = vb2_dma_sg_put;
> @@ -89,8 +118,6 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
>   	return buf;
>   
>   fail_pages_alloc:
> -	while (--i >= 0)
> -		__free_page(buf->pages[i]);
>   	kfree(buf->pages);
>   
>   fail_pages_array_alloc:

Best regards
-- 
Marek Szyprowski
Samsung R&D Institute Poland



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] videobuf2-dma-sg: Minimize the number of dma segments
  2013-07-17  8:27 ` Marek Szyprowski
@ 2013-07-17  9:43   ` Ricardo Ribalda Delgado
  2013-07-17 13:42     ` Marek Szyprowski
  2013-07-17 14:05     ` Ricardo Ribalda Delgado
  0 siblings, 2 replies; 10+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-07-17  9:43 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Pawel Osciak, Kyungmin Park, Mauro Carvalho Chehab, linux-media,
	open list

Hi Marek

 alloc_pages_exact returns pages of order 0, every single page is
filled into buf->pages, that then is used by vb2_dma_sg_mmap(), that
also expects order 0 pages (its loops increments in PAGE_SIZE). The
code has been tested on real HW.

Your concern is that that alloc_pages_exact splits the higher order pages?

Do you want that videobuf2-dma-sg to have support for higher order pages?


Best regards

PS: sorry for the duplicated post, previous one contained html parts
and was rejected by the list

On Wed, Jul 17, 2013 at 10:27 AM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> Hello,
>
>
> On 7/15/2013 11:34 AM, Ricardo Ribalda Delgado wrote:
>>
>> Most DMA engines have limitations regarding the number of DMA segments
>> (sg-buffers) that they can handle. Videobuffers can easily spread through
>> houndreds of pages.
>>
>> In the previous aproach, the pages were allocated individually, this
>> could led to the creation houndreds of dma segments (sg-buffers) that
>> could not be handled by some DMA engines.
>>
>> This patch tries to minimize the number of DMA segments by using
>> alloc_pages_exact. In the worst case it will behave as before, but most
>> of the times it will reduce the number fo dma segments
>>
>> Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
>
>
> I like the idea, but the code doesn't seem to be correct. buf->pages array
> is
> needed for vb2_dma_sg_mmap() function to map pages to userspace. However in
> your
> code I don't see any place where you fill it with the pages of order higher
> than 0. AFAIR vm_insert_page() can handle compound pages, but
> alloc_pages_exact()
> splits such pages into a set of pages of order 0, so you need to change your
> code
> to correctly fill buf->pages array.
>
>
>> ---
>>   drivers/media/v4l2-core/videobuf2-dma-sg.c |   49
>> +++++++++++++++++++++-------
>>   1 file changed, 38 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c
>> b/drivers/media/v4l2-core/videobuf2-dma-sg.c
>> index 16ae3dc..67a94ab 100644
>> --- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
>> +++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
>> @@ -42,10 +42,44 @@ struct vb2_dma_sg_buf {
>>     static void vb2_dma_sg_put(void *buf_priv);
>>   +static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
>> +               gfp_t gfp_flags)
>> +{
>> +       unsigned int last_page = 0;
>> +       void *vaddr = NULL;
>> +       unsigned int req_pages;
>> +
>> +       while (last_page < buf->sg_desc.num_pages) {
>> +               req_pages = buf->sg_desc.num_pages-last_page;
>> +               while (req_pages >= 1) {
>> +                       vaddr = alloc_pages_exact(req_pages*PAGE_SIZE,
>> +                               GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN);
>> +                       if (vaddr)
>> +                               break;
>> +                       req_pages >>= 1;
>> +               }
>> +               if (!vaddr) {
>> +                       while (--last_page >= 0)
>> +                               __free_page(buf->pages[last_page]);
>> +                       return -ENOMEM;
>> +               }
>> +               while (req_pages) {
>> +                       buf->pages[last_page] = virt_to_page(vaddr);
>> +                       sg_set_page(&buf->sg_desc.sglist[last_page],
>> +                                       buf->pages[last_page], PAGE_SIZE,
>> 0);
>> +                       vaddr += PAGE_SIZE;
>> +                       last_page++;
>> +                       req_pages--;
>> +               }
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>   static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t
>> gfp_flags)
>>   {
>>         struct vb2_dma_sg_buf *buf;
>> -       int i;
>> +       int ret;
>>         buf = kzalloc(sizeof *buf, GFP_KERNEL);
>>         if (!buf)
>> @@ -69,14 +103,9 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx,
>> unsigned long size, gfp_t gfp_fla
>>         if (!buf->pages)
>>                 goto fail_pages_array_alloc;
>>   -     for (i = 0; i < buf->sg_desc.num_pages; ++i) {
>> -               buf->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO |
>> -                                          __GFP_NOWARN | gfp_flags);
>> -               if (NULL == buf->pages[i])
>> -                       goto fail_pages_alloc;
>> -               sg_set_page(&buf->sg_desc.sglist[i],
>> -                           buf->pages[i], PAGE_SIZE, 0);
>> -       }
>> +       ret = vb2_dma_sg_alloc_compacted(buf, gfp_flags);
>> +       if (ret)
>> +               goto fail_pages_alloc;
>>         buf->handler.refcount = &buf->refcount;
>>         buf->handler.put = vb2_dma_sg_put;
>> @@ -89,8 +118,6 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned
>> long size, gfp_t gfp_fla
>>         return buf;
>>     fail_pages_alloc:
>> -       while (--i >= 0)
>> -               __free_page(buf->pages[i]);
>>         kfree(buf->pages);
>>     fail_pages_array_alloc:
>
>
> Best regards
> --
> Marek Szyprowski
> Samsung R&D Institute Poland
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] videobuf2-dma-sg: Minimize the number of dma segments
  2013-07-17  9:43   ` Ricardo Ribalda Delgado
@ 2013-07-17 13:42     ` Marek Szyprowski
  2013-07-17 14:20       ` Ricardo Ribalda Delgado
  2013-07-17 14:05     ` Ricardo Ribalda Delgado
  1 sibling, 1 reply; 10+ messages in thread
From: Marek Szyprowski @ 2013-07-17 13:42 UTC (permalink / raw)
  To: Ricardo Ribalda Delgado
  Cc: Pawel Osciak, Kyungmin Park, Mauro Carvalho Chehab, linux-media,
	open list

Hello,

On 7/17/2013 11:43 AM, Ricardo Ribalda Delgado wrote:
> Hi Marek
>
>   alloc_pages_exact returns pages of order 0, every single page is
> filled into buf->pages, that then is used by vb2_dma_sg_mmap(), that
> also expects order 0 pages (its loops increments in PAGE_SIZE). The
> code has been tested on real HW.
>
> Your concern is that that alloc_pages_exact splits the higher order pages?
>
> Do you want that videobuf2-dma-sg to have support for higher order pages?

Ah... My fault. I didn't notice that you recalculate req_pages at the
begginning of each loop iteration, so the code is correct, buf->pages is
filled correctly with order 0 pages.

So now the only issue I see is the oversized sglist allocation (the size
of sg list is computed for worst case, 0 order pages) and lack of the max
segment size support. Sadly there are devices which can handle single sg
chunk up to some predefined size (see dma_get_max_seg_size() function).

For some reference code, please check __iommu_map_sg() and maybe
__iommu_alloc_buffer() functions in arch/arm/mm/dma-mapping.c

Best regards
-- 
Marek Szyprowski
Samsung R&D Institute Poland



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] videobuf2-dma-sg: Minimize the number of dma segments
  2013-07-17  9:43   ` Ricardo Ribalda Delgado
  2013-07-17 13:42     ` Marek Szyprowski
@ 2013-07-17 14:05     ` Ricardo Ribalda Delgado
  1 sibling, 0 replies; 10+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-07-17 14:05 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Pawel Osciak, Kyungmin Park, Mauro Carvalho Chehab, linux-media,
	open list

Hello again

I have made some experiments and have replaced alloc_pages_exact with
alloc_pages of order N. Unfortunatelly vm_insert_page and vm_map_ram
does not work as expected.

vm_insert_page, only insert the PAGE_SIZE bytes of the higher order
page, if I try to add the other pages manually, the function returns
-22 due to count=0.
vm_map_ram is designed to work with exactly order 0 pages.

it works fine if I manually split the page, but the header file
suggest that function should not be called by the drivers... (although
this is not a device driver).

Here you can see my experiment in case you want to replicate it.


---
 drivers/media/v4l2-core/videobuf2-dma-sg.c |   89 ++++++++++++++++++++++------
 1 file changed, 72 insertions(+), 17 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c
b/drivers/media/v4l2-core/videobuf2-dma-sg.c
index 16ae3dc..bbb3680 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
@@ -24,6 +24,8 @@
 static int debug;
 module_param(debug, int, 0644);

+#define PAGE_SPLIT
+
 #define dprintk(level, fmt, arg...) \
  do { \
  if (debug >= level) \
@@ -42,10 +44,69 @@ struct vb2_dma_sg_buf {

 static void vb2_dma_sg_put(void *buf_priv);

+static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
+ gfp_t gfp_flags)
+{
+ unsigned int last_page = 0;
+ int size = buf->sg_desc.size;
+
+ while (size > 0) {
+ struct page *pages;
+ int order;
+#ifdef PAGE_SPLIT
+ int i;
+#endif
+
+ order = get_order(size);
+ /* Dont over allocate*/
+ if ((PAGE_SIZE << order) > size)
+ order--;
+
+ pages = NULL;
+ while (!pages) {
+ pages = alloc_pages(GFP_KERNEL | __GFP_ZERO |
+ __GFP_NOWARN | gfp_flags, order);
+ if (pages)
+ break;
+
+
+ if (order == 0)
+ while (--last_page >= 0) {
+ __free_pages(
+ buf->pages[last_page],
+ get_order(sg_dma_len
+ (&buf->sg_desc.sglist[last_page])));
+ return -ENOMEM;
+ }
+ order--;
+ }
+
+#ifdef PAGE_SPLIT
+ split_page(pages, order);
+ for (i = 0; i < (1<<order); i++) {
+ buf->pages[last_page] = pages + i;
+ sg_set_page(&buf->sg_desc.sglist[last_page],
+ buf->pages[last_page], PAGE_SIZE, 0);
+ last_page++;
+ }
+#else
+ buf->pages[last_page] = pages;
+ sg_set_page(&buf->sg_desc.sglist[last_page],
+ buf->pages[last_page], PAGE_SIZE << order, 0);
+ last_page++;
+#endif
+ size -= PAGE_SIZE << order;
+ }
+
+ buf->sg_desc.num_pages = last_page;
+
+ return 0;
+}
+
 static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size,
gfp_t gfp_flags)
 {
  struct vb2_dma_sg_buf *buf;
- int i;
+ int ret;

  buf = kzalloc(sizeof *buf, GFP_KERNEL);
  if (!buf)
@@ -69,14 +130,9 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx,
unsigned long size, gfp_t gfp_fla
  if (!buf->pages)
  goto fail_pages_array_alloc;

- for (i = 0; i < buf->sg_desc.num_pages; ++i) {
- buf->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO |
-   __GFP_NOWARN | gfp_flags);
- if (NULL == buf->pages[i])
- goto fail_pages_alloc;
- sg_set_page(&buf->sg_desc.sglist[i],
-    buf->pages[i], PAGE_SIZE, 0);
- }
+ ret = vb2_dma_sg_alloc_compacted(buf, gfp_flags);
+ if (ret)
+ goto fail_pages_alloc;

  buf->handler.refcount = &buf->refcount;
  buf->handler.put = vb2_dma_sg_put;
@@ -89,8 +145,6 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx,
unsigned long size, gfp_t gfp_fla
  return buf;

 fail_pages_alloc:
- while (--i >= 0)
- __free_page(buf->pages[i]);
  kfree(buf->pages);

 fail_pages_array_alloc:
@@ -111,9 +165,10 @@ static void vb2_dma_sg_put(void *buf_priv)
  buf->sg_desc.num_pages);
  if (buf->vaddr)
  vm_unmap_ram(buf->vaddr, buf->sg_desc.num_pages);
- vfree(buf->sg_desc.sglist);
  while (--i >= 0)
- __free_page(buf->pages[i]);
+ __free_pages(buf->pages[i], get_order(
+ sg_dma_len(&buf->sg_desc.sglist[i])));
+ vfree(buf->sg_desc.sglist);
  kfree(buf->pages);
  kfree(buf);
  }
@@ -248,14 +303,14 @@ static int vb2_dma_sg_mmap(void *buf_priv,
struct vm_area_struct *vma)
  do {
  int ret;

- ret = vm_insert_page(vma, uaddr, buf->pages[i++]);
+ ret = vm_insert_page(vma, uaddr, buf->pages[i]);
  if (ret) {
  printk(KERN_ERR "Remapping memory, error: %d\n", ret);
  return ret;
  }
-
- uaddr += PAGE_SIZE;
- usize -= PAGE_SIZE;
+ uaddr += sg_dma_len(&buf->sg_desc.sglist[i]);
+ usize -= sg_dma_len(&buf->sg_desc.sglist[i]);
+ i++;
  } while (usize > 0);


--
1.7.10.4



On Wed, Jul 17, 2013 at 11:43 AM, Ricardo Ribalda Delgado
<ricardo.ribalda@gmail.com> wrote:
> Hi Marek
>
>  alloc_pages_exact returns pages of order 0, every single page is
> filled into buf->pages, that then is used by vb2_dma_sg_mmap(), that
> also expects order 0 pages (its loops increments in PAGE_SIZE). The
> code has been tested on real HW.
>
> Your concern is that that alloc_pages_exact splits the higher order pages?
>
> Do you want that videobuf2-dma-sg to have support for higher order pages?
>
>
> Best regards
>
> PS: sorry for the duplicated post, previous one contained html parts
> and was rejected by the list
>
> On Wed, Jul 17, 2013 at 10:27 AM, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> Hello,
>>
>>
>> On 7/15/2013 11:34 AM, Ricardo Ribalda Delgado wrote:
>>>
>>> Most DMA engines have limitations regarding the number of DMA segments
>>> (sg-buffers) that they can handle. Videobuffers can easily spread through
>>> houndreds of pages.
>>>
>>> In the previous aproach, the pages were allocated individually, this
>>> could led to the creation houndreds of dma segments (sg-buffers) that
>>> could not be handled by some DMA engines.
>>>
>>> This patch tries to minimize the number of DMA segments by using
>>> alloc_pages_exact. In the worst case it will behave as before, but most
>>> of the times it will reduce the number fo dma segments
>>>
>>> Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
>>
>>
>> I like the idea, but the code doesn't seem to be correct. buf->pages array
>> is
>> needed for vb2_dma_sg_mmap() function to map pages to userspace. However in
>> your
>> code I don't see any place where you fill it with the pages of order higher
>> than 0. AFAIR vm_insert_page() can handle compound pages, but
>> alloc_pages_exact()
>> splits such pages into a set of pages of order 0, so you need to change your
>> code
>> to correctly fill buf->pages array.
>>
>>
>>> ---
>>>   drivers/media/v4l2-core/videobuf2-dma-sg.c |   49
>>> +++++++++++++++++++++-------
>>>   1 file changed, 38 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c
>>> b/drivers/media/v4l2-core/videobuf2-dma-sg.c
>>> index 16ae3dc..67a94ab 100644
>>> --- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
>>> +++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
>>> @@ -42,10 +42,44 @@ struct vb2_dma_sg_buf {
>>>     static void vb2_dma_sg_put(void *buf_priv);
>>>   +static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
>>> +               gfp_t gfp_flags)
>>> +{
>>> +       unsigned int last_page = 0;
>>> +       void *vaddr = NULL;
>>> +       unsigned int req_pages;
>>> +
>>> +       while (last_page < buf->sg_desc.num_pages) {
>>> +               req_pages = buf->sg_desc.num_pages-last_page;
>>> +               while (req_pages >= 1) {
>>> +                       vaddr = alloc_pages_exact(req_pages*PAGE_SIZE,
>>> +                               GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN);
>>> +                       if (vaddr)
>>> +                               break;
>>> +                       req_pages >>= 1;
>>> +               }
>>> +               if (!vaddr) {
>>> +                       while (--last_page >= 0)
>>> +                               __free_page(buf->pages[last_page]);
>>> +                       return -ENOMEM;
>>> +               }
>>> +               while (req_pages) {
>>> +                       buf->pages[last_page] = virt_to_page(vaddr);
>>> +                       sg_set_page(&buf->sg_desc.sglist[last_page],
>>> +                                       buf->pages[last_page], PAGE_SIZE,
>>> 0);
>>> +                       vaddr += PAGE_SIZE;
>>> +                       last_page++;
>>> +                       req_pages--;
>>> +               }
>>> +       }
>>> +
>>> +       return 0;
>>> +}
>>> +
>>>   static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t
>>> gfp_flags)
>>>   {
>>>         struct vb2_dma_sg_buf *buf;
>>> -       int i;
>>> +       int ret;
>>>         buf = kzalloc(sizeof *buf, GFP_KERNEL);
>>>         if (!buf)
>>> @@ -69,14 +103,9 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx,
>>> unsigned long size, gfp_t gfp_fla
>>>         if (!buf->pages)
>>>                 goto fail_pages_array_alloc;
>>>   -     for (i = 0; i < buf->sg_desc.num_pages; ++i) {
>>> -               buf->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO |
>>> -                                          __GFP_NOWARN | gfp_flags);
>>> -               if (NULL == buf->pages[i])
>>> -                       goto fail_pages_alloc;
>>> -               sg_set_page(&buf->sg_desc.sglist[i],
>>> -                           buf->pages[i], PAGE_SIZE, 0);
>>> -       }
>>> +       ret = vb2_dma_sg_alloc_compacted(buf, gfp_flags);
>>> +       if (ret)
>>> +               goto fail_pages_alloc;
>>>         buf->handler.refcount = &buf->refcount;
>>>         buf->handler.put = vb2_dma_sg_put;
>>> @@ -89,8 +118,6 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned
>>> long size, gfp_t gfp_fla
>>>         return buf;
>>>     fail_pages_alloc:
>>> -       while (--i >= 0)
>>> -               __free_page(buf->pages[i]);
>>>         kfree(buf->pages);
>>>     fail_pages_array_alloc:
>>
>>
>> Best regards
>> --
>> Marek Szyprowski
>> Samsung R&D Institute Poland
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-media" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Ricardo Ribalda



-- 
Ricardo Ribalda

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] videobuf2-dma-sg: Minimize the number of dma segments
  2013-07-17 13:42     ` Marek Szyprowski
@ 2013-07-17 14:20       ` Ricardo Ribalda Delgado
  2013-07-18  7:15         ` Marek Szyprowski
  0 siblings, 1 reply; 10+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-07-17 14:20 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Pawel Osciak, Kyungmin Park, Mauro Carvalho Chehab, linux-media,
	open list

Hello again Marek

In my system I am doing the scatter gather compaction on device
driver... But I agree that it would be better done on the vb2 layer.

For the oversize sglist we could do one of this two things.

If we want to have a simple pass processing we have to allocate an
structure A for the worts case, work on that structure. then allocate
a structure B for the exact size that we need, memcpy A to B, and
free(A).

Otherwise we need two passes. One to allocate the pages, and another
one to allocate the pages and find out the amount of sg, and another
to greate the sg structure.

What do you prefer?


Regards!




On Wed, Jul 17, 2013 at 3:42 PM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> Hello,
>
>
> On 7/17/2013 11:43 AM, Ricardo Ribalda Delgado wrote:
>>
>> Hi Marek
>>
>>   alloc_pages_exact returns pages of order 0, every single page is
>> filled into buf->pages, that then is used by vb2_dma_sg_mmap(), that
>> also expects order 0 pages (its loops increments in PAGE_SIZE). The
>> code has been tested on real HW.
>>
>> Your concern is that that alloc_pages_exact splits the higher order pages?
>>
>> Do you want that videobuf2-dma-sg to have support for higher order pages?
>
>
> Ah... My fault. I didn't notice that you recalculate req_pages at the
> begginning of each loop iteration, so the code is correct, buf->pages is
> filled correctly with order 0 pages.
>
> So now the only issue I see is the oversized sglist allocation (the size
> of sg list is computed for worst case, 0 order pages) and lack of the max
> segment size support. Sadly there are devices which can handle single sg
> chunk up to some predefined size (see dma_get_max_seg_size() function).
>
> For some reference code, please check __iommu_map_sg() and maybe
> __iommu_alloc_buffer() functions in arch/arm/mm/dma-mapping.c
>
>
> Best regards
> --
> Marek Szyprowski
> Samsung R&D Institute Poland
>
>



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] videobuf2-dma-sg: Minimize the number of dma segments
  2013-07-17 14:20       ` Ricardo Ribalda Delgado
@ 2013-07-18  7:15         ` Marek Szyprowski
  2013-07-18  7:39           ` Ricardo Ribalda Delgado
  0 siblings, 1 reply; 10+ messages in thread
From: Marek Szyprowski @ 2013-07-18  7:15 UTC (permalink / raw)
  To: Ricardo Ribalda Delgado
  Cc: Pawel Osciak, Kyungmin Park, Mauro Carvalho Chehab, linux-media,
	open list

Hello,

On 7/17/2013 4:20 PM, Ricardo Ribalda Delgado wrote:
> Hello again Marek
>
> In my system I am doing the scatter gather compaction on device
> driver... But I agree that it would be better done on the vb2 layer.
>
> For the oversize sglist we could do one of this two things.
>
> If we want to have a simple pass processing we have to allocate an
> structure A for the worts case, work on that structure. then allocate
> a structure B for the exact size that we need, memcpy A to B, and
> free(A).
>
> Otherwise we need two passes. One to allocate the pages, and another
> one to allocate the pages and find out the amount of sg, and another
> to greate the sg structure.
>
> What do you prefer?

I would prefer two passes approach. In the first pass you just fill the
buf->pages array with order 0 entries and count the total number of memory
chunks (adding support for max dma segment size at this point should be
quite easy). In the second pass you just allocate the scatter list and
fill it with previously allocated pages.

I have also the following changes on my TODO list for vb2-dma-sg:
- remove custom vb2_dma_sg_desc structure and replace it with common
sg_table structure
- move mapping of the scatter list from device driver to vb2-dma-sg module
to simplify driver code and unify memory management across devices (so the
driver just gets correctly mapped scatter list and only reads dma addresses
of each memory chunk, no longer needs to track buffer state/ownership).
The correct flow is to call dma_map_sg() at buffer allocation,
dma_unmap_sg() at free and dma_sync_for_{device,cpu} in prepare/finish
callbacks. The only problem here is the need to convert all existing users
of vb2-dma-sg (marvell-ccic and solo6x10) to the new interface.

However I have completely no time atm to do any of the above changes. Would
You like to take any of the above tasks while playing with vb2-dma-sg?

Best regards
-- 
Marek Szyprowski
Samsung R&D Institute Poland



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] videobuf2-dma-sg: Minimize the number of dma segments
  2013-07-18  7:15         ` Marek Szyprowski
@ 2013-07-18  7:39           ` Ricardo Ribalda Delgado
  2013-07-18 13:34             ` Marek Szyprowski
  0 siblings, 1 reply; 10+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-07-18  7:39 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Pawel Osciak, Kyungmin Park, Mauro Carvalho Chehab, linux-media,
	open list

Hello again:

I have started to implemt it, but I think there is more hidden work in
this task as it seems. In order to call dma_map_sg and
max_dma_segment_size I need acess to the struct device, but (correct
me if I am wrong), vb2 is device agnostic. Adding the above
functionality will mean not only updating marvell-ccic and solo6x10,
but updating all the vb2 buffers.

Also after some readings, maybe the sg compactation should not be done
here, but in dma_map_sg. According to the doc:

"""
The implementation is free to merge several consecutive sglist entries
into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
consecutive sglist entries can be merged into one provided the first one
ends and the second one starts on a page boundary - in fact this is a huge
advantage for cards which either cannot do scatter-gather or have very
limited number of scatter-gather entries) and returns the actual number
of sg entries it mapped them to. On failure 0 is returned.
"""

So, my proposal would be to alloc with alloc_pages to try to get
memory as coherent as possible, then split the page, set the sg in
PAGE_SIZE lenghts, and then let the dma_map_sg do its magic. if it
doesnt do compactation, fix dma_map_sg, so more driver could take
advantage of it.

I could also of course fix marvell-ccic and solo6x10 to use sg_table.

Does anything of this make sense?


Regards

On Thu, Jul 18, 2013 at 9:15 AM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> Hello,
>
>
> On 7/17/2013 4:20 PM, Ricardo Ribalda Delgado wrote:
>>
>> Hello again Marek
>>
>> In my system I am doing the scatter gather compaction on device
>> driver... But I agree that it would be better done on the vb2 layer.
>>
>> For the oversize sglist we could do one of this two things.
>>
>> If we want to have a simple pass processing we have to allocate an
>> structure A for the worts case, work on that structure. then allocate
>> a structure B for the exact size that we need, memcpy A to B, and
>> free(A).
>>
>> Otherwise we need two passes. One to allocate the pages, and another
>> one to allocate the pages and find out the amount of sg, and another
>> to greate the sg structure.
>>
>> What do you prefer?
>
>
> I would prefer two passes approach. In the first pass you just fill the
> buf->pages array with order 0 entries and count the total number of memory
> chunks (adding support for max dma segment size at this point should be
> quite easy). In the second pass you just allocate the scatter list and
> fill it with previously allocated pages.
>
> I have also the following changes on my TODO list for vb2-dma-sg:
> - remove custom vb2_dma_sg_desc structure and replace it with common
> sg_table structure
> - move mapping of the scatter list from device driver to vb2-dma-sg module
> to simplify driver code and unify memory management across devices (so the
> driver just gets correctly mapped scatter list and only reads dma addresses
> of each memory chunk, no longer needs to track buffer state/ownership).
> The correct flow is to call dma_map_sg() at buffer allocation,
> dma_unmap_sg() at free and dma_sync_for_{device,cpu} in prepare/finish
> callbacks. The only problem here is the need to convert all existing users
> of vb2-dma-sg (marvell-ccic and solo6x10) to the new interface.
>
> However I have completely no time atm to do any of the above changes. Would
> You like to take any of the above tasks while playing with vb2-dma-sg?
>
>
> Best regards
> --
> Marek Szyprowski
> Samsung R&D Institute Poland
>
>



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] videobuf2-dma-sg: Minimize the number of dma segments
  2013-07-18  7:39           ` Ricardo Ribalda Delgado
@ 2013-07-18 13:34             ` Marek Szyprowski
  2013-07-19  8:03               ` Ricardo Ribalda Delgado
  0 siblings, 1 reply; 10+ messages in thread
From: Marek Szyprowski @ 2013-07-18 13:34 UTC (permalink / raw)
  To: Ricardo Ribalda Delgado
  Cc: Pawel Osciak, Kyungmin Park, Mauro Carvalho Chehab, linux-media,
	open list

Hello,

On 7/18/2013 9:39 AM, Ricardo Ribalda Delgado wrote:
> Hello again:
>
> I have started to implemt it, but I think there is more hidden work in
> this task as it seems. In order to call dma_map_sg and
> max_dma_segment_size I need acess to the struct device, but (correct
> me if I am wrong), vb2 is device agnostic. Adding the above
> functionality will mean not only updating marvell-ccic and solo6x10,
> but updating all the vb2 buffers.

For getting device pointer, vb2-dma-sg need to be extended with so called
'allocator context'. Please check how it is done in vb2-dma-contig
(vb2_dma_contig_init_ctx() function).


> Also after some readings, maybe the sg compactation should not be done
> here, but in dma_map_sg. According to the doc:
>
> """
> The implementation is free to merge several consecutive sglist entries
> into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
> consecutive sglist entries can be merged into one provided the first one
> ends and the second one starts on a page boundary - in fact this is a huge
> advantage for cards which either cannot do scatter-gather or have very
> limited number of scatter-gather entries) and returns the actual number
> of sg entries it mapped them to. On failure 0 is returned.
> """
>
> So, my proposal would be to alloc with alloc_pages to try to get
> memory as coherent as possible, then split the page, set the sg in
> PAGE_SIZE lenghts, and then let the dma_map_sg do its magic. if it
> doesnt do compactation, fix dma_map_sg, so more driver could take
> advantage of it.

Right, this approach is probably the best one, but this way you would need
to do the compaction in every dma-mapping implementation for every supported
architecture. IMHO vb2-dma-sg can help dma-mapping by at least by allocating
memory in larger chunks and constructing shorter scatter list. Updating
dma-mapping functions across all architectures is a lot of work and testing,
so for initial version we should focus on vb2-dma-sg. Memory allocators
already do some work to ease mapping a buffer to dma space.

> I could also of course fix marvell-ccic and solo6x10 to use sg_table.
>
> Does anything of this make sense?

I would also like to help you as much as possible, but for the next 10 days
I will be not available for both personal reasons and holidays. If you have
any questions, feel free to leave them on my mail, I will reply asap I get
back.

Best regards
-- 
Marek Szyprowski
Samsung R&D Institute Poland



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] videobuf2-dma-sg: Minimize the number of dma segments
  2013-07-18 13:34             ` Marek Szyprowski
@ 2013-07-19  8:03               ` Ricardo Ribalda Delgado
  0 siblings, 0 replies; 10+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-07-19  8:03 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Pawel Osciak, Kyungmin Park, Mauro Carvalho Chehab, linux-media,
	open list

Hello Marek

I have prepared a new set of patches, please take a look to them. The
series implements the coherent allocation, segments compaction and use
of sg_table, it does not implement the dma_map/dma_unmap/dma_sync, I
rather work on that one when you are back.

Thanks for your help

On Thu, Jul 18, 2013 at 3:34 PM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> Hello,
>
>
> On 7/18/2013 9:39 AM, Ricardo Ribalda Delgado wrote:
>>
>> Hello again:
>>
>> I have started to implemt it, but I think there is more hidden work in
>> this task as it seems. In order to call dma_map_sg and
>> max_dma_segment_size I need acess to the struct device, but (correct
>> me if I am wrong), vb2 is device agnostic. Adding the above
>> functionality will mean not only updating marvell-ccic and solo6x10,
>> but updating all the vb2 buffers.
>
>
> For getting device pointer, vb2-dma-sg need to be extended with so called
> 'allocator context'. Please check how it is done in vb2-dma-contig
> (vb2_dma_contig_init_ctx() function).
>
>
>
>> Also after some readings, maybe the sg compactation should not be done
>> here, but in dma_map_sg. According to the doc:
>>
>> """
>> The implementation is free to merge several consecutive sglist entries
>> into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
>> consecutive sglist entries can be merged into one provided the first one
>> ends and the second one starts on a page boundary - in fact this is a huge
>> advantage for cards which either cannot do scatter-gather or have very
>> limited number of scatter-gather entries) and returns the actual number
>> of sg entries it mapped them to. On failure 0 is returned.
>> """
>>
>> So, my proposal would be to alloc with alloc_pages to try to get
>> memory as coherent as possible, then split the page, set the sg in
>> PAGE_SIZE lenghts, and then let the dma_map_sg do its magic. if it
>> doesnt do compactation, fix dma_map_sg, so more driver could take
>> advantage of it.
>
>
> Right, this approach is probably the best one, but this way you would need
> to do the compaction in every dma-mapping implementation for every supported
> architecture. IMHO vb2-dma-sg can help dma-mapping by at least by allocating
> memory in larger chunks and constructing shorter scatter list. Updating
> dma-mapping functions across all architectures is a lot of work and testing,
> so for initial version we should focus on vb2-dma-sg. Memory allocators
> already do some work to ease mapping a buffer to dma space.
>
>
>> I could also of course fix marvell-ccic and solo6x10 to use sg_table.
>>
>> Does anything of this make sense?
>
>
> I would also like to help you as much as possible, but for the next 10 days
> I will be not available for both personal reasons and holidays. If you have
> any questions, feel free to leave them on my mail, I will reply asap I get
> back.
>
>
> Best regards
> --
> Marek Szyprowski
> Samsung R&D Institute Poland
>
>



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-07-19  8:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-15  9:34 [PATCH] videobuf2-dma-sg: Minimize the number of dma segments Ricardo Ribalda Delgado
2013-07-17  8:27 ` Marek Szyprowski
2013-07-17  9:43   ` Ricardo Ribalda Delgado
2013-07-17 13:42     ` Marek Szyprowski
2013-07-17 14:20       ` Ricardo Ribalda Delgado
2013-07-18  7:15         ` Marek Szyprowski
2013-07-18  7:39           ` Ricardo Ribalda Delgado
2013-07-18 13:34             ` Marek Szyprowski
2013-07-19  8:03               ` Ricardo Ribalda Delgado
2013-07-17 14:05     ` Ricardo Ribalda Delgado

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.