linux-media.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/2] videobuf2-dma-sg: Contiguos memory allocation
@ 2013-08-02 14:19 Ricardo Ribalda Delgado
  2013-08-02 14:19 ` [PATCH v4 1/2] videobuf2-dma-sg: Allocate pages as contiguous as possible Ricardo Ribalda Delgado
  2013-08-02 14:20 ` [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table Ricardo Ribalda Delgado
  0 siblings, 2 replies; 11+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-08-02 14:19 UTC (permalink / raw)
  To: Jonathan Corbet, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Ismael Luceno,
	Greg Kroah-Hartman, linux-media, Sylwester Nawrocki
  Cc: Ricardo Ribalda Delgado

Allocate memory as contiguos as possible to support dma engines with limitated amount of sg-descriptors.

Replace private structer vb2_dma_sg_desc with generic struct sg_table.

PS: This series of patches is the evolution of my previous patch for vb2-dma-sg to allocate the memory as contiguos as possible.

v4: Constains feedback from Andre Heider
Andre: spelling, spaces around <<  and &pages

v3: Constains feedback from Andre Heider
Andre: Fix error handling (--pages) was wrongly fixed

v2: Contains feedback from Andre Heider and Sylwester Nawrocki

Andre: Fix error handling (--pages)
Sylwester: Squash p3 and p4 into p2


Ricardo Ribalda Delgado (2):
  videobuf2-dma-sg: Allocate pages as contiguous as possible
  videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table

 drivers/media/platform/marvell-ccic/mcam-core.c    |   14 +-
 drivers/media/v4l2-core/videobuf2-dma-sg.c         |  149 +++++++++++---------
 drivers/staging/media/solo6x10/solo6x10-v4l2-enc.c |   20 +--
 include/media/videobuf2-dma-sg.h                   |   10 +-
 4 files changed, 105 insertions(+), 88 deletions(-)

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/2] videobuf2-dma-sg: Allocate pages as contiguous as possible
  2013-08-02 14:19 [PATCH v4 0/2] videobuf2-dma-sg: Contiguos memory allocation Ricardo Ribalda Delgado
@ 2013-08-02 14:19 ` Ricardo Ribalda Delgado
  2013-08-02 14:20 ` [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table Ricardo Ribalda Delgado
  1 sibling, 0 replies; 11+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-08-02 14:19 UTC (permalink / raw)
  To: Jonathan Corbet, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Ismael Luceno,
	Greg Kroah-Hartman, linux-media, Sylwester Nawrocki
  Cc: Ricardo Ribalda Delgado

Most DMA engines have limitations regarding the number of DMA segments
(sg-buffers) that they can handle. Videobuffers can easily spread
through hundreds of pages.

In the previous aproach, the pages were allocated individually, this
could led to the creation houndreds of dma segments (sg-buffers) that
could not be handled by some DMA engines.

This patch tries to minimize the number of DMA segments by using
alloc_pages. In the worst case it will behave as before, but most
of the times it will reduce the number of dma segments

Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
---
 drivers/media/v4l2-core/videobuf2-dma-sg.c |   60 +++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 11 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c b/drivers/media/v4l2-core/videobuf2-dma-sg.c
index 16ae3dc..4999c48 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
@@ -42,10 +42,55 @@ struct vb2_dma_sg_buf {
 
 static void vb2_dma_sg_put(void *buf_priv);
 
+static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
+		gfp_t gfp_flags)
+{
+	unsigned int last_page = 0;
+	int size = buf->sg_desc.size;
+
+	while (size > 0) {
+		struct page *pages;
+		int order;
+		int i;
+
+		order = get_order(size);
+		/* Dont over allocate*/
+		if ((PAGE_SIZE << order) > size)
+			order--;
+
+		pages = NULL;
+		while (!pages) {
+			pages = alloc_pages(GFP_KERNEL | __GFP_ZERO |
+					__GFP_NOWARN | gfp_flags, order);
+			if (pages)
+				break;
+
+			if (order == 0) {
+				while (last_page--)
+					__free_page(buf->pages[last_page]);
+				return -ENOMEM;
+			}
+			order--;
+		}
+
+		split_page(pages, order);
+		for (i = 0; i < (1 << order); i++) {
+			buf->pages[last_page] = &pages[i];
+			sg_set_page(&buf->sg_desc.sglist[last_page],
+					buf->pages[last_page], PAGE_SIZE, 0);
+			last_page++;
+		}
+
+		size -= PAGE_SIZE << order;
+	}
+
+	return 0;
+}
+
 static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_flags)
 {
 	struct vb2_dma_sg_buf *buf;
-	int i;
+	int ret;
 
 	buf = kzalloc(sizeof *buf, GFP_KERNEL);
 	if (!buf)
@@ -69,14 +114,9 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
 	if (!buf->pages)
 		goto fail_pages_array_alloc;
 
-	for (i = 0; i < buf->sg_desc.num_pages; ++i) {
-		buf->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO |
-					   __GFP_NOWARN | gfp_flags);
-		if (NULL == buf->pages[i])
-			goto fail_pages_alloc;
-		sg_set_page(&buf->sg_desc.sglist[i],
-			    buf->pages[i], PAGE_SIZE, 0);
-	}
+	ret = vb2_dma_sg_alloc_compacted(buf, gfp_flags);
+	if (ret)
+		goto fail_pages_alloc;
 
 	buf->handler.refcount = &buf->refcount;
 	buf->handler.put = vb2_dma_sg_put;
@@ -89,8 +129,6 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
 	return buf;
 
 fail_pages_alloc:
-	while (--i >= 0)
-		__free_page(buf->pages[i]);
 	kfree(buf->pages);
 
 fail_pages_array_alloc:
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table
  2013-08-02 14:19 [PATCH v4 0/2] videobuf2-dma-sg: Contiguos memory allocation Ricardo Ribalda Delgado
  2013-08-02 14:19 ` [PATCH v4 1/2] videobuf2-dma-sg: Allocate pages as contiguous as possible Ricardo Ribalda Delgado
@ 2013-08-02 14:20 ` Ricardo Ribalda Delgado
  2014-01-03 14:52   ` Hans Verkuil
  1 sibling, 1 reply; 11+ messages in thread
From: Ricardo Ribalda Delgado @ 2013-08-02 14:20 UTC (permalink / raw)
  To: Jonathan Corbet, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Ismael Luceno,
	Greg Kroah-Hartman, linux-media, Sylwester Nawrocki
  Cc: Ricardo Ribalda Delgado

Replace the private struct vb2_dma_sg_desc with the struct sg_table so
we can benefit from all the helping functions in lib/scatterlist.c for
things like allocating the sg or compacting the descriptor

marvel-ccic and solo6x10 drivers, that uses this api has been updated

Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
---
 drivers/media/platform/marvell-ccic/mcam-core.c    |   14 +--
 drivers/media/v4l2-core/videobuf2-dma-sg.c         |  103 ++++++++------------
 drivers/staging/media/solo6x10/solo6x10-v4l2-enc.c |   20 ++--
 include/media/videobuf2-dma-sg.h                   |   10 +-
 4 files changed, 63 insertions(+), 84 deletions(-)

diff --git a/drivers/media/platform/marvell-ccic/mcam-core.c b/drivers/media/platform/marvell-ccic/mcam-core.c
index 64ab91e..0ac51bd 100644
--- a/drivers/media/platform/marvell-ccic/mcam-core.c
+++ b/drivers/media/platform/marvell-ccic/mcam-core.c
@@ -1040,16 +1040,16 @@ static int mcam_vb_sg_buf_prepare(struct vb2_buffer *vb)
 {
 	struct mcam_vb_buffer *mvb = vb_to_mvb(vb);
 	struct mcam_camera *cam = vb2_get_drv_priv(vb->vb2_queue);
-	struct vb2_dma_sg_desc *sgd = vb2_dma_sg_plane_desc(vb, 0);
+	struct sg_table *sg_table = vb2_dma_sg_plane_desc(vb, 0);
 	struct mcam_dma_desc *desc = mvb->dma_desc;
 	struct scatterlist *sg;
 	int i;
 
-	mvb->dma_desc_nent = dma_map_sg(cam->dev, sgd->sglist, sgd->num_pages,
-			DMA_FROM_DEVICE);
+	mvb->dma_desc_nent = dma_map_sg(cam->dev, sg_table->sgl,
+			sg_table->nents, DMA_FROM_DEVICE);
 	if (mvb->dma_desc_nent <= 0)
 		return -EIO;  /* Not sure what's right here */
-	for_each_sg(sgd->sglist, sg, mvb->dma_desc_nent, i) {
+	for_each_sg(sg_table->sgl, sg, mvb->dma_desc_nent, i) {
 		desc->dma_addr = sg_dma_address(sg);
 		desc->segment_len = sg_dma_len(sg);
 		desc++;
@@ -1060,9 +1060,11 @@ static int mcam_vb_sg_buf_prepare(struct vb2_buffer *vb)
 static int mcam_vb_sg_buf_finish(struct vb2_buffer *vb)
 {
 	struct mcam_camera *cam = vb2_get_drv_priv(vb->vb2_queue);
-	struct vb2_dma_sg_desc *sgd = vb2_dma_sg_plane_desc(vb, 0);
+	struct sg_table *sg_table = vb2_dma_sg_plane_desc(vb, 0);
 
-	dma_unmap_sg(cam->dev, sgd->sglist, sgd->num_pages, DMA_FROM_DEVICE);
+	if (sg_table)
+		dma_unmap_sg(cam->dev, sg_table->sgl,
+				sg_table->nents, DMA_FROM_DEVICE);
 	return 0;
 }
 
diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c b/drivers/media/v4l2-core/videobuf2-dma-sg.c
index 4999c48..2f86054 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
@@ -35,7 +35,9 @@ struct vb2_dma_sg_buf {
 	struct page			**pages;
 	int				write;
 	int				offset;
-	struct vb2_dma_sg_desc		sg_desc;
+	struct sg_table			sg_table;
+	size_t				size;
+	unsigned int			num_pages;
 	atomic_t			refcount;
 	struct vb2_vmarea_handler	handler;
 };
@@ -46,7 +48,7 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
 		gfp_t gfp_flags)
 {
 	unsigned int last_page = 0;
-	int size = buf->sg_desc.size;
+	int size = buf->size;
 
 	while (size > 0) {
 		struct page *pages;
@@ -74,12 +76,8 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
 		}
 
 		split_page(pages, order);
-		for (i = 0; i < (1 << order); i++) {
-			buf->pages[last_page] = &pages[i];
-			sg_set_page(&buf->sg_desc.sglist[last_page],
-					buf->pages[last_page], PAGE_SIZE, 0);
-			last_page++;
-		}
+		for (i = 0; i < (1 << order); i++)
+			buf->pages[last_page++] = &pages[i];
 
 		size -= PAGE_SIZE << order;
 	}
@@ -91,6 +89,7 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
 {
 	struct vb2_dma_sg_buf *buf;
 	int ret;
+	int num_pages;
 
 	buf = kzalloc(sizeof *buf, GFP_KERNEL);
 	if (!buf)
@@ -99,17 +98,11 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
 	buf->vaddr = NULL;
 	buf->write = 0;
 	buf->offset = 0;
-	buf->sg_desc.size = size;
+	buf->size = size;
 	/* size is already page aligned */
-	buf->sg_desc.num_pages = size >> PAGE_SHIFT;
-
-	buf->sg_desc.sglist = vzalloc(buf->sg_desc.num_pages *
-				      sizeof(*buf->sg_desc.sglist));
-	if (!buf->sg_desc.sglist)
-		goto fail_sglist_alloc;
-	sg_init_table(buf->sg_desc.sglist, buf->sg_desc.num_pages);
+	buf->num_pages = size >> PAGE_SHIFT;
 
-	buf->pages = kzalloc(buf->sg_desc.num_pages * sizeof(struct page *),
+	buf->pages = kzalloc(buf->num_pages * sizeof(struct page *),
 			     GFP_KERNEL);
 	if (!buf->pages)
 		goto fail_pages_array_alloc;
@@ -118,6 +111,11 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
 	if (ret)
 		goto fail_pages_alloc;
 
+	ret = sg_alloc_table_from_pages(&buf->sg_table, buf->pages,
+			buf->num_pages, 0, size, gfp_flags);
+	if (ret)
+		goto fail_table_alloc;
+
 	buf->handler.refcount = &buf->refcount;
 	buf->handler.put = vb2_dma_sg_put;
 	buf->handler.arg = buf;
@@ -125,16 +123,16 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
 	atomic_inc(&buf->refcount);
 
 	dprintk(1, "%s: Allocated buffer of %d pages\n",
-		__func__, buf->sg_desc.num_pages);
+		__func__, buf->num_pages);
 	return buf;
 
+fail_table_alloc:
+	num_pages = buf->num_pages;
+	while (num_pages--)
+		__free_page(buf->pages[num_pages]);
 fail_pages_alloc:
 	kfree(buf->pages);
-
 fail_pages_array_alloc:
-	vfree(buf->sg_desc.sglist);
-
-fail_sglist_alloc:
 	kfree(buf);
 	return NULL;
 }
@@ -142,14 +140,14 @@ fail_sglist_alloc:
 static void vb2_dma_sg_put(void *buf_priv)
 {
 	struct vb2_dma_sg_buf *buf = buf_priv;
-	int i = buf->sg_desc.num_pages;
+	int i = buf->num_pages;
 
 	if (atomic_dec_and_test(&buf->refcount)) {
 		dprintk(1, "%s: Freeing buffer of %d pages\n", __func__,
-			buf->sg_desc.num_pages);
+			buf->num_pages);
 		if (buf->vaddr)
-			vm_unmap_ram(buf->vaddr, buf->sg_desc.num_pages);
-		vfree(buf->sg_desc.sglist);
+			vm_unmap_ram(buf->vaddr, buf->num_pages);
+		sg_free_table(&buf->sg_table);
 		while (--i >= 0)
 			__free_page(buf->pages[i]);
 		kfree(buf->pages);
@@ -162,7 +160,7 @@ static void *vb2_dma_sg_get_userptr(void *alloc_ctx, unsigned long vaddr,
 {
 	struct vb2_dma_sg_buf *buf;
 	unsigned long first, last;
-	int num_pages_from_user, i;
+	int num_pages_from_user;
 
 	buf = kzalloc(sizeof *buf, GFP_KERNEL);
 	if (!buf)
@@ -171,56 +169,41 @@ static void *vb2_dma_sg_get_userptr(void *alloc_ctx, unsigned long vaddr,
 	buf->vaddr = NULL;
 	buf->write = write;
 	buf->offset = vaddr & ~PAGE_MASK;
-	buf->sg_desc.size = size;
+	buf->size = size;
 
 	first = (vaddr           & PAGE_MASK) >> PAGE_SHIFT;
 	last  = ((vaddr + size - 1) & PAGE_MASK) >> PAGE_SHIFT;
-	buf->sg_desc.num_pages = last - first + 1;
-
-	buf->sg_desc.sglist = vzalloc(
-		buf->sg_desc.num_pages * sizeof(*buf->sg_desc.sglist));
-	if (!buf->sg_desc.sglist)
-		goto userptr_fail_sglist_alloc;
+	buf->num_pages = last - first + 1;
 
-	sg_init_table(buf->sg_desc.sglist, buf->sg_desc.num_pages);
-
-	buf->pages = kzalloc(buf->sg_desc.num_pages * sizeof(struct page *),
+	buf->pages = kzalloc(buf->num_pages * sizeof(struct page *),
 			     GFP_KERNEL);
 	if (!buf->pages)
-		goto userptr_fail_pages_array_alloc;
+		return NULL;
 
 	num_pages_from_user = get_user_pages(current, current->mm,
 					     vaddr & PAGE_MASK,
-					     buf->sg_desc.num_pages,
+					     buf->num_pages,
 					     write,
 					     1, /* force */
 					     buf->pages,
 					     NULL);
 
-	if (num_pages_from_user != buf->sg_desc.num_pages)
+	if (num_pages_from_user != buf->num_pages)
 		goto userptr_fail_get_user_pages;
 
-	sg_set_page(&buf->sg_desc.sglist[0], buf->pages[0],
-		    PAGE_SIZE - buf->offset, buf->offset);
-	size -= PAGE_SIZE - buf->offset;
-	for (i = 1; i < buf->sg_desc.num_pages; ++i) {
-		sg_set_page(&buf->sg_desc.sglist[i], buf->pages[i],
-			    min_t(size_t, PAGE_SIZE, size), 0);
-		size -= min_t(size_t, PAGE_SIZE, size);
-	}
+	if (sg_alloc_table_from_pages(&buf->sg_table, buf->pages,
+			buf->num_pages, buf->offset, size, 0))
+		goto userptr_fail_alloc_table_from_pages;
+
 	return buf;
 
+userptr_fail_alloc_table_from_pages:
 userptr_fail_get_user_pages:
 	dprintk(1, "get_user_pages requested/got: %d/%d]\n",
-	       num_pages_from_user, buf->sg_desc.num_pages);
+	       num_pages_from_user, buf->num_pages);
 	while (--num_pages_from_user >= 0)
 		put_page(buf->pages[num_pages_from_user]);
 	kfree(buf->pages);
-
-userptr_fail_pages_array_alloc:
-	vfree(buf->sg_desc.sglist);
-
-userptr_fail_sglist_alloc:
 	kfree(buf);
 	return NULL;
 }
@@ -232,18 +215,18 @@ userptr_fail_sglist_alloc:
 static void vb2_dma_sg_put_userptr(void *buf_priv)
 {
 	struct vb2_dma_sg_buf *buf = buf_priv;
-	int i = buf->sg_desc.num_pages;
+	int i = buf->num_pages;
 
 	dprintk(1, "%s: Releasing userspace buffer of %d pages\n",
-	       __func__, buf->sg_desc.num_pages);
+	       __func__, buf->num_pages);
 	if (buf->vaddr)
-		vm_unmap_ram(buf->vaddr, buf->sg_desc.num_pages);
+		vm_unmap_ram(buf->vaddr, buf->num_pages);
+	sg_free_table(&buf->sg_table);
 	while (--i >= 0) {
 		if (buf->write)
 			set_page_dirty_lock(buf->pages[i]);
 		put_page(buf->pages[i]);
 	}
-	vfree(buf->sg_desc.sglist);
 	kfree(buf->pages);
 	kfree(buf);
 }
@@ -256,7 +239,7 @@ static void *vb2_dma_sg_vaddr(void *buf_priv)
 
 	if (!buf->vaddr)
 		buf->vaddr = vm_map_ram(buf->pages,
-					buf->sg_desc.num_pages,
+					buf->num_pages,
 					-1,
 					PAGE_KERNEL);
 
@@ -312,7 +295,7 @@ static void *vb2_dma_sg_cookie(void *buf_priv)
 {
 	struct vb2_dma_sg_buf *buf = buf_priv;
 
-	return &buf->sg_desc;
+	return &buf->sg_table;
 }
 
 const struct vb2_mem_ops vb2_dma_sg_memops = {
diff --git a/drivers/staging/media/solo6x10/solo6x10-v4l2-enc.c b/drivers/staging/media/solo6x10/solo6x10-v4l2-enc.c
index 98e2902..cfa01f1 100644
--- a/drivers/staging/media/solo6x10/solo6x10-v4l2-enc.c
+++ b/drivers/staging/media/solo6x10/solo6x10-v4l2-enc.c
@@ -346,7 +346,7 @@ static int enc_get_mpeg_dma(struct solo_dev *solo_dev, dma_addr_t dma,
 /* Build a descriptor queue out of an SG list and send it to the P2M for
  * processing. */
 static int solo_send_desc(struct solo_enc_dev *solo_enc, int skip,
-			  struct vb2_dma_sg_desc *vbuf, int off, int size,
+			  struct sg_table *vbuf, int off, int size,
 			  unsigned int base, unsigned int base_size)
 {
 	struct solo_dev *solo_dev = solo_enc->solo_dev;
@@ -359,7 +359,7 @@ static int solo_send_desc(struct solo_enc_dev *solo_enc, int skip,
 
 	solo_enc->desc_count = 1;
 
-	for_each_sg(vbuf->sglist, sg, vbuf->num_pages, i) {
+	for_each_sg(vbuf->sgl, sg, vbuf->nents, i) {
 		struct solo_p2m_desc *desc;
 		dma_addr_t dma;
 		int len;
@@ -434,7 +434,7 @@ static int solo_fill_jpeg(struct solo_enc_dev *solo_enc,
 		struct vb2_buffer *vb, struct vop_header *vh)
 {
 	struct solo_dev *solo_dev = solo_enc->solo_dev;
-	struct vb2_dma_sg_desc *vbuf = vb2_dma_sg_plane_desc(vb, 0);
+	struct sg_table *vbuf = vb2_dma_sg_plane_desc(vb, 0);
 	int frame_size;
 	int ret;
 
@@ -443,7 +443,7 @@ static int solo_fill_jpeg(struct solo_enc_dev *solo_enc,
 	if (vb2_plane_size(vb, 0) < vh->jpeg_size + solo_enc->jpeg_len)
 		return -EIO;
 
-	sg_copy_from_buffer(vbuf->sglist, vbuf->num_pages,
+	sg_copy_from_buffer(vbuf->sgl, vbuf->nents,
 			solo_enc->jpeg_header,
 			solo_enc->jpeg_len);
 
@@ -451,12 +451,12 @@ static int solo_fill_jpeg(struct solo_enc_dev *solo_enc,
 		& ~(DMA_ALIGN - 1);
 	vb2_set_plane_payload(vb, 0, vh->jpeg_size + solo_enc->jpeg_len);
 
-	dma_map_sg(&solo_dev->pdev->dev, vbuf->sglist, vbuf->num_pages,
+	dma_map_sg(&solo_dev->pdev->dev, vbuf->sgl, vbuf->nents,
 			DMA_FROM_DEVICE);
 	ret = solo_send_desc(solo_enc, solo_enc->jpeg_len, vbuf, vh->jpeg_off,
 			frame_size, SOLO_JPEG_EXT_ADDR(solo_dev),
 			SOLO_JPEG_EXT_SIZE(solo_dev));
-	dma_unmap_sg(&solo_dev->pdev->dev, vbuf->sglist, vbuf->num_pages,
+	dma_unmap_sg(&solo_dev->pdev->dev, vbuf->sgl, vbuf->nents,
 			DMA_FROM_DEVICE);
 	return ret;
 }
@@ -465,7 +465,7 @@ static int solo_fill_mpeg(struct solo_enc_dev *solo_enc,
 		struct vb2_buffer *vb, struct vop_header *vh)
 {
 	struct solo_dev *solo_dev = solo_enc->solo_dev;
-	struct vb2_dma_sg_desc *vbuf = vb2_dma_sg_plane_desc(vb, 0);
+	struct sg_table *vbuf = vb2_dma_sg_plane_desc(vb, 0);
 	int frame_off, frame_size;
 	int skip = 0;
 	int ret;
@@ -475,7 +475,7 @@ static int solo_fill_mpeg(struct solo_enc_dev *solo_enc,
 
 	/* If this is a key frame, add extra header */
 	if (!vh->vop_type) {
-		sg_copy_from_buffer(vbuf->sglist, vbuf->num_pages,
+		sg_copy_from_buffer(vbuf->sgl, vbuf->nents,
 				solo_enc->vop,
 				solo_enc->vop_len);
 
@@ -494,12 +494,12 @@ static int solo_fill_mpeg(struct solo_enc_dev *solo_enc,
 	frame_size = (vh->mpeg_size + skip + (DMA_ALIGN - 1))
 		& ~(DMA_ALIGN - 1);
 
-	dma_map_sg(&solo_dev->pdev->dev, vbuf->sglist, vbuf->num_pages,
+	dma_map_sg(&solo_dev->pdev->dev, vbuf->sgl, vbuf->nents,
 			DMA_FROM_DEVICE);
 	ret = solo_send_desc(solo_enc, skip, vbuf, frame_off, frame_size,
 			SOLO_MP4E_EXT_ADDR(solo_dev),
 			SOLO_MP4E_EXT_SIZE(solo_dev));
-	dma_unmap_sg(&solo_dev->pdev->dev, vbuf->sglist, vbuf->num_pages,
+	dma_unmap_sg(&solo_dev->pdev->dev, vbuf->sgl, vbuf->nents,
 			DMA_FROM_DEVICE);
 	return ret;
 }
diff --git a/include/media/videobuf2-dma-sg.h b/include/media/videobuf2-dma-sg.h
index 0038526..7b89852 100644
--- a/include/media/videobuf2-dma-sg.h
+++ b/include/media/videobuf2-dma-sg.h
@@ -15,16 +15,10 @@
 
 #include <media/videobuf2-core.h>
 
-struct vb2_dma_sg_desc {
-	unsigned long		size;
-	unsigned int		num_pages;
-	struct scatterlist	*sglist;
-};
-
-static inline struct vb2_dma_sg_desc *vb2_dma_sg_plane_desc(
+static inline struct sg_table *vb2_dma_sg_plane_desc(
 		struct vb2_buffer *vb, unsigned int plane_no)
 {
-	return (struct vb2_dma_sg_desc *)vb2_plane_cookie(vb, plane_no);
+	return (struct sg_table *)vb2_plane_cookie(vb, plane_no);
 }
 
 extern const struct vb2_mem_ops vb2_dma_sg_memops;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table
  2013-08-02 14:20 ` [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table Ricardo Ribalda Delgado
@ 2014-01-03 14:52   ` Hans Verkuil
  2014-01-03 15:17     ` Ricardo Ribalda Delgado
  0 siblings, 1 reply; 11+ messages in thread
From: Hans Verkuil @ 2014-01-03 14:52 UTC (permalink / raw)
  To: Ricardo Ribalda Delgado
  Cc: Jonathan Corbet, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Ismael Luceno,
	Greg Kroah-Hartman, linux-media, Sylwester Nawrocki

Hi Ricardo,

I've run into a problem that is caused by this patch:

On 08/02/2013 04:20 PM, Ricardo Ribalda Delgado wrote:
> Replace the private struct vb2_dma_sg_desc with the struct sg_table so
> we can benefit from all the helping functions in lib/scatterlist.c for
> things like allocating the sg or compacting the descriptor
> 
> marvel-ccic and solo6x10 drivers, that uses this api has been updated
> 
> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Reviewed-by: Andre Heider <a.heider@gmail.com>
> Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
> ---
>  drivers/media/platform/marvell-ccic/mcam-core.c    |   14 +--
>  drivers/media/v4l2-core/videobuf2-dma-sg.c         |  103 ++++++++------------
>  drivers/staging/media/solo6x10/solo6x10-v4l2-enc.c |   20 ++--
>  include/media/videobuf2-dma-sg.h                   |   10 +-
>  4 files changed, 63 insertions(+), 84 deletions(-)
> 

<snip>

> diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c b/drivers/media/v4l2-core/videobuf2-dma-sg.c
> index 4999c48..2f86054 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c

<snip>

> @@ -99,17 +98,11 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
>  	buf->vaddr = NULL;
>  	buf->write = 0;
>  	buf->offset = 0;
> -	buf->sg_desc.size = size;
> +	buf->size = size;
>  	/* size is already page aligned */
> -	buf->sg_desc.num_pages = size >> PAGE_SHIFT;
> -
> -	buf->sg_desc.sglist = vzalloc(buf->sg_desc.num_pages *
> -				      sizeof(*buf->sg_desc.sglist));
> -	if (!buf->sg_desc.sglist)
> -		goto fail_sglist_alloc;
> -	sg_init_table(buf->sg_desc.sglist, buf->sg_desc.num_pages);
> +	buf->num_pages = size >> PAGE_SHIFT;
>  
> -	buf->pages = kzalloc(buf->sg_desc.num_pages * sizeof(struct page *),
> +	buf->pages = kzalloc(buf->num_pages * sizeof(struct page *),
>  			     GFP_KERNEL);
>  	if (!buf->pages)
>  		goto fail_pages_array_alloc;
> @@ -118,6 +111,11 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
>  	if (ret)
>  		goto fail_pages_alloc;
>  
> +	ret = sg_alloc_table_from_pages(&buf->sg_table, buf->pages,
> +			buf->num_pages, 0, size, gfp_flags);
> +	if (ret)
> +		goto fail_table_alloc;
> +
>  	buf->handler.refcount = &buf->refcount;
>  	buf->handler.put = vb2_dma_sg_put;
>  	buf->handler.arg = buf;

The problem here is the switch from sg_init_table to sg_alloc_table_from_pages. If
the PCI hardware only accepts 32-bit DMA transfers, but it is used on a 64-bit OS
with > 4GB physical memory, then the kernel will allocate DMA bounce buffers for you.

With sg_init_table that works fine since each page in the scatterlist maps to a
bounce buffer that is also just one page, but with sg_alloc_table_from_pages the DMA
bounce buffers can be multiple pages. This is in turn rounded up to the next power of
2 and allocated in the 32-bit address space. Unfortunately, due to memory fragmentation
this very quickly fails with -ENOMEM.

I discovered this while converting saa7134 to vb2. I think that when DMA bounce
buffers are needed, then it should revert to sg_init_table.

I don't know whether this bug also affects non-v4l drivers.

For now at least I won't try to fix this myself as I have discovered that dma-sg
doesn't work anyway for saa7134 due to a hardware limitation so I will switch to
dma-contig for that driver.

But at the very least I thought I should write this down so others know about this
subtle problem and perhaps someone else wants to tackle this.

I actually think that the solo driver is affected by this (I haven't tested it yet).
And at some point we need to convert bttv and cx88 to vb2 as well, and I expect that
they will hit the same problem.

If someone knows a better solution than switching to sg_init_table if bounce buffers
are needed, then let me know.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table
  2014-01-03 14:52   ` Hans Verkuil
@ 2014-01-03 15:17     ` Ricardo Ribalda Delgado
  2014-01-03 15:36       ` Hans Verkuil
  0 siblings, 1 reply; 11+ messages in thread
From: Ricardo Ribalda Delgado @ 2014-01-03 15:17 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Jonathan Corbet, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Ismael Luceno,
	Greg Kroah-Hartman, linux-media, Sylwester Nawrocki

Hello Hans

Thank you very much for your mail.

For what I understand sg_alloc_table_from_pages does not allocate any
page or bounce buffer, it just take a set of N pages and makes a
sg_table from it, on the process it finds out if page A and A+1are on
the same pfn and if it is true they will share the sg. So it is a
later function that produces the error.  As I see it, before this
patch we were reimplementing sg_alloc_table_from_pages.

Which function is returning -ENOMEM?


Regards!



On Fri, Jan 3, 2014 at 3:52 PM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> Hi Ricardo,
>
> I've run into a problem that is caused by this patch:
>
> On 08/02/2013 04:20 PM, Ricardo Ribalda Delgado wrote:
>> Replace the private struct vb2_dma_sg_desc with the struct sg_table so
>> we can benefit from all the helping functions in lib/scatterlist.c for
>> things like allocating the sg or compacting the descriptor
>>
>> marvel-ccic and solo6x10 drivers, that uses this api has been updated
>>
>> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> Reviewed-by: Andre Heider <a.heider@gmail.com>
>> Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
>> ---
>>  drivers/media/platform/marvell-ccic/mcam-core.c    |   14 +--
>>  drivers/media/v4l2-core/videobuf2-dma-sg.c         |  103 ++++++++------------
>>  drivers/staging/media/solo6x10/solo6x10-v4l2-enc.c |   20 ++--
>>  include/media/videobuf2-dma-sg.h                   |   10 +-
>>  4 files changed, 63 insertions(+), 84 deletions(-)
>>
>
> <snip>
>
>> diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c b/drivers/media/v4l2-core/videobuf2-dma-sg.c
>> index 4999c48..2f86054 100644
>> --- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
>> +++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
>
> <snip>
>
>> @@ -99,17 +98,11 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
>>       buf->vaddr = NULL;
>>       buf->write = 0;
>>       buf->offset = 0;
>> -     buf->sg_desc.size = size;
>> +     buf->size = size;
>>       /* size is already page aligned */
>> -     buf->sg_desc.num_pages = size >> PAGE_SHIFT;
>> -
>> -     buf->sg_desc.sglist = vzalloc(buf->sg_desc.num_pages *
>> -                                   sizeof(*buf->sg_desc.sglist));
>> -     if (!buf->sg_desc.sglist)
>> -             goto fail_sglist_alloc;
>> -     sg_init_table(buf->sg_desc.sglist, buf->sg_desc.num_pages);
>> +     buf->num_pages = size >> PAGE_SHIFT;
>>
>> -     buf->pages = kzalloc(buf->sg_desc.num_pages * sizeof(struct page *),
>> +     buf->pages = kzalloc(buf->num_pages * sizeof(struct page *),
>>                            GFP_KERNEL);
>>       if (!buf->pages)
>>               goto fail_pages_array_alloc;
>> @@ -118,6 +111,11 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
>>       if (ret)
>>               goto fail_pages_alloc;
>>
>> +     ret = sg_alloc_table_from_pages(&buf->sg_table, buf->pages,
>> +                     buf->num_pages, 0, size, gfp_flags);
>> +     if (ret)
>> +             goto fail_table_alloc;
>> +
>>       buf->handler.refcount = &buf->refcount;
>>       buf->handler.put = vb2_dma_sg_put;
>>       buf->handler.arg = buf;
>
> The problem here is the switch from sg_init_table to sg_alloc_table_from_pages. If
> the PCI hardware only accepts 32-bit DMA transfers, but it is used on a 64-bit OS
> with > 4GB physical memory, then the kernel will allocate DMA bounce buffers for you.
>
> With sg_init_table that works fine since each page in the scatterlist maps to a
> bounce buffer that is also just one page, but with sg_alloc_table_from_pages the DMA
> bounce buffers can be multiple pages. This is in turn rounded up to the next power of
> 2 and allocated in the 32-bit address space. Unfortunately, due to memory fragmentation
> this very quickly fails with -ENOMEM.
>
> I discovered this while converting saa7134 to vb2. I think that when DMA bounce
> buffers are needed, then it should revert to sg_init_table.
>
> I don't know whether this bug also affects non-v4l drivers.
>
> For now at least I won't try to fix this myself as I have discovered that dma-sg
> doesn't work anyway for saa7134 due to a hardware limitation so I will switch to
> dma-contig for that driver.
>
> But at the very least I thought I should write this down so others know about this
> subtle problem and perhaps someone else wants to tackle this.
>
> I actually think that the solo driver is affected by this (I haven't tested it yet).
> And at some point we need to convert bttv and cx88 to vb2 as well, and I expect that
> they will hit the same problem.
>
> If someone knows a better solution than switching to sg_init_table if bounce buffers
> are needed, then let me know.
>
> Regards,
>
>         Hans



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table
  2014-01-03 15:17     ` Ricardo Ribalda Delgado
@ 2014-01-03 15:36       ` Hans Verkuil
  2014-01-03 15:51         ` Ricardo Ribalda Delgado
  0 siblings, 1 reply; 11+ messages in thread
From: Hans Verkuil @ 2014-01-03 15:36 UTC (permalink / raw)
  To: Ricardo Ribalda Delgado
  Cc: Jonathan Corbet, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Ismael Luceno,
	Greg Kroah-Hartman, linux-media, Sylwester Nawrocki

On 01/03/2014 04:17 PM, Ricardo Ribalda Delgado wrote:
> Hello Hans
> 
> Thank you very much for your mail.
> 
> For what I understand sg_alloc_table_from_pages does not allocate any
> page or bounce buffer, it just take a set of N pages and makes a
> sg_table from it, on the process it finds out if page A and A+1are on
> the same pfn and if it is true they will share the sg. So it is a
> later function that produces the error.  As I see it, before this
> patch we were reimplementing sg_alloc_table_from_pages.
> 
> Which function is returning -ENOMEM?

That's dma_map_sg(), which uses the scatter list constructed by
sg_alloc_table_from_pages(). For x86 that ends up in lib/swiotlb.c,
swiotlb_map_sg_attrs().

Regards,

	Hans

> 
> 
> Regards!
> 
> 
> 
> On Fri, Jan 3, 2014 at 3:52 PM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
>> Hi Ricardo,
>>
>> I've run into a problem that is caused by this patch:
>>
>> On 08/02/2013 04:20 PM, Ricardo Ribalda Delgado wrote:
>>> Replace the private struct vb2_dma_sg_desc with the struct sg_table so
>>> we can benefit from all the helping functions in lib/scatterlist.c for
>>> things like allocating the sg or compacting the descriptor
>>>
>>> marvel-ccic and solo6x10 drivers, that uses this api has been updated
>>>
>>> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>> Reviewed-by: Andre Heider <a.heider@gmail.com>
>>> Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
>>> ---
>>>  drivers/media/platform/marvell-ccic/mcam-core.c    |   14 +--
>>>  drivers/media/v4l2-core/videobuf2-dma-sg.c         |  103 ++++++++------------
>>>  drivers/staging/media/solo6x10/solo6x10-v4l2-enc.c |   20 ++--
>>>  include/media/videobuf2-dma-sg.h                   |   10 +-
>>>  4 files changed, 63 insertions(+), 84 deletions(-)
>>>
>>
>> <snip>
>>
>>> diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c b/drivers/media/v4l2-core/videobuf2-dma-sg.c
>>> index 4999c48..2f86054 100644
>>> --- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
>>> +++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
>>
>> <snip>
>>
>>> @@ -99,17 +98,11 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
>>>       buf->vaddr = NULL;
>>>       buf->write = 0;
>>>       buf->offset = 0;
>>> -     buf->sg_desc.size = size;
>>> +     buf->size = size;
>>>       /* size is already page aligned */
>>> -     buf->sg_desc.num_pages = size >> PAGE_SHIFT;
>>> -
>>> -     buf->sg_desc.sglist = vzalloc(buf->sg_desc.num_pages *
>>> -                                   sizeof(*buf->sg_desc.sglist));
>>> -     if (!buf->sg_desc.sglist)
>>> -             goto fail_sglist_alloc;
>>> -     sg_init_table(buf->sg_desc.sglist, buf->sg_desc.num_pages);
>>> +     buf->num_pages = size >> PAGE_SHIFT;
>>>
>>> -     buf->pages = kzalloc(buf->sg_desc.num_pages * sizeof(struct page *),
>>> +     buf->pages = kzalloc(buf->num_pages * sizeof(struct page *),
>>>                            GFP_KERNEL);
>>>       if (!buf->pages)
>>>               goto fail_pages_array_alloc;
>>> @@ -118,6 +111,11 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
>>>       if (ret)
>>>               goto fail_pages_alloc;
>>>
>>> +     ret = sg_alloc_table_from_pages(&buf->sg_table, buf->pages,
>>> +                     buf->num_pages, 0, size, gfp_flags);
>>> +     if (ret)
>>> +             goto fail_table_alloc;
>>> +
>>>       buf->handler.refcount = &buf->refcount;
>>>       buf->handler.put = vb2_dma_sg_put;
>>>       buf->handler.arg = buf;
>>
>> The problem here is the switch from sg_init_table to sg_alloc_table_from_pages. If
>> the PCI hardware only accepts 32-bit DMA transfers, but it is used on a 64-bit OS
>> with > 4GB physical memory, then the kernel will allocate DMA bounce buffers for you.
>>
>> With sg_init_table that works fine since each page in the scatterlist maps to a
>> bounce buffer that is also just one page, but with sg_alloc_table_from_pages the DMA
>> bounce buffers can be multiple pages. This is in turn rounded up to the next power of
>> 2 and allocated in the 32-bit address space. Unfortunately, due to memory fragmentation
>> this very quickly fails with -ENOMEM.
>>
>> I discovered this while converting saa7134 to vb2. I think that when DMA bounce
>> buffers are needed, then it should revert to sg_init_table.
>>
>> I don't know whether this bug also affects non-v4l drivers.
>>
>> For now at least I won't try to fix this myself as I have discovered that dma-sg
>> doesn't work anyway for saa7134 due to a hardware limitation so I will switch to
>> dma-contig for that driver.
>>
>> But at the very least I thought I should write this down so others know about this
>> subtle problem and perhaps someone else wants to tackle this.
>>
>> I actually think that the solo driver is affected by this (I haven't tested it yet).
>> And at some point we need to convert bttv and cx88 to vb2 as well, and I expect that
>> they will hit the same problem.
>>
>> If someone knows a better solution than switching to sg_init_table if bounce buffers
>> are needed, then let me know.
>>
>> Regards,
>>
>>         Hans
> 
> 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table
  2014-01-03 15:36       ` Hans Verkuil
@ 2014-01-03 15:51         ` Ricardo Ribalda Delgado
  2014-01-08 14:07           ` Marek Szyprowski
  0 siblings, 1 reply; 11+ messages in thread
From: Ricardo Ribalda Delgado @ 2014-01-03 15:51 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Jonathan Corbet, Mauro Carvalho Chehab, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Ismael Luceno,
	Greg Kroah-Hartman, linux-media, Sylwester Nawrocki

Hello Hans

What if we move the dma_map_sg and dma_unmap_sg to the vb2 interface,
and there do something like:

n_sg= dma_map_sg()
if (n_sg=-ENOMEM){
   split_table() //Breaks down the sg_table into monopages sg
   n_sg= dma_map_sg()
}
if (n_sg=-ENOMEM)
  return -ENOMEM

Regards



On Fri, Jan 3, 2014 at 4:36 PM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> On 01/03/2014 04:17 PM, Ricardo Ribalda Delgado wrote:
>> Hello Hans
>>
>> Thank you very much for your mail.
>>
>> For what I understand sg_alloc_table_from_pages does not allocate any
>> page or bounce buffer, it just take a set of N pages and makes a
>> sg_table from it, on the process it finds out if page A and A+1are on
>> the same pfn and if it is true they will share the sg. So it is a
>> later function that produces the error.  As I see it, before this
>> patch we were reimplementing sg_alloc_table_from_pages.
>>
>> Which function is returning -ENOMEM?
>
> That's dma_map_sg(), which uses the scatter list constructed by
> sg_alloc_table_from_pages(). For x86 that ends up in lib/swiotlb.c,
> swiotlb_map_sg_attrs().
>
> Regards,
>
>         Hans
>
>>
>>
>> Regards!
>>
>>
>>
>> On Fri, Jan 3, 2014 at 3:52 PM, Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>> Hi Ricardo,
>>>
>>> I've run into a problem that is caused by this patch:
>>>
>>> On 08/02/2013 04:20 PM, Ricardo Ribalda Delgado wrote:
>>>> Replace the private struct vb2_dma_sg_desc with the struct sg_table so
>>>> we can benefit from all the helping functions in lib/scatterlist.c for
>>>> things like allocating the sg or compacting the descriptor
>>>>
>>>> marvel-ccic and solo6x10 drivers, that uses this api has been updated
>>>>
>>>> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>> Reviewed-by: Andre Heider <a.heider@gmail.com>
>>>> Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
>>>> ---
>>>>  drivers/media/platform/marvell-ccic/mcam-core.c    |   14 +--
>>>>  drivers/media/v4l2-core/videobuf2-dma-sg.c         |  103 ++++++++------------
>>>>  drivers/staging/media/solo6x10/solo6x10-v4l2-enc.c |   20 ++--
>>>>  include/media/videobuf2-dma-sg.h                   |   10 +-
>>>>  4 files changed, 63 insertions(+), 84 deletions(-)
>>>>
>>>
>>> <snip>
>>>
>>>> diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c b/drivers/media/v4l2-core/videobuf2-dma-sg.c
>>>> index 4999c48..2f86054 100644
>>>> --- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
>>>> +++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
>>>
>>> <snip>
>>>
>>>> @@ -99,17 +98,11 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
>>>>       buf->vaddr = NULL;
>>>>       buf->write = 0;
>>>>       buf->offset = 0;
>>>> -     buf->sg_desc.size = size;
>>>> +     buf->size = size;
>>>>       /* size is already page aligned */
>>>> -     buf->sg_desc.num_pages = size >> PAGE_SHIFT;
>>>> -
>>>> -     buf->sg_desc.sglist = vzalloc(buf->sg_desc.num_pages *
>>>> -                                   sizeof(*buf->sg_desc.sglist));
>>>> -     if (!buf->sg_desc.sglist)
>>>> -             goto fail_sglist_alloc;
>>>> -     sg_init_table(buf->sg_desc.sglist, buf->sg_desc.num_pages);
>>>> +     buf->num_pages = size >> PAGE_SHIFT;
>>>>
>>>> -     buf->pages = kzalloc(buf->sg_desc.num_pages * sizeof(struct page *),
>>>> +     buf->pages = kzalloc(buf->num_pages * sizeof(struct page *),
>>>>                            GFP_KERNEL);
>>>>       if (!buf->pages)
>>>>               goto fail_pages_array_alloc;
>>>> @@ -118,6 +111,11 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
>>>>       if (ret)
>>>>               goto fail_pages_alloc;
>>>>
>>>> +     ret = sg_alloc_table_from_pages(&buf->sg_table, buf->pages,
>>>> +                     buf->num_pages, 0, size, gfp_flags);
>>>> +     if (ret)
>>>> +             goto fail_table_alloc;
>>>> +
>>>>       buf->handler.refcount = &buf->refcount;
>>>>       buf->handler.put = vb2_dma_sg_put;
>>>>       buf->handler.arg = buf;
>>>
>>> The problem here is the switch from sg_init_table to sg_alloc_table_from_pages. If
>>> the PCI hardware only accepts 32-bit DMA transfers, but it is used on a 64-bit OS
>>> with > 4GB physical memory, then the kernel will allocate DMA bounce buffers for you.
>>>
>>> With sg_init_table that works fine since each page in the scatterlist maps to a
>>> bounce buffer that is also just one page, but with sg_alloc_table_from_pages the DMA
>>> bounce buffers can be multiple pages. This is in turn rounded up to the next power of
>>> 2 and allocated in the 32-bit address space. Unfortunately, due to memory fragmentation
>>> this very quickly fails with -ENOMEM.
>>>
>>> I discovered this while converting saa7134 to vb2. I think that when DMA bounce
>>> buffers are needed, then it should revert to sg_init_table.
>>>
>>> I don't know whether this bug also affects non-v4l drivers.
>>>
>>> For now at least I won't try to fix this myself as I have discovered that dma-sg
>>> doesn't work anyway for saa7134 due to a hardware limitation so I will switch to
>>> dma-contig for that driver.
>>>
>>> But at the very least I thought I should write this down so others know about this
>>> subtle problem and perhaps someone else wants to tackle this.
>>>
>>> I actually think that the solo driver is affected by this (I haven't tested it yet).
>>> And at some point we need to convert bttv and cx88 to vb2 as well, and I expect that
>>> they will hit the same problem.
>>>
>>> If someone knows a better solution than switching to sg_init_table if bounce buffers
>>> are needed, then let me know.
>>>
>>> Regards,
>>>
>>>         Hans
>>
>>
>>
>



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table
  2014-01-03 15:51         ` Ricardo Ribalda Delgado
@ 2014-01-08 14:07           ` Marek Szyprowski
  2014-01-13  9:54             ` Hans Verkuil
  0 siblings, 1 reply; 11+ messages in thread
From: Marek Szyprowski @ 2014-01-08 14:07 UTC (permalink / raw)
  To: Ricardo Ribalda Delgado, Hans Verkuil
  Cc: Jonathan Corbet, Mauro Carvalho Chehab, Pawel Osciak,
	Kyungmin Park, Ismael Luceno, Greg Kroah-Hartman, linux-media,
	Sylwester Nawrocki

Hello All,

On 2014-01-03 16:51, Ricardo Ribalda Delgado wrote:
> Hello Hans
>
> What if we move the dma_map_sg and dma_unmap_sg to the vb2 interface,
> and there do something like:
>
> n_sg= dma_map_sg()
> if (n_sg=-ENOMEM){
>     split_table() //Breaks down the sg_table into monopages sg
>     n_sg= dma_map_sg()
> }
> if (n_sg=-ENOMEM)
>    return -ENOMEM

dma_map_sg/dma_unmap_sg should be moved to vb2-dma-sg memory allocator. 
The best place for calling them is buf_prepare() and buf_finish() 
callbacks. I think that I've already pointed this some time ago, but 
unfortunately I didn't find enough time to convert existing code.

For solving the problem described by Hans, I think that vb2-dma-sg 
memory allocator should check dma mask of the client device and add 
appropriate GFP_DMA or GFP_DMA32 flags to alloc_pages(). This should fix 
the issues with failed dma_map_sg due to lack of bouncing buffers.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table
  2014-01-08 14:07           ` Marek Szyprowski
@ 2014-01-13  9:54             ` Hans Verkuil
  2014-01-13 13:02               ` Marek Szyprowski
  0 siblings, 1 reply; 11+ messages in thread
From: Hans Verkuil @ 2014-01-13  9:54 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Ricardo Ribalda Delgado, Jonathan Corbet, Mauro Carvalho Chehab,
	Pawel Osciak, Kyungmin Park, Ismael Luceno, Greg Kroah-Hartman,
	linux-media, Sylwester Nawrocki

Hi Marek, Ricardo,

On 01/08/2014 03:07 PM, Marek Szyprowski wrote:
> Hello All,
> 
> On 2014-01-03 16:51, Ricardo Ribalda Delgado wrote:
>> Hello Hans
>>
>> What if we move the dma_map_sg and dma_unmap_sg to the vb2 interface,
>> and there do something like:
>>
>> n_sg= dma_map_sg()
>> if (n_sg=-ENOMEM){
>>     split_table() //Breaks down the sg_table into monopages sg
>>     n_sg= dma_map_sg()

This is not a good approach. Remember that if swiotbl needs to allocate
e.g. 17 contiguous pages it will round up to the next power of two, so it
allocates 32 pages. So even if dma_map_sg succeeds, it might waste a lot
of memory.

>> }
>> if (n_sg=-ENOMEM)
>>    return -ENOMEM
> 
> dma_map_sg/dma_unmap_sg should be moved to vb2-dma-sg memory allocator. 
> The best place for calling them is buf_prepare() and buf_finish() 
> callbacks. I think that I've already pointed this some time ago, but 
> unfortunately I didn't find enough time to convert existing code.

That would be nice, but this is a separate issue.

> For solving the problem described by Hans, I think that vb2-dma-sg 
> memory allocator should check dma mask of the client device and add 
> appropriate GFP_DMA or GFP_DMA32 flags to alloc_pages(). This should fix 
> the issues with failed dma_map_sg due to lack of bouncing buffers.

Those GFP flags are for the scatterlist itself, and that can be placed
anywhere in memory (frankly, I'm not sure why sg_alloc_table_from_pages
has a gfp_flags argument at all and I think it is used incorrectly in
videobuf2-dma-sg.c as well).

I see two options. The first is the patch I included below: this adds a
bool to sg_alloc_table_from_pages() that tells it whether or not page
combining should be enabled. It also adds the vb2 queue's gfp_flags as
an argument to the get_userptr operation. In videobuf2-dma-sg.c that is
checked to see whether or not sg_alloc_table_from_pages() should enable
page-combining.

The alternative would be to have vb2_queue_init check if the use of
V4L2_MEMORY_USERPTR would lead to dma bouncing based on the q->io_modes
and q->gfp_flags and if so, remove USERPTR support from io_modes. Do
we really want to have page bouncing for video capture?

Feedback would be welcome as I am not sure what the best solution is.

Regards,

	Hans

PS: for a final version this patch would be split up in 2 or 3 patches.

Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 79f8b39..404716f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1468,7 +1468,7 @@ static int arm_iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
 		return -ENXIO;
 
 	return sg_alloc_table_from_pages(sgt, pages, count, 0, size,
-					 GFP_KERNEL);
+					 true, GFP_KERNEL);
 }
 
 static int __dma_direction_to_prot(enum dma_data_direction dir)
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 56805c3..4c56338 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -617,7 +617,7 @@ struct sg_table *drm_prime_pages_to_sg(struct page **pages, int nr_pages)
 	}
 
 	ret = sg_alloc_table_from_pages(sg, pages, nr_pages, 0,
-				nr_pages << PAGE_SHIFT, GFP_KERNEL);
+				nr_pages << PAGE_SHIFT, true, GFP_KERNEL);
 	if (ret)
 		goto out;
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 7bccedc..0236f9b 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -505,7 +505,7 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct drm_device *drm_dev,
 	}
 
 	ret = sg_alloc_table_from_pages(sgt, pages, npages, offset,
-					size, GFP_KERNEL);
+					size, true, GFP_KERNEL);
 	if (ret < 0) {
 		DRM_ERROR("failed to get sgt from pages.\n");
 		goto err_free_sgt;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index 7776e6f..7ee560c 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -348,6 +348,7 @@ static int vmw_ttm_map_dma(struct vmw_ttm_tt *vmw_tt)
 						vsgt->num_pages, 0,
 						(unsigned long)
 						vsgt->num_pages << PAGE_SHIFT,
+						true,
 						GFP_KERNEL);
 		if (unlikely(ret != 0))
 			goto out_sg_alloc_fail;
diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index c203b9c..9dd2149 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -1081,7 +1081,8 @@ static int __qbuf_userptr(struct vb2_buffer *vb, const struct v4l2_buffer *b)
 		/* Acquire each plane's memory */
 		mem_priv = call_memop(q, get_userptr, q->alloc_ctx[plane],
 				      planes[plane].m.userptr,
-				      planes[plane].length, write);
+				      planes[plane].length, write,
+				      q->gfp_flags);
 		if (IS_ERR_OR_NULL(mem_priv)) {
 			dprintk(1, "qbuf: failed acquiring userspace "
 						"memory for plane %d\n", plane);
diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 33d3871d..0a6c079 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -548,7 +548,7 @@ static inline dma_addr_t vb2_dc_pfn_to_dma(struct device *dev, unsigned long pfn
 #endif
 
 static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
-	unsigned long size, int write)
+	unsigned long size, int write, unsigned gfp_flags)
 {
 	struct vb2_dc_conf *conf = alloc_ctx;
 	struct vb2_dc_buf *buf;
@@ -637,7 +637,7 @@ static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
 	}
 
 	ret = sg_alloc_table_from_pages(sgt, pages, n_pages,
-		offset, size, GFP_KERNEL);
+		offset, size, true, GFP_KERNEL);
 	if (ret) {
 		pr_err("failed to initialize sg table\n");
 		goto fail_sgt;
diff --git a/drivers/media/v4l2-core/videobuf2-dma-sg.c b/drivers/media/v4l2-core/videobuf2-dma-sg.c
index c779f21..3603f36 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-sg.c
@@ -113,7 +113,7 @@ static void *vb2_dma_sg_alloc(void *alloc_ctx, unsigned long size, gfp_t gfp_fla
 		goto fail_pages_alloc;
 
 	ret = sg_alloc_table_from_pages(&buf->sg_table, buf->pages,
-			buf->num_pages, 0, size, gfp_flags);
+			buf->num_pages, 0, size, true, GFP_KERNEL);
 	if (ret)
 		goto fail_table_alloc;
 
@@ -162,12 +162,14 @@ static inline int vma_is_io(struct vm_area_struct *vma)
 }
 
 static void *vb2_dma_sg_get_userptr(void *alloc_ctx, unsigned long vaddr,
-				    unsigned long size, int write)
+				    unsigned long size, int write,
+				    unsigned gfp_flags)
 {
 	struct vb2_dma_sg_buf *buf;
 	unsigned long first, last;
 	int num_pages_from_user;
 	struct vm_area_struct *vma;
+	bool combine_pages = true;
 
 	buf = kzalloc(sizeof *buf, GFP_KERNEL);
 	if (!buf)
@@ -229,8 +231,24 @@ static void *vb2_dma_sg_get_userptr(void *alloc_ctx, unsigned long vaddr,
 	if (num_pages_from_user != buf->num_pages)
 		goto userptr_fail_get_user_pages;
 
+	/*
+	 * Do not attempt to combine pages if bounce buffers
+	 * have to be involved: allocating a bounce buffer of multiple
+	 * pages will easily fail due to memory fragmentation of the
+	 * DMA-able memory zone.
+	 */
+#ifdef CONFIG_ZONE_DMA
+	if (gfp_flags & GFP_DMA)
+		combine_pages = false;
+#endif
+#ifdef CONFIG_ZONE_DMA32
+	if (gfp_flags & GFP_DMA32)
+		combine_pages = false;
+#endif
+
 	if (sg_alloc_table_from_pages(&buf->sg_table, buf->pages,
-			buf->num_pages, buf->offset, size, 0))
+			buf->num_pages, buf->offset, size,
+			combine_pages, GFP_KERNEL))
 		goto userptr_fail_alloc_table_from_pages;
 
 	return buf;
diff --git a/drivers/media/v4l2-core/videobuf2-vmalloc.c b/drivers/media/v4l2-core/videobuf2-vmalloc.c
index 313d977..12569bc 100644
--- a/drivers/media/v4l2-core/videobuf2-vmalloc.c
+++ b/drivers/media/v4l2-core/videobuf2-vmalloc.c
@@ -70,7 +70,8 @@ static void vb2_vmalloc_put(void *buf_priv)
 }
 
 static void *vb2_vmalloc_get_userptr(void *alloc_ctx, unsigned long vaddr,
-				     unsigned long size, int write)
+				     unsigned long size, int write,
+				     unsigned gfp_flags)
 {
 	struct vb2_vmalloc_buf *buf;
 	unsigned long first, last;
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index adae88f..eb30306 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -237,7 +237,7 @@ int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
 int sg_alloc_table_from_pages(struct sg_table *sgt,
 	struct page **pages, unsigned int n_pages,
 	unsigned long offset, unsigned long size,
-	gfp_t gfp_mask);
+	bool combine_pages, gfp_t gfp_mask);
 
 size_t sg_copy_from_buffer(struct scatterlist *sgl, unsigned int nents,
 			   void *buf, size_t buflen);
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index cf870e5..5bd8446 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -87,7 +87,8 @@ struct vb2_mem_ops {
 	struct dma_buf *(*get_dmabuf)(void *buf_priv, unsigned long flags);
 
 	void		*(*get_userptr)(void *alloc_ctx, unsigned long vaddr,
-					unsigned long size, int write);
+					unsigned long size, int write,
+					unsigned gfp_flags);
 	void		(*put_userptr)(void *buf_priv);
 
 	void		(*prepare)(void *buf_priv);
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index d16fa29..7c5a96a 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -340,14 +340,15 @@ EXPORT_SYMBOL(sg_alloc_table);
  * @n_pages:	Number of pages in the pages array
  * @offset:     Offset from start of the first page to the start of a buffer
  * @size:       Number of valid bytes in the buffer (after offset)
+ * @combine_pages: If true, then combine consecutive pages into one chunk
  * @gfp_mask:	GFP allocation mask
  *
  *  Description:
  *    Allocate and initialize an sg table from a list of pages. Contiguous
- *    ranges of the pages are squashed into a single scatterlist node. A user
- *    may provide an offset at a start and a size of valid data in a buffer
- *    specified by the page array. The returned sg table is released by
- *    sg_free_table.
+ *    ranges of the pages are squashed into a single scatterlist node if
+ *    @combine_pages is true. A user may provide an offset at a start and a
+ *    size of valid data in a buffer specified by the page array. The returned
+ *    sg table is released by sg_free_table.
  *
  * Returns:
  *   0 on success, negative error on failure
@@ -355,7 +356,7 @@ EXPORT_SYMBOL(sg_alloc_table);
 int sg_alloc_table_from_pages(struct sg_table *sgt,
 	struct page **pages, unsigned int n_pages,
 	unsigned long offset, unsigned long size,
-	gfp_t gfp_mask)
+	bool combine_pages, gfp_t gfp_mask)
 {
 	unsigned int chunks;
 	unsigned int i;
@@ -366,7 +367,8 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
 	/* compute number of contiguous chunks */
 	chunks = 1;
 	for (i = 1; i < n_pages; ++i)
-		if (page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1)
+		if (!combine_pages ||
+		    page_to_pfn(pages[i]) != page_to_pfn(pages[i - 1]) + 1)
 			++chunks;
 
 	ret = sg_alloc_table(sgt, chunks, gfp_mask);
@@ -381,8 +383,8 @@ int sg_alloc_table_from_pages(struct sg_table *sgt,
 
 		/* look for the end of the current chunk */
 		for (j = cur_page + 1; j < n_pages; ++j)
-			if (page_to_pfn(pages[j]) !=
-			    page_to_pfn(pages[j - 1]) + 1)
+			if (!combine_pages ||
+			    page_to_pfn(pages[j]) != page_to_pfn(pages[j - 1]) + 1)
 				break;
 
 		chunk_size = ((j - cur_page) << PAGE_SHIFT) - offset;


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table
  2014-01-13  9:54             ` Hans Verkuil
@ 2014-01-13 13:02               ` Marek Szyprowski
  2014-01-13 13:49                 ` Hans Verkuil
  0 siblings, 1 reply; 11+ messages in thread
From: Marek Szyprowski @ 2014-01-13 13:02 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Ricardo Ribalda Delgado, Jonathan Corbet, Mauro Carvalho Chehab,
	Pawel Osciak, Kyungmin Park, Ismael Luceno, Greg Kroah-Hartman,
	linux-media, Sylwester Nawrocki

Hello,

On 2014-01-13 10:54, Hans Verkuil wrote:
> Hi Marek, Ricardo,
>
> On 01/08/2014 03:07 PM, Marek Szyprowski wrote:
> > Hello All,
> >
> > On 2014-01-03 16:51, Ricardo Ribalda Delgado wrote:
> >> Hello Hans
> >>
> >> What if we move the dma_map_sg and dma_unmap_sg to the vb2 interface,
> >> and there do something like:
> >>
> >> n_sg= dma_map_sg()
> >> if (n_sg=-ENOMEM){
> >>     split_table() //Breaks down the sg_table into monopages sg
> >>     n_sg= dma_map_sg()
>
> This is not a good approach. Remember that if swiotbl needs to allocate
> e.g. 17 contiguous pages it will round up to the next power of two, so it
> allocates 32 pages. So even if dma_map_sg succeeds, it might waste a lot
> of memory.
>
> >> }
> >> if (n_sg=-ENOMEM)
> >>    return -ENOMEM
> >
> > dma_map_sg/dma_unmap_sg should be moved to vb2-dma-sg memory allocator.
> > The best place for calling them is buf_prepare() and buf_finish()
> > callbacks. I think that I've already pointed this some time ago, but
> > unfortunately I didn't find enough time to convert existing code.
>
> That would be nice, but this is a separate issue.
>
> > For solving the problem described by Hans, I think that vb2-dma-sg
> > memory allocator should check dma mask of the client device and add
> > appropriate GFP_DMA or GFP_DMA32 flags to alloc_pages(). This should fix
> > the issues with failed dma_map_sg due to lack of bouncing buffers.
>
> Those GFP flags are for the scatterlist itself, and that can be placed
> anywhere in memory (frankly, I'm not sure why sg_alloc_table_from_pages
> has a gfp_flags argument at all and I think it is used incorrectly in
> videobuf2-dma-sg.c as well).

I was talking about GFP flags passed to alloc_pages in vb2_dma_sg allocator,
not the sg_alloc_table_from_pages().

IMHO the following changes should fix your problem:

1. add client struct device pointer to vb2_dma_sg_desc, so vb2_dma_sg
allocator will be able to check dma parameters of the client device.

2. add following check to vb2_dma_sg_alloc_compacted():

if (dev->dma_mask) {
     if (dev->dma_mask < DMA_BIT_MASK(32))
         gfp_mask |= GFP_DMA;
     else if (dev->dev_mask == DMA_BIT_MASK(32)
         gfp_mask |= GFP_DMA32;
}


> I see two options. The first is the patch I included below: this adds a
> bool to sg_alloc_table_from_pages() that tells it whether or not page
> combining should be enabled. It also adds the vb2 queue's gfp_flags as
> an argument to the get_userptr operation. In videobuf2-dma-sg.c that is
> checked to see whether or not sg_alloc_table_from_pages() should enable
> page-combining.
>
> The alternative would be to have vb2_queue_init check if the use of
> V4L2_MEMORY_USERPTR would lead to dma bouncing based on the q->io_modes
> and q->gfp_flags and if so, remove USERPTR support from io_modes. Do
> we really want to have page bouncing for video capture?

So the main problem is about user ptr modes? This once again shows that
the current user pointer API is too limited and should be replaced by
something more reliable.

> Feedback would be welcome as I am not sure what the best solution is.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table
  2014-01-13 13:02               ` Marek Szyprowski
@ 2014-01-13 13:49                 ` Hans Verkuil
  0 siblings, 0 replies; 11+ messages in thread
From: Hans Verkuil @ 2014-01-13 13:49 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Ricardo Ribalda Delgado, Jonathan Corbet, Mauro Carvalho Chehab,
	Pawel Osciak, Kyungmin Park, Ismael Luceno, Greg Kroah-Hartman,
	linux-media, Sylwester Nawrocki

On 01/13/2014 02:02 PM, Marek Szyprowski wrote:
> Hello,
> 
> On 2014-01-13 10:54, Hans Verkuil wrote:
>> Hi Marek, Ricardo,
>>
>> On 01/08/2014 03:07 PM, Marek Szyprowski wrote:
>>> Hello All,
>>>
>>> On 2014-01-03 16:51, Ricardo Ribalda Delgado wrote:
>>>> Hello Hans
>>>>
>>>> What if we move the dma_map_sg and dma_unmap_sg to the vb2 interface,
>>>> and there do something like:
>>>>
>>>> n_sg= dma_map_sg()
>>>> if (n_sg=-ENOMEM){
>>>>     split_table() //Breaks down the sg_table into monopages sg
>>>>     n_sg= dma_map_sg()
>>
>> This is not a good approach. Remember that if swiotbl needs to allocate
>> e.g. 17 contiguous pages it will round up to the next power of two, so it
>> allocates 32 pages. So even if dma_map_sg succeeds, it might waste a lot
>> of memory.
>>
>>>> }
>>>> if (n_sg=-ENOMEM)
>>>>    return -ENOMEM
>>>
>>> dma_map_sg/dma_unmap_sg should be moved to vb2-dma-sg memory allocator.
>>> The best place for calling them is buf_prepare() and buf_finish()
>>> callbacks. I think that I've already pointed this some time ago, but
>>> unfortunately I didn't find enough time to convert existing code.
>>
>> That would be nice, but this is a separate issue.
>>
>>> For solving the problem described by Hans, I think that vb2-dma-sg
>>> memory allocator should check dma mask of the client device and add
>>> appropriate GFP_DMA or GFP_DMA32 flags to alloc_pages(). This should fix
>>> the issues with failed dma_map_sg due to lack of bouncing buffers.
>>
>> Those GFP flags are for the scatterlist itself, and that can be placed
>> anywhere in memory (frankly, I'm not sure why sg_alloc_table_from_pages
>> has a gfp_flags argument at all and I think it is used incorrectly in
>> videobuf2-dma-sg.c as well).
> 
> I was talking about GFP flags passed to alloc_pages in vb2_dma_sg allocator,
> not the sg_alloc_table_from_pages().
> 
> IMHO the following changes should fix your problem:
> 
> 1. add client struct device pointer to vb2_dma_sg_desc, so vb2_dma_sg
> allocator will be able to check dma parameters of the client device.
> 
> 2. add following check to vb2_dma_sg_alloc_compacted():
> 
> if (dev->dma_mask) {
>      if (dev->dma_mask < DMA_BIT_MASK(32))
>          gfp_mask |= GFP_DMA;
>      else if (dev->dev_mask == DMA_BIT_MASK(32)
>          gfp_mask |= GFP_DMA32;
> }

No, it doesn't. This concerns the USERPTR memory model, so the memory for
the buffer is allocated by userspace, not by the kernel. The MMAP memory
model works fine, that's not where the problem is.

> 
> 
>> I see two options. The first is the patch I included below: this adds a
>> bool to sg_alloc_table_from_pages() that tells it whether or not page
>> combining should be enabled. It also adds the vb2 queue's gfp_flags as
>> an argument to the get_userptr operation. In videobuf2-dma-sg.c that is
>> checked to see whether or not sg_alloc_table_from_pages() should enable
>> page-combining.
>>
>> The alternative would be to have vb2_queue_init check if the use of
>> V4L2_MEMORY_USERPTR would lead to dma bouncing based on the q->io_modes
>> and q->gfp_flags and if so, remove USERPTR support from io_modes. Do
>> we really want to have page bouncing for video capture?
> 
> So the main problem is about user ptr modes? This once again shows that
> the current user pointer API is too limited and should be replaced by
> something more reliable.

The userptr model worked perfectly fine before the memory compaction was
added. This is a pure regression.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-01-13 13:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-02 14:19 [PATCH v4 0/2] videobuf2-dma-sg: Contiguos memory allocation Ricardo Ribalda Delgado
2013-08-02 14:19 ` [PATCH v4 1/2] videobuf2-dma-sg: Allocate pages as contiguous as possible Ricardo Ribalda Delgado
2013-08-02 14:20 ` [PATCH v4 2/2] videobuf2-dma-sg: Replace vb2_dma_sg_desc with sg_table Ricardo Ribalda Delgado
2014-01-03 14:52   ` Hans Verkuil
2014-01-03 15:17     ` Ricardo Ribalda Delgado
2014-01-03 15:36       ` Hans Verkuil
2014-01-03 15:51         ` Ricardo Ribalda Delgado
2014-01-08 14:07           ` Marek Szyprowski
2014-01-13  9:54             ` Hans Verkuil
2014-01-13 13:02               ` Marek Szyprowski
2014-01-13 13:49                 ` Hans Verkuil

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).