linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] videobuf2: support new noncontiguous DMA API
@ 2021-03-02  0:46 Sergey Senozhatsky
  2021-03-02  0:46 ` [PATCH 1/8] videobuf2: rework vb2_mem_ops API Sergey Senozhatsky
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2021-03-02  0:46 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media,
	linux-kernel, Sergey Senozhatsky

Hello,

	RFC

	The series adds support for new noncontiguous DMA API [0] and
adds V4L2_FLAG_MEMORY_NON_COHERENT UAPI. This is similar to previous
V4L2_FLAG_MEMORY_NON_CONSISTENT (which was renamed), but the patch set
goes a bit further this time and also does some videobuf2 API
refactroings along the way.

A corresponding v4l2-compliance patch will be posted shortly.

[0] https://lore.kernel.org/lkml/20210301085236.947011-2-hch@lst.de/

Sergey Senozhatsky (8):
  videobuf2: rework vb2_mem_ops API
  videobuf2: inverse buffer cache_hints flags
  videobuf2: split buffer cache_hints initialisation
  videobuf2: move cache_hints handling to allocators
  videobuf2: add V4L2_FLAG_MEMORY_NON_COHERENT flag
  videobuf2: add queue memory coherency parameter
  videobuf2: handle V4L2_FLAG_MEMORY_NON_COHERENT flag
  videobuf2: handle non-contiguous DMA allocations

 .../userspace-api/media/v4l/buffer.rst        |  40 +++-
 .../media/v4l/vidioc-create-bufs.rst          |   7 +-
 .../media/v4l/vidioc-reqbufs.rst              |  16 +-
 .../media/common/videobuf2/videobuf2-core.c   | 135 +++++++++-----
 .../common/videobuf2/videobuf2-dma-contig.c   | 175 ++++++++++++++----
 .../media/common/videobuf2/videobuf2-dma-sg.c |  39 ++--
 .../media/common/videobuf2/videobuf2-v4l2.c   |  47 ++---
 .../common/videobuf2/videobuf2-vmalloc.c      |  30 +--
 drivers/media/dvb-core/dvb_vb2.c              |   2 +-
 drivers/media/v4l2-core/v4l2-compat-ioctl32.c |   9 +-
 drivers/media/v4l2-core/v4l2-ioctl.c          |   5 +-
 include/media/videobuf2-core.h                |  57 +++---
 include/uapi/linux/videodev2.h                |  13 +-
 13 files changed, 396 insertions(+), 179 deletions(-)

-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/8] videobuf2: rework vb2_mem_ops API
  2021-03-02  0:46 [PATCH 0/8] videobuf2: support new noncontiguous DMA API Sergey Senozhatsky
@ 2021-03-02  0:46 ` Sergey Senozhatsky
  2021-03-02  0:46 ` [PATCH 2/8] videobuf2: inverse buffer cache_hints flags Sergey Senozhatsky
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2021-03-02  0:46 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media,
	linux-kernel, Sergey Senozhatsky

With new DMA API we need an extension of videobuf2 API. Previously,
videobuf2 core would set non-coherent DMA bit in vb2 queue dma_attr
(if user-space would pass a corresponding memory hint); vb2 core
then would pass the vb2 queue dma_attrs to the vb2 allocators.
vb2 allocator would use queue's dma_attr and DMA API would allocate
either coherent or non-coherent memory.

But we cannot do this anymore, since there is no corresponding DMA
attr flag and, hence, there is no way for the allocator to become
aware of what type of allocation user-space has requested. So we
need to pass more context from videobuf2 core to the allocators.

Fix this by changing call_ptr_memop() macro to pass vb2 pointer to
corresponding op callbacks.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 .../media/common/videobuf2/videobuf2-core.c   | 42 +++++++++++--------
 .../common/videobuf2/videobuf2-dma-contig.c   | 36 +++++++++-------
 .../media/common/videobuf2/videobuf2-dma-sg.c | 33 ++++++++-------
 .../common/videobuf2/videobuf2-vmalloc.c      | 30 ++++++-------
 include/media/videobuf2-core.h                | 37 ++++++++--------
 5 files changed, 98 insertions(+), 80 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index 02281d13505f..9a5cc3e63439 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -68,13 +68,13 @@ module_param(debug, int, 0644);
 	err;								\
 })
 
-#define call_ptr_memop(vb, op, args...)					\
+#define call_ptr_memop(op, vb, args...)					\
 ({									\
 	struct vb2_queue *_q = (vb)->vb2_queue;				\
 	void *ptr;							\
 									\
 	log_memop(vb, op);						\
-	ptr = _q->mem_ops->op ? _q->mem_ops->op(args) : NULL;		\
+	ptr = _q->mem_ops->op ? _q->mem_ops->op(vb, args) : NULL;	\
 	if (!IS_ERR_OR_NULL(ptr))					\
 		(vb)->cnt_mem_ ## op++;					\
 	ptr;								\
@@ -144,9 +144,9 @@ module_param(debug, int, 0644);
 	((vb)->vb2_queue->mem_ops->op ?					\
 		(vb)->vb2_queue->mem_ops->op(args) : 0)
 
-#define call_ptr_memop(vb, op, args...)					\
+#define call_ptr_memop(op, vb, args...)					\
 	((vb)->vb2_queue->mem_ops->op ?					\
-		(vb)->vb2_queue->mem_ops->op(args) : NULL)
+		(vb)->vb2_queue->mem_ops->op(vb, args) : NULL)
 
 #define call_void_memop(vb, op, args...)				\
 	do {								\
@@ -230,9 +230,10 @@ static int __vb2_buf_mem_alloc(struct vb2_buffer *vb)
 		if (size < vb->planes[plane].length)
 			goto free;
 
-		mem_priv = call_ptr_memop(vb, alloc,
-				q->alloc_devs[plane] ? : q->dev,
-				q->dma_attrs, size, q->dma_dir, q->gfp_flags);
+		mem_priv = call_ptr_memop(alloc,
+					  vb,
+					  q->alloc_devs[plane] ? : q->dev,
+					  size);
 		if (IS_ERR_OR_NULL(mem_priv)) {
 			if (mem_priv)
 				ret = PTR_ERR(mem_priv);
@@ -975,7 +976,7 @@ void *vb2_plane_vaddr(struct vb2_buffer *vb, unsigned int plane_no)
 	if (plane_no >= vb->num_planes || !vb->planes[plane_no].mem_priv)
 		return NULL;
 
-	return call_ptr_memop(vb, vaddr, vb->planes[plane_no].mem_priv);
+	return call_ptr_memop(vaddr, vb, vb->planes[plane_no].mem_priv);
 
 }
 EXPORT_SYMBOL_GPL(vb2_plane_vaddr);
@@ -985,7 +986,7 @@ void *vb2_plane_cookie(struct vb2_buffer *vb, unsigned int plane_no)
 	if (plane_no >= vb->num_planes || !vb->planes[plane_no].mem_priv)
 		return NULL;
 
-	return call_ptr_memop(vb, cookie, vb->planes[plane_no].mem_priv);
+	return call_ptr_memop(cookie, vb, vb->planes[plane_no].mem_priv);
 }
 EXPORT_SYMBOL_GPL(vb2_plane_cookie);
 
@@ -1125,10 +1126,11 @@ static int __prepare_userptr(struct vb2_buffer *vb)
 		vb->planes[plane].data_offset = 0;
 
 		/* Acquire each plane's memory */
-		mem_priv = call_ptr_memop(vb, get_userptr,
-				q->alloc_devs[plane] ? : q->dev,
-				planes[plane].m.userptr,
-				planes[plane].length, q->dma_dir);
+		mem_priv = call_ptr_memop(get_userptr,
+					  vb,
+					  q->alloc_devs[plane] ? : q->dev,
+					  planes[plane].m.userptr,
+					  planes[plane].length);
 		if (IS_ERR(mem_priv)) {
 			dprintk(q, 1, "failed acquiring userspace memory for plane %d\n",
 				plane);
@@ -1249,9 +1251,11 @@ static int __prepare_dmabuf(struct vb2_buffer *vb)
 		vb->planes[plane].data_offset = 0;
 
 		/* Acquire each plane's memory */
-		mem_priv = call_ptr_memop(vb, attach_dmabuf,
-				q->alloc_devs[plane] ? : q->dev,
-				dbuf, planes[plane].length, q->dma_dir);
+		mem_priv = call_ptr_memop(attach_dmabuf,
+					  vb,
+					  q->alloc_devs[plane] ? : q->dev,
+					  dbuf,
+					  planes[plane].length);
 		if (IS_ERR(mem_priv)) {
 			dprintk(q, 1, "failed to attach dmabuf\n");
 			ret = PTR_ERR(mem_priv);
@@ -2176,8 +2180,10 @@ int vb2_core_expbuf(struct vb2_queue *q, int *fd, unsigned int type,
 
 	vb_plane = &vb->planes[plane];
 
-	dbuf = call_ptr_memop(vb, get_dmabuf, vb_plane->mem_priv,
-				flags & O_ACCMODE);
+	dbuf = call_ptr_memop(get_dmabuf,
+			      vb,
+			      vb_plane->mem_priv,
+			      flags & O_ACCMODE);
 	if (IS_ERR_OR_NULL(dbuf)) {
 		dprintk(q, 1, "failed to export buffer %d, plane %d\n",
 			index, plane);
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index a7f61ba85440..019c3843dc6d 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -40,6 +40,8 @@ struct vb2_dc_buf {
 
 	/* DMABUF related */
 	struct dma_buf_attachment	*db_attach;
+
+	struct vb2_buffer		*vb;
 };
 
 /*********************************************/
@@ -66,14 +68,14 @@ static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt)
 /*         callbacks for all buffers         */
 /*********************************************/
 
-static void *vb2_dc_cookie(void *buf_priv)
+static void *vb2_dc_cookie(struct vb2_buffer *vb, void *buf_priv)
 {
 	struct vb2_dc_buf *buf = buf_priv;
 
 	return &buf->dma_addr;
 }
 
-static void *vb2_dc_vaddr(void *buf_priv)
+static void *vb2_dc_vaddr(struct vb2_buffer *vb, void *buf_priv)
 {
 	struct vb2_dc_buf *buf = buf_priv;
 	struct dma_buf_map map;
@@ -137,9 +139,9 @@ static void vb2_dc_put(void *buf_priv)
 	kfree(buf);
 }
 
-static void *vb2_dc_alloc(struct device *dev, unsigned long attrs,
-			  unsigned long size, enum dma_data_direction dma_dir,
-			  gfp_t gfp_flags)
+static void *vb2_dc_alloc(struct vb2_buffer *vb,
+			  struct device *dev,
+			  unsigned long size)
 {
 	struct vb2_dc_buf *buf;
 
@@ -150,9 +152,10 @@ static void *vb2_dc_alloc(struct device *dev, unsigned long attrs,
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
-	buf->attrs = attrs;
+	buf->attrs = vb->vb2_queue->dma_attrs;
 	buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
-					GFP_KERNEL | gfp_flags, buf->attrs);
+				      GFP_KERNEL | vb->vb2_queue->gfp_flags,
+				      buf->attrs);
 	if (!buf->cookie) {
 		dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
 		kfree(buf);
@@ -165,11 +168,12 @@ static void *vb2_dc_alloc(struct device *dev, unsigned long attrs,
 	/* Prevent the device from being released while the buffer is used */
 	buf->dev = get_device(dev);
 	buf->size = size;
-	buf->dma_dir = dma_dir;
+	buf->dma_dir = vb->vb2_queue->dma_dir;
 
 	buf->handler.refcount = &buf->refcount;
 	buf->handler.put = vb2_dc_put;
 	buf->handler.arg = buf;
+	buf->vb = vb;
 
 	refcount_set(&buf->refcount, 1);
 
@@ -397,7 +401,9 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
 	return sgt;
 }
 
-static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, unsigned long flags)
+static struct dma_buf *vb2_dc_get_dmabuf(struct vb2_buffer *vb,
+					 void *buf_priv,
+					 unsigned long flags)
 {
 	struct vb2_dc_buf *buf = buf_priv;
 	struct dma_buf *dbuf;
@@ -459,8 +465,8 @@ static void vb2_dc_put_userptr(void *buf_priv)
 	kfree(buf);
 }
 
-static void *vb2_dc_get_userptr(struct device *dev, unsigned long vaddr,
-	unsigned long size, enum dma_data_direction dma_dir)
+static void *vb2_dc_get_userptr(struct vb2_buffer *vb, struct device *dev,
+				unsigned long vaddr, unsigned long size)
 {
 	struct vb2_dc_buf *buf;
 	struct frame_vector *vec;
@@ -490,7 +496,7 @@ static void *vb2_dc_get_userptr(struct device *dev, unsigned long vaddr,
 		return ERR_PTR(-ENOMEM);
 
 	buf->dev = dev;
-	buf->dma_dir = dma_dir;
+	buf->dma_dir = vb->vb2_queue->dma_dir;
 
 	offset = lower_32_bits(offset_in_page(vaddr));
 	vec = vb2_create_framevec(vaddr, size);
@@ -660,8 +666,8 @@ static void vb2_dc_detach_dmabuf(void *mem_priv)
 	kfree(buf);
 }
 
-static void *vb2_dc_attach_dmabuf(struct device *dev, struct dma_buf *dbuf,
-	unsigned long size, enum dma_data_direction dma_dir)
+static void *vb2_dc_attach_dmabuf(struct vb2_buffer *vb, struct device *dev,
+				  struct dma_buf *dbuf, unsigned long size)
 {
 	struct vb2_dc_buf *buf;
 	struct dma_buf_attachment *dba;
@@ -685,7 +691,7 @@ static void *vb2_dc_attach_dmabuf(struct device *dev, struct dma_buf *dbuf,
 		return dba;
 	}
 
-	buf->dma_dir = dma_dir;
+	buf->dma_dir = vb->vb2_queue->dma_dir;
 	buf->size = size;
 	buf->db_attach = dba;
 
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index 030e48218687..71094cb5c5d7 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -51,6 +51,8 @@ struct vb2_dma_sg_buf {
 	struct vb2_vmarea_handler	handler;
 
 	struct dma_buf_attachment	*db_attach;
+
+	struct vb2_buffer		*vb;
 };
 
 static void vb2_dma_sg_put(void *buf_priv);
@@ -96,9 +98,8 @@ static int vb2_dma_sg_alloc_compacted(struct vb2_dma_sg_buf *buf,
 	return 0;
 }
 
-static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs,
-			      unsigned long size, enum dma_data_direction dma_dir,
-			      gfp_t gfp_flags)
+static void *vb2_dma_sg_alloc(struct vb2_buffer *vb, struct device *dev,
+			      unsigned long size)
 {
 	struct vb2_dma_sg_buf *buf;
 	struct sg_table *sgt;
@@ -113,7 +114,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs,
 		return ERR_PTR(-ENOMEM);
 
 	buf->vaddr = NULL;
-	buf->dma_dir = dma_dir;
+	buf->dma_dir = vb->vb2_queue->dma_dir;
 	buf->offset = 0;
 	buf->size = size;
 	/* size is already page aligned */
@@ -130,7 +131,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs,
 	if (!buf->pages)
 		goto fail_pages_array_alloc;
 
-	ret = vb2_dma_sg_alloc_compacted(buf, gfp_flags);
+	ret = vb2_dma_sg_alloc_compacted(buf, vb->vb2_queue->gfp_flags);
 	if (ret)
 		goto fail_pages_alloc;
 
@@ -154,6 +155,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs,
 	buf->handler.refcount = &buf->refcount;
 	buf->handler.put = vb2_dma_sg_put;
 	buf->handler.arg = buf;
+	buf->vb = vb;
 
 	refcount_set(&buf->refcount, 1);
 
@@ -213,9 +215,8 @@ static void vb2_dma_sg_finish(void *buf_priv)
 	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
 }
 
-static void *vb2_dma_sg_get_userptr(struct device *dev, unsigned long vaddr,
-				    unsigned long size,
-				    enum dma_data_direction dma_dir)
+static void *vb2_dma_sg_get_userptr(struct vb2_buffer *vb, struct device *dev,
+				    unsigned long vaddr, unsigned long size)
 {
 	struct vb2_dma_sg_buf *buf;
 	struct sg_table *sgt;
@@ -230,7 +231,7 @@ static void *vb2_dma_sg_get_userptr(struct device *dev, unsigned long vaddr,
 
 	buf->vaddr = NULL;
 	buf->dev = dev;
-	buf->dma_dir = dma_dir;
+	buf->dma_dir = vb->vb2_queue->dma_dir;
 	buf->offset = vaddr & ~PAGE_MASK;
 	buf->size = size;
 	buf->dma_sgt = &buf->sg_table;
@@ -292,7 +293,7 @@ static void vb2_dma_sg_put_userptr(void *buf_priv)
 	kfree(buf);
 }
 
-static void *vb2_dma_sg_vaddr(void *buf_priv)
+static void *vb2_dma_sg_vaddr(struct vb2_buffer *vb, void *buf_priv)
 {
 	struct vb2_dma_sg_buf *buf = buf_priv;
 	struct dma_buf_map map;
@@ -511,7 +512,9 @@ static const struct dma_buf_ops vb2_dma_sg_dmabuf_ops = {
 	.release = vb2_dma_sg_dmabuf_ops_release,
 };
 
-static struct dma_buf *vb2_dma_sg_get_dmabuf(void *buf_priv, unsigned long flags)
+static struct dma_buf *vb2_dma_sg_get_dmabuf(struct vb2_buffer *vb,
+					     void *buf_priv,
+					     unsigned long flags)
 {
 	struct vb2_dma_sg_buf *buf = buf_priv;
 	struct dma_buf *dbuf;
@@ -605,8 +608,8 @@ static void vb2_dma_sg_detach_dmabuf(void *mem_priv)
 	kfree(buf);
 }
 
-static void *vb2_dma_sg_attach_dmabuf(struct device *dev, struct dma_buf *dbuf,
-	unsigned long size, enum dma_data_direction dma_dir)
+static void *vb2_dma_sg_attach_dmabuf(struct vb2_buffer *vb, struct device *dev,
+				      struct dma_buf *dbuf, unsigned long size)
 {
 	struct vb2_dma_sg_buf *buf;
 	struct dma_buf_attachment *dba;
@@ -630,14 +633,14 @@ static void *vb2_dma_sg_attach_dmabuf(struct device *dev, struct dma_buf *dbuf,
 		return dba;
 	}
 
-	buf->dma_dir = dma_dir;
+	buf->dma_dir = vb->vb2_queue->dma_dir;
 	buf->size = size;
 	buf->db_attach = dba;
 
 	return buf;
 }
 
-static void *vb2_dma_sg_cookie(void *buf_priv)
+static void *vb2_dma_sg_cookie(struct vb2_buffer *vb, void *buf_priv)
 {
 	struct vb2_dma_sg_buf *buf = buf_priv;
 
diff --git a/drivers/media/common/videobuf2/videobuf2-vmalloc.c b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
index 83f95258ec8c..c28c5e64a1a6 100644
--- a/drivers/media/common/videobuf2/videobuf2-vmalloc.c
+++ b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
@@ -34,13 +34,12 @@ struct vb2_vmalloc_buf {
 
 static void vb2_vmalloc_put(void *buf_priv);
 
-static void *vb2_vmalloc_alloc(struct device *dev, unsigned long attrs,
-			       unsigned long size, enum dma_data_direction dma_dir,
-			       gfp_t gfp_flags)
+static void *vb2_vmalloc_alloc(struct vb2_buffer *vb, struct device *dev,
+			       unsigned long size)
 {
 	struct vb2_vmalloc_buf *buf;
 
-	buf = kzalloc(sizeof(*buf), GFP_KERNEL | gfp_flags);
+	buf = kzalloc(sizeof(*buf), GFP_KERNEL | vb->vb2_queue->gfp_flags);
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
@@ -52,7 +51,7 @@ static void *vb2_vmalloc_alloc(struct device *dev, unsigned long attrs,
 		return ERR_PTR(-ENOMEM);
 	}
 
-	buf->dma_dir = dma_dir;
+	buf->dma_dir = vb->vb2_queue->dma_dir;
 	buf->handler.refcount = &buf->refcount;
 	buf->handler.put = vb2_vmalloc_put;
 	buf->handler.arg = buf;
@@ -71,9 +70,8 @@ static void vb2_vmalloc_put(void *buf_priv)
 	}
 }
 
-static void *vb2_vmalloc_get_userptr(struct device *dev, unsigned long vaddr,
-				     unsigned long size,
-				     enum dma_data_direction dma_dir)
+static void *vb2_vmalloc_get_userptr(struct vb2_buffer *vb, struct device *dev,
+				     unsigned long vaddr, unsigned long size)
 {
 	struct vb2_vmalloc_buf *buf;
 	struct frame_vector *vec;
@@ -84,7 +82,7 @@ static void *vb2_vmalloc_get_userptr(struct device *dev, unsigned long vaddr,
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
-	buf->dma_dir = dma_dir;
+	buf->dma_dir = vb->vb2_queue->dma_dir;
 	offset = vaddr & ~PAGE_MASK;
 	buf->size = size;
 	vec = vb2_create_framevec(vaddr, size);
@@ -147,7 +145,7 @@ static void vb2_vmalloc_put_userptr(void *buf_priv)
 	kfree(buf);
 }
 
-static void *vb2_vmalloc_vaddr(void *buf_priv)
+static void *vb2_vmalloc_vaddr(struct vb2_buffer *vb, void *buf_priv)
 {
 	struct vb2_vmalloc_buf *buf = buf_priv;
 
@@ -339,7 +337,9 @@ static const struct dma_buf_ops vb2_vmalloc_dmabuf_ops = {
 	.release = vb2_vmalloc_dmabuf_ops_release,
 };
 
-static struct dma_buf *vb2_vmalloc_get_dmabuf(void *buf_priv, unsigned long flags)
+static struct dma_buf *vb2_vmalloc_get_dmabuf(struct vb2_buffer *vb,
+					      void *buf_priv,
+					      unsigned long flags)
 {
 	struct vb2_vmalloc_buf *buf = buf_priv;
 	struct dma_buf *dbuf;
@@ -403,8 +403,10 @@ static void vb2_vmalloc_detach_dmabuf(void *mem_priv)
 	kfree(buf);
 }
 
-static void *vb2_vmalloc_attach_dmabuf(struct device *dev, struct dma_buf *dbuf,
-	unsigned long size, enum dma_data_direction dma_dir)
+static void *vb2_vmalloc_attach_dmabuf(struct vb2_buffer *vb,
+				       struct device *dev,
+				       struct dma_buf *dbuf,
+				       unsigned long size)
 {
 	struct vb2_vmalloc_buf *buf;
 
@@ -416,7 +418,7 @@ static void *vb2_vmalloc_attach_dmabuf(struct device *dev, struct dma_buf *dbuf,
 		return ERR_PTR(-ENOMEM);
 
 	buf->dbuf = dbuf;
-	buf->dma_dir = dma_dir;
+	buf->dma_dir = vb->vb2_queue->dma_dir;
 	buf->size = size;
 
 	return buf;
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 799ba61b5b6f..d0d85be4809b 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -46,6 +46,7 @@ enum vb2_memory {
 
 struct vb2_fileio_data;
 struct vb2_threadio_data;
+struct vb2_buffer;
 
 /**
  * struct vb2_mem_ops - memory handling/memory allocator operations.
@@ -53,10 +54,8 @@ struct vb2_threadio_data;
  *		return ERR_PTR() on failure or a pointer to allocator private,
  *		per-buffer data on success; the returned private structure
  *		will then be passed as @buf_priv argument to other ops in this
- *		structure. Additional gfp_flags to use when allocating the
- *		are also passed to this operation. These flags are from the
- *		gfp_flags field of vb2_queue. The size argument to this function
- *		shall be *page aligned*.
+ *		structure. The size argument to this function shall be
+ *		*page aligned*.
  * @put:	inform the allocator that the buffer will no longer be used;
  *		usually will result in the allocator freeing the buffer (if
  *		no other users of this buffer are present); the @buf_priv
@@ -117,31 +116,33 @@ struct vb2_threadio_data;
  *       map_dmabuf, unmap_dmabuf.
  */
 struct vb2_mem_ops {
-	void		*(*alloc)(struct device *dev, unsigned long attrs,
-				  unsigned long size,
-				  enum dma_data_direction dma_dir,
-				  gfp_t gfp_flags);
+	void		*(*alloc)(struct vb2_buffer *vb,
+				  struct device *dev,
+				  unsigned long size);
 	void		(*put)(void *buf_priv);
-	struct dma_buf *(*get_dmabuf)(void *buf_priv, unsigned long flags);
-
-	void		*(*get_userptr)(struct device *dev, unsigned long vaddr,
-					unsigned long size,
-					enum dma_data_direction dma_dir);
+	struct dma_buf *(*get_dmabuf)(struct vb2_buffer *vb,
+				      void *buf_priv,
+				      unsigned long flags);
+
+	void		*(*get_userptr)(struct vb2_buffer *vb,
+					struct device *dev,
+					unsigned long vaddr,
+					unsigned long size);
 	void		(*put_userptr)(void *buf_priv);
 
 	void		(*prepare)(void *buf_priv);
 	void		(*finish)(void *buf_priv);
 
-	void		*(*attach_dmabuf)(struct device *dev,
+	void		*(*attach_dmabuf)(struct vb2_buffer *vb,
+					  struct device *dev,
 					  struct dma_buf *dbuf,
-					  unsigned long size,
-					  enum dma_data_direction dma_dir);
+					  unsigned long size);
 	void		(*detach_dmabuf)(void *buf_priv);
 	int		(*map_dmabuf)(void *buf_priv);
 	void		(*unmap_dmabuf)(void *buf_priv);
 
-	void		*(*vaddr)(void *buf_priv);
-	void		*(*cookie)(void *buf_priv);
+	void		*(*vaddr)(struct vb2_buffer *vb, void *buf_priv);
+	void		*(*cookie)(struct vb2_buffer *vb, void *buf_priv);
 
 	unsigned int	(*num_users)(void *buf_priv);
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/8] videobuf2: inverse buffer cache_hints flags
  2021-03-02  0:46 [PATCH 0/8] videobuf2: support new noncontiguous DMA API Sergey Senozhatsky
  2021-03-02  0:46 ` [PATCH 1/8] videobuf2: rework vb2_mem_ops API Sergey Senozhatsky
@ 2021-03-02  0:46 ` Sergey Senozhatsky
  2021-03-02  0:46 ` [PATCH 3/8] videobuf2: split buffer cache_hints initialisation Sergey Senozhatsky
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2021-03-02  0:46 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media,
	linux-kernel, Sergey Senozhatsky

It would be less error prone if the default cache hints value
(we kzalloc() structs, so it's zeroed out by default) would be
to "always sync/flush" caches. Inverse and rename cache hints
flags.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 .../media/common/videobuf2/videobuf2-core.c   | 31 ++++++-------------
 .../media/common/videobuf2/videobuf2-v4l2.c   | 17 +++-------
 include/media/videobuf2-core.h                | 12 +++----
 3 files changed, 21 insertions(+), 39 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index 9a5cc3e63439..23e41fec9880 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -327,12 +327,12 @@ static void __vb2_buf_mem_prepare(struct vb2_buffer *vb)
 	if (vb->synced)
 		return;
 
-	if (vb->need_cache_sync_on_prepare) {
-		for (plane = 0; plane < vb->num_planes; ++plane)
-			call_void_memop(vb, prepare,
-					vb->planes[plane].mem_priv);
-	}
 	vb->synced = 1;
+	if (vb->skip_cache_sync_on_prepare)
+		return;
+
+	for (plane = 0; plane < vb->num_planes; ++plane)
+		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
 }
 
 /*
@@ -346,12 +346,12 @@ static void __vb2_buf_mem_finish(struct vb2_buffer *vb)
 	if (!vb->synced)
 		return;
 
-	if (vb->need_cache_sync_on_finish) {
-		for (plane = 0; plane < vb->num_planes; ++plane)
-			call_void_memop(vb, finish,
-					vb->planes[plane].mem_priv);
-	}
 	vb->synced = 0;
+	if (vb->skip_cache_sync_on_finish)
+		return;
+
+	for (plane = 0; plane < vb->num_planes; ++plane)
+		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
 }
 
 /*
@@ -415,17 +415,6 @@ static int __vb2_queue_alloc(struct vb2_queue *q, enum vb2_memory memory,
 		vb->index = q->num_buffers + buffer;
 		vb->type = q->type;
 		vb->memory = memory;
-		/*
-		 * We need to set these flags here so that the videobuf2 core
-		 * will call ->prepare()/->finish() cache sync/flush on vb2
-		 * buffers when appropriate. However, we can avoid explicit
-		 * ->prepare() and ->finish() cache sync for DMABUF buffers,
-		 * because DMA exporter takes care of it.
-		 */
-		if (q->memory != VB2_MEMORY_DMABUF) {
-			vb->need_cache_sync_on_prepare = 1;
-			vb->need_cache_sync_on_finish = 1;
-		}
 		for (plane = 0; plane < num_planes; ++plane) {
 			vb->planes[plane].length = plane_sizes[plane];
 			vb->planes[plane].min_length = plane_sizes[plane];
diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
index 7e96f67c60ba..db93678860bd 100644
--- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
+++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
@@ -351,18 +351,11 @@ static void set_buffer_cache_hints(struct vb2_queue *q,
 	 * we always need ->prepare() or/and ->finish() cache sync.
 	 */
 	if (q->memory == VB2_MEMORY_DMABUF) {
-		vb->need_cache_sync_on_finish = 0;
-		vb->need_cache_sync_on_prepare = 0;
+		vb->skip_cache_sync_on_finish = 1;
+		vb->skip_cache_sync_on_prepare = 1;
 		return;
 	}
 
-	/*
-	 * Cache sync/invalidation flags are set by default in order to
-	 * preserve existing behaviour for old apps/drivers.
-	 */
-	vb->need_cache_sync_on_prepare = 1;
-	vb->need_cache_sync_on_finish = 1;
-
 	if (!vb2_queue_allows_cache_hints(q)) {
 		/*
 		 * Clear buffer cache flags if queue does not support user
@@ -379,13 +372,13 @@ static void set_buffer_cache_hints(struct vb2_queue *q,
 	 * TO_DEVICE.
 	 */
 	if (q->dma_dir == DMA_TO_DEVICE)
-		vb->need_cache_sync_on_finish = 0;
+		vb->skip_cache_sync_on_finish = 1;
 
 	if (b->flags & V4L2_BUF_FLAG_NO_CACHE_INVALIDATE)
-		vb->need_cache_sync_on_finish = 0;
+		vb->skip_cache_sync_on_finish = 1;
 
 	if (b->flags & V4L2_BUF_FLAG_NO_CACHE_CLEAN)
-		vb->need_cache_sync_on_prepare = 0;
+		vb->skip_cache_sync_on_prepare = 1;
 }
 
 static int vb2_queue_or_prepare_buf(struct vb2_queue *q, struct media_device *mdev,
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index d0d85be4809b..48f57a54ddb1 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -265,10 +265,10 @@ struct vb2_buffer {
 	 *			after the 'buf_finish' op is called.
 	 * copied_timestamp:	the timestamp of this capture buffer was copied
 	 *			from an output buffer.
-	 * need_cache_sync_on_prepare: when set buffer's ->prepare() function
-	 *			performs cache sync/invalidation.
-	 * need_cache_sync_on_finish: when set buffer's ->finish() function
-	 *			performs cache sync/invalidation.
+	 * skip_cache_sync_on_prepare: when set buffer's ->prepare() function
+	 *			skips cache sync/invalidation.
+	 * skip_cache_sync_on_finish: when set buffer's ->finish() function
+	 *			skips cache sync/invalidation.
 	 * queued_entry:	entry on the queued buffers list, which holds
 	 *			all buffers queued from userspace
 	 * done_entry:		entry on the list that stores all buffers ready
@@ -279,8 +279,8 @@ struct vb2_buffer {
 	unsigned int		synced:1;
 	unsigned int		prepared:1;
 	unsigned int		copied_timestamp:1;
-	unsigned int		need_cache_sync_on_prepare:1;
-	unsigned int		need_cache_sync_on_finish:1;
+	unsigned int		skip_cache_sync_on_prepare:1;
+	unsigned int		skip_cache_sync_on_finish:1;
 
 	struct vb2_plane	planes[VB2_MAX_PLANES];
 	struct list_head	queued_entry;
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/8] videobuf2: split buffer cache_hints initialisation
  2021-03-02  0:46 [PATCH 0/8] videobuf2: support new noncontiguous DMA API Sergey Senozhatsky
  2021-03-02  0:46 ` [PATCH 1/8] videobuf2: rework vb2_mem_ops API Sergey Senozhatsky
  2021-03-02  0:46 ` [PATCH 2/8] videobuf2: inverse buffer cache_hints flags Sergey Senozhatsky
@ 2021-03-02  0:46 ` Sergey Senozhatsky
  2021-03-02  0:46 ` [PATCH 4/8] videobuf2: move cache_hints handling to allocators Sergey Senozhatsky
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2021-03-02  0:46 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media,
	linux-kernel, Sergey Senozhatsky

V4L2 is not the perfect place to manage vb2 buffer cache hints.
It works for V4L2 users, but there are backends that use vb2 core
and don't use V4L2. Factor buffer cache hints init and call it
when we allocate vb2 buffer.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 .../media/common/videobuf2/videobuf2-core.c   | 22 +++++++++++++++++++
 .../media/common/videobuf2/videobuf2-v4l2.c   | 18 ---------------
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index 23e41fec9880..76210c006958 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -382,6 +382,27 @@ static void __setup_offsets(struct vb2_buffer *vb)
 	}
 }
 
+static void init_buffer_cache_hints(struct vb2_queue *q, struct vb2_buffer *vb)
+{
+	/*
+	 * DMA exporter should take care of cache syncs, so we can avoid
+	 * explicit ->prepare()/->finish() syncs. For other ->memory types
+	 * we always need ->prepare() or/and ->finish() cache sync.
+	 */
+	if (q->memory == VB2_MEMORY_DMABUF) {
+		vb->skip_cache_sync_on_finish = 1;
+		vb->skip_cache_sync_on_prepare = 1;
+		return;
+	}
+
+	/*
+	 * ->finish() cache sync can be avoided when queue direction is
+	 * TO_DEVICE.
+	 */
+	if (q->dma_dir == DMA_TO_DEVICE)
+		vb->skip_cache_sync_on_finish = 1;
+}
+
 /*
  * __vb2_queue_alloc() - allocate videobuf buffer structures and (for MMAP type)
  * video buffer memory for all buffers/planes on the queue and initializes the
@@ -415,6 +436,7 @@ static int __vb2_queue_alloc(struct vb2_queue *q, enum vb2_memory memory,
 		vb->index = q->num_buffers + buffer;
 		vb->type = q->type;
 		vb->memory = memory;
+		init_buffer_cache_hints(q, vb);
 		for (plane = 0; plane < num_planes; ++plane) {
 			vb->planes[plane].length = plane_sizes[plane];
 			vb->planes[plane].min_length = plane_sizes[plane];
diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
index db93678860bd..a02f365bbe60 100644
--- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
+++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
@@ -345,17 +345,6 @@ static void set_buffer_cache_hints(struct vb2_queue *q,
 				   struct vb2_buffer *vb,
 				   struct v4l2_buffer *b)
 {
-	/*
-	 * DMA exporter should take care of cache syncs, so we can avoid
-	 * explicit ->prepare()/->finish() syncs. For other ->memory types
-	 * we always need ->prepare() or/and ->finish() cache sync.
-	 */
-	if (q->memory == VB2_MEMORY_DMABUF) {
-		vb->skip_cache_sync_on_finish = 1;
-		vb->skip_cache_sync_on_prepare = 1;
-		return;
-	}
-
 	if (!vb2_queue_allows_cache_hints(q)) {
 		/*
 		 * Clear buffer cache flags if queue does not support user
@@ -367,13 +356,6 @@ static void set_buffer_cache_hints(struct vb2_queue *q,
 		return;
 	}
 
-	/*
-	 * ->finish() cache sync can be avoided when queue direction is
-	 * TO_DEVICE.
-	 */
-	if (q->dma_dir == DMA_TO_DEVICE)
-		vb->skip_cache_sync_on_finish = 1;
-
 	if (b->flags & V4L2_BUF_FLAG_NO_CACHE_INVALIDATE)
 		vb->skip_cache_sync_on_finish = 1;
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/8] videobuf2: move cache_hints handling to allocators
  2021-03-02  0:46 [PATCH 0/8] videobuf2: support new noncontiguous DMA API Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2021-03-02  0:46 ` [PATCH 3/8] videobuf2: split buffer cache_hints initialisation Sergey Senozhatsky
@ 2021-03-02  0:46 ` Sergey Senozhatsky
  2021-03-02  0:46 ` [PATCH 5/8] videobuf2: add V4L2_FLAG_MEMORY_NON_COHERENT flag Sergey Senozhatsky
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2021-03-02  0:46 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media,
	linux-kernel, Sergey Senozhatsky

This moves cache hints handling from videobuf2 core down
to allocators level, because allocators do the sync/flush
caches eventually and may take better decisions. Besides,
allocators already decide whether cache sync/flush should
be done or can be skipped. This patch moves the scattered
buffer cache sync logic to one common place.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/media/common/videobuf2/videobuf2-core.c       | 6 ------
 drivers/media/common/videobuf2/videobuf2-dma-contig.c | 6 ++++++
 drivers/media/common/videobuf2/videobuf2-dma-sg.c     | 6 ++++++
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index 76210c006958..55af63d54f23 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -328,9 +328,6 @@ static void __vb2_buf_mem_prepare(struct vb2_buffer *vb)
 		return;
 
 	vb->synced = 1;
-	if (vb->skip_cache_sync_on_prepare)
-		return;
-
 	for (plane = 0; plane < vb->num_planes; ++plane)
 		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
 }
@@ -347,9 +344,6 @@ static void __vb2_buf_mem_finish(struct vb2_buffer *vb)
 		return;
 
 	vb->synced = 0;
-	if (vb->skip_cache_sync_on_finish)
-		return;
-
 	for (plane = 0; plane < vb->num_planes; ++plane)
 		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
 }
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index 019c3843dc6d..1e218bc440c6 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -101,6 +101,9 @@ static void vb2_dc_prepare(void *buf_priv)
 	struct vb2_dc_buf *buf = buf_priv;
 	struct sg_table *sgt = buf->dma_sgt;
 
+	if (buf->vb->skip_cache_sync_on_prepare)
+		return;
+
 	if (!sgt)
 		return;
 
@@ -112,6 +115,9 @@ static void vb2_dc_finish(void *buf_priv)
 	struct vb2_dc_buf *buf = buf_priv;
 	struct sg_table *sgt = buf->dma_sgt;
 
+	if (buf->vb->skip_cache_sync_on_finish)
+		return;
+
 	if (!sgt)
 		return;
 
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index 71094cb5c5d7..cb587c5a345b 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -204,6 +204,9 @@ static void vb2_dma_sg_prepare(void *buf_priv)
 	struct vb2_dma_sg_buf *buf = buf_priv;
 	struct sg_table *sgt = buf->dma_sgt;
 
+	if (buf->vb->skip_cache_sync_on_prepare)
+		return;
+
 	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
 }
 
@@ -212,6 +215,9 @@ static void vb2_dma_sg_finish(void *buf_priv)
 	struct vb2_dma_sg_buf *buf = buf_priv;
 	struct sg_table *sgt = buf->dma_sgt;
 
+	if (buf->vb->skip_cache_sync_on_finish)
+		return;
+
 	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
 }
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/8] videobuf2: add V4L2_FLAG_MEMORY_NON_COHERENT flag
  2021-03-02  0:46 [PATCH 0/8] videobuf2: support new noncontiguous DMA API Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2021-03-02  0:46 ` [PATCH 4/8] videobuf2: move cache_hints handling to allocators Sergey Senozhatsky
@ 2021-03-02  0:46 ` Sergey Senozhatsky
  2021-03-22  7:02   ` Hans Verkuil
  2021-03-02  0:46 ` [PATCH 6/8] videobuf2: add queue memory coherency parameter Sergey Senozhatsky
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Sergey Senozhatsky @ 2021-03-02  0:46 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media,
	linux-kernel, Sergey Senozhatsky

By setting or clearing V4L2_FLAG_MEMORY_NON_COHERENT flag
user-space should be able to hint vb2 that either a non-coherent
(if supported) or coherent memory should be used for the buffer
allocation.

The patch set also adds a corresponding capability flag:
fill_buf_caps() reports V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
when queue supports user-space cache management hints.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 .../userspace-api/media/v4l/buffer.rst        | 40 ++++++++++++++++++-
 .../media/v4l/vidioc-reqbufs.rst              |  5 ++-
 include/uapi/linux/videodev2.h                |  2 +
 3 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst
index 1b0fdc160533..a39852d6174f 100644
--- a/Documentation/userspace-api/media/v4l/buffer.rst
+++ b/Documentation/userspace-api/media/v4l/buffer.rst
@@ -676,8 +676,6 @@ Buffer Flags
 
     \normalsize
 
-.. _memory-flags:
-
 enum v4l2_memory
 ================
 
@@ -701,6 +699,44 @@ enum v4l2_memory
       - 4
       - The buffer is used for :ref:`DMA shared buffer <dmabuf>` I/O.
 
+.. _memory-flags:
+
+Memory Consistency Flags
+------------------------
+
+.. raw:: latex
+
+    \small
+
+.. tabularcolumns:: |p{7.0cm}|p{2.1cm}|p{8.4cm}|
+
+.. cssclass:: longtable
+
+.. flat-table::
+    :header-rows:  0
+    :stub-columns: 0
+    :widths:       3 1 4
+
+    * .. _`V4L2-FLAG-MEMORY-NON-COHERENT`:
+
+      - ``V4L2_FLAG_MEMORY_NON_COHERENT``
+      - 0x00000001
+      - A buffer is allocated either in coherent (it will be automatically
+	coherent between the CPU and the bus) or non-coherent memory. The
+	latter can provide performance gains, for instance the CPU cache
+	sync/flush operations can be avoided if the buffer is accessed by the
+	corresponding device only and the CPU does not read/write to/from that
+	buffer. However, this requires extra care from the driver -- it must
+	guarantee memory consistency by issuing a cache flush/sync when
+	consistency is needed. If this flag is set V4L2 will attempt to
+	allocate the buffer in non-coherent memory. The flag takes effect
+	only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the
+	queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
+	<V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability.
+
+.. raw:: latex
+
+    \normalsize
 
 Timecodes
 =========
diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
index c1c88e00b106..950e7ec1aac5 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
@@ -154,8 +154,9 @@ aborting or finishing any DMA in progress, an implicit
       - This capability is set by the driver to indicate that the queue supports
         cache and memory management hints. However, it's only valid when the
         queue is used for :ref:`memory mapping <mmap>` streaming I/O. See
-        :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and
-        :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`.
+        :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>`,
+        :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>` and
+        :ref:`V4L2_FLAG_MEMORY_NON_COHERENT <V4L2-FLAG-MEMORY-NON-COHERENT>`.
 
 Return Value
 ============
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 79dbde3bcf8d..b1d4171fe50b 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -954,6 +954,8 @@ struct v4l2_requestbuffers {
 	__u32			reserved[1];
 };
 
+#define V4L2_FLAG_MEMORY_NON_COHERENT			(1 << 0)
+
 /* capabilities for struct v4l2_requestbuffers and v4l2_create_buffers */
 #define V4L2_BUF_CAP_SUPPORTS_MMAP			(1 << 0)
 #define V4L2_BUF_CAP_SUPPORTS_USERPTR			(1 << 1)
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/8] videobuf2: add queue memory coherency parameter
  2021-03-02  0:46 [PATCH 0/8] videobuf2: support new noncontiguous DMA API Sergey Senozhatsky
                   ` (4 preceding siblings ...)
  2021-03-02  0:46 ` [PATCH 5/8] videobuf2: add V4L2_FLAG_MEMORY_NON_COHERENT flag Sergey Senozhatsky
@ 2021-03-02  0:46 ` Sergey Senozhatsky
  2021-03-02  0:46 ` [PATCH 7/8] videobuf2: handle V4L2_FLAG_MEMORY_NON_COHERENT flag Sergey Senozhatsky
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2021-03-02  0:46 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media,
	linux-kernel, Sergey Senozhatsky

Preparations for future V4L2_FLAG_MEMORY_NON_COHERENT support.

Extend vb2_core_reqbufs() parameters list to accept requests'
->flags, which will be used for memory coherency configuration.

An attempt to allocate a buffer with coherency requirements
which don't match queue's consistency model will fail.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 .../media/common/videobuf2/videobuf2-core.c   | 40 ++++++++++++++++---
 .../media/common/videobuf2/videobuf2-v4l2.c   |  5 ++-
 drivers/media/dvb-core/dvb_vb2.c              |  2 +-
 include/media/videobuf2-core.h                |  8 +++-
 4 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index 55af63d54f23..7040b7f47133 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -738,11 +738,33 @@ int vb2_verify_memory_type(struct vb2_queue *q,
 }
 EXPORT_SYMBOL(vb2_verify_memory_type);
 
+static void set_queue_coherency(struct vb2_queue *q, bool coherent_mem)
+{
+	q->non_coherent_mem = 0;
+
+	if (!vb2_queue_allows_cache_hints(q))
+		return;
+	if (!coherent_mem)
+		q->non_coherent_mem = 1;
+}
+
+static bool verify_coherency_flags(struct vb2_queue *q, bool coherent_mem)
+{
+	bool queue_is_coherent = !q->non_coherent_mem;
+
+	if (coherent_mem != queue_is_coherent) {
+		dprintk(q, 1, "memory coherency model mismatch\n");
+		return false;
+	}
+	return true;
+}
+
 int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
-		     unsigned int *count)
+		     unsigned int flags, unsigned int *count)
 {
 	unsigned int num_buffers, allocated_buffers, num_planes = 0;
 	unsigned plane_sizes[VB2_MAX_PLANES] = { };
+	bool coherent_mem = true;
 	unsigned int i;
 	int ret;
 
@@ -757,7 +779,8 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
 	}
 
 	if (*count == 0 || q->num_buffers != 0 ||
-	    (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) {
+	    (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) ||
+	    !verify_coherency_flags(q, coherent_mem)) {
 		/*
 		 * We already have buffers allocated, so first check if they
 		 * are not in use and can be freed.
@@ -794,6 +817,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
 	num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME);
 	memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
 	q->memory = memory;
+	set_queue_coherency(q, coherent_mem);
 
 	/*
 	 * Ask the driver how many buffers and planes per buffer it requires.
@@ -878,12 +902,13 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
 EXPORT_SYMBOL_GPL(vb2_core_reqbufs);
 
 int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
-			 unsigned int *count,
+			 unsigned int flags, unsigned int *count,
 			 unsigned int requested_planes,
 			 const unsigned int requested_sizes[])
 {
 	unsigned int num_planes = 0, num_buffers, allocated_buffers;
 	unsigned plane_sizes[VB2_MAX_PLANES] = { };
+	bool coherent_mem = true;
 	int ret;
 
 	if (q->num_buffers == VB2_MAX_FRAME) {
@@ -899,11 +924,14 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
 		memset(q->alloc_devs, 0, sizeof(q->alloc_devs));
 		q->memory = memory;
 		q->waiting_for_buffers = !q->is_output;
+		set_queue_coherency(q, coherent_mem);
 	} else {
 		if (q->memory != memory) {
 			dprintk(q, 1, "memory model mismatch\n");
 			return -EINVAL;
 		}
+		if (!verify_coherency_flags(q, coherent_mem))
+			return -EINVAL;
 	}
 
 	num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers);
@@ -2576,7 +2604,7 @@ static int __vb2_init_fileio(struct vb2_queue *q, int read)
 	fileio->memory = VB2_MEMORY_MMAP;
 	fileio->type = q->type;
 	q->fileio = fileio;
-	ret = vb2_core_reqbufs(q, fileio->memory, &fileio->count);
+	ret = vb2_core_reqbufs(q, fileio->memory, 0, &fileio->count);
 	if (ret)
 		goto err_kfree;
 
@@ -2633,7 +2661,7 @@ static int __vb2_init_fileio(struct vb2_queue *q, int read)
 
 err_reqbufs:
 	fileio->count = 0;
-	vb2_core_reqbufs(q, fileio->memory, &fileio->count);
+	vb2_core_reqbufs(q, fileio->memory, 0, &fileio->count);
 
 err_kfree:
 	q->fileio = NULL;
@@ -2653,7 +2681,7 @@ static int __vb2_cleanup_fileio(struct vb2_queue *q)
 		vb2_core_streamoff(q, q->type);
 		q->fileio = NULL;
 		fileio->count = 0;
-		vb2_core_reqbufs(q, fileio->memory, &fileio->count);
+		vb2_core_reqbufs(q, fileio->memory, 0, &fileio->count);
 		kfree(fileio);
 		dprintk(q, 3, "file io emulator closed\n");
 	}
diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
index a02f365bbe60..1166d5a9291a 100644
--- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
+++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
@@ -697,7 +697,7 @@ int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
 	int ret = vb2_verify_memory_type(q, req->memory, req->type);
 
 	fill_buf_caps(q, &req->capabilities);
-	return ret ? ret : vb2_core_reqbufs(q, req->memory, &req->count);
+	return ret ? ret : vb2_core_reqbufs(q, req->memory, 0, &req->count);
 }
 EXPORT_SYMBOL_GPL(vb2_reqbufs);
 
@@ -772,6 +772,7 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
 		if (requested_sizes[i] == 0)
 			return -EINVAL;
 	return ret ? ret : vb2_core_create_bufs(q, create->memory,
+						0,
 						&create->count,
 						requested_planes,
 						requested_sizes);
@@ -960,7 +961,7 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
 		return res;
 	if (vb2_queue_is_busy(vdev, file))
 		return -EBUSY;
-	res = vb2_core_reqbufs(vdev->queue, p->memory, &p->count);
+	res = vb2_core_reqbufs(vdev->queue, p->memory, 0, &p->count);
 	/* If count == 0, then the owner has released all buffers and he
 	   is no longer owner of the queue. Otherwise we have a new owner. */
 	if (res == 0)
diff --git a/drivers/media/dvb-core/dvb_vb2.c b/drivers/media/dvb-core/dvb_vb2.c
index 6974f1731529..959d110407a4 100644
--- a/drivers/media/dvb-core/dvb_vb2.c
+++ b/drivers/media/dvb-core/dvb_vb2.c
@@ -342,7 +342,7 @@ int dvb_vb2_reqbufs(struct dvb_vb2_ctx *ctx, struct dmx_requestbuffers *req)
 
 	ctx->buf_siz = req->size;
 	ctx->buf_cnt = req->count;
-	ret = vb2_core_reqbufs(&ctx->vb_q, VB2_MEMORY_MMAP, &req->count);
+	ret = vb2_core_reqbufs(&ctx->vb_q, VB2_MEMORY_MMAP, 0, &req->count);
 	if (ret) {
 		ctx->state = DVB_VB2_STATE_NONE;
 		dprintk(1, "[%s] count=%d size=%d errno=%d\n", ctx->name,
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 48f57a54ddb1..2423ddf8361e 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -581,6 +581,7 @@ struct vb2_queue {
 	unsigned int			uses_qbuf:1;
 	unsigned int			uses_requests:1;
 	unsigned int			allow_cache_hints:1;
+	unsigned int			non_coherent_mem:1;
 
 	struct mutex			*lock;
 	void				*owner;
@@ -746,6 +747,8 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb);
  * vb2_core_reqbufs() - Initiate streaming.
  * @q:		pointer to &struct vb2_queue with videobuf2 queue.
  * @memory:	memory type, as defined by &enum vb2_memory.
+ * @flags:	auxiliary queue/buffer management flags. Currently, the only
+ *		used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT.
  * @count:	requested buffer count.
  *
  * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called
@@ -770,12 +773,13 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb);
  * Return: returns zero on success; an error code otherwise.
  */
 int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
-		    unsigned int *count);
+		     unsigned int flags, unsigned int *count);
 
 /**
  * vb2_core_create_bufs() - Allocate buffers and any required auxiliary structs
  * @q: pointer to &struct vb2_queue with videobuf2 queue.
  * @memory: memory type, as defined by &enum vb2_memory.
+ * @flags: auxiliary queue/buffer management flags.
  * @count: requested buffer count.
  * @requested_planes: number of planes requested.
  * @requested_sizes: array with the size of the planes.
@@ -793,7 +797,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
  * Return: returns zero on success; an error code otherwise.
  */
 int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
-			 unsigned int *count,
+			 unsigned int flags, unsigned int *count,
 			 unsigned int requested_planes,
 			 const unsigned int requested_sizes[]);
 
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 7/8] videobuf2: handle V4L2_FLAG_MEMORY_NON_COHERENT flag
  2021-03-02  0:46 [PATCH 0/8] videobuf2: support new noncontiguous DMA API Sergey Senozhatsky
                   ` (5 preceding siblings ...)
  2021-03-02  0:46 ` [PATCH 6/8] videobuf2: add queue memory coherency parameter Sergey Senozhatsky
@ 2021-03-02  0:46 ` Sergey Senozhatsky
  2021-03-22  7:16   ` Hans Verkuil
  2021-03-22  7:18   ` Hans Verkuil
  2021-03-02  0:46 ` [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations Sergey Senozhatsky
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2021-03-02  0:46 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media,
	linux-kernel, Sergey Senozhatsky

This patch lets user-space to request a non-coherent memory
allocation during CREATE_BUFS and REQBUFS ioctl calls.

= CREATE_BUFS

  struct v4l2_create_buffers has seven 4-byte reserved areas,
  so reserved[0] is renamed to ->flags. The struct, thus, now
  has six reserved 4-byte regions.

= CREATE_BUFS32

  struct v4l2_create_buffers32 has seven 4-byte reserved areas,
  so reserved[0] is renamed to ->flags. The struct, thus, now
  has six reserved 4-byte regions.

= REQBUFS

 We use one bit of a ->reserved[1] member of struct v4l2_requestbuffers,
 which is now renamed to ->flags. Unlike v4l2_create_buffers, struct
 v4l2_requestbuffers does not have enough reserved room. Therefore for
 backward compatibility  ->reserved and ->flags were put into anonymous
 union.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 .../media/v4l/vidioc-create-bufs.rst          |  7 ++++++-
 .../media/v4l/vidioc-reqbufs.rst              | 11 +++++++++--
 .../media/common/videobuf2/videobuf2-core.c   |  6 ++++++
 .../media/common/videobuf2/videobuf2-v4l2.c   | 19 ++++++++++++++++---
 drivers/media/v4l2-core/v4l2-compat-ioctl32.c |  9 ++++++++-
 drivers/media/v4l2-core/v4l2-ioctl.c          |  5 +----
 include/uapi/linux/videodev2.h                | 11 +++++++++--
 7 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
index b06e5b528e11..132c8b612a94 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
@@ -113,7 +113,12 @@ than the number requested.
 	``V4L2_MEMORY_MMAP`` and ``format.type`` to the buffer type.
 
     * - __u32
-      - ``reserved``\ [7]
+      - ``flags``
+      - Specifies additional buffer management attributes.
+	See :ref:`memory-flags`.
+
+    * - __u32
+      - ``reserved``\ [6]
       - A place holder for future extensions. Drivers and applications
 	must set the array to zero.
 
diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
index 950e7ec1aac5..80ea48acea84 100644
--- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
+++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
@@ -104,10 +104,17 @@ aborting or finishing any DMA in progress, an implicit
 	``V4L2_MEMORY_MMAP`` and ``type`` set to the buffer type. This will
 	free any previously allocated buffers, so this is typically something
 	that will be done at the start of the application.
+    * - union {
+      - (anonymous)
+    * - __u32
+      - ``flags``
+      - Specifies additional buffer management attributes.
+	See :ref:`memory-flags`.
     * - __u32
       - ``reserved``\ [1]
-      - A place holder for future extensions. Drivers and applications
-	must set the array to zero.
+      - Kept for backwards compatibility. Use ``flags`` instead.
+    * - }
+      -
 
 .. tabularcolumns:: |p{6.1cm}|p{2.2cm}|p{8.7cm}|
 
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
index 7040b7f47133..5906a48e7757 100644
--- a/drivers/media/common/videobuf2/videobuf2-core.c
+++ b/drivers/media/common/videobuf2/videobuf2-core.c
@@ -768,6 +768,9 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
 	unsigned int i;
 	int ret;
 
+	if (flags & V4L2_FLAG_MEMORY_NON_COHERENT)
+		coherent_mem = false;
+
 	if (q->streaming) {
 		dprintk(q, 1, "streaming active\n");
 		return -EBUSY;
@@ -911,6 +914,9 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
 	bool coherent_mem = true;
 	int ret;
 
+	if (flags & V4L2_FLAG_MEMORY_NON_COHERENT)
+		coherent_mem = false;
+
 	if (q->num_buffers == VB2_MAX_FRAME) {
 		dprintk(q, 1, "maximum number of buffers already allocated\n");
 		return -ENOBUFS;
diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
index 1166d5a9291a..f6a8dcc1b5c6 100644
--- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
+++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
@@ -692,12 +692,22 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
 #endif
 }
 
+static void validate_coherency_flags(struct vb2_queue *q,
+				     int memory,
+				     unsigned int *flags)
+{
+	if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP)
+		*flags &= ~V4L2_FLAG_MEMORY_NON_COHERENT;
+}
+
 int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
 {
 	int ret = vb2_verify_memory_type(q, req->memory, req->type);
 
 	fill_buf_caps(q, &req->capabilities);
-	return ret ? ret : vb2_core_reqbufs(q, req->memory, 0, &req->count);
+	validate_coherency_flags(q, req->memory, &req->flags);
+	return ret ? ret : vb2_core_reqbufs(q, req->memory,
+					    req->flags, &req->count);
 }
 EXPORT_SYMBOL_GPL(vb2_reqbufs);
 
@@ -729,6 +739,7 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
 	unsigned i;
 
 	fill_buf_caps(q, &create->capabilities);
+	validate_coherency_flags(q, create->memory, &create->flags);
 	create->index = q->num_buffers;
 	if (create->count == 0)
 		return ret != -EBUSY ? ret : 0;
@@ -772,7 +783,7 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
 		if (requested_sizes[i] == 0)
 			return -EINVAL;
 	return ret ? ret : vb2_core_create_bufs(q, create->memory,
-						0,
+						create->flags,
 						&create->count,
 						requested_planes,
 						requested_sizes);
@@ -957,11 +968,12 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
 	int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);
 
 	fill_buf_caps(vdev->queue, &p->capabilities);
+	validate_coherency_flags(vdev->queue, p->memory, &p->flags);
 	if (res)
 		return res;
 	if (vb2_queue_is_busy(vdev, file))
 		return -EBUSY;
-	res = vb2_core_reqbufs(vdev->queue, p->memory, 0, &p->count);
+	res = vb2_core_reqbufs(vdev->queue, p->memory, p->flags, &p->count);
 	/* If count == 0, then the owner has released all buffers and he
 	   is no longer owner of the queue. Otherwise we have a new owner. */
 	if (res == 0)
@@ -979,6 +991,7 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,
 
 	p->index = vdev->queue->num_buffers;
 	fill_buf_caps(vdev->queue, &p->capabilities);
+	validate_coherency_flags(vdev->queue, p->memory, &p->flags);
 	/*
 	 * If count == 0, then just check if memory and type are valid.
 	 * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0.
diff --git a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
index 0ca75f6784c5..1aa9ca3b6ca4 100644
--- a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
+++ b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
@@ -126,6 +126,9 @@ struct v4l2_format32 {
  * @memory:	buffer memory type
  * @format:	frame format, for which buffers are requested
  * @capabilities: capabilities of this buffer type.
+ * @flags:	additional buffer management attributes (ignored unless the
+ *		queue has V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS capability and
+ *		configured for MMAP streaming I/O).
  * @reserved:	future extensions
  */
 struct v4l2_create_buffers32 {
@@ -134,7 +137,8 @@ struct v4l2_create_buffers32 {
 	__u32			memory;	/* enum v4l2_memory */
 	struct v4l2_format32	format;
 	__u32			capabilities;
-	__u32			reserved[7];
+	__u32			flags;
+	__u32			reserved[6];
 };
 
 static int get_v4l2_format32(struct v4l2_format *p64,
@@ -182,6 +186,8 @@ static int get_v4l2_create32(struct v4l2_create_buffers *p64,
 	if (copy_from_user(p64, p32,
 			   offsetof(struct v4l2_create_buffers32, format)))
 		return -EFAULT;
+	if (copy_from_user(&p64->flags, &p32->flags, sizeof(p32->flags)))
+		return -EFAULT;
 	return get_v4l2_format32(&p64->format, &p32->format);
 }
 
@@ -227,6 +233,7 @@ static int put_v4l2_create32(struct v4l2_create_buffers *p64,
 	if (copy_to_user(p32, p64,
 			 offsetof(struct v4l2_create_buffers32, format)) ||
 	    put_user(p64->capabilities, &p32->capabilities) ||
+	    put_user(p64->flags, &p32->flags) ||
 	    copy_to_user(p32->reserved, p64->reserved, sizeof(p64->reserved)))
 		return -EFAULT;
 	return put_v4l2_format32(&p64->format, &p32->format);
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index 31d1342e61e8..f8d38a3cd1e0 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -2045,9 +2045,6 @@ static int v4l_reqbufs(const struct v4l2_ioctl_ops *ops,
 
 	if (ret)
 		return ret;
-
-	CLEAR_AFTER_FIELD(p, capabilities);
-
 	return ops->vidioc_reqbufs(file, fh, p);
 }
 
@@ -2087,7 +2084,7 @@ static int v4l_create_bufs(const struct v4l2_ioctl_ops *ops,
 	if (ret)
 		return ret;
 
-	CLEAR_AFTER_FIELD(create, capabilities);
+	CLEAR_AFTER_FIELD(create, flags);
 
 	v4l_sanitize_format(&create->format);
 
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index b1d4171fe50b..85d2681e73b6 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -951,7 +951,10 @@ struct v4l2_requestbuffers {
 	__u32			type;		/* enum v4l2_buf_type */
 	__u32			memory;		/* enum v4l2_memory */
 	__u32			capabilities;
-	__u32			reserved[1];
+	union {
+		__u32		flags;
+		__u32		reserved[1];
+	};
 };
 
 #define V4L2_FLAG_MEMORY_NON_COHERENT			(1 << 0)
@@ -2473,6 +2476,9 @@ struct v4l2_dbg_chip_info {
  * @memory:	enum v4l2_memory; buffer memory type
  * @format:	frame format, for which buffers are requested
  * @capabilities: capabilities of this buffer type.
+ * @flags:	additional buffer management attributes (ignored unless the
+ *		queue has V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS capability
+ *		and configured for MMAP streaming I/O).
  * @reserved:	future extensions
  */
 struct v4l2_create_buffers {
@@ -2481,7 +2487,8 @@ struct v4l2_create_buffers {
 	__u32			memory;
 	struct v4l2_format	format;
 	__u32			capabilities;
-	__u32			reserved[7];
+	__u32			flags;
+	__u32			reserved[6];
 };
 
 /*
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations
  2021-03-02  0:46 [PATCH 0/8] videobuf2: support new noncontiguous DMA API Sergey Senozhatsky
                   ` (6 preceding siblings ...)
  2021-03-02  0:46 ` [PATCH 7/8] videobuf2: handle V4L2_FLAG_MEMORY_NON_COHERENT flag Sergey Senozhatsky
@ 2021-03-02  0:46 ` Sergey Senozhatsky
  2021-03-22  7:40   ` Hans Verkuil
                     ` (2 more replies)
  2021-03-02  0:49 ` [PATCH] v4l-compliance: re-introduce NON_COHERENT and cache hints tests Sergey Senozhatsky
  2021-03-24 12:09 ` [PATCH 0/8] videobuf2: support new noncontiguous DMA API Tomasz Figa
  9 siblings, 3 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2021-03-02  0:46 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media,
	linux-kernel, Sergey Senozhatsky

This adds support for new noncontiguous DMA API, which
requires allocators to have two execution branches: one
for the current API, and one for the new one.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
[hch: untested conversion to the ne API]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 .../common/videobuf2/videobuf2-dma-contig.c   | 141 +++++++++++++++---
 1 file changed, 117 insertions(+), 24 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index 1e218bc440c6..d6a9f7b682f3 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -17,6 +17,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/dma-mapping.h>
+#include <linux/highmem.h>
 
 #include <media/videobuf2-v4l2.h>
 #include <media/videobuf2-dma-contig.h>
@@ -42,8 +43,14 @@ struct vb2_dc_buf {
 	struct dma_buf_attachment	*db_attach;
 
 	struct vb2_buffer		*vb;
+	unsigned int			non_coherent_mem:1;
 };
 
+static bool vb2_dc_is_coherent(struct vb2_dc_buf *buf)
+{
+	return !buf->non_coherent_mem;
+}
+
 /*********************************************/
 /*        scatterlist table functions        */
 /*********************************************/
@@ -78,12 +85,21 @@ static void *vb2_dc_cookie(struct vb2_buffer *vb, void *buf_priv)
 static void *vb2_dc_vaddr(struct vb2_buffer *vb, void *buf_priv)
 {
 	struct vb2_dc_buf *buf = buf_priv;
-	struct dma_buf_map map;
-	int ret;
 
-	if (!buf->vaddr && buf->db_attach) {
-		ret = dma_buf_vmap(buf->db_attach->dmabuf, &map);
-		buf->vaddr = ret ? NULL : map.vaddr;
+	if (buf->vaddr)
+		return buf->vaddr;
+
+	if (buf->db_attach) {
+		struct dma_buf_map map;
+
+		if (!dma_buf_vmap(buf->db_attach->dmabuf, &map))
+			buf->vaddr = map.vaddr;
+	}
+
+	if (!vb2_dc_is_coherent(buf)) {
+		buf->vaddr = dma_vmap_noncontiguous(buf->dev,
+						    buf->size,
+						    buf->dma_sgt);
 	}
 
 	return buf->vaddr;
@@ -101,13 +117,26 @@ static void vb2_dc_prepare(void *buf_priv)
 	struct vb2_dc_buf *buf = buf_priv;
 	struct sg_table *sgt = buf->dma_sgt;
 
+	/* This takes care of DMABUF and user-enforced cache sync hint */
 	if (buf->vb->skip_cache_sync_on_prepare)
 		return;
 
+	/*
+	 * Coherent MMAP buffers do not need to be synced, unlike coherent
+	 * USERPTR and non-coherent MMAP buffers.
+	 */
+	if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
+		return;
+
 	if (!sgt)
 		return;
 
+	/* For both USERPTR and non-coherent MMAP */
 	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
+
+	/* Non-coherrent MMAP only */
+	if (!vb2_dc_is_coherent(buf) && buf->vaddr)
+		flush_kernel_vmap_range(buf->vaddr, buf->size);
 }
 
 static void vb2_dc_finish(void *buf_priv)
@@ -115,19 +144,46 @@ static void vb2_dc_finish(void *buf_priv)
 	struct vb2_dc_buf *buf = buf_priv;
 	struct sg_table *sgt = buf->dma_sgt;
 
+	/* This takes care of DMABUF and user-enforced cache sync hint */
 	if (buf->vb->skip_cache_sync_on_finish)
 		return;
 
+	/*
+	 * Coherent MMAP buffers do not need to be synced, unlike coherent
+	 * USERPTR and non-coherent MMAP buffers.
+	 */
+	if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
+		return;
+
 	if (!sgt)
 		return;
 
+	/* For both USERPTR and non-coherent MMAP */
 	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
+
+	/* Non-coherrent MMAP only */
+	if (!vb2_dc_is_coherent(buf) && buf->vaddr)
+		invalidate_kernel_vmap_range(buf->vaddr, buf->size);
 }
 
 /*********************************************/
 /*        callbacks for MMAP buffers         */
 /*********************************************/
 
+static void __vb2_dc_put(struct vb2_dc_buf *buf)
+{
+	if (vb2_dc_is_coherent(buf)) {
+		dma_free_attrs(buf->dev, buf->size, buf->cookie,
+			       buf->dma_addr, buf->attrs);
+		return;
+	}
+
+	if (buf->vaddr)
+		dma_vunmap_noncontiguous(buf->dev, buf->vaddr);
+	dma_free_noncontiguous(buf->dev, buf->size,
+			       buf->dma_sgt, buf->dma_addr);
+}
+
 static void vb2_dc_put(void *buf_priv)
 {
 	struct vb2_dc_buf *buf = buf_priv;
@@ -139,17 +195,47 @@ static void vb2_dc_put(void *buf_priv)
 		sg_free_table(buf->sgt_base);
 		kfree(buf->sgt_base);
 	}
-	dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
-		       buf->attrs);
+	__vb2_dc_put(buf);
 	put_device(buf->dev);
 	kfree(buf);
 }
 
+static int vb2_dc_alloc_coherent(struct vb2_dc_buf *buf)
+{
+	struct vb2_queue *q = buf->vb->vb2_queue;
+
+	buf->cookie = dma_alloc_attrs(buf->dev,
+				      buf->size,
+				      &buf->dma_addr,
+				      GFP_KERNEL | q->gfp_flags,
+				      buf->attrs);
+	if (!buf->cookie)
+		return -ENOMEM;
+	if ((q->dma_attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
+		buf->vaddr = buf->cookie;
+	return 0;
+}
+
+static int vb2_dc_alloc_non_coherent(struct vb2_dc_buf *buf)
+{
+	struct vb2_queue *q = buf->vb->vb2_queue;
+
+	buf->dma_sgt = dma_alloc_noncontiguous(buf->dev,
+					       buf->size,
+					       buf->dma_dir,
+					       GFP_KERNEL | q->gfp_flags,
+					       buf->attrs);
+	if (!buf->dma_sgt)
+		return -ENOMEM;
+	return 0;
+}
+
 static void *vb2_dc_alloc(struct vb2_buffer *vb,
 			  struct device *dev,
 			  unsigned long size)
 {
 	struct vb2_dc_buf *buf;
+	int ret;
 
 	if (WARN_ON(!dev))
 		return ERR_PTR(-EINVAL);
@@ -159,27 +245,28 @@ static void *vb2_dc_alloc(struct vb2_buffer *vb,
 		return ERR_PTR(-ENOMEM);
 
 	buf->attrs = vb->vb2_queue->dma_attrs;
-	buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
-				      GFP_KERNEL | vb->vb2_queue->gfp_flags,
-				      buf->attrs);
-	if (!buf->cookie) {
-		dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
-		kfree(buf);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	if ((buf->attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
-		buf->vaddr = buf->cookie;
+	buf->dma_dir = vb->vb2_queue->dma_dir;
+	buf->vb = vb;
+	buf->non_coherent_mem = vb->vb2_queue->non_coherent_mem;
 
+	buf->size = size;
 	/* Prevent the device from being released while the buffer is used */
 	buf->dev = get_device(dev);
-	buf->size = size;
-	buf->dma_dir = vb->vb2_queue->dma_dir;
+
+	if (vb2_dc_is_coherent(buf))
+		ret = vb2_dc_alloc_coherent(buf);
+	else
+		ret = vb2_dc_alloc_non_coherent(buf);
+
+	if (ret) {
+		dev_err(dev, "dma alloc of size %ld failed\n", size);
+		kfree(buf);
+		return ERR_PTR(-ENOMEM);
+	}
 
 	buf->handler.refcount = &buf->refcount;
 	buf->handler.put = vb2_dc_put;
 	buf->handler.arg = buf;
-	buf->vb = vb;
 
 	refcount_set(&buf->refcount, 1);
 
@@ -196,9 +283,12 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
 		return -EINVAL;
 	}
 
-	ret = dma_mmap_attrs(buf->dev, vma, buf->cookie,
-		buf->dma_addr, buf->size, buf->attrs);
-
+	if (vb2_dc_is_coherent(buf))
+		ret = dma_mmap_attrs(buf->dev, vma, buf->cookie, buf->dma_addr,
+				     buf->size, buf->attrs);
+	else
+		ret = dma_mmap_noncontiguous(buf->dev, vma, buf->size,
+					     buf->dma_sgt);
 	if (ret) {
 		pr_err("Remapping memory failed, error: %d\n", ret);
 		return ret;
@@ -390,6 +480,9 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
 	int ret;
 	struct sg_table *sgt;
 
+	if (!vb2_dc_is_coherent(buf))
+		return buf->dma_sgt;
+
 	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
 	if (!sgt) {
 		dev_err(buf->dev, "failed to alloc sg table\n");
-- 
2.30.1.766.gb4fecdf3b7-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH] v4l-compliance: re-introduce NON_COHERENT and cache hints tests
  2021-03-02  0:46 [PATCH 0/8] videobuf2: support new noncontiguous DMA API Sergey Senozhatsky
                   ` (7 preceding siblings ...)
  2021-03-02  0:46 ` [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations Sergey Senozhatsky
@ 2021-03-02  0:49 ` Sergey Senozhatsky
  2021-03-02  0:50   ` Sergey Senozhatsky
  2021-03-24 12:09 ` [PATCH 0/8] videobuf2: support new noncontiguous DMA API Tomasz Figa
  9 siblings, 1 reply; 21+ messages in thread
From: Sergey Senozhatsky @ 2021-03-02  0:49 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Tomasz Figa, Christoph Hellwig, Mauro Carvalho Chehab,
	linux-media, linux-kernel, Sergey Senozhatsky

This returns back non-coherent (previously known as NON_COHERENT)
memory flag and buffer cache management hints testing (for VB2_MEMORY_MMAP
buffers).

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 utils/common/cv4l-helpers.h                 |  8 +--
 utils/common/v4l-helpers.h                  |  8 ++-
 utils/v4l2-compliance/v4l2-test-buffers.cpp | 65 ++++++++++++++++++---
 3 files changed, 66 insertions(+), 15 deletions(-)

diff --git a/utils/common/cv4l-helpers.h b/utils/common/cv4l-helpers.h
index 712efde6..3cee372b 100644
--- a/utils/common/cv4l-helpers.h
+++ b/utils/common/cv4l-helpers.h
@@ -754,17 +754,17 @@ public:
 	int g_fd(unsigned index, unsigned plane) const { return v4l_queue_g_fd(this, index, plane); }
 	void s_fd(unsigned index, unsigned plane, int fd) { v4l_queue_s_fd(this, index, plane, fd); }
 
-	int reqbufs(cv4l_fd *fd, unsigned count = 0)
+	int reqbufs(cv4l_fd *fd, unsigned count = 0, unsigned int flags = 0)
 	{
-		return v4l_queue_reqbufs(fd->g_v4l_fd(), this, count);
+		return v4l_queue_reqbufs(fd->g_v4l_fd(), this, count, flags);
 	}
 	bool has_create_bufs(cv4l_fd *fd) const
 	{
 		return v4l_queue_has_create_bufs(fd->g_v4l_fd(), this);
 	}
-	int create_bufs(cv4l_fd *fd, unsigned count, const v4l2_format *fmt = NULL)
+	int create_bufs(cv4l_fd *fd, unsigned count, const v4l2_format *fmt = NULL, unsigned int flags = 0)
 	{
-		return v4l_queue_create_bufs(fd->g_v4l_fd(), this, count, fmt);
+		return v4l_queue_create_bufs(fd->g_v4l_fd(), this, count, fmt, flags);
 	}
 	int mmap_bufs(cv4l_fd *fd, unsigned from = 0)
 	{
diff --git a/utils/common/v4l-helpers.h b/utils/common/v4l-helpers.h
index f96b3c38..c09cd987 100644
--- a/utils/common/v4l-helpers.h
+++ b/utils/common/v4l-helpers.h
@@ -1515,7 +1515,7 @@ static inline int v4l_queue_querybufs(struct v4l_fd *f, struct v4l_queue *q, uns
 }
 
 static inline int v4l_queue_reqbufs(struct v4l_fd *f,
-		struct v4l_queue *q, unsigned count)
+		struct v4l_queue *q, unsigned count, unsigned int flags = 0)
 {
 	struct v4l2_requestbuffers reqbufs;
 	int ret;
@@ -1523,6 +1523,7 @@ static inline int v4l_queue_reqbufs(struct v4l_fd *f,
 	reqbufs.type = q->type;
 	reqbufs.memory = q->memory;
 	reqbufs.count = count;
+	reqbufs.flags = flags;
 	/*
 	 * Problem: if REQBUFS returns an error, did it free any old
 	 * buffers or not?
@@ -1547,7 +1548,7 @@ static inline bool v4l_queue_has_create_bufs(struct v4l_fd *f, const struct v4l_
 
 static inline int v4l_queue_create_bufs(struct v4l_fd *f,
 		struct v4l_queue *q, unsigned count,
-		const struct v4l2_format *fmt)
+		const struct v4l2_format *fmt, unsigned int flags = 0)
 {
 	struct v4l2_create_buffers createbufs;
 	int ret;
@@ -1555,6 +1556,7 @@ static inline int v4l_queue_create_bufs(struct v4l_fd *f,
 	createbufs.format.type = q->type;
 	createbufs.memory = q->memory;
 	createbufs.count = count;
+	createbufs.flags = flags;
 	if (fmt) {
 		createbufs.format = *fmt;
 	} else {
@@ -1733,7 +1735,7 @@ static inline void v4l_queue_free(struct v4l_fd *f, struct v4l_queue *q)
 	v4l_ioctl(f, VIDIOC_STREAMOFF, &q->type);
 	v4l_queue_release_bufs(f, q, 0);
 	v4l_queue_close_exported_fds(q);
-	v4l_queue_reqbufs(f, q, 0);
+	v4l_queue_reqbufs(f, q, 0, 0);
 }
 
 static inline void v4l_queue_buffer_update(const struct v4l_queue *q,
diff --git a/utils/v4l2-compliance/v4l2-test-buffers.cpp b/utils/v4l2-compliance/v4l2-test-buffers.cpp
index e40461bd..6555c0cb 100644
--- a/utils/v4l2-compliance/v4l2-test-buffers.cpp
+++ b/utils/v4l2-compliance/v4l2-test-buffers.cpp
@@ -663,6 +663,10 @@ int testReqBufs(struct node *node)
 		fail_on_test(q.reqbufs(node, 0));
 
 		for (m = V4L2_MEMORY_MMAP; m <= V4L2_MEMORY_DMABUF; m++) {
+			bool cache_hints_cap = false;
+			bool consistent;
+
+			cache_hints_cap = q.g_capabilities() & V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS;
 			if (!(node->valid_memorytype & (1 << m)))
 				continue;
 			cv4l_queue q2(i, m);
@@ -678,8 +682,17 @@ int testReqBufs(struct node *node)
 			reqbufs.count = 1;
 			reqbufs.type = i;
 			reqbufs.memory = m;
+			reqbufs.flags = V4L2_FLAG_MEMORY_NON_COHERENT;
 			fail_on_test(doioctl(node, VIDIOC_REQBUFS, &reqbufs));
-			fail_on_test(check_0(reqbufs.reserved, sizeof(reqbufs.reserved)));
+			consistent = reqbufs.flags & V4L2_FLAG_MEMORY_NON_COHERENT;
+			if (!cache_hints_cap) {
+				fail_on_test(consistent);
+			} else {
+				if (m == V4L2_MEMORY_MMAP)
+					fail_on_test(!consistent);
+				else
+					fail_on_test(consistent);
+			}
 			q.reqbufs(node);
 
 			ret = q.create_bufs(node, 0);
@@ -692,9 +705,32 @@ int testReqBufs(struct node *node)
 			node->g_fmt(crbufs.format, i);
 			crbufs.count = 1;
 			crbufs.memory = m;
+			crbufs.flags = V4L2_FLAG_MEMORY_NON_COHERENT;
 			fail_on_test(doioctl(node, VIDIOC_CREATE_BUFS, &crbufs));
 			fail_on_test(check_0(crbufs.reserved, sizeof(crbufs.reserved)));
 			fail_on_test(crbufs.index != q.g_buffers());
+
+			consistent = crbufs.flags & V4L2_FLAG_MEMORY_NON_COHERENT;
+			if (!cache_hints_cap) {
+				fail_on_test(consistent);
+			} else {
+				if (m == V4L2_MEMORY_MMAP)
+					fail_on_test(!consistent);
+				else
+					fail_on_test(consistent);
+			}
+
+			if (cache_hints_cap) {
+				/*
+				 * Different memory consistency model. Should fail for MMAP
+				 * queues which support cache hints.
+				 */
+				crbufs.flags = 0;
+				if (m == V4L2_MEMORY_MMAP)
+					fail_on_test(doioctl(node, VIDIOC_CREATE_BUFS, &crbufs) != EINVAL);
+				else
+					fail_on_test(doioctl(node, VIDIOC_CREATE_BUFS, &crbufs));
+			}
 			q.reqbufs(node);
 
 			fail_on_test(q.create_bufs(node, 1));
@@ -1207,10 +1243,16 @@ static int setupMmap(struct node *node, cv4l_queue &q)
 		fail_on_test(buf.querybuf(node, i));
 		fail_on_test(buf.check(q, Unqueued, i));
 
-		flags = buf.g_flags();
-		flags |= V4L2_BUF_FLAG_NO_CACHE_INVALIDATE;
-		flags |= V4L2_BUF_FLAG_NO_CACHE_CLEAN;
-		buf.s_flags(flags);
+		/*
+		 * Do not set cache hints for all the buffers, but only on
+		 * some of them, so that we can test more cases.
+		 */
+		if (i == 0) {
+			flags = buf.g_flags();
+			flags |= V4L2_BUF_FLAG_NO_CACHE_INVALIDATE;
+			flags |= V4L2_BUF_FLAG_NO_CACHE_CLEAN;
+			buf.s_flags(flags);
+		}
 
 		for (unsigned p = 0; p < buf.g_num_planes(); p++) {
 			// Try a random offset
@@ -1250,8 +1292,15 @@ static int setupMmap(struct node *node, cv4l_queue &q)
 		}
 		flags = buf.g_flags();
 		if (cache_hints) {
-			fail_on_test(!(flags & V4L2_BUF_FLAG_NO_CACHE_INVALIDATE));
-			fail_on_test(!(flags & V4L2_BUF_FLAG_NO_CACHE_CLEAN));
+			if (i == 0) {
+				/* We do expect cache hints on this buffer */
+				fail_on_test(!(flags & V4L2_BUF_FLAG_NO_CACHE_INVALIDATE));
+				fail_on_test(!(flags & V4L2_BUF_FLAG_NO_CACHE_CLEAN));
+			} else {
+				/* We expect no cache hints on this buffer */
+				fail_on_test(flags & V4L2_BUF_FLAG_NO_CACHE_INVALIDATE);
+				fail_on_test(flags & V4L2_BUF_FLAG_NO_CACHE_CLEAN);
+			}
 		} else if (node->might_support_cache_hints) {
 			fail_on_test(flags & V4L2_BUF_FLAG_NO_CACHE_INVALIDATE);
 			fail_on_test(flags & V4L2_BUF_FLAG_NO_CACHE_CLEAN);
@@ -1341,7 +1390,7 @@ int testMmap(struct node *node, struct node *node_m2m_cap, unsigned frame_count,
 			have_createbufs = false;
 		if (have_createbufs) {
 			q.reqbufs(node);
-			q.create_bufs(node, 2, &cur_fmt);
+			q.create_bufs(node, 2, &cur_fmt, V4L2_FLAG_MEMORY_NON_COHERENT);
 			fail_on_test(setupMmap(node, q));
 			q.munmap_bufs(node);
 			q.reqbufs(node, 2);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] v4l-compliance: re-introduce NON_COHERENT and cache hints tests
  2021-03-02  0:49 ` [PATCH] v4l-compliance: re-introduce NON_COHERENT and cache hints tests Sergey Senozhatsky
@ 2021-03-02  0:50   ` Sergey Senozhatsky
  0 siblings, 0 replies; 21+ messages in thread
From: Sergey Senozhatsky @ 2021-03-02  0:50 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Hans Verkuil, Tomasz Figa, Christoph Hellwig,
	Mauro Carvalho Chehab, linux-media, linux-kernel

On (21/03/02 09:49), Sergey Senozhatsky wrote:
> This returns back non-coherent (previously known as NON_COHERENT)
							^^^
							NON_CONSISTENT...

> memory flag and buffer cache management hints testing (for VB2_MEMORY_MMAP
> buffers).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/8] videobuf2: add V4L2_FLAG_MEMORY_NON_COHERENT flag
  2021-03-02  0:46 ` [PATCH 5/8] videobuf2: add V4L2_FLAG_MEMORY_NON_COHERENT flag Sergey Senozhatsky
@ 2021-03-22  7:02   ` Hans Verkuil
  0 siblings, 0 replies; 21+ messages in thread
From: Hans Verkuil @ 2021-03-22  7:02 UTC (permalink / raw)
  To: Sergey Senozhatsky, Tomasz Figa
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media, linux-kernel

On 02/03/2021 01:46, Sergey Senozhatsky wrote:
> By setting or clearing V4L2_FLAG_MEMORY_NON_COHERENT flag
> user-space should be able to hint vb2 that either a non-coherent
> (if supported) or coherent memory should be used for the buffer
> allocation.
> 
> The patch set also adds a corresponding capability flag:
> fill_buf_caps() reports V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
> when queue supports user-space cache management hints.

Hmm, this paragraph is probably outdated (copy and paste?) since this
capability already exists.

> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  .../userspace-api/media/v4l/buffer.rst        | 40 ++++++++++++++++++-
>  .../media/v4l/vidioc-reqbufs.rst              |  5 ++-
>  include/uapi/linux/videodev2.h                |  2 +
>  3 files changed, 43 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst
> index 1b0fdc160533..a39852d6174f 100644
> --- a/Documentation/userspace-api/media/v4l/buffer.rst
> +++ b/Documentation/userspace-api/media/v4l/buffer.rst
> @@ -676,8 +676,6 @@ Buffer Flags
>  
>      \normalsize
>  
> -.. _memory-flags:
> -
>  enum v4l2_memory
>  ================
>  
> @@ -701,6 +699,44 @@ enum v4l2_memory
>        - 4
>        - The buffer is used for :ref:`DMA shared buffer <dmabuf>` I/O.
>  
> +.. _memory-flags:
> +
> +Memory Consistency Flags
> +------------------------
> +
> +.. raw:: latex
> +
> +    \small
> +
> +.. tabularcolumns:: |p{7.0cm}|p{2.1cm}|p{8.4cm}|
> +
> +.. cssclass:: longtable
> +
> +.. flat-table::
> +    :header-rows:  0
> +    :stub-columns: 0
> +    :widths:       3 1 4
> +
> +    * .. _`V4L2-FLAG-MEMORY-NON-COHERENT`:
> +
> +      - ``V4L2_FLAG_MEMORY_NON_COHERENT``

Rename this to V4L2_MEMORY_FLAG_NON_COHERENT: this is consistent with
V4L2_FMT_FLAG_ and V4L2_BUF_FLAG_.

> +      - 0x00000001
> +      - A buffer is allocated either in coherent (it will be automatically
> +	coherent between the CPU and the bus) or non-coherent memory. The
> +	latter can provide performance gains, for instance the CPU cache
> +	sync/flush operations can be avoided if the buffer is accessed by the
> +	corresponding device only and the CPU does not read/write to/from that
> +	buffer. However, this requires extra care from the driver -- it must
> +	guarantee memory consistency by issuing a cache flush/sync when
> +	consistency is needed. If this flag is set V4L2 will attempt to
> +	allocate the buffer in non-coherent memory. The flag takes effect
> +	only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the
> +	queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS
> +	<V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability.
> +
> +.. raw:: latex
> +
> +    \normalsize
>  
>  Timecodes
>  =========
> diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> index c1c88e00b106..950e7ec1aac5 100644
> --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> @@ -154,8 +154,9 @@ aborting or finishing any DMA in progress, an implicit
>        - This capability is set by the driver to indicate that the queue supports
>          cache and memory management hints. However, it's only valid when the
>          queue is used for :ref:`memory mapping <mmap>` streaming I/O. See
> -        :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and
> -        :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`.
> +        :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>`,
> +        :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>` and
> +        :ref:`V4L2_FLAG_MEMORY_NON_COHERENT <V4L2-FLAG-MEMORY-NON-COHERENT>`.
>  
>  Return Value
>  ============
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 79dbde3bcf8d..b1d4171fe50b 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -954,6 +954,8 @@ struct v4l2_requestbuffers {
>  	__u32			reserved[1];
>  };
>  
> +#define V4L2_FLAG_MEMORY_NON_COHERENT			(1 << 0)
> +
>  /* capabilities for struct v4l2_requestbuffers and v4l2_create_buffers */
>  #define V4L2_BUF_CAP_SUPPORTS_MMAP			(1 << 0)
>  #define V4L2_BUF_CAP_SUPPORTS_USERPTR			(1 << 1)
> 

Regards,

	Hans

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/8] videobuf2: handle V4L2_FLAG_MEMORY_NON_COHERENT flag
  2021-03-02  0:46 ` [PATCH 7/8] videobuf2: handle V4L2_FLAG_MEMORY_NON_COHERENT flag Sergey Senozhatsky
@ 2021-03-22  7:16   ` Hans Verkuil
  2021-03-22  7:18   ` Hans Verkuil
  1 sibling, 0 replies; 21+ messages in thread
From: Hans Verkuil @ 2021-03-22  7:16 UTC (permalink / raw)
  To: Sergey Senozhatsky, Tomasz Figa
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media, linux-kernel

On 02/03/2021 01:46, Sergey Senozhatsky wrote:
> This patch lets user-space to request a non-coherent memory
> allocation during CREATE_BUFS and REQBUFS ioctl calls.
> 
> = CREATE_BUFS
> 
>   struct v4l2_create_buffers has seven 4-byte reserved areas,
>   so reserved[0] is renamed to ->flags. The struct, thus, now
>   has six reserved 4-byte regions.
> 
> = CREATE_BUFS32
> 
>   struct v4l2_create_buffers32 has seven 4-byte reserved areas,
>   so reserved[0] is renamed to ->flags. The struct, thus, now
>   has six reserved 4-byte regions.
> 
> = REQBUFS
> 
>  We use one bit of a ->reserved[1] member of struct v4l2_requestbuffers,
>  which is now renamed to ->flags. Unlike v4l2_create_buffers, struct
>  v4l2_requestbuffers does not have enough reserved room. Therefore for
>  backward compatibility  ->reserved and ->flags were put into anonymous
>  union.
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  .../media/v4l/vidioc-create-bufs.rst          |  7 ++++++-
>  .../media/v4l/vidioc-reqbufs.rst              | 11 +++++++++--
>  .../media/common/videobuf2/videobuf2-core.c   |  6 ++++++
>  .../media/common/videobuf2/videobuf2-v4l2.c   | 19 ++++++++++++++++---
>  drivers/media/v4l2-core/v4l2-compat-ioctl32.c |  9 ++++++++-
>  drivers/media/v4l2-core/v4l2-ioctl.c          |  5 +----
>  include/uapi/linux/videodev2.h                | 11 +++++++++--
>  7 files changed, 55 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
> index b06e5b528e11..132c8b612a94 100644
> --- a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
> +++ b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
> @@ -113,7 +113,12 @@ than the number requested.
>  	``V4L2_MEMORY_MMAP`` and ``format.type`` to the buffer type.
>  
>      * - __u32
> -      - ``reserved``\ [7]
> +      - ``flags``
> +      - Specifies additional buffer management attributes.
> +	See :ref:`memory-flags`.
> +
> +    * - __u32
> +      - ``reserved``\ [6]
>        - A place holder for future extensions. Drivers and applications
>  	must set the array to zero.
>  
> diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> index 950e7ec1aac5..80ea48acea84 100644
> --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> @@ -104,10 +104,17 @@ aborting or finishing any DMA in progress, an implicit
>  	``V4L2_MEMORY_MMAP`` and ``type`` set to the buffer type. This will
>  	free any previously allocated buffers, so this is typically something
>  	that will be done at the start of the application.
> +    * - union {
> +      - (anonymous)
> +    * - __u32
> +      - ``flags``
> +      - Specifies additional buffer management attributes.
> +	See :ref:`memory-flags`.
>      * - __u32
>        - ``reserved``\ [1]
> -      - A place holder for future extensions. Drivers and applications
> -	must set the array to zero.
> +      - Kept for backwards compatibility. Use ``flags`` instead.
> +    * - }
> +      -

I would do this differently. Replace the __u32 reserved[1] by:

	__u8 flags;
	__u8 reserved[3];

We only need a single bit for flags, so this way we still have a few reserved bytes
available.

>  
>  .. tabularcolumns:: |p{6.1cm}|p{2.2cm}|p{8.7cm}|
>  
> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
> index 7040b7f47133..5906a48e7757 100644
> --- a/drivers/media/common/videobuf2/videobuf2-core.c
> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
> @@ -768,6 +768,9 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
>  	unsigned int i;
>  	int ret;
>  
> +	if (flags & V4L2_FLAG_MEMORY_NON_COHERENT)
> +		coherent_mem = false;
> +
>  	if (q->streaming) {
>  		dprintk(q, 1, "streaming active\n");
>  		return -EBUSY;
> @@ -911,6 +914,9 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
>  	bool coherent_mem = true;
>  	int ret;
>  
> +	if (flags & V4L2_FLAG_MEMORY_NON_COHERENT)
> +		coherent_mem = false;
> +
>  	if (q->num_buffers == VB2_MAX_FRAME) {
>  		dprintk(q, 1, "maximum number of buffers already allocated\n");
>  		return -ENOBUFS;
> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> index 1166d5a9291a..f6a8dcc1b5c6 100644
> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> @@ -692,12 +692,22 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
>  #endif
>  }
>  
> +static void validate_coherency_flags(struct vb2_queue *q,

This should validate flags in general, not just the single coherency bit.
So rename to validate_memory_flags().

> +				     int memory,
> +				     unsigned int *flags)

Use u32 instead of unsigned int.

> +{
> +	if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP)
> +		*flags &= ~V4L2_FLAG_MEMORY_NON_COHERENT;

This should clear all other bits as well. If new flags are added in the
future, you want to make sure this code will clear them since they are not
supported in this kernel version.

> +}
> +
>  int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
>  {
>  	int ret = vb2_verify_memory_type(q, req->memory, req->type);
>  
>  	fill_buf_caps(q, &req->capabilities);
> -	return ret ? ret : vb2_core_reqbufs(q, req->memory, 0, &req->count);
> +	validate_coherency_flags(q, req->memory, &req->flags);

Since in my proposal req->flags is a u8 you need to assign it to a u32 first,
then copy the result from validate_memory_flags back to req->flags.

> +	return ret ? ret : vb2_core_reqbufs(q, req->memory,
> +					    req->flags, &req->count);
>  }
>  EXPORT_SYMBOL_GPL(vb2_reqbufs);
>  
> @@ -729,6 +739,7 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
>  	unsigned i;
>  
>  	fill_buf_caps(q, &create->capabilities);
> +	validate_coherency_flags(q, create->memory, &create->flags);
>  	create->index = q->num_buffers;
>  	if (create->count == 0)
>  		return ret != -EBUSY ? ret : 0;
> @@ -772,7 +783,7 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
>  		if (requested_sizes[i] == 0)
>  			return -EINVAL;
>  	return ret ? ret : vb2_core_create_bufs(q, create->memory,
> -						0,
> +						create->flags,
>  						&create->count,
>  						requested_planes,
>  						requested_sizes);
> @@ -957,11 +968,12 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
>  	int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);
>  
>  	fill_buf_caps(vdev->queue, &p->capabilities);
> +	validate_coherency_flags(vdev->queue, p->memory, &p->flags);
>  	if (res)
>  		return res;
>  	if (vb2_queue_is_busy(vdev, file))
>  		return -EBUSY;
> -	res = vb2_core_reqbufs(vdev->queue, p->memory, 0, &p->count);
> +	res = vb2_core_reqbufs(vdev->queue, p->memory, p->flags, &p->count);
>  	/* If count == 0, then the owner has released all buffers and he
>  	   is no longer owner of the queue. Otherwise we have a new owner. */
>  	if (res == 0)
> @@ -979,6 +991,7 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,
>  
>  	p->index = vdev->queue->num_buffers;
>  	fill_buf_caps(vdev->queue, &p->capabilities);
> +	validate_coherency_flags(vdev->queue, p->memory, &p->flags);
>  	/*
>  	 * If count == 0, then just check if memory and type are valid.
>  	 * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0.
> diff --git a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> index 0ca75f6784c5..1aa9ca3b6ca4 100644
> --- a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> +++ b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> @@ -126,6 +126,9 @@ struct v4l2_format32 {
>   * @memory:	buffer memory type
>   * @format:	frame format, for which buffers are requested
>   * @capabilities: capabilities of this buffer type.
> + * @flags:	additional buffer management attributes (ignored unless the
> + *		queue has V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS capability and
> + *		configured for MMAP streaming I/O).
>   * @reserved:	future extensions
>   */
>  struct v4l2_create_buffers32 {
> @@ -134,7 +137,8 @@ struct v4l2_create_buffers32 {
>  	__u32			memory;	/* enum v4l2_memory */
>  	struct v4l2_format32	format;
>  	__u32			capabilities;
> -	__u32			reserved[7];
> +	__u32			flags;
> +	__u32			reserved[6];
>  };
>  
>  static int get_v4l2_format32(struct v4l2_format *p64,
> @@ -182,6 +186,8 @@ static int get_v4l2_create32(struct v4l2_create_buffers *p64,
>  	if (copy_from_user(p64, p32,
>  			   offsetof(struct v4l2_create_buffers32, format)))
>  		return -EFAULT;
> +	if (copy_from_user(&p64->flags, &p32->flags, sizeof(p32->flags)))
> +		return -EFAULT;
>  	return get_v4l2_format32(&p64->format, &p32->format);
>  }
>  
> @@ -227,6 +233,7 @@ static int put_v4l2_create32(struct v4l2_create_buffers *p64,
>  	if (copy_to_user(p32, p64,
>  			 offsetof(struct v4l2_create_buffers32, format)) ||
>  	    put_user(p64->capabilities, &p32->capabilities) ||
> +	    put_user(p64->flags, &p32->flags) ||
>  	    copy_to_user(p32->reserved, p64->reserved, sizeof(p64->reserved)))
>  		return -EFAULT;
>  	return put_v4l2_format32(&p64->format, &p32->format);
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> index 31d1342e61e8..f8d38a3cd1e0 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -2045,9 +2045,6 @@ static int v4l_reqbufs(const struct v4l2_ioctl_ops *ops,
>  
>  	if (ret)
>  		return ret;
> -
> -	CLEAR_AFTER_FIELD(p, capabilities);
> -
>  	return ops->vidioc_reqbufs(file, fh, p);
>  }
>  
> @@ -2087,7 +2084,7 @@ static int v4l_create_bufs(const struct v4l2_ioctl_ops *ops,
>  	if (ret)
>  		return ret;
>  
> -	CLEAR_AFTER_FIELD(create, capabilities);
> +	CLEAR_AFTER_FIELD(create, flags);
>  
>  	v4l_sanitize_format(&create->format);
>  
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index b1d4171fe50b..85d2681e73b6 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -951,7 +951,10 @@ struct v4l2_requestbuffers {
>  	__u32			type;		/* enum v4l2_buf_type */
>  	__u32			memory;		/* enum v4l2_memory */
>  	__u32			capabilities;
> -	__u32			reserved[1];
> +	union {
> +		__u32		flags;
> +		__u32		reserved[1];
> +	};
>  };
>  
>  #define V4L2_FLAG_MEMORY_NON_COHERENT			(1 << 0)
> @@ -2473,6 +2476,9 @@ struct v4l2_dbg_chip_info {
>   * @memory:	enum v4l2_memory; buffer memory type
>   * @format:	frame format, for which buffers are requested
>   * @capabilities: capabilities of this buffer type.
> + * @flags:	additional buffer management attributes (ignored unless the
> + *		queue has V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS capability
> + *		and configured for MMAP streaming I/O).
>   * @reserved:	future extensions
>   */
>  struct v4l2_create_buffers {
> @@ -2481,7 +2487,8 @@ struct v4l2_create_buffers {
>  	__u32			memory;
>  	struct v4l2_format	format;
>  	__u32			capabilities;
> -	__u32			reserved[7];
> +	__u32			flags;
> +	__u32			reserved[6];
>  };
>  
>  /*
> 

Regards,

	Hans

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 7/8] videobuf2: handle V4L2_FLAG_MEMORY_NON_COHERENT flag
  2021-03-02  0:46 ` [PATCH 7/8] videobuf2: handle V4L2_FLAG_MEMORY_NON_COHERENT flag Sergey Senozhatsky
  2021-03-22  7:16   ` Hans Verkuil
@ 2021-03-22  7:18   ` Hans Verkuil
  1 sibling, 0 replies; 21+ messages in thread
From: Hans Verkuil @ 2021-03-22  7:18 UTC (permalink / raw)
  To: Sergey Senozhatsky, Tomasz Figa
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media, linux-kernel

On 02/03/2021 01:46, Sergey Senozhatsky wrote:
> This patch lets user-space to request a non-coherent memory
> allocation during CREATE_BUFS and REQBUFS ioctl calls.
> 
> = CREATE_BUFS
> 
>   struct v4l2_create_buffers has seven 4-byte reserved areas,
>   so reserved[0] is renamed to ->flags. The struct, thus, now
>   has six reserved 4-byte regions.
> 
> = CREATE_BUFS32
> 
>   struct v4l2_create_buffers32 has seven 4-byte reserved areas,
>   so reserved[0] is renamed to ->flags. The struct, thus, now
>   has six reserved 4-byte regions.
> 
> = REQBUFS
> 
>  We use one bit of a ->reserved[1] member of struct v4l2_requestbuffers,
>  which is now renamed to ->flags. Unlike v4l2_create_buffers, struct
>  v4l2_requestbuffers does not have enough reserved room. Therefore for
>  backward compatibility  ->reserved and ->flags were put into anonymous
>  union.
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
>  .../media/v4l/vidioc-create-bufs.rst          |  7 ++++++-
>  .../media/v4l/vidioc-reqbufs.rst              | 11 +++++++++--
>  .../media/common/videobuf2/videobuf2-core.c   |  6 ++++++
>  .../media/common/videobuf2/videobuf2-v4l2.c   | 19 ++++++++++++++++---
>  drivers/media/v4l2-core/v4l2-compat-ioctl32.c |  9 ++++++++-
>  drivers/media/v4l2-core/v4l2-ioctl.c          |  5 +----
>  include/uapi/linux/videodev2.h                | 11 +++++++++--
>  7 files changed, 55 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
> index b06e5b528e11..132c8b612a94 100644
> --- a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
> +++ b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
> @@ -113,7 +113,12 @@ than the number requested.
>  	``V4L2_MEMORY_MMAP`` and ``format.type`` to the buffer type.
>  
>      * - __u32
> -      - ``reserved``\ [7]
> +      - ``flags``
> +      - Specifies additional buffer management attributes.
> +	See :ref:`memory-flags`.
> +
> +    * - __u32
> +      - ``reserved``\ [6]
>        - A place holder for future extensions. Drivers and applications
>  	must set the array to zero.
>  
> diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> index 950e7ec1aac5..80ea48acea84 100644
> --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst
> @@ -104,10 +104,17 @@ aborting or finishing any DMA in progress, an implicit
>  	``V4L2_MEMORY_MMAP`` and ``type`` set to the buffer type. This will
>  	free any previously allocated buffers, so this is typically something
>  	that will be done at the start of the application.
> +    * - union {
> +      - (anonymous)
> +    * - __u32
> +      - ``flags``
> +      - Specifies additional buffer management attributes.
> +	See :ref:`memory-flags`.
>      * - __u32
>        - ``reserved``\ [1]
> -      - A place holder for future extensions. Drivers and applications
> -	must set the array to zero.
> +      - Kept for backwards compatibility. Use ``flags`` instead.
> +    * - }
> +      -
>  
>  .. tabularcolumns:: |p{6.1cm}|p{2.2cm}|p{8.7cm}|
>  
> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
> index 7040b7f47133..5906a48e7757 100644
> --- a/drivers/media/common/videobuf2/videobuf2-core.c
> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
> @@ -768,6 +768,9 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory,
>  	unsigned int i;
>  	int ret;
>  
> +	if (flags & V4L2_FLAG_MEMORY_NON_COHERENT)
> +		coherent_mem = false;
> +

I think it is better if this is done at the bool coherent_mem declaration:

	bool coherent_mem = !(flags & V4L2_FLAG_MEMORY_NON_COHERENT);

>  	if (q->streaming) {
>  		dprintk(q, 1, "streaming active\n");
>  		return -EBUSY;
> @@ -911,6 +914,9 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
>  	bool coherent_mem = true;
>  	int ret;
>  
> +	if (flags & V4L2_FLAG_MEMORY_NON_COHERENT)
> +		coherent_mem = false;
> +

Ditto.

Regards,

	Hans

>  	if (q->num_buffers == VB2_MAX_FRAME) {
>  		dprintk(q, 1, "maximum number of buffers already allocated\n");
>  		return -ENOBUFS;
> diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> index 1166d5a9291a..f6a8dcc1b5c6 100644
> --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c
> +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c
> @@ -692,12 +692,22 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
>  #endif
>  }
>  
> +static void validate_coherency_flags(struct vb2_queue *q,
> +				     int memory,
> +				     unsigned int *flags)
> +{
> +	if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP)
> +		*flags &= ~V4L2_FLAG_MEMORY_NON_COHERENT;
> +}
> +
>  int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
>  {
>  	int ret = vb2_verify_memory_type(q, req->memory, req->type);
>  
>  	fill_buf_caps(q, &req->capabilities);
> -	return ret ? ret : vb2_core_reqbufs(q, req->memory, 0, &req->count);
> +	validate_coherency_flags(q, req->memory, &req->flags);
> +	return ret ? ret : vb2_core_reqbufs(q, req->memory,
> +					    req->flags, &req->count);
>  }
>  EXPORT_SYMBOL_GPL(vb2_reqbufs);
>  
> @@ -729,6 +739,7 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
>  	unsigned i;
>  
>  	fill_buf_caps(q, &create->capabilities);
> +	validate_coherency_flags(q, create->memory, &create->flags);
>  	create->index = q->num_buffers;
>  	if (create->count == 0)
>  		return ret != -EBUSY ? ret : 0;
> @@ -772,7 +783,7 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
>  		if (requested_sizes[i] == 0)
>  			return -EINVAL;
>  	return ret ? ret : vb2_core_create_bufs(q, create->memory,
> -						0,
> +						create->flags,
>  						&create->count,
>  						requested_planes,
>  						requested_sizes);
> @@ -957,11 +968,12 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
>  	int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);
>  
>  	fill_buf_caps(vdev->queue, &p->capabilities);
> +	validate_coherency_flags(vdev->queue, p->memory, &p->flags);
>  	if (res)
>  		return res;
>  	if (vb2_queue_is_busy(vdev, file))
>  		return -EBUSY;
> -	res = vb2_core_reqbufs(vdev->queue, p->memory, 0, &p->count);
> +	res = vb2_core_reqbufs(vdev->queue, p->memory, p->flags, &p->count);
>  	/* If count == 0, then the owner has released all buffers and he
>  	   is no longer owner of the queue. Otherwise we have a new owner. */
>  	if (res == 0)
> @@ -979,6 +991,7 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,
>  
>  	p->index = vdev->queue->num_buffers;
>  	fill_buf_caps(vdev->queue, &p->capabilities);
> +	validate_coherency_flags(vdev->queue, p->memory, &p->flags);
>  	/*
>  	 * If count == 0, then just check if memory and type are valid.
>  	 * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0.
> diff --git a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> index 0ca75f6784c5..1aa9ca3b6ca4 100644
> --- a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> +++ b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c
> @@ -126,6 +126,9 @@ struct v4l2_format32 {
>   * @memory:	buffer memory type
>   * @format:	frame format, for which buffers are requested
>   * @capabilities: capabilities of this buffer type.
> + * @flags:	additional buffer management attributes (ignored unless the
> + *		queue has V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS capability and
> + *		configured for MMAP streaming I/O).
>   * @reserved:	future extensions
>   */
>  struct v4l2_create_buffers32 {
> @@ -134,7 +137,8 @@ struct v4l2_create_buffers32 {
>  	__u32			memory;	/* enum v4l2_memory */
>  	struct v4l2_format32	format;
>  	__u32			capabilities;
> -	__u32			reserved[7];
> +	__u32			flags;
> +	__u32			reserved[6];
>  };
>  
>  static int get_v4l2_format32(struct v4l2_format *p64,
> @@ -182,6 +186,8 @@ static int get_v4l2_create32(struct v4l2_create_buffers *p64,
>  	if (copy_from_user(p64, p32,
>  			   offsetof(struct v4l2_create_buffers32, format)))
>  		return -EFAULT;
> +	if (copy_from_user(&p64->flags, &p32->flags, sizeof(p32->flags)))
> +		return -EFAULT;
>  	return get_v4l2_format32(&p64->format, &p32->format);
>  }
>  
> @@ -227,6 +233,7 @@ static int put_v4l2_create32(struct v4l2_create_buffers *p64,
>  	if (copy_to_user(p32, p64,
>  			 offsetof(struct v4l2_create_buffers32, format)) ||
>  	    put_user(p64->capabilities, &p32->capabilities) ||
> +	    put_user(p64->flags, &p32->flags) ||
>  	    copy_to_user(p32->reserved, p64->reserved, sizeof(p64->reserved)))
>  		return -EFAULT;
>  	return put_v4l2_format32(&p64->format, &p32->format);
> diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
> index 31d1342e61e8..f8d38a3cd1e0 100644
> --- a/drivers/media/v4l2-core/v4l2-ioctl.c
> +++ b/drivers/media/v4l2-core/v4l2-ioctl.c
> @@ -2045,9 +2045,6 @@ static int v4l_reqbufs(const struct v4l2_ioctl_ops *ops,
>  
>  	if (ret)
>  		return ret;
> -
> -	CLEAR_AFTER_FIELD(p, capabilities);
> -
>  	return ops->vidioc_reqbufs(file, fh, p);
>  }
>  
> @@ -2087,7 +2084,7 @@ static int v4l_create_bufs(const struct v4l2_ioctl_ops *ops,
>  	if (ret)
>  		return ret;
>  
> -	CLEAR_AFTER_FIELD(create, capabilities);
> +	CLEAR_AFTER_FIELD(create, flags);
>  
>  	v4l_sanitize_format(&create->format);
>  
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index b1d4171fe50b..85d2681e73b6 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -951,7 +951,10 @@ struct v4l2_requestbuffers {
>  	__u32			type;		/* enum v4l2_buf_type */
>  	__u32			memory;		/* enum v4l2_memory */
>  	__u32			capabilities;
> -	__u32			reserved[1];
> +	union {
> +		__u32		flags;
> +		__u32		reserved[1];
> +	};
>  };
>  
>  #define V4L2_FLAG_MEMORY_NON_COHERENT			(1 << 0)
> @@ -2473,6 +2476,9 @@ struct v4l2_dbg_chip_info {
>   * @memory:	enum v4l2_memory; buffer memory type
>   * @format:	frame format, for which buffers are requested
>   * @capabilities: capabilities of this buffer type.
> + * @flags:	additional buffer management attributes (ignored unless the
> + *		queue has V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS capability
> + *		and configured for MMAP streaming I/O).
>   * @reserved:	future extensions
>   */
>  struct v4l2_create_buffers {
> @@ -2481,7 +2487,8 @@ struct v4l2_create_buffers {
>  	__u32			memory;
>  	struct v4l2_format	format;
>  	__u32			capabilities;
> -	__u32			reserved[7];
> +	__u32			flags;
> +	__u32			reserved[6];
>  };
>  
>  /*
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations
  2021-03-02  0:46 ` [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations Sergey Senozhatsky
@ 2021-03-22  7:40   ` Hans Verkuil
  2021-03-24 12:07   ` Tomasz Figa
  2023-07-04 10:50   ` Hsia-Jun Li
  2 siblings, 0 replies; 21+ messages in thread
From: Hans Verkuil @ 2021-03-22  7:40 UTC (permalink / raw)
  To: Sergey Senozhatsky, Tomasz Figa
  Cc: Christoph Hellwig, Mauro Carvalho Chehab, linux-media, linux-kernel

On 02/03/2021 01:46, Sergey Senozhatsky wrote:
> This adds support for new noncontiguous DMA API, which
> requires allocators to have two execution branches: one
> for the current API, and one for the new one.
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> [hch: untested conversion to the ne API]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  .../common/videobuf2/videobuf2-dma-contig.c   | 141 +++++++++++++++---
>  1 file changed, 117 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index 1e218bc440c6..d6a9f7b682f3 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -17,6 +17,7 @@
>  #include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/highmem.h>
>  
>  #include <media/videobuf2-v4l2.h>
>  #include <media/videobuf2-dma-contig.h>
> @@ -42,8 +43,14 @@ struct vb2_dc_buf {
>  	struct dma_buf_attachment	*db_attach;
>  
>  	struct vb2_buffer		*vb;
> +	unsigned int			non_coherent_mem:1;

Just use a bool here.

>  };
>  
> +static bool vb2_dc_is_coherent(struct vb2_dc_buf *buf)
> +{
> +	return !buf->non_coherent_mem;
> +}

I would just drop this 'helper' function. Testing against buf->non_coherent_mem
seems perfectly understandable to me. And better than negating that bool in
this helper function.

You can choose to invert non_coherent_mem: i.e. add a coherent_mem bool instead
of a non_coherent_mem bool if you think that is easier to understand. It's set
in just one place (alloc), so that's easy enough.

Calling it 'bool is_coherent;' would perhaps be the easiest to understand.

> +
>  /*********************************************/
>  /*        scatterlist table functions        */
>  /*********************************************/
> @@ -78,12 +85,21 @@ static void *vb2_dc_cookie(struct vb2_buffer *vb, void *buf_priv)
>  static void *vb2_dc_vaddr(struct vb2_buffer *vb, void *buf_priv)
>  {
>  	struct vb2_dc_buf *buf = buf_priv;
> -	struct dma_buf_map map;
> -	int ret;
>  
> -	if (!buf->vaddr && buf->db_attach) {
> -		ret = dma_buf_vmap(buf->db_attach->dmabuf, &map);
> -		buf->vaddr = ret ? NULL : map.vaddr;
> +	if (buf->vaddr)
> +		return buf->vaddr;
> +
> +	if (buf->db_attach) {
> +		struct dma_buf_map map;
> +
> +		if (!dma_buf_vmap(buf->db_attach->dmabuf, &map))
> +			buf->vaddr = map.vaddr;
> +	}
> +
> +	if (!vb2_dc_is_coherent(buf)) {
> +		buf->vaddr = dma_vmap_noncontiguous(buf->dev,
> +						    buf->size,
> +						    buf->dma_sgt);
>  	}
>  
>  	return buf->vaddr;
> @@ -101,13 +117,26 @@ static void vb2_dc_prepare(void *buf_priv)
>  	struct vb2_dc_buf *buf = buf_priv;
>  	struct sg_table *sgt = buf->dma_sgt;
>  
> +	/* This takes care of DMABUF and user-enforced cache sync hint */
>  	if (buf->vb->skip_cache_sync_on_prepare)
>  		return;
>  
> +	/*
> +	 * Coherent MMAP buffers do not need to be synced, unlike coherent
> +	 * USERPTR and non-coherent MMAP buffers.
> +	 */
> +	if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
> +		return;
> +
>  	if (!sgt)
>  		return;
>  
> +	/* For both USERPTR and non-coherent MMAP */
>  	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
> +
> +	/* Non-coherrent MMAP only */

Typo: coherrent -> coherent

> +	if (!vb2_dc_is_coherent(buf) && buf->vaddr)
> +		flush_kernel_vmap_range(buf->vaddr, buf->size);
>  }
>  
>  static void vb2_dc_finish(void *buf_priv)
> @@ -115,19 +144,46 @@ static void vb2_dc_finish(void *buf_priv)
>  	struct vb2_dc_buf *buf = buf_priv;
>  	struct sg_table *sgt = buf->dma_sgt;
>  
> +	/* This takes care of DMABUF and user-enforced cache sync hint */
>  	if (buf->vb->skip_cache_sync_on_finish)
>  		return;
>  
> +	/*
> +	 * Coherent MMAP buffers do not need to be synced, unlike coherent
> +	 * USERPTR and non-coherent MMAP buffers.
> +	 */
> +	if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
> +		return;
> +
>  	if (!sgt)
>  		return;
>  
> +	/* For both USERPTR and non-coherent MMAP */
>  	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
> +
> +	/* Non-coherrent MMAP only */

Same typo.

> +	if (!vb2_dc_is_coherent(buf) && buf->vaddr)
> +		invalidate_kernel_vmap_range(buf->vaddr, buf->size);
>  }
>  
>  /*********************************************/
>  /*        callbacks for MMAP buffers         */
>  /*********************************************/
>  
> +static void __vb2_dc_put(struct vb2_dc_buf *buf)
> +{
> +	if (vb2_dc_is_coherent(buf)) {
> +		dma_free_attrs(buf->dev, buf->size, buf->cookie,
> +			       buf->dma_addr, buf->attrs);
> +		return;
> +	}
> +
> +	if (buf->vaddr)
> +		dma_vunmap_noncontiguous(buf->dev, buf->vaddr);
> +	dma_free_noncontiguous(buf->dev, buf->size,
> +			       buf->dma_sgt, buf->dma_addr);
> +}
> +
>  static void vb2_dc_put(void *buf_priv)
>  {
>  	struct vb2_dc_buf *buf = buf_priv;
> @@ -139,17 +195,47 @@ static void vb2_dc_put(void *buf_priv)
>  		sg_free_table(buf->sgt_base);
>  		kfree(buf->sgt_base);
>  	}
> -	dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
> -		       buf->attrs);
> +	__vb2_dc_put(buf);
>  	put_device(buf->dev);
>  	kfree(buf);
>  }
>  
> +static int vb2_dc_alloc_coherent(struct vb2_dc_buf *buf)
> +{
> +	struct vb2_queue *q = buf->vb->vb2_queue;
> +
> +	buf->cookie = dma_alloc_attrs(buf->dev,
> +				      buf->size,
> +				      &buf->dma_addr,
> +				      GFP_KERNEL | q->gfp_flags,
> +				      buf->attrs);
> +	if (!buf->cookie)
> +		return -ENOMEM;
> +	if ((q->dma_attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
> +		buf->vaddr = buf->cookie;
> +	return 0;
> +}
> +
> +static int vb2_dc_alloc_non_coherent(struct vb2_dc_buf *buf)
> +{
> +	struct vb2_queue *q = buf->vb->vb2_queue;
> +
> +	buf->dma_sgt = dma_alloc_noncontiguous(buf->dev,
> +					       buf->size,
> +					       buf->dma_dir,
> +					       GFP_KERNEL | q->gfp_flags,
> +					       buf->attrs);
> +	if (!buf->dma_sgt)
> +		return -ENOMEM;

DMA_ATTR_NO_KERNEL_MAPPING makes no sense here? If so, then it would be
good to document that here.

> +	return 0;
> +}
> +
>  static void *vb2_dc_alloc(struct vb2_buffer *vb,
>  			  struct device *dev,
>  			  unsigned long size)
>  {
>  	struct vb2_dc_buf *buf;
> +	int ret;
>  
>  	if (WARN_ON(!dev))
>  		return ERR_PTR(-EINVAL);
> @@ -159,27 +245,28 @@ static void *vb2_dc_alloc(struct vb2_buffer *vb,
>  		return ERR_PTR(-ENOMEM);
>  
>  	buf->attrs = vb->vb2_queue->dma_attrs;
> -	buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
> -				      GFP_KERNEL | vb->vb2_queue->gfp_flags,
> -				      buf->attrs);
> -	if (!buf->cookie) {
> -		dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
> -		kfree(buf);
> -		return ERR_PTR(-ENOMEM);
> -	}
> -
> -	if ((buf->attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
> -		buf->vaddr = buf->cookie;
> +	buf->dma_dir = vb->vb2_queue->dma_dir;
> +	buf->vb = vb;
> +	buf->non_coherent_mem = vb->vb2_queue->non_coherent_mem;
>  
> +	buf->size = size;
>  	/* Prevent the device from being released while the buffer is used */
>  	buf->dev = get_device(dev);
> -	buf->size = size;
> -	buf->dma_dir = vb->vb2_queue->dma_dir;
> +
> +	if (vb2_dc_is_coherent(buf))
> +		ret = vb2_dc_alloc_coherent(buf);
> +	else
> +		ret = vb2_dc_alloc_non_coherent(buf);
> +
> +	if (ret) {
> +		dev_err(dev, "dma alloc of size %ld failed\n", size);
> +		kfree(buf);
> +		return ERR_PTR(-ENOMEM);
> +	}
>  
>  	buf->handler.refcount = &buf->refcount;
>  	buf->handler.put = vb2_dc_put;
>  	buf->handler.arg = buf;
> -	buf->vb = vb;
>  
>  	refcount_set(&buf->refcount, 1);
>  
> @@ -196,9 +283,12 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
>  		return -EINVAL;
>  	}
>  
> -	ret = dma_mmap_attrs(buf->dev, vma, buf->cookie,
> -		buf->dma_addr, buf->size, buf->attrs);
> -
> +	if (vb2_dc_is_coherent(buf))
> +		ret = dma_mmap_attrs(buf->dev, vma, buf->cookie, buf->dma_addr,
> +				     buf->size, buf->attrs);
> +	else
> +		ret = dma_mmap_noncontiguous(buf->dev, vma, buf->size,
> +					     buf->dma_sgt);
>  	if (ret) {
>  		pr_err("Remapping memory failed, error: %d\n", ret);
>  		return ret;
> @@ -390,6 +480,9 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
>  	int ret;
>  	struct sg_table *sgt;
>  
> +	if (!vb2_dc_is_coherent(buf))
> +		return buf->dma_sgt;
> +
>  	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
>  	if (!sgt) {
>  		dev_err(buf->dev, "failed to alloc sg table\n");
> 

Regards,

	Hans

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations
  2021-03-02  0:46 ` [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations Sergey Senozhatsky
  2021-03-22  7:40   ` Hans Verkuil
@ 2021-03-24 12:07   ` Tomasz Figa
  2023-07-04 10:50   ` Hsia-Jun Li
  2 siblings, 0 replies; 21+ messages in thread
From: Tomasz Figa @ 2021-03-24 12:07 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Hans Verkuil, Christoph Hellwig, Mauro Carvalho Chehab,
	linux-media, linux-kernel

On Tue, Mar 02, 2021 at 09:46:24AM +0900, Sergey Senozhatsky wrote:
> This adds support for new noncontiguous DMA API, which
> requires allocators to have two execution branches: one
> for the current API, and one for the new one.
> 
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> [hch: untested conversion to the ne API]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  .../common/videobuf2/videobuf2-dma-contig.c   | 141 +++++++++++++++---
>  1 file changed, 117 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index 1e218bc440c6..d6a9f7b682f3 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -17,6 +17,7 @@
>  #include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/highmem.h>
>  
>  #include <media/videobuf2-v4l2.h>
>  #include <media/videobuf2-dma-contig.h>
> @@ -42,8 +43,14 @@ struct vb2_dc_buf {
>  	struct dma_buf_attachment	*db_attach;
>  
>  	struct vb2_buffer		*vb;
> +	unsigned int			non_coherent_mem:1;
>  };
>  
> +static bool vb2_dc_is_coherent(struct vb2_dc_buf *buf)
> +{
> +	return !buf->non_coherent_mem;
> +}

nit: Given that this is just a simple negated return, do we need a dedicated
helper for it?

> +
>  /*********************************************/
>  /*        scatterlist table functions        */
>  /*********************************************/
> @@ -78,12 +85,21 @@ static void *vb2_dc_cookie(struct vb2_buffer *vb, void *buf_priv)
>  static void *vb2_dc_vaddr(struct vb2_buffer *vb, void *buf_priv)
>  {
>  	struct vb2_dc_buf *buf = buf_priv;
> -	struct dma_buf_map map;
> -	int ret;
>  
> -	if (!buf->vaddr && buf->db_attach) {
> -		ret = dma_buf_vmap(buf->db_attach->dmabuf, &map);
> -		buf->vaddr = ret ? NULL : map.vaddr;
> +	if (buf->vaddr)
> +		return buf->vaddr;
> +
> +	if (buf->db_attach) {
> +		struct dma_buf_map map;
> +
> +		if (!dma_buf_vmap(buf->db_attach->dmabuf, &map))
> +			buf->vaddr = map.vaddr;
> +	}
> +
> +	if (!vb2_dc_is_coherent(buf)) {

I believe it's not possible for both buf->db_attach and
!vb2_dc_is_coherent() to be true, but nevertheless the code can be
misleading for the reader. Would it make sense to just return early in the
if that handles db_attach?

> +		buf->vaddr = dma_vmap_noncontiguous(buf->dev,
> +						    buf->size,
> +						    buf->dma_sgt);
>  	}
>  
>  	return buf->vaddr;
> @@ -101,13 +117,26 @@ static void vb2_dc_prepare(void *buf_priv)
>  	struct vb2_dc_buf *buf = buf_priv;
>  	struct sg_table *sgt = buf->dma_sgt;
>  
> +	/* This takes care of DMABUF and user-enforced cache sync hint */
>  	if (buf->vb->skip_cache_sync_on_prepare)
>  		return;
>  
> +	/*
> +	 * Coherent MMAP buffers do not need to be synced, unlike coherent
> +	 * USERPTR and non-coherent MMAP buffers.

USERPTR buffers are always considered non-coherent.

> +	 */
> +	if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
> +		return;
> +
>  	if (!sgt)
>  		return;
>  
> +	/* For both USERPTR and non-coherent MMAP */
>  	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
> +
> +	/* Non-coherrent MMAP only */
> +	if (!vb2_dc_is_coherent(buf) && buf->vaddr)
> +		flush_kernel_vmap_range(buf->vaddr, buf->size);
>  }
>  
>  static void vb2_dc_finish(void *buf_priv)
> @@ -115,19 +144,46 @@ static void vb2_dc_finish(void *buf_priv)
>  	struct vb2_dc_buf *buf = buf_priv;
>  	struct sg_table *sgt = buf->dma_sgt;
>  
> +	/* This takes care of DMABUF and user-enforced cache sync hint */
>  	if (buf->vb->skip_cache_sync_on_finish)
>  		return;
>  
> +	/*
> +	 * Coherent MMAP buffers do not need to be synced, unlike coherent
> +	 * USERPTR and non-coherent MMAP buffers.

Ditto.

> +	 */
> +	if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
> +		return;
> +
>  	if (!sgt)
>  		return;
>  
> +	/* For both USERPTR and non-coherent MMAP */
>  	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
> +
> +	/* Non-coherrent MMAP only */
> +	if (!vb2_dc_is_coherent(buf) && buf->vaddr)
> +		invalidate_kernel_vmap_range(buf->vaddr, buf->size);
>  }
>  
>  /*********************************************/
>  /*        callbacks for MMAP buffers         */
>  /*********************************************/
>  
> +static void __vb2_dc_put(struct vb2_dc_buf *buf)
> +{
> +	if (vb2_dc_is_coherent(buf)) {
> +		dma_free_attrs(buf->dev, buf->size, buf->cookie,
> +			       buf->dma_addr, buf->attrs);
> +		return;
> +	}
> +
> +	if (buf->vaddr)
> +		dma_vunmap_noncontiguous(buf->dev, buf->vaddr);
> +	dma_free_noncontiguous(buf->dev, buf->size,
> +			       buf->dma_sgt, buf->dma_addr);
> +}
> +
>  static void vb2_dc_put(void *buf_priv)

nit: Unrelated to this patch or series, but could be a follow-up clean-up:
Could we rename this and the newly added function to include mmap in the
name, since it's only for MMAP buffers?

Best regards,
Tomasz


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/8] videobuf2: support new noncontiguous DMA API
  2021-03-02  0:46 [PATCH 0/8] videobuf2: support new noncontiguous DMA API Sergey Senozhatsky
                   ` (8 preceding siblings ...)
  2021-03-02  0:49 ` [PATCH] v4l-compliance: re-introduce NON_COHERENT and cache hints tests Sergey Senozhatsky
@ 2021-03-24 12:09 ` Tomasz Figa
  9 siblings, 0 replies; 21+ messages in thread
From: Tomasz Figa @ 2021-03-24 12:09 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Hans Verkuil, Christoph Hellwig, Mauro Carvalho Chehab,
	linux-media, linux-kernel

Hi Sergey,

On Tue, Mar 02, 2021 at 09:46:16AM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> 	RFC
> 
> 	The series adds support for new noncontiguous DMA API [0] and
> adds V4L2_FLAG_MEMORY_NON_COHERENT UAPI. This is similar to previous
> V4L2_FLAG_MEMORY_NON_CONSISTENT (which was renamed), but the patch set
> goes a bit further this time and also does some videobuf2 API
> refactroings along the way.
> 
> A corresponding v4l2-compliance patch will be posted shortly.
> 
> [0] https://lore.kernel.org/lkml/20210301085236.947011-2-hch@lst.de/
> 
> Sergey Senozhatsky (8):
>   videobuf2: rework vb2_mem_ops API
>   videobuf2: inverse buffer cache_hints flags
>   videobuf2: split buffer cache_hints initialisation
>   videobuf2: move cache_hints handling to allocators
>   videobuf2: add V4L2_FLAG_MEMORY_NON_COHERENT flag
>   videobuf2: add queue memory coherency parameter
>   videobuf2: handle V4L2_FLAG_MEMORY_NON_COHERENT flag
>   videobuf2: handle non-contiguous DMA allocations
> 
>  .../userspace-api/media/v4l/buffer.rst        |  40 +++-
>  .../media/v4l/vidioc-create-bufs.rst          |   7 +-
>  .../media/v4l/vidioc-reqbufs.rst              |  16 +-
>  .../media/common/videobuf2/videobuf2-core.c   | 135 +++++++++-----
>  .../common/videobuf2/videobuf2-dma-contig.c   | 175 ++++++++++++++----
>  .../media/common/videobuf2/videobuf2-dma-sg.c |  39 ++--
>  .../media/common/videobuf2/videobuf2-v4l2.c   |  47 ++---
>  .../common/videobuf2/videobuf2-vmalloc.c      |  30 +--
>  drivers/media/dvb-core/dvb_vb2.c              |   2 +-
>  drivers/media/v4l2-core/v4l2-compat-ioctl32.c |   9 +-
>  drivers/media/v4l2-core/v4l2-ioctl.c          |   5 +-
>  include/media/videobuf2-core.h                |  57 +++---
>  include/uapi/linux/videodev2.h                |  13 +-
>  13 files changed, 396 insertions(+), 179 deletions(-)
> 
> -- 
> 2.30.1.766.gb4fecdf3b7-goog
> 

Just some minor nits for patch 8. Otherwise, with Hans's comments
addressed:

Acked-by: Tomasz Figa <tfiga@chromium.org>

Thanks for the great job.

Best regards,
Tomasz


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations
  2021-03-02  0:46 ` [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations Sergey Senozhatsky
  2021-03-22  7:40   ` Hans Verkuil
  2021-03-24 12:07   ` Tomasz Figa
@ 2023-07-04 10:50   ` Hsia-Jun Li
  2023-07-05 10:45     ` Tomasz Figa
  2 siblings, 1 reply; 21+ messages in thread
From: Hsia-Jun Li @ 2023-07-04 10:50 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Christoph Hellwig, Tomasz Figa, Hans Verkuil,
	Mauro Carvalho Chehab, linux-media, linux-kernel

Hello Sergey

I known this patch have been merged for a long time. I am thinking 
whether we need this flag in the new v4l2_ext_buffer.

On 3/2/21 08:46, Sergey Senozhatsky wrote:
> This adds support for new noncontiguous DMA API, which
> requires allocators to have two execution branches: one
> for the current API, and one for the new one.

There is no way we could allocate a coherent buffer in the platform 
except the x86.

Should we make this flag a platform compiling time fixed value?

And I didn't see Gstreamer nor FFmpeg uses it, it is obvious that they 
are running in many(almost all) embedded devices which are ARM.

>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> [hch: untested conversion to the ne API]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   .../common/videobuf2/videobuf2-dma-contig.c   | 141 +++++++++++++++---
>   1 file changed, 117 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> index 1e218bc440c6..d6a9f7b682f3 100644
> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> @@ -17,6 +17,7 @@
>   #include <linux/sched.h>
>   #include <linux/slab.h>
>   #include <linux/dma-mapping.h>
> +#include <linux/highmem.h>
>   
>   #include <media/videobuf2-v4l2.h>
>   #include <media/videobuf2-dma-contig.h>
> @@ -42,8 +43,14 @@ struct vb2_dc_buf {
>   	struct dma_buf_attachment	*db_attach;
>   
>   	struct vb2_buffer		*vb;
> +	unsigned int			non_coherent_mem:1;
>   };
>   
> +static bool vb2_dc_is_coherent(struct vb2_dc_buf *buf)
> +{
> +	return !buf->non_coherent_mem;
> +}
> +
>   /*********************************************/
>   /*        scatterlist table functions        */
>   /*********************************************/
> @@ -78,12 +85,21 @@ static void *vb2_dc_cookie(struct vb2_buffer *vb, void *buf_priv)
>   static void *vb2_dc_vaddr(struct vb2_buffer *vb, void *buf_priv)
>   {
>   	struct vb2_dc_buf *buf = buf_priv;
> -	struct dma_buf_map map;
> -	int ret;
>   
> -	if (!buf->vaddr && buf->db_attach) {
> -		ret = dma_buf_vmap(buf->db_attach->dmabuf, &map);
> -		buf->vaddr = ret ? NULL : map.vaddr;
> +	if (buf->vaddr)
> +		return buf->vaddr;
> +
> +	if (buf->db_attach) {
> +		struct dma_buf_map map;
> +
> +		if (!dma_buf_vmap(buf->db_attach->dmabuf, &map))
> +			buf->vaddr = map.vaddr;
> +	}
> +
> +	if (!vb2_dc_is_coherent(buf)) {
> +		buf->vaddr = dma_vmap_noncontiguous(buf->dev,
> +						    buf->size,
> +						    buf->dma_sgt);
>   	}
>   
>   	return buf->vaddr;
> @@ -101,13 +117,26 @@ static void vb2_dc_prepare(void *buf_priv)
>   	struct vb2_dc_buf *buf = buf_priv;
>   	struct sg_table *sgt = buf->dma_sgt;
>   
> +	/* This takes care of DMABUF and user-enforced cache sync hint */
>   	if (buf->vb->skip_cache_sync_on_prepare)
>   		return;
>   
> +	/*
> +	 * Coherent MMAP buffers do not need to be synced, unlike coherent
> +	 * USERPTR and non-coherent MMAP buffers.
> +	 */
> +	if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
> +		return;
> +
>   	if (!sgt)
>   		return;
>   
> +	/* For both USERPTR and non-coherent MMAP */
>   	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
> +
> +	/* Non-coherrent MMAP only */
> +	if (!vb2_dc_is_coherent(buf) && buf->vaddr)
> +		flush_kernel_vmap_range(buf->vaddr, buf->size);
>   }
>   
>   static void vb2_dc_finish(void *buf_priv)
> @@ -115,19 +144,46 @@ static void vb2_dc_finish(void *buf_priv)
>   	struct vb2_dc_buf *buf = buf_priv;
>   	struct sg_table *sgt = buf->dma_sgt;
>   
> +	/* This takes care of DMABUF and user-enforced cache sync hint */
>   	if (buf->vb->skip_cache_sync_on_finish)
>   		return;
>   
> +	/*
> +	 * Coherent MMAP buffers do not need to be synced, unlike coherent
> +	 * USERPTR and non-coherent MMAP buffers.
> +	 */
> +	if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
> +		return;
> +
>   	if (!sgt)
>   		return;
>   
> +	/* For both USERPTR and non-coherent MMAP */
>   	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
> +
> +	/* Non-coherrent MMAP only */
> +	if (!vb2_dc_is_coherent(buf) && buf->vaddr)
> +		invalidate_kernel_vmap_range(buf->vaddr, buf->size);
>   }
>   
>   /*********************************************/
>   /*        callbacks for MMAP buffers         */
>   /*********************************************/
>   
> +static void __vb2_dc_put(struct vb2_dc_buf *buf)
> +{
> +	if (vb2_dc_is_coherent(buf)) {
> +		dma_free_attrs(buf->dev, buf->size, buf->cookie,
> +			       buf->dma_addr, buf->attrs);
> +		return;
> +	}
> +
> +	if (buf->vaddr)
> +		dma_vunmap_noncontiguous(buf->dev, buf->vaddr);
> +	dma_free_noncontiguous(buf->dev, buf->size,
> +			       buf->dma_sgt, buf->dma_addr);
> +}
> +
>   static void vb2_dc_put(void *buf_priv)
>   {
>   	struct vb2_dc_buf *buf = buf_priv;
> @@ -139,17 +195,47 @@ static void vb2_dc_put(void *buf_priv)
>   		sg_free_table(buf->sgt_base);
>   		kfree(buf->sgt_base);
>   	}
> -	dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
> -		       buf->attrs);
> +	__vb2_dc_put(buf);
>   	put_device(buf->dev);
>   	kfree(buf);
>   }
>   
> +static int vb2_dc_alloc_coherent(struct vb2_dc_buf *buf)
> +{
> +	struct vb2_queue *q = buf->vb->vb2_queue;
> +
> +	buf->cookie = dma_alloc_attrs(buf->dev,
> +				      buf->size,
> +				      &buf->dma_addr,
> +				      GFP_KERNEL | q->gfp_flags,
> +				      buf->attrs);
> +	if (!buf->cookie)
> +		return -ENOMEM;
> +	if ((q->dma_attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
> +		buf->vaddr = buf->cookie;
> +	return 0;
> +}
> +
> +static int vb2_dc_alloc_non_coherent(struct vb2_dc_buf *buf)
> +{
> +	struct vb2_queue *q = buf->vb->vb2_queue;
> +
> +	buf->dma_sgt = dma_alloc_noncontiguous(buf->dev,
> +					       buf->size,
> +					       buf->dma_dir,
> +					       GFP_KERNEL | q->gfp_flags,
> +					       buf->attrs);
> +	if (!buf->dma_sgt)
> +		return -ENOMEM;
> +	return 0;
> +}
> +
>   static void *vb2_dc_alloc(struct vb2_buffer *vb,
>   			  struct device *dev,
>   			  unsigned long size)
>   {
>   	struct vb2_dc_buf *buf;
> +	int ret;
>   
>   	if (WARN_ON(!dev))
>   		return ERR_PTR(-EINVAL);
> @@ -159,27 +245,28 @@ static void *vb2_dc_alloc(struct vb2_buffer *vb,
>   		return ERR_PTR(-ENOMEM);
>   
>   	buf->attrs = vb->vb2_queue->dma_attrs;
> -	buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
> -				      GFP_KERNEL | vb->vb2_queue->gfp_flags,
> -				      buf->attrs);
> -	if (!buf->cookie) {
> -		dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
> -		kfree(buf);
> -		return ERR_PTR(-ENOMEM);
> -	}
> -
> -	if ((buf->attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
> -		buf->vaddr = buf->cookie;
> +	buf->dma_dir = vb->vb2_queue->dma_dir;
> +	buf->vb = vb;
> +	buf->non_coherent_mem = vb->vb2_queue->non_coherent_mem;
>   
> +	buf->size = size;
>   	/* Prevent the device from being released while the buffer is used */
>   	buf->dev = get_device(dev);
> -	buf->size = size;
> -	buf->dma_dir = vb->vb2_queue->dma_dir;
> +
> +	if (vb2_dc_is_coherent(buf))
> +		ret = vb2_dc_alloc_coherent(buf);
> +	else
> +		ret = vb2_dc_alloc_non_coherent(buf);
> +
> +	if (ret) {
> +		dev_err(dev, "dma alloc of size %ld failed\n", size);
> +		kfree(buf);
> +		return ERR_PTR(-ENOMEM);
> +	}
>   
>   	buf->handler.refcount = &buf->refcount;
>   	buf->handler.put = vb2_dc_put;
>   	buf->handler.arg = buf;
> -	buf->vb = vb;
>   
>   	refcount_set(&buf->refcount, 1);
>   
> @@ -196,9 +283,12 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
>   		return -EINVAL;
>   	}
>   
> -	ret = dma_mmap_attrs(buf->dev, vma, buf->cookie,
> -		buf->dma_addr, buf->size, buf->attrs);
> -
> +	if (vb2_dc_is_coherent(buf))
> +		ret = dma_mmap_attrs(buf->dev, vma, buf->cookie, buf->dma_addr,
> +				     buf->size, buf->attrs);
> +	else
> +		ret = dma_mmap_noncontiguous(buf->dev, vma, buf->size,
> +					     buf->dma_sgt);
>   	if (ret) {
>   		pr_err("Remapping memory failed, error: %d\n", ret);
>   		return ret;
> @@ -390,6 +480,9 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
>   	int ret;
>   	struct sg_table *sgt;
>   
> +	if (!vb2_dc_is_coherent(buf))
> +		return buf->dma_sgt;
> +
>   	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
>   	if (!sgt) {
>   		dev_err(buf->dev, "failed to alloc sg table\n");

-- 
Hsia-Jun(Randy) Li


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations
  2023-07-04 10:50   ` Hsia-Jun Li
@ 2023-07-05 10:45     ` Tomasz Figa
  2023-07-17  9:21       ` Hsia-Jun Li
  0 siblings, 1 reply; 21+ messages in thread
From: Tomasz Figa @ 2023-07-05 10:45 UTC (permalink / raw)
  To: Hsia-Jun Li
  Cc: Sergey Senozhatsky, Christoph Hellwig, Hans Verkuil,
	Mauro Carvalho Chehab, linux-media, linux-kernel

On Tue, Jul 4, 2023 at 7:51 PM Hsia-Jun Li <Randy.Li@synaptics.com> wrote:
>
> Hello Sergey
>
> I known this patch have been merged for a long time. I am thinking
> whether we need this flag in the new v4l2_ext_buffer.
>
> On 3/2/21 08:46, Sergey Senozhatsky wrote:
> > This adds support for new noncontiguous DMA API, which
> > requires allocators to have two execution branches: one
> > for the current API, and one for the new one.
>
> There is no way we could allocate a coherent buffer in the platform
> except the x86.
>

The flag is for requesting the kernel to try allocating *non*-coherent
buffers if possible. If the flag is not given, it's up to the kernel
to choose the right mapping type, which for vb2-dma-contig is
coherent. For compatibility reasons, we need the user space to pass
the flag to change the allocation behavior to avoid UAPI breakages.

I don't get what you mean that there is no way to allocate a coherent
buffer on a platform other than x86. Most of the platforms implement
dma_alloc_coherent() by remapping the allocated memory in
uncached/write-combine mode. x86 is an exception because it usually
has the DMAs coherent with the CPU caches and no special handling is
necessary, so dma_alloc_coherent() is just a simple pass-through
allocator.

> Should we make this flag a platform compiling time fixed value?

This is not a platform-specific flag. There are use cases which
perform better with coherent buffers (i.e. when there is no CPU access
happening to the buffers or just a linear memcpy) and some perform
better when the mapping is non-coherent (i.e. non-linear access from
the CPU, e.g. a software video encoder).

>
> And I didn't see Gstreamer nor FFmpeg uses it, it is obvious that they
> are running in many(almost all) embedded devices which are ARM.
>

It's likely that those generic frameworks don't have any specific
advanced use cases which would benefit from the performance gains of
this flag. FWIW, ChromeOS uses it in the camera and video stack
whenever complex CPU access to the buffers is needed.

Best regards,
Tomasz

> >
> > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> > [hch: untested conversion to the ne API]
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >   .../common/videobuf2/videobuf2-dma-contig.c   | 141 +++++++++++++++---
> >   1 file changed, 117 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > index 1e218bc440c6..d6a9f7b682f3 100644
> > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > @@ -17,6 +17,7 @@
> >   #include <linux/sched.h>
> >   #include <linux/slab.h>
> >   #include <linux/dma-mapping.h>
> > +#include <linux/highmem.h>
> >
> >   #include <media/videobuf2-v4l2.h>
> >   #include <media/videobuf2-dma-contig.h>
> > @@ -42,8 +43,14 @@ struct vb2_dc_buf {
> >       struct dma_buf_attachment       *db_attach;
> >
> >       struct vb2_buffer               *vb;
> > +     unsigned int                    non_coherent_mem:1;
> >   };
> >
> > +static bool vb2_dc_is_coherent(struct vb2_dc_buf *buf)
> > +{
> > +     return !buf->non_coherent_mem;
> > +}
> > +
> >   /*********************************************/
> >   /*        scatterlist table functions        */
> >   /*********************************************/
> > @@ -78,12 +85,21 @@ static void *vb2_dc_cookie(struct vb2_buffer *vb, void *buf_priv)
> >   static void *vb2_dc_vaddr(struct vb2_buffer *vb, void *buf_priv)
> >   {
> >       struct vb2_dc_buf *buf = buf_priv;
> > -     struct dma_buf_map map;
> > -     int ret;
> >
> > -     if (!buf->vaddr && buf->db_attach) {
> > -             ret = dma_buf_vmap(buf->db_attach->dmabuf, &map);
> > -             buf->vaddr = ret ? NULL : map.vaddr;
> > +     if (buf->vaddr)
> > +             return buf->vaddr;
> > +
> > +     if (buf->db_attach) {
> > +             struct dma_buf_map map;
> > +
> > +             if (!dma_buf_vmap(buf->db_attach->dmabuf, &map))
> > +                     buf->vaddr = map.vaddr;
> > +     }
> > +
> > +     if (!vb2_dc_is_coherent(buf)) {
> > +             buf->vaddr = dma_vmap_noncontiguous(buf->dev,
> > +                                                 buf->size,
> > +                                                 buf->dma_sgt);
> >       }
> >
> >       return buf->vaddr;
> > @@ -101,13 +117,26 @@ static void vb2_dc_prepare(void *buf_priv)
> >       struct vb2_dc_buf *buf = buf_priv;
> >       struct sg_table *sgt = buf->dma_sgt;
> >
> > +     /* This takes care of DMABUF and user-enforced cache sync hint */
> >       if (buf->vb->skip_cache_sync_on_prepare)
> >               return;
> >
> > +     /*
> > +      * Coherent MMAP buffers do not need to be synced, unlike coherent
> > +      * USERPTR and non-coherent MMAP buffers.
> > +      */
> > +     if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
> > +             return;
> > +
> >       if (!sgt)
> >               return;
> >
> > +     /* For both USERPTR and non-coherent MMAP */
> >       dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
> > +
> > +     /* Non-coherrent MMAP only */
> > +     if (!vb2_dc_is_coherent(buf) && buf->vaddr)
> > +             flush_kernel_vmap_range(buf->vaddr, buf->size);
> >   }
> >
> >   static void vb2_dc_finish(void *buf_priv)
> > @@ -115,19 +144,46 @@ static void vb2_dc_finish(void *buf_priv)
> >       struct vb2_dc_buf *buf = buf_priv;
> >       struct sg_table *sgt = buf->dma_sgt;
> >
> > +     /* This takes care of DMABUF and user-enforced cache sync hint */
> >       if (buf->vb->skip_cache_sync_on_finish)
> >               return;
> >
> > +     /*
> > +      * Coherent MMAP buffers do not need to be synced, unlike coherent
> > +      * USERPTR and non-coherent MMAP buffers.
> > +      */
> > +     if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
> > +             return;
> > +
> >       if (!sgt)
> >               return;
> >
> > +     /* For both USERPTR and non-coherent MMAP */
> >       dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
> > +
> > +     /* Non-coherrent MMAP only */
> > +     if (!vb2_dc_is_coherent(buf) && buf->vaddr)
> > +             invalidate_kernel_vmap_range(buf->vaddr, buf->size);
> >   }
> >
> >   /*********************************************/
> >   /*        callbacks for MMAP buffers         */
> >   /*********************************************/
> >
> > +static void __vb2_dc_put(struct vb2_dc_buf *buf)
> > +{
> > +     if (vb2_dc_is_coherent(buf)) {
> > +             dma_free_attrs(buf->dev, buf->size, buf->cookie,
> > +                            buf->dma_addr, buf->attrs);
> > +             return;
> > +     }
> > +
> > +     if (buf->vaddr)
> > +             dma_vunmap_noncontiguous(buf->dev, buf->vaddr);
> > +     dma_free_noncontiguous(buf->dev, buf->size,
> > +                            buf->dma_sgt, buf->dma_addr);
> > +}
> > +
> >   static void vb2_dc_put(void *buf_priv)
> >   {
> >       struct vb2_dc_buf *buf = buf_priv;
> > @@ -139,17 +195,47 @@ static void vb2_dc_put(void *buf_priv)
> >               sg_free_table(buf->sgt_base);
> >               kfree(buf->sgt_base);
> >       }
> > -     dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
> > -                    buf->attrs);
> > +     __vb2_dc_put(buf);
> >       put_device(buf->dev);
> >       kfree(buf);
> >   }
> >
> > +static int vb2_dc_alloc_coherent(struct vb2_dc_buf *buf)
> > +{
> > +     struct vb2_queue *q = buf->vb->vb2_queue;
> > +
> > +     buf->cookie = dma_alloc_attrs(buf->dev,
> > +                                   buf->size,
> > +                                   &buf->dma_addr,
> > +                                   GFP_KERNEL | q->gfp_flags,
> > +                                   buf->attrs);
> > +     if (!buf->cookie)
> > +             return -ENOMEM;
> > +     if ((q->dma_attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
> > +             buf->vaddr = buf->cookie;
> > +     return 0;
> > +}
> > +
> > +static int vb2_dc_alloc_non_coherent(struct vb2_dc_buf *buf)
> > +{
> > +     struct vb2_queue *q = buf->vb->vb2_queue;
> > +
> > +     buf->dma_sgt = dma_alloc_noncontiguous(buf->dev,
> > +                                            buf->size,
> > +                                            buf->dma_dir,
> > +                                            GFP_KERNEL | q->gfp_flags,
> > +                                            buf->attrs);
> > +     if (!buf->dma_sgt)
> > +             return -ENOMEM;
> > +     return 0;
> > +}
> > +
> >   static void *vb2_dc_alloc(struct vb2_buffer *vb,
> >                         struct device *dev,
> >                         unsigned long size)
> >   {
> >       struct vb2_dc_buf *buf;
> > +     int ret;
> >
> >       if (WARN_ON(!dev))
> >               return ERR_PTR(-EINVAL);
> > @@ -159,27 +245,28 @@ static void *vb2_dc_alloc(struct vb2_buffer *vb,
> >               return ERR_PTR(-ENOMEM);
> >
> >       buf->attrs = vb->vb2_queue->dma_attrs;
> > -     buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
> > -                                   GFP_KERNEL | vb->vb2_queue->gfp_flags,
> > -                                   buf->attrs);
> > -     if (!buf->cookie) {
> > -             dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
> > -             kfree(buf);
> > -             return ERR_PTR(-ENOMEM);
> > -     }
> > -
> > -     if ((buf->attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
> > -             buf->vaddr = buf->cookie;
> > +     buf->dma_dir = vb->vb2_queue->dma_dir;
> > +     buf->vb = vb;
> > +     buf->non_coherent_mem = vb->vb2_queue->non_coherent_mem;
> >
> > +     buf->size = size;
> >       /* Prevent the device from being released while the buffer is used */
> >       buf->dev = get_device(dev);
> > -     buf->size = size;
> > -     buf->dma_dir = vb->vb2_queue->dma_dir;
> > +
> > +     if (vb2_dc_is_coherent(buf))
> > +             ret = vb2_dc_alloc_coherent(buf);
> > +     else
> > +             ret = vb2_dc_alloc_non_coherent(buf);
> > +
> > +     if (ret) {
> > +             dev_err(dev, "dma alloc of size %ld failed\n", size);
> > +             kfree(buf);
> > +             return ERR_PTR(-ENOMEM);
> > +     }
> >
> >       buf->handler.refcount = &buf->refcount;
> >       buf->handler.put = vb2_dc_put;
> >       buf->handler.arg = buf;
> > -     buf->vb = vb;
> >
> >       refcount_set(&buf->refcount, 1);
> >
> > @@ -196,9 +283,12 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
> >               return -EINVAL;
> >       }
> >
> > -     ret = dma_mmap_attrs(buf->dev, vma, buf->cookie,
> > -             buf->dma_addr, buf->size, buf->attrs);
> > -
> > +     if (vb2_dc_is_coherent(buf))
> > +             ret = dma_mmap_attrs(buf->dev, vma, buf->cookie, buf->dma_addr,
> > +                                  buf->size, buf->attrs);
> > +     else
> > +             ret = dma_mmap_noncontiguous(buf->dev, vma, buf->size,
> > +                                          buf->dma_sgt);
> >       if (ret) {
> >               pr_err("Remapping memory failed, error: %d\n", ret);
> >               return ret;
> > @@ -390,6 +480,9 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
> >       int ret;
> >       struct sg_table *sgt;
> >
> > +     if (!vb2_dc_is_coherent(buf))
> > +             return buf->dma_sgt;
> > +
> >       sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
> >       if (!sgt) {
> >               dev_err(buf->dev, "failed to alloc sg table\n");
>
> --
> Hsia-Jun(Randy) Li
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations
  2023-07-05 10:45     ` Tomasz Figa
@ 2023-07-17  9:21       ` Hsia-Jun Li
  2023-07-28  8:19         ` Tomasz Figa
  0 siblings, 1 reply; 21+ messages in thread
From: Hsia-Jun Li @ 2023-07-17  9:21 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Sergey Senozhatsky, Christoph Hellwig, Hans Verkuil,
	Mauro Carvalho Chehab, linux-media, linux-kernel


On 7/5/23 18:45, Tomasz Figa wrote:
> CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> On Tue, Jul 4, 2023 at 7:51 PM Hsia-Jun Li <Randy.Li@synaptics.com> wrote:
>> Hello Sergey
>>
>> I known this patch have been merged for a long time. I am thinking
>> whether we need this flag in the new v4l2_ext_buffer.
>>
>> On 3/2/21 08:46, Sergey Senozhatsky wrote:
>>> This adds support for new noncontiguous DMA API, which
>>> requires allocators to have two execution branches: one
>>> for the current API, and one for the new one.
>> There is no way we could allocate a coherent buffer in the platform
>> except the x86.
>>
> The flag is for requesting the kernel to try allocating *non*-coherent
> buffers if possible. If the flag is not given, it's up to the kernel
> to choose the right mapping type, which for vb2-dma-contig is
> coherent. For compatibility reasons, we need the user space to pass
> the flag to change the allocation behavior to avoid UAPI breakages.
>
> I don't get what you mean that there is no way to allocate a coherent
> buffer on a platform other than x86.

I wonder the case for the x86 platform, does that means we don't need to 
do dma_sync_*() neither DMA_TO_DEVICE  nor DMA_FROM_DEVICE.

When a remote device likes a PCIe peer write to the system memory, the 
CPU's memory controller could be aware of that and invalidate the CPU's 
cache?

>   Most of the platforms implement
> dma_alloc_coherent() by remapping the allocated memory in
> uncached/write-combine mode. x86 is an exception because it usually
> has the DMAs coherent with the CPU caches and no special handling is
> necessary, so dma_alloc_coherent() is just a simple pass-through
> allocator.
>
>> Should we make this flag a platform compiling time fixed value?
> This is not a platform-specific flag. There are use cases which
> perform better with coherent buffers (i.e. when there is no CPU access
> happening to the buffers or just a linear memcpy)

I wonder how to implement the coherent memory in the platform likes 
ARMv7 or later. Disable the CPU cache for those pages?

> and some perform
> better when the mapping is non-coherent (i.e. non-linear access from
> the CPU, e.g. a software video encoder).

One problem from migration from ION to DMA-heap is that we don't have a 
flag for allocating non CPU cache buffer(coherent),

I was thinking, marking all the buffer in ARM to be non coherent, it 
sounds a bad idea now.

Maybe I should send a patch to userspace utils, which let them allocate 
the non coherent buffer first if no mmap() would be invoked.

>
>> And I didn't see Gstreamer nor FFmpeg uses it, it is obvious that they
>> are running in many(almost all) embedded devices which are ARM.
>>
> It's likely that those generic frameworks don't have any specific
> advanced use cases which would benefit from the performance gains of
> this flag. FWIW, ChromeOS uses it in the camera and video stack
> whenever complex CPU access to the buffers is needed.
>
> Best regards,
> Tomasz
>
>>> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
>>> [hch: untested conversion to the ne API]
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>> ---
>>>    .../common/videobuf2/videobuf2-dma-contig.c   | 141 +++++++++++++++---
>>>    1 file changed, 117 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>>> index 1e218bc440c6..d6a9f7b682f3 100644
>>> --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>>> +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
>>> @@ -17,6 +17,7 @@
>>>    #include <linux/sched.h>
>>>    #include <linux/slab.h>
>>>    #include <linux/dma-mapping.h>
>>> +#include <linux/highmem.h>
>>>
>>>    #include <media/videobuf2-v4l2.h>
>>>    #include <media/videobuf2-dma-contig.h>
>>> @@ -42,8 +43,14 @@ struct vb2_dc_buf {
>>>        struct dma_buf_attachment       *db_attach;
>>>
>>>        struct vb2_buffer               *vb;
>>> +     unsigned int                    non_coherent_mem:1;
>>>    };
>>>
>>> +static bool vb2_dc_is_coherent(struct vb2_dc_buf *buf)
>>> +{
>>> +     return !buf->non_coherent_mem;
>>> +}
>>> +
>>>    /*********************************************/
>>>    /*        scatterlist table functions        */
>>>    /*********************************************/
>>> @@ -78,12 +85,21 @@ static void *vb2_dc_cookie(struct vb2_buffer *vb, void *buf_priv)
>>>    static void *vb2_dc_vaddr(struct vb2_buffer *vb, void *buf_priv)
>>>    {
>>>        struct vb2_dc_buf *buf = buf_priv;
>>> -     struct dma_buf_map map;
>>> -     int ret;
>>>
>>> -     if (!buf->vaddr && buf->db_attach) {
>>> -             ret = dma_buf_vmap(buf->db_attach->dmabuf, &map);
>>> -             buf->vaddr = ret ? NULL : map.vaddr;
>>> +     if (buf->vaddr)
>>> +             return buf->vaddr;
>>> +
>>> +     if (buf->db_attach) {
>>> +             struct dma_buf_map map;
>>> +
>>> +             if (!dma_buf_vmap(buf->db_attach->dmabuf, &map))
>>> +                     buf->vaddr = map.vaddr;
>>> +     }
>>> +
>>> +     if (!vb2_dc_is_coherent(buf)) {
>>> +             buf->vaddr = dma_vmap_noncontiguous(buf->dev,
>>> +                                                 buf->size,
>>> +                                                 buf->dma_sgt);
>>>        }
>>>
>>>        return buf->vaddr;
>>> @@ -101,13 +117,26 @@ static void vb2_dc_prepare(void *buf_priv)
>>>        struct vb2_dc_buf *buf = buf_priv;
>>>        struct sg_table *sgt = buf->dma_sgt;
>>>
>>> +     /* This takes care of DMABUF and user-enforced cache sync hint */
>>>        if (buf->vb->skip_cache_sync_on_prepare)
>>>                return;
>>>
>>> +     /*
>>> +      * Coherent MMAP buffers do not need to be synced, unlike coherent
>>> +      * USERPTR and non-coherent MMAP buffers.
>>> +      */
>>> +     if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
>>> +             return;
>>> +
>>>        if (!sgt)
>>>                return;
>>>
>>> +     /* For both USERPTR and non-coherent MMAP */
>>>        dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
>>> +
>>> +     /* Non-coherrent MMAP only */
>>> +     if (!vb2_dc_is_coherent(buf) && buf->vaddr)
>>> +             flush_kernel_vmap_range(buf->vaddr, buf->size);
>>>    }
>>>
>>>    static void vb2_dc_finish(void *buf_priv)
>>> @@ -115,19 +144,46 @@ static void vb2_dc_finish(void *buf_priv)
>>>        struct vb2_dc_buf *buf = buf_priv;
>>>        struct sg_table *sgt = buf->dma_sgt;
>>>
>>> +     /* This takes care of DMABUF and user-enforced cache sync hint */
>>>        if (buf->vb->skip_cache_sync_on_finish)
>>>                return;
>>>
>>> +     /*
>>> +      * Coherent MMAP buffers do not need to be synced, unlike coherent
>>> +      * USERPTR and non-coherent MMAP buffers.
>>> +      */
>>> +     if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
>>> +             return;
>>> +
>>>        if (!sgt)
>>>                return;
>>>
>>> +     /* For both USERPTR and non-coherent MMAP */
>>>        dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
>>> +
>>> +     /* Non-coherrent MMAP only */
>>> +     if (!vb2_dc_is_coherent(buf) && buf->vaddr)
>>> +             invalidate_kernel_vmap_range(buf->vaddr, buf->size);
>>>    }
>>>
>>>    /*********************************************/
>>>    /*        callbacks for MMAP buffers         */
>>>    /*********************************************/
>>>
>>> +static void __vb2_dc_put(struct vb2_dc_buf *buf)
>>> +{
>>> +     if (vb2_dc_is_coherent(buf)) {
>>> +             dma_free_attrs(buf->dev, buf->size, buf->cookie,
>>> +                            buf->dma_addr, buf->attrs);
>>> +             return;
>>> +     }
>>> +
>>> +     if (buf->vaddr)
>>> +             dma_vunmap_noncontiguous(buf->dev, buf->vaddr);
>>> +     dma_free_noncontiguous(buf->dev, buf->size,
>>> +                            buf->dma_sgt, buf->dma_addr);
>>> +}
>>> +
>>>    static void vb2_dc_put(void *buf_priv)
>>>    {
>>>        struct vb2_dc_buf *buf = buf_priv;
>>> @@ -139,17 +195,47 @@ static void vb2_dc_put(void *buf_priv)
>>>                sg_free_table(buf->sgt_base);
>>>                kfree(buf->sgt_base);
>>>        }
>>> -     dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
>>> -                    buf->attrs);
>>> +     __vb2_dc_put(buf);
>>>        put_device(buf->dev);
>>>        kfree(buf);
>>>    }
>>>
>>> +static int vb2_dc_alloc_coherent(struct vb2_dc_buf *buf)
>>> +{
>>> +     struct vb2_queue *q = buf->vb->vb2_queue;
>>> +
>>> +     buf->cookie = dma_alloc_attrs(buf->dev,
>>> +                                   buf->size,
>>> +                                   &buf->dma_addr,
>>> +                                   GFP_KERNEL | q->gfp_flags,
>>> +                                   buf->attrs);
>>> +     if (!buf->cookie)
>>> +             return -ENOMEM;
>>> +     if ((q->dma_attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
>>> +             buf->vaddr = buf->cookie;
>>> +     return 0;
>>> +}
>>> +
>>> +static int vb2_dc_alloc_non_coherent(struct vb2_dc_buf *buf)
>>> +{
>>> +     struct vb2_queue *q = buf->vb->vb2_queue;
>>> +
>>> +     buf->dma_sgt = dma_alloc_noncontiguous(buf->dev,
>>> +                                            buf->size,
>>> +                                            buf->dma_dir,
>>> +                                            GFP_KERNEL | q->gfp_flags,
>>> +                                            buf->attrs);
>>> +     if (!buf->dma_sgt)
>>> +             return -ENOMEM;
>>> +     return 0;
>>> +}
>>> +
>>>    static void *vb2_dc_alloc(struct vb2_buffer *vb,
>>>                          struct device *dev,
>>>                          unsigned long size)
>>>    {
>>>        struct vb2_dc_buf *buf;
>>> +     int ret;
>>>
>>>        if (WARN_ON(!dev))
>>>                return ERR_PTR(-EINVAL);
>>> @@ -159,27 +245,28 @@ static void *vb2_dc_alloc(struct vb2_buffer *vb,
>>>                return ERR_PTR(-ENOMEM);
>>>
>>>        buf->attrs = vb->vb2_queue->dma_attrs;
>>> -     buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
>>> -                                   GFP_KERNEL | vb->vb2_queue->gfp_flags,
>>> -                                   buf->attrs);
>>> -     if (!buf->cookie) {
>>> -             dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
>>> -             kfree(buf);
>>> -             return ERR_PTR(-ENOMEM);
>>> -     }
>>> -
>>> -     if ((buf->attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
>>> -             buf->vaddr = buf->cookie;
>>> +     buf->dma_dir = vb->vb2_queue->dma_dir;
>>> +     buf->vb = vb;
>>> +     buf->non_coherent_mem = vb->vb2_queue->non_coherent_mem;
>>>
>>> +     buf->size = size;
>>>        /* Prevent the device from being released while the buffer is used */
>>>        buf->dev = get_device(dev);
>>> -     buf->size = size;
>>> -     buf->dma_dir = vb->vb2_queue->dma_dir;
>>> +
>>> +     if (vb2_dc_is_coherent(buf))
>>> +             ret = vb2_dc_alloc_coherent(buf);
>>> +     else
>>> +             ret = vb2_dc_alloc_non_coherent(buf);
>>> +
>>> +     if (ret) {
>>> +             dev_err(dev, "dma alloc of size %ld failed\n", size);
>>> +             kfree(buf);
>>> +             return ERR_PTR(-ENOMEM);
>>> +     }
>>>
>>>        buf->handler.refcount = &buf->refcount;
>>>        buf->handler.put = vb2_dc_put;
>>>        buf->handler.arg = buf;
>>> -     buf->vb = vb;
>>>
>>>        refcount_set(&buf->refcount, 1);
>>>
>>> @@ -196,9 +283,12 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
>>>                return -EINVAL;
>>>        }
>>>
>>> -     ret = dma_mmap_attrs(buf->dev, vma, buf->cookie,
>>> -             buf->dma_addr, buf->size, buf->attrs);
>>> -
>>> +     if (vb2_dc_is_coherent(buf))
>>> +             ret = dma_mmap_attrs(buf->dev, vma, buf->cookie, buf->dma_addr,
>>> +                                  buf->size, buf->attrs);
>>> +     else
>>> +             ret = dma_mmap_noncontiguous(buf->dev, vma, buf->size,
>>> +                                          buf->dma_sgt);
>>>        if (ret) {
>>>                pr_err("Remapping memory failed, error: %d\n", ret);
>>>                return ret;
>>> @@ -390,6 +480,9 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
>>>        int ret;
>>>        struct sg_table *sgt;
>>>
>>> +     if (!vb2_dc_is_coherent(buf))
>>> +             return buf->dma_sgt;
>>> +
>>>        sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
>>>        if (!sgt) {
>>>                dev_err(buf->dev, "failed to alloc sg table\n");
>> --
>> Hsia-Jun(Randy) Li
>>
-- 
Hsia-Jun(Randy) Li


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations
  2023-07-17  9:21       ` Hsia-Jun Li
@ 2023-07-28  8:19         ` Tomasz Figa
  0 siblings, 0 replies; 21+ messages in thread
From: Tomasz Figa @ 2023-07-28  8:19 UTC (permalink / raw)
  To: Hsia-Jun Li
  Cc: Sergey Senozhatsky, Christoph Hellwig, Hans Verkuil,
	Mauro Carvalho Chehab, linux-media, linux-kernel

On Mon, Jul 17, 2023 at 05:21:23PM +0800, Hsia-Jun Li wrote:
> 
> On 7/5/23 18:45, Tomasz Figa wrote:
> > CAUTION: Email originated externally, do not click links or open attachments unless you recognize the sender and know the content is safe.
> > 
> > 
> > On Tue, Jul 4, 2023 at 7:51 PM Hsia-Jun Li <Randy.Li@synaptics.com> wrote:
> > > Hello Sergey
> > > 
> > > I known this patch have been merged for a long time. I am thinking
> > > whether we need this flag in the new v4l2_ext_buffer.
> > > 
> > > On 3/2/21 08:46, Sergey Senozhatsky wrote:
> > > > This adds support for new noncontiguous DMA API, which
> > > > requires allocators to have two execution branches: one
> > > > for the current API, and one for the new one.
> > > There is no way we could allocate a coherent buffer in the platform
> > > except the x86.
> > > 
> > The flag is for requesting the kernel to try allocating *non*-coherent
> > buffers if possible. If the flag is not given, it's up to the kernel
> > to choose the right mapping type, which for vb2-dma-contig is
> > coherent. For compatibility reasons, we need the user space to pass
> > the flag to change the allocation behavior to avoid UAPI breakages.
> > 
> > I don't get what you mean that there is no way to allocate a coherent
> > buffer on a platform other than x86.
> 
> I wonder the case for the x86 platform, does that means we don't need to do
> dma_sync_*() neither DMA_TO_DEVICE  nor DMA_FROM_DEVICE.

It's only guaranteed if you allocate the memory using
dma_alloc_coherent() and friends, regardless of the architecture.

For memory that is not allocated from the coherent memory allocator,
there is never a guarantee that it would be coherent, so the drivers
need to always call dma_sync_*(). It's just that on x86, for 99% of the
cases, the corresponding implementation of the sync ops would be no-op.

> 
> When a remote device likes a PCIe peer write to the system memory, the CPU's
> memory controller could be aware of that and invalidate the CPU's cache?
> 

The details are specific to the CPU implementation, but yes, the CPU
cache would reflect the data as it would be written by the DMA peer.

> >   Most of the platforms implement
> > dma_alloc_coherent() by remapping the allocated memory in
> > uncached/write-combine mode. x86 is an exception because it usually
> > has the DMAs coherent with the CPU caches and no special handling is
> > necessary, so dma_alloc_coherent() is just a simple pass-through
> > allocator.
> > 
> > > Should we make this flag a platform compiling time fixed value?
> > This is not a platform-specific flag. There are use cases which
> > perform better with coherent buffers (i.e. when there is no CPU access
> > happening to the buffers or just a linear memcpy)
> 
> I wonder how to implement the coherent memory in the platform likes ARMv7 or
> later. Disable the CPU cache for those pages?
> 

Yes, that's exactly how dma_alloc_coherent() does it. It re-maps the
memory as write-combine.

> > and some perform
> > better when the mapping is non-coherent (i.e. non-linear access from
> > the CPU, e.g. a software video encoder).
> 
> One problem from migration from ION to DMA-heap is that we don't have a flag
> for allocating non CPU cache buffer(coherent),
> 
> I was thinking, marking all the buffer in ARM to be non coherent, it sounds
> a bad idea now.
> 
> Maybe I should send a patch to userspace utils, which let them allocate the
> non coherent buffer first if no mmap() would be invoked.
> 

I have to admit that I'm not really familiar with the DMA-buf heap
UAPIs, but I think we could add some allocation flag that would be a
hint from the userspace that the buffer doesn't have to be coherent.
(Which is exactly what we did for V4L2 MMAP buffers.)

Best regards,
Tomasz

> > 
> > > And I didn't see Gstreamer nor FFmpeg uses it, it is obvious that they
> > > are running in many(almost all) embedded devices which are ARM.
> > > 
> > It's likely that those generic frameworks don't have any specific
> > advanced use cases which would benefit from the performance gains of
> > this flag. FWIW, ChromeOS uses it in the camera and video stack
> > whenever complex CPU access to the buffers is needed.
> > 
> > Best regards,
> > Tomasz
> > 
> > > > Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> > > > [hch: untested conversion to the ne API]
> > > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > > ---
> > > >    .../common/videobuf2/videobuf2-dma-contig.c   | 141 +++++++++++++++---
> > > >    1 file changed, 117 insertions(+), 24 deletions(-)
> > > > 
> > > > diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > > > index 1e218bc440c6..d6a9f7b682f3 100644
> > > > --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > > > +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
> > > > @@ -17,6 +17,7 @@
> > > >    #include <linux/sched.h>
> > > >    #include <linux/slab.h>
> > > >    #include <linux/dma-mapping.h>
> > > > +#include <linux/highmem.h>
> > > > 
> > > >    #include <media/videobuf2-v4l2.h>
> > > >    #include <media/videobuf2-dma-contig.h>
> > > > @@ -42,8 +43,14 @@ struct vb2_dc_buf {
> > > >        struct dma_buf_attachment       *db_attach;
> > > > 
> > > >        struct vb2_buffer               *vb;
> > > > +     unsigned int                    non_coherent_mem:1;
> > > >    };
> > > > 
> > > > +static bool vb2_dc_is_coherent(struct vb2_dc_buf *buf)
> > > > +{
> > > > +     return !buf->non_coherent_mem;
> > > > +}
> > > > +
> > > >    /*********************************************/
> > > >    /*        scatterlist table functions        */
> > > >    /*********************************************/
> > > > @@ -78,12 +85,21 @@ static void *vb2_dc_cookie(struct vb2_buffer *vb, void *buf_priv)
> > > >    static void *vb2_dc_vaddr(struct vb2_buffer *vb, void *buf_priv)
> > > >    {
> > > >        struct vb2_dc_buf *buf = buf_priv;
> > > > -     struct dma_buf_map map;
> > > > -     int ret;
> > > > 
> > > > -     if (!buf->vaddr && buf->db_attach) {
> > > > -             ret = dma_buf_vmap(buf->db_attach->dmabuf, &map);
> > > > -             buf->vaddr = ret ? NULL : map.vaddr;
> > > > +     if (buf->vaddr)
> > > > +             return buf->vaddr;
> > > > +
> > > > +     if (buf->db_attach) {
> > > > +             struct dma_buf_map map;
> > > > +
> > > > +             if (!dma_buf_vmap(buf->db_attach->dmabuf, &map))
> > > > +                     buf->vaddr = map.vaddr;
> > > > +     }
> > > > +
> > > > +     if (!vb2_dc_is_coherent(buf)) {
> > > > +             buf->vaddr = dma_vmap_noncontiguous(buf->dev,
> > > > +                                                 buf->size,
> > > > +                                                 buf->dma_sgt);
> > > >        }
> > > > 
> > > >        return buf->vaddr;
> > > > @@ -101,13 +117,26 @@ static void vb2_dc_prepare(void *buf_priv)
> > > >        struct vb2_dc_buf *buf = buf_priv;
> > > >        struct sg_table *sgt = buf->dma_sgt;
> > > > 
> > > > +     /* This takes care of DMABUF and user-enforced cache sync hint */
> > > >        if (buf->vb->skip_cache_sync_on_prepare)
> > > >                return;
> > > > 
> > > > +     /*
> > > > +      * Coherent MMAP buffers do not need to be synced, unlike coherent
> > > > +      * USERPTR and non-coherent MMAP buffers.
> > > > +      */
> > > > +     if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
> > > > +             return;
> > > > +
> > > >        if (!sgt)
> > > >                return;
> > > > 
> > > > +     /* For both USERPTR and non-coherent MMAP */
> > > >        dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
> > > > +
> > > > +     /* Non-coherrent MMAP only */
> > > > +     if (!vb2_dc_is_coherent(buf) && buf->vaddr)
> > > > +             flush_kernel_vmap_range(buf->vaddr, buf->size);
> > > >    }
> > > > 
> > > >    static void vb2_dc_finish(void *buf_priv)
> > > > @@ -115,19 +144,46 @@ static void vb2_dc_finish(void *buf_priv)
> > > >        struct vb2_dc_buf *buf = buf_priv;
> > > >        struct sg_table *sgt = buf->dma_sgt;
> > > > 
> > > > +     /* This takes care of DMABUF and user-enforced cache sync hint */
> > > >        if (buf->vb->skip_cache_sync_on_finish)
> > > >                return;
> > > > 
> > > > +     /*
> > > > +      * Coherent MMAP buffers do not need to be synced, unlike coherent
> > > > +      * USERPTR and non-coherent MMAP buffers.
> > > > +      */
> > > > +     if (buf->vb->memory == V4L2_MEMORY_MMAP && vb2_dc_is_coherent(buf))
> > > > +             return;
> > > > +
> > > >        if (!sgt)
> > > >                return;
> > > > 
> > > > +     /* For both USERPTR and non-coherent MMAP */
> > > >        dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
> > > > +
> > > > +     /* Non-coherrent MMAP only */
> > > > +     if (!vb2_dc_is_coherent(buf) && buf->vaddr)
> > > > +             invalidate_kernel_vmap_range(buf->vaddr, buf->size);
> > > >    }
> > > > 
> > > >    /*********************************************/
> > > >    /*        callbacks for MMAP buffers         */
> > > >    /*********************************************/
> > > > 
> > > > +static void __vb2_dc_put(struct vb2_dc_buf *buf)
> > > > +{
> > > > +     if (vb2_dc_is_coherent(buf)) {
> > > > +             dma_free_attrs(buf->dev, buf->size, buf->cookie,
> > > > +                            buf->dma_addr, buf->attrs);
> > > > +             return;
> > > > +     }
> > > > +
> > > > +     if (buf->vaddr)
> > > > +             dma_vunmap_noncontiguous(buf->dev, buf->vaddr);
> > > > +     dma_free_noncontiguous(buf->dev, buf->size,
> > > > +                            buf->dma_sgt, buf->dma_addr);
> > > > +}
> > > > +
> > > >    static void vb2_dc_put(void *buf_priv)
> > > >    {
> > > >        struct vb2_dc_buf *buf = buf_priv;
> > > > @@ -139,17 +195,47 @@ static void vb2_dc_put(void *buf_priv)
> > > >                sg_free_table(buf->sgt_base);
> > > >                kfree(buf->sgt_base);
> > > >        }
> > > > -     dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
> > > > -                    buf->attrs);
> > > > +     __vb2_dc_put(buf);
> > > >        put_device(buf->dev);
> > > >        kfree(buf);
> > > >    }
> > > > 
> > > > +static int vb2_dc_alloc_coherent(struct vb2_dc_buf *buf)
> > > > +{
> > > > +     struct vb2_queue *q = buf->vb->vb2_queue;
> > > > +
> > > > +     buf->cookie = dma_alloc_attrs(buf->dev,
> > > > +                                   buf->size,
> > > > +                                   &buf->dma_addr,
> > > > +                                   GFP_KERNEL | q->gfp_flags,
> > > > +                                   buf->attrs);
> > > > +     if (!buf->cookie)
> > > > +             return -ENOMEM;
> > > > +     if ((q->dma_attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
> > > > +             buf->vaddr = buf->cookie;
> > > > +     return 0;
> > > > +}
> > > > +
> > > > +static int vb2_dc_alloc_non_coherent(struct vb2_dc_buf *buf)
> > > > +{
> > > > +     struct vb2_queue *q = buf->vb->vb2_queue;
> > > > +
> > > > +     buf->dma_sgt = dma_alloc_noncontiguous(buf->dev,
> > > > +                                            buf->size,
> > > > +                                            buf->dma_dir,
> > > > +                                            GFP_KERNEL | q->gfp_flags,
> > > > +                                            buf->attrs);
> > > > +     if (!buf->dma_sgt)
> > > > +             return -ENOMEM;
> > > > +     return 0;
> > > > +}
> > > > +
> > > >    static void *vb2_dc_alloc(struct vb2_buffer *vb,
> > > >                          struct device *dev,
> > > >                          unsigned long size)
> > > >    {
> > > >        struct vb2_dc_buf *buf;
> > > > +     int ret;
> > > > 
> > > >        if (WARN_ON(!dev))
> > > >                return ERR_PTR(-EINVAL);
> > > > @@ -159,27 +245,28 @@ static void *vb2_dc_alloc(struct vb2_buffer *vb,
> > > >                return ERR_PTR(-ENOMEM);
> > > > 
> > > >        buf->attrs = vb->vb2_queue->dma_attrs;
> > > > -     buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
> > > > -                                   GFP_KERNEL | vb->vb2_queue->gfp_flags,
> > > > -                                   buf->attrs);
> > > > -     if (!buf->cookie) {
> > > > -             dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
> > > > -             kfree(buf);
> > > > -             return ERR_PTR(-ENOMEM);
> > > > -     }
> > > > -
> > > > -     if ((buf->attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
> > > > -             buf->vaddr = buf->cookie;
> > > > +     buf->dma_dir = vb->vb2_queue->dma_dir;
> > > > +     buf->vb = vb;
> > > > +     buf->non_coherent_mem = vb->vb2_queue->non_coherent_mem;
> > > > 
> > > > +     buf->size = size;
> > > >        /* Prevent the device from being released while the buffer is used */
> > > >        buf->dev = get_device(dev);
> > > > -     buf->size = size;
> > > > -     buf->dma_dir = vb->vb2_queue->dma_dir;
> > > > +
> > > > +     if (vb2_dc_is_coherent(buf))
> > > > +             ret = vb2_dc_alloc_coherent(buf);
> > > > +     else
> > > > +             ret = vb2_dc_alloc_non_coherent(buf);
> > > > +
> > > > +     if (ret) {
> > > > +             dev_err(dev, "dma alloc of size %ld failed\n", size);
> > > > +             kfree(buf);
> > > > +             return ERR_PTR(-ENOMEM);
> > > > +     }
> > > > 
> > > >        buf->handler.refcount = &buf->refcount;
> > > >        buf->handler.put = vb2_dc_put;
> > > >        buf->handler.arg = buf;
> > > > -     buf->vb = vb;
> > > > 
> > > >        refcount_set(&buf->refcount, 1);
> > > > 
> > > > @@ -196,9 +283,12 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
> > > >                return -EINVAL;
> > > >        }
> > > > 
> > > > -     ret = dma_mmap_attrs(buf->dev, vma, buf->cookie,
> > > > -             buf->dma_addr, buf->size, buf->attrs);
> > > > -
> > > > +     if (vb2_dc_is_coherent(buf))
> > > > +             ret = dma_mmap_attrs(buf->dev, vma, buf->cookie, buf->dma_addr,
> > > > +                                  buf->size, buf->attrs);
> > > > +     else
> > > > +             ret = dma_mmap_noncontiguous(buf->dev, vma, buf->size,
> > > > +                                          buf->dma_sgt);
> > > >        if (ret) {
> > > >                pr_err("Remapping memory failed, error: %d\n", ret);
> > > >                return ret;
> > > > @@ -390,6 +480,9 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
> > > >        int ret;
> > > >        struct sg_table *sgt;
> > > > 
> > > > +     if (!vb2_dc_is_coherent(buf))
> > > > +             return buf->dma_sgt;
> > > > +
> > > >        sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
> > > >        if (!sgt) {
> > > >                dev_err(buf->dev, "failed to alloc sg table\n");
> > > --
> > > Hsia-Jun(Randy) Li
> > > 
> -- 
> Hsia-Jun(Randy) Li
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-07-28  8:20 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-02  0:46 [PATCH 0/8] videobuf2: support new noncontiguous DMA API Sergey Senozhatsky
2021-03-02  0:46 ` [PATCH 1/8] videobuf2: rework vb2_mem_ops API Sergey Senozhatsky
2021-03-02  0:46 ` [PATCH 2/8] videobuf2: inverse buffer cache_hints flags Sergey Senozhatsky
2021-03-02  0:46 ` [PATCH 3/8] videobuf2: split buffer cache_hints initialisation Sergey Senozhatsky
2021-03-02  0:46 ` [PATCH 4/8] videobuf2: move cache_hints handling to allocators Sergey Senozhatsky
2021-03-02  0:46 ` [PATCH 5/8] videobuf2: add V4L2_FLAG_MEMORY_NON_COHERENT flag Sergey Senozhatsky
2021-03-22  7:02   ` Hans Verkuil
2021-03-02  0:46 ` [PATCH 6/8] videobuf2: add queue memory coherency parameter Sergey Senozhatsky
2021-03-02  0:46 ` [PATCH 7/8] videobuf2: handle V4L2_FLAG_MEMORY_NON_COHERENT flag Sergey Senozhatsky
2021-03-22  7:16   ` Hans Verkuil
2021-03-22  7:18   ` Hans Verkuil
2021-03-02  0:46 ` [PATCH 8/8] videobuf2: handle non-contiguous DMA allocations Sergey Senozhatsky
2021-03-22  7:40   ` Hans Verkuil
2021-03-24 12:07   ` Tomasz Figa
2023-07-04 10:50   ` Hsia-Jun Li
2023-07-05 10:45     ` Tomasz Figa
2023-07-17  9:21       ` Hsia-Jun Li
2023-07-28  8:19         ` Tomasz Figa
2021-03-02  0:49 ` [PATCH] v4l-compliance: re-introduce NON_COHERENT and cache hints tests Sergey Senozhatsky
2021-03-02  0:50   ` Sergey Senozhatsky
2021-03-24 12:09 ` [PATCH 0/8] videobuf2: support new noncontiguous DMA API Tomasz Figa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).