linux-media.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency
@ 2015-09-11 11:50 Sakari Ailus
  2015-09-11 11:50 ` [RFC RESEND 01/11] vb2: Rename confusingly named internal buffer preparation functions Sakari Ailus
                   ` (10 more replies)
  0 siblings, 11 replies; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott

Hi folks,

This RFC patchset achieves two main objectives:

1. Respects cache flags passed from the user space. As no driver nor
videobuf2 has (ever?) implemented them, the two flags are replaced by a
single one (V4L2_BUF_FLAG_NO_CACHE_SYNC) and the two old flags are
deprecated. This is done since a single flag provides the driver with
enough information on what to do. (See more info in patch 4.)

2. Allows a driver using videobuf2 dma-contig memory type to choose
whether it prefers coherent or non-coherent CPU access to buffer memory
for MMAP and USERPTR buffers. This could be later extended to be specified
by the user, and per buffer if needed.

Only dma-contig memory type is changed but the same could be done to
dma-sg as well. I can add it to the set if people are happy with the
changes to dma-contig.

-- 
Kind regards,
Sakari


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [RFC RESEND 01/11] vb2: Rename confusingly named internal buffer preparation functions
  2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
@ 2015-09-11 11:50 ` Sakari Ailus
  2015-09-11 15:57   ` Hans Verkuil
  2016-12-15 15:40   ` Laurent Pinchart
  2015-09-11 11:50 ` [RFC RESEND 02/11] vb2: Move buffer cache synchronisation to prepare from queue Sakari Ailus
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott

Rename __qbuf_*() functions which are specific to a buffer type as
__prepare_*() which matches with what they do. The naming was there for
historical reasons; the purpose of the functions was changed without
renaming them.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-core.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index b866a6b..af6a23a 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -1391,18 +1391,19 @@ static void __fill_vb2_buffer(struct vb2_buffer *vb, const struct v4l2_buffer *b
 }
 
 /**
- * __qbuf_mmap() - handle qbuf of an MMAP buffer
+ * __prepare_mmap() - prepare an MMAP buffer
  */
-static int __qbuf_mmap(struct vb2_buffer *vb, const struct v4l2_buffer *b)
+static int __prepare_mmap(struct vb2_buffer *vb, const struct v4l2_buffer *b)
 {
 	__fill_vb2_buffer(vb, b, vb->v4l2_planes);
 	return call_vb_qop(vb, buf_prepare, vb);
 }
 
 /**
- * __qbuf_userptr() - handle qbuf of a USERPTR buffer
+ * __prepare_userptr() - prepare a USERPTR buffer
  */
-static int __qbuf_userptr(struct vb2_buffer *vb, const struct v4l2_buffer *b)
+static int __prepare_userptr(struct vb2_buffer *vb,
+			     const struct v4l2_buffer *b)
 {
 	struct v4l2_plane planes[VIDEO_MAX_PLANES];
 	struct vb2_queue *q = vb->vb2_queue;
@@ -1504,9 +1505,9 @@ err:
 }
 
 /**
- * __qbuf_dmabuf() - handle qbuf of a DMABUF buffer
+ * __prepare_dmabuf() - prepare a DMABUF buffer
  */
-static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b)
+static int __prepare_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b)
 {
 	struct v4l2_plane planes[VIDEO_MAX_PLANES];
 	struct vb2_queue *q = vb->vb2_queue;
@@ -1678,15 +1679,15 @@ static int __buf_prepare(struct vb2_buffer *vb, const struct v4l2_buffer *b)
 
 	switch (q->memory) {
 	case V4L2_MEMORY_MMAP:
-		ret = __qbuf_mmap(vb, b);
+		ret = __prepare_mmap(vb, b);
 		break;
 	case V4L2_MEMORY_USERPTR:
 		down_read(&current->mm->mmap_sem);
-		ret = __qbuf_userptr(vb, b);
+		ret = __prepare_userptr(vb, b);
 		up_read(&current->mm->mmap_sem);
 		break;
 	case V4L2_MEMORY_DMABUF:
-		ret = __qbuf_dmabuf(vb, b);
+		ret = __prepare_dmabuf(vb, b);
 		break;
 	default:
 		WARN(1, "Invalid queue type\n");
-- 
2.1.0.231.g7484e3b


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC RESEND 02/11] vb2: Move buffer cache synchronisation to prepare from queue
  2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
  2015-09-11 11:50 ` [RFC RESEND 01/11] vb2: Rename confusingly named internal buffer preparation functions Sakari Ailus
@ 2015-09-11 11:50 ` Sakari Ailus
  2015-09-11 16:11   ` Hans Verkuil
  2015-09-11 11:50 ` [RFC RESEND 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler Sakari Ailus
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott

The buffer cache should be synchronised in buffer preparation, not when
the buffer is queued to the device. Fix this.

Mmap buffers do not need cache synchronisation since they are always
coherent.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-core.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index af6a23a..64fce4d 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -1636,10 +1636,6 @@ static void __enqueue_in_driver(struct vb2_buffer *vb)
 
 	trace_vb2_buf_queue(q, vb);
 
-	/* sync buffers */
-	for (plane = 0; plane < vb->num_planes; ++plane)
-		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
-
 	call_void_vb_qop(vb, buf_queue, vb);
 }
 
@@ -1694,11 +1690,19 @@ static int __buf_prepare(struct vb2_buffer *vb, const struct v4l2_buffer *b)
 		ret = -EINVAL;
 	}
 
-	if (ret)
+	if (ret) {
 		dprintk(1, "buffer preparation failed: %d\n", ret);
-	vb->state = ret ? VB2_BUF_STATE_DEQUEUED : VB2_BUF_STATE_PREPARED;
+		vb->state = VB2_BUF_STATE_DEQUEUED;
+		return ret;
+	}
 
-	return ret;
+	/* sync buffers */
+	for (plane = 0; plane < vb->num_planes; ++plane)
+		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
+
+	vb->state = VB2_BUF_STATE_PREPARED;
+
+	return 0;
 }
 
 static int vb2_queue_or_prepare_buf(struct vb2_queue *q, struct v4l2_buffer *b,
-- 
2.1.0.231.g7484e3b


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC RESEND 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler
  2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
  2015-09-11 11:50 ` [RFC RESEND 01/11] vb2: Rename confusingly named internal buffer preparation functions Sakari Ailus
  2015-09-11 11:50 ` [RFC RESEND 02/11] vb2: Move buffer cache synchronisation to prepare from queue Sakari Ailus
@ 2015-09-11 11:50 ` Sakari Ailus
  2015-09-11 16:25   ` Hans Verkuil
  2015-09-11 11:50 ` [RFC RESEND 04/11] v4l: Unify cache management hint buffer flags Sakari Ailus
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott

The cache synchronisation may be a time consuming operation and thus not
best performed in an interrupt which is a typical context for
vb2_buffer_done() calls. This may consume up to tens of ms on some
machines, depending on the buffer size.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-core.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index 64fce4d..c5c0707a 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -1177,7 +1177,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
 {
 	struct vb2_queue *q = vb->vb2_queue;
 	unsigned long flags;
-	unsigned int plane;
 
 	if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
 		return;
@@ -1197,10 +1196,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
 	dprintk(4, "done processing on buffer %d, state: %d\n",
 			vb->v4l2_buf.index, state);
 
-	/* sync buffers */
-	for (plane = 0; plane < vb->num_planes; ++plane)
-		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
-
 	/* Add the buffer to the done buffers list */
 	spin_lock_irqsave(&q->done_lock, flags);
 	vb->state = state;
@@ -2086,7 +2081,7 @@ EXPORT_SYMBOL_GPL(vb2_wait_for_all_buffers);
 static void __vb2_dqbuf(struct vb2_buffer *vb)
 {
 	struct vb2_queue *q = vb->vb2_queue;
-	unsigned int i;
+	unsigned int plane;
 
 	/* nothing to do if the buffer is already dequeued */
 	if (vb->state == VB2_BUF_STATE_DEQUEUED)
@@ -2094,13 +2089,18 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
 
 	vb->state = VB2_BUF_STATE_DEQUEUED;
 
+	/* sync buffers */
+	for (plane = 0; plane < vb->num_planes; plane++)
+		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
+
 	/* unmap DMABUF buffer */
 	if (q->memory == V4L2_MEMORY_DMABUF)
-		for (i = 0; i < vb->num_planes; ++i) {
-			if (!vb->planes[i].dbuf_mapped)
+		for (plane = 0; plane < vb->num_planes; ++plane) {
+			if (!vb->planes[plane].dbuf_mapped)
 				continue;
-			call_void_memop(vb, unmap_dmabuf, vb->planes[i].mem_priv);
-			vb->planes[i].dbuf_mapped = 0;
+			call_void_memop(vb, unmap_dmabuf,
+					vb->planes[plane].mem_priv);
+			vb->planes[plane].dbuf_mapped = 0;
 		}
 }
 
-- 
2.1.0.231.g7484e3b


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC RESEND 04/11] v4l: Unify cache management hint buffer flags
  2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
                   ` (2 preceding siblings ...)
  2015-09-11 11:50 ` [RFC RESEND 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler Sakari Ailus
@ 2015-09-11 11:50 ` Sakari Ailus
  2015-09-11 16:26   ` Hans Verkuil
  2016-12-15 20:15   ` Laurent Pinchart
  2015-09-11 11:50 ` [RFC RESEND 05/11] v4l2-core: Don't sync cache for a buffer if so requested Sakari Ailus
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott

The V4L2_BUF_FLAG_NO_CACHE_INVALIDATE and V4L2_BUF_FLAG_NO_CACHE_CLEAN
buffer flags are currently not used by the kernel. Replace the definitions
by a single V4L2_BUF_FLAG_NO_CACHE_SYNC flag to be used by further
patches.

Different cache architectures should not be visible to the user space
which can make no meaningful use of the differences anyway. In case a
device can make use of non-coherent memory accesses, the necessary cache
operations depend on the CPU architecture and the buffer type, not the
requests of the user. The cache operation itself may be skipped on the
user's request which was the purpose of the two flags.

On ARM the invalidate and clean are separate operations whereas on
x86(-64) the two are a single operation (flush). Whether the hardware uses
the buffer for reading (V4L2_BUF_TYPE_*_OUTPUT*) or writing
(V4L2_BUF_TYPE_*CAPTURE*) already defines the required cache operation
(clean and invalidate, respectively). No user input is required.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 Documentation/DocBook/media/v4l/io.xml | 25 +++++++++++--------------
 include/trace/events/v4l2.h            |  3 +--
 include/uapi/linux/videodev2.h         |  7 +++++--
 3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/Documentation/DocBook/media/v4l/io.xml b/Documentation/DocBook/media/v4l/io.xml
index 7bbc2a4..4facd63 100644
--- a/Documentation/DocBook/media/v4l/io.xml
+++ b/Documentation/DocBook/media/v4l/io.xml
@@ -1112,21 +1112,18 @@ application. Drivers set or clear this flag when the
 	  linkend="vidioc-qbuf">VIDIOC_DQBUF</link> ioctl is called.</entry>
 	  </row>
 	  <row>
-	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_INVALIDATE</constant></entry>
+	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_SYNC</constant></entry>
 	    <entry>0x00000800</entry>
-	    <entry>Caches do not have to be invalidated for this buffer.
-Typically applications shall use this flag if the data captured in the buffer
-is not going to be touched by the CPU, instead the buffer will, probably, be
-passed on to a DMA-capable hardware unit for further processing or output.
-</entry>
-	  </row>
-	  <row>
-	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_CLEAN</constant></entry>
-	    <entry>0x00001000</entry>
-	    <entry>Caches do not have to be cleaned for this buffer.
-Typically applications shall use this flag for output buffers if the data
-in this buffer has not been created by the CPU but by some DMA-capable unit,
-in which case caches have not been used.</entry>
+	    <entry>Do not perform CPU cache synchronisation operations
+	    when the buffer is queued or dequeued. The user is
+	    responsible for the correct use of this flag. It should be
+	    only used when the buffer is not accessed using the CPU,
+	    e.g. the buffer is written to by a hardware block and then
+	    read by another one, in which case the flag should be set
+	    in both <link linkend="vidioc-qbuf">VIDIOC_DQBUF</link>
+	    and <link linkend="vidioc-qbuf">VIDIOC_QBUF</link> IOCTLs.
+	    The flag has no effect on some devices / architectures.
+	    </entry>
 	  </row>
 	  <row>
 	    <entry><constant>V4L2_BUF_FLAG_LAST</constant></entry>
diff --git a/include/trace/events/v4l2.h b/include/trace/events/v4l2.h
index dbf017b..4cee91d 100644
--- a/include/trace/events/v4l2.h
+++ b/include/trace/events/v4l2.h
@@ -78,8 +78,7 @@ SHOW_FIELD
 		{ V4L2_BUF_FLAG_ERROR,		     "ERROR" },		      \
 		{ V4L2_BUF_FLAG_TIMECODE,	     "TIMECODE" },	      \
 		{ V4L2_BUF_FLAG_PREPARED,	     "PREPARED" },	      \
-		{ V4L2_BUF_FLAG_NO_CACHE_INVALIDATE, "NO_CACHE_INVALIDATE" }, \
-		{ V4L2_BUF_FLAG_NO_CACHE_CLEAN,	     "NO_CACHE_CLEAN" },      \
+		{ V4L2_BUF_FLAG_NO_CACHE_SYNC,	     "NO_CACHE_SYNC" },	      \
 		{ V4L2_BUF_FLAG_TIMESTAMP_MASK,	     "TIMESTAMP_MASK" },      \
 		{ V4L2_BUF_FLAG_TIMESTAMP_UNKNOWN,   "TIMESTAMP_UNKNOWN" },   \
 		{ V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC, "TIMESTAMP_MONOTONIC" }, \
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 3228fbe..8d85aac 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -875,8 +875,11 @@ struct v4l2_buffer {
 #define V4L2_BUF_FLAG_TIMECODE			0x00000100
 /* Buffer is prepared for queuing */
 #define V4L2_BUF_FLAG_PREPARED			0x00000400
-/* Cache handling flags */
-#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE	0x00000800
+/* Cache sync hint */
+#define V4L2_BUF_FLAG_NO_CACHE_SYNC		0x00000800
+/* DEPRECATED. THIS WILL BE REMOVED IN THE FUTURE! */
+#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE	V4L2_BUF_FLAG_NO_CACHE_SYNC
+/* DEPRECATED. THIS WILL BE REMOVED IN THE FUTURE! */
 #define V4L2_BUF_FLAG_NO_CACHE_CLEAN		0x00001000
 /* Timestamp type */
 #define V4L2_BUF_FLAG_TIMESTAMP_MASK		0x0000e000
-- 
2.1.0.231.g7484e3b


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC RESEND 05/11] v4l2-core: Don't sync cache for a buffer if so requested
  2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
                   ` (3 preceding siblings ...)
  2015-09-11 11:50 ` [RFC RESEND 04/11] v4l: Unify cache management hint buffer flags Sakari Ailus
@ 2015-09-11 11:50 ` Sakari Ailus
  2015-09-11 17:12   ` Hans Verkuil
  2016-12-15 20:37   ` Laurent Pinchart
  2015-09-11 11:50 ` [RFC RESEND 06/11] vb2: Improve struct vb2_mem_ops documentation; alloc and put are for MMAP Sakari Ailus
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott, Samu Onkalo

From: Samu Onkalo <samu.onkalo@intel.com>

The user may request to the driver (vb2) to skip the cache maintenance
operations in case the buffer does not need cache synchronisation, e.g. in
cases where the buffer is passed between hardware blocks without it being
touched by the CPU.

Also document that the prepare and finish vb2_mem_ops might not get called
every time the buffer ownership changes between the kernel and the user
space.

Signed-off-by: Samu Onkalo <samu.onkalo@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-core.c | 52 +++++++++++++++++++++++++-------
 include/media/videobuf2-core.h           | 12 +++++---
 2 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index c5c0707a..b664024 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -187,6 +187,28 @@ static void __vb2_queue_cancel(struct vb2_queue *q);
 static void __enqueue_in_driver(struct vb2_buffer *vb);
 
 /**
+ * __mem_prepare_planes() - call finish mem op for all planes of the buffer
+ */
+static void __mem_prepare_planes(struct vb2_buffer *vb)
+{
+	unsigned int plane;
+
+	for (plane = 0; plane < vb->num_planes; ++plane)
+		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
+}
+
+/**
+ * __mem_finish_planes() - call finish mem op for all planes of the buffer
+ */
+static void __mem_finish_planes(struct vb2_buffer *vb)
+{
+	unsigned int plane;
+
+	for (plane = 0; plane < vb->num_planes; ++plane)
+		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
+}
+
+/**
  * __vb2_buf_mem_alloc() - allocate video memory for the given buffer
  */
 static int __vb2_buf_mem_alloc(struct vb2_buffer *vb)
@@ -1391,6 +1413,10 @@ static void __fill_vb2_buffer(struct vb2_buffer *vb, const struct v4l2_buffer *b
 static int __prepare_mmap(struct vb2_buffer *vb, const struct v4l2_buffer *b)
 {
 	__fill_vb2_buffer(vb, b, vb->v4l2_planes);
+
+	if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC))
+		__mem_prepare_planes(vb);
+
 	return call_vb_qop(vb, buf_prepare, vb);
 }
 
@@ -1476,6 +1502,11 @@ static int __prepare_userptr(struct vb2_buffer *vb,
 			dprintk(1, "buffer initialization failed\n");
 			goto err;
 		}
+
+		/* This is new buffer memory --- always synchronise cache. */
+		__mem_prepare_planes(vb);
+	} else if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC)) {
+		__mem_prepare_planes(vb);
 	}
 
 	ret = call_vb_qop(vb, buf_prepare, vb);
@@ -1601,6 +1632,11 @@ static int __prepare_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b)
 			dprintk(1, "buffer initialization failed\n");
 			goto err;
 		}
+
+		/* This is new buffer memory --- always synchronise cache. */
+		__mem_prepare_planes(vb);
+	} else if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC)) {
+		__mem_prepare_planes(vb);
 	}
 
 	ret = call_vb_qop(vb, buf_prepare, vb);
@@ -1624,7 +1660,6 @@ err:
 static void __enqueue_in_driver(struct vb2_buffer *vb)
 {
 	struct vb2_queue *q = vb->vb2_queue;
-	unsigned int plane;
 
 	vb->state = VB2_BUF_STATE_ACTIVE;
 	atomic_inc(&q->owned_by_drv_count);
@@ -1691,10 +1726,6 @@ static int __buf_prepare(struct vb2_buffer *vb, const struct v4l2_buffer *b)
 		return ret;
 	}
 
-	/* sync buffers */
-	for (plane = 0; plane < vb->num_planes; ++plane)
-		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
-
 	vb->state = VB2_BUF_STATE_PREPARED;
 
 	return 0;
@@ -2078,7 +2109,7 @@ EXPORT_SYMBOL_GPL(vb2_wait_for_all_buffers);
 /**
  * __vb2_dqbuf() - bring back the buffer to the DEQUEUED state
  */
-static void __vb2_dqbuf(struct vb2_buffer *vb)
+static void __vb2_dqbuf(struct vb2_buffer *vb, bool no_cache_sync)
 {
 	struct vb2_queue *q = vb->vb2_queue;
 	unsigned int plane;
@@ -2089,9 +2120,8 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
 
 	vb->state = VB2_BUF_STATE_DEQUEUED;
 
-	/* sync buffers */
-	for (plane = 0; plane < vb->num_planes; plane++)
-		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
+	if (!no_cache_sync)
+		__mem_finish_planes(vb);
 
 	/* unmap DMABUF buffer */
 	if (q->memory == V4L2_MEMORY_DMABUF)
@@ -2143,7 +2173,7 @@ static int vb2_internal_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool n
 	    vb->v4l2_buf.flags & V4L2_BUF_FLAG_LAST)
 		q->last_buffer_dequeued = true;
 	/* go back to dequeued state */
-	__vb2_dqbuf(vb);
+	__vb2_dqbuf(vb, b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC);
 
 	dprintk(1, "dqbuf of buffer %d, with state %d\n",
 			vb->v4l2_buf.index, vb->state);
@@ -2246,7 +2276,7 @@ static void __vb2_queue_cancel(struct vb2_queue *q)
 			vb->state = VB2_BUF_STATE_PREPARED;
 			call_void_vb_qop(vb, buf_finish, vb);
 		}
-		__vb2_dqbuf(vb);
+		__vb2_dqbuf(vb, false);
 	}
 }
 
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index 4f7f7ae..a825bd5 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -59,10 +59,14 @@ struct vb2_threadio_data;
  *		dmabuf.
  * @unmap_dmabuf: releases access control to the dmabuf - allocator is notified
  *		  that this driver is done using the dmabuf for now.
- * @prepare:	called every time the buffer is passed from userspace to the
- *		driver, useful for cache synchronisation, optional.
- * @finish:	called every time the buffer is passed back from the driver
- *		to the userspace, also optional.
+ * @prepare:	Called on the plane when the buffer ownership is passed from
+ *		the user space to the kernel and the plane must be cache
+ *		syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may
+ *		be used to skip this call. Optional.
+ * @finish:	Called on the plane when the buffer ownership is passed from
+ *		the kernel to the user space and the plane must be cache
+ *		syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may
+ *		be used to skip this call. Optional.
  * @vaddr:	return a kernel virtual address to a given memory buffer
  *		associated with the passed private structure or NULL if no
  *		such mapping exists.
-- 
2.1.0.231.g7484e3b


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC RESEND 06/11] vb2: Improve struct vb2_mem_ops documentation; alloc and put are for MMAP
  2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
                   ` (4 preceding siblings ...)
  2015-09-11 11:50 ` [RFC RESEND 05/11] v4l2-core: Don't sync cache for a buffer if so requested Sakari Ailus
@ 2015-09-11 11:50 ` Sakari Ailus
  2015-09-11 17:13   ` Hans Verkuil
  2016-12-15 20:50   ` Laurent Pinchart
  2015-09-11 11:50 ` [RFC RESEND 07/11] vb2: dma-contig: Remove redundant sgt_base field Sakari Ailus
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott

The alloc() and put() ops are for MMAP buffers only. Document it.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 include/media/videobuf2-core.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index a825bd5..efc9a19 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -24,16 +24,16 @@ struct vb2_threadio_data;
 
 /**
  * struct vb2_mem_ops - memory handling/memory allocator operations
- * @alloc:	allocate video memory and, optionally, allocator private data,
- *		return NULL on failure or a pointer to allocator private,
- *		per-buffer data on success; the returned private structure
- *		will then be passed as buf_priv argument to other ops in this
- *		structure. Additional gfp_flags to use when allocating the
- *		are also passed to this operation. These flags are from the
- *		gfp_flags field of vb2_queue.
- * @put:	inform the allocator that the buffer will no longer be used;
- *		usually will result in the allocator freeing the buffer (if
- *		no other users of this buffer are present); the buf_priv
+ * @alloc:	allocate video memory for an MMAP buffer and, optionally,
+ *		allocator private data, return NULL on failure or a pointer
+ *		to allocator private, per-buffer data on success; the returned
+ *		private structure will then be passed as buf_priv argument to
+ *		other ops in this structure. Additional gfp_flags to use when
+ *		allocating the are also passed to this operation. These flags
+ *		are from the gfp_flags field of vb2_queue.
+ * @put:	inform the allocator that the MMAP buffer will no longer be
+ *		used; usually will result in the allocator freeing the buffer
+ *		(if no other users of this buffer are present); the buf_priv
  *		argument is the allocator private per-buffer structure
  *		previously returned from the alloc callback.
  * @get_userptr: acquire userspace memory for a hardware operation; used for
-- 
2.1.0.231.g7484e3b


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC RESEND 07/11] vb2: dma-contig: Remove redundant sgt_base field
  2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
                   ` (5 preceding siblings ...)
  2015-09-11 11:50 ` [RFC RESEND 06/11] vb2: Improve struct vb2_mem_ops documentation; alloc and put are for MMAP Sakari Ailus
@ 2015-09-11 11:50 ` Sakari Ailus
  2015-09-11 17:28   ` Hans Verkuil
  2016-12-15 21:08   ` Laurent Pinchart
  2015-09-11 11:50 ` [RFC RESEND 08/11] vb2: dma-contig: Move vb2_dc_get_base_sgt() up Sakari Ailus
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott

The struct vb2_dc_buf contains two struct sg_table fields: sgt_base and
dma_sgt. The former is used by DMA-BUF buffers whereas the latter is used
by USERPTR.

Unify the two, leaving dma_sgt.

MMAP buffers do not need cache flushing since they have been allocated
using dma_alloc_coherent().

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 94c1e64..26a0a0f 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -36,7 +36,6 @@ struct vb2_dc_buf {
 	/* MMAP related */
 	struct vb2_vmarea_handler	handler;
 	atomic_t			refcount;
-	struct sg_table			*sgt_base;
 
 	/* USERPTR related */
 	struct vm_area_struct		*vma;
@@ -117,7 +116,7 @@ static void vb2_dc_prepare(void *buf_priv)
 	struct sg_table *sgt = buf->dma_sgt;
 
 	/* DMABUF exporter will flush the cache for us */
-	if (!sgt || buf->db_attach)
+	if (!buf->vma)
 		return;
 
 	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
@@ -129,7 +128,7 @@ static void vb2_dc_finish(void *buf_priv)
 	struct sg_table *sgt = buf->dma_sgt;
 
 	/* DMABUF exporter will flush the cache for us */
-	if (!sgt || buf->db_attach)
+	if (!buf->vma)
 		return;
 
 	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
@@ -146,9 +145,9 @@ static void vb2_dc_put(void *buf_priv)
 	if (!atomic_dec_and_test(&buf->refcount))
 		return;
 
-	if (buf->sgt_base) {
-		sg_free_table(buf->sgt_base);
-		kfree(buf->sgt_base);
+	if (buf->dma_sgt) {
+		sg_free_table(buf->dma_sgt);
+		kfree(buf->dma_sgt);
 	}
 	dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);
 	put_device(buf->dev);
@@ -252,13 +251,13 @@ static int vb2_dc_dmabuf_ops_attach(struct dma_buf *dbuf, struct device *dev,
 	/* Copy the buf->base_sgt scatter list to the attachment, as we can't
 	 * map the same scatter list to multiple attachments at the same time.
 	 */
-	ret = sg_alloc_table(sgt, buf->sgt_base->orig_nents, GFP_KERNEL);
+	ret = sg_alloc_table(sgt, buf->dma_sgt->orig_nents, GFP_KERNEL);
 	if (ret) {
 		kfree(attach);
 		return -ENOMEM;
 	}
 
-	rd = buf->sgt_base->sgl;
+	rd = buf->dma_sgt->sgl;
 	wr = sgt->sgl;
 	for (i = 0; i < sgt->orig_nents; ++i) {
 		sg_set_page(wr, sg_page(rd), rd->length, rd->offset);
@@ -409,10 +408,10 @@ static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, unsigned long flags)
 	exp_info.flags = flags;
 	exp_info.priv = buf;
 
-	if (!buf->sgt_base)
-		buf->sgt_base = vb2_dc_get_base_sgt(buf);
+	if (!buf->dma_sgt)
+		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
 
-	if (WARN_ON(!buf->sgt_base))
+	if (WARN_ON(!buf->dma_sgt))
 		return NULL;
 
 	dbuf = dma_buf_export(&exp_info);
-- 
2.1.0.231.g7484e3b


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC RESEND 08/11] vb2: dma-contig: Move vb2_dc_get_base_sgt() up
  2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
                   ` (6 preceding siblings ...)
  2015-09-11 11:50 ` [RFC RESEND 07/11] vb2: dma-contig: Remove redundant sgt_base field Sakari Ailus
@ 2015-09-11 11:50 ` Sakari Ailus
  2016-12-15 21:12   ` Laurent Pinchart
  2015-09-11 11:50 ` [RFC RESEND 09/11] vb2: dma-contig: Don't warn on failure in obtaining scatterlist Sakari Ailus
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott

Just move the function up. It'll be soon needed earlier than previously.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 44 +++++++++++++-------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 26a0a0f..3260392 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -82,6 +82,28 @@ static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt)
 	return size;
 }
 
+static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
+{
+	int ret;
+	struct sg_table *sgt;
+
+	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt) {
+		dev_err(buf->dev, "failed to alloc sg table\n");
+		return NULL;
+	}
+
+	ret = dma_get_sgtable(buf->dev, sgt, buf->vaddr, buf->dma_addr,
+		buf->size);
+	if (ret < 0) {
+		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
+		kfree(sgt);
+		return NULL;
+	}
+
+	return sgt;
+}
+
 /*********************************************/
 /*         callbacks for all buffers         */
 /*********************************************/
@@ -375,28 +397,6 @@ static struct dma_buf_ops vb2_dc_dmabuf_ops = {
 	.release = vb2_dc_dmabuf_ops_release,
 };
 
-static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
-{
-	int ret;
-	struct sg_table *sgt;
-
-	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
-	if (!sgt) {
-		dev_err(buf->dev, "failed to alloc sg table\n");
-		return NULL;
-	}
-
-	ret = dma_get_sgtable(buf->dev, sgt, buf->vaddr, buf->dma_addr,
-		buf->size);
-	if (ret < 0) {
-		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
-		kfree(sgt);
-		return NULL;
-	}
-
-	return sgt;
-}
-
 static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, unsigned long flags)
 {
 	struct vb2_dc_buf *buf = buf_priv;
-- 
2.1.0.231.g7484e3b


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC RESEND 09/11] vb2: dma-contig: Don't warn on failure in obtaining scatterlist
  2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
                   ` (7 preceding siblings ...)
  2015-09-11 11:50 ` [RFC RESEND 08/11] vb2: dma-contig: Move vb2_dc_get_base_sgt() up Sakari Ailus
@ 2015-09-11 11:50 ` Sakari Ailus
  2016-12-15 21:13   ` Laurent Pinchart
  2015-09-11 11:50 ` [RFC RESEND 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs Sakari Ailus
  2015-09-11 11:50 ` [RFC RESEND 11/11] vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs Sakari Ailus
  10 siblings, 1 reply; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott

vb2_dc_get_base_sgt() which obtains the scatterlist already prints
information on why the scatterlist could not be obtained.

Also, remove the useless warning of a failed kmalloc().

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 3260392..65bc687 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -88,10 +88,8 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
 	struct sg_table *sgt;
 
 	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
-	if (!sgt) {
-		dev_err(buf->dev, "failed to alloc sg table\n");
+	if (!sgt)
 		return NULL;
-	}
 
 	ret = dma_get_sgtable(buf->dev, sgt, buf->vaddr, buf->dma_addr,
 		buf->size);
@@ -411,7 +409,7 @@ static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, unsigned long flags)
 	if (!buf->dma_sgt)
 		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
 
-	if (WARN_ON(!buf->dma_sgt))
+	if (!buf->dma_sgt)
 		return NULL;
 
 	dbuf = dma_buf_export(&exp_info);
-- 
2.1.0.231.g7484e3b


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC RESEND 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs
  2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
                   ` (8 preceding siblings ...)
  2015-09-11 11:50 ` [RFC RESEND 09/11] vb2: dma-contig: Don't warn on failure in obtaining scatterlist Sakari Ailus
@ 2015-09-11 11:50 ` Sakari Ailus
  2016-12-15 21:40   ` Laurent Pinchart
  2015-09-11 11:50 ` [RFC RESEND 11/11] vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs Sakari Ailus
  10 siblings, 1 reply; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott

The desirable DMA attributes are not generic for all devices using
Videobuf2 contiguous DMA ops. Let the drivers decide.

This change also results in MMAP buffers always having an sg_table
(dma_sgt field).

Also arrange the header files alphabetically.

As a result, also the DMA-BUF exporter must provide ops for synchronising
the cache. This adds begin_cpu_access and end_cpu_access ops to
vb2_dc_dmabuf_ops.

The driver API changes were done using the following coccinelle spatch:

@@
expression E;
@@

- vb2_dma_contig_init_ctx(E)
+ vb2_dma_contig_init_ctx(E, NULL)

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 104 ++++++++++++++++++-------
 include/media/videobuf2-dma-contig.h           |   3 +-
 2 files changed, 79 insertions(+), 28 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 65bc687..65ee122 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -11,11 +11,11 @@
  */
 
 #include <linux/dma-buf.h>
+#include <linux/dma-mapping.h>
 #include <linux/module.h>
 #include <linux/scatterlist.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
-#include <linux/dma-mapping.h>
 
 #include <media/videobuf2-core.h>
 #include <media/videobuf2-dma-contig.h>
@@ -23,6 +23,7 @@
 
 struct vb2_dc_conf {
 	struct device		*dev;
+	struct dma_attrs	attrs;
 };
 
 struct vb2_dc_buf {
@@ -34,6 +35,7 @@ struct vb2_dc_buf {
 	struct sg_table			*dma_sgt;
 
 	/* MMAP related */
+	struct dma_attrs		*attrs;
 	struct vb2_vmarea_handler	handler;
 	atomic_t			refcount;
 
@@ -135,8 +137,12 @@ static void vb2_dc_prepare(void *buf_priv)
 	struct vb2_dc_buf *buf = buf_priv;
 	struct sg_table *sgt = buf->dma_sgt;
 
-	/* DMABUF exporter will flush the cache for us */
-	if (!buf->vma)
+	/*
+	 * DMABUF exporter will flush the cache for us; only USERPTR
+	 * and MMAP buffers with non-coherent memory will be flushed.
+	 */
+	if (!buf->attrs ||
+	    !dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
 		return;
 
 	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
@@ -147,8 +153,12 @@ static void vb2_dc_finish(void *buf_priv)
 	struct vb2_dc_buf *buf = buf_priv;
 	struct sg_table *sgt = buf->dma_sgt;
 
-	/* DMABUF exporter will flush the cache for us */
-	if (!buf->vma)
+	/*
+	 * DMABUF exporter will flush the cache for us; only USERPTR
+	 * and MMAP buffers with non-coherent memory will be flushed.
+	 */
+	if (!buf->attrs ||
+	    !dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
 		return;
 
 	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
@@ -169,7 +179,8 @@ static void vb2_dc_put(void *buf_priv)
 		sg_free_table(buf->dma_sgt);
 		kfree(buf->dma_sgt);
 	}
-	dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);
+	dma_free_attrs(buf->dev, buf->size, buf->vaddr, buf->dma_addr,
+		       buf->attrs);
 	put_device(buf->dev);
 	kfree(buf);
 }
@@ -185,14 +196,25 @@ static void *vb2_dc_alloc(void *alloc_ctx, unsigned long size,
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
-	buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,
-						GFP_KERNEL | gfp_flags);
+	buf->attrs = &conf->attrs;
+
+	buf->vaddr = dma_alloc_attrs(dev, size, &buf->dma_addr,
+				     GFP_KERNEL | gfp_flags, buf->attrs);
 	if (!buf->vaddr) {
-		dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
+		dev_err(dev, "dma_alloc_attrs of size %ld failed\n", size);
 		kfree(buf);
 		return ERR_PTR(-ENOMEM);
 	}
 
+	if (dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs)) {
+		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
+		if (!buf->dma_sgt) {
+			dma_free_attrs(dev, size, buf->vaddr, buf->dma_addr,
+				       buf->attrs);
+			return ERR_PTR(-ENOMEM);
+		}
+	}
+
 	/* Prevent the device from being released while the buffer is used */
 	buf->dev = get_device(dev);
 	buf->size = size;
@@ -223,8 +245,8 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
 	 */
 	vma->vm_pgoff = 0;
 
-	ret = dma_mmap_coherent(buf->dev, vma, buf->vaddr,
-		buf->dma_addr, buf->size);
+	ret = dma_mmap_attrs(buf->dev, vma, buf->vaddr, buf->dma_addr,
+			     buf->size, buf->attrs);
 
 	if (ret) {
 		pr_err("Remapping memory failed, error: %d\n", ret);
@@ -370,6 +392,34 @@ static void *vb2_dc_dmabuf_ops_kmap(struct dma_buf *dbuf, unsigned long pgnum)
 	return buf->vaddr + pgnum * PAGE_SIZE;
 }
 
+static int vb2_dc_dmabuf_ops_begin_cpu_access(
+	struct dma_buf *dbuf, size_t start, size_t len,
+	enum dma_data_direction direction)
+{
+	struct vb2_dc_buf *buf = dbuf->priv;
+	struct sg_table *sgt = buf->dma_sgt;
+
+	if (!dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
+		return 0;
+
+	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
+
+	return 0;
+}
+
+static void vb2_dc_dmabuf_ops_end_cpu_access(
+	struct dma_buf *dbuf, size_t start, size_t len,
+	enum dma_data_direction direction)
+{
+	struct vb2_dc_buf *buf = dbuf->priv;
+	struct sg_table *sgt = buf->dma_sgt;
+
+	if (!dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
+		return;
+
+	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
+}
+
 static void *vb2_dc_dmabuf_ops_vmap(struct dma_buf *dbuf)
 {
 	struct vb2_dc_buf *buf = dbuf->priv;
@@ -390,6 +440,8 @@ static struct dma_buf_ops vb2_dc_dmabuf_ops = {
 	.unmap_dma_buf = vb2_dc_dmabuf_ops_unmap,
 	.kmap = vb2_dc_dmabuf_ops_kmap,
 	.kmap_atomic = vb2_dc_dmabuf_ops_kmap,
+	.begin_cpu_access = vb2_dc_dmabuf_ops_begin_cpu_access,
+	.end_cpu_access = vb2_dc_dmabuf_ops_end_cpu_access,
 	.vmap = vb2_dc_dmabuf_ops_vmap,
 	.mmap = vb2_dc_dmabuf_ops_mmap,
 	.release = vb2_dc_dmabuf_ops_release,
@@ -514,15 +566,13 @@ static void vb2_dc_put_userptr(void *buf_priv)
 	struct sg_table *sgt = buf->dma_sgt;
 
 	if (sgt) {
-		DEFINE_DMA_ATTRS(attrs);
-
-		dma_set_attr(DMA_ATTR_SKIP_CPU_SYNC, &attrs);
 		/*
-		 * No need to sync to CPU, it's already synced to the CPU
-		 * since the finish() memop will have been called before this.
+		 * Don't ask to skip cache sync in case if the user
+		 * did ask to skip cache flush the last time the
+		 * buffer was dequeued.
 		 */
 		dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
-				   buf->dma_dir, &attrs);
+				   buf->dma_dir, buf->attrs);
 		if (!vma_is_io(buf->vma))
 			vb2_dc_sgt_foreach_page(sgt, vb2_dc_put_dirty_page);
 
@@ -579,9 +629,6 @@ static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
 	struct sg_table *sgt;
 	unsigned long contig_size;
 	unsigned long dma_align = dma_get_cache_alignment();
-	DEFINE_DMA_ATTRS(attrs);
-
-	dma_set_attr(DMA_ATTR_SKIP_CPU_SYNC, &attrs);
 
 	/* Only cache aligned DMA transfers are reliable */
 	if (!IS_ALIGNED(vaddr | size, dma_align)) {
@@ -598,6 +645,8 @@ static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
+	buf->attrs = &conf->attrs;
+
 	buf->dev = conf->dev;
 	buf->dma_dir = dma_dir;
 
@@ -667,12 +716,8 @@ static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
 	kfree(pages);
 	pages = NULL;
 
-	/*
-	 * No need to sync to the device, this will happen later when the
-	 * prepare() memop is called.
-	 */
 	sgt->nents = dma_map_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
-				      buf->dma_dir, &attrs);
+				      buf->dma_dir, buf->attrs);
 	if (sgt->nents <= 0) {
 		pr_err("failed to map scatterlist\n");
 		ret = -EIO;
@@ -695,7 +740,7 @@ static void *vb2_dc_get_userptr(void *alloc_ctx, unsigned long vaddr,
 
 fail_map_sg:
 	dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
-			   buf->dma_dir, &attrs);
+			   buf->dma_dir, buf->attrs);
 
 fail_sgt_init:
 	if (!vma_is_io(buf->vma))
@@ -856,7 +901,7 @@ const struct vb2_mem_ops vb2_dma_contig_memops = {
 };
 EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);
 
-void *vb2_dma_contig_init_ctx(struct device *dev)
+void *vb2_dma_contig_init_ctx(struct device *dev, struct dma_attrs *attrs)
 {
 	struct vb2_dc_conf *conf;
 
@@ -866,6 +911,11 @@ void *vb2_dma_contig_init_ctx(struct device *dev)
 
 	conf->dev = dev;
 
+	if (!attrs)
+		init_dma_attrs(&conf->attrs);
+	else
+		conf->attrs = *attrs;
+
 	return conf;
 }
 EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx);
diff --git a/include/media/videobuf2-dma-contig.h b/include/media/videobuf2-dma-contig.h
index 8197f87..1b85282 100644
--- a/include/media/videobuf2-dma-contig.h
+++ b/include/media/videobuf2-dma-contig.h
@@ -24,7 +24,8 @@ vb2_dma_contig_plane_dma_addr(struct vb2_buffer *vb, unsigned int plane_no)
 	return *addr;
 }
 
-void *vb2_dma_contig_init_ctx(struct device *dev);
+void *vb2_dma_contig_init_ctx(struct device *dev,
+			      struct dma_attrs *attrs);
 void vb2_dma_contig_cleanup_ctx(void *alloc_ctx);
 
 extern const struct vb2_mem_ops vb2_dma_contig_memops;
-- 
2.1.0.231.g7484e3b


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC RESEND 11/11] vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs
  2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
                   ` (9 preceding siblings ...)
  2015-09-11 11:50 ` [RFC RESEND 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs Sakari Ailus
@ 2015-09-11 11:50 ` Sakari Ailus
  2016-12-15 21:57   ` Laurent Pinchart
  10 siblings, 1 reply; 38+ messages in thread
From: Sakari Ailus @ 2015-09-11 11:50 UTC (permalink / raw)
  To: linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, hverkuil, sumit.semwal,
	robdclark, daniel.vetter, labbott

The scatterlist should always be present when the cache would need to be
flushed. Each buffer type has its own means to provide that. Add
WARN_ON_ONCE() to check the scatterist exists.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 65ee122..58c35c5 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -145,6 +145,9 @@ static void vb2_dc_prepare(void *buf_priv)
 	    !dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
 		return;
 
+	if (WARN_ON_ONCE(!sgt))
+		return;
+
 	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
 }
 
@@ -161,6 +164,9 @@ static void vb2_dc_finish(void *buf_priv)
 	    !dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
 		return;
 
+	if (WARN_ON_ONCE(!sgt))
+		return;
+
 	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
 }
 
-- 
2.1.0.231.g7484e3b


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 01/11] vb2: Rename confusingly named internal buffer preparation functions
  2015-09-11 11:50 ` [RFC RESEND 01/11] vb2: Rename confusingly named internal buffer preparation functions Sakari Ailus
@ 2015-09-11 15:57   ` Hans Verkuil
  2016-12-15 15:40   ` Laurent Pinchart
  1 sibling, 0 replies; 38+ messages in thread
From: Hans Verkuil @ 2015-09-11 15:57 UTC (permalink / raw)
  To: Sakari Ailus, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott

On 09/11/2015 01:50 PM, Sakari Ailus wrote:
> Rename __qbuf_*() functions which are specific to a buffer type as
> __prepare_*() which matches with what they do. The naming was there for
> historical reasons; the purpose of the functions was changed without
> renaming them.
> 
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

Good one!

	Hans

> ---
>  drivers/media/v4l2-core/videobuf2-core.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
> index b866a6b..af6a23a 100644
> --- a/drivers/media/v4l2-core/videobuf2-core.c
> +++ b/drivers/media/v4l2-core/videobuf2-core.c
> @@ -1391,18 +1391,19 @@ static void __fill_vb2_buffer(struct vb2_buffer *vb, const struct v4l2_buffer *b
>  }
>  
>  /**
> - * __qbuf_mmap() - handle qbuf of an MMAP buffer
> + * __prepare_mmap() - prepare an MMAP buffer
>   */
> -static int __qbuf_mmap(struct vb2_buffer *vb, const struct v4l2_buffer *b)
> +static int __prepare_mmap(struct vb2_buffer *vb, const struct v4l2_buffer *b)
>  {
>  	__fill_vb2_buffer(vb, b, vb->v4l2_planes);
>  	return call_vb_qop(vb, buf_prepare, vb);
>  }
>  
>  /**
> - * __qbuf_userptr() - handle qbuf of a USERPTR buffer
> + * __prepare_userptr() - prepare a USERPTR buffer
>   */
> -static int __qbuf_userptr(struct vb2_buffer *vb, const struct v4l2_buffer *b)
> +static int __prepare_userptr(struct vb2_buffer *vb,
> +			     const struct v4l2_buffer *b)
>  {
>  	struct v4l2_plane planes[VIDEO_MAX_PLANES];
>  	struct vb2_queue *q = vb->vb2_queue;
> @@ -1504,9 +1505,9 @@ err:
>  }
>  
>  /**
> - * __qbuf_dmabuf() - handle qbuf of a DMABUF buffer
> + * __prepare_dmabuf() - prepare a DMABUF buffer
>   */
> -static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b)
> +static int __prepare_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b)
>  {
>  	struct v4l2_plane planes[VIDEO_MAX_PLANES];
>  	struct vb2_queue *q = vb->vb2_queue;
> @@ -1678,15 +1679,15 @@ static int __buf_prepare(struct vb2_buffer *vb, const struct v4l2_buffer *b)
>  
>  	switch (q->memory) {
>  	case V4L2_MEMORY_MMAP:
> -		ret = __qbuf_mmap(vb, b);
> +		ret = __prepare_mmap(vb, b);
>  		break;
>  	case V4L2_MEMORY_USERPTR:
>  		down_read(&current->mm->mmap_sem);
> -		ret = __qbuf_userptr(vb, b);
> +		ret = __prepare_userptr(vb, b);
>  		up_read(&current->mm->mmap_sem);
>  		break;
>  	case V4L2_MEMORY_DMABUF:
> -		ret = __qbuf_dmabuf(vb, b);
> +		ret = __prepare_dmabuf(vb, b);
>  		break;
>  	default:
>  		WARN(1, "Invalid queue type\n");
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 02/11] vb2: Move buffer cache synchronisation to prepare from queue
  2015-09-11 11:50 ` [RFC RESEND 02/11] vb2: Move buffer cache synchronisation to prepare from queue Sakari Ailus
@ 2015-09-11 16:11   ` Hans Verkuil
  0 siblings, 0 replies; 38+ messages in thread
From: Hans Verkuil @ 2015-09-11 16:11 UTC (permalink / raw)
  To: Sakari Ailus, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott

On 09/11/2015 01:50 PM, Sakari Ailus wrote:
> The buffer cache should be synchronised in buffer preparation, not when
> the buffer is queued to the device. Fix this.
> 
> Mmap buffers do not need cache synchronisation since they are always
> coherent.
> 

While I agree with this change, I also think it is incomplete. The memop prepare()
is now done before it is queued to the driver, but the corresponding memop
finish() is only called when it is returned by the driver.

So if the application queues buffers and (without calling STREAMON) destroys them
all (REQBUFS(0)), you should get 'unbalanced' messages in the kernel log.

I'm pretty sure v4l2-compliance tests this scenario, so that would be a good thing
to check.

Regards,

	Hans

> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  drivers/media/v4l2-core/videobuf2-core.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
> index af6a23a..64fce4d 100644
> --- a/drivers/media/v4l2-core/videobuf2-core.c
> +++ b/drivers/media/v4l2-core/videobuf2-core.c
> @@ -1636,10 +1636,6 @@ static void __enqueue_in_driver(struct vb2_buffer *vb)
>  
>  	trace_vb2_buf_queue(q, vb);
>  
> -	/* sync buffers */
> -	for (plane = 0; plane < vb->num_planes; ++plane)
> -		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> -
>  	call_void_vb_qop(vb, buf_queue, vb);
>  }
>  
> @@ -1694,11 +1690,19 @@ static int __buf_prepare(struct vb2_buffer *vb, const struct v4l2_buffer *b)
>  		ret = -EINVAL;
>  	}
>  
> -	if (ret)
> +	if (ret) {
>  		dprintk(1, "buffer preparation failed: %d\n", ret);
> -	vb->state = ret ? VB2_BUF_STATE_DEQUEUED : VB2_BUF_STATE_PREPARED;
> +		vb->state = VB2_BUF_STATE_DEQUEUED;
> +		return ret;
> +	}
>  
> -	return ret;
> +	/* sync buffers */
> +	for (plane = 0; plane < vb->num_planes; ++plane)
> +		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> +
> +	vb->state = VB2_BUF_STATE_PREPARED;
> +
> +	return 0;
>  }
>  
>  static int vb2_queue_or_prepare_buf(struct vb2_queue *q, struct v4l2_buffer *b,
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler
  2015-09-11 11:50 ` [RFC RESEND 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler Sakari Ailus
@ 2015-09-11 16:25   ` Hans Verkuil
  2015-09-15  7:51     ` Sakari Ailus
  0 siblings, 1 reply; 38+ messages in thread
From: Hans Verkuil @ 2015-09-11 16:25 UTC (permalink / raw)
  To: Sakari Ailus, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott

On 09/11/2015 01:50 PM, Sakari Ailus wrote:
> The cache synchronisation may be a time consuming operation and thus not
> best performed in an interrupt which is a typical context for
> vb2_buffer_done() calls. This may consume up to tens of ms on some
> machines, depending on the buffer size.
> 
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  drivers/media/v4l2-core/videobuf2-core.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
> index 64fce4d..c5c0707a 100644
> --- a/drivers/media/v4l2-core/videobuf2-core.c
> +++ b/drivers/media/v4l2-core/videobuf2-core.c
> @@ -1177,7 +1177,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>  {
>  	struct vb2_queue *q = vb->vb2_queue;
>  	unsigned long flags;
> -	unsigned int plane;
>  
>  	if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
>  		return;
> @@ -1197,10 +1196,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>  	dprintk(4, "done processing on buffer %d, state: %d\n",
>  			vb->v4l2_buf.index, state);
>  
> -	/* sync buffers */
> -	for (plane = 0; plane < vb->num_planes; ++plane)
> -		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> -

Ah, OK, so it is removed here,

>  	/* Add the buffer to the done buffers list */
>  	spin_lock_irqsave(&q->done_lock, flags);
>  	vb->state = state;
> @@ -2086,7 +2081,7 @@ EXPORT_SYMBOL_GPL(vb2_wait_for_all_buffers);
>  static void __vb2_dqbuf(struct vb2_buffer *vb)
>  {
>  	struct vb2_queue *q = vb->vb2_queue;
> -	unsigned int i;
> +	unsigned int plane;
>  
>  	/* nothing to do if the buffer is already dequeued */
>  	if (vb->state == VB2_BUF_STATE_DEQUEUED)
> @@ -2094,13 +2089,18 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
>  
>  	vb->state = VB2_BUF_STATE_DEQUEUED;
>  
> +	/* sync buffers */
> +	for (plane = 0; plane < vb->num_planes; plane++)
> +		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> +

to here.

I'm not sure if this is correct... So __vb2_dqbuf is called from __vb2_queue_cancel(),
but now the buf_finish() callback is called *before* the memop finish() callback,
where this was the other way around in __vb2_queue_cancel(). I don't think that is
right since buf_finish() expects that the buffer is synced for the cpu.

Was this tested with CONFIG_VIDEO_ADV_DEBUG set and with 'v4l2-compliance -s'?
Not that that would help if things are done in the wrong order...

Regards,

	Hans

>  	/* unmap DMABUF buffer */
>  	if (q->memory == V4L2_MEMORY_DMABUF)
> -		for (i = 0; i < vb->num_planes; ++i) {
> -			if (!vb->planes[i].dbuf_mapped)
> +		for (plane = 0; plane < vb->num_planes; ++plane) {
> +			if (!vb->planes[plane].dbuf_mapped)
>  				continue;
> -			call_void_memop(vb, unmap_dmabuf, vb->planes[i].mem_priv);
> -			vb->planes[i].dbuf_mapped = 0;
> +			call_void_memop(vb, unmap_dmabuf,
> +					vb->planes[plane].mem_priv);
> +			vb->planes[plane].dbuf_mapped = 0;
>  		}
>  }
>  
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 04/11] v4l: Unify cache management hint buffer flags
  2015-09-11 11:50 ` [RFC RESEND 04/11] v4l: Unify cache management hint buffer flags Sakari Ailus
@ 2015-09-11 16:26   ` Hans Verkuil
  2015-09-11 16:44     ` Hans Verkuil
  2016-12-15 20:15   ` Laurent Pinchart
  1 sibling, 1 reply; 38+ messages in thread
From: Hans Verkuil @ 2015-09-11 16:26 UTC (permalink / raw)
  To: Sakari Ailus, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott

On 09/11/2015 01:50 PM, Sakari Ailus wrote:
> The V4L2_BUF_FLAG_NO_CACHE_INVALIDATE and V4L2_BUF_FLAG_NO_CACHE_CLEAN
> buffer flags are currently not used by the kernel. Replace the definitions
> by a single V4L2_BUF_FLAG_NO_CACHE_SYNC flag to be used by further
> patches.
> 
> Different cache architectures should not be visible to the user space
> which can make no meaningful use of the differences anyway. In case a
> device can make use of non-coherent memory accesses, the necessary cache
> operations depend on the CPU architecture and the buffer type, not the
> requests of the user. The cache operation itself may be skipped on the
> user's request which was the purpose of the two flags.
> 
> On ARM the invalidate and clean are separate operations whereas on
> x86(-64) the two are a single operation (flush). Whether the hardware uses
> the buffer for reading (V4L2_BUF_TYPE_*_OUTPUT*) or writing
> (V4L2_BUF_TYPE_*CAPTURE*) already defines the required cache operation
> (clean and invalidate, respectively). No user input is required.
> 
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

> ---
>  Documentation/DocBook/media/v4l/io.xml | 25 +++++++++++--------------
>  include/trace/events/v4l2.h            |  3 +--
>  include/uapi/linux/videodev2.h         |  7 +++++--
>  3 files changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/DocBook/media/v4l/io.xml b/Documentation/DocBook/media/v4l/io.xml
> index 7bbc2a4..4facd63 100644
> --- a/Documentation/DocBook/media/v4l/io.xml
> +++ b/Documentation/DocBook/media/v4l/io.xml
> @@ -1112,21 +1112,18 @@ application. Drivers set or clear this flag when the
>  	  linkend="vidioc-qbuf">VIDIOC_DQBUF</link> ioctl is called.</entry>
>  	  </row>
>  	  <row>
> -	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_INVALIDATE</constant></entry>
> +	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_SYNC</constant></entry>
>  	    <entry>0x00000800</entry>
> -	    <entry>Caches do not have to be invalidated for this buffer.
> -Typically applications shall use this flag if the data captured in the buffer
> -is not going to be touched by the CPU, instead the buffer will, probably, be
> -passed on to a DMA-capable hardware unit for further processing or output.
> -</entry>
> -	  </row>
> -	  <row>
> -	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_CLEAN</constant></entry>
> -	    <entry>0x00001000</entry>
> -	    <entry>Caches do not have to be cleaned for this buffer.
> -Typically applications shall use this flag for output buffers if the data
> -in this buffer has not been created by the CPU but by some DMA-capable unit,
> -in which case caches have not been used.</entry>
> +	    <entry>Do not perform CPU cache synchronisation operations
> +	    when the buffer is queued or dequeued. The user is
> +	    responsible for the correct use of this flag. It should be
> +	    only used when the buffer is not accessed using the CPU,
> +	    e.g. the buffer is written to by a hardware block and then
> +	    read by another one, in which case the flag should be set
> +	    in both <link linkend="vidioc-qbuf">VIDIOC_DQBUF</link>
> +	    and <link linkend="vidioc-qbuf">VIDIOC_QBUF</link> IOCTLs.
> +	    The flag has no effect on some devices / architectures.
> +	    </entry>
>  	  </row>
>  	  <row>
>  	    <entry><constant>V4L2_BUF_FLAG_LAST</constant></entry>
> diff --git a/include/trace/events/v4l2.h b/include/trace/events/v4l2.h
> index dbf017b..4cee91d 100644
> --- a/include/trace/events/v4l2.h
> +++ b/include/trace/events/v4l2.h
> @@ -78,8 +78,7 @@ SHOW_FIELD
>  		{ V4L2_BUF_FLAG_ERROR,		     "ERROR" },		      \
>  		{ V4L2_BUF_FLAG_TIMECODE,	     "TIMECODE" },	      \
>  		{ V4L2_BUF_FLAG_PREPARED,	     "PREPARED" },	      \
> -		{ V4L2_BUF_FLAG_NO_CACHE_INVALIDATE, "NO_CACHE_INVALIDATE" }, \
> -		{ V4L2_BUF_FLAG_NO_CACHE_CLEAN,	     "NO_CACHE_CLEAN" },      \
> +		{ V4L2_BUF_FLAG_NO_CACHE_SYNC,	     "NO_CACHE_SYNC" },	      \
>  		{ V4L2_BUF_FLAG_TIMESTAMP_MASK,	     "TIMESTAMP_MASK" },      \
>  		{ V4L2_BUF_FLAG_TIMESTAMP_UNKNOWN,   "TIMESTAMP_UNKNOWN" },   \
>  		{ V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC, "TIMESTAMP_MONOTONIC" }, \
> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> index 3228fbe..8d85aac 100644
> --- a/include/uapi/linux/videodev2.h
> +++ b/include/uapi/linux/videodev2.h
> @@ -875,8 +875,11 @@ struct v4l2_buffer {
>  #define V4L2_BUF_FLAG_TIMECODE			0x00000100
>  /* Buffer is prepared for queuing */
>  #define V4L2_BUF_FLAG_PREPARED			0x00000400
> -/* Cache handling flags */
> -#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE	0x00000800
> +/* Cache sync hint */
> +#define V4L2_BUF_FLAG_NO_CACHE_SYNC		0x00000800
> +/* DEPRECATED. THIS WILL BE REMOVED IN THE FUTURE! */
> +#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE	V4L2_BUF_FLAG_NO_CACHE_SYNC
> +/* DEPRECATED. THIS WILL BE REMOVED IN THE FUTURE! */
>  #define V4L2_BUF_FLAG_NO_CACHE_CLEAN		0x00001000
>  /* Timestamp type */
>  #define V4L2_BUF_FLAG_TIMESTAMP_MASK		0x0000e000
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 04/11] v4l: Unify cache management hint buffer flags
  2015-09-11 16:26   ` Hans Verkuil
@ 2015-09-11 16:44     ` Hans Verkuil
  0 siblings, 0 replies; 38+ messages in thread
From: Hans Verkuil @ 2015-09-11 16:44 UTC (permalink / raw)
  To: Sakari Ailus, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott

On 09/11/2015 06:26 PM, Hans Verkuil wrote:
> On 09/11/2015 01:50 PM, Sakari Ailus wrote:
>> The V4L2_BUF_FLAG_NO_CACHE_INVALIDATE and V4L2_BUF_FLAG_NO_CACHE_CLEAN
>> buffer flags are currently not used by the kernel. Replace the definitions
>> by a single V4L2_BUF_FLAG_NO_CACHE_SYNC flag to be used by further
>> patches.
>>
>> Different cache architectures should not be visible to the user space
>> which can make no meaningful use of the differences anyway. In case a
>> device can make use of non-coherent memory accesses, the necessary cache
>> operations depend on the CPU architecture and the buffer type, not the
>> requests of the user. The cache operation itself may be skipped on the
>> user's request which was the purpose of the two flags.
>>
>> On ARM the invalidate and clean are separate operations whereas on
>> x86(-64) the two are a single operation (flush). Whether the hardware uses
>> the buffer for reading (V4L2_BUF_TYPE_*_OUTPUT*) or writing
>> (V4L2_BUF_TYPE_*CAPTURE*) already defines the required cache operation
>> (clean and invalidate, respectively). No user input is required.
>>
>> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> 
> Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

Let me add something here: drivers should be able to override this flag.

There are drivers that need to do fixups in buf_finish (a typical example is to
fill in some JPEG or MPEG header). Obviously this does require syncing to the
cpu.

This is actually a bigger problem since trying to do this with dmabuf can fail
as well. I made patches for that, but I never got around to finishing it:

http://git.linuxtv.org/cgit.cgi/hverkuil/media_tree.git/log/?h=vb2-prep5

Basically this changes drivers so they are explicit when they need cpu access
to a buffer.

By using these begin/end_cpu_access functions you can still implement fixups
correctly, although it does mean that the NO_CACHE_SYNC is effectively (but
unavoidably) ignored.

I think my patch series should be merged with this patch series 

The cover letter I wrote for the RFC patch series containing these cpu access
functions is here:

http://comments.gmane.org/gmane.linux.drivers.video-input-infrastructure/84399

I quote:

"The last 5 patches make this more strict by requiring all cpu access to
be bracketed by calls to vb2_plane_begin/end_cpu_access() which replaces
the old vb2_plane_vaddr() call.

Note: two drivers still use the vb2_plane_vaddr() call: coda and
exynos4-is/fimc-capture.c. For both drivers I will need some help since
I am not sure where to put the begin/end calls. Patch 15 removes
the vb2_plane_vaddr call, so obviously those two drivers won't compile
after that."

So that's the missing part that I never got around to.

Of course, there are new drivers as well since the last time I posted it,
so there may be more to do now.

I also said:

"The issues that patches 12-16 address are [...] only an issue when using
dmabuf with drivers that need cpu access."

But it will *also* be an issue once this new NO_CACHE_SYNC flag is honored.

Regards,

	Hans

> 
>> ---
>>  Documentation/DocBook/media/v4l/io.xml | 25 +++++++++++--------------
>>  include/trace/events/v4l2.h            |  3 +--
>>  include/uapi/linux/videodev2.h         |  7 +++++--
>>  3 files changed, 17 insertions(+), 18 deletions(-)
>>
>> diff --git a/Documentation/DocBook/media/v4l/io.xml b/Documentation/DocBook/media/v4l/io.xml
>> index 7bbc2a4..4facd63 100644
>> --- a/Documentation/DocBook/media/v4l/io.xml
>> +++ b/Documentation/DocBook/media/v4l/io.xml
>> @@ -1112,21 +1112,18 @@ application. Drivers set or clear this flag when the
>>  	  linkend="vidioc-qbuf">VIDIOC_DQBUF</link> ioctl is called.</entry>
>>  	  </row>
>>  	  <row>
>> -	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_INVALIDATE</constant></entry>
>> +	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_SYNC</constant></entry>
>>  	    <entry>0x00000800</entry>
>> -	    <entry>Caches do not have to be invalidated for this buffer.
>> -Typically applications shall use this flag if the data captured in the buffer
>> -is not going to be touched by the CPU, instead the buffer will, probably, be
>> -passed on to a DMA-capable hardware unit for further processing or output.
>> -</entry>
>> -	  </row>
>> -	  <row>
>> -	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_CLEAN</constant></entry>
>> -	    <entry>0x00001000</entry>
>> -	    <entry>Caches do not have to be cleaned for this buffer.
>> -Typically applications shall use this flag for output buffers if the data
>> -in this buffer has not been created by the CPU but by some DMA-capable unit,
>> -in which case caches have not been used.</entry>
>> +	    <entry>Do not perform CPU cache synchronisation operations
>> +	    when the buffer is queued or dequeued. The user is
>> +	    responsible for the correct use of this flag. It should be
>> +	    only used when the buffer is not accessed using the CPU,
>> +	    e.g. the buffer is written to by a hardware block and then
>> +	    read by another one, in which case the flag should be set
>> +	    in both <link linkend="vidioc-qbuf">VIDIOC_DQBUF</link>
>> +	    and <link linkend="vidioc-qbuf">VIDIOC_QBUF</link> IOCTLs.
>> +	    The flag has no effect on some devices / architectures.
>> +	    </entry>
>>  	  </row>
>>  	  <row>
>>  	    <entry><constant>V4L2_BUF_FLAG_LAST</constant></entry>
>> diff --git a/include/trace/events/v4l2.h b/include/trace/events/v4l2.h
>> index dbf017b..4cee91d 100644
>> --- a/include/trace/events/v4l2.h
>> +++ b/include/trace/events/v4l2.h
>> @@ -78,8 +78,7 @@ SHOW_FIELD
>>  		{ V4L2_BUF_FLAG_ERROR,		     "ERROR" },		      \
>>  		{ V4L2_BUF_FLAG_TIMECODE,	     "TIMECODE" },	      \
>>  		{ V4L2_BUF_FLAG_PREPARED,	     "PREPARED" },	      \
>> -		{ V4L2_BUF_FLAG_NO_CACHE_INVALIDATE, "NO_CACHE_INVALIDATE" }, \
>> -		{ V4L2_BUF_FLAG_NO_CACHE_CLEAN,	     "NO_CACHE_CLEAN" },      \
>> +		{ V4L2_BUF_FLAG_NO_CACHE_SYNC,	     "NO_CACHE_SYNC" },	      \
>>  		{ V4L2_BUF_FLAG_TIMESTAMP_MASK,	     "TIMESTAMP_MASK" },      \
>>  		{ V4L2_BUF_FLAG_TIMESTAMP_UNKNOWN,   "TIMESTAMP_UNKNOWN" },   \
>>  		{ V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC, "TIMESTAMP_MONOTONIC" }, \
>> diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
>> index 3228fbe..8d85aac 100644
>> --- a/include/uapi/linux/videodev2.h
>> +++ b/include/uapi/linux/videodev2.h
>> @@ -875,8 +875,11 @@ struct v4l2_buffer {
>>  #define V4L2_BUF_FLAG_TIMECODE			0x00000100
>>  /* Buffer is prepared for queuing */
>>  #define V4L2_BUF_FLAG_PREPARED			0x00000400
>> -/* Cache handling flags */
>> -#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE	0x00000800
>> +/* Cache sync hint */
>> +#define V4L2_BUF_FLAG_NO_CACHE_SYNC		0x00000800
>> +/* DEPRECATED. THIS WILL BE REMOVED IN THE FUTURE! */
>> +#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE	V4L2_BUF_FLAG_NO_CACHE_SYNC
>> +/* DEPRECATED. THIS WILL BE REMOVED IN THE FUTURE! */
>>  #define V4L2_BUF_FLAG_NO_CACHE_CLEAN		0x00001000
>>  /* Timestamp type */
>>  #define V4L2_BUF_FLAG_TIMESTAMP_MASK		0x0000e000
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 05/11] v4l2-core: Don't sync cache for a buffer if so requested
  2015-09-11 11:50 ` [RFC RESEND 05/11] v4l2-core: Don't sync cache for a buffer if so requested Sakari Ailus
@ 2015-09-11 17:12   ` Hans Verkuil
  2015-09-15  8:22     ` Sakari Ailus
  2016-12-15 20:37   ` Laurent Pinchart
  1 sibling, 1 reply; 38+ messages in thread
From: Hans Verkuil @ 2015-09-11 17:12 UTC (permalink / raw)
  To: Sakari Ailus, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott, Samu Onkalo

On 09/11/2015 01:50 PM, Sakari Ailus wrote:
> From: Samu Onkalo <samu.onkalo@intel.com>
> 
> The user may request to the driver (vb2) to skip the cache maintenance
> operations in case the buffer does not need cache synchronisation, e.g. in
> cases where the buffer is passed between hardware blocks without it being
> touched by the CPU.
> 
> Also document that the prepare and finish vb2_mem_ops might not get called
> every time the buffer ownership changes between the kernel and the user
> space.
> 
> Signed-off-by: Samu Onkalo <samu.onkalo@intel.com>
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  drivers/media/v4l2-core/videobuf2-core.c | 52 +++++++++++++++++++++++++-------
>  include/media/videobuf2-core.h           | 12 +++++---
>  2 files changed, 49 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
> index c5c0707a..b664024 100644
> --- a/drivers/media/v4l2-core/videobuf2-core.c
> +++ b/drivers/media/v4l2-core/videobuf2-core.c
> @@ -187,6 +187,28 @@ static void __vb2_queue_cancel(struct vb2_queue *q);
>  static void __enqueue_in_driver(struct vb2_buffer *vb);
>  
>  /**
> + * __mem_prepare_planes() - call finish mem op for all planes of the buffer
> + */
> +static void __mem_prepare_planes(struct vb2_buffer *vb)
> +{
> +	unsigned int plane;
> +
> +	for (plane = 0; plane < vb->num_planes; ++plane)
> +		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> +}
> +
> +/**
> + * __mem_finish_planes() - call finish mem op for all planes of the buffer
> + */
> +static void __mem_finish_planes(struct vb2_buffer *vb)
> +{
> +	unsigned int plane;
> +
> +	for (plane = 0; plane < vb->num_planes; ++plane)
> +		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> +}
> +
> +/**
>   * __vb2_buf_mem_alloc() - allocate video memory for the given buffer
>   */
>  static int __vb2_buf_mem_alloc(struct vb2_buffer *vb)
> @@ -1391,6 +1413,10 @@ static void __fill_vb2_buffer(struct vb2_buffer *vb, const struct v4l2_buffer *b
>  static int __prepare_mmap(struct vb2_buffer *vb, const struct v4l2_buffer *b)
>  {
>  	__fill_vb2_buffer(vb, b, vb->v4l2_planes);
> +
> +	if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC))
> +		__mem_prepare_planes(vb);
> +
>  	return call_vb_qop(vb, buf_prepare, vb);
>  }
>  
> @@ -1476,6 +1502,11 @@ static int __prepare_userptr(struct vb2_buffer *vb,
>  			dprintk(1, "buffer initialization failed\n");
>  			goto err;
>  		}
> +
> +		/* This is new buffer memory --- always synchronise cache. */
> +		__mem_prepare_planes(vb);
> +	} else if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC)) {
> +		__mem_prepare_planes(vb);
>  	}
>  
>  	ret = call_vb_qop(vb, buf_prepare, vb);
> @@ -1601,6 +1632,11 @@ static int __prepare_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b)
>  			dprintk(1, "buffer initialization failed\n");
>  			goto err;
>  		}
> +
> +		/* This is new buffer memory --- always synchronise cache. */
> +		__mem_prepare_planes(vb);
> +	} else if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC)) {
> +		__mem_prepare_planes(vb);
>  	}
>  
>  	ret = call_vb_qop(vb, buf_prepare, vb);
> @@ -1624,7 +1660,6 @@ err:
>  static void __enqueue_in_driver(struct vb2_buffer *vb)
>  {
>  	struct vb2_queue *q = vb->vb2_queue;
> -	unsigned int plane;
>  
>  	vb->state = VB2_BUF_STATE_ACTIVE;
>  	atomic_inc(&q->owned_by_drv_count);
> @@ -1691,10 +1726,6 @@ static int __buf_prepare(struct vb2_buffer *vb, const struct v4l2_buffer *b)
>  		return ret;
>  	}
>  
> -	/* sync buffers */
> -	for (plane = 0; plane < vb->num_planes; ++plane)
> -		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> -

Why not keep the prepare memop call here? Why push it down into the __prepare_*
functions?

It is wrong too, since the prepare memop should be called *after* the buf_prepare
callback (buf_prepare should be allowed to touch the buffer for fixups for video
output).

With the begin/end_cpu_access() code from my patch series buf_prepare would handle
this correctly, but we might be doing unnecessary syncs.

The more I review this, the more I think that the first step should be to take my
begin/end_cpu_access patch series. With that in place it is at least explicit if a
driver needs cpu access to a buffer.

And for USB drivers that use videobuf2-vmalloc this means that they always need cpu
access between the buf_prepare() and the buf_finish() calls. After all, they are
copying data from URBs into the buffer.

Regards,

	Hans

>  	vb->state = VB2_BUF_STATE_PREPARED;
>  
>  	return 0;
> @@ -2078,7 +2109,7 @@ EXPORT_SYMBOL_GPL(vb2_wait_for_all_buffers);
>  /**
>   * __vb2_dqbuf() - bring back the buffer to the DEQUEUED state
>   */
> -static void __vb2_dqbuf(struct vb2_buffer *vb)
> +static void __vb2_dqbuf(struct vb2_buffer *vb, bool no_cache_sync)
>  {
>  	struct vb2_queue *q = vb->vb2_queue;
>  	unsigned int plane;
> @@ -2089,9 +2120,8 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
>  
>  	vb->state = VB2_BUF_STATE_DEQUEUED;
>  
> -	/* sync buffers */
> -	for (plane = 0; plane < vb->num_planes; plane++)
> -		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> +	if (!no_cache_sync)
> +		__mem_finish_planes(vb);
>  
>  	/* unmap DMABUF buffer */
>  	if (q->memory == V4L2_MEMORY_DMABUF)
> @@ -2143,7 +2173,7 @@ static int vb2_internal_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool n
>  	    vb->v4l2_buf.flags & V4L2_BUF_FLAG_LAST)
>  		q->last_buffer_dequeued = true;
>  	/* go back to dequeued state */
> -	__vb2_dqbuf(vb);
> +	__vb2_dqbuf(vb, b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC);
>  
>  	dprintk(1, "dqbuf of buffer %d, with state %d\n",
>  			vb->v4l2_buf.index, vb->state);
> @@ -2246,7 +2276,7 @@ static void __vb2_queue_cancel(struct vb2_queue *q)
>  			vb->state = VB2_BUF_STATE_PREPARED;
>  			call_void_vb_qop(vb, buf_finish, vb);
>  		}
> -		__vb2_dqbuf(vb);
> +		__vb2_dqbuf(vb, false);
>  	}
>  }
>  
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index 4f7f7ae..a825bd5 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -59,10 +59,14 @@ struct vb2_threadio_data;
>   *		dmabuf.
>   * @unmap_dmabuf: releases access control to the dmabuf - allocator is notified
>   *		  that this driver is done using the dmabuf for now.
> - * @prepare:	called every time the buffer is passed from userspace to the
> - *		driver, useful for cache synchronisation, optional.
> - * @finish:	called every time the buffer is passed back from the driver
> - *		to the userspace, also optional.
> + * @prepare:	Called on the plane when the buffer ownership is passed from
> + *		the user space to the kernel and the plane must be cache
> + *		syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may
> + *		be used to skip this call. Optional.
> + * @finish:	Called on the plane when the buffer ownership is passed from
> + *		the kernel to the user space and the plane must be cache
> + *		syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may
> + *		be used to skip this call. Optional.
>   * @vaddr:	return a kernel virtual address to a given memory buffer
>   *		associated with the passed private structure or NULL if no
>   *		such mapping exists.
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 06/11] vb2: Improve struct vb2_mem_ops documentation; alloc and put are for MMAP
  2015-09-11 11:50 ` [RFC RESEND 06/11] vb2: Improve struct vb2_mem_ops documentation; alloc and put are for MMAP Sakari Ailus
@ 2015-09-11 17:13   ` Hans Verkuil
  2016-12-15 20:50   ` Laurent Pinchart
  1 sibling, 0 replies; 38+ messages in thread
From: Hans Verkuil @ 2015-09-11 17:13 UTC (permalink / raw)
  To: Sakari Ailus, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott

On 09/11/2015 01:50 PM, Sakari Ailus wrote:
> The alloc() and put() ops are for MMAP buffers only. Document it.
> 
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>'

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

> ---
>  include/media/videobuf2-core.h | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index a825bd5..efc9a19 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -24,16 +24,16 @@ struct vb2_threadio_data;
>  
>  /**
>   * struct vb2_mem_ops - memory handling/memory allocator operations
> - * @alloc:	allocate video memory and, optionally, allocator private data,
> - *		return NULL on failure or a pointer to allocator private,
> - *		per-buffer data on success; the returned private structure
> - *		will then be passed as buf_priv argument to other ops in this
> - *		structure. Additional gfp_flags to use when allocating the
> - *		are also passed to this operation. These flags are from the
> - *		gfp_flags field of vb2_queue.
> - * @put:	inform the allocator that the buffer will no longer be used;
> - *		usually will result in the allocator freeing the buffer (if
> - *		no other users of this buffer are present); the buf_priv
> + * @alloc:	allocate video memory for an MMAP buffer and, optionally,
> + *		allocator private data, return NULL on failure or a pointer
> + *		to allocator private, per-buffer data on success; the returned
> + *		private structure will then be passed as buf_priv argument to
> + *		other ops in this structure. Additional gfp_flags to use when
> + *		allocating the are also passed to this operation. These flags
> + *		are from the gfp_flags field of vb2_queue.
> + * @put:	inform the allocator that the MMAP buffer will no longer be
> + *		used; usually will result in the allocator freeing the buffer
> + *		(if no other users of this buffer are present); the buf_priv
>   *		argument is the allocator private per-buffer structure
>   *		previously returned from the alloc callback.
>   * @get_userptr: acquire userspace memory for a hardware operation; used for
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 07/11] vb2: dma-contig: Remove redundant sgt_base field
  2015-09-11 11:50 ` [RFC RESEND 07/11] vb2: dma-contig: Remove redundant sgt_base field Sakari Ailus
@ 2015-09-11 17:28   ` Hans Verkuil
  2015-09-15  8:26     ` Sakari Ailus
  2016-12-15 21:08   ` Laurent Pinchart
  1 sibling, 1 reply; 38+ messages in thread
From: Hans Verkuil @ 2015-09-11 17:28 UTC (permalink / raw)
  To: Sakari Ailus, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott

On 09/11/2015 01:50 PM, Sakari Ailus wrote:
> The struct vb2_dc_buf contains two struct sg_table fields: sgt_base and
> dma_sgt. The former is used by DMA-BUF buffers whereas the latter is used
> by USERPTR.
> 
> Unify the two, leaving dma_sgt.
> 
> MMAP buffers do not need cache flushing since they have been allocated
> using dma_alloc_coherent().

I would have to see this again after it is rebased on 4.3-rc1. That will contain
Jan Kara's vb2 changes which might well affect this patch.

Are there use-cases where we want to allocate non-coherent memory? I know we
don't support this today, but is this something we might want in the future?

Just curious.

	Hans

> 
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  drivers/media/v4l2-core/videobuf2-dma-contig.c | 21 ++++++++++-----------
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> index 94c1e64..26a0a0f 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -36,7 +36,6 @@ struct vb2_dc_buf {
>  	/* MMAP related */
>  	struct vb2_vmarea_handler	handler;
>  	atomic_t			refcount;
> -	struct sg_table			*sgt_base;
>  
>  	/* USERPTR related */
>  	struct vm_area_struct		*vma;
> @@ -117,7 +116,7 @@ static void vb2_dc_prepare(void *buf_priv)
>  	struct sg_table *sgt = buf->dma_sgt;
>  
>  	/* DMABUF exporter will flush the cache for us */
> -	if (!sgt || buf->db_attach)
> +	if (!buf->vma)
>  		return;
>  
>  	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> @@ -129,7 +128,7 @@ static void vb2_dc_finish(void *buf_priv)
>  	struct sg_table *sgt = buf->dma_sgt;
>  
>  	/* DMABUF exporter will flush the cache for us */
> -	if (!sgt || buf->db_attach)
> +	if (!buf->vma)
>  		return;
>  
>  	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> @@ -146,9 +145,9 @@ static void vb2_dc_put(void *buf_priv)
>  	if (!atomic_dec_and_test(&buf->refcount))
>  		return;
>  
> -	if (buf->sgt_base) {
> -		sg_free_table(buf->sgt_base);
> -		kfree(buf->sgt_base);
> +	if (buf->dma_sgt) {
> +		sg_free_table(buf->dma_sgt);
> +		kfree(buf->dma_sgt);
>  	}
>  	dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);
>  	put_device(buf->dev);
> @@ -252,13 +251,13 @@ static int vb2_dc_dmabuf_ops_attach(struct dma_buf *dbuf, struct device *dev,
>  	/* Copy the buf->base_sgt scatter list to the attachment, as we can't
>  	 * map the same scatter list to multiple attachments at the same time.
>  	 */
> -	ret = sg_alloc_table(sgt, buf->sgt_base->orig_nents, GFP_KERNEL);
> +	ret = sg_alloc_table(sgt, buf->dma_sgt->orig_nents, GFP_KERNEL);
>  	if (ret) {
>  		kfree(attach);
>  		return -ENOMEM;
>  	}
>  
> -	rd = buf->sgt_base->sgl;
> +	rd = buf->dma_sgt->sgl;
>  	wr = sgt->sgl;
>  	for (i = 0; i < sgt->orig_nents; ++i) {
>  		sg_set_page(wr, sg_page(rd), rd->length, rd->offset);
> @@ -409,10 +408,10 @@ static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, unsigned long flags)
>  	exp_info.flags = flags;
>  	exp_info.priv = buf;
>  
> -	if (!buf->sgt_base)
> -		buf->sgt_base = vb2_dc_get_base_sgt(buf);
> +	if (!buf->dma_sgt)
> +		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
>  
> -	if (WARN_ON(!buf->sgt_base))
> +	if (WARN_ON(!buf->dma_sgt))
>  		return NULL;
>  
>  	dbuf = dma_buf_export(&exp_info);
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler
  2015-09-11 16:25   ` Hans Verkuil
@ 2015-09-15  7:51     ` Sakari Ailus
  2016-12-15 23:47       ` Laurent Pinchart
  0 siblings, 1 reply; 38+ messages in thread
From: Sakari Ailus @ 2015-09-15  7:51 UTC (permalink / raw)
  To: Hans Verkuil, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott

Hi Hans,

Thank you for the review!

Hans Verkuil wrote:
> On 09/11/2015 01:50 PM, Sakari Ailus wrote:
>> The cache synchronisation may be a time consuming operation and thus not
>> best performed in an interrupt which is a typical context for
>> vb2_buffer_done() calls. This may consume up to tens of ms on some
>> machines, depending on the buffer size.
>>
>> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
>> ---
>>  drivers/media/v4l2-core/videobuf2-core.c | 20 ++++++++++----------
>>  1 file changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
>> index 64fce4d..c5c0707a 100644
>> --- a/drivers/media/v4l2-core/videobuf2-core.c
>> +++ b/drivers/media/v4l2-core/videobuf2-core.c
>> @@ -1177,7 +1177,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>>  {
>>  	struct vb2_queue *q = vb->vb2_queue;
>>  	unsigned long flags;
>> -	unsigned int plane;
>>  
>>  	if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
>>  		return;
>> @@ -1197,10 +1196,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>>  	dprintk(4, "done processing on buffer %d, state: %d\n",
>>  			vb->v4l2_buf.index, state);
>>  
>> -	/* sync buffers */
>> -	for (plane = 0; plane < vb->num_planes; ++plane)
>> -		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
>> -
> 
> Ah, OK, so it is removed here,

I can merge the two patches for the next version if you prefer that.

> 
>>  	/* Add the buffer to the done buffers list */
>>  	spin_lock_irqsave(&q->done_lock, flags);
>>  	vb->state = state;
>> @@ -2086,7 +2081,7 @@ EXPORT_SYMBOL_GPL(vb2_wait_for_all_buffers);
>>  static void __vb2_dqbuf(struct vb2_buffer *vb)
>>  {
>>  	struct vb2_queue *q = vb->vb2_queue;
>> -	unsigned int i;
>> +	unsigned int plane;
>>  
>>  	/* nothing to do if the buffer is already dequeued */
>>  	if (vb->state == VB2_BUF_STATE_DEQUEUED)
>> @@ -2094,13 +2089,18 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
>>  
>>  	vb->state = VB2_BUF_STATE_DEQUEUED;
>>  
>> +	/* sync buffers */
>> +	for (plane = 0; plane < vb->num_planes; plane++)
>> +		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
>> +
> 
> to here.
> 
> I'm not sure if this is correct... So __vb2_dqbuf is called from __vb2_queue_cancel(),
> but now the buf_finish() callback is called *before* the memop finish() callback,
> where this was the other way around in __vb2_queue_cancel(). I don't think that is
> right since buf_finish() expects that the buffer is synced for the cpu.

I don't mind reordering them. The vb->state will be different as
__vb2_dqbuf() has already been called, there's at least one buffer state
check in a driver. However, __vb2_dqbuf() unconditionally sets the
buffer state to DEQUEUED, overriding e.g. ERROR which a driver would be
interested in.

I think the cache sync needs to be moved out of __vb2_dqbuf() to the
same level where it's called so proper ordering can be maintained while
still flushing cache before buf_finish() is called.

> 
> Was this tested with CONFIG_VIDEO_ADV_DEBUG set and with 'v4l2-compliance -s'?
> Not that that would help if things are done in the wrong order...

I'll do that the next time.

-- 
Regards,

Sakari Ailus
sakari.ailus@linux.intel.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 05/11] v4l2-core: Don't sync cache for a buffer if so requested
  2015-09-11 17:12   ` Hans Verkuil
@ 2015-09-15  8:22     ` Sakari Ailus
  2015-09-15  9:06       ` Hans Verkuil
  0 siblings, 1 reply; 38+ messages in thread
From: Sakari Ailus @ 2015-09-15  8:22 UTC (permalink / raw)
  To: Hans Verkuil, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott, Samu Onkalo

Hi Hans,

Hans Verkuil wrote:
> On 09/11/2015 01:50 PM, Sakari Ailus wrote:
>> From: Samu Onkalo <samu.onkalo@intel.com>
>>
>> The user may request to the driver (vb2) to skip the cache maintenance
>> operations in case the buffer does not need cache synchronisation, e.g. in
>> cases where the buffer is passed between hardware blocks without it being
>> touched by the CPU.
>>
>> Also document that the prepare and finish vb2_mem_ops might not get called
>> every time the buffer ownership changes between the kernel and the user
>> space.
>>
>> Signed-off-by: Samu Onkalo <samu.onkalo@intel.com>
>> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
>> ---
>>  drivers/media/v4l2-core/videobuf2-core.c | 52 +++++++++++++++++++++++++-------
>>  include/media/videobuf2-core.h           | 12 +++++---
>>  2 files changed, 49 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
>> index c5c0707a..b664024 100644
>> --- a/drivers/media/v4l2-core/videobuf2-core.c
>> +++ b/drivers/media/v4l2-core/videobuf2-core.c
>> @@ -187,6 +187,28 @@ static void __vb2_queue_cancel(struct vb2_queue *q);
>>  static void __enqueue_in_driver(struct vb2_buffer *vb);
>>  
>>  /**
>> + * __mem_prepare_planes() - call finish mem op for all planes of the buffer
>> + */
>> +static void __mem_prepare_planes(struct vb2_buffer *vb)
>> +{
>> +	unsigned int plane;
>> +
>> +	for (plane = 0; plane < vb->num_planes; ++plane)
>> +		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
>> +}
>> +
>> +/**
>> + * __mem_finish_planes() - call finish mem op for all planes of the buffer
>> + */
>> +static void __mem_finish_planes(struct vb2_buffer *vb)
>> +{
>> +	unsigned int plane;
>> +
>> +	for (plane = 0; plane < vb->num_planes; ++plane)
>> +		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
>> +}
>> +
>> +/**
>>   * __vb2_buf_mem_alloc() - allocate video memory for the given buffer
>>   */
>>  static int __vb2_buf_mem_alloc(struct vb2_buffer *vb)
>> @@ -1391,6 +1413,10 @@ static void __fill_vb2_buffer(struct vb2_buffer *vb, const struct v4l2_buffer *b
>>  static int __prepare_mmap(struct vb2_buffer *vb, const struct v4l2_buffer *b)
>>  {
>>  	__fill_vb2_buffer(vb, b, vb->v4l2_planes);
>> +
>> +	if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC))
>> +		__mem_prepare_planes(vb);
>> +
>>  	return call_vb_qop(vb, buf_prepare, vb);
>>  }
>>  
>> @@ -1476,6 +1502,11 @@ static int __prepare_userptr(struct vb2_buffer *vb,
>>  			dprintk(1, "buffer initialization failed\n");
>>  			goto err;
>>  		}
>> +
>> +		/* This is new buffer memory --- always synchronise cache. */
>> +		__mem_prepare_planes(vb);
>> +	} else if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC)) {
>> +		__mem_prepare_planes(vb);
>>  	}
>>  
>>  	ret = call_vb_qop(vb, buf_prepare, vb);
>> @@ -1601,6 +1632,11 @@ static int __prepare_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer *b)
>>  			dprintk(1, "buffer initialization failed\n");
>>  			goto err;
>>  		}
>> +
>> +		/* This is new buffer memory --- always synchronise cache. */
>> +		__mem_prepare_planes(vb);
>> +	} else if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC)) {
>> +		__mem_prepare_planes(vb);
>>  	}
>>  
>>  	ret = call_vb_qop(vb, buf_prepare, vb);
>> @@ -1624,7 +1660,6 @@ err:
>>  static void __enqueue_in_driver(struct vb2_buffer *vb)
>>  {
>>  	struct vb2_queue *q = vb->vb2_queue;
>> -	unsigned int plane;
>>  
>>  	vb->state = VB2_BUF_STATE_ACTIVE;
>>  	atomic_inc(&q->owned_by_drv_count);
>> @@ -1691,10 +1726,6 @@ static int __buf_prepare(struct vb2_buffer *vb, const struct v4l2_buffer *b)
>>  		return ret;
>>  	}
>>  
>> -	/* sync buffers */
>> -	for (plane = 0; plane < vb->num_planes; ++plane)
>> -		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
>> -
> 
> Why not keep the prepare memop call here? Why push it down into the __prepare_*
> functions?

I think flushing the buffer the first time it's used is a good thing to
do regardless of what the user suggests, since it's freshly allocated
memory and there could data in the CPU write cache. Once streaming is
ongoing, the user process owns the buffer memory.

> 
> It is wrong too, since the prepare memop should be called *after* the buf_prepare
> callback (buf_prepare should be allowed to touch the buffer for fixups for video
> output).

This is a different matter. I'll reorder the calls.

> 
> With the begin/end_cpu_access() code from my patch series buf_prepare would handle
> this correctly, but we might be doing unnecessary syncs.
> 
> The more I review this, the more I think that the first step should be to take my
> begin/end_cpu_access patch series. With that in place it is at least explicit if a
> driver needs cpu access to a buffer.

Could you rebase and re-post what's not in upstream of that set, please?

> And for USB drivers that use videobuf2-vmalloc this means that they always need cpu
> access between the buf_prepare() and the buf_finish() calls. After all, they are
> copying data from URBs into the buffer.

Indeed.

-- 
Kind regards,

Sakari Ailus
sakari.ailus@linux.intel.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 07/11] vb2: dma-contig: Remove redundant sgt_base field
  2015-09-11 17:28   ` Hans Verkuil
@ 2015-09-15  8:26     ` Sakari Ailus
  0 siblings, 0 replies; 38+ messages in thread
From: Sakari Ailus @ 2015-09-15  8:26 UTC (permalink / raw)
  To: Hans Verkuil, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott

Hi Hans,

Hans Verkuil wrote:
> On 09/11/2015 01:50 PM, Sakari Ailus wrote:
>> The struct vb2_dc_buf contains two struct sg_table fields: sgt_base and
>> dma_sgt. The former is used by DMA-BUF buffers whereas the latter is used
>> by USERPTR.
>>
>> Unify the two, leaving dma_sgt.
>>
>> MMAP buffers do not need cache flushing since they have been allocated
>> using dma_alloc_coherent().
> 
> I would have to see this again after it is rebased on 4.3-rc1. That will contain
> Jan Kara's vb2 changes which might well affect this patch.

Ok. I'll do that.

> 
> Are there use-cases where we want to allocate non-coherent memory? I know we
> don't support this today, but is this something we might want in the future?

Yes. Not all hardware supports coherent memory access, not even on x86.
Or coherent memory (sometimes uncached!) may be slower than non-coherent
access, including the time it takes to flush the cache first.

-- 
Kind regards,

Sakari Ailus
sakari.ailus@linux.intel.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 05/11] v4l2-core: Don't sync cache for a buffer if so requested
  2015-09-15  8:22     ` Sakari Ailus
@ 2015-09-15  9:06       ` Hans Verkuil
  0 siblings, 0 replies; 38+ messages in thread
From: Hans Verkuil @ 2015-09-15  9:06 UTC (permalink / raw)
  To: Sakari Ailus, linux-media
  Cc: pawel, m.szyprowski, kyungmin.park, sumit.semwal, robdclark,
	daniel.vetter, labbott, Samu Onkalo

On 09/15/15 10:22, Sakari Ailus wrote:
> Could you rebase and re-post what's not in upstream of that set, please?

Done. Available here:

http://git.linuxtv.org/cgit.cgi/hverkuil/media_tree.git/log/?h=vb2-cpu-access

These drivers need work to replace vb2_plane_vaddr by
vb2_plane_begin/end_cpu_access():

$ git grep vb2_plane_vaddr
drivers/media/pci/netup_unidvb/netup_unidvb_core.c:     u8 *p = vb2_plane_vaddr(&buf->vb, 0);
drivers/media/platform/coda/coda-bit.c: n = kfifo_in(&ctx->bitstream_fifo, vb2_plane_vaddr(src_buf, 0),
drivers/media/platform/coda/coda-bit.c: if (vb2_plane_vaddr(src_buf, 0) == NULL) {
drivers/media/platform/coda/coda-bit.c:         memset(vb2_plane_vaddr(buf, 0), 0, 64);
drivers/media/platform/coda/coda-bit.c:                 if (((char *)vb2_plane_vaddr(buf, 0))[i] != 0)
drivers/media/platform/coda/coda-bit.c: memcpy(header, vb2_plane_vaddr(buf, 0), *size);
drivers/media/platform/coda/coda-bit.c:         memcpy(vb2_plane_vaddr(dst_buf, 0),
drivers/media/platform/coda/coda-bit.c:         memcpy(vb2_plane_vaddr(dst_buf, 0) + ctx->vpu_header_size[0],
drivers/media/platform/coda/coda-bit.c:         memcpy(vb2_plane_vaddr(dst_buf, 0) + ctx->vpu_header_size[0] +
drivers/media/platform/coda/coda-jpeg.c:        void *vaddr = vb2_plane_vaddr(vb, 0);
drivers/media/platform/exynos4-is/fimc-capture.c:                       vaddr = vb2_plane_vaddr(&v_buf->vb, plane);
drivers/media/platform/rcar_jpu.c:              void *buffer = vb2_plane_vaddr(vb, 0);
drivers/media/platform/rcar_jpu.c:      buffer = vb2_plane_vaddr(vb, 0);
drivers/media/usb/au0828/au0828-vbi.c:  buf->mem = vb2_plane_vaddr(vb, 0);
drivers/media/usb/au0828/au0828-video.c:                outp = vb2_plane_vaddr(&buf->vb, 0);
drivers/media/usb/au0828/au0828-video.c:                vbioutp = vb2_plane_vaddr(&vbi_buf->vb, 0);
drivers/media/usb/au0828/au0828-video.c:                                        vbioutp = vb2_plane_vaddr(
drivers/media/usb/au0828/au0828-video.c:                                        outp = vb2_plane_vaddr(&buf->vb, 0);
drivers/media/usb/au0828/au0828-video.c:        buf->mem = vb2_plane_vaddr(vb, 0);
drivers/media/usb/au0828/au0828-video.c:                vid_data = vb2_plane_vaddr(&buf->vb, 0);
drivers/media/usb/au0828/au0828-video.c:                vbi_data = vb2_plane_vaddr(&buf->vb, 0);

The coda and exynos4-is I never converted and the others are new.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 01/11] vb2: Rename confusingly named internal buffer preparation functions
  2015-09-11 11:50 ` [RFC RESEND 01/11] vb2: Rename confusingly named internal buffer preparation functions Sakari Ailus
  2015-09-11 15:57   ` Hans Verkuil
@ 2016-12-15 15:40   ` Laurent Pinchart
  1 sibling, 0 replies; 38+ messages in thread
From: Laurent Pinchart @ 2016-12-15 15:40 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: linux-media, pawel, m.szyprowski, kyungmin.park, hverkuil,
	sumit.semwal, robdclark, daniel.vetter, labbott

Hi Sakari,

Thank you for the patch.

On Friday 11 Sep 2015 14:50:24 Sakari Ailus wrote:
> Rename __qbuf_*() functions which are specific to a buffer type as
> __prepare_*() which matches with what they do. The naming was there for
> historical reasons; the purpose of the functions was changed without
> renaming them.
> 
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

> ---
>  drivers/media/v4l2-core/videobuf2-core.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-core.c
> b/drivers/media/v4l2-core/videobuf2-core.c index b866a6b..af6a23a 100644
> --- a/drivers/media/v4l2-core/videobuf2-core.c
> +++ b/drivers/media/v4l2-core/videobuf2-core.c
> @@ -1391,18 +1391,19 @@ static void __fill_vb2_buffer(struct vb2_buffer *vb,
> const struct v4l2_buffer *b }
> 
>  /**
> - * __qbuf_mmap() - handle qbuf of an MMAP buffer
> + * __prepare_mmap() - prepare an MMAP buffer
>   */
> -static int __qbuf_mmap(struct vb2_buffer *vb, const struct v4l2_buffer *b)
> +static int __prepare_mmap(struct vb2_buffer *vb, const struct v4l2_buffer
> *b) {
>  	__fill_vb2_buffer(vb, b, vb->v4l2_planes);
>  	return call_vb_qop(vb, buf_prepare, vb);
>  }
> 
>  /**
> - * __qbuf_userptr() - handle qbuf of a USERPTR buffer
> + * __prepare_userptr() - prepare a USERPTR buffer
>   */
> -static int __qbuf_userptr(struct vb2_buffer *vb, const struct v4l2_buffer
> *b) +static int __prepare_userptr(struct vb2_buffer *vb,
> +			     const struct v4l2_buffer *b)
>  {
>  	struct v4l2_plane planes[VIDEO_MAX_PLANES];
>  	struct vb2_queue *q = vb->vb2_queue;
> @@ -1504,9 +1505,9 @@ err:
>  }
> 
>  /**
> - * __qbuf_dmabuf() - handle qbuf of a DMABUF buffer
> + * __prepare_dmabuf() - prepare a DMABUF buffer
>   */
> -static int __qbuf_dmabuf(struct vb2_buffer *vb, const struct v4l2_buffer
> *b) +static int __prepare_dmabuf(struct vb2_buffer *vb, const struct
> v4l2_buffer *b) {
>  	struct v4l2_plane planes[VIDEO_MAX_PLANES];
>  	struct vb2_queue *q = vb->vb2_queue;
> @@ -1678,15 +1679,15 @@ static int __buf_prepare(struct vb2_buffer *vb,
> const struct v4l2_buffer *b)
> 
>  	switch (q->memory) {
>  	case V4L2_MEMORY_MMAP:
> -		ret = __qbuf_mmap(vb, b);
> +		ret = __prepare_mmap(vb, b);
>  		break;
>  	case V4L2_MEMORY_USERPTR:
>  		down_read(&current->mm->mmap_sem);
> -		ret = __qbuf_userptr(vb, b);
> +		ret = __prepare_userptr(vb, b);
>  		up_read(&current->mm->mmap_sem);
>  		break;
>  	case V4L2_MEMORY_DMABUF:
> -		ret = __qbuf_dmabuf(vb, b);
> +		ret = __prepare_dmabuf(vb, b);
>  		break;
>  	default:
>  		WARN(1, "Invalid queue type\n");

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 04/11] v4l: Unify cache management hint buffer flags
  2015-09-11 11:50 ` [RFC RESEND 04/11] v4l: Unify cache management hint buffer flags Sakari Ailus
  2015-09-11 16:26   ` Hans Verkuil
@ 2016-12-15 20:15   ` Laurent Pinchart
  2016-12-17  0:35     ` Sakari Ailus
  1 sibling, 1 reply; 38+ messages in thread
From: Laurent Pinchart @ 2016-12-15 20:15 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: linux-media, pawel, m.szyprowski, kyungmin.park, hverkuil,
	sumit.semwal, robdclark, daniel.vetter, labbott

Hi Sakari,

Thank you for the patch.

On Friday 11 Sep 2015 14:50:27 Sakari Ailus wrote:
> The V4L2_BUF_FLAG_NO_CACHE_INVALIDATE and V4L2_BUF_FLAG_NO_CACHE_CLEAN
> buffer flags are currently not used by the kernel. Replace the definitions
> by a single V4L2_BUF_FLAG_NO_CACHE_SYNC flag to be used by further
> patches.
> 
> Different cache architectures should not be visible to the user space
> which can make no meaningful use of the differences anyway. In case a
> device can make use of non-coherent memory accesses, the necessary cache
> operations depend on the CPU architecture and the buffer type, not the
> requests of the user. The cache operation itself may be skipped on the
> user's request which was the purpose of the two flags.
> 
> On ARM the invalidate and clean are separate operations whereas on
> x86(-64) the two are a single operation (flush). Whether the hardware uses
> the buffer for reading (V4L2_BUF_TYPE_*_OUTPUT*) or writing
> (V4L2_BUF_TYPE_*CAPTURE*) already defines the required cache operation
> (clean and invalidate, respectively). No user input is required.

We need to perform the following operations.

	| QBUF		| DQBUF
-----------------------------------------------
CAPTURE	| Invalidate	| Invalidate (*)
OUTPUT	| Clean		| -

(*) for systems using speculative pre-fetching only.

The following optimizations are possible:

1. CAPTURE, the CPU has not written to the buffer before QBUF

Cache invalidation can be skipped at QBUF time, but becomes required at DQBUF 
time on all systems, regardless of whether they use speculative prefetching.

2. CAPTURE, the CPU will not read from the buffer after DQBUF

Cache invalidation can be skipped at DQBUF time.

3. CAPTURE, combination of (1) and (2)

Cache invalidation can be skipped at both QBUF and DQBUF time.

4. OUTPUT, the CPU has not written to the buffer before QBUF

Cache clean can be skipped at QBUF time.


A single flag can cover all cases, provided we keep track of the flag being 
set at QBUF time to force cache invalidation at DQBUF time for case (1) if the 
flag isn't set at DQBUF time.

One issue is that cache invalidation at DQBUF time for CAPTURE buffers isn't 
fully under the control of videobuf. We can instruct the DMA mapping API to 
skip cache handling, but we can't ask it to force cache invalidation in the 
sync_for_cpu operation for non speculative prefetching systems. On ARM32 the 
implementation currently always invalidates the cache in 
__dma_page_dev_to_cpu() for CAPTURE buffers so we're currently safe, but 
there's a FIXME comment that might lead to someone fixing the implementation 
in the future. I believe we'll have to fix this in the DMA mapping level, 
userspace shouldn't be affected.

Feel free to capture (part of) this explanation in the commit message to 
clarify your last paragraph.

> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  Documentation/DocBook/media/v4l/io.xml | 25 +++++++++++--------------
>  include/trace/events/v4l2.h            |  3 +--
>  include/uapi/linux/videodev2.h         |  7 +++++--
>  3 files changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/Documentation/DocBook/media/v4l/io.xml
> b/Documentation/DocBook/media/v4l/io.xml index 7bbc2a4..4facd63 100644
> --- a/Documentation/DocBook/media/v4l/io.xml
> +++ b/Documentation/DocBook/media/v4l/io.xml
> @@ -1112,21 +1112,18 @@ application. Drivers set or clear this flag when the
> linkend="vidioc-qbuf">VIDIOC_DQBUF</link> ioctl is called.</entry> </row>
>  	  <row>
> -	    
<entry><constant>V4L2_BUF_FLAG_NO_CACHE_INVALIDATE</constant></entry>
> +	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_SYNC</constant></entry>
>  	    <entry>0x00000800</entry>
> -	    <entry>Caches do not have to be invalidated for this buffer.
> -Typically applications shall use this flag if the data captured in the
> buffer -is not going to be touched by the CPU, instead the buffer will,
> probably, be -passed on to a DMA-capable hardware unit for further
> processing or output. -</entry>
> -	  </row>
> -	  <row>
> -	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_CLEAN</constant></entry>
> -	    <entry>0x00001000</entry>
> -	    <entry>Caches do not have to be cleaned for this buffer.
> -Typically applications shall use this flag for output buffers if the data
> -in this buffer has not been created by the CPU but by some DMA-capable
> unit, -in which case caches have not been used.</entry>
> +	    <entry>Do not perform CPU cache synchronisation operations
> +	    when the buffer is queued or dequeued. The user is
> +	    responsible for the correct use of this flag. It should be
> +	    only used when the buffer is not accessed using the CPU,
> +	    e.g. the buffer is written to by a hardware block and then
> +	    read by another one, in which case the flag should be set
> +	    in both <link linkend="vidioc-qbuf">VIDIOC_DQBUF</link>
> +	    and <link linkend="vidioc-qbuf">VIDIOC_QBUF</link> IOCTLs.

I'd like to word this differently. As explained above, there can be cases 
where the flag would only be set in either QBUF or DQBUF. I would prefer 
documenting the flag as a hint for the kernel that userspace has not written 
to the buffer before QBUF (when the flag is set for VIDIOC_QBUF) or will not 
read from the buffer after DQBUF (when the flag is set for VIDIOC_DQBUF) and 
that the kernel is free to perform the appropriate cache optimizations 
(without any guarantee).

> +	    The flag has no effect on some devices / architectures.
> +	    </entry>
>  	  </row>
>  	  <row>
>  	    <entry><constant>V4L2_BUF_FLAG_LAST</constant></entry>

[snip]

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 05/11] v4l2-core: Don't sync cache for a buffer if so requested
  2015-09-11 11:50 ` [RFC RESEND 05/11] v4l2-core: Don't sync cache for a buffer if so requested Sakari Ailus
  2015-09-11 17:12   ` Hans Verkuil
@ 2016-12-15 20:37   ` Laurent Pinchart
  2016-12-17  0:36     ` Sakari Ailus
  1 sibling, 1 reply; 38+ messages in thread
From: Laurent Pinchart @ 2016-12-15 20:37 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: linux-media, pawel, m.szyprowski, kyungmin.park, hverkuil,
	sumit.semwal, robdclark, daniel.vetter, labbott, Samu Onkalo

Hi Sakari,

Thank you for the patch.

On Friday 11 Sep 2015 14:50:28 Sakari Ailus wrote:
> From: Samu Onkalo <samu.onkalo@intel.com>
> 
> The user may request to the driver (vb2) to skip the cache maintenance
> operations in case the buffer does not need cache synchronisation, e.g. in
> cases where the buffer is passed between hardware blocks without it being
> touched by the CPU.
> 
> Also document that the prepare and finish vb2_mem_ops might not get called
> every time the buffer ownership changes between the kernel and the user
> space.
> 
> Signed-off-by: Samu Onkalo <samu.onkalo@intel.com>
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  drivers/media/v4l2-core/videobuf2-core.c | 52 ++++++++++++++++++++++-------
>  include/media/videobuf2-core.h           | 12 +++++---
>  2 files changed, 49 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-core.c
> b/drivers/media/v4l2-core/videobuf2-core.c index c5c0707a..b664024 100644
> --- a/drivers/media/v4l2-core/videobuf2-core.c
> +++ b/drivers/media/v4l2-core/videobuf2-core.c
> @@ -187,6 +187,28 @@ static void __vb2_queue_cancel(struct vb2_queue *q);
>  static void __enqueue_in_driver(struct vb2_buffer *vb);
> 
>  /**
> + * __mem_prepare_planes() - call finish mem op for all planes of the buffer
> + */
> +static void __mem_prepare_planes(struct vb2_buffer *vb)
> +{
> +	unsigned int plane;
> +
> +	for (plane = 0; plane < vb->num_planes; ++plane)
> +		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> +}
> +
> +/**
> + * __mem_finish_planes() - call finish mem op for all planes of the buffer
> + */
> +static void __mem_finish_planes(struct vb2_buffer *vb)
> +{
> +	unsigned int plane;
> +
> +	for (plane = 0; plane < vb->num_planes; ++plane)
> +		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> +}
> +
> +/**
>   * __vb2_buf_mem_alloc() - allocate video memory for the given buffer
>   */
>  static int __vb2_buf_mem_alloc(struct vb2_buffer *vb)
> @@ -1391,6 +1413,10 @@ static void __fill_vb2_buffer(struct vb2_buffer *vb,
> const struct v4l2_buffer *b static int __prepare_mmap(struct vb2_buffer
> *vb, const struct v4l2_buffer *b) {
>  	__fill_vb2_buffer(vb, b, vb->v4l2_planes);
> +
> +	if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC))
> +		__mem_prepare_planes(vb);
> +
>  	return call_vb_qop(vb, buf_prepare, vb);
>  }
> 
> @@ -1476,6 +1502,11 @@ static int __prepare_userptr(struct vb2_buffer *vb,
>  			dprintk(1, "buffer initialization failed\n");
>  			goto err;
>  		}
> +
> +		/* This is new buffer memory --- always synchronise cache. */
> +		__mem_prepare_planes(vb);
> +	} else if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC)) {
> +		__mem_prepare_planes(vb);
>  	}
> 
>  	ret = call_vb_qop(vb, buf_prepare, vb);
> @@ -1601,6 +1632,11 @@ static int __prepare_dmabuf(struct vb2_buffer *vb,
> const struct v4l2_buffer *b) dprintk(1, "buffer initialization failed\n");
>  			goto err;
>  		}
> +
> +		/* This is new buffer memory --- always synchronise cache. */
> +		__mem_prepare_planes(vb);
> +	} else if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC)) {
> +		__mem_prepare_planes(vb);
>  	}
> 
>  	ret = call_vb_qop(vb, buf_prepare, vb);
> @@ -1624,7 +1660,6 @@ err:
>  static void __enqueue_in_driver(struct vb2_buffer *vb)
>  {
>  	struct vb2_queue *q = vb->vb2_queue;
> -	unsigned int plane;
> 
>  	vb->state = VB2_BUF_STATE_ACTIVE;
>  	atomic_inc(&q->owned_by_drv_count);
> @@ -1691,10 +1726,6 @@ static int __buf_prepare(struct vb2_buffer *vb, const
> struct v4l2_buffer *b) return ret;
>  	}
> 
> -	/* sync buffers */
> -	for (plane = 0; plane < vb->num_planes; ++plane)
> -		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> -
>  	vb->state = VB2_BUF_STATE_PREPARED;
> 
>  	return 0;
> @@ -2078,7 +2109,7 @@ EXPORT_SYMBOL_GPL(vb2_wait_for_all_buffers);
>  /**
>   * __vb2_dqbuf() - bring back the buffer to the DEQUEUED state
>   */
> -static void __vb2_dqbuf(struct vb2_buffer *vb)
> +static void __vb2_dqbuf(struct vb2_buffer *vb, bool no_cache_sync)
>  {
>  	struct vb2_queue *q = vb->vb2_queue;
>  	unsigned int plane;
> @@ -2089,9 +2120,8 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
> 
>  	vb->state = VB2_BUF_STATE_DEQUEUED;
> 
> -	/* sync buffers */
> -	for (plane = 0; plane < vb->num_planes; plane++)
> -		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> +	if (!no_cache_sync)
> +		__mem_finish_planes(vb);
> 
>  	/* unmap DMABUF buffer */
>  	if (q->memory == V4L2_MEMORY_DMABUF)
> @@ -2143,7 +2173,7 @@ static int vb2_internal_dqbuf(struct vb2_queue *q,
> struct v4l2_buffer *b, bool n vb->v4l2_buf.flags & V4L2_BUF_FLAG_LAST)
>  		q->last_buffer_dequeued = true;
>  	/* go back to dequeued state */
> -	__vb2_dqbuf(vb);
> +	__vb2_dqbuf(vb, b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC);
> 
>  	dprintk(1, "dqbuf of buffer %d, with state %d\n",
>  			vb->v4l2_buf.index, vb->state);
> @@ -2246,7 +2276,7 @@ static void __vb2_queue_cancel(struct vb2_queue *q)
>  			vb->state = VB2_BUF_STATE_PREPARED;
>  			call_void_vb_qop(vb, buf_finish, vb);
>  		}
> -		__vb2_dqbuf(vb);
> +		__vb2_dqbuf(vb, false);
>  	}
>  }
> 
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index 4f7f7ae..a825bd5 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -59,10 +59,14 @@ struct vb2_threadio_data;
>   *		dmabuf.
>   * @unmap_dmabuf: releases access control to the dmabuf - allocator is
> notified
> *		  that this driver is done using the dmabuf for now.
> - * @prepare:	called every time the buffer is passed from userspace to the
> - *		driver, useful for cache synchronisation, optional.
> - * @finish:	called every time the buffer is passed back from the driver
> - *		to the userspace, also optional.
> + * @prepare:	Called on the plane when the buffer ownership is passed from
> + *		the user space to the kernel and the plane must be cache
> + *		syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may

s/syncronised/synchronised/

> + *		be used to skip this call. Optional.

I would say "The call might be skipped by the videobuf core when the 
V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag is set". Your wording could mean that 
the memops implementation is free to skip cache synchronization internally if 
the flag is set.

If the core skips the prepare and finish operations based on the 
V4L2_BUF_FLAG_NO_CACHE_SYNC flag then those operations must not perform 
anything else than cache synchronization. That's fine with the current 
implementation, but it should be documented as such, and I'd rename the 
operations to sync_for_cpu and sync_for dev (or something similar).

> + * @finish:	Called on the plane when the buffer ownership is passed from
> + *		the kernel to the user space and the plane must be cache
> + *		syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may
> + *		be used to skip this call. Optional.

Same comment here.

>   * @vaddr:	return a kernel virtual address to a given memory buffer
>   *		associated with the passed private structure or NULL if no
>   *		such mapping exists.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 06/11] vb2: Improve struct vb2_mem_ops documentation; alloc and put are for MMAP
  2015-09-11 11:50 ` [RFC RESEND 06/11] vb2: Improve struct vb2_mem_ops documentation; alloc and put are for MMAP Sakari Ailus
  2015-09-11 17:13   ` Hans Verkuil
@ 2016-12-15 20:50   ` Laurent Pinchart
  1 sibling, 0 replies; 38+ messages in thread
From: Laurent Pinchart @ 2016-12-15 20:50 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: linux-media, pawel, m.szyprowski, kyungmin.park, hverkuil,
	sumit.semwal, robdclark, daniel.vetter, labbott

Hi Sakari,

Thank you for the patch.

On Friday 11 Sep 2015 14:50:29 Sakari Ailus wrote:
> The alloc() and put() ops are for MMAP buffers only. Document it.
> 
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  include/media/videobuf2-core.h | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index a825bd5..efc9a19 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -24,16 +24,16 @@ struct vb2_threadio_data;
> 
>  /**
>   * struct vb2_mem_ops - memory handling/memory allocator operations
> - * @alloc:	allocate video memory and, optionally, allocator private data,
> - *		return NULL on failure or a pointer to allocator private,
> - *		per-buffer data on success; the returned private structure
> - *		will then be passed as buf_priv argument to other ops in this
> - *		structure. Additional gfp_flags to use when allocating the
> - *		are also passed to this operation. These flags are from the
> - *		gfp_flags field of vb2_queue.
> - * @put:	inform the allocator that the buffer will no longer be used;
> - *		usually will result in the allocator freeing the buffer (if
> - *		no other users of this buffer are present); the buf_priv
> + * @alloc:	allocate video memory for an MMAP buffer and, optionally,
> + *		allocator private data, return NULL on failure or a pointer
> + *		to allocator private, per-buffer data on success; the returned
> + *		private structure will then be passed as buf_priv argument to
> + *		other ops in this structure. Additional gfp_flags to use when
> + *		allocating the are also passed to this operation. These flags

s/the are/the memory are/

Apart from that,

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

> + *		are from the gfp_flags field of vb2_queue.
> + * @put:	inform the allocator that the MMAP buffer will no longer be
> + *		used; usually will result in the allocator freeing the buffer
> + *		(if no other users of this buffer are present); the buf_priv
>   *		argument is the allocator private per-buffer structure
>   *		previously returned from the alloc callback.
>   * @get_userptr: acquire userspace memory for a hardware operation; used
> for

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 07/11] vb2: dma-contig: Remove redundant sgt_base field
  2015-09-11 11:50 ` [RFC RESEND 07/11] vb2: dma-contig: Remove redundant sgt_base field Sakari Ailus
  2015-09-11 17:28   ` Hans Verkuil
@ 2016-12-15 21:08   ` Laurent Pinchart
  2016-12-17  0:40     ` Sakari Ailus
  1 sibling, 1 reply; 38+ messages in thread
From: Laurent Pinchart @ 2016-12-15 21:08 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: linux-media, pawel, m.szyprowski, kyungmin.park, hverkuil,
	sumit.semwal, robdclark, daniel.vetter, labbott

Hi Sakari,

Thank you for the patch.

On Friday 11 Sep 2015 14:50:30 Sakari Ailus wrote:
> The struct vb2_dc_buf contains two struct sg_table fields: sgt_base and
> dma_sgt. The former is used by DMA-BUF buffers whereas the latter is used
> by USERPTR.
> 
> Unify the two, leaving dma_sgt.

Looks good to me. I initially thought this would prevent exporting a USERPTR 
buffer through dmabuf, but that's forbidden by the videobuf2 core (and rightly 
so).

> MMAP buffers do not need cache flushing since they have been allocated
> using dma_alloc_coherent().

I had understood this as meaning that the patch changes the behaviour for MMAP 
buffers. How about

"The prepare and finish implementation currently skip cache sync for MMAP 
buffers and identify them based on dma_sgt being NULL. Now that dma_sgt is 
used to export the MMAP buffers the condition must be updated."

> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  drivers/media/v4l2-core/videobuf2-dma-contig.c | 21 ++++++++++-----------
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> b/drivers/media/v4l2-core/videobuf2-dma-contig.c index 94c1e64..26a0a0f
> 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -36,7 +36,6 @@ struct vb2_dc_buf {
>  	/* MMAP related */
>  	struct vb2_vmarea_handler	handler;
>  	atomic_t			refcount;
> -	struct sg_table			*sgt_base;
> 
>  	/* USERPTR related */
>  	struct vm_area_struct		*vma;
> @@ -117,7 +116,7 @@ static void vb2_dc_prepare(void *buf_priv)
>  	struct sg_table *sgt = buf->dma_sgt;
> 
>  	/* DMABUF exporter will flush the cache for us */

How about updating the comment to

	/*
	 * Skip cache sync for MMAP buffers (they don't need it) and DMABUF
	 * buffers (the exporter will sync the cache for us).
	 */

> -	if (!sgt || buf->db_attach)
> +	if (!buf->vma)
>  		return;
> 
>  	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> @@ -129,7 +128,7 @@ static void vb2_dc_finish(void *buf_priv)
>  	struct sg_table *sgt = buf->dma_sgt;
> 
>  	/* DMABUF exporter will flush the cache for us */
> -	if (!sgt || buf->db_attach)
> +	if (!buf->vma)
>  		return;
> 
>  	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> @@ -146,9 +145,9 @@ static void vb2_dc_put(void *buf_priv)
>  	if (!atomic_dec_and_test(&buf->refcount))
>  		return;
> 
> -	if (buf->sgt_base) {
> -		sg_free_table(buf->sgt_base);
> -		kfree(buf->sgt_base);
> +	if (buf->dma_sgt) {
> +		sg_free_table(buf->dma_sgt);
> +		kfree(buf->dma_sgt);
>  	}
>  	dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);
>  	put_device(buf->dev);
> @@ -252,13 +251,13 @@ static int vb2_dc_dmabuf_ops_attach(struct dma_buf
> *dbuf, struct device *dev, /* Copy the buf->base_sgt scatter list to the
> attachment, as we can't * map the same scatter list to multiple attachments
> at the same time. */
> -	ret = sg_alloc_table(sgt, buf->sgt_base->orig_nents, GFP_KERNEL);
> +	ret = sg_alloc_table(sgt, buf->dma_sgt->orig_nents, GFP_KERNEL);
>  	if (ret) {
>  		kfree(attach);
>  		return -ENOMEM;
>  	}
> 
> -	rd = buf->sgt_base->sgl;
> +	rd = buf->dma_sgt->sgl;
>  	wr = sgt->sgl;
>  	for (i = 0; i < sgt->orig_nents; ++i) {
>  		sg_set_page(wr, sg_page(rd), rd->length, rd->offset);
> @@ -409,10 +408,10 @@ static struct dma_buf *vb2_dc_get_dmabuf(void
> *buf_priv, unsigned long flags) exp_info.flags = flags;
>  	exp_info.priv = buf;
> 
> -	if (!buf->sgt_base)
> -		buf->sgt_base = vb2_dc_get_base_sgt(buf);
> +	if (!buf->dma_sgt)
> +		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
> 
> -	if (WARN_ON(!buf->sgt_base))
> +	if (WARN_ON(!buf->dma_sgt))
>  		return NULL;
> 
>  	dbuf = dma_buf_export(&exp_info);

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 08/11] vb2: dma-contig: Move vb2_dc_get_base_sgt() up
  2015-09-11 11:50 ` [RFC RESEND 08/11] vb2: dma-contig: Move vb2_dc_get_base_sgt() up Sakari Ailus
@ 2016-12-15 21:12   ` Laurent Pinchart
  0 siblings, 0 replies; 38+ messages in thread
From: Laurent Pinchart @ 2016-12-15 21:12 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: linux-media, pawel, m.szyprowski, kyungmin.park, hverkuil,
	sumit.semwal, robdclark, daniel.vetter, labbott

Hi Sakari,

Thank you for the patch.

On Friday 11 Sep 2015 14:50:31 Sakari Ailus wrote:
> Just move the function up. It'll be soon needed earlier than previously.
> 
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

I would move this patch to 09/11 though, just before the patch that requires 
it.

> ---
>  drivers/media/v4l2-core/videobuf2-dma-contig.c | 44 +++++++++++------------
>  1 file changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> b/drivers/media/v4l2-core/videobuf2-dma-contig.c index 26a0a0f..3260392
> 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -82,6 +82,28 @@ static unsigned long vb2_dc_get_contiguous_size(struct
> sg_table *sgt) return size;
>  }
> 
> +static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
> +{
> +	int ret;
> +	struct sg_table *sgt;
> +
> +	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
> +	if (!sgt) {
> +		dev_err(buf->dev, "failed to alloc sg table\n");
> +		return NULL;
> +	}
> +
> +	ret = dma_get_sgtable(buf->dev, sgt, buf->vaddr, buf->dma_addr,
> +		buf->size);
> +	if (ret < 0) {
> +		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
> +		kfree(sgt);
> +		return NULL;
> +	}
> +
> +	return sgt;
> +}
> +
>  /*********************************************/
>  /*         callbacks for all buffers         */
>  /*********************************************/
> @@ -375,28 +397,6 @@ static struct dma_buf_ops vb2_dc_dmabuf_ops = {
>  	.release = vb2_dc_dmabuf_ops_release,
>  };
> 
> -static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
> -{
> -	int ret;
> -	struct sg_table *sgt;
> -
> -	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
> -	if (!sgt) {
> -		dev_err(buf->dev, "failed to alloc sg table\n");
> -		return NULL;
> -	}
> -
> -	ret = dma_get_sgtable(buf->dev, sgt, buf->vaddr, buf->dma_addr,
> -		buf->size);
> -	if (ret < 0) {
> -		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
> -		kfree(sgt);
> -		return NULL;
> -	}
> -
> -	return sgt;
> -}
> -
>  static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, unsigned long
> flags) {
>  	struct vb2_dc_buf *buf = buf_priv;

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 09/11] vb2: dma-contig: Don't warn on failure in obtaining scatterlist
  2015-09-11 11:50 ` [RFC RESEND 09/11] vb2: dma-contig: Don't warn on failure in obtaining scatterlist Sakari Ailus
@ 2016-12-15 21:13   ` Laurent Pinchart
  0 siblings, 0 replies; 38+ messages in thread
From: Laurent Pinchart @ 2016-12-15 21:13 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: linux-media, pawel, m.szyprowski, kyungmin.park, hverkuil,
	sumit.semwal, robdclark, daniel.vetter, labbott

Hi Sakari,

Thank you for the patch.

On Friday 11 Sep 2015 14:50:32 Sakari Ailus wrote:
> vb2_dc_get_base_sgt() which obtains the scatterlist already prints
> information on why the scatterlist could not be obtained.
> 
> Also, remove the useless warning of a failed kmalloc().
> 
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

> ---
>  drivers/media/v4l2-core/videobuf2-dma-contig.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> b/drivers/media/v4l2-core/videobuf2-dma-contig.c index 3260392..65bc687
> 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -88,10 +88,8 @@ static struct sg_table *vb2_dc_get_base_sgt(struct
> vb2_dc_buf *buf) struct sg_table *sgt;
> 
>  	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
> -	if (!sgt) {
> -		dev_err(buf->dev, "failed to alloc sg table\n");
> +	if (!sgt)
>  		return NULL;
> -	}
> 
>  	ret = dma_get_sgtable(buf->dev, sgt, buf->vaddr, buf->dma_addr,
>  		buf->size);
> @@ -411,7 +409,7 @@ static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv,
> unsigned long flags) if (!buf->dma_sgt)
>  		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
> 
> -	if (WARN_ON(!buf->dma_sgt))
> +	if (!buf->dma_sgt)
>  		return NULL;
> 
>  	dbuf = dma_buf_export(&exp_info);

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs
  2015-09-11 11:50 ` [RFC RESEND 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs Sakari Ailus
@ 2016-12-15 21:40   ` Laurent Pinchart
  0 siblings, 0 replies; 38+ messages in thread
From: Laurent Pinchart @ 2016-12-15 21:40 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: linux-media, pawel, m.szyprowski, kyungmin.park, hverkuil,
	sumit.semwal, robdclark, daniel.vetter, labbott

Hi Sakari,

Thank you for the patch.

On Friday 11 Sep 2015 14:50:33 Sakari Ailus wrote:
> The desirable DMA attributes are not generic for all devices using
> Videobuf2 contiguous DMA ops. Let the drivers decide.
> 
> This change also results in MMAP buffers always having an sg_table
> (dma_sgt field).
> 
> Also arrange the header files alphabetically.
> 
> As a result, also the DMA-BUF exporter must provide ops for synchronising
> the cache. This adds begin_cpu_access and end_cpu_access ops to
> vb2_dc_dmabuf_ops.
> 
> The driver API changes were done using the following coccinelle spatch:
> 
> @@
> expression E;
> @@
> 
> - vb2_dma_contig_init_ctx(E)
> + vb2_dma_contig_init_ctx(E, NULL)

I wanted to mention that this patch includes no driver change, but the 
vb2_dma_contig_init_ctx() has disappeared now, replaced by an attrs attribute 
to the .alloc() operation. This, and the fact that the kernel has moved from 
struct dma_attrs to unsigned long to store the DMA attributes, generates a lot 
of conflicts. I'll review the rebased version.

> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  drivers/media/v4l2-core/videobuf2-dma-contig.c | 104 +++++++++++++++-------
>  include/media/videobuf2-dma-contig.h           |   3 +-
>  2 files changed, 79 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> b/drivers/media/v4l2-core/videobuf2-dma-contig.c index 65bc687..65ee122
> 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -11,11 +11,11 @@
>   */
> 
>  #include <linux/dma-buf.h>
> +#include <linux/dma-mapping.h>
>  #include <linux/module.h>
>  #include <linux/scatterlist.h>
>  #include <linux/sched.h>
>  #include <linux/slab.h>
> -#include <linux/dma-mapping.h>
> 
>  #include <media/videobuf2-core.h>
>  #include <media/videobuf2-dma-contig.h>
> @@ -23,6 +23,7 @@
> 
>  struct vb2_dc_conf {
>  	struct device		*dev;
> +	struct dma_attrs	attrs;
>  };
> 
>  struct vb2_dc_buf {
> @@ -34,6 +35,7 @@ struct vb2_dc_buf {
>  	struct sg_table			*dma_sgt;
> 
>  	/* MMAP related */
> +	struct dma_attrs		*attrs;
>  	struct vb2_vmarea_handler	handler;
>  	atomic_t			refcount;
> 
> @@ -135,8 +137,12 @@ static void vb2_dc_prepare(void *buf_priv)
>  	struct vb2_dc_buf *buf = buf_priv;
>  	struct sg_table *sgt = buf->dma_sgt;
> 
> -	/* DMABUF exporter will flush the cache for us */
> -	if (!buf->vma)
> +	/*
> +	 * DMABUF exporter will flush the cache for us; only USERPTR
> +	 * and MMAP buffers with non-coherent memory will be flushed.
> +	 */
> +	if (!buf->attrs ||
> +	    !dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
>  		return;
> 
>  	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> @@ -147,8 +153,12 @@ static void vb2_dc_finish(void *buf_priv)
>  	struct vb2_dc_buf *buf = buf_priv;
>  	struct sg_table *sgt = buf->dma_sgt;
> 
> -	/* DMABUF exporter will flush the cache for us */
> -	if (!buf->vma)
> +	/*
> +	 * DMABUF exporter will flush the cache for us; only USERPTR
> +	 * and MMAP buffers with non-coherent memory will be flushed.
> +	 */
> +	if (!buf->attrs ||
> +	    !dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
>  		return;
> 
>  	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> @@ -169,7 +179,8 @@ static void vb2_dc_put(void *buf_priv)
>  		sg_free_table(buf->dma_sgt);
>  		kfree(buf->dma_sgt);
>  	}
> -	dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);
> +	dma_free_attrs(buf->dev, buf->size, buf->vaddr, buf->dma_addr,
> +		       buf->attrs);
>  	put_device(buf->dev);
>  	kfree(buf);
>  }
> @@ -185,14 +196,25 @@ static void *vb2_dc_alloc(void *alloc_ctx, unsigned
> long size, if (!buf)
>  		return ERR_PTR(-ENOMEM);
> 
> -	buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr,
> -						GFP_KERNEL | gfp_flags);
> +	buf->attrs = &conf->attrs;
> +
> +	buf->vaddr = dma_alloc_attrs(dev, size, &buf->dma_addr,
> +				     GFP_KERNEL | gfp_flags, buf->attrs);
>  	if (!buf->vaddr) {
> -		dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
> +		dev_err(dev, "dma_alloc_attrs of size %ld failed\n", size);
>  		kfree(buf);
>  		return ERR_PTR(-ENOMEM);
>  	}
> 
> +	if (dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs)) {
> +		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
> +		if (!buf->dma_sgt) {
> +			dma_free_attrs(dev, size, buf->vaddr, buf->dma_addr,
> +				       buf->attrs);
> +			return ERR_PTR(-ENOMEM);
> +		}
> +	}
> +
>  	/* Prevent the device from being released while the buffer is used */
>  	buf->dev = get_device(dev);
>  	buf->size = size;
> @@ -223,8 +245,8 @@ static int vb2_dc_mmap(void *buf_priv, struct
> vm_area_struct *vma) */
>  	vma->vm_pgoff = 0;
> 
> -	ret = dma_mmap_coherent(buf->dev, vma, buf->vaddr,
> -		buf->dma_addr, buf->size);
> +	ret = dma_mmap_attrs(buf->dev, vma, buf->vaddr, buf->dma_addr,
> +			     buf->size, buf->attrs);
> 
>  	if (ret) {
>  		pr_err("Remapping memory failed, error: %d\n", ret);
> @@ -370,6 +392,34 @@ static void *vb2_dc_dmabuf_ops_kmap(struct dma_buf
> *dbuf, unsigned long pgnum) return buf->vaddr + pgnum * PAGE_SIZE;
>  }
> 
> +static int vb2_dc_dmabuf_ops_begin_cpu_access(
> +	struct dma_buf *dbuf, size_t start, size_t len,
> +	enum dma_data_direction direction)
> +{
> +	struct vb2_dc_buf *buf = dbuf->priv;
> +	struct sg_table *sgt = buf->dma_sgt;
> +
> +	if (!dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
> +		return 0;
> +
> +	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> +
> +	return 0;
> +}
> +
> +static void vb2_dc_dmabuf_ops_end_cpu_access(
> +	struct dma_buf *dbuf, size_t start, size_t len,
> +	enum dma_data_direction direction)
> +{
> +	struct vb2_dc_buf *buf = dbuf->priv;
> +	struct sg_table *sgt = buf->dma_sgt;
> +
> +	if (!dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
> +		return;
> +
> +	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> +}
> +
>  static void *vb2_dc_dmabuf_ops_vmap(struct dma_buf *dbuf)
>  {
>  	struct vb2_dc_buf *buf = dbuf->priv;
> @@ -390,6 +440,8 @@ static struct dma_buf_ops vb2_dc_dmabuf_ops = {
>  	.unmap_dma_buf = vb2_dc_dmabuf_ops_unmap,
>  	.kmap = vb2_dc_dmabuf_ops_kmap,
>  	.kmap_atomic = vb2_dc_dmabuf_ops_kmap,
> +	.begin_cpu_access = vb2_dc_dmabuf_ops_begin_cpu_access,
> +	.end_cpu_access = vb2_dc_dmabuf_ops_end_cpu_access,
>  	.vmap = vb2_dc_dmabuf_ops_vmap,
>  	.mmap = vb2_dc_dmabuf_ops_mmap,
>  	.release = vb2_dc_dmabuf_ops_release,
> @@ -514,15 +566,13 @@ static void vb2_dc_put_userptr(void *buf_priv)
>  	struct sg_table *sgt = buf->dma_sgt;
> 
>  	if (sgt) {
> -		DEFINE_DMA_ATTRS(attrs);
> -
> -		dma_set_attr(DMA_ATTR_SKIP_CPU_SYNC, &attrs);
>  		/*
> -		 * No need to sync to CPU, it's already synced to the CPU
> -		 * since the finish() memop will have been called before this.
> +		 * Don't ask to skip cache sync in case if the user
> +		 * did ask to skip cache flush the last time the
> +		 * buffer was dequeued.
>  		 */
>  		dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
> -				   buf->dma_dir, &attrs);
> +				   buf->dma_dir, buf->attrs);
>  		if (!vma_is_io(buf->vma))
>  			vb2_dc_sgt_foreach_page(sgt, vb2_dc_put_dirty_page);
> 
> @@ -579,9 +629,6 @@ static void *vb2_dc_get_userptr(void *alloc_ctx,
> unsigned long vaddr, struct sg_table *sgt;
>  	unsigned long contig_size;
>  	unsigned long dma_align = dma_get_cache_alignment();
> -	DEFINE_DMA_ATTRS(attrs);
> -
> -	dma_set_attr(DMA_ATTR_SKIP_CPU_SYNC, &attrs);
> 
>  	/* Only cache aligned DMA transfers are reliable */
>  	if (!IS_ALIGNED(vaddr | size, dma_align)) {
> @@ -598,6 +645,8 @@ static void *vb2_dc_get_userptr(void *alloc_ctx,
> unsigned long vaddr, if (!buf)
>  		return ERR_PTR(-ENOMEM);
> 
> +	buf->attrs = &conf->attrs;
> +
>  	buf->dev = conf->dev;
>  	buf->dma_dir = dma_dir;
> 
> @@ -667,12 +716,8 @@ static void *vb2_dc_get_userptr(void *alloc_ctx,
> unsigned long vaddr, kfree(pages);
>  	pages = NULL;
> 
> -	/*
> -	 * No need to sync to the device, this will happen later when the
> -	 * prepare() memop is called.
> -	 */
>  	sgt->nents = dma_map_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
> -				      buf->dma_dir, &attrs);
> +				      buf->dma_dir, buf->attrs);
>  	if (sgt->nents <= 0) {
>  		pr_err("failed to map scatterlist\n");
>  		ret = -EIO;
> @@ -695,7 +740,7 @@ static void *vb2_dc_get_userptr(void *alloc_ctx,
> unsigned long vaddr,
> 
>  fail_map_sg:
>  	dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
> -			   buf->dma_dir, &attrs);
> +			   buf->dma_dir, buf->attrs);
> 
>  fail_sgt_init:
>  	if (!vma_is_io(buf->vma))
> @@ -856,7 +901,7 @@ const struct vb2_mem_ops vb2_dma_contig_memops = {
>  };
>  EXPORT_SYMBOL_GPL(vb2_dma_contig_memops);
> 
> -void *vb2_dma_contig_init_ctx(struct device *dev)
> +void *vb2_dma_contig_init_ctx(struct device *dev, struct dma_attrs *attrs)
>  {
>  	struct vb2_dc_conf *conf;
> 
> @@ -866,6 +911,11 @@ void *vb2_dma_contig_init_ctx(struct device *dev)
> 
>  	conf->dev = dev;
> 
> +	if (!attrs)
> +		init_dma_attrs(&conf->attrs);
> +	else
> +		conf->attrs = *attrs;
> +
>  	return conf;
>  }
>  EXPORT_SYMBOL_GPL(vb2_dma_contig_init_ctx);
> diff --git a/include/media/videobuf2-dma-contig.h
> b/include/media/videobuf2-dma-contig.h index 8197f87..1b85282 100644
> --- a/include/media/videobuf2-dma-contig.h
> +++ b/include/media/videobuf2-dma-contig.h
> @@ -24,7 +24,8 @@ vb2_dma_contig_plane_dma_addr(struct vb2_buffer *vb,
> unsigned int plane_no) return *addr;
>  }
> 
> -void *vb2_dma_contig_init_ctx(struct device *dev);
> +void *vb2_dma_contig_init_ctx(struct device *dev,
> +			      struct dma_attrs *attrs);
>  void vb2_dma_contig_cleanup_ctx(void *alloc_ctx);
> 
>  extern const struct vb2_mem_ops vb2_dma_contig_memops;

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 11/11] vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs
  2015-09-11 11:50 ` [RFC RESEND 11/11] vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs Sakari Ailus
@ 2016-12-15 21:57   ` Laurent Pinchart
  2016-12-17  0:50     ` Sakari Ailus
  0 siblings, 1 reply; 38+ messages in thread
From: Laurent Pinchart @ 2016-12-15 21:57 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: linux-media, pawel, m.szyprowski, kyungmin.park, hverkuil,
	sumit.semwal, robdclark, daniel.vetter, labbott

Hi Sakari,

Thank you for the patch.

On Friday 11 Sep 2015 14:50:34 Sakari Ailus wrote:
> The scatterlist should always be present when the cache would need to be
> flushed. Each buffer type has its own means to provide that. Add
> WARN_ON_ONCE() to check the scatterist exists.

Do you think such a check is really needed ? Have you run into this before ?

> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  drivers/media/v4l2-core/videobuf2-dma-contig.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> b/drivers/media/v4l2-core/videobuf2-dma-contig.c index 65ee122..58c35c5
> 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -145,6 +145,9 @@ static void vb2_dc_prepare(void *buf_priv)
>  	    !dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
>  		return;
> 
> +	if (WARN_ON_ONCE(!sgt))
> +		return;
> +
>  	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
>  }
> 
> @@ -161,6 +164,9 @@ static void vb2_dc_finish(void *buf_priv)
>  	    !dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
>  		return;
> 
> +	if (WARN_ON_ONCE(!sgt))
> +		return;
> +
>  	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
>  }

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler
  2015-09-15  7:51     ` Sakari Ailus
@ 2016-12-15 23:47       ` Laurent Pinchart
  0 siblings, 0 replies; 38+ messages in thread
From: Laurent Pinchart @ 2016-12-15 23:47 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Hans Verkuil, linux-media, pawel, m.szyprowski, kyungmin.park,
	sumit.semwal, robdclark, daniel.vetter, labbott

Hi Sakari,

Thank you for the patch.

On Tuesday 15 Sep 2015 10:51:10 Sakari Ailus wrote:
> Hans Verkuil wrote:
> > On 09/11/2015 01:50 PM, Sakari Ailus wrote:
> >> The cache synchronisation may be a time consuming operation and thus not
> >> best performed in an interrupt which is a typical context for
> >> vb2_buffer_done() calls. This may consume up to tens of ms on some
> >> machines, depending on the buffer size.
> >> 
> >> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> >> ---
> >> 
> >>  drivers/media/v4l2-core/videobuf2-core.c | 20 ++++++++++----------
> >>  1 file changed, 10 insertions(+), 10 deletions(-)
> >> 
> >> diff --git a/drivers/media/v4l2-core/videobuf2-core.c
> >> b/drivers/media/v4l2-core/videobuf2-core.c index 64fce4d..c5c0707a
> >> 100644
> >> --- a/drivers/media/v4l2-core/videobuf2-core.c
> >> +++ b/drivers/media/v4l2-core/videobuf2-core.c
> >> @@ -1177,7 +1177,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum
> >> vb2_buffer_state state)
> >>  {
> >>  	struct vb2_queue *q = vb->vb2_queue;
> >>  	unsigned long flags;
> >> -	unsigned int plane;
> >> 
> >>  	if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
> >>  		return;
> >> @@ -1197,10 +1196,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum
> >> vb2_buffer_state state)
> >>  	dprintk(4, "done processing on buffer %d, state: %d\n",
> >>  			vb->v4l2_buf.index, state);
> >> 
> >> -	/* sync buffers */
> >> -	for (plane = 0; plane < vb->num_planes; ++plane)
> >> -		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> >> -
> > 
> > Ah, OK, so it is removed here,
> 
> I can merge the two patches for the next version if you prefer that.

I think that would be a good idea. They're both pretty small, and they're 
related. Merging them will make review easier.

> >>  	/* Add the buffer to the done buffers list */
> >>  	spin_lock_irqsave(&q->done_lock, flags);
> >>  	vb->state = state;
> >> 
> >> @@ -2086,7 +2081,7 @@ EXPORT_SYMBOL_GPL(vb2_wait_for_all_buffers);
> >>  static void __vb2_dqbuf(struct vb2_buffer *vb)
> >>  {
> >>  	struct vb2_queue *q = vb->vb2_queue;
> >> -	unsigned int i;
> >> +	unsigned int plane;
> >> 
> >>  	/* nothing to do if the buffer is already dequeued */
> >>  	if (vb->state == VB2_BUF_STATE_DEQUEUED)
> >> @@ -2094,13 +2089,18 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
> >> 
> >>  	vb->state = VB2_BUF_STATE_DEQUEUED;
> >> 
> >> +	/* sync buffers */
> >> +	for (plane = 0; plane < vb->num_planes; plane++)
> >> +		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> >> +
> > 
> > to here.
> > 
> > I'm not sure if this is correct... So __vb2_dqbuf is called from
> > __vb2_queue_cancel(), but now the buf_finish() callback is called
> > *before* the memop finish() callback, where this was the other way around
> > in __vb2_queue_cancel(). I don't think that is right since buf_finish()
> > expects that the buffer is synced for the cpu.
>
> I don't mind reordering them.

The .buf_finish() driver callback could touch the contents of the buffer using 
the CPU, so I agree with Hans that we need to reorder the calls.

> The vb->state will be different as __vb2_dqbuf() has already been called,
> there's at least one buffer state check in a driver. However, __vb2_dqbuf()
> unconditionally sets the buffer state to DEQUEUED, overriding e.g. ERROR
> which a driver would be interested in.
> 
> I think the cache sync needs to be moved out of __vb2_dqbuf() to the
> same level where it's called so proper ordering can be maintained while
> still flushing cache before buf_finish() is called.

I think that would be the easiest option. It's a bit of a shame to duplicate 
the call, but I don't see any other easy way. A comment that states that cache 
sync needs to occur before .buf_finish() would probably be useful.

> > Was this tested with CONFIG_VIDEO_ADV_DEBUG set and with 'v4l2-compliance
> > -s'? Not that that would help if things are done in the wrong order...
> 
> I'll do that the next time.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 04/11] v4l: Unify cache management hint buffer flags
  2016-12-15 20:15   ` Laurent Pinchart
@ 2016-12-17  0:35     ` Sakari Ailus
  0 siblings, 0 replies; 38+ messages in thread
From: Sakari Ailus @ 2016-12-17  0:35 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sakari Ailus, linux-media, pawel, m.szyprowski, kyungmin.park,
	hverkuil, sumit.semwal, robdclark, daniel.vetter, labbott

Hi Laurent,

Thank you for the review.

On Thu, Dec 15, 2016 at 10:15:39PM +0200, Laurent Pinchart wrote:
> Hi Sakari,
> 
> Thank you for the patch.
> 
> On Friday 11 Sep 2015 14:50:27 Sakari Ailus wrote:
> > The V4L2_BUF_FLAG_NO_CACHE_INVALIDATE and V4L2_BUF_FLAG_NO_CACHE_CLEAN
> > buffer flags are currently not used by the kernel. Replace the definitions
> > by a single V4L2_BUF_FLAG_NO_CACHE_SYNC flag to be used by further
> > patches.
> > 
> > Different cache architectures should not be visible to the user space
> > which can make no meaningful use of the differences anyway. In case a
> > device can make use of non-coherent memory accesses, the necessary cache
> > operations depend on the CPU architecture and the buffer type, not the
> > requests of the user. The cache operation itself may be skipped on the
> > user's request which was the purpose of the two flags.
> > 
> > On ARM the invalidate and clean are separate operations whereas on
> > x86(-64) the two are a single operation (flush). Whether the hardware uses
> > the buffer for reading (V4L2_BUF_TYPE_*_OUTPUT*) or writing
> > (V4L2_BUF_TYPE_*CAPTURE*) already defines the required cache operation
> > (clean and invalidate, respectively). No user input is required.
> 
> We need to perform the following operations.
> 
> 	| QBUF		| DQBUF
> -----------------------------------------------
> CAPTURE	| Invalidate	| Invalidate (*)
> OUTPUT	| Clean		| -
> 
> (*) for systems using speculative pre-fetching only.
> 
> The following optimizations are possible:
> 
> 1. CAPTURE, the CPU has not written to the buffer before QBUF
> 
> Cache invalidation can be skipped at QBUF time, but becomes required at DQBUF 
> time on all systems, regardless of whether they use speculative prefetching.
> 
> 2. CAPTURE, the CPU will not read from the buffer after DQBUF
> 
> Cache invalidation can be skipped at DQBUF time.
> 
> 3. CAPTURE, combination of (1) and (2)
> 
> Cache invalidation can be skipped at both QBUF and DQBUF time.
> 
> 4. OUTPUT, the CPU has not written to the buffer before QBUF
> 
> Cache clean can be skipped at QBUF time.

Ack.

> 
> 
> A single flag can cover all cases, provided we keep track of the flag being 
> set at QBUF time to force cache invalidation at DQBUF time for case (1) if the 
> flag isn't set at DQBUF time.
> 
> One issue is that cache invalidation at DQBUF time for CAPTURE buffers isn't 
> fully under the control of videobuf. We can instruct the DMA mapping API to 
> skip cache handling, but we can't ask it to force cache invalidation in the 
> sync_for_cpu operation for non speculative prefetching systems. On ARM32 the 
> implementation currently always invalidates the cache in 
> __dma_page_dev_to_cpu() for CAPTURE buffers so we're currently safe, but 
> there's a FIXME comment that might lead to someone fixing the implementation 
> in the future. I believe we'll have to fix this in the DMA mapping level, 
> userspace shouldn't be affected.
> 
> Feel free to capture (part of) this explanation in the commit message to 
> clarify your last paragraph.
> 
> > Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> > ---
> >  Documentation/DocBook/media/v4l/io.xml | 25 +++++++++++--------------
> >  include/trace/events/v4l2.h            |  3 +--
> >  include/uapi/linux/videodev2.h         |  7 +++++--
> >  3 files changed, 17 insertions(+), 18 deletions(-)
> > 
> > diff --git a/Documentation/DocBook/media/v4l/io.xml
> > b/Documentation/DocBook/media/v4l/io.xml index 7bbc2a4..4facd63 100644
> > --- a/Documentation/DocBook/media/v4l/io.xml
> > +++ b/Documentation/DocBook/media/v4l/io.xml
> > @@ -1112,21 +1112,18 @@ application. Drivers set or clear this flag when the
> > linkend="vidioc-qbuf">VIDIOC_DQBUF</link> ioctl is called.</entry> </row>
> >  	  <row>
> > -	    
> <entry><constant>V4L2_BUF_FLAG_NO_CACHE_INVALIDATE</constant></entry>
> > +	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_SYNC</constant></entry>
> >  	    <entry>0x00000800</entry>
> > -	    <entry>Caches do not have to be invalidated for this buffer.
> > -Typically applications shall use this flag if the data captured in the
> > buffer -is not going to be touched by the CPU, instead the buffer will,
> > probably, be -passed on to a DMA-capable hardware unit for further
> > processing or output. -</entry>
> > -	  </row>
> > -	  <row>
> > -	    <entry><constant>V4L2_BUF_FLAG_NO_CACHE_CLEAN</constant></entry>
> > -	    <entry>0x00001000</entry>
> > -	    <entry>Caches do not have to be cleaned for this buffer.
> > -Typically applications shall use this flag for output buffers if the data
> > -in this buffer has not been created by the CPU but by some DMA-capable
> > unit, -in which case caches have not been used.</entry>
> > +	    <entry>Do not perform CPU cache synchronisation operations
> > +	    when the buffer is queued or dequeued. The user is
> > +	    responsible for the correct use of this flag. It should be
> > +	    only used when the buffer is not accessed using the CPU,
> > +	    e.g. the buffer is written to by a hardware block and then
> > +	    read by another one, in which case the flag should be set
> > +	    in both <link linkend="vidioc-qbuf">VIDIOC_DQBUF</link>
> > +	    and <link linkend="vidioc-qbuf">VIDIOC_QBUF</link> IOCTLs.
> 
> I'd like to word this differently. As explained above, there can be cases 
> where the flag would only be set in either QBUF or DQBUF. I would prefer 
> documenting the flag as a hint for the kernel that userspace has not written 
> to the buffer before QBUF (when the flag is set for VIDIOC_QBUF) or will not 
> read from the buffer after DQBUF (when the flag is set for VIDIOC_DQBUF) and 
> that the kernel is free to perform the appropriate cache optimizations 
> (without any guarantee).

AFAIR that was my intention but I don't seem to have written that in a way
that could be deciphered later on. Sounds good to me. :-)

> 
> > +	    The flag has no effect on some devices / architectures.
> > +	    </entry>
> >  	  </row>
> >  	  <row>
> >  	    <entry><constant>V4L2_BUF_FLAG_LAST</constant></entry>
> 
> [snip]
> 

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 05/11] v4l2-core: Don't sync cache for a buffer if so requested
  2016-12-15 20:37   ` Laurent Pinchart
@ 2016-12-17  0:36     ` Sakari Ailus
  0 siblings, 0 replies; 38+ messages in thread
From: Sakari Ailus @ 2016-12-17  0:36 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sakari Ailus, linux-media, pawel, m.szyprowski, kyungmin.park,
	hverkuil, sumit.semwal, robdclark, daniel.vetter, labbott,
	Samu Onkalo

Hi Laurent,

On Thu, Dec 15, 2016 at 10:37:20PM +0200, Laurent Pinchart wrote:
> Hi Sakari,
> 
> Thank you for the patch.
> 
> On Friday 11 Sep 2015 14:50:28 Sakari Ailus wrote:
> > From: Samu Onkalo <samu.onkalo@intel.com>
> > 
> > The user may request to the driver (vb2) to skip the cache maintenance
> > operations in case the buffer does not need cache synchronisation, e.g. in
> > cases where the buffer is passed between hardware blocks without it being
> > touched by the CPU.
> > 
> > Also document that the prepare and finish vb2_mem_ops might not get called
> > every time the buffer ownership changes between the kernel and the user
> > space.
> > 
> > Signed-off-by: Samu Onkalo <samu.onkalo@intel.com>
> > Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> > ---
> >  drivers/media/v4l2-core/videobuf2-core.c | 52 ++++++++++++++++++++++-------
> >  include/media/videobuf2-core.h           | 12 +++++---
> >  2 files changed, 49 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/media/v4l2-core/videobuf2-core.c
> > b/drivers/media/v4l2-core/videobuf2-core.c index c5c0707a..b664024 100644
> > --- a/drivers/media/v4l2-core/videobuf2-core.c
> > +++ b/drivers/media/v4l2-core/videobuf2-core.c
> > @@ -187,6 +187,28 @@ static void __vb2_queue_cancel(struct vb2_queue *q);
> >  static void __enqueue_in_driver(struct vb2_buffer *vb);
> > 
> >  /**
> > + * __mem_prepare_planes() - call finish mem op for all planes of the buffer
> > + */
> > +static void __mem_prepare_planes(struct vb2_buffer *vb)
> > +{
> > +	unsigned int plane;
> > +
> > +	for (plane = 0; plane < vb->num_planes; ++plane)
> > +		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> > +}
> > +
> > +/**
> > + * __mem_finish_planes() - call finish mem op for all planes of the buffer
> > + */
> > +static void __mem_finish_planes(struct vb2_buffer *vb)
> > +{
> > +	unsigned int plane;
> > +
> > +	for (plane = 0; plane < vb->num_planes; ++plane)
> > +		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> > +}
> > +
> > +/**
> >   * __vb2_buf_mem_alloc() - allocate video memory for the given buffer
> >   */
> >  static int __vb2_buf_mem_alloc(struct vb2_buffer *vb)
> > @@ -1391,6 +1413,10 @@ static void __fill_vb2_buffer(struct vb2_buffer *vb,
> > const struct v4l2_buffer *b static int __prepare_mmap(struct vb2_buffer
> > *vb, const struct v4l2_buffer *b) {
> >  	__fill_vb2_buffer(vb, b, vb->v4l2_planes);
> > +
> > +	if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC))
> > +		__mem_prepare_planes(vb);
> > +
> >  	return call_vb_qop(vb, buf_prepare, vb);
> >  }
> > 
> > @@ -1476,6 +1502,11 @@ static int __prepare_userptr(struct vb2_buffer *vb,
> >  			dprintk(1, "buffer initialization failed\n");
> >  			goto err;
> >  		}
> > +
> > +		/* This is new buffer memory --- always synchronise cache. */
> > +		__mem_prepare_planes(vb);
> > +	} else if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC)) {
> > +		__mem_prepare_planes(vb);
> >  	}
> > 
> >  	ret = call_vb_qop(vb, buf_prepare, vb);
> > @@ -1601,6 +1632,11 @@ static int __prepare_dmabuf(struct vb2_buffer *vb,
> > const struct v4l2_buffer *b) dprintk(1, "buffer initialization failed\n");
> >  			goto err;
> >  		}
> > +
> > +		/* This is new buffer memory --- always synchronise cache. */
> > +		__mem_prepare_planes(vb);
> > +	} else if (!(b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC)) {
> > +		__mem_prepare_planes(vb);
> >  	}
> > 
> >  	ret = call_vb_qop(vb, buf_prepare, vb);
> > @@ -1624,7 +1660,6 @@ err:
> >  static void __enqueue_in_driver(struct vb2_buffer *vb)
> >  {
> >  	struct vb2_queue *q = vb->vb2_queue;
> > -	unsigned int plane;
> > 
> >  	vb->state = VB2_BUF_STATE_ACTIVE;
> >  	atomic_inc(&q->owned_by_drv_count);
> > @@ -1691,10 +1726,6 @@ static int __buf_prepare(struct vb2_buffer *vb, const
> > struct v4l2_buffer *b) return ret;
> >  	}
> > 
> > -	/* sync buffers */
> > -	for (plane = 0; plane < vb->num_planes; ++plane)
> > -		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> > -
> >  	vb->state = VB2_BUF_STATE_PREPARED;
> > 
> >  	return 0;
> > @@ -2078,7 +2109,7 @@ EXPORT_SYMBOL_GPL(vb2_wait_for_all_buffers);
> >  /**
> >   * __vb2_dqbuf() - bring back the buffer to the DEQUEUED state
> >   */
> > -static void __vb2_dqbuf(struct vb2_buffer *vb)
> > +static void __vb2_dqbuf(struct vb2_buffer *vb, bool no_cache_sync)
> >  {
> >  	struct vb2_queue *q = vb->vb2_queue;
> >  	unsigned int plane;
> > @@ -2089,9 +2120,8 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
> > 
> >  	vb->state = VB2_BUF_STATE_DEQUEUED;
> > 
> > -	/* sync buffers */
> > -	for (plane = 0; plane < vb->num_planes; plane++)
> > -		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> > +	if (!no_cache_sync)
> > +		__mem_finish_planes(vb);
> > 
> >  	/* unmap DMABUF buffer */
> >  	if (q->memory == V4L2_MEMORY_DMABUF)
> > @@ -2143,7 +2173,7 @@ static int vb2_internal_dqbuf(struct vb2_queue *q,
> > struct v4l2_buffer *b, bool n vb->v4l2_buf.flags & V4L2_BUF_FLAG_LAST)
> >  		q->last_buffer_dequeued = true;
> >  	/* go back to dequeued state */
> > -	__vb2_dqbuf(vb);
> > +	__vb2_dqbuf(vb, b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC);
> > 
> >  	dprintk(1, "dqbuf of buffer %d, with state %d\n",
> >  			vb->v4l2_buf.index, vb->state);
> > @@ -2246,7 +2276,7 @@ static void __vb2_queue_cancel(struct vb2_queue *q)
> >  			vb->state = VB2_BUF_STATE_PREPARED;
> >  			call_void_vb_qop(vb, buf_finish, vb);
> >  		}
> > -		__vb2_dqbuf(vb);
> > +		__vb2_dqbuf(vb, false);
> >  	}
> >  }
> > 
> > diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> > index 4f7f7ae..a825bd5 100644
> > --- a/include/media/videobuf2-core.h
> > +++ b/include/media/videobuf2-core.h
> > @@ -59,10 +59,14 @@ struct vb2_threadio_data;
> >   *		dmabuf.
> >   * @unmap_dmabuf: releases access control to the dmabuf - allocator is
> > notified
> > *		  that this driver is done using the dmabuf for now.
> > - * @prepare:	called every time the buffer is passed from userspace to the
> > - *		driver, useful for cache synchronisation, optional.
> > - * @finish:	called every time the buffer is passed back from the driver
> > - *		to the userspace, also optional.
> > + * @prepare:	Called on the plane when the buffer ownership is passed from
> > + *		the user space to the kernel and the plane must be cache
> > + *		syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may
> 
> s/syncronised/synchronised/
> 
> > + *		be used to skip this call. Optional.
> 
> I would say "The call might be skipped by the videobuf core when the 
> V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag is set". Your wording could mean that 
> the memops implementation is free to skip cache synchronization internally if 
> the flag is set.
> 
> If the core skips the prepare and finish operations based on the 
> V4L2_BUF_FLAG_NO_CACHE_SYNC flag then those operations must not perform 
> anything else than cache synchronization. That's fine with the current 
> implementation, but it should be documented as such, and I'd rename the 
> operations to sync_for_cpu and sync_for dev (or something similar).

Sounds good to me.

> 
> > + * @finish:	Called on the plane when the buffer ownership is passed from
> > + *		the kernel to the user space and the plane must be cache
> > + *		syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may
> > + *		be used to skip this call. Optional.
> 
> Same comment here.
> 
> >   * @vaddr:	return a kernel virtual address to a given memory buffer
> >   *		associated with the passed private structure or NULL if no
> >   *		such mapping exists.
> 

-- 
Regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 07/11] vb2: dma-contig: Remove redundant sgt_base field
  2016-12-15 21:08   ` Laurent Pinchart
@ 2016-12-17  0:40     ` Sakari Ailus
  0 siblings, 0 replies; 38+ messages in thread
From: Sakari Ailus @ 2016-12-17  0:40 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sakari Ailus, linux-media, pawel, m.szyprowski, kyungmin.park,
	hverkuil, sumit.semwal, robdclark, daniel.vetter, labbott

On Thu, Dec 15, 2016 at 11:08:37PM +0200, Laurent Pinchart wrote:
> Hi Sakari,
> 
> Thank you for the patch.
> 
> On Friday 11 Sep 2015 14:50:30 Sakari Ailus wrote:
> > The struct vb2_dc_buf contains two struct sg_table fields: sgt_base and
> > dma_sgt. The former is used by DMA-BUF buffers whereas the latter is used
> > by USERPTR.
> > 
> > Unify the two, leaving dma_sgt.
> 
> Looks good to me. I initially thought this would prevent exporting a USERPTR 
> buffer through dmabuf, but that's forbidden by the videobuf2 core (and rightly 
> so).
> 
> > MMAP buffers do not need cache flushing since they have been allocated
> > using dma_alloc_coherent().
> 
> I had understood this as meaning that the patch changes the behaviour for MMAP 
> buffers. How about
> 
> "The prepare and finish implementation currently skip cache sync for MMAP 
> buffers and identify them based on dma_sgt being NULL. Now that dma_sgt is 
> used to export the MMAP buffers the condition must be updated."
> 
> > Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> > ---
> >  drivers/media/v4l2-core/videobuf2-dma-contig.c | 21 ++++++++++-----------
> >  1 file changed, 10 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > b/drivers/media/v4l2-core/videobuf2-dma-contig.c index 94c1e64..26a0a0f
> > 100644
> > --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > @@ -36,7 +36,6 @@ struct vb2_dc_buf {
> >  	/* MMAP related */
> >  	struct vb2_vmarea_handler	handler;
> >  	atomic_t			refcount;
> > -	struct sg_table			*sgt_base;
> > 
> >  	/* USERPTR related */
> >  	struct vm_area_struct		*vma;
> > @@ -117,7 +116,7 @@ static void vb2_dc_prepare(void *buf_priv)
> >  	struct sg_table *sgt = buf->dma_sgt;
> > 
> >  	/* DMABUF exporter will flush the cache for us */
> 
> How about updating the comment to
> 
> 	/*
> 	 * Skip cache sync for MMAP buffers (they don't need it) and DMABUF
> 	 * buffers (the exporter will sync the cache for us).
> 	 */

Both are fine for me.

> 
> > -	if (!sgt || buf->db_attach)
> > +	if (!buf->vma)
> >  		return;
> > 
> >  	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> > @@ -129,7 +128,7 @@ static void vb2_dc_finish(void *buf_priv)
> >  	struct sg_table *sgt = buf->dma_sgt;
> > 
> >  	/* DMABUF exporter will flush the cache for us */
> > -	if (!sgt || buf->db_attach)
> > +	if (!buf->vma)
> >  		return;
> > 
> >  	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> > @@ -146,9 +145,9 @@ static void vb2_dc_put(void *buf_priv)
> >  	if (!atomic_dec_and_test(&buf->refcount))
> >  		return;
> > 
> > -	if (buf->sgt_base) {
> > -		sg_free_table(buf->sgt_base);
> > -		kfree(buf->sgt_base);
> > +	if (buf->dma_sgt) {
> > +		sg_free_table(buf->dma_sgt);
> > +		kfree(buf->dma_sgt);
> >  	}
> >  	dma_free_coherent(buf->dev, buf->size, buf->vaddr, buf->dma_addr);
> >  	put_device(buf->dev);
> > @@ -252,13 +251,13 @@ static int vb2_dc_dmabuf_ops_attach(struct dma_buf
> > *dbuf, struct device *dev, /* Copy the buf->base_sgt scatter list to the
> > attachment, as we can't * map the same scatter list to multiple attachments
> > at the same time. */
> > -	ret = sg_alloc_table(sgt, buf->sgt_base->orig_nents, GFP_KERNEL);
> > +	ret = sg_alloc_table(sgt, buf->dma_sgt->orig_nents, GFP_KERNEL);
> >  	if (ret) {
> >  		kfree(attach);
> >  		return -ENOMEM;
> >  	}
> > 
> > -	rd = buf->sgt_base->sgl;
> > +	rd = buf->dma_sgt->sgl;
> >  	wr = sgt->sgl;
> >  	for (i = 0; i < sgt->orig_nents; ++i) {
> >  		sg_set_page(wr, sg_page(rd), rd->length, rd->offset);
> > @@ -409,10 +408,10 @@ static struct dma_buf *vb2_dc_get_dmabuf(void
> > *buf_priv, unsigned long flags) exp_info.flags = flags;
> >  	exp_info.priv = buf;
> > 
> > -	if (!buf->sgt_base)
> > -		buf->sgt_base = vb2_dc_get_base_sgt(buf);
> > +	if (!buf->dma_sgt)
> > +		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
> > 
> > -	if (WARN_ON(!buf->sgt_base))
> > +	if (WARN_ON(!buf->dma_sgt))
> >  		return NULL;
> > 
> >  	dbuf = dma_buf_export(&exp_info);
> 

-- 
Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC RESEND 11/11] vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs
  2016-12-15 21:57   ` Laurent Pinchart
@ 2016-12-17  0:50     ` Sakari Ailus
  0 siblings, 0 replies; 38+ messages in thread
From: Sakari Ailus @ 2016-12-17  0:50 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sakari Ailus, linux-media, pawel, m.szyprowski, kyungmin.park,
	hverkuil, sumit.semwal, robdclark, daniel.vetter, labbott

On Thu, Dec 15, 2016 at 11:57:54PM +0200, Laurent Pinchart wrote:
> Hi Sakari,
> 
> Thank you for the patch.
> 
> On Friday 11 Sep 2015 14:50:34 Sakari Ailus wrote:
> > The scatterlist should always be present when the cache would need to be
> > flushed. Each buffer type has its own means to provide that. Add
> > WARN_ON_ONCE() to check the scatterist exists.
> 
> Do you think such a check is really needed ? Have you run into this before ?

I think I may have, but the reason was that the code is non-trivial and
letting the user know what went wrong and where is nice. I guess this one
could be dropped, too.

> 
> > Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> > ---
> >  drivers/media/v4l2-core/videobuf2-dma-contig.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > b/drivers/media/v4l2-core/videobuf2-dma-contig.c index 65ee122..58c35c5
> > 100644
> > --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > @@ -145,6 +145,9 @@ static void vb2_dc_prepare(void *buf_priv)
> >  	    !dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
> >  		return;
> > 
> > +	if (WARN_ON_ONCE(!sgt))
> > +		return;
> > +
> >  	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> >  }
> > 
> > @@ -161,6 +164,9 @@ static void vb2_dc_finish(void *buf_priv)
> >  	    !dma_get_attr(DMA_ATTR_NON_CONSISTENT, buf->attrs))
> >  		return;
> > 
> > +	if (WARN_ON_ONCE(!sgt))
> > +		return;
> > +
> >  	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> >  }
> 

-- 
Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2016-12-17  0:50 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-11 11:50 [RFC RESEND 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Sakari Ailus
2015-09-11 11:50 ` [RFC RESEND 01/11] vb2: Rename confusingly named internal buffer preparation functions Sakari Ailus
2015-09-11 15:57   ` Hans Verkuil
2016-12-15 15:40   ` Laurent Pinchart
2015-09-11 11:50 ` [RFC RESEND 02/11] vb2: Move buffer cache synchronisation to prepare from queue Sakari Ailus
2015-09-11 16:11   ` Hans Verkuil
2015-09-11 11:50 ` [RFC RESEND 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler Sakari Ailus
2015-09-11 16:25   ` Hans Verkuil
2015-09-15  7:51     ` Sakari Ailus
2016-12-15 23:47       ` Laurent Pinchart
2015-09-11 11:50 ` [RFC RESEND 04/11] v4l: Unify cache management hint buffer flags Sakari Ailus
2015-09-11 16:26   ` Hans Verkuil
2015-09-11 16:44     ` Hans Verkuil
2016-12-15 20:15   ` Laurent Pinchart
2016-12-17  0:35     ` Sakari Ailus
2015-09-11 11:50 ` [RFC RESEND 05/11] v4l2-core: Don't sync cache for a buffer if so requested Sakari Ailus
2015-09-11 17:12   ` Hans Verkuil
2015-09-15  8:22     ` Sakari Ailus
2015-09-15  9:06       ` Hans Verkuil
2016-12-15 20:37   ` Laurent Pinchart
2016-12-17  0:36     ` Sakari Ailus
2015-09-11 11:50 ` [RFC RESEND 06/11] vb2: Improve struct vb2_mem_ops documentation; alloc and put are for MMAP Sakari Ailus
2015-09-11 17:13   ` Hans Verkuil
2016-12-15 20:50   ` Laurent Pinchart
2015-09-11 11:50 ` [RFC RESEND 07/11] vb2: dma-contig: Remove redundant sgt_base field Sakari Ailus
2015-09-11 17:28   ` Hans Verkuil
2015-09-15  8:26     ` Sakari Ailus
2016-12-15 21:08   ` Laurent Pinchart
2016-12-17  0:40     ` Sakari Ailus
2015-09-11 11:50 ` [RFC RESEND 08/11] vb2: dma-contig: Move vb2_dc_get_base_sgt() up Sakari Ailus
2016-12-15 21:12   ` Laurent Pinchart
2015-09-11 11:50 ` [RFC RESEND 09/11] vb2: dma-contig: Don't warn on failure in obtaining scatterlist Sakari Ailus
2016-12-15 21:13   ` Laurent Pinchart
2015-09-11 11:50 ` [RFC RESEND 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs Sakari Ailus
2016-12-15 21:40   ` Laurent Pinchart
2015-09-11 11:50 ` [RFC RESEND 11/11] vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs Sakari Ailus
2016-12-15 21:57   ` Laurent Pinchart
2016-12-17  0:50     ` Sakari Ailus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).