All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v2 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency
@ 2016-12-16  1:24 ` Laurent Pinchart
  0 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus

Hello,

This is a rebased version of the vb2 cache hints support patch series posted
by Sakari more than a year ago. The patches have been modified as needed by
the upstream changes and received the occasional small odd fix but are 
otherwise not modified. Please see the individual commit messages for more
information.

The videobuf2 memory managers use the DMA mapping API to handle cache
synchronization on systems that require them transparently for drivers. As
cache operations are expensive, system performances can be impacted. Cache
synchronization can't be skipped altogether if we want to retain correct
behaviour, but optimizations are possible in cases related to buffer sharing
between multiple devices without CPU access to the memory.

The first optimization covers cases where the memory never needs to be
accessed by the CPU (neither in kernelspace nor in userspace). In those cases,
as no CPU memory mappings exist, cache synchronization can be skipped. The
situation could be detected in the kernel as we have enough information to
determine whether CPU mappings for kernelspace or userspace exist (in the
first case because drivers should request them explicitly, in the second case
because the mmap() handler hasn't been invoked). This optimization is not
implemented currently but should at least be prototyped as it could improve
performances automatically in a large number of cases.

The second class of optimizations cover cases where the memory sometimes needs
to be accessed by the CPU. In those cases memory mapping must be created and
cache handled, but cache synchronization could be skipped for buffer that are
not touched by the CPU.

By default the following cache synchronization operations need to be performed
related to the buffer management ioctls. For simplicity means of QBUF below
apply to buf VIDIOC_QBUF and VIDIOC_PREPARE_BUF.

		| QBUF		| DQBUF
	----------------------------------------
	CAPTURE	| Invalidate	| Invalidate (*)
	OUTPUT	| Clean		| -

(*) for systems using speculative pre-fetching only

The following cases can be optimized.

1. CAPTURE, the CPU has not written to the buffer before QBUF

   Cache invalidation can be skipped at QBUF time, but becomes required at
   DQBUF time on all systems, regardless of whether they use speculative
   prefetching.

2. CAPTURE, the CPU will not read from the buffer after DQBUF

   Cache invalidation can be skipped at DQBUF time.

3. CAPTURE, combination of (1) and (2)

   Cache invalidation can be skipped at both QBUF and DQBUF time.

4. OUTPUT, the CPU has not written to the buffer before QBUF

   Cache clean can be skipped at QBUF time.


The kernel can't detect thoses situations automatically and thus requires
hints from userspace to decide whether cache synchronization can be skipped.
It should be noted that those hints might not be honoured. In particular, if
userspace hints that it hasn't touched the buffer with the CPU, drivers might
need to perform memory accesses themselves (adding JPEG or MPEG headers to
buffers is a common case where CPU access could be needed in the kernel), in
which case the userspace hints will be ignored.

Getting the hints wrong will result in data corruption. Userspace applications
are allowed to shoot themselves in the foot, but driver are responsible for
deciding whether data corruption can pose a risk to the system in general. For
instance if the device could be made to crash, or behave in a way that would
jeopardize system security, reliability or performances, when fed with invalid
data, cache synchronization shall not be skipped solely due to possibly
incorrect userspace hints.

The V4L2 API defines two flags, V4L2-BUF-FLAG-NO-CACHE-INVALIDATE and
V4L2_BUF_FLAG_NO_CACHE_SYNC, that can be used to provide cache-related hints
to the kernel. However, no kernel has ever implemented support for those flags
that are thus most likely unused.

A single flag is enough to cover all the optimization cases described above,
provided we keep track of the flag being set at QBUF time to force cache
invalidation at DQBUF time for case (1) if the  flag isn't set at DQBUF time.
This patch series thus cleans up the userspace API and merges both flags into
a single one.

One potential issue with case (1) is that cache invalidation at DQBUF time for
CAPTURE buffers isn't fully under the control of videobuf2. We can instruct
the DMA mapping API to skip cache handling, but we can't force it to
invalidate the cache in the sync_for_cpu operation for non speculative
prefetching systems. Luckily, on ARM32 the current implementation always
invalidates the cache in __dma_page_dev_to_cpu() for CAPTURE buffers so we are
safe fot now. However, this is documented by a FIXME comment that might lead
to someone fixing the implementation in the future. I believe we will have to
the problem at the DMA mapping level, the userspace hint API shouldn't be
affected.

This RFC patch set achieves two main objectives:

1. Respect cache flags passed from the user space. As no driver nor videobuf2
has (ever?) implemented them, the two flags are replaced by a single one
(V4L2_BUF_FLAG_NO_CACHE_SYNC) and the two old flags are deprecated. This is
done since a single flag provides the driver with enough information on what
to do. (Patches 01/11 to 05/11, see patch 04/11 for more information.)

2. Allow a driver using videobuf2 dma-contig memory type to choose whether it
prefers coherent or non-coherent CPU access to buffer memory for MMAP and
USERPTR buffers. This could be later extended to be specified by the user, and
per buffer if needed. (Patches 06/11 and 11/11).

Only dma-contig memory type is changed but the same could be done to dma-sg as
well. Sakari offered in v1 to add it to the set if people are happy with the
changes to dma-contig.

Note should be taken that the series performs cache optimization for MMAP
buffers only. DMABUF imported buffers have their cache synchronization handled
by the exported through the dma_buf_map_attachment() and
dma_buf_unmap_attachment() functions, and dma-buf lacks an API to perform
memory synchronization without unmapping and remapping the buffers. This is
not a blocker as far as this patch series is concerned, but importing buffers
(usually exported by the CPU) is such an important use case that we can't
considered the cache optimization problem anywhere close to being solved if we
don't address this case. I plan to start addressing the problem in January or
February 2017, feedback on this point will thus be appreciated.


Sakari Ailus (10):
  vb2: Rename confusingly named internal buffer preparation functions
  vb2: Move buffer cache synchronisation to prepare from queue
  vb2: Move cache synchronisation from buffer done to dqbuf handler
  v4l: Unify cache management hint buffer flags
  vb2: Improve struct vb2_mem_ops documentation; alloc and put are for
    MMAP
  vb2: dma-contig: Remove redundant sgt_base field
  vb2: dma-contig: Don't warn on failure in obtaining scatterlist
  vb2: dma-contig: Move vb2_dc_get_base_sgt() up
  vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs
  vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs

Samu Onkalo (1):
  v4l2-core: Don't sync cache for a buffer if so requested

 Documentation/media/uapi/v4l/buffer.rst            |  24 ++--
 .../media/uapi/v4l/vidioc-prepare-buf.rst          |   5 +-
 drivers/media/v4l2-core/videobuf2-core.c           | 120 ++++++++++++------
 drivers/media/v4l2-core/videobuf2-dma-contig.c     | 135 ++++++++++++++-------
 drivers/media/v4l2-core/videobuf2-v4l2.c           |  14 ++-
 include/media/videobuf2-core.h                     |  43 ++++---
 include/trace/events/v4l2.h                        |   3 +-
 include/uapi/linux/videodev2.h                     |   7 +-
 8 files changed, 228 insertions(+), 123 deletions(-)

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC v2 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency
@ 2016-12-16  1:24 ` Laurent Pinchart
  0 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: Daniel Vetter, dri-devel, Hans Verkuil, Kyungmin Park,
	Sakari Ailus, Pawel Osciak, Marek Szyprowski

Hello,

This is a rebased version of the vb2 cache hints support patch series posted
by Sakari more than a year ago. The patches have been modified as needed by
the upstream changes and received the occasional small odd fix but are 
otherwise not modified. Please see the individual commit messages for more
information.

The videobuf2 memory managers use the DMA mapping API to handle cache
synchronization on systems that require them transparently for drivers. As
cache operations are expensive, system performances can be impacted. Cache
synchronization can't be skipped altogether if we want to retain correct
behaviour, but optimizations are possible in cases related to buffer sharing
between multiple devices without CPU access to the memory.

The first optimization covers cases where the memory never needs to be
accessed by the CPU (neither in kernelspace nor in userspace). In those cases,
as no CPU memory mappings exist, cache synchronization can be skipped. The
situation could be detected in the kernel as we have enough information to
determine whether CPU mappings for kernelspace or userspace exist (in the
first case because drivers should request them explicitly, in the second case
because the mmap() handler hasn't been invoked). This optimization is not
implemented currently but should at least be prototyped as it could improve
performances automatically in a large number of cases.

The second class of optimizations cover cases where the memory sometimes needs
to be accessed by the CPU. In those cases memory mapping must be created and
cache handled, but cache synchronization could be skipped for buffer that are
not touched by the CPU.

By default the following cache synchronization operations need to be performed
related to the buffer management ioctls. For simplicity means of QBUF below
apply to buf VIDIOC_QBUF and VIDIOC_PREPARE_BUF.

		| QBUF		| DQBUF
	----------------------------------------
	CAPTURE	| Invalidate	| Invalidate (*)
	OUTPUT	| Clean		| -

(*) for systems using speculative pre-fetching only

The following cases can be optimized.

1. CAPTURE, the CPU has not written to the buffer before QBUF

   Cache invalidation can be skipped at QBUF time, but becomes required at
   DQBUF time on all systems, regardless of whether they use speculative
   prefetching.

2. CAPTURE, the CPU will not read from the buffer after DQBUF

   Cache invalidation can be skipped at DQBUF time.

3. CAPTURE, combination of (1) and (2)

   Cache invalidation can be skipped at both QBUF and DQBUF time.

4. OUTPUT, the CPU has not written to the buffer before QBUF

   Cache clean can be skipped at QBUF time.


The kernel can't detect thoses situations automatically and thus requires
hints from userspace to decide whether cache synchronization can be skipped.
It should be noted that those hints might not be honoured. In particular, if
userspace hints that it hasn't touched the buffer with the CPU, drivers might
need to perform memory accesses themselves (adding JPEG or MPEG headers to
buffers is a common case where CPU access could be needed in the kernel), in
which case the userspace hints will be ignored.

Getting the hints wrong will result in data corruption. Userspace applications
are allowed to shoot themselves in the foot, but driver are responsible for
deciding whether data corruption can pose a risk to the system in general. For
instance if the device could be made to crash, or behave in a way that would
jeopardize system security, reliability or performances, when fed with invalid
data, cache synchronization shall not be skipped solely due to possibly
incorrect userspace hints.

The V4L2 API defines two flags, V4L2-BUF-FLAG-NO-CACHE-INVALIDATE and
V4L2_BUF_FLAG_NO_CACHE_SYNC, that can be used to provide cache-related hints
to the kernel. However, no kernel has ever implemented support for those flags
that are thus most likely unused.

A single flag is enough to cover all the optimization cases described above,
provided we keep track of the flag being set at QBUF time to force cache
invalidation at DQBUF time for case (1) if the  flag isn't set at DQBUF time.
This patch series thus cleans up the userspace API and merges both flags into
a single one.

One potential issue with case (1) is that cache invalidation at DQBUF time for
CAPTURE buffers isn't fully under the control of videobuf2. We can instruct
the DMA mapping API to skip cache handling, but we can't force it to
invalidate the cache in the sync_for_cpu operation for non speculative
prefetching systems. Luckily, on ARM32 the current implementation always
invalidates the cache in __dma_page_dev_to_cpu() for CAPTURE buffers so we are
safe fot now. However, this is documented by a FIXME comment that might lead
to someone fixing the implementation in the future. I believe we will have to
the problem at the DMA mapping level, the userspace hint API shouldn't be
affected.

This RFC patch set achieves two main objectives:

1. Respect cache flags passed from the user space. As no driver nor videobuf2
has (ever?) implemented them, the two flags are replaced by a single one
(V4L2_BUF_FLAG_NO_CACHE_SYNC) and the two old flags are deprecated. This is
done since a single flag provides the driver with enough information on what
to do. (Patches 01/11 to 05/11, see patch 04/11 for more information.)

2. Allow a driver using videobuf2 dma-contig memory type to choose whether it
prefers coherent or non-coherent CPU access to buffer memory for MMAP and
USERPTR buffers. This could be later extended to be specified by the user, and
per buffer if needed. (Patches 06/11 and 11/11).

Only dma-contig memory type is changed but the same could be done to dma-sg as
well. Sakari offered in v1 to add it to the set if people are happy with the
changes to dma-contig.

Note should be taken that the series performs cache optimization for MMAP
buffers only. DMABUF imported buffers have their cache synchronization handled
by the exported through the dma_buf_map_attachment() and
dma_buf_unmap_attachment() functions, and dma-buf lacks an API to perform
memory synchronization without unmapping and remapping the buffers. This is
not a blocker as far as this patch series is concerned, but importing buffers
(usually exported by the CPU) is such an important use case that we can't
considered the cache optimization problem anywhere close to being solved if we
don't address this case. I plan to start addressing the problem in January or
February 2017, feedback on this point will thus be appreciated.


Sakari Ailus (10):
  vb2: Rename confusingly named internal buffer preparation functions
  vb2: Move buffer cache synchronisation to prepare from queue
  vb2: Move cache synchronisation from buffer done to dqbuf handler
  v4l: Unify cache management hint buffer flags
  vb2: Improve struct vb2_mem_ops documentation; alloc and put are for
    MMAP
  vb2: dma-contig: Remove redundant sgt_base field
  vb2: dma-contig: Don't warn on failure in obtaining scatterlist
  vb2: dma-contig: Move vb2_dc_get_base_sgt() up
  vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs
  vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs

Samu Onkalo (1):
  v4l2-core: Don't sync cache for a buffer if so requested

 Documentation/media/uapi/v4l/buffer.rst            |  24 ++--
 .../media/uapi/v4l/vidioc-prepare-buf.rst          |   5 +-
 drivers/media/v4l2-core/videobuf2-core.c           | 120 ++++++++++++------
 drivers/media/v4l2-core/videobuf2-dma-contig.c     | 135 ++++++++++++++-------
 drivers/media/v4l2-core/videobuf2-v4l2.c           |  14 ++-
 include/media/videobuf2-core.h                     |  43 ++++---
 include/trace/events/v4l2.h                        |   3 +-
 include/uapi/linux/videodev2.h                     |   7 +-
 8 files changed, 228 insertions(+), 123 deletions(-)

-- 
Regards,

Laurent Pinchart

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC v2 01/11] vb2: Rename confusingly named internal buffer preparation functions
  2016-12-16  1:24 ` Laurent Pinchart
  (?)
@ 2016-12-16  1:24 ` Laurent Pinchart
  -1 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus

From: Sakari Ailus <sakari.ailus@linux.intel.com>

Rename __qbuf_*() functions which are specific to a buffer type as
__prepare_*() which matches with what they do. The naming was there for
historical reasons; the purpose of the functions was changed without
renaming them.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
 drivers/media/v4l2-core/videobuf2-core.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index 7c1d390ea438..5379c8718010 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -956,9 +956,9 @@ void vb2_discard_done(struct vb2_queue *q)
 EXPORT_SYMBOL_GPL(vb2_discard_done);
 
 /**
- * __qbuf_mmap() - handle qbuf of an MMAP buffer
+ * __prepare_mmap() - prepare an MMAP buffer
  */
-static int __qbuf_mmap(struct vb2_buffer *vb, const void *pb)
+static int __prepare_mmap(struct vb2_buffer *vb, const void *pb)
 {
 	int ret = 0;
 
@@ -969,9 +969,9 @@ static int __qbuf_mmap(struct vb2_buffer *vb, const void *pb)
 }
 
 /**
- * __qbuf_userptr() - handle qbuf of a USERPTR buffer
+ * __prepare_userptr() - prepare a USERPTR buffer
  */
-static int __qbuf_userptr(struct vb2_buffer *vb, const void *pb)
+static int __prepare_userptr(struct vb2_buffer *vb, const void *pb)
 {
 	struct vb2_plane planes[VB2_MAX_PLANES];
 	struct vb2_queue *q = vb->vb2_queue;
@@ -1086,9 +1086,9 @@ static int __qbuf_userptr(struct vb2_buffer *vb, const void *pb)
 }
 
 /**
- * __qbuf_dmabuf() - handle qbuf of a DMABUF buffer
+ * __prepare_dmabuf() - prepare a DMABUF buffer
  */
-static int __qbuf_dmabuf(struct vb2_buffer *vb, const void *pb)
+static int __prepare_dmabuf(struct vb2_buffer *vb, const void *pb)
 {
 	struct vb2_plane planes[VB2_MAX_PLANES];
 	struct vb2_queue *q = vb->vb2_queue;
@@ -1253,13 +1253,13 @@ static int __buf_prepare(struct vb2_buffer *vb, const void *pb)
 
 	switch (q->memory) {
 	case VB2_MEMORY_MMAP:
-		ret = __qbuf_mmap(vb, pb);
+		ret = __prepare_mmap(vb, pb);
 		break;
 	case VB2_MEMORY_USERPTR:
-		ret = __qbuf_userptr(vb, pb);
+		ret = __prepare_userptr(vb, pb);
 		break;
 	case VB2_MEMORY_DMABUF:
-		ret = __qbuf_dmabuf(vb, pb);
+		ret = __prepare_dmabuf(vb, pb);
 		break;
 	default:
 		WARN(1, "Invalid queue type\n");
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 02/11] vb2: Move buffer cache synchronisation to prepare from queue
  2016-12-16  1:24 ` Laurent Pinchart
  (?)
  (?)
@ 2016-12-16  1:24 ` Laurent Pinchart
  -1 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus

From: Sakari Ailus <sakari.ailus@linux.intel.com>

The buffer cache should be synchronised in buffer preparation, not when
the buffer is queued to the device. Fix this.

Mmap buffers do not need cache synchronisation since they are always
coherent.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-core.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index 5379c8718010..8ba48703b189 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -1225,23 +1225,19 @@ static int __prepare_dmabuf(struct vb2_buffer *vb, const void *pb)
 static void __enqueue_in_driver(struct vb2_buffer *vb)
 {
 	struct vb2_queue *q = vb->vb2_queue;
-	unsigned int plane;
 
 	vb->state = VB2_BUF_STATE_ACTIVE;
 	atomic_inc(&q->owned_by_drv_count);
 
 	trace_vb2_buf_queue(q, vb);
 
-	/* sync buffers */
-	for (plane = 0; plane < vb->num_planes; ++plane)
-		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
-
 	call_void_vb_qop(vb, buf_queue, vb);
 }
 
 static int __buf_prepare(struct vb2_buffer *vb, const void *pb)
 {
 	struct vb2_queue *q = vb->vb2_queue;
+	unsigned int plane;
 	int ret;
 
 	if (q->error) {
@@ -1266,11 +1262,19 @@ static int __buf_prepare(struct vb2_buffer *vb, const void *pb)
 		ret = -EINVAL;
 	}
 
-	if (ret)
+	if (ret) {
 		dprintk(1, "buffer preparation failed: %d\n", ret);
-	vb->state = ret ? VB2_BUF_STATE_DEQUEUED : VB2_BUF_STATE_PREPARED;
+		vb->state = VB2_BUF_STATE_DEQUEUED;
+		return ret;
+	}
 
-	return ret;
+	/* sync buffers */
+	for (plane = 0; plane < vb->num_planes; ++plane)
+		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
+
+	vb->state = VB2_BUF_STATE_PREPARED;
+
+	return 0;
 }
 
 int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb)
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler
  2016-12-16  1:24 ` Laurent Pinchart
                   ` (2 preceding siblings ...)
  (?)
@ 2016-12-16  1:24 ` Laurent Pinchart
  2017-03-27 21:27   ` Shuah Khan
  -1 siblings, 1 reply; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus

From: Sakari Ailus <sakari.ailus@linux.intel.com>

The cache synchronisation may be a time consuming operation and thus not
best performed in an interrupt which is a typical context for
vb2_buffer_done() calls. This may consume up to tens of ms on some
machines, depending on the buffer size.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
Changes since v1:

- Don't rename the 'i' loop counter to 'plane'
---
 drivers/media/v4l2-core/videobuf2-core.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index 8ba48703b189..15a83f338072 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -889,7 +889,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
 {
 	struct vb2_queue *q = vb->vb2_queue;
 	unsigned long flags;
-	unsigned int plane;
 
 	if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
 		return;
@@ -910,10 +909,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
 	dprintk(4, "done processing on buffer %d, state: %d\n",
 			vb->index, state);
 
-	/* sync buffers */
-	for (plane = 0; plane < vb->num_planes; ++plane)
-		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
-
 	spin_lock_irqsave(&q->done_lock, flags);
 	if (state == VB2_BUF_STATE_QUEUED ||
 	    state == VB2_BUF_STATE_REQUEUEING) {
@@ -1571,6 +1566,10 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
 
 	vb->state = VB2_BUF_STATE_DEQUEUED;
 
+	/* sync buffers */
+	for (i = 0; i < vb->num_planes; ++i)
+		call_void_memop(vb, finish, vb->planes[i].mem_priv);
+
 	/* unmap DMABUF buffer */
 	if (q->memory == VB2_MEMORY_DMABUF)
 		for (i = 0; i < vb->num_planes; ++i) {
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 04/11] v4l: Unify cache management hint buffer flags
  2016-12-16  1:24 ` Laurent Pinchart
                   ` (3 preceding siblings ...)
  (?)
@ 2016-12-16  1:24 ` Laurent Pinchart
  -1 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus

From: Sakari Ailus <sakari.ailus@linux.intel.com>

The V4L2_BUF_FLAG_NO_CACHE_INVALIDATE and V4L2_BUF_FLAG_NO_CACHE_CLEAN
buffer flags are currently not used by the kernel. Replace the definitions
by a single V4L2_BUF_FLAG_NO_CACHE_SYNC flag to be used by further
patches.

Different cache architectures should not be visible to the user space
which can make no meaningful use of the differences anyway. In case a
device can make use of non-coherent memory accesses, the necessary cache
operations depend on the CPU architecture and the buffer type, not the
requests of the user. The cache operation itself may be skipped on the
user's request which was the purpose of the two flags.

On ARM the invalidate and clean are separate operations whereas on
x86(-64) the two are a single operation (flush). Whether the hardware uses
the buffer for reading (V4L2_BUF_TYPE_*_OUTPUT*) or writing
(V4L2_BUF_TYPE_*CAPTURE*) already defines the required cache operation
(clean and invalidate, respectively). No user input is required.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
---
 Documentation/media/uapi/v4l/buffer.rst            | 24 ++++++++--------------
 .../media/uapi/v4l/vidioc-prepare-buf.rst          |  5 ++---
 include/trace/events/v4l2.h                        |  3 +--
 include/uapi/linux/videodev2.h                     |  7 +++++--
 4 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/Documentation/media/uapi/v4l/buffer.rst b/Documentation/media/uapi/v4l/buffer.rst
index ac58966ccb9b..601c3e96464a 100644
--- a/Documentation/media/uapi/v4l/buffer.rst
+++ b/Documentation/media/uapi/v4l/buffer.rst
@@ -437,23 +437,17 @@ Buffer Flags
 	:ref:`VIDIOC_PREPARE_BUF <VIDIOC_QBUF>`,
 	:ref:`VIDIOC_QBUF` or
 	:ref:`VIDIOC_DQBUF <VIDIOC_QBUF>` ioctl is called.
-    * .. _`V4L2-BUF-FLAG-NO-CACHE-INVALIDATE`:
+    * .. _`V4L2-BUF-FLAG-NO-CACHE-SYNC`:
 
-      - ``V4L2_BUF_FLAG_NO_CACHE_INVALIDATE``
+      - ``V4L2_BUF_FLAG_NO_CACHE_SYNC``
       - 0x00000800
-      - Caches do not have to be invalidated for this buffer. Typically
-	applications shall use this flag if the data captured in the
-	buffer is not going to be touched by the CPU, instead the buffer
-	will, probably, be passed on to a DMA-capable hardware unit for
-	further processing or output.
-    * .. _`V4L2-BUF-FLAG-NO-CACHE-CLEAN`:
-
-      - ``V4L2_BUF_FLAG_NO_CACHE_CLEAN``
-      - 0x00001000
-      - Caches do not have to be cleaned for this buffer. Typically
-	applications shall use this flag for output buffers if the data in
-	this buffer has not been created by the CPU but by some
-	DMA-capable unit, in which case caches have not been used.
+      - Do not perform CPU cache synchronisation operations when the buffer is
+	queued or dequeued. The user is responsible for the correct use of
+	this flag. It should be only used when the buffer is not accessed
+	using the CPU, e.g. the buffer is written to by a hardware block and
+	then read by another one, in which case the flag should be set in both
+	:ref:`VIDIOC_QBUF` and :ref:`VIDIOC_DQBUF` ioctls. The flag has no
+	effect on some devices / architectures.
     * .. _`V4L2-BUF-FLAG-LAST`:
 
       - ``V4L2_BUF_FLAG_LAST``
diff --git a/Documentation/media/uapi/v4l/vidioc-prepare-buf.rst b/Documentation/media/uapi/v4l/vidioc-prepare-buf.rst
index bdcfd9fe550d..80aeb7e403f3 100644
--- a/Documentation/media/uapi/v4l/vidioc-prepare-buf.rst
+++ b/Documentation/media/uapi/v4l/vidioc-prepare-buf.rst
@@ -36,9 +36,8 @@ pass ownership of the buffer to the driver before actually enqueuing it,
 using the :ref:`VIDIOC_QBUF` ioctl, and to prepare it for future I/O. Such
 preparations may include cache invalidation or cleaning. Performing them
 in advance saves time during the actual I/O. In case such cache
-operations are not required, the application can use one of
-``V4L2_BUF_FLAG_NO_CACHE_INVALIDATE`` and
-``V4L2_BUF_FLAG_NO_CACHE_CLEAN`` flags to skip the respective step.
+operations are not required, the application can use the
+``V4L2_BUF_FLAG_NO_CACHE_SYNC`` flag to skip the cache synchronization step.
 
 The struct :c:type:`v4l2_buffer` structure is specified in
 :ref:`buffer`.
diff --git a/include/trace/events/v4l2.h b/include/trace/events/v4l2.h
index ee7754c6e4a1..fb9ad7b0dddd 100644
--- a/include/trace/events/v4l2.h
+++ b/include/trace/events/v4l2.h
@@ -80,8 +80,7 @@ SHOW_FIELD
 		{ V4L2_BUF_FLAG_ERROR,		     "ERROR" },		      \
 		{ V4L2_BUF_FLAG_TIMECODE,	     "TIMECODE" },	      \
 		{ V4L2_BUF_FLAG_PREPARED,	     "PREPARED" },	      \
-		{ V4L2_BUF_FLAG_NO_CACHE_INVALIDATE, "NO_CACHE_INVALIDATE" }, \
-		{ V4L2_BUF_FLAG_NO_CACHE_CLEAN,	     "NO_CACHE_CLEAN" },      \
+		{ V4L2_BUF_FLAG_NO_CACHE_SYNC,	     "NO_CACHE_SYNC" },	      \
 		{ V4L2_BUF_FLAG_TIMESTAMP_MASK,	     "TIMESTAMP_MASK" },      \
 		{ V4L2_BUF_FLAG_TIMESTAMP_UNKNOWN,   "TIMESTAMP_UNKNOWN" },   \
 		{ V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC, "TIMESTAMP_MONOTONIC" }, \
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 46e8a2e369f9..3516dd638009 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -935,8 +935,11 @@ struct v4l2_buffer {
 #define V4L2_BUF_FLAG_TIMECODE			0x00000100
 /* Buffer is prepared for queuing */
 #define V4L2_BUF_FLAG_PREPARED			0x00000400
-/* Cache handling flags */
-#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE	0x00000800
+/* Cache sync hint */
+#define V4L2_BUF_FLAG_NO_CACHE_SYNC		0x00000800
+/* DEPRECATED. THIS WILL BE REMOVED IN THE FUTURE! */
+#define V4L2_BUF_FLAG_NO_CACHE_INVALIDATE	V4L2_BUF_FLAG_NO_CACHE_SYNC
+/* DEPRECATED. THIS WILL BE REMOVED IN THE FUTURE! */
 #define V4L2_BUF_FLAG_NO_CACHE_CLEAN		0x00001000
 /* Timestamp type */
 #define V4L2_BUF_FLAG_TIMESTAMP_MASK		0x0000e000
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 05/11] v4l2-core: Don't sync cache for a buffer if so requested
  2016-12-16  1:24 ` Laurent Pinchart
                   ` (4 preceding siblings ...)
  (?)
@ 2016-12-16  1:24 ` Laurent Pinchart
  2017-03-27 22:08   ` Shuah Khan
  -1 siblings, 1 reply; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus, Samu Onkalo

From: Samu Onkalo <samu.onkalo@intel.com>

The user may request to the driver (vb2) to skip the cache maintenance
operations in case the buffer does not need cache synchronisation, e.g. in
cases where the buffer is passed between hardware blocks without it being
touched by the CPU.

Also document that the prepare and finish vb2_mem_ops might not get called
every time the buffer ownership changes between the kernel and the user
space.

Signed-off-by: Samu Onkalo <samu.onkalo@intel.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
Changes since v1:

- Add a no_cache_sync argument to vb2 core prepare/qbuf/dqbuf functions
  to get round the inability to access v4l2_buffer flags from vb2 core.
---
 drivers/media/v4l2-core/videobuf2-core.c | 101 +++++++++++++++++++++----------
 drivers/media/v4l2-core/videobuf2-v4l2.c |  14 ++++-
 include/media/videobuf2-core.h           |  23 ++++---
 3 files changed, 97 insertions(+), 41 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index 15a83f338072..e5371ef213b0 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -189,6 +189,28 @@ static void __vb2_queue_cancel(struct vb2_queue *q);
 static void __enqueue_in_driver(struct vb2_buffer *vb);
 
 /**
+ * __mem_prepare_planes() - call finish mem op for all planes of the buffer
+ */
+static void __mem_prepare_planes(struct vb2_buffer *vb)
+{
+	unsigned int plane;
+
+	for (plane = 0; plane < vb->num_planes; ++plane)
+		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
+}
+
+/**
+ * __mem_finish_planes() - call finish mem op for all planes of the buffer
+ */
+static void __mem_finish_planes(struct vb2_buffer *vb)
+{
+	unsigned int plane;
+
+	for (plane = 0; plane < vb->num_planes; ++plane)
+		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
+}
+
+/**
  * __vb2_buf_mem_alloc() - allocate video memory for the given buffer
  */
 static int __vb2_buf_mem_alloc(struct vb2_buffer *vb)
@@ -953,20 +975,29 @@ EXPORT_SYMBOL_GPL(vb2_discard_done);
 /**
  * __prepare_mmap() - prepare an MMAP buffer
  */
-static int __prepare_mmap(struct vb2_buffer *vb, const void *pb)
+static int __prepare_mmap(struct vb2_buffer *vb, const void *pb,
+			  bool no_cache_sync)
 {
-	int ret = 0;
+	int ret;
 
-	if (pb)
+	if (pb) {
 		ret = call_bufop(vb->vb2_queue, fill_vb2_buffer,
 				 vb, pb, vb->planes);
-	return ret ? ret : call_vb_qop(vb, buf_prepare, vb);
+		if (ret)
+			return ret;
+	}
+
+	if (!no_cache_sync)
+		__mem_prepare_planes(vb);
+
+	return call_vb_qop(vb, buf_prepare, vb);
 }
 
 /**
  * __prepare_userptr() - prepare a USERPTR buffer
  */
-static int __prepare_userptr(struct vb2_buffer *vb, const void *pb)
+static int __prepare_userptr(struct vb2_buffer *vb, const void *pb,
+			     bool no_cache_sync)
 {
 	struct vb2_plane planes[VB2_MAX_PLANES];
 	struct vb2_queue *q = vb->vb2_queue;
@@ -1056,6 +1087,11 @@ static int __prepare_userptr(struct vb2_buffer *vb, const void *pb)
 			dprintk(1, "buffer initialization failed\n");
 			goto err;
 		}
+
+		/* This is new buffer memory --- always synchronise cache. */
+		__mem_prepare_planes(vb);
+	} else if (!no_cache_sync) {
+		__mem_prepare_planes(vb);
 	}
 
 	ret = call_vb_qop(vb, buf_prepare, vb);
@@ -1083,7 +1119,8 @@ static int __prepare_userptr(struct vb2_buffer *vb, const void *pb)
 /**
  * __prepare_dmabuf() - prepare a DMABUF buffer
  */
-static int __prepare_dmabuf(struct vb2_buffer *vb, const void *pb)
+static int __prepare_dmabuf(struct vb2_buffer *vb, const void *pb,
+			    bool no_cache_sync)
 {
 	struct vb2_plane planes[VB2_MAX_PLANES];
 	struct vb2_queue *q = vb->vb2_queue;
@@ -1197,6 +1234,11 @@ static int __prepare_dmabuf(struct vb2_buffer *vb, const void *pb)
 			dprintk(1, "buffer initialization failed\n");
 			goto err;
 		}
+
+		/* This is new buffer memory --- always synchronise cache. */
+		__mem_prepare_planes(vb);
+	} else if (!no_cache_sync) {
+		__mem_prepare_planes(vb);
 	}
 
 	ret = call_vb_qop(vb, buf_prepare, vb);
@@ -1229,10 +1271,10 @@ static void __enqueue_in_driver(struct vb2_buffer *vb)
 	call_void_vb_qop(vb, buf_queue, vb);
 }
 
-static int __buf_prepare(struct vb2_buffer *vb, const void *pb)
+static int __buf_prepare(struct vb2_buffer *vb, const void *pb,
+			 bool no_cache_sync)
 {
 	struct vb2_queue *q = vb->vb2_queue;
-	unsigned int plane;
 	int ret;
 
 	if (q->error) {
@@ -1244,13 +1286,13 @@ static int __buf_prepare(struct vb2_buffer *vb, const void *pb)
 
 	switch (q->memory) {
 	case VB2_MEMORY_MMAP:
-		ret = __prepare_mmap(vb, pb);
+		ret = __prepare_mmap(vb, pb, no_cache_sync);
 		break;
 	case VB2_MEMORY_USERPTR:
-		ret = __prepare_userptr(vb, pb);
+		ret = __prepare_userptr(vb, pb, no_cache_sync);
 		break;
 	case VB2_MEMORY_DMABUF:
-		ret = __prepare_dmabuf(vb, pb);
+		ret = __prepare_dmabuf(vb, pb, no_cache_sync);
 		break;
 	default:
 		WARN(1, "Invalid queue type\n");
@@ -1263,16 +1305,13 @@ static int __buf_prepare(struct vb2_buffer *vb, const void *pb)
 		return ret;
 	}
 
-	/* sync buffers */
-	for (plane = 0; plane < vb->num_planes; ++plane)
-		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
-
 	vb->state = VB2_BUF_STATE_PREPARED;
 
 	return 0;
 }
 
-int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb)
+int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb,
+			 bool no_cache_sync)
 {
 	struct vb2_buffer *vb;
 	int ret;
@@ -1284,7 +1323,7 @@ int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb)
 		return -EINVAL;
 	}
 
-	ret = __buf_prepare(vb, pb);
+	ret = __buf_prepare(vb, pb, no_cache_sync);
 	if (ret)
 		return ret;
 
@@ -1360,7 +1399,8 @@ static int vb2_start_streaming(struct vb2_queue *q)
 	return ret;
 }
 
-int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
+int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb,
+		  bool no_cache_sync)
 {
 	struct vb2_buffer *vb;
 	int ret;
@@ -1369,7 +1409,7 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
 
 	switch (vb->state) {
 	case VB2_BUF_STATE_DEQUEUED:
-		ret = __buf_prepare(vb, pb);
+		ret = __buf_prepare(vb, pb, no_cache_sync);
 		if (ret)
 			return ret;
 		break;
@@ -1555,7 +1595,7 @@ EXPORT_SYMBOL_GPL(vb2_wait_for_all_buffers);
 /**
  * __vb2_dqbuf() - bring back the buffer to the DEQUEUED state
  */
-static void __vb2_dqbuf(struct vb2_buffer *vb)
+static void __vb2_dqbuf(struct vb2_buffer *vb, bool no_cache_sync)
 {
 	struct vb2_queue *q = vb->vb2_queue;
 	unsigned int i;
@@ -1566,9 +1606,8 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
 
 	vb->state = VB2_BUF_STATE_DEQUEUED;
 
-	/* sync buffers */
-	for (i = 0; i < vb->num_planes; ++i)
-		call_void_memop(vb, finish, vb->planes[i].mem_priv);
+	if (!no_cache_sync)
+		__mem_finish_planes(vb);
 
 	/* unmap DMABUF buffer */
 	if (q->memory == VB2_MEMORY_DMABUF)
@@ -1581,7 +1620,7 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
 }
 
 int vb2_core_dqbuf(struct vb2_queue *q, unsigned int *pindex, void *pb,
-		   bool nonblocking)
+		   bool nonblocking, bool no_cache_sync)
 {
 	struct vb2_buffer *vb = NULL;
 	int ret;
@@ -1618,7 +1657,7 @@ int vb2_core_dqbuf(struct vb2_queue *q, unsigned int *pindex, void *pb,
 	trace_vb2_dqbuf(q, vb);
 
 	/* go back to dequeued state */
-	__vb2_dqbuf(vb);
+	__vb2_dqbuf(vb, no_cache_sync);
 
 	dprintk(1, "dqbuf of buffer %d, with state %d\n",
 			vb->index, vb->state);
@@ -1692,7 +1731,7 @@ static void __vb2_queue_cancel(struct vb2_queue *q)
 			vb->state = VB2_BUF_STATE_PREPARED;
 			call_void_vb_qop(vb, buf_finish, vb);
 		}
-		__vb2_dqbuf(vb);
+		__vb2_dqbuf(vb, false);
 	}
 }
 
@@ -2240,7 +2279,7 @@ static int __vb2_init_fileio(struct vb2_queue *q, int read)
 		 * Queue all buffers.
 		 */
 		for (i = 0; i < q->num_buffers; i++) {
-			ret = vb2_core_qbuf(q, i, NULL);
+			ret = vb2_core_qbuf(q, i, NULL, false);
 			if (ret)
 				goto err_reqbufs;
 			fileio->bufs[i].queued = 1;
@@ -2343,7 +2382,7 @@ static size_t __vb2_perform_fileio(struct vb2_queue *q, char __user *data, size_
 		/*
 		 * Call vb2_dqbuf to get buffer back.
 		 */
-		ret = vb2_core_dqbuf(q, &index, NULL, nonblock);
+		ret = vb2_core_dqbuf(q, &index, NULL, nonblock, false);
 		dprintk(5, "vb2_dqbuf result: %d\n", ret);
 		if (ret)
 			return ret;
@@ -2419,7 +2458,7 @@ static size_t __vb2_perform_fileio(struct vb2_queue *q, char __user *data, size_
 
 		if (copy_timestamp)
 			b->timestamp = ktime_get_ns();
-		ret = vb2_core_qbuf(q, index, NULL);
+		ret = vb2_core_qbuf(q, index, NULL, false);
 		dprintk(5, "vb2_dbuf result: %d\n", ret);
 		if (ret)
 			return ret;
@@ -2505,7 +2544,7 @@ static int vb2_thread(void *data)
 		} else {
 			call_void_qop(q, wait_finish, q);
 			if (!threadio->stop)
-				ret = vb2_core_dqbuf(q, &index, NULL, 0);
+				ret = vb2_core_dqbuf(q, &index, NULL, 0, false);
 			call_void_qop(q, wait_prepare, q);
 			dprintk(5, "file io: vb2_dqbuf result: %d\n", ret);
 			if (!ret)
@@ -2522,7 +2561,7 @@ static int vb2_thread(void *data)
 		if (copy_timestamp)
 			vb->timestamp = ktime_get_ns();;
 		if (!threadio->stop)
-			ret = vb2_core_qbuf(q, vb->index, NULL);
+			ret = vb2_core_qbuf(q, vb->index, NULL, false);
 		call_void_qop(q, wait_prepare, q);
 		if (ret || threadio->stop)
 			break;
diff --git a/drivers/media/v4l2-core/videobuf2-v4l2.c b/drivers/media/v4l2-core/videobuf2-v4l2.c
index 3529849d2218..7e327ad6ef30 100644
--- a/drivers/media/v4l2-core/videobuf2-v4l2.c
+++ b/drivers/media/v4l2-core/videobuf2-v4l2.c
@@ -499,8 +499,11 @@ int vb2_prepare_buf(struct vb2_queue *q, struct v4l2_buffer *b)
 	}
 
 	ret = vb2_queue_or_prepare_buf(q, b, "prepare_buf");
+	if (ret)
+		return ret;
 
-	return ret ? ret : vb2_core_prepare_buf(q, b->index, b);
+	return vb2_core_prepare_buf(q, b->index, b,
+				    b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC);
 }
 EXPORT_SYMBOL_GPL(vb2_prepare_buf);
 
@@ -565,7 +568,11 @@ int vb2_qbuf(struct vb2_queue *q, struct v4l2_buffer *b)
 	}
 
 	ret = vb2_queue_or_prepare_buf(q, b, "qbuf");
-	return ret ? ret : vb2_core_qbuf(q, b->index, b);
+	if (ret)
+		return ret;
+
+	return vb2_core_qbuf(q, b->index, b,
+			     b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC);
 }
 EXPORT_SYMBOL_GPL(vb2_qbuf);
 
@@ -583,7 +590,8 @@ int vb2_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool nonblocking)
 		return -EINVAL;
 	}
 
-	ret = vb2_core_dqbuf(q, NULL, b, nonblocking);
+	ret = vb2_core_dqbuf(q, NULL, b, nonblocking,
+			     b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC);
 
 	/*
 	 *  After calling the VIDIOC_DQBUF V4L2_BUF_FLAG_DONE must be
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index ac5898a55fd9..bfad0588bb2b 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -83,10 +83,14 @@ struct vb2_threadio_data;
  *		dmabuf.
  * @unmap_dmabuf: releases access control to the dmabuf - allocator is notified
  *		  that this driver is done using the dmabuf for now.
- * @prepare:	called every time the buffer is passed from userspace to the
- *		driver, useful for cache synchronisation, optional.
- * @finish:	called every time the buffer is passed back from the driver
- *		to the userspace, also optional.
+ * @prepare:	Called on the plane when the buffer ownership is passed from
+ *		the user space to the kernel and the plane must be cache
+ *		syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may
+ *		be used to skip this call. Optional.
+ * @finish:	Called on the plane when the buffer ownership is passed from
+ *		the kernel to the user space and the plane must be cache
+ *		syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may
+ *		be used to skip this call. Optional.
  * @vaddr:	return a kernel virtual address to a given memory buffer
  *		associated with the passed private structure or NULL if no
  *		such mapping exists.
@@ -695,6 +699,7 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
  * @index:	id number of the buffer
  * @pb:		buffer structure passed from userspace to vidioc_prepare_buf
  *		handler in driver
+ * @no_cache_sync if true, skip cache synchronization
  *
  * Should be called from vidioc_prepare_buf ioctl handler of a driver.
  * The passed buffer should have been verified.
@@ -704,7 +709,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
  * The return values from this function are intended to be directly returned
  * from vidioc_prepare_buf handler in driver.
  */
-int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb);
+int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb,
+			 bool no_cache_sync);
 
 /**
  * vb2_core_qbuf() - Queue a buffer from userspace
@@ -713,6 +719,7 @@ int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb);
  * @index:	id number of the buffer
  * @pb:		buffer structure passed from userspace to vidioc_qbuf handler
  *		in driver
+ * @no_cache_sync if true, skip cache synchronization
  *
  * Should be called from vidioc_qbuf ioctl handler of a driver.
  * The passed buffer should have been verified.
@@ -727,7 +734,8 @@ int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb);
  * The return values from this function are intended to be directly returned
  * from vidioc_qbuf handler in driver.
  */
-int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb);
+int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb,
+		  bool no_cache_sync);
 
 /**
  * vb2_core_dqbuf() - Dequeue a buffer to the userspace
@@ -738,6 +746,7 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb);
  * @nonblocking: if true, this call will not sleep waiting for a buffer if no
  *		 buffers ready for dequeuing are present. Normally the driver
  *		 would be passing (file->f_flags & O_NONBLOCK) here
+ * @no_cache_sync if true, skip cache synchronization
  *
  * Should be called from vidioc_dqbuf ioctl handler of a driver.
  * The passed buffer should have been verified.
@@ -754,7 +763,7 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb);
  * from vidioc_dqbuf handler in driver.
  */
 int vb2_core_dqbuf(struct vb2_queue *q, unsigned int *pindex, void *pb,
-		   bool nonblocking);
+		   bool nonblocking, bool no_cache_sync);
 
 int vb2_core_streamon(struct vb2_queue *q, unsigned int type);
 int vb2_core_streamoff(struct vb2_queue *q, unsigned int type);
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 06/11] vb2: Improve struct vb2_mem_ops documentation; alloc and put are for MMAP
  2016-12-16  1:24 ` Laurent Pinchart
                   ` (5 preceding siblings ...)
  (?)
@ 2016-12-16  1:24 ` Laurent Pinchart
  -1 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus

From: Sakari Ailus <sakari.ailus@linux.intel.com>

The alloc() and put() ops are for MMAP buffers only. Document it.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
Changes since v1:

- Fixed typo in documentation
---
 include/media/videobuf2-core.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index bfad0588bb2b..15084b90e2d5 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -46,16 +46,16 @@ struct vb2_threadio_data;
 
 /**
  * struct vb2_mem_ops - memory handling/memory allocator operations
- * @alloc:	allocate video memory and, optionally, allocator private data,
- *		return ERR_PTR() on failure or a pointer to allocator private,
- *		per-buffer data on success; the returned private structure
- *		will then be passed as @buf_priv argument to other ops in this
- *		structure. Additional gfp_flags to use when allocating the
- *		are also passed to this operation. These flags are from the
- *		gfp_flags field of vb2_queue.
- * @put:	inform the allocator that the buffer will no longer be used;
- *		usually will result in the allocator freeing the buffer (if
- *		no other users of this buffer are present); the @buf_priv
+ * @alloc:	allocate video memory for an MMAP buffer and, optionally,
+ *		allocator private data, return ERR_PTR() on failure or a pointer
+ *		to allocator private, per-buffer data on success; the returned
+ *		private structure will then be passed as @buf_priv argument to
+ *		other ops in this structure. Additional gfp_flags to use when
+ *		allocating the memory are also passed to this operation. These
+ *		flags are from the gfp_flags field of vb2_queue.
+ * @put:	inform the allocator that the MMAP buffer will no longer be
+ *		used; usually will result in the allocator freeing the buffer
+ *		(if no other users of this buffer are present); the @buf_priv
  *		argument is the allocator private per-buffer structure
  *		previously returned from the alloc callback.
  * @get_dmabuf: acquire userspace memory for a hardware operation; used for
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 07/11] vb2: dma-contig: Remove redundant sgt_base field
  2016-12-16  1:24 ` Laurent Pinchart
                   ` (6 preceding siblings ...)
  (?)
@ 2016-12-16  1:24 ` Laurent Pinchart
  2017-03-27 22:51   ` Shuah Khan
  -1 siblings, 1 reply; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus

From: Sakari Ailus <sakari.ailus@linux.intel.com>

The struct vb2_dc_buf contains two struct sg_table fields: sgt_base and
dma_sgt. The former is used by DMA-BUF buffers whereas the latter is used
by USERPTR.

Unify the two, leaving dma_sgt.

MMAP buffers do not need cache flushing since they have been allocated
using dma_alloc_coherent().

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
Changes since v1:

- Test for MMAP or DMABUF type through the vec field instead of the now
  gone vma field.
- Move the vec field to a USERPTR section in struct vb2_dc_buf, where
  the vma field was located.
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index fb6a177be461..2a00d12ffee2 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -30,12 +30,13 @@ struct vb2_dc_buf {
 	unsigned long			attrs;
 	enum dma_data_direction		dma_dir;
 	struct sg_table			*dma_sgt;
-	struct frame_vector		*vec;
 
 	/* MMAP related */
 	struct vb2_vmarea_handler	handler;
 	atomic_t			refcount;
-	struct sg_table			*sgt_base;
+
+	/* USERPTR related */
+	struct frame_vector		*vec;
 
 	/* DMABUF related */
 	struct dma_buf_attachment	*db_attach;
@@ -95,7 +96,7 @@ static void vb2_dc_prepare(void *buf_priv)
 	struct sg_table *sgt = buf->dma_sgt;
 
 	/* DMABUF exporter will flush the cache for us */
-	if (!sgt || buf->db_attach)
+	if (!buf->vec)
 		return;
 
 	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
@@ -108,7 +109,7 @@ static void vb2_dc_finish(void *buf_priv)
 	struct sg_table *sgt = buf->dma_sgt;
 
 	/* DMABUF exporter will flush the cache for us */
-	if (!sgt || buf->db_attach)
+	if (!buf->vec)
 		return;
 
 	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir);
@@ -125,9 +126,9 @@ static void vb2_dc_put(void *buf_priv)
 	if (!atomic_dec_and_test(&buf->refcount))
 		return;
 
-	if (buf->sgt_base) {
-		sg_free_table(buf->sgt_base);
-		kfree(buf->sgt_base);
+	if (buf->dma_sgt) {
+		sg_free_table(buf->dma_sgt);
+		kfree(buf->dma_sgt);
 	}
 	dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
 		       buf->attrs);
@@ -239,13 +240,13 @@ static int vb2_dc_dmabuf_ops_attach(struct dma_buf *dbuf, struct device *dev,
 	/* Copy the buf->base_sgt scatter list to the attachment, as we can't
 	 * map the same scatter list to multiple attachments at the same time.
 	 */
-	ret = sg_alloc_table(sgt, buf->sgt_base->orig_nents, GFP_KERNEL);
+	ret = sg_alloc_table(sgt, buf->dma_sgt->orig_nents, GFP_KERNEL);
 	if (ret) {
 		kfree(attach);
 		return -ENOMEM;
 	}
 
-	rd = buf->sgt_base->sgl;
+	rd = buf->dma_sgt->sgl;
 	wr = sgt->sgl;
 	for (i = 0; i < sgt->orig_nents; ++i) {
 		sg_set_page(wr, sg_page(rd), rd->length, rd->offset);
@@ -396,10 +397,10 @@ static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, unsigned long flags)
 	exp_info.flags = flags;
 	exp_info.priv = buf;
 
-	if (!buf->sgt_base)
-		buf->sgt_base = vb2_dc_get_base_sgt(buf);
+	if (!buf->dma_sgt)
+		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
 
-	if (WARN_ON(!buf->sgt_base))
+	if (WARN_ON(!buf->dma_sgt))
 		return NULL;
 
 	dbuf = dma_buf_export(&exp_info);
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 08/11] vb2: dma-contig: Don't warn on failure in obtaining scatterlist
  2016-12-16  1:24 ` Laurent Pinchart
@ 2016-12-16  1:24   ` Laurent Pinchart
  -1 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus

From: Sakari Ailus <sakari.ailus@linux.intel.com>

vb2_dc_get_base_sgt() which obtains the scatterlist already prints
information on why the scatterlist could not be obtained.

Also, remove the useless warning of a failed kmalloc().

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 2a00d12ffee2..d59f107f0457 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -370,10 +370,8 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
 	struct sg_table *sgt;
 
 	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
-	if (!sgt) {
-		dev_err(buf->dev, "failed to alloc sg table\n");
+	if (!sgt)
 		return NULL;
-	}
 
 	ret = dma_get_sgtable_attrs(buf->dev, sgt, buf->cookie, buf->dma_addr,
 		buf->size, buf->attrs);
@@ -400,7 +398,7 @@ static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, unsigned long flags)
 	if (!buf->dma_sgt)
 		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
 
-	if (WARN_ON(!buf->dma_sgt))
+	if (!buf->dma_sgt)
 		return NULL;
 
 	dbuf = dma_buf_export(&exp_info);
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 08/11] vb2: dma-contig: Don't warn on failure in obtaining scatterlist
@ 2016-12-16  1:24   ` Laurent Pinchart
  0 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: Daniel Vetter, dri-devel, Hans Verkuil, Kyungmin Park,
	Sakari Ailus, Pawel Osciak, Marek Szyprowski

From: Sakari Ailus <sakari.ailus@linux.intel.com>

vb2_dc_get_base_sgt() which obtains the scatterlist already prints
information on why the scatterlist could not be obtained.

Also, remove the useless warning of a failed kmalloc().

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index 2a00d12ffee2..d59f107f0457 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -370,10 +370,8 @@ static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
 	struct sg_table *sgt;
 
 	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
-	if (!sgt) {
-		dev_err(buf->dev, "failed to alloc sg table\n");
+	if (!sgt)
 		return NULL;
-	}
 
 	ret = dma_get_sgtable_attrs(buf->dev, sgt, buf->cookie, buf->dma_addr,
 		buf->size, buf->attrs);
@@ -400,7 +398,7 @@ static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, unsigned long flags)
 	if (!buf->dma_sgt)
 		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
 
-	if (WARN_ON(!buf->dma_sgt))
+	if (!buf->dma_sgt)
 		return NULL;
 
 	dbuf = dma_buf_export(&exp_info);
-- 
Regards,

Laurent Pinchart

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 09/11] vb2: dma-contig: Move vb2_dc_get_base_sgt() up
  2016-12-16  1:24 ` Laurent Pinchart
                   ` (8 preceding siblings ...)
  (?)
@ 2016-12-16  1:24 ` Laurent Pinchart
  -1 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus

From: Sakari Ailus <sakari.ailus@linux.intel.com>

Just move the function up. It'll be soon needed earlier than previously.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 40 +++++++++++++-------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index d59f107f0457..d503647ea522 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -62,6 +62,26 @@ static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt)
 	return size;
 }
 
+static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
+{
+	int ret;
+	struct sg_table *sgt;
+
+	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return NULL;
+
+	ret = dma_get_sgtable_attrs(buf->dev, sgt, buf->cookie, buf->dma_addr,
+		buf->size, buf->attrs);
+	if (ret < 0) {
+		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
+		kfree(sgt);
+		return NULL;
+	}
+
+	return sgt;
+}
+
 /*********************************************/
 /*         callbacks for all buffers         */
 /*********************************************/
@@ -364,26 +384,6 @@ static struct dma_buf_ops vb2_dc_dmabuf_ops = {
 	.release = vb2_dc_dmabuf_ops_release,
 };
 
-static struct sg_table *vb2_dc_get_base_sgt(struct vb2_dc_buf *buf)
-{
-	int ret;
-	struct sg_table *sgt;
-
-	sgt = kmalloc(sizeof(*sgt), GFP_KERNEL);
-	if (!sgt)
-		return NULL;
-
-	ret = dma_get_sgtable_attrs(buf->dev, sgt, buf->cookie, buf->dma_addr,
-		buf->size, buf->attrs);
-	if (ret < 0) {
-		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
-		kfree(sgt);
-		return NULL;
-	}
-
-	return sgt;
-}
-
 static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, unsigned long flags)
 {
 	struct vb2_dc_buf *buf = buf_priv;
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs
  2016-12-16  1:24 ` Laurent Pinchart
                   ` (9 preceding siblings ...)
  (?)
@ 2016-12-16  1:24 ` Laurent Pinchart
  2016-12-26  7:58   ` [RFC, v2, " Ricky Liang
  -1 siblings, 1 reply; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus

From: Sakari Ailus <sakari.ailus@linux.intel.com>

The desirable DMA attributes are not generic for all devices using
Videobuf2 contiguous DMA ops. Let the drivers decide.

This change also results in MMAP buffers always having an sg_table
(dma_sgt field).

Also arrange the header files alphabetically.

As a result, also the DMA-BUF exporter must provide ops for synchronising
the cache. This adds begin_cpu_access and end_cpu_access ops to
vb2_dc_dmabuf_ops.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 66 ++++++++++++++++++++++----
 1 file changed, 56 insertions(+), 10 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index d503647ea522..a0e88ad93f07 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -11,11 +11,11 @@
  */
 
 #include <linux/dma-buf.h>
+#include <linux/dma-mapping.h>
 #include <linux/module.h>
 #include <linux/scatterlist.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
-#include <linux/dma-mapping.h>
 
 #include <media/videobuf2-v4l2.h>
 #include <media/videobuf2-dma-contig.h>
@@ -115,8 +115,11 @@ static void vb2_dc_prepare(void *buf_priv)
 	struct vb2_dc_buf *buf = buf_priv;
 	struct sg_table *sgt = buf->dma_sgt;
 
-	/* DMABUF exporter will flush the cache for us */
-	if (!buf->vec)
+	/*
+	 * DMABUF exporter will flush the cache for us; only USERPTR
+	 * and MMAP buffers with non-coherent memory will be flushed.
+	 */
+	if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))
 		return;
 
 	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
@@ -128,8 +131,11 @@ static void vb2_dc_finish(void *buf_priv)
 	struct vb2_dc_buf *buf = buf_priv;
 	struct sg_table *sgt = buf->dma_sgt;
 
-	/* DMABUF exporter will flush the cache for us */
-	if (!buf->vec)
+	/*
+	 * DMABUF exporter will flush the cache for us; only USERPTR
+	 * and MMAP buffers with non-coherent memory will be flushed.
+	 */
+	if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))
 		return;
 
 	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir);
@@ -172,13 +178,22 @@ static void *vb2_dc_alloc(struct device *dev, unsigned long attrs,
 	if (attrs)
 		buf->attrs = attrs;
 	buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
-					GFP_KERNEL | gfp_flags, buf->attrs);
+				     GFP_KERNEL | gfp_flags, buf->attrs);
 	if (!buf->cookie) {
-		dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
+		dev_err(dev, "dma_alloc_attrs of size %ld failed\n", size);
 		kfree(buf);
 		return ERR_PTR(-ENOMEM);
 	}
 
+	if (buf->attrs & DMA_ATTR_NON_CONSISTENT) {
+		buf->dma_sgt = vb2_dc_get_base_sgt(buf);
+		if (!buf->dma_sgt) {
+			dma_free_attrs(dev, size, buf->cookie, buf->dma_addr,
+				       buf->attrs);
+			return ERR_PTR(-ENOMEM);
+		}
+	}
+
 	if ((buf->attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
 		buf->vaddr = buf->cookie;
 
@@ -359,6 +374,34 @@ static void *vb2_dc_dmabuf_ops_kmap(struct dma_buf *dbuf, unsigned long pgnum)
 	return buf->vaddr ? buf->vaddr + pgnum * PAGE_SIZE : NULL;
 }
 
+static int vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
+					      enum dma_data_direction direction)
+{
+	struct vb2_dc_buf *buf = dbuf->priv;
+	struct sg_table *sgt = buf->dma_sgt;
+
+	if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))
+		return 0;
+
+	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
+
+	return 0;
+}
+
+static int vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
+					    enum dma_data_direction direction)
+{
+	struct vb2_dc_buf *buf = dbuf->priv;
+	struct sg_table *sgt = buf->dma_sgt;
+
+	if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))
+		return 0;
+
+	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
+
+	return 0;
+}
+
 static void *vb2_dc_dmabuf_ops_vmap(struct dma_buf *dbuf)
 {
 	struct vb2_dc_buf *buf = dbuf->priv;
@@ -379,6 +422,8 @@ static struct dma_buf_ops vb2_dc_dmabuf_ops = {
 	.unmap_dma_buf = vb2_dc_dmabuf_ops_unmap,
 	.kmap = vb2_dc_dmabuf_ops_kmap,
 	.kmap_atomic = vb2_dc_dmabuf_ops_kmap,
+	.begin_cpu_access = vb2_dc_dmabuf_ops_begin_cpu_access,
+	.end_cpu_access = vb2_dc_dmabuf_ops_end_cpu_access,
 	.vmap = vb2_dc_dmabuf_ops_vmap,
 	.mmap = vb2_dc_dmabuf_ops_mmap,
 	.release = vb2_dc_dmabuf_ops_release,
@@ -424,11 +469,12 @@ static void vb2_dc_put_userptr(void *buf_priv)
 
 	if (sgt) {
 		/*
-		 * No need to sync to CPU, it's already synced to the CPU
-		 * since the finish() memop will have been called before this.
+		 * Don't ask to skip cache sync in case if the user
+		 * did ask to skip cache flush the last time the
+		 * buffer was dequeued.
 		 */
 		dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
-				   buf->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+				   buf->dma_dir, 0);
 		pages = frame_vector_pages(buf->vec);
 		/* sgt should exist only if vector contains pages... */
 		BUG_ON(IS_ERR(pages));
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC v2 11/11] vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs
  2016-12-16  1:24 ` Laurent Pinchart
                   ` (10 preceding siblings ...)
  (?)
@ 2016-12-16  1:24 ` Laurent Pinchart
  -1 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2016-12-16  1:24 UTC (permalink / raw)
  To: linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Hans Verkuil, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus

From: Sakari Ailus <sakari.ailus@linux.intel.com>

The scatterlist should always be present when the cache would need to be
flushed. Each buffer type has its own means to provide that. Add
WARN_ON_ONCE() to check the scatterist exists.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/videobuf2-dma-contig.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
index a0e88ad93f07..9409f458cf89 100644
--- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
@@ -122,6 +122,9 @@ static void vb2_dc_prepare(void *buf_priv)
 	if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))
 		return;
 
+	if (WARN_ON_ONCE(!sgt))
+		return;
+
 	dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
 			       buf->dma_dir);
 }
@@ -138,6 +141,9 @@ static void vb2_dc_finish(void *buf_priv)
 	if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))
 		return;
 
+	if (WARN_ON_ONCE(!sgt))
+		return;
+
 	dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir);
 }
 
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC v2 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency
  2016-12-16  1:24 ` Laurent Pinchart
@ 2016-12-16 12:06   ` Hans Verkuil
  -1 siblings, 0 replies; 25+ messages in thread
From: Hans Verkuil @ 2016-12-16 12:06 UTC (permalink / raw)
  To: Laurent Pinchart, linux-media
  Cc: dri-devel, Pawel Osciak, Marek Szyprowski, Kyungmin Park,
	Sumit Semwal, Rob Clark, Daniel Vetter, Laura Abbott,
	Sakari Ailus

On 16/12/16 02:24, Laurent Pinchart wrote:
> Hello,
>
> This is a rebased version of the vb2 cache hints support patch series posted
> by Sakari more than a year ago. The patches have been modified as needed by
> the upstream changes and received the occasional small odd fix but are
> otherwise not modified. Please see the individual commit messages for more
> information.
>
> The videobuf2 memory managers use the DMA mapping API to handle cache
> synchronization on systems that require them transparently for drivers. As
> cache operations are expensive, system performances can be impacted. Cache
> synchronization can't be skipped altogether if we want to retain correct
> behaviour, but optimizations are possible in cases related to buffer sharing
> between multiple devices without CPU access to the memory.
>
> The first optimization covers cases where the memory never needs to be
> accessed by the CPU (neither in kernelspace nor in userspace). In those cases,
> as no CPU memory mappings exist, cache synchronization can be skipped. The
> situation could be detected in the kernel as we have enough information to
> determine whether CPU mappings for kernelspace or userspace exist (in the
> first case because drivers should request them explicitly, in the second case
> because the mmap() handler hasn't been invoked). This optimization is not
> implemented currently but should at least be prototyped as it could improve
> performances automatically in a large number of cases.
>
> The second class of optimizations cover cases where the memory sometimes needs
> to be accessed by the CPU. In those cases memory mapping must be created and
> cache handled, but cache synchronization could be skipped for buffer that are
> not touched by the CPU.
>
> By default the following cache synchronization operations need to be performed
> related to the buffer management ioctls. For simplicity means of QBUF below
> apply to buf VIDIOC_QBUF and VIDIOC_PREPARE_BUF.
>
> 		| QBUF		| DQBUF
> 	----------------------------------------
> 	CAPTURE	| Invalidate	| Invalidate (*)
> 	OUTPUT	| Clean		| -
>
> (*) for systems using speculative pre-fetching only
>
> The following cases can be optimized.
>
> 1. CAPTURE, the CPU has not written to the buffer before QBUF
>
>    Cache invalidation can be skipped at QBUF time, but becomes required at
>    DQBUF time on all systems, regardless of whether they use speculative
>    prefetching.
>
> 2. CAPTURE, the CPU will not read from the buffer after DQBUF
>
>    Cache invalidation can be skipped at DQBUF time.
>
> 3. CAPTURE, combination of (1) and (2)
>
>    Cache invalidation can be skipped at both QBUF and DQBUF time.
>
> 4. OUTPUT, the CPU has not written to the buffer before QBUF
>
>    Cache clean can be skipped at QBUF time.
>
>
> The kernel can't detect thoses situations automatically and thus requires
> hints from userspace to decide whether cache synchronization can be skipped.
> It should be noted that those hints might not be honoured. In particular, if
> userspace hints that it hasn't touched the buffer with the CPU, drivers might
> need to perform memory accesses themselves (adding JPEG or MPEG headers to
> buffers is a common case where CPU access could be needed in the kernel), in
> which case the userspace hints will be ignored.
>
> Getting the hints wrong will result in data corruption. Userspace applications
> are allowed to shoot themselves in the foot, but driver are responsible for
> deciding whether data corruption can pose a risk to the system in general. For
> instance if the device could be made to crash, or behave in a way that would
> jeopardize system security, reliability or performances, when fed with invalid
> data, cache synchronization shall not be skipped solely due to possibly
> incorrect userspace hints.
>
> The V4L2 API defines two flags, V4L2-BUF-FLAG-NO-CACHE-INVALIDATE and
> V4L2_BUF_FLAG_NO_CACHE_SYNC, that can be used to provide cache-related hints
> to the kernel. However, no kernel has ever implemented support for those flags
> that are thus most likely unused.
>
> A single flag is enough to cover all the optimization cases described above,
> provided we keep track of the flag being set at QBUF time to force cache
> invalidation at DQBUF time for case (1) if the  flag isn't set at DQBUF time.
> This patch series thus cleans up the userspace API and merges both flags into
> a single one.
>
> One potential issue with case (1) is that cache invalidation at DQBUF time for
> CAPTURE buffers isn't fully under the control of videobuf2. We can instruct
> the DMA mapping API to skip cache handling, but we can't force it to
> invalidate the cache in the sync_for_cpu operation for non speculative
> prefetching systems. Luckily, on ARM32 the current implementation always
> invalidates the cache in __dma_page_dev_to_cpu() for CAPTURE buffers so we are
> safe fot now. However, this is documented by a FIXME comment that might lead
> to someone fixing the implementation in the future. I believe we will have to
> the problem at the DMA mapping level, the userspace hint API shouldn't be
> affected.
>
> This RFC patch set achieves two main objectives:
>
> 1. Respect cache flags passed from the user space. As no driver nor videobuf2
> has (ever?) implemented them, the two flags are replaced by a single one
> (V4L2_BUF_FLAG_NO_CACHE_SYNC) and the two old flags are deprecated. This is
> done since a single flag provides the driver with enough information on what
> to do. (Patches 01/11 to 05/11, see patch 04/11 for more information.)
>
> 2. Allow a driver using videobuf2 dma-contig memory type to choose whether it
> prefers coherent or non-coherent CPU access to buffer memory for MMAP and
> USERPTR buffers. This could be later extended to be specified by the user, and
> per buffer if needed. (Patches 06/11 and 11/11).
>
> Only dma-contig memory type is changed but the same could be done to dma-sg as
> well. Sakari offered in v1 to add it to the set if people are happy with the
> changes to dma-contig.
>
> Note should be taken that the series performs cache optimization for MMAP
> buffers only. DMABUF imported buffers have their cache synchronization handled
> by the exported through the dma_buf_map_attachment() and
> dma_buf_unmap_attachment() functions, and dma-buf lacks an API to perform
> memory synchronization without unmapping and remapping the buffers. This is
> not a blocker as far as this patch series is concerned, but importing buffers
> (usually exported by the CPU) is such an important use case that we can't
> considered the cache optimization problem anywhere close to being solved if we
> don't address this case. I plan to start addressing the problem in January or
> February 2017, feedback on this point will thus be appreciated.

This series looks good to me, so:

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

Regards,

	Hans

>
>
> Sakari Ailus (10):
>   vb2: Rename confusingly named internal buffer preparation functions
>   vb2: Move buffer cache synchronisation to prepare from queue
>   vb2: Move cache synchronisation from buffer done to dqbuf handler
>   v4l: Unify cache management hint buffer flags
>   vb2: Improve struct vb2_mem_ops documentation; alloc and put are for
>     MMAP
>   vb2: dma-contig: Remove redundant sgt_base field
>   vb2: dma-contig: Don't warn on failure in obtaining scatterlist
>   vb2: dma-contig: Move vb2_dc_get_base_sgt() up
>   vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs
>   vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs
>
> Samu Onkalo (1):
>   v4l2-core: Don't sync cache for a buffer if so requested
>
>  Documentation/media/uapi/v4l/buffer.rst            |  24 ++--
>  .../media/uapi/v4l/vidioc-prepare-buf.rst          |   5 +-
>  drivers/media/v4l2-core/videobuf2-core.c           | 120 ++++++++++++------
>  drivers/media/v4l2-core/videobuf2-dma-contig.c     | 135 ++++++++++++++-------
>  drivers/media/v4l2-core/videobuf2-v4l2.c           |  14 ++-
>  include/media/videobuf2-core.h                     |  43 ++++---
>  include/trace/events/v4l2.h                        |   3 +-
>  include/uapi/linux/videodev2.h                     |   7 +-
>  8 files changed, 228 insertions(+), 123 deletions(-)
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency
@ 2016-12-16 12:06   ` Hans Verkuil
  0 siblings, 0 replies; 25+ messages in thread
From: Hans Verkuil @ 2016-12-16 12:06 UTC (permalink / raw)
  To: Laurent Pinchart, linux-media
  Cc: Daniel Vetter, Kyungmin Park, dri-devel, Sakari Ailus,
	Pawel Osciak, Marek Szyprowski

On 16/12/16 02:24, Laurent Pinchart wrote:
> Hello,
>
> This is a rebased version of the vb2 cache hints support patch series posted
> by Sakari more than a year ago. The patches have been modified as needed by
> the upstream changes and received the occasional small odd fix but are
> otherwise not modified. Please see the individual commit messages for more
> information.
>
> The videobuf2 memory managers use the DMA mapping API to handle cache
> synchronization on systems that require them transparently for drivers. As
> cache operations are expensive, system performances can be impacted. Cache
> synchronization can't be skipped altogether if we want to retain correct
> behaviour, but optimizations are possible in cases related to buffer sharing
> between multiple devices without CPU access to the memory.
>
> The first optimization covers cases where the memory never needs to be
> accessed by the CPU (neither in kernelspace nor in userspace). In those cases,
> as no CPU memory mappings exist, cache synchronization can be skipped. The
> situation could be detected in the kernel as we have enough information to
> determine whether CPU mappings for kernelspace or userspace exist (in the
> first case because drivers should request them explicitly, in the second case
> because the mmap() handler hasn't been invoked). This optimization is not
> implemented currently but should at least be prototyped as it could improve
> performances automatically in a large number of cases.
>
> The second class of optimizations cover cases where the memory sometimes needs
> to be accessed by the CPU. In those cases memory mapping must be created and
> cache handled, but cache synchronization could be skipped for buffer that are
> not touched by the CPU.
>
> By default the following cache synchronization operations need to be performed
> related to the buffer management ioctls. For simplicity means of QBUF below
> apply to buf VIDIOC_QBUF and VIDIOC_PREPARE_BUF.
>
> 		| QBUF		| DQBUF
> 	----------------------------------------
> 	CAPTURE	| Invalidate	| Invalidate (*)
> 	OUTPUT	| Clean		| -
>
> (*) for systems using speculative pre-fetching only
>
> The following cases can be optimized.
>
> 1. CAPTURE, the CPU has not written to the buffer before QBUF
>
>    Cache invalidation can be skipped at QBUF time, but becomes required at
>    DQBUF time on all systems, regardless of whether they use speculative
>    prefetching.
>
> 2. CAPTURE, the CPU will not read from the buffer after DQBUF
>
>    Cache invalidation can be skipped at DQBUF time.
>
> 3. CAPTURE, combination of (1) and (2)
>
>    Cache invalidation can be skipped at both QBUF and DQBUF time.
>
> 4. OUTPUT, the CPU has not written to the buffer before QBUF
>
>    Cache clean can be skipped at QBUF time.
>
>
> The kernel can't detect thoses situations automatically and thus requires
> hints from userspace to decide whether cache synchronization can be skipped.
> It should be noted that those hints might not be honoured. In particular, if
> userspace hints that it hasn't touched the buffer with the CPU, drivers might
> need to perform memory accesses themselves (adding JPEG or MPEG headers to
> buffers is a common case where CPU access could be needed in the kernel), in
> which case the userspace hints will be ignored.
>
> Getting the hints wrong will result in data corruption. Userspace applications
> are allowed to shoot themselves in the foot, but driver are responsible for
> deciding whether data corruption can pose a risk to the system in general. For
> instance if the device could be made to crash, or behave in a way that would
> jeopardize system security, reliability or performances, when fed with invalid
> data, cache synchronization shall not be skipped solely due to possibly
> incorrect userspace hints.
>
> The V4L2 API defines two flags, V4L2-BUF-FLAG-NO-CACHE-INVALIDATE and
> V4L2_BUF_FLAG_NO_CACHE_SYNC, that can be used to provide cache-related hints
> to the kernel. However, no kernel has ever implemented support for those flags
> that are thus most likely unused.
>
> A single flag is enough to cover all the optimization cases described above,
> provided we keep track of the flag being set at QBUF time to force cache
> invalidation at DQBUF time for case (1) if the  flag isn't set at DQBUF time.
> This patch series thus cleans up the userspace API and merges both flags into
> a single one.
>
> One potential issue with case (1) is that cache invalidation at DQBUF time for
> CAPTURE buffers isn't fully under the control of videobuf2. We can instruct
> the DMA mapping API to skip cache handling, but we can't force it to
> invalidate the cache in the sync_for_cpu operation for non speculative
> prefetching systems. Luckily, on ARM32 the current implementation always
> invalidates the cache in __dma_page_dev_to_cpu() for CAPTURE buffers so we are
> safe fot now. However, this is documented by a FIXME comment that might lead
> to someone fixing the implementation in the future. I believe we will have to
> the problem at the DMA mapping level, the userspace hint API shouldn't be
> affected.
>
> This RFC patch set achieves two main objectives:
>
> 1. Respect cache flags passed from the user space. As no driver nor videobuf2
> has (ever?) implemented them, the two flags are replaced by a single one
> (V4L2_BUF_FLAG_NO_CACHE_SYNC) and the two old flags are deprecated. This is
> done since a single flag provides the driver with enough information on what
> to do. (Patches 01/11 to 05/11, see patch 04/11 for more information.)
>
> 2. Allow a driver using videobuf2 dma-contig memory type to choose whether it
> prefers coherent or non-coherent CPU access to buffer memory for MMAP and
> USERPTR buffers. This could be later extended to be specified by the user, and
> per buffer if needed. (Patches 06/11 and 11/11).
>
> Only dma-contig memory type is changed but the same could be done to dma-sg as
> well. Sakari offered in v1 to add it to the set if people are happy with the
> changes to dma-contig.
>
> Note should be taken that the series performs cache optimization for MMAP
> buffers only. DMABUF imported buffers have their cache synchronization handled
> by the exported through the dma_buf_map_attachment() and
> dma_buf_unmap_attachment() functions, and dma-buf lacks an API to perform
> memory synchronization without unmapping and remapping the buffers. This is
> not a blocker as far as this patch series is concerned, but importing buffers
> (usually exported by the CPU) is such an important use case that we can't
> considered the cache optimization problem anywhere close to being solved if we
> don't address this case. I plan to start addressing the problem in January or
> February 2017, feedback on this point will thus be appreciated.

This series looks good to me, so:

Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

Regards,

	Hans

>
>
> Sakari Ailus (10):
>   vb2: Rename confusingly named internal buffer preparation functions
>   vb2: Move buffer cache synchronisation to prepare from queue
>   vb2: Move cache synchronisation from buffer done to dqbuf handler
>   v4l: Unify cache management hint buffer flags
>   vb2: Improve struct vb2_mem_ops documentation; alloc and put are for
>     MMAP
>   vb2: dma-contig: Remove redundant sgt_base field
>   vb2: dma-contig: Don't warn on failure in obtaining scatterlist
>   vb2: dma-contig: Move vb2_dc_get_base_sgt() up
>   vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs
>   vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs
>
> Samu Onkalo (1):
>   v4l2-core: Don't sync cache for a buffer if so requested
>
>  Documentation/media/uapi/v4l/buffer.rst            |  24 ++--
>  .../media/uapi/v4l/vidioc-prepare-buf.rst          |   5 +-
>  drivers/media/v4l2-core/videobuf2-core.c           | 120 ++++++++++++------
>  drivers/media/v4l2-core/videobuf2-dma-contig.c     | 135 ++++++++++++++-------
>  drivers/media/v4l2-core/videobuf2-v4l2.c           |  14 ++-
>  include/media/videobuf2-core.h                     |  43 ++++---
>  include/trace/events/v4l2.h                        |   3 +-
>  include/uapi/linux/videodev2.h                     |   7 +-
>  8 files changed, 228 insertions(+), 123 deletions(-)
>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC, v2, 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs
  2016-12-16  1:24 ` [RFC v2 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs Laurent Pinchart
@ 2016-12-26  7:58   ` Ricky Liang
  2017-04-05 13:13     ` Sakari Ailus
  2017-04-07 11:42     ` Laurent Pinchart
  0 siblings, 2 replies; 25+ messages in thread
From: Ricky Liang @ 2016-12-26  7:58 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: linux-media, dri-devel, Pawel Osciak, Marek Szyprowski,
	Kyungmin Park, Hans Verkuil, Sumit Semwal, Rob Clark,
	Daniel Vetter, Laura Abbott, Sakari Ailus

Hi Laurent,

On Fri, Dec 16, 2016 at 9:24 AM, Laurent Pinchart
<laurent.pinchart+renesas@ideasonboard.com> wrote:
> From: Sakari Ailus <sakari.ailus@linux.intel.com>
>
> The desirable DMA attributes are not generic for all devices using
> Videobuf2 contiguous DMA ops. Let the drivers decide.
>
> This change also results in MMAP buffers always having an sg_table
> (dma_sgt field).
>
> Also arrange the header files alphabetically.
>
> As a result, also the DMA-BUF exporter must provide ops for synchronising
> the cache. This adds begin_cpu_access and end_cpu_access ops to
> vb2_dc_dmabuf_ops.
>
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  drivers/media/v4l2-core/videobuf2-dma-contig.c | 66 ++++++++++++++++++++++----
>  1 file changed, 56 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> index d503647ea522..a0e88ad93f07 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -11,11 +11,11 @@
>   */
>
>  #include <linux/dma-buf.h>
> +#include <linux/dma-mapping.h>
>  #include <linux/module.h>
>  #include <linux/scatterlist.h>
>  #include <linux/sched.h>
>  #include <linux/slab.h>
> -#include <linux/dma-mapping.h>
>
>  #include <media/videobuf2-v4l2.h>
>  #include <media/videobuf2-dma-contig.h>
> @@ -115,8 +115,11 @@ static void vb2_dc_prepare(void *buf_priv)
>         struct vb2_dc_buf *buf = buf_priv;
>         struct sg_table *sgt = buf->dma_sgt;
>
> -       /* DMABUF exporter will flush the cache for us */
> -       if (!buf->vec)
> +       /*
> +        * DMABUF exporter will flush the cache for us; only USERPTR
> +        * and MMAP buffers with non-coherent memory will be flushed.
> +        */
> +       if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))

Should here be "if (!buf->vec || !(buf->attrs & DMA_ATTR_NON_CONSISTENT))" ?

>                 return;
>
>         dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
> @@ -128,8 +131,11 @@ static void vb2_dc_finish(void *buf_priv)
>         struct vb2_dc_buf *buf = buf_priv;
>         struct sg_table *sgt = buf->dma_sgt;
>
> -       /* DMABUF exporter will flush the cache for us */
> -       if (!buf->vec)
> +       /*
> +        * DMABUF exporter will flush the cache for us; only USERPTR
> +        * and MMAP buffers with non-coherent memory will be flushed.
> +        */
> +       if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))
>                 return;
>
>         dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir);
> @@ -172,13 +178,22 @@ static void *vb2_dc_alloc(struct device *dev, unsigned long attrs,
>         if (attrs)
>                 buf->attrs = attrs;
>         buf->cookie = dma_alloc_attrs(dev, size, &buf->dma_addr,
> -                                       GFP_KERNEL | gfp_flags, buf->attrs);
> +                                    GFP_KERNEL | gfp_flags, buf->attrs);
>         if (!buf->cookie) {
> -               dev_err(dev, "dma_alloc_coherent of size %ld failed\n", size);
> +               dev_err(dev, "dma_alloc_attrs of size %ld failed\n", size);
>                 kfree(buf);
>                 return ERR_PTR(-ENOMEM);
>         }
>
> +       if (buf->attrs & DMA_ATTR_NON_CONSISTENT) {
> +               buf->dma_sgt = vb2_dc_get_base_sgt(buf);
> +               if (!buf->dma_sgt) {
> +                       dma_free_attrs(dev, size, buf->cookie, buf->dma_addr,
> +                                      buf->attrs);
> +                       return ERR_PTR(-ENOMEM);
> +               }
> +       }
> +
>         if ((buf->attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0)
>                 buf->vaddr = buf->cookie;
>
> @@ -359,6 +374,34 @@ static void *vb2_dc_dmabuf_ops_kmap(struct dma_buf *dbuf, unsigned long pgnum)
>         return buf->vaddr ? buf->vaddr + pgnum * PAGE_SIZE : NULL;
>  }
>
> +static int vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf,
> +                                             enum dma_data_direction direction)
> +{
> +       struct vb2_dc_buf *buf = dbuf->priv;
> +       struct sg_table *sgt = buf->dma_sgt;
> +
> +       if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))
> +               return 0;
> +
> +       dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> +
> +       return 0;
> +}
> +
> +static int vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf,
> +                                           enum dma_data_direction direction)
> +{
> +       struct vb2_dc_buf *buf = dbuf->priv;
> +       struct sg_table *sgt = buf->dma_sgt;
> +
> +       if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))
> +               return 0;
> +
> +       dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir);
> +
> +       return 0;
> +}
> +
>  static void *vb2_dc_dmabuf_ops_vmap(struct dma_buf *dbuf)
>  {
>         struct vb2_dc_buf *buf = dbuf->priv;
> @@ -379,6 +422,8 @@ static struct dma_buf_ops vb2_dc_dmabuf_ops = {
>         .unmap_dma_buf = vb2_dc_dmabuf_ops_unmap,
>         .kmap = vb2_dc_dmabuf_ops_kmap,
>         .kmap_atomic = vb2_dc_dmabuf_ops_kmap,
> +       .begin_cpu_access = vb2_dc_dmabuf_ops_begin_cpu_access,
> +       .end_cpu_access = vb2_dc_dmabuf_ops_end_cpu_access,
>         .vmap = vb2_dc_dmabuf_ops_vmap,
>         .mmap = vb2_dc_dmabuf_ops_mmap,
>         .release = vb2_dc_dmabuf_ops_release,
> @@ -424,11 +469,12 @@ static void vb2_dc_put_userptr(void *buf_priv)
>
>         if (sgt) {
>                 /*
> -                * No need to sync to CPU, it's already synced to the CPU
> -                * since the finish() memop will have been called before this.
> +                * Don't ask to skip cache sync in case if the user
> +                * did ask to skip cache flush the last time the
> +                * buffer was dequeued.
>                  */
>                 dma_unmap_sg_attrs(buf->dev, sgt->sgl, sgt->orig_nents,
> -                                  buf->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
> +                                  buf->dma_dir, 0);
>                 pages = frame_vector_pages(buf->vec);
>                 /* sgt should exist only if vector contains pages... */
>                 BUG_ON(IS_ERR(pages));

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler
  2016-12-16  1:24 ` [RFC v2 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler Laurent Pinchart
@ 2017-03-27 21:27   ` Shuah Khan
  2017-03-28 12:31       ` Laurent Pinchart
  0 siblings, 1 reply; 25+ messages in thread
From: Shuah Khan @ 2017-03-27 21:27 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: linux-media, dri-devel, Pawel Osciak, Marek Szyprowski,
	Kyungmin Park, Hans Verkuil, Sumit Semwal, Rob Clark,
	Daniel Vetter, Laura Abbott, Sakari Ailus, Shuah Khan

On Thu, Dec 15, 2016 at 6:24 PM, Laurent Pinchart
<laurent.pinchart+renesas@ideasonboard.com> wrote:
> From: Sakari Ailus <sakari.ailus@linux.intel.com>
>
> The cache synchronisation may be a time consuming operation and thus not
> best performed in an interrupt which is a typical context for
> vb2_buffer_done() calls. This may consume up to tens of ms on some
> machines, depending on the buffer size.
>
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
> Changes since v1:
>
> - Don't rename the 'i' loop counter to 'plane'
> ---
>  drivers/media/v4l2-core/videobuf2-core.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
> index 8ba48703b189..15a83f338072 100644
> --- a/drivers/media/v4l2-core/videobuf2-core.c
> +++ b/drivers/media/v4l2-core/videobuf2-core.c
> @@ -889,7 +889,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>  {
>         struct vb2_queue *q = vb->vb2_queue;
>         unsigned long flags;
> -       unsigned int plane;
>
>         if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
>                 return;
> @@ -910,10 +909,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
>         dprintk(4, "done processing on buffer %d, state: %d\n",
>                         vb->index, state);
>
> -       /* sync buffers */
> -       for (plane = 0; plane < vb->num_planes; ++plane)
> -               call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> -
>         spin_lock_irqsave(&q->done_lock, flags);
>         if (state == VB2_BUF_STATE_QUEUED ||
>             state == VB2_BUF_STATE_REQUEUEING) {
> @@ -1571,6 +1566,10 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
>
>         vb->state = VB2_BUF_STATE_DEQUEUED;
>
> +       /* sync buffers */
> +       for (i = 0; i < vb->num_planes; ++i)
> +               call_void_memop(vb, finish, vb->planes[i].mem_priv);
> +

Does this compile?? Where is "i" defined? Looks like it needs to be added
back in.

-- Shuah

>         /* unmap DMABUF buffer */
>         if (q->memory == VB2_MEMORY_DMABUF)
>                 for (i = 0; i < vb->num_planes; ++i) {
> --
> Regards,
>
> Laurent Pinchart
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 05/11] v4l2-core: Don't sync cache for a buffer if so requested
  2016-12-16  1:24 ` [RFC v2 05/11] v4l2-core: Don't sync cache for a buffer if so requested Laurent Pinchart
@ 2017-03-27 22:08   ` Shuah Khan
  0 siblings, 0 replies; 25+ messages in thread
From: Shuah Khan @ 2017-03-27 22:08 UTC (permalink / raw)
  To: Laurent Pinchart, Hans Verkuil
  Cc: linux-media, dri-devel, Pawel Osciak, Marek Szyprowski,
	Kyungmin Park, Sumit Semwal, Rob Clark, Daniel Vetter,
	Laura Abbott, Sakari Ailus, Samu Onkalo, Shuah Khan

On Thu, Dec 15, 2016 at 6:24 PM, Laurent Pinchart
<laurent.pinchart+renesas@ideasonboard.com> wrote:
> From: Samu Onkalo <samu.onkalo@intel.com>
>
> The user may request to the driver (vb2) to skip the cache maintenance
> operations in case the buffer does not need cache synchronisation, e.g. in
> cases where the buffer is passed between hardware blocks without it being
> touched by the CPU.
>
> Also document that the prepare and finish vb2_mem_ops might not get called
> every time the buffer ownership changes between the kernel and the user
> space.
>
> Signed-off-by: Samu Onkalo <samu.onkalo@intel.com>
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
> Changes since v1:
>
> - Add a no_cache_sync argument to vb2 core prepare/qbuf/dqbuf functions
>   to get round the inability to access v4l2_buffer flags from vb2 core.

Hmm.. Is this necessary? Lots of changes to pass on single flag.
vb2_core_qbuf(), vb2_core_dqbuf(), and vb2_core_prepare_buf() all have
access to v4l2_buf - void *pb

Would it make it easeir and avoid adding a new parameter all these routines
if we add a vl42-common routine to return the true if
V4L2_BUF_FLAG_NO_CACHE_SYNC is set?

Something along the lines of v4l2_is_no_cache_sync_set()?

thanks,
-- Shuah

> ---
>  drivers/media/v4l2-core/videobuf2-core.c | 101 +++++++++++++++++++++----------
>  drivers/media/v4l2-core/videobuf2-v4l2.c |  14 ++++-
>  include/media/videobuf2-core.h           |  23 ++++---
>  3 files changed, 97 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
> index 15a83f338072..e5371ef213b0 100644
> --- a/drivers/media/v4l2-core/videobuf2-core.c
> +++ b/drivers/media/v4l2-core/videobuf2-core.c
> @@ -189,6 +189,28 @@ static void __vb2_queue_cancel(struct vb2_queue *q);
>  static void __enqueue_in_driver(struct vb2_buffer *vb);
>
>  /**
> + * __mem_prepare_planes() - call finish mem op for all planes of the buffer
> + */
> +static void __mem_prepare_planes(struct vb2_buffer *vb)
> +{
> +       unsigned int plane;
> +
> +       for (plane = 0; plane < vb->num_planes; ++plane)
> +               call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> +}
> +
> +/**
> + * __mem_finish_planes() - call finish mem op for all planes of the buffer
> + */
> +static void __mem_finish_planes(struct vb2_buffer *vb)
> +{
> +       unsigned int plane;
> +
> +       for (plane = 0; plane < vb->num_planes; ++plane)
> +               call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> +}
> +
> +/**
>   * __vb2_buf_mem_alloc() - allocate video memory for the given buffer
>   */
>  static int __vb2_buf_mem_alloc(struct vb2_buffer *vb)
> @@ -953,20 +975,29 @@ EXPORT_SYMBOL_GPL(vb2_discard_done);
>  /**
>   * __prepare_mmap() - prepare an MMAP buffer
>   */
> -static int __prepare_mmap(struct vb2_buffer *vb, const void *pb)
> +static int __prepare_mmap(struct vb2_buffer *vb, const void *pb,
> +                         bool no_cache_sync)
>  {
> -       int ret = 0;
> +       int ret;
>
> -       if (pb)
> +       if (pb) {
>                 ret = call_bufop(vb->vb2_queue, fill_vb2_buffer,
>                                  vb, pb, vb->planes);
> -       return ret ? ret : call_vb_qop(vb, buf_prepare, vb);
> +               if (ret)
> +                       return ret;
> +       }
> +
> +       if (!no_cache_sync)
> +               __mem_prepare_planes(vb);
> +
> +       return call_vb_qop(vb, buf_prepare, vb);
>  }
>
>  /**
>   * __prepare_userptr() - prepare a USERPTR buffer
>   */
> -static int __prepare_userptr(struct vb2_buffer *vb, const void *pb)
> +static int __prepare_userptr(struct vb2_buffer *vb, const void *pb,
> +                            bool no_cache_sync)
>  {
>         struct vb2_plane planes[VB2_MAX_PLANES];
>         struct vb2_queue *q = vb->vb2_queue;
> @@ -1056,6 +1087,11 @@ static int __prepare_userptr(struct vb2_buffer *vb, const void *pb)
>                         dprintk(1, "buffer initialization failed\n");
>                         goto err;
>                 }
> +
> +               /* This is new buffer memory --- always synchronise cache. */
> +               __mem_prepare_planes(vb);
> +       } else if (!no_cache_sync) {
> +               __mem_prepare_planes(vb);
>         }
>
>         ret = call_vb_qop(vb, buf_prepare, vb);
> @@ -1083,7 +1119,8 @@ static int __prepare_userptr(struct vb2_buffer *vb, const void *pb)
>  /**
>   * __prepare_dmabuf() - prepare a DMABUF buffer
>   */
> -static int __prepare_dmabuf(struct vb2_buffer *vb, const void *pb)
> +static int __prepare_dmabuf(struct vb2_buffer *vb, const void *pb,
> +                           bool no_cache_sync)
>  {
>         struct vb2_plane planes[VB2_MAX_PLANES];
>         struct vb2_queue *q = vb->vb2_queue;
> @@ -1197,6 +1234,11 @@ static int __prepare_dmabuf(struct vb2_buffer *vb, const void *pb)
>                         dprintk(1, "buffer initialization failed\n");
>                         goto err;
>                 }
> +
> +               /* This is new buffer memory --- always synchronise cache. */
> +               __mem_prepare_planes(vb);
> +       } else if (!no_cache_sync) {
> +               __mem_prepare_planes(vb);
>         }
>
>         ret = call_vb_qop(vb, buf_prepare, vb);
> @@ -1229,10 +1271,10 @@ static void __enqueue_in_driver(struct vb2_buffer *vb)
>         call_void_vb_qop(vb, buf_queue, vb);
>  }
>
> -static int __buf_prepare(struct vb2_buffer *vb, const void *pb)
> +static int __buf_prepare(struct vb2_buffer *vb, const void *pb,
> +                        bool no_cache_sync)
>  {
>         struct vb2_queue *q = vb->vb2_queue;
> -       unsigned int plane;
>         int ret;
>
>         if (q->error) {
> @@ -1244,13 +1286,13 @@ static int __buf_prepare(struct vb2_buffer *vb, const void *pb)
>
>         switch (q->memory) {
>         case VB2_MEMORY_MMAP:
> -               ret = __prepare_mmap(vb, pb);
> +               ret = __prepare_mmap(vb, pb, no_cache_sync);
>                 break;
>         case VB2_MEMORY_USERPTR:
> -               ret = __prepare_userptr(vb, pb);
> +               ret = __prepare_userptr(vb, pb, no_cache_sync);
>                 break;
>         case VB2_MEMORY_DMABUF:
> -               ret = __prepare_dmabuf(vb, pb);
> +               ret = __prepare_dmabuf(vb, pb, no_cache_sync);
>                 break;
>         default:
>                 WARN(1, "Invalid queue type\n");
> @@ -1263,16 +1305,13 @@ static int __buf_prepare(struct vb2_buffer *vb, const void *pb)
>                 return ret;
>         }
>
> -       /* sync buffers */
> -       for (plane = 0; plane < vb->num_planes; ++plane)
> -               call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> -
>         vb->state = VB2_BUF_STATE_PREPARED;
>
>         return 0;
>  }
>
> -int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb)
> +int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb,
> +                        bool no_cache_sync)
>  {
>         struct vb2_buffer *vb;
>         int ret;
> @@ -1284,7 +1323,7 @@ int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb)
>                 return -EINVAL;
>         }
>
> -       ret = __buf_prepare(vb, pb);
> +       ret = __buf_prepare(vb, pb, no_cache_sync);
>         if (ret)
>                 return ret;
>
> @@ -1360,7 +1399,8 @@ static int vb2_start_streaming(struct vb2_queue *q)
>         return ret;
>  }
>
> -int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
> +int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb,
> +                 bool no_cache_sync)
>  {
>         struct vb2_buffer *vb;
>         int ret;
> @@ -1369,7 +1409,7 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb)
>
>         switch (vb->state) {
>         case VB2_BUF_STATE_DEQUEUED:
> -               ret = __buf_prepare(vb, pb);
> +               ret = __buf_prepare(vb, pb, no_cache_sync);
>                 if (ret)
>                         return ret;
>                 break;
> @@ -1555,7 +1595,7 @@ EXPORT_SYMBOL_GPL(vb2_wait_for_all_buffers);
>  /**
>   * __vb2_dqbuf() - bring back the buffer to the DEQUEUED state
>   */
> -static void __vb2_dqbuf(struct vb2_buffer *vb)
> +static void __vb2_dqbuf(struct vb2_buffer *vb, bool no_cache_sync)
>  {
>         struct vb2_queue *q = vb->vb2_queue;
>         unsigned int i;
> @@ -1566,9 +1606,8 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
>
>         vb->state = VB2_BUF_STATE_DEQUEUED;
>
> -       /* sync buffers */
> -       for (i = 0; i < vb->num_planes; ++i)
> -               call_void_memop(vb, finish, vb->planes[i].mem_priv);
> +       if (!no_cache_sync)
> +               __mem_finish_planes(vb);
>
>         /* unmap DMABUF buffer */
>         if (q->memory == VB2_MEMORY_DMABUF)
> @@ -1581,7 +1620,7 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
>  }
>
>  int vb2_core_dqbuf(struct vb2_queue *q, unsigned int *pindex, void *pb,
> -                  bool nonblocking)
> +                  bool nonblocking, bool no_cache_sync)
>  {
>         struct vb2_buffer *vb = NULL;
>         int ret;
> @@ -1618,7 +1657,7 @@ int vb2_core_dqbuf(struct vb2_queue *q, unsigned int *pindex, void *pb,
>         trace_vb2_dqbuf(q, vb);
>
>         /* go back to dequeued state */
> -       __vb2_dqbuf(vb);
> +       __vb2_dqbuf(vb, no_cache_sync);
>
>         dprintk(1, "dqbuf of buffer %d, with state %d\n",
>                         vb->index, vb->state);
> @@ -1692,7 +1731,7 @@ static void __vb2_queue_cancel(struct vb2_queue *q)
>                         vb->state = VB2_BUF_STATE_PREPARED;
>                         call_void_vb_qop(vb, buf_finish, vb);
>                 }
> -               __vb2_dqbuf(vb);
> +               __vb2_dqbuf(vb, false);
>         }
>  }
>
> @@ -2240,7 +2279,7 @@ static int __vb2_init_fileio(struct vb2_queue *q, int read)
>                  * Queue all buffers.
>                  */
>                 for (i = 0; i < q->num_buffers; i++) {
> -                       ret = vb2_core_qbuf(q, i, NULL);
> +                       ret = vb2_core_qbuf(q, i, NULL, false);
>                         if (ret)
>                                 goto err_reqbufs;
>                         fileio->bufs[i].queued = 1;
> @@ -2343,7 +2382,7 @@ static size_t __vb2_perform_fileio(struct vb2_queue *q, char __user *data, size_
>                 /*
>                  * Call vb2_dqbuf to get buffer back.
>                  */
> -               ret = vb2_core_dqbuf(q, &index, NULL, nonblock);
> +               ret = vb2_core_dqbuf(q, &index, NULL, nonblock, false);
>                 dprintk(5, "vb2_dqbuf result: %d\n", ret);
>                 if (ret)
>                         return ret;
> @@ -2419,7 +2458,7 @@ static size_t __vb2_perform_fileio(struct vb2_queue *q, char __user *data, size_
>
>                 if (copy_timestamp)
>                         b->timestamp = ktime_get_ns();
> -               ret = vb2_core_qbuf(q, index, NULL);
> +               ret = vb2_core_qbuf(q, index, NULL, false);
>                 dprintk(5, "vb2_dbuf result: %d\n", ret);
>                 if (ret)
>                         return ret;
> @@ -2505,7 +2544,7 @@ static int vb2_thread(void *data)
>                 } else {
>                         call_void_qop(q, wait_finish, q);
>                         if (!threadio->stop)
> -                               ret = vb2_core_dqbuf(q, &index, NULL, 0);
> +                               ret = vb2_core_dqbuf(q, &index, NULL, 0, false);
>                         call_void_qop(q, wait_prepare, q);
>                         dprintk(5, "file io: vb2_dqbuf result: %d\n", ret);
>                         if (!ret)
> @@ -2522,7 +2561,7 @@ static int vb2_thread(void *data)
>                 if (copy_timestamp)
>                         vb->timestamp = ktime_get_ns();;
>                 if (!threadio->stop)
> -                       ret = vb2_core_qbuf(q, vb->index, NULL);
> +                       ret = vb2_core_qbuf(q, vb->index, NULL, false);
>                 call_void_qop(q, wait_prepare, q);
>                 if (ret || threadio->stop)
>                         break;
> diff --git a/drivers/media/v4l2-core/videobuf2-v4l2.c b/drivers/media/v4l2-core/videobuf2-v4l2.c
> index 3529849d2218..7e327ad6ef30 100644
> --- a/drivers/media/v4l2-core/videobuf2-v4l2.c
> +++ b/drivers/media/v4l2-core/videobuf2-v4l2.c
> @@ -499,8 +499,11 @@ int vb2_prepare_buf(struct vb2_queue *q, struct v4l2_buffer *b)
>         }
>
>         ret = vb2_queue_or_prepare_buf(q, b, "prepare_buf");
> +       if (ret)
> +               return ret;
>
> -       return ret ? ret : vb2_core_prepare_buf(q, b->index, b);
> +       return vb2_core_prepare_buf(q, b->index, b,
> +                                   b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC);
>  }
>  EXPORT_SYMBOL_GPL(vb2_prepare_buf);
>
> @@ -565,7 +568,11 @@ int vb2_qbuf(struct vb2_queue *q, struct v4l2_buffer *b)
>         }
>
>         ret = vb2_queue_or_prepare_buf(q, b, "qbuf");
> -       return ret ? ret : vb2_core_qbuf(q, b->index, b);
> +       if (ret)
> +               return ret;
> +
> +       return vb2_core_qbuf(q, b->index, b,
> +                            b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC);
>  }
>  EXPORT_SYMBOL_GPL(vb2_qbuf);
>
> @@ -583,7 +590,8 @@ int vb2_dqbuf(struct vb2_queue *q, struct v4l2_buffer *b, bool nonblocking)
>                 return -EINVAL;
>         }
>
> -       ret = vb2_core_dqbuf(q, NULL, b, nonblocking);
> +       ret = vb2_core_dqbuf(q, NULL, b, nonblocking,
> +                            b->flags & V4L2_BUF_FLAG_NO_CACHE_SYNC);
>
>         /*
>          *  After calling the VIDIOC_DQBUF V4L2_BUF_FLAG_DONE must be
> diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
> index ac5898a55fd9..bfad0588bb2b 100644
> --- a/include/media/videobuf2-core.h
> +++ b/include/media/videobuf2-core.h
> @@ -83,10 +83,14 @@ struct vb2_threadio_data;
>   *             dmabuf.
>   * @unmap_dmabuf: releases access control to the dmabuf - allocator is notified
>   *               that this driver is done using the dmabuf for now.
> - * @prepare:   called every time the buffer is passed from userspace to the
> - *             driver, useful for cache synchronisation, optional.
> - * @finish:    called every time the buffer is passed back from the driver
> - *             to the userspace, also optional.
> + * @prepare:   Called on the plane when the buffer ownership is passed from
> + *             the user space to the kernel and the plane must be cache
> + *             syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may
> + *             be used to skip this call. Optional.
> + * @finish:    Called on the plane when the buffer ownership is passed from
> + *             the kernel to the user space and the plane must be cache
> + *             syncronised. The V4L2_BUF_FLAG_NO_CACHE_SYNC buffer flag may
> + *             be used to skip this call. Optional.
>   * @vaddr:     return a kernel virtual address to a given memory buffer
>   *             associated with the passed private structure or NULL if no
>   *             such mapping exists.
> @@ -695,6 +699,7 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
>   * @index:     id number of the buffer
>   * @pb:                buffer structure passed from userspace to vidioc_prepare_buf
>   *             handler in driver
> + * @no_cache_sync if true, skip cache synchronization
>   *
>   * Should be called from vidioc_prepare_buf ioctl handler of a driver.
>   * The passed buffer should have been verified.
> @@ -704,7 +709,8 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
>   * The return values from this function are intended to be directly returned
>   * from vidioc_prepare_buf handler in driver.
>   */
> -int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb);
> +int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb,
> +                        bool no_cache_sync);
>
>  /**
>   * vb2_core_qbuf() - Queue a buffer from userspace
> @@ -713,6 +719,7 @@ int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb);
>   * @index:     id number of the buffer
>   * @pb:                buffer structure passed from userspace to vidioc_qbuf handler
>   *             in driver
> + * @no_cache_sync if true, skip cache synchronization
>   *
>   * Should be called from vidioc_qbuf ioctl handler of a driver.
>   * The passed buffer should have been verified.
> @@ -727,7 +734,8 @@ int vb2_core_prepare_buf(struct vb2_queue *q, unsigned int index, void *pb);
>   * The return values from this function are intended to be directly returned
>   * from vidioc_qbuf handler in driver.
>   */
> -int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb);
> +int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb,
> +                 bool no_cache_sync);
>
>  /**
>   * vb2_core_dqbuf() - Dequeue a buffer to the userspace
> @@ -738,6 +746,7 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb);
>   * @nonblocking: if true, this call will not sleep waiting for a buffer if no
>   *              buffers ready for dequeuing are present. Normally the driver
>   *              would be passing (file->f_flags & O_NONBLOCK) here
> + * @no_cache_sync if true, skip cache synchronization
>   *
>   * Should be called from vidioc_dqbuf ioctl handler of a driver.
>   * The passed buffer should have been verified.
> @@ -754,7 +763,7 @@ int vb2_core_qbuf(struct vb2_queue *q, unsigned int index, void *pb);
>   * from vidioc_dqbuf handler in driver.
>   */
>  int vb2_core_dqbuf(struct vb2_queue *q, unsigned int *pindex, void *pb,
> -                  bool nonblocking);
> +                  bool nonblocking, bool no_cache_sync);
>
>  int vb2_core_streamon(struct vb2_queue *q, unsigned int type);
>  int vb2_core_streamoff(struct vb2_queue *q, unsigned int type);
> --
> Regards,
>
> Laurent Pinchart
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 07/11] vb2: dma-contig: Remove redundant sgt_base field
  2016-12-16  1:24 ` [RFC v2 07/11] vb2: dma-contig: Remove redundant sgt_base field Laurent Pinchart
@ 2017-03-27 22:51   ` Shuah Khan
  2017-04-07 12:37     ` Sakari Ailus
  0 siblings, 1 reply; 25+ messages in thread
From: Shuah Khan @ 2017-03-27 22:51 UTC (permalink / raw)
  To: Laurent Pinchart, Sakari Ailus
  Cc: linux-media, dri-devel, Pawel Osciak, Marek Szyprowski,
	Kyungmin Park, Hans Verkuil, Sumit Semwal, Rob Clark,
	Daniel Vetter, Laura Abbott, Shuah Khan

On Thu, Dec 15, 2016 at 6:24 PM, Laurent Pinchart
<laurent.pinchart+renesas@ideasonboard.com> wrote:
> From: Sakari Ailus <sakari.ailus@linux.intel.com>
>
> The struct vb2_dc_buf contains two struct sg_table fields: sgt_base and
> dma_sgt. The former is used by DMA-BUF buffers whereas the latter is used
> by USERPTR.
>
> Unify the two, leaving dma_sgt.

I think this patch should be split in two.

1. Unifying dma_sgt and sgt_base

>
> MMAP buffers do not need cache flushing since they have been allocated
> using dma_alloc_coherent().

2. That uses vec to check for checking for no flush needed condition.

>
> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
> Changes since v1:
>
> - Test for MMAP or DMABUF type through the vec field instead of the now
>   gone vma field.

What is this gone vma field? Did I miss a patch in the series that
makes this change? This check that is changed used dma_sgt and
db_attach vma

These comments don't agree with the code change.

> - Move the vec field to a USERPTR section in struct vb2_dc_buf, where
>   the vma field was located.
> ---
>  drivers/media/v4l2-core/videobuf2-dma-contig.c | 25 +++++++++++++------------
>  1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> index fb6a177be461..2a00d12ffee2 100644
> --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> @@ -30,12 +30,13 @@ struct vb2_dc_buf {
>         unsigned long                   attrs;
>         enum dma_data_direction         dma_dir;
>         struct sg_table                 *dma_sgt;
> -       struct frame_vector             *vec;
>
>         /* MMAP related */
>         struct vb2_vmarea_handler       handler;
>         atomic_t                        refcount;
> -       struct sg_table                 *sgt_base;
> +
> +       /* USERPTR related */
> +       struct frame_vector             *vec;
>
>         /* DMABUF related */
>         struct dma_buf_attachment       *db_attach;
> @@ -95,7 +96,7 @@ static void vb2_dc_prepare(void *buf_priv)
>         struct sg_table *sgt = buf->dma_sgt;
>
>         /* DMABUF exporter will flush the cache for us */
> -       if (!sgt || buf->db_attach)
> +       if (!buf->vec)
>                 return;

With the unification dma_sgt is valid for MMAP buffers after vb2_dma_sg_alloc()
if dma_sgt is not null, sync happens - the patch description doesn't seem to be
in sync with the change.

I might be missing something. I think it would help if these two changes are
split since they are really separate changes.

thanks,
-- Shuah

>
>         dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,
> @@ -108,7 +109,7 @@ static void vb2_dc_finish(void *buf_priv)
>         struct sg_table *sgt = buf->dma_sgt;
>
>         /* DMABUF exporter will flush the cache for us */
> -       if (!sgt || buf->db_attach)
> +       if (!buf->vec)
>                 return;
>
>         dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->orig_nents, buf->dma_dir);
> @@ -125,9 +126,9 @@ static void vb2_dc_put(void *buf_priv)
>         if (!atomic_dec_and_test(&buf->refcount))
>                 return;
>
> -       if (buf->sgt_base) {
> -               sg_free_table(buf->sgt_base);
> -               kfree(buf->sgt_base);
> +       if (buf->dma_sgt) {
> +               sg_free_table(buf->dma_sgt);
> +               kfree(buf->dma_sgt);
>         }
>         dma_free_attrs(buf->dev, buf->size, buf->cookie, buf->dma_addr,
>                        buf->attrs);
> @@ -239,13 +240,13 @@ static int vb2_dc_dmabuf_ops_attach(struct dma_buf *dbuf, struct device *dev,
>         /* Copy the buf->base_sgt scatter list to the attachment, as we can't
>          * map the same scatter list to multiple attachments at the same time.
>          */
> -       ret = sg_alloc_table(sgt, buf->sgt_base->orig_nents, GFP_KERNEL);
> +       ret = sg_alloc_table(sgt, buf->dma_sgt->orig_nents, GFP_KERNEL);
>         if (ret) {
>                 kfree(attach);
>                 return -ENOMEM;
>         }
>
> -       rd = buf->sgt_base->sgl;
> +       rd = buf->dma_sgt->sgl;
>         wr = sgt->sgl;
>         for (i = 0; i < sgt->orig_nents; ++i) {
>                 sg_set_page(wr, sg_page(rd), rd->length, rd->offset);
> @@ -396,10 +397,10 @@ static struct dma_buf *vb2_dc_get_dmabuf(void *buf_priv, unsigned long flags)
>         exp_info.flags = flags;
>         exp_info.priv = buf;
>
> -       if (!buf->sgt_base)
> -               buf->sgt_base = vb2_dc_get_base_sgt(buf);
> +       if (!buf->dma_sgt)
> +               buf->dma_sgt = vb2_dc_get_base_sgt(buf);
>
> -       if (WARN_ON(!buf->sgt_base))
> +       if (WARN_ON(!buf->dma_sgt))
>                 return NULL;
>
>         dbuf = dma_buf_export(&exp_info);
> --
> Regards,
>
> Laurent Pinchart
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler
  2017-03-27 21:27   ` Shuah Khan
@ 2017-03-28 12:31       ` Laurent Pinchart
  0 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2017-03-28 12:31 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Laurent Pinchart, linux-media, dri-devel, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Hans Verkuil, Sumit Semwal,
	Rob Clark, Daniel Vetter, Laura Abbott, Sakari Ailus

Hi Shuah,

On Monday 27 Mar 2017 15:27:36 Shuah Khan wrote:
> On Thu, Dec 15, 2016 at 6:24 PM, Laurent Pinchart wrote:
> > From: Sakari Ailus <sakari.ailus@linux.intel.com>
> > 
> > The cache synchronisation may be a time consuming operation and thus not
> > best performed in an interrupt which is a typical context for
> > vb2_buffer_done() calls. This may consume up to tens of ms on some
> > machines, depending on the buffer size.
> > 
> > Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> > ---
> > Changes since v1:
> > 
> > - Don't rename the 'i' loop counter to 'plane'
> > ---
> > 
> >  drivers/media/v4l2-core/videobuf2-core.c | 9 ++++-----
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/media/v4l2-core/videobuf2-core.c
> > b/drivers/media/v4l2-core/videobuf2-core.c index
> > 8ba48703b189..15a83f338072 100644
> > --- a/drivers/media/v4l2-core/videobuf2-core.c
> > +++ b/drivers/media/v4l2-core/videobuf2-core.c
> > @@ -889,7 +889,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum
> > vb2_buffer_state state)> 
> >  {
> >         struct vb2_queue *q = vb->vb2_queue;
> >         unsigned long flags;
> > -       unsigned int plane;
> > 
> >         if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
> >                 return;
> > @@ -910,10 +909,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum
> > vb2_buffer_state state)
> >         dprintk(4, "done processing on buffer %d, state: %d\n",
> >                         vb->index, state);
> > 
> > -       /* sync buffers */
> > -       for (plane = 0; plane < vb->num_planes; ++plane)
> > -               call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> > -
> >         spin_lock_irqsave(&q->done_lock, flags);
> >         if (state == VB2_BUF_STATE_QUEUED ||
> >             state == VB2_BUF_STATE_REQUEUEING) {
> > @@ -1571,6 +1566,10 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
> > 
> >         vb->state = VB2_BUF_STATE_DEQUEUED;
> > 
> > +       /* sync buffers */
> > +       for (i = 0; i < vb->num_planes; ++i)
> > +               call_void_memop(vb, finish, vb->planes[i].mem_priv);
> > +
> 
> Does this compile?? Where is "i" defined? Looks like it needs to be added
> back in.

It's already defined at the beginning of the __vb2_dqbuf() function.

> >         /* unmap DMABUF buffer */
> >         if (q->memory == VB2_MEMORY_DMABUF)
> >                 for (i = 0; i < vb->num_planes; ++i) {

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler
@ 2017-03-28 12:31       ` Laurent Pinchart
  0 siblings, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2017-03-28 12:31 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Laurent Pinchart, Daniel Vetter, dri-devel, Hans Verkuil,
	Kyungmin Park, Sakari Ailus, linux-media, Pawel Osciak,
	Marek Szyprowski

Hi Shuah,

On Monday 27 Mar 2017 15:27:36 Shuah Khan wrote:
> On Thu, Dec 15, 2016 at 6:24 PM, Laurent Pinchart wrote:
> > From: Sakari Ailus <sakari.ailus@linux.intel.com>
> > 
> > The cache synchronisation may be a time consuming operation and thus not
> > best performed in an interrupt which is a typical context for
> > vb2_buffer_done() calls. This may consume up to tens of ms on some
> > machines, depending on the buffer size.
> > 
> > Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> > ---
> > Changes since v1:
> > 
> > - Don't rename the 'i' loop counter to 'plane'
> > ---
> > 
> >  drivers/media/v4l2-core/videobuf2-core.c | 9 ++++-----
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/media/v4l2-core/videobuf2-core.c
> > b/drivers/media/v4l2-core/videobuf2-core.c index
> > 8ba48703b189..15a83f338072 100644
> > --- a/drivers/media/v4l2-core/videobuf2-core.c
> > +++ b/drivers/media/v4l2-core/videobuf2-core.c
> > @@ -889,7 +889,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum
> > vb2_buffer_state state)> 
> >  {
> >         struct vb2_queue *q = vb->vb2_queue;
> >         unsigned long flags;
> > -       unsigned int plane;
> > 
> >         if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
> >                 return;
> > @@ -910,10 +909,6 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum
> > vb2_buffer_state state)
> >         dprintk(4, "done processing on buffer %d, state: %d\n",
> >                         vb->index, state);
> > 
> > -       /* sync buffers */
> > -       for (plane = 0; plane < vb->num_planes; ++plane)
> > -               call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> > -
> >         spin_lock_irqsave(&q->done_lock, flags);
> >         if (state == VB2_BUF_STATE_QUEUED ||
> >             state == VB2_BUF_STATE_REQUEUEING) {
> > @@ -1571,6 +1566,10 @@ static void __vb2_dqbuf(struct vb2_buffer *vb)
> > 
> >         vb->state = VB2_BUF_STATE_DEQUEUED;
> > 
> > +       /* sync buffers */
> > +       for (i = 0; i < vb->num_planes; ++i)
> > +               call_void_memop(vb, finish, vb->planes[i].mem_priv);
> > +
> 
> Does this compile?? Where is "i" defined? Looks like it needs to be added
> back in.

It's already defined at the beginning of the __vb2_dqbuf() function.

> >         /* unmap DMABUF buffer */
> >         if (q->memory == VB2_MEMORY_DMABUF)
> >                 for (i = 0; i < vb->num_planes; ++i) {

-- 
Regards,

Laurent Pinchart

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC, v2, 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs
  2016-12-26  7:58   ` [RFC, v2, " Ricky Liang
@ 2017-04-05 13:13     ` Sakari Ailus
  2017-04-07 11:42     ` Laurent Pinchart
  1 sibling, 0 replies; 25+ messages in thread
From: Sakari Ailus @ 2017-04-05 13:13 UTC (permalink / raw)
  To: Ricky Liang
  Cc: Laurent Pinchart, linux-media, dri-devel, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Hans Verkuil, Sumit Semwal,
	Rob Clark, Daniel Vetter, Laura Abbott, Sakari Ailus

Hi Ricky,

On Mon, Dec 26, 2016 at 03:58:07PM +0800, Ricky Liang wrote:
> Hi Laurent,
> 
> On Fri, Dec 16, 2016 at 9:24 AM, Laurent Pinchart
> <laurent.pinchart+renesas@ideasonboard.com> wrote:
> > From: Sakari Ailus <sakari.ailus@linux.intel.com>
> >
> > The desirable DMA attributes are not generic for all devices using
> > Videobuf2 contiguous DMA ops. Let the drivers decide.
> >
> > This change also results in MMAP buffers always having an sg_table
> > (dma_sgt field).
> >
> > Also arrange the header files alphabetically.
> >
> > As a result, also the DMA-BUF exporter must provide ops for synchronising
> > the cache. This adds begin_cpu_access and end_cpu_access ops to
> > vb2_dc_dmabuf_ops.
> >
> > Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> > ---
> >  drivers/media/v4l2-core/videobuf2-dma-contig.c | 66 ++++++++++++++++++++++----
> >  1 file changed, 56 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > index d503647ea522..a0e88ad93f07 100644
> > --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > @@ -11,11 +11,11 @@
> >   */
> >
> >  #include <linux/dma-buf.h>
> > +#include <linux/dma-mapping.h>
> >  #include <linux/module.h>
> >  #include <linux/scatterlist.h>
> >  #include <linux/sched.h>
> >  #include <linux/slab.h>
> > -#include <linux/dma-mapping.h>
> >
> >  #include <media/videobuf2-v4l2.h>
> >  #include <media/videobuf2-dma-contig.h>
> > @@ -115,8 +115,11 @@ static void vb2_dc_prepare(void *buf_priv)
> >         struct vb2_dc_buf *buf = buf_priv;
> >         struct sg_table *sgt = buf->dma_sgt;
> >
> > -       /* DMABUF exporter will flush the cache for us */
> > -       if (!buf->vec)
> > +       /*
> > +        * DMABUF exporter will flush the cache for us; only USERPTR
> > +        * and MMAP buffers with non-coherent memory will be flushed.
> > +        */
> > +       if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))
> 
> Should here be "if (!buf->vec || !(buf->attrs & DMA_ATTR_NON_CONSISTENT))" ?

The patch was originally using struct dma_attrs and I believe rebasing
changed how it it works. Thank you for pointing that out.

Using buf->vec for the purpose alone is not enough since also MMAP buffers
may require cache synchronisation from this patch onwards.

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC, v2, 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs
  2016-12-26  7:58   ` [RFC, v2, " Ricky Liang
  2017-04-05 13:13     ` Sakari Ailus
@ 2017-04-07 11:42     ` Laurent Pinchart
  1 sibling, 0 replies; 25+ messages in thread
From: Laurent Pinchart @ 2017-04-07 11:42 UTC (permalink / raw)
  To: Ricky Liang
  Cc: Laurent Pinchart, linux-media, dri-devel, Pawel Osciak,
	Marek Szyprowski, Kyungmin Park, Hans Verkuil, Sumit Semwal,
	Rob Clark, Daniel Vetter, Laura Abbott, Sakari Ailus

Hi Ricky,

On Monday 26 Dec 2016 15:58:07 Ricky Liang wrote:
> On Fri, Dec 16, 2016 at 9:24 AM, Laurent Pinchart wrote:
> > From: Sakari Ailus <sakari.ailus@linux.intel.com>
> > 
> > The desirable DMA attributes are not generic for all devices using
> > Videobuf2 contiguous DMA ops. Let the drivers decide.
> > 
> > This change also results in MMAP buffers always having an sg_table
> > (dma_sgt field).
> > 
> > Also arrange the header files alphabetically.
> > 
> > As a result, also the DMA-BUF exporter must provide ops for synchronising
> > the cache. This adds begin_cpu_access and end_cpu_access ops to
> > vb2_dc_dmabuf_ops.
> > 
> > Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> > ---
> > 
> >  drivers/media/v4l2-core/videobuf2-dma-contig.c | 66 +++++++++++++++++----
> >  1 file changed, 56 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > b/drivers/media/v4l2-core/videobuf2-dma-contig.c index
> > d503647ea522..a0e88ad93f07 100644
> > --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c

[snip]

> > @@ -115,8 +115,11 @@ static void vb2_dc_prepare(void *buf_priv)
> >         struct vb2_dc_buf *buf = buf_priv;
> >         struct sg_table *sgt = buf->dma_sgt;
> > 
> > -       /* DMABUF exporter will flush the cache for us */
> > -       if (!buf->vec)
> > +       /*
> > +        * DMABUF exporter will flush the cache for us; only USERPTR
> > +        * and MMAP buffers with non-coherent memory will be flushed.
> > +        */
> > +       if (!(buf->attrs & DMA_ATTR_NON_CONSISTENT))
> 
> Should here be "if (!buf->vec || !(buf->attrs & DMA_ATTR_NON_CONSISTENT))" ?

I don't think so. buf->vec indicates that the buffer is using USERPTR. The 
check would thus return immediately for everything that is not USERPTR. What 
we want to do is return for DMABUF, and for MMAP and USERPTR buffers that 
don't have the DMA_ATTR_NON_CONSISTENT attribute set. As DMABUF buffers never 
have that attribute set (because attrs is set in vb2_dc_alloc, which is not 
called for DMABUF buffers), we can check the flag only.

> >                 return;
> >         
> >         dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->orig_nents,

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC v2 07/11] vb2: dma-contig: Remove redundant sgt_base field
  2017-03-27 22:51   ` Shuah Khan
@ 2017-04-07 12:37     ` Sakari Ailus
  0 siblings, 0 replies; 25+ messages in thread
From: Sakari Ailus @ 2017-04-07 12:37 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Laurent Pinchart, Sakari Ailus, linux-media, dri-devel,
	Pawel Osciak, Marek Szyprowski, Kyungmin Park, Hans Verkuil,
	Sumit Semwal, Rob Clark, Daniel Vetter, Laura Abbott

Hi Shuah,

On Mon, Mar 27, 2017 at 04:51:40PM -0600, Shuah Khan wrote:
> On Thu, Dec 15, 2016 at 6:24 PM, Laurent Pinchart
> <laurent.pinchart+renesas@ideasonboard.com> wrote:
> > From: Sakari Ailus <sakari.ailus@linux.intel.com>
> >
> > The struct vb2_dc_buf contains two struct sg_table fields: sgt_base and
> > dma_sgt. The former is used by DMA-BUF buffers whereas the latter is used
> > by USERPTR.
> >
> > Unify the two, leaving dma_sgt.
> 
> I think this patch should be split in two.
> 
> 1. Unifying dma_sgt and sgt_base
> 
> >
> > MMAP buffers do not need cache flushing since they have been allocated
> > using dma_alloc_coherent().
> 
> 2. That uses vec to check for checking for no flush needed condition.

I can split this, sure. A non-NULL vec indicates a USERPTR buffer. Before
this patch, non-NULL buf->dma_sgt did the same.

> 
> >
> > Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> > ---
> > Changes since v1:
> >
> > - Test for MMAP or DMABUF type through the vec field instead of the now
> >   gone vma field.
> 
> What is this gone vma field? Did I miss a patch in the series that
> makes this change? This check that is changed used dma_sgt and
> db_attach vma
> 

The field existed on bc0195aad0daa2ad5b0d76cce22b167bc3435590, i.e. v4.2-rc2
from which the earlier version of this patch was rebased from.

> These comments don't agree with the code change.
> 
> > - Move the vec field to a USERPTR section in struct vb2_dc_buf, where
> >   the vma field was located.
> > ---
> >  drivers/media/v4l2-core/videobuf2-dma-contig.c | 25 +++++++++++++------------
> >  1 file changed, 13 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > index fb6a177be461..2a00d12ffee2 100644
> > --- a/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c
> > @@ -30,12 +30,13 @@ struct vb2_dc_buf {
> >         unsigned long                   attrs;
> >         enum dma_data_direction         dma_dir;
> >         struct sg_table                 *dma_sgt;
> > -       struct frame_vector             *vec;
> >
> >         /* MMAP related */
> >         struct vb2_vmarea_handler       handler;
> >         atomic_t                        refcount;
> > -       struct sg_table                 *sgt_base;
> > +
> > +       /* USERPTR related */
> > +       struct frame_vector             *vec;
> >
> >         /* DMABUF related */
> >         struct dma_buf_attachment       *db_attach;
> > @@ -95,7 +96,7 @@ static void vb2_dc_prepare(void *buf_priv)
> >         struct sg_table *sgt = buf->dma_sgt;
> >
> >         /* DMABUF exporter will flush the cache for us */
> > -       if (!sgt || buf->db_attach)
> > +       if (!buf->vec)
> >                 return;
> 
> With the unification dma_sgt is valid for MMAP buffers after vb2_dma_sg_alloc()
> if dma_sgt is not null, sync happens - the patch description doesn't seem to be
> in sync with the change.

I'm not sure what you're referring to. The condition for sync is changed to
use buf->vec instead, i.e. the functionality is not affected.

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-04-07 12:37 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-16  1:24 [RFC v2 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Laurent Pinchart
2016-12-16  1:24 ` Laurent Pinchart
2016-12-16  1:24 ` [RFC v2 01/11] vb2: Rename confusingly named internal buffer preparation functions Laurent Pinchart
2016-12-16  1:24 ` [RFC v2 02/11] vb2: Move buffer cache synchronisation to prepare from queue Laurent Pinchart
2016-12-16  1:24 ` [RFC v2 03/11] vb2: Move cache synchronisation from buffer done to dqbuf handler Laurent Pinchart
2017-03-27 21:27   ` Shuah Khan
2017-03-28 12:31     ` Laurent Pinchart
2017-03-28 12:31       ` Laurent Pinchart
2016-12-16  1:24 ` [RFC v2 04/11] v4l: Unify cache management hint buffer flags Laurent Pinchart
2016-12-16  1:24 ` [RFC v2 05/11] v4l2-core: Don't sync cache for a buffer if so requested Laurent Pinchart
2017-03-27 22:08   ` Shuah Khan
2016-12-16  1:24 ` [RFC v2 06/11] vb2: Improve struct vb2_mem_ops documentation; alloc and put are for MMAP Laurent Pinchart
2016-12-16  1:24 ` [RFC v2 07/11] vb2: dma-contig: Remove redundant sgt_base field Laurent Pinchart
2017-03-27 22:51   ` Shuah Khan
2017-04-07 12:37     ` Sakari Ailus
2016-12-16  1:24 ` [RFC v2 08/11] vb2: dma-contig: Don't warn on failure in obtaining scatterlist Laurent Pinchart
2016-12-16  1:24   ` Laurent Pinchart
2016-12-16  1:24 ` [RFC v2 09/11] vb2: dma-contig: Move vb2_dc_get_base_sgt() up Laurent Pinchart
2016-12-16  1:24 ` [RFC v2 10/11] vb2: dma-contig: Let drivers decide DMA attrs of MMAP and USERPTR bufs Laurent Pinchart
2016-12-26  7:58   ` [RFC, v2, " Ricky Liang
2017-04-05 13:13     ` Sakari Ailus
2017-04-07 11:42     ` Laurent Pinchart
2016-12-16  1:24 ` [RFC v2 11/11] vb2: dma-contig: Add WARN_ON_ONCE() to check for potential bugs Laurent Pinchart
2016-12-16 12:06 ` [RFC v2 00/11] vb2: Handle user cache hints, allow drivers to choose cache coherency Hans Verkuil
2016-12-16 12:06   ` Hans Verkuil

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.