[PATCHv3 0/4] v4l: vsp1: Fix suspend/resume and race on M2M pipelines

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCHv3 0/4] v4l: vsp1: Fix suspend/resume and race on M2M pipelines
@ 2016-12-13 17:59 Kieran Bingham
  2016-12-13 17:59 ` [PATCHv3 1/4] v4l: vsp1: Move vsp1_video_setup_pipeline() Kieran Bingham
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Kieran Bingham @ 2016-12-13 17:59 UTC (permalink / raw)
  To: laurent.pinchart; +Cc: linux-renesas-soc, linux-media, Kieran Bingham

This small patchset helps rework the VSP1 driver to repair an issue on
suspend/resume operations whereby the pipeline does not get reconfigured after
it has been re-initialised following a resume operation.

Along side this, there was an intrinsic race in the vsp1_video_start_streaming()
function whereby multiple streams operating through a BRU, could find themselves
commencing an operation before the pipeline has been configured, or worse -
commencing, just as the pipeline is being configured resulting in a null pointer
dereference on pipe->dl.

This series superceeds a previous effort to fix the BRU race.

Patch [1/4] is a code move only, with no functional change.
Patch [2/4] refactors the vsp1_video_start_streaming() function and fixes both
            suspend/resume, and the BRU race in a single change
Patch [3/4] removes the context scoped 'pipe->dl' which has been susceptible to
            races and isn't required to be in the context.
Patch [4/4] is an RFC patch really, that fixes a segfault on error paths  and I
            certainly expect feedback and brief discussion. Please drop Patch 4
            in the event of any further discussion, and don't consider it as
            blocking for the first three patches of this series.

v3:
 - Move configured=false from vsp1_device_init to vsp1_reset_wpf()
 - Clean up flag dereferencing with a local struct *

v2:
 - Refactor video pipeline configuration implementation to solve both suspend
   resume and the VSP BRU race in a single change

v1:
 - Original pipeline configuration rework

Kieran Bingham (4):
  v4l: vsp1: Move vsp1_video_setup_pipeline()
  v4l: vsp1: Refactor video pipeline configuration
  v4l: vsp1: Use local display lists and remove global pipe->dl
  media: Catch null pipes on pipeline stop

 drivers/media/media-entity.c             |   2 +
 drivers/media/platform/vsp1/vsp1_drm.c   |  20 ++---
 drivers/media/platform/vsp1/vsp1_drv.c   |   4 +
 drivers/media/platform/vsp1/vsp1_pipe.c  |   1 +
 drivers/media/platform/vsp1/vsp1_pipe.h  |   4 +-
 drivers/media/platform/vsp1/vsp1_video.c | 127 +++++++++++++++----------------
 6 files changed, 79 insertions(+), 79 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCHv3 1/4] v4l: vsp1: Move vsp1_video_setup_pipeline()
  2016-12-13 17:59 [PATCHv3 0/4] v4l: vsp1: Fix suspend/resume and race on M2M pipelines Kieran Bingham
@ 2016-12-13 17:59 ` Kieran Bingham
  2016-12-14 20:21   ` Laurent Pinchart
  2016-12-13 17:59 ` [PATCHv3 2/4] v4l: vsp1: Refactor video pipeline configuration Kieran Bingham
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Kieran Bingham @ 2016-12-13 17:59 UTC (permalink / raw)
  To: laurent.pinchart; +Cc: linux-renesas-soc, linux-media, Kieran Bingham

Move the static vsp1_video_setup_pipeline() function in preparation for
the callee updates so that the vsp1_video_pipeline_run() call can
configure pipelines following suspend resume actions.

This commit is just a code move for clarity performing no functional
change.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
---
 drivers/media/platform/vsp1/vsp1_video.c | 82 ++++++++++++++++----------------
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
index d351b9c768d2..44b687c0b8df 100644
--- a/drivers/media/platform/vsp1/vsp1_video.c
+++ b/drivers/media/platform/vsp1/vsp1_video.c
@@ -350,6 +350,47 @@ static void vsp1_video_frame_end(struct vsp1_pipeline *pipe,
 	pipe->buffers_ready |= 1 << video->pipe_index;
 }
 
+static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
+{
+	struct vsp1_entity *entity;
+
+	/* Determine this pipelines sizes for image partitioning support. */
+	vsp1_video_pipeline_setup_partitions(pipe);
+
+	/* Prepare the display list. */
+	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
+	if (!pipe->dl)
+		return -ENOMEM;
+
+	if (pipe->uds) {
+		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
+
+		/* If a BRU is present in the pipeline before the UDS, the alpha
+		 * component doesn't need to be scaled as the BRU output alpha
+		 * value is fixed to 255. Otherwise we need to scale the alpha
+		 * component only when available at the input RPF.
+		 */
+		if (pipe->uds_input->type == VSP1_ENTITY_BRU) {
+			uds->scale_alpha = false;
+		} else {
+			struct vsp1_rwpf *rpf =
+				to_rwpf(&pipe->uds_input->subdev);
+
+			uds->scale_alpha = rpf->fmtinfo->alpha;
+		}
+	}
+
+	list_for_each_entry(entity, &pipe->entities, list_pipe) {
+		vsp1_entity_route_setup(entity, pipe->dl);
+
+		if (entity->ops->configure)
+			entity->ops->configure(entity, pipe, pipe->dl,
+					       VSP1_ENTITY_PARAMS_INIT);
+	}
+
+	return 0;
+}
+
 static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
 					      struct vsp1_dl_list *dl)
 {
@@ -747,47 +788,6 @@ static void vsp1_video_buffer_queue(struct vb2_buffer *vb)
 	spin_unlock_irqrestore(&pipe->irqlock, flags);
 }
 
-static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
-{
-	struct vsp1_entity *entity;
-
-	/* Determine this pipelines sizes for image partitioning support. */
-	vsp1_video_pipeline_setup_partitions(pipe);
-
-	/* Prepare the display list. */
-	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
-	if (!pipe->dl)
-		return -ENOMEM;
-
-	if (pipe->uds) {
-		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
-
-		/* If a BRU is present in the pipeline before the UDS, the alpha
-		 * component doesn't need to be scaled as the BRU output alpha
-		 * value is fixed to 255. Otherwise we need to scale the alpha
-		 * component only when available at the input RPF.
-		 */
-		if (pipe->uds_input->type == VSP1_ENTITY_BRU) {
-			uds->scale_alpha = false;
-		} else {
-			struct vsp1_rwpf *rpf =
-				to_rwpf(&pipe->uds_input->subdev);
-
-			uds->scale_alpha = rpf->fmtinfo->alpha;
-		}
-	}
-
-	list_for_each_entry(entity, &pipe->entities, list_pipe) {
-		vsp1_entity_route_setup(entity, pipe->dl);
-
-		if (entity->ops->configure)
-			entity->ops->configure(entity, pipe, pipe->dl,
-					       VSP1_ENTITY_PARAMS_INIT);
-	}
-
-	return 0;
-}
-
 static int vsp1_video_start_streaming(struct vb2_queue *vq, unsigned int count)
 {
 	struct vsp1_video *video = vb2_get_drv_priv(vq);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCHv3 2/4] v4l: vsp1: Refactor video pipeline configuration
  2016-12-13 17:59 [PATCHv3 0/4] v4l: vsp1: Fix suspend/resume and race on M2M pipelines Kieran Bingham
  2016-12-13 17:59 ` [PATCHv3 1/4] v4l: vsp1: Move vsp1_video_setup_pipeline() Kieran Bingham
@ 2016-12-13 17:59 ` Kieran Bingham
  2016-12-14 16:30   ` Laurent Pinchart
  2016-12-13 17:59 ` [PATCHv3 3/4] v4l: vsp1: Use local display lists and remove global pipe->dl Kieran Bingham
  2016-12-13 17:59 ` [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop Kieran Bingham
  3 siblings, 1 reply; 18+ messages in thread
From: Kieran Bingham @ 2016-12-13 17:59 UTC (permalink / raw)
  To: laurent.pinchart; +Cc: linux-renesas-soc, linux-media, Kieran Bingham

With multiple inputs through the BRU it is feasible for the streams to
race each other at stream-on. In the case of the video pipelines, this
can present two serious issues.

 1) A null-dereference if the pipe->dl is committed at the same time as
    the vsp1_video_setup_pipeline() is processing

 2) A hardware hang, where a display list is committed without having
    called vsp1_video_setup_pipeline() first

Along side these race conditions, the work done by
vsp1_video_setup_pipeline() is undone by the re-initialisation during a
suspend resume cycle, and an active pipeline does not attempt to
reconfigure the correct routing and init parameters for the entities.

To repair all of these issues, we can move the call to a conditional
inside vsp1_video_pipeline_run() and ensure that this can only be called
on the last stream which calls into vsp1_video_start_streaming()

As a convenient side effect of this, by specifying that the
configuration has been lost during suspend/resume actions - the
vsp1_video_pipeline_run() call can re-initialise pipelines when
necessary thus repairing resume actions for active m2m pipelines.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

---
v3:
 - Move 'flag reset' to be inside the vsp1_reset_wpf() function call
 - Tidy up the wpf->pipe reference for the configured flag

 drivers/media/platform/vsp1/vsp1_drv.c   |  4 ++++
 drivers/media/platform/vsp1/vsp1_pipe.c  |  1 +
 drivers/media/platform/vsp1/vsp1_pipe.h  |  2 ++
 drivers/media/platform/vsp1/vsp1_video.c | 20 +++++++++-----------
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_drv.c b/drivers/media/platform/vsp1/vsp1_drv.c
index 57c713a4e1df..1dc3726c4e83 100644
--- a/drivers/media/platform/vsp1/vsp1_drv.c
+++ b/drivers/media/platform/vsp1/vsp1_drv.c
@@ -413,6 +413,7 @@ static int vsp1_create_entities(struct vsp1_device *vsp1)
 
 int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned int index)
 {
+	struct vsp1_rwpf *wpf = vsp1->wpf[index];
 	unsigned int timeout;
 	u32 status;
 
@@ -429,6 +430,9 @@ int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned int index)
 		usleep_range(1000, 2000);
 	}
 
+	if (wpf->pipe)
+		wpf->pipe->configured = false;
+
 	if (!timeout) {
 		dev_err(vsp1->dev, "failed to reset wpf.%u\n", index);
 		return -ETIMEDOUT;
diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c b/drivers/media/platform/vsp1/vsp1_pipe.c
index 756ca4ea7668..7ddf862ee403 100644
--- a/drivers/media/platform/vsp1/vsp1_pipe.c
+++ b/drivers/media/platform/vsp1/vsp1_pipe.c
@@ -208,6 +208,7 @@ void vsp1_pipeline_init(struct vsp1_pipeline *pipe)
 
 	INIT_LIST_HEAD(&pipe->entities);
 	pipe->state = VSP1_PIPELINE_STOPPED;
+	pipe->configured = false;
 }
 
 /* Must be called with the pipe irqlock held. */
diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h b/drivers/media/platform/vsp1/vsp1_pipe.h
index ac4ad2655551..0743b9fcb655 100644
--- a/drivers/media/platform/vsp1/vsp1_pipe.h
+++ b/drivers/media/platform/vsp1/vsp1_pipe.h
@@ -61,6 +61,7 @@ enum vsp1_pipeline_state {
  * @pipe: the media pipeline
  * @irqlock: protects the pipeline state
  * @state: current state
+ * @configured: determines routing configuration active on cell.
  * @wq: wait queue to wait for state change completion
  * @frame_end: frame end interrupt handler
  * @lock: protects the pipeline use count and stream count
@@ -86,6 +87,7 @@ struct vsp1_pipeline {
 
 	spinlock_t irqlock;
 	enum vsp1_pipeline_state state;
+	bool configured;
 	wait_queue_head_t wq;
 
 	void (*frame_end)(struct vsp1_pipeline *pipe);
diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
index 44b687c0b8df..7ff9f4c19ff0 100644
--- a/drivers/media/platform/vsp1/vsp1_video.c
+++ b/drivers/media/platform/vsp1/vsp1_video.c
@@ -388,6 +388,8 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 					       VSP1_ENTITY_PARAMS_INIT);
 	}
 
+	pipe->configured = true;
+
 	return 0;
 }
 
@@ -411,6 +413,9 @@ static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
 	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
 	struct vsp1_entity *entity;
 
+	if (!pipe->configured)
+		vsp1_video_setup_pipeline(pipe);
+
 	if (!pipe->dl)
 		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
 
@@ -793,25 +798,18 @@ static int vsp1_video_start_streaming(struct vb2_queue *vq, unsigned int count)
 	struct vsp1_video *video = vb2_get_drv_priv(vq);
 	struct vsp1_pipeline *pipe = video->rwpf->pipe;
 	unsigned long flags;
-	int ret;
 
 	mutex_lock(&pipe->lock);
 	if (pipe->stream_count == pipe->num_inputs) {
-		ret = vsp1_video_setup_pipeline(pipe);
-		if (ret < 0) {
-			mutex_unlock(&pipe->lock);
-			return ret;
-		}
+		spin_lock_irqsave(&pipe->irqlock, flags);
+		if (vsp1_pipeline_ready(pipe))
+			vsp1_video_pipeline_run(pipe);
+		spin_unlock_irqrestore(&pipe->irqlock, flags);
 	}
 
 	pipe->stream_count++;
 	mutex_unlock(&pipe->lock);
 
-	spin_lock_irqsave(&pipe->irqlock, flags);
-	if (vsp1_pipeline_ready(pipe))
-		vsp1_video_pipeline_run(pipe);
-	spin_unlock_irqrestore(&pipe->irqlock, flags);
-
 	return 0;
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCHv3 3/4] v4l: vsp1: Use local display lists and remove global pipe->dl
  2016-12-13 17:59 [PATCHv3 0/4] v4l: vsp1: Fix suspend/resume and race on M2M pipelines Kieran Bingham
  2016-12-13 17:59 ` [PATCHv3 1/4] v4l: vsp1: Move vsp1_video_setup_pipeline() Kieran Bingham
  2016-12-13 17:59 ` [PATCHv3 2/4] v4l: vsp1: Refactor video pipeline configuration Kieran Bingham
@ 2016-12-13 17:59 ` Kieran Bingham
  2016-12-13 17:59 ` [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop Kieran Bingham
  3 siblings, 0 replies; 18+ messages in thread
From: Kieran Bingham @ 2016-12-13 17:59 UTC (permalink / raw)
  To: laurent.pinchart; +Cc: linux-renesas-soc, linux-media, Kieran Bingham

The usage of pipe->dl is susceptible to races, and it is redundant to
keep this pointer in a larger scoped context.

Now that the calling order of vsp1_video_setup_pipeline() has been
adapted, it is possible to remove the pipe->dl and pass the variable as
required.

Currently the pipe->dl is set during the atomic begin hook, but it is
not utilised until the flush. Moving this should do no harm.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
---
 drivers/media/platform/vsp1/vsp1_drm.c   | 20 +++++++-------
 drivers/media/platform/vsp1/vsp1_pipe.h  |  2 --
 drivers/media/platform/vsp1/vsp1_video.c | 45 ++++++++++++++------------------
 3 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_drm.c b/drivers/media/platform/vsp1/vsp1_drm.c
index cd209dccff1b..bf735e85b597 100644
--- a/drivers/media/platform/vsp1/vsp1_drm.c
+++ b/drivers/media/platform/vsp1/vsp1_drm.c
@@ -220,9 +220,6 @@ void vsp1_du_atomic_begin(struct device *dev)
 	struct vsp1_pipeline *pipe = &vsp1->drm->pipe;
 
 	vsp1->drm->num_inputs = pipe->num_inputs;
-
-	/* Prepare the display list. */
-	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
 }
 EXPORT_SYMBOL_GPL(vsp1_du_atomic_begin);
 
@@ -426,10 +423,14 @@ void vsp1_du_atomic_flush(struct device *dev)
 	struct vsp1_pipeline *pipe = &vsp1->drm->pipe;
 	struct vsp1_rwpf *inputs[VSP1_MAX_RPF] = { NULL, };
 	struct vsp1_entity *entity;
+	struct vsp1_dl_list *dl;
 	unsigned long flags;
 	unsigned int i;
 	int ret;
 
+	/* Prepare the display list. */
+	dl = vsp1_dl_list_get(pipe->output->dlm);
+
 	/* Count the number of enabled inputs and sort them by Z-order. */
 	pipe->num_inputs = 0;
 
@@ -484,26 +485,25 @@ void vsp1_du_atomic_flush(struct device *dev)
 			struct vsp1_rwpf *rpf = to_rwpf(&entity->subdev);
 
 			if (!pipe->inputs[rpf->entity.index]) {
-				vsp1_dl_list_write(pipe->dl, entity->route->reg,
+				vsp1_dl_list_write(dl, entity->route->reg,
 						   VI6_DPR_NODE_UNUSED);
 				continue;
 			}
 		}
 
-		vsp1_entity_route_setup(entity, pipe->dl);
+		vsp1_entity_route_setup(entity, dl);
 
 		if (entity->ops->configure) {
-			entity->ops->configure(entity, pipe, pipe->dl,
+			entity->ops->configure(entity, pipe, dl,
 					       VSP1_ENTITY_PARAMS_INIT);
-			entity->ops->configure(entity, pipe, pipe->dl,
+			entity->ops->configure(entity, pipe, dl,
 					       VSP1_ENTITY_PARAMS_RUNTIME);
-			entity->ops->configure(entity, pipe, pipe->dl,
+			entity->ops->configure(entity, pipe, dl,
 					       VSP1_ENTITY_PARAMS_PARTITION);
 		}
 	}
 
-	vsp1_dl_list_commit(pipe->dl);
-	pipe->dl = NULL;
+	vsp1_dl_list_commit(dl);
 
 	/* Start or stop the pipeline if needed. */
 	if (!vsp1->drm->num_inputs && pipe->num_inputs) {
diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h b/drivers/media/platform/vsp1/vsp1_pipe.h
index 0743b9fcb655..98980c85081f 100644
--- a/drivers/media/platform/vsp1/vsp1_pipe.h
+++ b/drivers/media/platform/vsp1/vsp1_pipe.h
@@ -108,8 +108,6 @@ struct vsp1_pipeline {
 
 	struct list_head entities;
 
-	struct vsp1_dl_list *dl;
-
 	unsigned int div_size;
 	unsigned int partitions;
 	struct v4l2_rect partition;
diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
index 7ff9f4c19ff0..9619ed4dda7c 100644
--- a/drivers/media/platform/vsp1/vsp1_video.c
+++ b/drivers/media/platform/vsp1/vsp1_video.c
@@ -350,18 +350,14 @@ static void vsp1_video_frame_end(struct vsp1_pipeline *pipe,
 	pipe->buffers_ready |= 1 << video->pipe_index;
 }
 
-static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
+static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe,
+				     struct vsp1_dl_list *dl)
 {
 	struct vsp1_entity *entity;
 
 	/* Determine this pipelines sizes for image partitioning support. */
 	vsp1_video_pipeline_setup_partitions(pipe);
 
-	/* Prepare the display list. */
-	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
-	if (!pipe->dl)
-		return -ENOMEM;
-
 	if (pipe->uds) {
 		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
 
@@ -381,10 +377,10 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 	}
 
 	list_for_each_entry(entity, &pipe->entities, list_pipe) {
-		vsp1_entity_route_setup(entity, pipe->dl);
+		vsp1_entity_route_setup(entity, dl);
 
 		if (entity->ops->configure)
-			entity->ops->configure(entity, pipe, pipe->dl,
+			entity->ops->configure(entity, pipe, dl,
 					       VSP1_ENTITY_PARAMS_INIT);
 	}
 
@@ -412,12 +408,16 @@ static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
 {
 	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
 	struct vsp1_entity *entity;
+	struct vsp1_dl_list *dl;
 
-	if (!pipe->configured)
-		vsp1_video_setup_pipeline(pipe);
+	dl = vsp1_dl_list_get(pipe->output->dlm);
+	if (!dl) {
+		dev_err(vsp1->dev, "Failed to obtain a dl list\n");
+		return;
+	}
 
-	if (!pipe->dl)
-		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
+	if (!pipe->configured)
+		vsp1_video_setup_pipeline(pipe, dl);
 
 	/*
 	 * Start with the runtime parameters as the configure operation can
@@ -426,45 +426,43 @@ static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
 	 */
 	list_for_each_entry(entity, &pipe->entities, list_pipe) {
 		if (entity->ops->configure)
-			entity->ops->configure(entity, pipe, pipe->dl,
+			entity->ops->configure(entity, pipe, dl,
 					       VSP1_ENTITY_PARAMS_RUNTIME);
 	}
 
 	/* Run the first partition */
 	pipe->current_partition = 0;
-	vsp1_video_pipeline_run_partition(pipe, pipe->dl);
+	vsp1_video_pipeline_run_partition(pipe, dl);
 
 	/* Process consecutive partitions as necessary */
 	for (pipe->current_partition = 1;
 	     pipe->current_partition < pipe->partitions;
 	     pipe->current_partition++) {
-		struct vsp1_dl_list *dl;
+		struct vsp1_dl_list *child;
 
 		/*
 		 * Partition configuration operations will utilise
 		 * the pipe->current_partition variable to determine
 		 * the work they should complete.
 		 */
-		dl = vsp1_dl_list_get(pipe->output->dlm);
+		child = vsp1_dl_list_get(pipe->output->dlm);
 
 		/*
 		 * An incomplete chain will still function, but output only
 		 * the partitions that had a dl available. The frame end
 		 * interrupt will be marked on the last dl in the chain.
 		 */
-		if (!dl) {
+		if (!child) {
 			dev_err(vsp1->dev, "Failed to obtain a dl list. Frame will be incomplete\n");
 			break;
 		}
 
-		vsp1_video_pipeline_run_partition(pipe, dl);
-		vsp1_dl_list_add_chain(pipe->dl, dl);
+		vsp1_video_pipeline_run_partition(pipe, child);
+		vsp1_dl_list_add_chain(dl, child);
 	}
 
 	/* Complete, and commit the head display list. */
-	vsp1_dl_list_commit(pipe->dl);
-	pipe->dl = NULL;
-
+	vsp1_dl_list_commit(dl);
 	vsp1_pipeline_run(pipe);
 }
 
@@ -835,9 +833,6 @@ static void vsp1_video_stop_streaming(struct vb2_queue *vq)
 		ret = vsp1_pipeline_stop(pipe);
 		if (ret == -ETIMEDOUT)
 			dev_err(video->vsp1->dev, "pipeline stop timeout\n");
-
-		vsp1_dl_list_put(pipe->dl);
-		pipe->dl = NULL;
 	}
 	mutex_unlock(&pipe->lock);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop
  2016-12-13 17:59 [PATCHv3 0/4] v4l: vsp1: Fix suspend/resume and race on M2M pipelines Kieran Bingham
                   ` (2 preceding siblings ...)
  2016-12-13 17:59 ` [PATCHv3 3/4] v4l: vsp1: Use local display lists and remove global pipe->dl Kieran Bingham
@ 2016-12-13 17:59 ` Kieran Bingham
  2016-12-14  7:28   ` Sakari Ailus
  2016-12-14 19:56   ` Kieran Bingham
  3 siblings, 2 replies; 18+ messages in thread
From: Kieran Bingham @ 2016-12-13 17:59 UTC (permalink / raw)
  To: laurent.pinchart; +Cc: linux-renesas-soc, linux-media, Kieran Bingham

media_entity_pipeline_stop() can be called through error paths with a
NULL entity pipe object. In this instance, stopping is a no-op, so
simply return without any action

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
---

I've marked this patch as RFC, although if deemed suitable, by all means
integrate it as is.

When testing suspend/resume operations on VSP1, I encountered a segfault on the
WARN_ON(!pipe->streaming_count) line, where 'pipe == NULL'. The simple
protection fix is to return early in this instance, as this patch does however:

A) Does this early return path warrant a WARN() statement itself, to identify
drivers which are incorrectly calling media_entity_pipeline_stop() with an
invalid entity, or would this just be noise ...

and therefore..

B) I also partly assume this patch could simply get NAK'd with a request to go
and dig out the root cause of calling media_entity_pipeline_stop() with an
invalid entity. 

My brief investigation so far here so far shows that it's almost a second order
fault - where the first suspend resume cycle completes but leaves the entity in
an invalid state having followed an error path - and then on a second
suspend/resume - the stop fails with the affected segfault.

If statement A) or B) apply here, please drop this patch from the series, and
don't consider it a blocking issue for the other 3 patches.

Kieran

 drivers/media/media-entity.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/media/media-entity.c b/drivers/media/media-entity.c
index c68239e60487..93c9cbf4bf46 100644
--- a/drivers/media/media-entity.c
+++ b/drivers/media/media-entity.c
@@ -508,6 +508,8 @@ void __media_entity_pipeline_stop(struct media_entity *entity)
 	struct media_entity_graph *graph = &entity->pipe->graph;
 	struct media_pipeline *pipe = entity->pipe;

+	if (!pipe)
+		return;

 	WARN_ON(!pipe->streaming_count);
 	media_entity_graph_walk_start(graph, entity);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop
  2016-12-13 17:59 ` [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop Kieran Bingham
@ 2016-12-14  7:28   ` Sakari Ailus
  2016-12-14 12:27       ` Kieran Bingham
  2016-12-14 19:56   ` Kieran Bingham
  1 sibling, 1 reply; 18+ messages in thread
From: Sakari Ailus @ 2016-12-14  7:28 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: laurent.pinchart, linux-renesas-soc, linux-media

Hi Kieran,

On Tue, Dec 13, 2016 at 05:59:44PM +0000, Kieran Bingham wrote:
> media_entity_pipeline_stop() can be called through error paths with a
> NULL entity pipe object. In this instance, stopping is a no-op, so
> simply return without any action

The approach of returning silently is wrong; the streaming count is indeed a
count: you have to decrement it the exactly same number of times it's been
increased. Code that attempts to call __media_entity_pipeline_stop() on an
entity that's not streaming is simply buggy.

I've got a patch here that adds a warning to graph traversal on streaming
count being zero. I sent a pull request including that some days ago:

<URL:http://www.spinics.net/lists/linux-media/msg108980.html>
<URL:http://www.spinics.net/lists/linux-media/msg108995.html>

I think the check here could simply be removed as the check is done for
every entity in the pipeline with the above patch.

If there's still a wish to check for the pipe field which should not be
written by drivers, it should be done during pipeline traversal so that it
applies to all entities in the pipeline, not just where traversal starts.

> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> ---
> 
> I've marked this patch as RFC, although if deemed suitable, by all means
> integrate it as is.
> 
> When testing suspend/resume operations on VSP1, I encountered a segfault on the
> WARN_ON(!pipe->streaming_count) line, where 'pipe == NULL'. The simple
> protection fix is to return early in this instance, as this patch does however:
> 
> A) Does this early return path warrant a WARN() statement itself, to identify
> drivers which are incorrectly calling media_entity_pipeline_stop() with an
> invalid entity, or would this just be noise ...
> 
> and therefore..
> 
> B) I also partly assume this patch could simply get NAK'd with a request to go
> and dig out the root cause of calling media_entity_pipeline_stop() with an
> invalid entity. 
> 
> My brief investigation so far here so far shows that it's almost a second order
> fault - where the first suspend resume cycle completes but leaves the entity in
> an invalid state having followed an error path - and then on a second
> suspend/resume - the stop fails with the affected segfault.
> 
> If statement A) or B) apply here, please drop this patch from the series, and
> don't consider it a blocking issue for the other 3 patches.
> 
> Kieran
> 
> 
>  drivers/media/media-entity.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/media/media-entity.c b/drivers/media/media-entity.c
> index c68239e60487..93c9cbf4bf46 100644
> --- a/drivers/media/media-entity.c
> +++ b/drivers/media/media-entity.c
> @@ -508,6 +508,8 @@ void __media_entity_pipeline_stop(struct media_entity *entity)
>  	struct media_entity_graph *graph = &entity->pipe->graph;
>  	struct media_pipeline *pipe = entity->pipe;
>  
> +	if (!pipe)
> +		return;
>  
>  	WARN_ON(!pipe->streaming_count);
>  	media_entity_graph_walk_start(graph, entity);

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop
  2016-12-14  7:28   ` Sakari Ailus
@ 2016-12-14 12:27       ` Kieran Bingham
  0 siblings, 0 replies; 18+ messages in thread
From: Kieran Bingham @ 2016-12-14 12:27 UTC (permalink / raw)
  To: Sakari Ailus, Kieran Bingham
  Cc: laurent.pinchart, linux-renesas-soc, linux-media

Hi Sakari,

On 14/12/16 07:28, Sakari Ailus wrote:
> Hi Kieran,
> 
> On Tue, Dec 13, 2016 at 05:59:44PM +0000, Kieran Bingham wrote:
>> media_entity_pipeline_stop() can be called through error paths with a
>> NULL entity pipe object. In this instance, stopping is a no-op, so
>> simply return without any action
> 
> The approach of returning silently is wrong; the streaming count is indeed a
> count: you have to decrement it the exactly same number of times it's been
> increased. Code that attempts to call __media_entity_pipeline_stop() on an
> entity that's not streaming is simply buggy.

Ok, Thanks for the heads up on where to look, as I suspected we are in
section B) of my comments below :D

> I've got a patch here that adds a warning to graph traversal on streaming
> count being zero. I sent a pull request including that some days ago:
> 
> <URL:http://www.spinics.net/lists/linux-media/msg108980.html>
> <URL:http://www.spinics.net/lists/linux-media/msg108995.html>

Excellent, thanks, I've merged your branch into mine, and I'll
investigate where the error is actually coming from.

--
Ok - so I've merged your patches, (and dropped this patch) but they
don't fire any warning before I hit my null-pointer dereference in
__media_pipeline_stop(), on the WARN_ON(!pipe->streaming_count);

So I think this is a case of calling stop on an invalid entity rather
than an unbalanced start/stop somehow:

[  613.830334] vsp1 fea38000.vsp: failed to reset wpf.0
[  613.838137] PM: resume of devices complete after 124.410 msecs
[  613.847390] PM: Finishing wakeup.
[  613.852247] Restarting tasks ... done.
[  615.864801] ravb e6800000.ethernet eth0: Link is Up - 100Mbps/Full -
flow control rx/tx
[  617.011299] vsp1 fea38000.vsp: failed to reset wpf.0
[  617.017777] vsp1 fea38000.vsp: DRM pipeline stop timeout
[  617.024649] Unable to handle kernel NULL pointer dereference at
virtual address 00000000
[  617.034273] pgd = ffffff80098c5000
[  617.039171] [00000000] *pgd=000000073fffe003[  617.043336] ,
*pud=000000073fffe003
, *pmd=0000000000000000[  617.050353]
[  617.053282] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  617.060309] Modules linked in:
[  617.064793] CPU: 1 PID: 683 Comm: kworker/1:2 Not tainted
4.9.0-00506-g4d2a939ea654 #350
[  617.074320] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
[  617.082578] Workqueue: events drm_mode_rmfb_work_fn
[  617.088884] task: ffffffc6fabd2b00 task.stack: ffffffc6f9d90000
[  617.096246] PC is at __media_pipeline_stop+0x24/0xe8
[  617.102644] LR is at media_pipeline_stop+0x34/0x48

<regs and stack snipped>

[  617.863386] [<ffffff800853e724>] __media_pipeline_stop+0x24/0xe8
[  617.870417] [<ffffff800853e81c>] media_pipeline_stop+0x34/0x48
[  617.877272] [<ffffff8008568000>] vsp1_du_setup_lif+0x68/0x5a8
[  617.884047] [<ffffff80084de9d4>] rcar_du_vsp_disable+0x2c/0x38
[  617.890898] [<ffffff80084da710>] rcar_du_crtc_stop+0x198/0x1e8
[  617.897749] [<ffffff80084da780>] rcar_du_crtc_disable+0x20/0x38
[  617.904683] [<ffffff80084ac9b0>]
drm_atomic_helper_commit_modeset_disables+0x1b0/0x3d8
[  617.913634] [<ffffff80084db9cc>] rcar_du_atomic_complete+0x34/0xc8
[  617.920828] [<ffffff80084dbd20>] rcar_du_atomic_commit+0x2c0/0x2d0
[  617.928012] [<ffffff80084ceffc>] drm_atomic_commit+0x5c/0x68
[  617.934663] [<ffffff80084ad2e0>] drm_atomic_helper_set_config+0x90/0xd0
[  617.942288] [<ffffff80084c06a0>] drm_mode_set_config_internal+0x70/0x100
[  617.949996] [<ffffff80084c0760>] drm_crtc_force_disable+0x30/0x40
[  617.957084] [<ffffff80084d0b10>] drm_framebuffer_remove+0x100/0x128
[  617.964347] [<ffffff80084d0b80>] drm_mode_rmfb_work_fn+0x48/0x60
[  617.971352] [<ffffff80080e9770>] process_one_work+0x1e0/0x738
[  617.978084] [<ffffff80080e9f24>] worker_thread+0x25c/0x4a0
[  617.984546] [<ffffff80080f11ac>] kthread+0xd4/0xe8
[  617.990305] [<ffffff8008083680>] ret_from_fork+0x10/0x50


So, I'll have to schedule some time to investigate this deeper and find
out where we are calling stop on an invalid entity object.

I agree that this patch should simply be dropped though, it was more to
promote a bit of discussion on the area to help me understand what was
going on, which it has!.

Thankyou Sakari!

--
Regards

Kieran


> I think the check here could simply be removed as the check is done for
> every entity in the pipeline with the above patch.
> 
> If there's still a wish to check for the pipe field which should not be
> written by drivers, it should be done during pipeline traversal so that it
> applies to all entities in the pipeline, not just where traversal starts.
> 
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>> ---
>>
>> I've marked this patch as RFC, although if deemed suitable, by all means
>> integrate it as is.
>>
>> When testing suspend/resume operations on VSP1, I encountered a segfault on the
>> WARN_ON(!pipe->streaming_count) line, where 'pipe == NULL'. The simple
>> protection fix is to return early in this instance, as this patch does however:
>>
>> A) Does this early return path warrant a WARN() statement itself, to identify
>> drivers which are incorrectly calling media_entity_pipeline_stop() with an
>> invalid entity, or would this just be noise ...
>>
>> and therefore..
>>
>> B) I also partly assume this patch could simply get NAK'd with a request to go
>> and dig out the root cause of calling media_entity_pipeline_stop() with an
>> invalid entity. 
>>
>> My brief investigation so far here so far shows that it's almost a second order
>> fault - where the first suspend resume cycle completes but leaves the entity in
>> an invalid state having followed an error path - and then on a second
>> suspend/resume - the stop fails with the affected segfault.
>>
>> If statement A) or B) apply here, please drop this patch from the series, and
>> don't consider it a blocking issue for the other 3 patches.
>>
>> Kieran
>>
>>
>>  drivers/media/media-entity.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/media/media-entity.c b/drivers/media/media-entity.c
>> index c68239e60487..93c9cbf4bf46 100644
>> --- a/drivers/media/media-entity.c
>> +++ b/drivers/media/media-entity.c
>> @@ -508,6 +508,8 @@ void __media_entity_pipeline_stop(struct media_entity *entity)
>>  	struct media_entity_graph *graph = &entity->pipe->graph;
>>  	struct media_pipeline *pipe = entity->pipe;
>>  
>> +	if (!pipe)
>> +		return;
>>  
>>  	WARN_ON(!pipe->streaming_count);
>>  	media_entity_graph_walk_start(graph, entity);
> 

-- 
Regards

Kieran Bingham

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop
@ 2016-12-14 12:27       ` Kieran Bingham
  0 siblings, 0 replies; 18+ messages in thread
From: Kieran Bingham @ 2016-12-14 12:27 UTC (permalink / raw)
  To: Sakari Ailus, Kieran Bingham
  Cc: laurent.pinchart, linux-renesas-soc, linux-media

Hi Sakari,

On 14/12/16 07:28, Sakari Ailus wrote:
> Hi Kieran,
> 
> On Tue, Dec 13, 2016 at 05:59:44PM +0000, Kieran Bingham wrote:
>> media_entity_pipeline_stop() can be called through error paths with a
>> NULL entity pipe object. In this instance, stopping is a no-op, so
>> simply return without any action
> 
> The approach of returning silently is wrong; the streaming count is indeed a
> count: you have to decrement it the exactly same number of times it's been
> increased. Code that attempts to call __media_entity_pipeline_stop() on an
> entity that's not streaming is simply buggy.

Ok, Thanks for the heads up on where to look, as I suspected we are in
section B) of my comments below :D

> I've got a patch here that adds a warning to graph traversal on streaming
> count being zero. I sent a pull request including that some days ago:
> 
> <URL:http://www.spinics.net/lists/linux-media/msg108980.html>
> <URL:http://www.spinics.net/lists/linux-media/msg108995.html>

Excellent, thanks, I've merged your branch into mine, and I'll
investigate where the error is actually coming from.

--
Ok - so I've merged your patches, (and dropped this patch) but they
don't fire any warning before I hit my null-pointer dereference in
__media_pipeline_stop(), on the WARN_ON(!pipe->streaming_count);

So I think this is a case of calling stop on an invalid entity rather
than an unbalanced start/stop somehow:

[  613.830334] vsp1 fea38000.vsp: failed to reset wpf.0
[  613.838137] PM: resume of devices complete after 124.410 msecs
[  613.847390] PM: Finishing wakeup.
[  613.852247] Restarting tasks ... done.
[  615.864801] ravb e6800000.ethernet eth0: Link is Up - 100Mbps/Full -
flow control rx/tx
[  617.011299] vsp1 fea38000.vsp: failed to reset wpf.0
[  617.017777] vsp1 fea38000.vsp: DRM pipeline stop timeout
[  617.024649] Unable to handle kernel NULL pointer dereference at
virtual address 00000000
[  617.034273] pgd = ffffff80098c5000
[  617.039171] [00000000] *pgd=000000073fffe003[  617.043336] ,
*pud=000000073fffe003
, *pmd=0000000000000000[  617.050353]
[  617.053282] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  617.060309] Modules linked in:
[  617.064793] CPU: 1 PID: 683 Comm: kworker/1:2 Not tainted
4.9.0-00506-g4d2a939ea654 #350
[  617.074320] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
[  617.082578] Workqueue: events drm_mode_rmfb_work_fn
[  617.088884] task: ffffffc6fabd2b00 task.stack: ffffffc6f9d90000
[  617.096246] PC is at __media_pipeline_stop+0x24/0xe8
[  617.102644] LR is at media_pipeline_stop+0x34/0x48

<regs and stack snipped>

[  617.863386] [<ffffff800853e724>] __media_pipeline_stop+0x24/0xe8
[  617.870417] [<ffffff800853e81c>] media_pipeline_stop+0x34/0x48
[  617.877272] [<ffffff8008568000>] vsp1_du_setup_lif+0x68/0x5a8
[  617.884047] [<ffffff80084de9d4>] rcar_du_vsp_disable+0x2c/0x38
[  617.890898] [<ffffff80084da710>] rcar_du_crtc_stop+0x198/0x1e8
[  617.897749] [<ffffff80084da780>] rcar_du_crtc_disable+0x20/0x38
[  617.904683] [<ffffff80084ac9b0>]
drm_atomic_helper_commit_modeset_disables+0x1b0/0x3d8
[  617.913634] [<ffffff80084db9cc>] rcar_du_atomic_complete+0x34/0xc8
[  617.920828] [<ffffff80084dbd20>] rcar_du_atomic_commit+0x2c0/0x2d0
[  617.928012] [<ffffff80084ceffc>] drm_atomic_commit+0x5c/0x68
[  617.934663] [<ffffff80084ad2e0>] drm_atomic_helper_set_config+0x90/0xd0
[  617.942288] [<ffffff80084c06a0>] drm_mode_set_config_internal+0x70/0x100
[  617.949996] [<ffffff80084c0760>] drm_crtc_force_disable+0x30/0x40
[  617.957084] [<ffffff80084d0b10>] drm_framebuffer_remove+0x100/0x128
[  617.964347] [<ffffff80084d0b80>] drm_mode_rmfb_work_fn+0x48/0x60
[  617.971352] [<ffffff80080e9770>] process_one_work+0x1e0/0x738
[  617.978084] [<ffffff80080e9f24>] worker_thread+0x25c/0x4a0
[  617.984546] [<ffffff80080f11ac>] kthread+0xd4/0xe8
[  617.990305] [<ffffff8008083680>] ret_from_fork+0x10/0x50


So, I'll have to schedule some time to investigate this deeper and find
out where we are calling stop on an invalid entity object.

I agree that this patch should simply be dropped though, it was more to
promote a bit of discussion on the area to help me understand what was
going on, which it has!.

Thankyou Sakari!

--
Regards

Kieran


> I think the check here could simply be removed as the check is done for
> every entity in the pipeline with the above patch.
> 
> If there's still a wish to check for the pipe field which should not be
> written by drivers, it should be done during pipeline traversal so that it
> applies to all entities in the pipeline, not just where traversal starts.
> 
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>> ---
>>
>> I've marked this patch as RFC, although if deemed suitable, by all means
>> integrate it as is.
>>
>> When testing suspend/resume operations on VSP1, I encountered a segfault on the
>> WARN_ON(!pipe->streaming_count) line, where 'pipe == NULL'. The simple
>> protection fix is to return early in this instance, as this patch does however:
>>
>> A) Does this early return path warrant a WARN() statement itself, to identify
>> drivers which are incorrectly calling media_entity_pipeline_stop() with an
>> invalid entity, or would this just be noise ...
>>
>> and therefore..
>>
>> B) I also partly assume this patch could simply get NAK'd with a request to go
>> and dig out the root cause of calling media_entity_pipeline_stop() with an
>> invalid entity. 
>>
>> My brief investigation so far here so far shows that it's almost a second order
>> fault - where the first suspend resume cycle completes but leaves the entity in
>> an invalid state having followed an error path - and then on a second
>> suspend/resume - the stop fails with the affected segfault.
>>
>> If statement A) or B) apply here, please drop this patch from the series, and
>> don't consider it a blocking issue for the other 3 patches.
>>
>> Kieran
>>
>>
>>  drivers/media/media-entity.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/media/media-entity.c b/drivers/media/media-entity.c
>> index c68239e60487..93c9cbf4bf46 100644
>> --- a/drivers/media/media-entity.c
>> +++ b/drivers/media/media-entity.c
>> @@ -508,6 +508,8 @@ void __media_entity_pipeline_stop(struct media_entity *entity)
>>  	struct media_entity_graph *graph = &entity->pipe->graph;
>>  	struct media_pipeline *pipe = entity->pipe;
>>  
>> +	if (!pipe)
>> +		return;
>>  
>>  	WARN_ON(!pipe->streaming_count);
>>  	media_entity_graph_walk_start(graph, entity);
> 

-- 
Regards

Kieran Bingham

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop
  2016-12-14 12:27       ` Kieran Bingham
  (?)
@ 2016-12-14 12:43       ` Sakari Ailus
  2016-12-14 17:53         ` Kieran Bingham
  -1 siblings, 1 reply; 18+ messages in thread
From: Sakari Ailus @ 2016-12-14 12:43 UTC (permalink / raw)
  To: Kieran Bingham
  Cc: Kieran Bingham, laurent.pinchart, linux-renesas-soc, linux-media

Hi Kieran,

On Wed, Dec 14, 2016 at 12:27:37PM +0000, Kieran Bingham wrote:
> Hi Sakari,
> 
> On 14/12/16 07:28, Sakari Ailus wrote:
> > Hi Kieran,
> > 
> > On Tue, Dec 13, 2016 at 05:59:44PM +0000, Kieran Bingham wrote:
> >> media_entity_pipeline_stop() can be called through error paths with a
> >> NULL entity pipe object. In this instance, stopping is a no-op, so
> >> simply return without any action
> > 
> > The approach of returning silently is wrong; the streaming count is indeed a
> > count: you have to decrement it the exactly same number of times it's been
> > increased. Code that attempts to call __media_entity_pipeline_stop() on an
> > entity that's not streaming is simply buggy.
> 
> Ok, Thanks for the heads up on where to look, as I suspected we are in
> section B) of my comments below :D
> 
> > I've got a patch here that adds a warning to graph traversal on streaming
> > count being zero. I sent a pull request including that some days ago:
> > 
> > <URL:http://www.spinics.net/lists/linux-media/msg108980.html>
> > <URL:http://www.spinics.net/lists/linux-media/msg108995.html>
> 
> Excellent, thanks, I've merged your branch into mine, and I'll
> investigate where the error is actually coming from.
> 
> --
> Ok - so I've merged your patches, (and dropped this patch) but they
> don't fire any warning before I hit my null-pointer dereference in
> __media_pipeline_stop(), on the WARN_ON(!pipe->streaming_count);
> 
> So I think this is a case of calling stop on an invalid entity rather
> than an unbalanced start/stop somehow:

Not necessarily. The pipe is set to NULL if the count goes to zero.

This check should be dropped, it's ill-placed anyway: if streaming count is
zero the pipe is expected to be NULL anyway in which case you get a NULL
pointer exception (that you saw there). With the check removed, you should
get the warning instead.

> 
> [  613.830334] vsp1 fea38000.vsp: failed to reset wpf.0
> [  613.838137] PM: resume of devices complete after 124.410 msecs
> [  613.847390] PM: Finishing wakeup.
> [  613.852247] Restarting tasks ... done.
> [  615.864801] ravb e6800000.ethernet eth0: Link is Up - 100Mbps/Full -
> flow control rx/tx
> [  617.011299] vsp1 fea38000.vsp: failed to reset wpf.0
> [  617.017777] vsp1 fea38000.vsp: DRM pipeline stop timeout
> [  617.024649] Unable to handle kernel NULL pointer dereference at
> virtual address 00000000
> [  617.034273] pgd = ffffff80098c5000
> [  617.039171] [00000000] *pgd=000000073fffe003[  617.043336] ,
> *pud=000000073fffe003
> , *pmd=0000000000000000[  617.050353]
> [  617.053282] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> [  617.060309] Modules linked in:
> [  617.064793] CPU: 1 PID: 683 Comm: kworker/1:2 Not tainted
> 4.9.0-00506-g4d2a939ea654 #350
> [  617.074320] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
> [  617.082578] Workqueue: events drm_mode_rmfb_work_fn
> [  617.088884] task: ffffffc6fabd2b00 task.stack: ffffffc6f9d90000
> [  617.096246] PC is at __media_pipeline_stop+0x24/0xe8
> [  617.102644] LR is at media_pipeline_stop+0x34/0x48
> 
> <regs and stack snipped>
> 
> [  617.863386] [<ffffff800853e724>] __media_pipeline_stop+0x24/0xe8
> [  617.870417] [<ffffff800853e81c>] media_pipeline_stop+0x34/0x48
> [  617.877272] [<ffffff8008568000>] vsp1_du_setup_lif+0x68/0x5a8
> [  617.884047] [<ffffff80084de9d4>] rcar_du_vsp_disable+0x2c/0x38
> [  617.890898] [<ffffff80084da710>] rcar_du_crtc_stop+0x198/0x1e8
> [  617.897749] [<ffffff80084da780>] rcar_du_crtc_disable+0x20/0x38
> [  617.904683] [<ffffff80084ac9b0>]
> drm_atomic_helper_commit_modeset_disables+0x1b0/0x3d8
> [  617.913634] [<ffffff80084db9cc>] rcar_du_atomic_complete+0x34/0xc8
> [  617.920828] [<ffffff80084dbd20>] rcar_du_atomic_commit+0x2c0/0x2d0
> [  617.928012] [<ffffff80084ceffc>] drm_atomic_commit+0x5c/0x68
> [  617.934663] [<ffffff80084ad2e0>] drm_atomic_helper_set_config+0x90/0xd0
> [  617.942288] [<ffffff80084c06a0>] drm_mode_set_config_internal+0x70/0x100
> [  617.949996] [<ffffff80084c0760>] drm_crtc_force_disable+0x30/0x40
> [  617.957084] [<ffffff80084d0b10>] drm_framebuffer_remove+0x100/0x128
> [  617.964347] [<ffffff80084d0b80>] drm_mode_rmfb_work_fn+0x48/0x60
> [  617.971352] [<ffffff80080e9770>] process_one_work+0x1e0/0x738
> [  617.978084] [<ffffff80080e9f24>] worker_thread+0x25c/0x4a0
> [  617.984546] [<ffffff80080f11ac>] kthread+0xd4/0xe8
> [  617.990305] [<ffffff8008083680>] ret_from_fork+0x10/0x50
> 
> 
> So, I'll have to schedule some time to investigate this deeper and find
> out where we are calling stop on an invalid entity object.
> 
> I agree that this patch should simply be dropped though, it was more to
> promote a bit of discussion on the area to help me understand what was
> going on, which it has!.
> 
> Thankyou Sakari!
> 
> --
> Regards
> 
> Kieran
> 
> 
> > I think the check here could simply be removed as the check is done for
> > every entity in the pipeline with the above patch.
> > 
> > If there's still a wish to check for the pipe field which should not be
> > written by drivers, it should be done during pipeline traversal so that it
> > applies to all entities in the pipeline, not just where traversal starts.
> > 
> >>
> >> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> >> ---
> >>
> >> I've marked this patch as RFC, although if deemed suitable, by all means
> >> integrate it as is.
> >>
> >> When testing suspend/resume operations on VSP1, I encountered a segfault on the
> >> WARN_ON(!pipe->streaming_count) line, where 'pipe == NULL'. The simple
> >> protection fix is to return early in this instance, as this patch does however:
> >>
> >> A) Does this early return path warrant a WARN() statement itself, to identify
> >> drivers which are incorrectly calling media_entity_pipeline_stop() with an
> >> invalid entity, or would this just be noise ...
> >>
> >> and therefore..
> >>
> >> B) I also partly assume this patch could simply get NAK'd with a request to go
> >> and dig out the root cause of calling media_entity_pipeline_stop() with an
> >> invalid entity. 
> >>
> >> My brief investigation so far here so far shows that it's almost a second order
> >> fault - where the first suspend resume cycle completes but leaves the entity in
> >> an invalid state having followed an error path - and then on a second
> >> suspend/resume - the stop fails with the affected segfault.
> >>
> >> If statement A) or B) apply here, please drop this patch from the series, and
> >> don't consider it a blocking issue for the other 3 patches.
> >>
> >> Kieran
> >>
> >>
> >>  drivers/media/media-entity.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/media/media-entity.c b/drivers/media/media-entity.c
> >> index c68239e60487..93c9cbf4bf46 100644
> >> --- a/drivers/media/media-entity.c
> >> +++ b/drivers/media/media-entity.c
> >> @@ -508,6 +508,8 @@ void __media_entity_pipeline_stop(struct media_entity *entity)
> >>  	struct media_entity_graph *graph = &entity->pipe->graph;
> >>  	struct media_pipeline *pipe = entity->pipe;
> >>  
> >> +	if (!pipe)
> >> +		return;
> >>  
> >>  	WARN_ON(!pipe->streaming_count);
> >>  	media_entity_graph_walk_start(graph, entity);
> > 

-- 
Regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 2/4] v4l: vsp1: Refactor video pipeline configuration
  2016-12-13 17:59 ` [PATCHv3 2/4] v4l: vsp1: Refactor video pipeline configuration Kieran Bingham
@ 2016-12-14 16:30   ` Laurent Pinchart
  2016-12-15 11:50     ` Kieran Bingham
  0 siblings, 1 reply; 18+ messages in thread
From: Laurent Pinchart @ 2016-12-14 16:30 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

Thank you for the patch.

On Tuesday 13 Dec 2016 17:59:42 Kieran Bingham wrote:
> With multiple inputs through the BRU it is feasible for the streams to
> race each other at stream-on.

Could you please explain the race condition in the commit message ? The issue 
is that multiple VIDIOC_STREAMON calls racing each other could have process 
N-1 skipping over the pipeline setup section and then start the pipeline, if 
videobuf2 has already enqueued buffers to the driver for process N but not 
called the .start_streaming() operation yet.

> In the case of the video pipelines, this
> can present two serious issues.
> 
>  1) A null-dereference if the pipe->dl is committed at the same time as
>     the vsp1_video_setup_pipeline() is processing
> 
>  2) A hardware hang, where a display list is committed without having
>     called vsp1_video_setup_pipeline() first
> 
> Along side these race conditions, the work done by
> vsp1_video_setup_pipeline() is undone by the re-initialisation during a
> suspend resume cycle, and an active pipeline does not attempt to
> reconfigure the correct routing and init parameters for the entities.
> 
> To repair all of these issues, we can move the call to a conditional
> inside vsp1_video_pipeline_run() and ensure that this can only be called
> on the last stream which calls into vsp1_video_start_streaming()
> 
> As a convenient side effect of this, by specifying that the
> configuration has been lost during suspend/resume actions - the
> vsp1_video_pipeline_run() call can re-initialise pipelines when
> necessary thus repairing resume actions for active m2m pipelines.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> 
> ---
> v3:
>  - Move 'flag reset' to be inside the vsp1_reset_wpf() function call
>  - Tidy up the wpf->pipe reference for the configured flag
> 
>  drivers/media/platform/vsp1/vsp1_drv.c   |  4 ++++
>  drivers/media/platform/vsp1/vsp1_pipe.c  |  1 +
>  drivers/media/platform/vsp1/vsp1_pipe.h  |  2 ++
>  drivers/media/platform/vsp1/vsp1_video.c | 20 +++++++++-----------
>  4 files changed, 16 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_drv.c
> b/drivers/media/platform/vsp1/vsp1_drv.c index 57c713a4e1df..1dc3726c4e83
> 100644
> --- a/drivers/media/platform/vsp1/vsp1_drv.c
> +++ b/drivers/media/platform/vsp1/vsp1_drv.c
> @@ -413,6 +413,7 @@ static int vsp1_create_entities(struct vsp1_device
> *vsp1)
> 
>  int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned int index)
>  {
> +	struct vsp1_rwpf *wpf = vsp1->wpf[index];
>  	unsigned int timeout;
>  	u32 status;
> 
> @@ -429,6 +430,9 @@ int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned
> int index) usleep_range(1000, 2000);
>  	}
> 
> +	if (wpf->pipe)
> +		wpf->pipe->configured = false;
> +
>  	if (!timeout) {
>  		dev_err(vsp1->dev, "failed to reset wpf.%u\n", index);
>  		return -ETIMEDOUT;
> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c
> b/drivers/media/platform/vsp1/vsp1_pipe.c index 756ca4ea7668..7ddf862ee403
> 100644
> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
> @@ -208,6 +208,7 @@ void vsp1_pipeline_init(struct vsp1_pipeline *pipe)
> 
>  	INIT_LIST_HEAD(&pipe->entities);
>  	pipe->state = VSP1_PIPELINE_STOPPED;
> +	pipe->configured = false;
>  }
> 
>  /* Must be called with the pipe irqlock held. */
> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h
> b/drivers/media/platform/vsp1/vsp1_pipe.h index ac4ad2655551..0743b9fcb655
> 100644
> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
> @@ -61,6 +61,7 @@ enum vsp1_pipeline_state {
>   * @pipe: the media pipeline
>   * @irqlock: protects the pipeline state
>   * @state: current state
> + * @configured: determines routing configuration active on cell.

I'm not sure to understand that. How about "true if the pipeline has been set 
up" ? Or maybe "true if the pipeline has been set up for video streaming" as 
it only applies to pipelines handled through the V4L2 API ?

>   * @wq: wait queue to wait for state change completion
>   * @frame_end: frame end interrupt handler
>   * @lock: protects the pipeline use count and stream count
> @@ -86,6 +87,7 @@ struct vsp1_pipeline {
> 
>  	spinlock_t irqlock;
>  	enum vsp1_pipeline_state state;
> +	bool configured;
>  	wait_queue_head_t wq;
> 
>  	void (*frame_end)(struct vsp1_pipeline *pipe);
> diff --git a/drivers/media/platform/vsp1/vsp1_video.c
> b/drivers/media/platform/vsp1/vsp1_video.c index 44b687c0b8df..7ff9f4c19ff0
> 100644
> --- a/drivers/media/platform/vsp1/vsp1_video.c
> +++ b/drivers/media/platform/vsp1/vsp1_video.c
> @@ -388,6 +388,8 @@ static int vsp1_video_setup_pipeline(struct
> vsp1_pipeline *pipe) VSP1_ENTITY_PARAMS_INIT);
>  	}
> 
> +	pipe->configured = true;
> +
>  	return 0;
>  }
> 
> @@ -411,6 +413,9 @@ static void vsp1_video_pipeline_run(struct vsp1_pipeline
> *pipe) struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
>  	struct vsp1_entity *entity;
> 
> +	if (!pipe->configured)
> +		vsp1_video_setup_pipeline(pipe);
> +

I don't like this much. The vsp1_video_pipeline_run() is called with a 
spinlock held. We should avoid operations as time consuming as 
vsp1_video_setup_pipeline() here.

>  	if (!pipe->dl)
>  		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
> 
> @@ -793,25 +798,18 @@ static int vsp1_video_start_streaming(struct vb2_queue
> *vq, unsigned int count) struct vsp1_video *video = vb2_get_drv_priv(vq);
>  	struct vsp1_pipeline *pipe = video->rwpf->pipe;
>  	unsigned long flags;
> -	int ret;
> 
>  	mutex_lock(&pipe->lock);
>  	if (pipe->stream_count == pipe->num_inputs) {
> -		ret = vsp1_video_setup_pipeline(pipe);
> -		if (ret < 0) {
> -			mutex_unlock(&pipe->lock);
> -			return ret;
> -		}
> +		spin_lock_irqsave(&pipe->irqlock, flags);
> +		if (vsp1_pipeline_ready(pipe))
> +			vsp1_video_pipeline_run(pipe);
> +		spin_unlock_irqrestore(&pipe->irqlock, flags);
>  	}
> 
>  	pipe->stream_count++;
>  	mutex_unlock(&pipe->lock);
> 
> -	spin_lock_irqsave(&pipe->irqlock, flags);
> -	if (vsp1_pipeline_ready(pipe))
> -		vsp1_video_pipeline_run(pipe);
> -	spin_unlock_irqrestore(&pipe->irqlock, flags);
> -

How about the following ?

	bool start_pipeline = false;

 	mutex_lock(&pipe->lock);
 	if (pipe->stream_count == pipe->num_inputs) {
		ret = vsp1_video_setup_pipeline(pipe);
		if (ret < 0) {
			mutex_unlock(&pipe->lock);
			return ret;
		}

		start_pipeline = true;
 	}

 	pipe->stream_count++;
 	mutex_unlock(&pipe->lock);

	/*
	 * Don't attempt to start the pipeline if we haven't configured it
	 * explicitly, as otherwise multiple streamon calls could race each
	 * other and one of them try to start the pipeline unconfigured.
	 */
	if (!start_pipeline)
		return 0;

	spin_lock_irqsave(&pipe->irqlock, flags);
	if (vsp1_pipeline_ready(pipe))
		vsp1_video_pipeline_run(pipe);
	spin_unlock_irqrestore(&pipe->irqlock, flags);


This won't fix the suspend/resume issue, but I think splitting that to a 
separate patch would be a good idea anyway.

I'm also wondering whether we could have a streamon/suspend race. If the 
pipeline is setup and the DL allocated but not committed at suspend time I 
think we'll leak the DL at resume time.

>  	return 0;
>  }

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop
  2016-12-14 12:43       ` Sakari Ailus
@ 2016-12-14 17:53         ` Kieran Bingham
  2016-12-15  7:14           ` Sakari Ailus
  0 siblings, 1 reply; 18+ messages in thread
From: Kieran Bingham @ 2016-12-14 17:53 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Kieran Bingham, laurent.pinchart, linux-renesas-soc, linux-media

Hi Sakari,

On 14/12/16 12:43, Sakari Ailus wrote:
> Hi Kieran,
> 
> On Wed, Dec 14, 2016 at 12:27:37PM +0000, Kieran Bingham wrote:
>> Hi Sakari,
>>
>> On 14/12/16 07:28, Sakari Ailus wrote:
>>> Hi Kieran,
>>>
>>> On Tue, Dec 13, 2016 at 05:59:44PM +0000, Kieran Bingham wrote:
>>>> media_entity_pipeline_stop() can be called through error paths with a
>>>> NULL entity pipe object. In this instance, stopping is a no-op, so
>>>> simply return without any action
>>>
>>> The approach of returning silently is wrong; the streaming count is indeed a
>>> count: you have to decrement it the exactly same number of times it's been
>>> increased. Code that attempts to call __media_entity_pipeline_stop() on an
>>> entity that's not streaming is simply buggy.
>>
>> Ok, Thanks for the heads up on where to look, as I suspected we are in
>> section B) of my comments below :D
>>
>>> I've got a patch here that adds a warning to graph traversal on streaming
>>> count being zero. I sent a pull request including that some days ago:
>>>
>>> <URL:http://www.spinics.net/lists/linux-media/msg108980.html>
>>> <URL:http://www.spinics.net/lists/linux-media/msg108995.html>
>>
>> Excellent, thanks, I've merged your branch into mine, and I'll
>> investigate where the error is actually coming from.
>>
>> --
>> Ok - so I've merged your patches, (and dropped this patch) but they
>> don't fire any warning before I hit my null-pointer dereference in
>> __media_pipeline_stop(), on the WARN_ON(!pipe->streaming_count);
>>
>> So I think this is a case of calling stop on an invalid entity rather
>> than an unbalanced start/stop somehow:
> 
> Not necessarily. The pipe is set to NULL if the count goes to zero.
> 
> This check should be dropped, it's ill-placed anyway: if streaming count is
> zero the pipe is expected to be NULL anyway in which case you get a NULL
> pointer exception (that you saw there). With the check removed, you should
> get the warning instead.

Ahh, yes, I'd missed the part there that was setting it to NULL.

However, it will still segfault ... (though hopefully after the warning)
as the last line of the function states:

	if (!--pipe->streaming_count)
		media_entity_graph_walk_cleanup(graph);

So we will hit a null deref on the pipe->streaming_count there, which
still leaves an unbalanced stop as a kernel panic.

In fact, I've just tested removing that WARN_ON(!pipe->streaming_count);
 - it doesn't even get that far - and segfaults in

 		media_entity_graph_walk_cleanup(graph);

[   80.916558] Unable to handle kernel NULL pointer dereference at
virtual address 00000110
....
[   81.769492] [<ffffff800853e278>] media_graph_walk_start+0x20/0xf0
[   81.776615] [<ffffff800853e73c>] __media_pipeline_stop+0x3c/0xd8
[   81.783644] [<ffffff800853e80c>] media_pipeline_stop+0x34/0x48
[   81.790505] [<ffffff8008567ff0>] vsp1_du_setup_lif+0x68/0x5a8
[   81.797279] [<ffffff80084de9d4>] rcar_du_vsp_disable+0x2c/0x38
[   81.804137] [<ffffff80084da710>] rcar_du_crtc_stop+0x198/0x1e8
[   81.810984] [<ffffff80084da780>] rcar_du_crtc_disable+0x20/0x38

due to the fact that 'graph' is dereferenced through the 'pipe'.

{
	/* Entity->pipe is NULL, therefore 'graph' is invalid */
	struct media_graph *graph = &entity->pipe->graph;
	struct media_pipeline *pipe = entity->pipe;

	media_graph_walk_start(graph, entity);
...
}

So I suspect we do still need a warning or check in there somewhere.
Something to tell the developer that there is an unbalanced stop, would
be much better than a panic/segfault...

(And of course, we still need to look at the actual unbalanced stop in
vsp1_du_setup_lif!)

--
Kieran

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop
  2016-12-13 17:59 ` [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop Kieran Bingham
  2016-12-14  7:28   ` Sakari Ailus
@ 2016-12-14 19:56   ` Kieran Bingham
  2016-12-14 21:16     ` Laurent Pinchart
  1 sibling, 1 reply; 18+ messages in thread
From: Kieran Bingham @ 2016-12-14 19:56 UTC (permalink / raw)
  To: Kieran Bingham, laurent.pinchart
  Cc: linux-renesas-soc, linux-media, Sakari Ailus

Hello Me.

Ok, so a bit of investigation into *why* we have an unbalanced
media_pipeline stop call.

After a suspend/resume cycle, the function vsp1_pm_runtime_resume() is
called by the PM framework.

The hardware is unable to reset successfully and the reset call returns
-ETIMEDOUT which gets passed back to the PM framework as a failure and
thus the device is not 'resumed'

This process is commenced, as the DU driver tries to enable the VSP, we
get the following call stack:

rcar_du_crtc_resume()
  rcar_du_vsp_enable()
    vsp1_du_setup_lif() // returns void
      vsp1_device_get() // returns -ETIMEDOUT,

As the vsp1_du_setup_lif() returns void, the fact that the hardware
failed to start is ignored.

Later we get a call to  rcar_du_vsp_disable(), which again calls into
vsp1_du_setup_lif(), this time to disable the pipeline. And it is here,
that the call to media_pipeline_stop() is an unbalanced call, as the
corresponding media_pipeline_start() would only have been called if the
vsp1_device_get() had succeeded, thus we find ourselves with a kernel
panic on a null dereference.

Sorry for the terse notes, they are possibly/probably really for me if I
get tasked to look back at this.
--
Regards

Kieran


On 13/12/16 17:59, Kieran Bingham wrote:
> media_entity_pipeline_stop() can be called through error paths with a
> NULL entity pipe object. In this instance, stopping is a no-op, so
> simply return without any action
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> ---
> 
> I've marked this patch as RFC, although if deemed suitable, by all means
> integrate it as is.
> 
> When testing suspend/resume operations on VSP1, I encountered a segfault on the
> WARN_ON(!pipe->streaming_count) line, where 'pipe == NULL'. The simple
> protection fix is to return early in this instance, as this patch does however:
> 
> A) Does this early return path warrant a WARN() statement itself, to identify
> drivers which are incorrectly calling media_entity_pipeline_stop() with an
> invalid entity, or would this just be noise ...
> 
> and therefore..
> 
> B) I also partly assume this patch could simply get NAK'd with a request to go
> and dig out the root cause of calling media_entity_pipeline_stop() with an
> invalid entity. 
> 
> My brief investigation so far here so far shows that it's almost a second order
> fault - where the first suspend resume cycle completes but leaves the entity in
> an invalid state having followed an error path - and then on a second
> suspend/resume - the stop fails with the affected segfault.
> 
> If statement A) or B) apply here, please drop this patch from the series, and
> don't consider it a blocking issue for the other 3 patches.
> 
> Kieran
> 
> 
>  drivers/media/media-entity.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/media/media-entity.c b/drivers/media/media-entity.c
> index c68239e60487..93c9cbf4bf46 100644
> --- a/drivers/media/media-entity.c
> +++ b/drivers/media/media-entity.c
> @@ -508,6 +508,8 @@ void __media_entity_pipeline_stop(struct media_entity *entity)
>  	struct media_entity_graph *graph = &entity->pipe->graph;
>  	struct media_pipeline *pipe = entity->pipe;
>  
> +	if (!pipe)
> +		return;
>  
>  	WARN_ON(!pipe->streaming_count);
>  	media_entity_graph_walk_start(graph, entity);
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 1/4] v4l: vsp1: Move vsp1_video_setup_pipeline()
  2016-12-13 17:59 ` [PATCHv3 1/4] v4l: vsp1: Move vsp1_video_setup_pipeline() Kieran Bingham
@ 2016-12-14 20:21   ` Laurent Pinchart
  0 siblings, 0 replies; 18+ messages in thread
From: Laurent Pinchart @ 2016-12-14 20:21 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

Thank you for the patch.

On Tuesday 13 Dec 2016 17:59:41 Kieran Bingham wrote:
> Move the static vsp1_video_setup_pipeline() function in preparation for
> the callee updates so that the vsp1_video_pipeline_run() call can
> configure pipelines following suspend resume actions.
> 
> This commit is just a code move for clarity performing no functional
> change.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

provided of course we still need this after the rework of 2/4.

> ---
>  drivers/media/platform/vsp1/vsp1_video.c | 82 ++++++++++++++---------------
>  1 file changed, 41 insertions(+), 41 deletions(-)
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_video.c
> b/drivers/media/platform/vsp1/vsp1_video.c index d351b9c768d2..44b687c0b8df
> 100644
> --- a/drivers/media/platform/vsp1/vsp1_video.c
> +++ b/drivers/media/platform/vsp1/vsp1_video.c
> @@ -350,6 +350,47 @@ static void vsp1_video_frame_end(struct vsp1_pipeline
> *pipe, pipe->buffers_ready |= 1 << video->pipe_index;
>  }
> 
> +static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
> +{
> +	struct vsp1_entity *entity;
> +
> +	/* Determine this pipelines sizes for image partitioning support. */
> +	vsp1_video_pipeline_setup_partitions(pipe);
> +
> +	/* Prepare the display list. */
> +	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
> +	if (!pipe->dl)
> +		return -ENOMEM;
> +
> +	if (pipe->uds) {
> +		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
> +
> +		/* If a BRU is present in the pipeline before the UDS, the 
alpha
> +		 * component doesn't need to be scaled as the BRU output alpha
> +		 * value is fixed to 255. Otherwise we need to scale the alpha
> +		 * component only when available at the input RPF.
> +		 */
> +		if (pipe->uds_input->type == VSP1_ENTITY_BRU) {
> +			uds->scale_alpha = false;
> +		} else {
> +			struct vsp1_rwpf *rpf =
> +				to_rwpf(&pipe->uds_input->subdev);
> +
> +			uds->scale_alpha = rpf->fmtinfo->alpha;
> +		}
> +	}
> +
> +	list_for_each_entry(entity, &pipe->entities, list_pipe) {
> +		vsp1_entity_route_setup(entity, pipe->dl);
> +
> +		if (entity->ops->configure)
> +			entity->ops->configure(entity, pipe, pipe->dl,
> +					       VSP1_ENTITY_PARAMS_INIT);
> +	}
> +
> +	return 0;
> +}
> +
>  static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
>  					      struct vsp1_dl_list *dl)
>  {
> @@ -747,47 +788,6 @@ static void vsp1_video_buffer_queue(struct vb2_buffer
> *vb) spin_unlock_irqrestore(&pipe->irqlock, flags);
>  }
> 
> -static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
> -{
> -	struct vsp1_entity *entity;
> -
> -	/* Determine this pipelines sizes for image partitioning support. */
> -	vsp1_video_pipeline_setup_partitions(pipe);
> -
> -	/* Prepare the display list. */
> -	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
> -	if (!pipe->dl)
> -		return -ENOMEM;
> -
> -	if (pipe->uds) {
> -		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
> -
> -		/* If a BRU is present in the pipeline before the UDS, the 
alpha
> -		 * component doesn't need to be scaled as the BRU output alpha
> -		 * value is fixed to 255. Otherwise we need to scale the alpha
> -		 * component only when available at the input RPF.
> -		 */
> -		if (pipe->uds_input->type == VSP1_ENTITY_BRU) {
> -			uds->scale_alpha = false;
> -		} else {
> -			struct vsp1_rwpf *rpf =
> -				to_rwpf(&pipe->uds_input->subdev);
> -
> -			uds->scale_alpha = rpf->fmtinfo->alpha;
> -		}
> -	}
> -
> -	list_for_each_entry(entity, &pipe->entities, list_pipe) {
> -		vsp1_entity_route_setup(entity, pipe->dl);
> -
> -		if (entity->ops->configure)
> -			entity->ops->configure(entity, pipe, pipe->dl,
> -					       VSP1_ENTITY_PARAMS_INIT);
> -	}
> -
> -	return 0;
> -}
> -
>  static int vsp1_video_start_streaming(struct vb2_queue *vq, unsigned int
> count) {
>  	struct vsp1_video *video = vb2_get_drv_priv(vq);

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop
  2016-12-14 19:56   ` Kieran Bingham
@ 2016-12-14 21:16     ` Laurent Pinchart
  0 siblings, 0 replies; 18+ messages in thread
From: Laurent Pinchart @ 2016-12-14 21:16 UTC (permalink / raw)
  To: Kieran Bingham
  Cc: Kieran Bingham, linux-renesas-soc, linux-media, Sakari Ailus

Hi Kieran,

On Wednesday 14 Dec 2016 19:56:51 Kieran Bingham wrote:
> Hello Me.
> 
> Ok, so a bit of investigation into *why* we have an unbalanced
> media_pipeline stop call.
> 
> After a suspend/resume cycle, the function vsp1_pm_runtime_resume() is
> called by the PM framework.
> 
> The hardware is unable to reset successfully and the reset call returns
> -ETIMEDOUT which gets passed back to the PM framework as a failure and
> thus the device is not 'resumed'
> 
> This process is commenced, as the DU driver tries to enable the VSP, we
> get the following call stack:
> 
> rcar_du_crtc_resume()
>   rcar_du_vsp_enable()
>     vsp1_du_setup_lif() // returns void
>       vsp1_device_get() // returns -ETIMEDOUT,

I suspect the call stack to not be entirely accurate as the 
rcar_du_crtc_resume() is never called :-)

> As the vsp1_du_setup_lif() returns void, the fact that the hardware
> failed to start is ignored.
> 
> Later we get a call to  rcar_du_vsp_disable(), which again calls into
> vsp1_du_setup_lif(), this time to disable the pipeline. And it is here,
> that the call to media_pipeline_stop() is an unbalanced call, as the
> corresponding media_pipeline_start() would only have been called if the
> vsp1_device_get() had succeeded, thus we find ourselves with a kernel
> panic on a null dereference.
> 
> Sorry for the terse notes, they are possibly/probably really for me if I
> get tasked to look back at this.
> --
> Regards
> 
> Kieran
> 
> On 13/12/16 17:59, Kieran Bingham wrote:
> > media_entity_pipeline_stop() can be called through error paths with a
> > NULL entity pipe object. In this instance, stopping is a no-op, so
> > simply return without any action
> > 
> > Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> > ---
> > 
> > I've marked this patch as RFC, although if deemed suitable, by all means
> > integrate it as is.
> > 
> > When testing suspend/resume operations on VSP1, I encountered a segfault
> > on the WARN_ON(!pipe->streaming_count) line, where 'pipe == NULL'. The
> > simple protection fix is to return early in this instance, as this patch
> > does however:
> > 
> > A) Does this early return path warrant a WARN() statement itself, to
> > identify drivers which are incorrectly calling
> > media_entity_pipeline_stop() with an invalid entity, or would this just
> > be noise ...
> > 
> > and therefore..
> > 
> > B) I also partly assume this patch could simply get NAK'd with a request
> > to go and dig out the root cause of calling media_entity_pipeline_stop()
> > with an invalid entity.
> > 
> > My brief investigation so far here so far shows that it's almost a second
> > order fault - where the first suspend resume cycle completes but leaves
> > the entity in an invalid state having followed an error path - and then
> > on a second suspend/resume - the stop fails with the affected segfault.
> > 
> > If statement A) or B) apply here, please drop this patch from the series,
> > and don't consider it a blocking issue for the other 3 patches.
> > 
> > Kieran
> > 
> >  drivers/media/media-entity.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/media/media-entity.c b/drivers/media/media-entity.c
> > index c68239e60487..93c9cbf4bf46 100644
> > --- a/drivers/media/media-entity.c
> > +++ b/drivers/media/media-entity.c
> > @@ -508,6 +508,8 @@ void __media_entity_pipeline_stop(struct media_entity
> > *entity)> 
> >  	struct media_entity_graph *graph = &entity->pipe->graph;
> >  	struct media_pipeline *pipe = entity->pipe;
> > 
> > +	if (!pipe)
> > +		return;
> > 
> >  	WARN_ON(!pipe->streaming_count);
> >  	media_entity_graph_walk_start(graph, entity);

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop
  2016-12-14 17:53         ` Kieran Bingham
@ 2016-12-15  7:14           ` Sakari Ailus
  0 siblings, 0 replies; 18+ messages in thread
From: Sakari Ailus @ 2016-12-15  7:14 UTC (permalink / raw)
  To: Kieran Bingham
  Cc: Kieran Bingham, laurent.pinchart, linux-renesas-soc, linux-media

On Wed, Dec 14, 2016 at 05:53:00PM +0000, Kieran Bingham wrote:
> Hi Sakari,
> 
> On 14/12/16 12:43, Sakari Ailus wrote:
> > Hi Kieran,
> > 
> > On Wed, Dec 14, 2016 at 12:27:37PM +0000, Kieran Bingham wrote:
> >> Hi Sakari,
> >>
> >> On 14/12/16 07:28, Sakari Ailus wrote:
> >>> Hi Kieran,
> >>>
> >>> On Tue, Dec 13, 2016 at 05:59:44PM +0000, Kieran Bingham wrote:
> >>>> media_entity_pipeline_stop() can be called through error paths with a
> >>>> NULL entity pipe object. In this instance, stopping is a no-op, so
> >>>> simply return without any action
> >>>
> >>> The approach of returning silently is wrong; the streaming count is indeed a
> >>> count: you have to decrement it the exactly same number of times it's been
> >>> increased. Code that attempts to call __media_entity_pipeline_stop() on an
> >>> entity that's not streaming is simply buggy.
> >>
> >> Ok, Thanks for the heads up on where to look, as I suspected we are in
> >> section B) of my comments below :D
> >>
> >>> I've got a patch here that adds a warning to graph traversal on streaming
> >>> count being zero. I sent a pull request including that some days ago:
> >>>
> >>> <URL:http://www.spinics.net/lists/linux-media/msg108980.html>
> >>> <URL:http://www.spinics.net/lists/linux-media/msg108995.html>
> >>
> >> Excellent, thanks, I've merged your branch into mine, and I'll
> >> investigate where the error is actually coming from.
> >>
> >> --
> >> Ok - so I've merged your patches, (and dropped this patch) but they
> >> don't fire any warning before I hit my null-pointer dereference in
> >> __media_pipeline_stop(), on the WARN_ON(!pipe->streaming_count);
> >>
> >> So I think this is a case of calling stop on an invalid entity rather
> >> than an unbalanced start/stop somehow:
> > 
> > Not necessarily. The pipe is set to NULL if the count goes to zero.
> > 
> > This check should be dropped, it's ill-placed anyway: if streaming count is
> > zero the pipe is expected to be NULL anyway in which case you get a NULL
> > pointer exception (that you saw there). With the check removed, you should
> > get the warning instead.
> 
> Ahh, yes, I'd missed the part there that was setting it to NULL.
> 
> However, it will still segfault ... (though hopefully after the warning)
> as the last line of the function states:
> 
> 	if (!--pipe->streaming_count)
> 		media_entity_graph_walk_cleanup(graph);
> 
> So we will hit a null deref on the pipe->streaming_count there, which
> still leaves an unbalanced stop as a kernel panic.

Right, indeed. The graph is part of the pipeline, so if pipe was NULL you
don't have graph either. How about changing the check to:

if (WARN_ON(!pipe))
	return;

Instead of removing it. Nothing in that function really makes sense if
there's no pipeline. The driver bug still exists. That's just to make the
framework behaving a little bit better in the presence of that sort of bug.

> 
> In fact, I've just tested removing that WARN_ON(!pipe->streaming_count);
>  - it doesn't even get that far - and segfaults in
> 
>  		media_entity_graph_walk_cleanup(graph);
> 
> [   80.916558] Unable to handle kernel NULL pointer dereference at
> virtual address 00000110
> ....
> [   81.769492] [<ffffff800853e278>] media_graph_walk_start+0x20/0xf0
> [   81.776615] [<ffffff800853e73c>] __media_pipeline_stop+0x3c/0xd8
> [   81.783644] [<ffffff800853e80c>] media_pipeline_stop+0x34/0x48
> [   81.790505] [<ffffff8008567ff0>] vsp1_du_setup_lif+0x68/0x5a8
> [   81.797279] [<ffffff80084de9d4>] rcar_du_vsp_disable+0x2c/0x38
> [   81.804137] [<ffffff80084da710>] rcar_du_crtc_stop+0x198/0x1e8
> [   81.810984] [<ffffff80084da780>] rcar_du_crtc_disable+0x20/0x38
> 
> due to the fact that 'graph' is dereferenced through the 'pipe'.
> 
> {
> 	/* Entity->pipe is NULL, therefore 'graph' is invalid */
> 	struct media_graph *graph = &entity->pipe->graph;
> 	struct media_pipeline *pipe = entity->pipe;
> 
> 	media_graph_walk_start(graph, entity);
> ...
> }
> 
> So I suspect we do still need a warning or check in there somewhere.
> Something to tell the developer that there is an unbalanced stop, would
> be much better than a panic/segfault...
> 
> (And of course, we still need to look at the actual unbalanced stop in
> vsp1_du_setup_lif!)

Indeed.

> 

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 2/4] v4l: vsp1: Refactor video pipeline configuration
  2016-12-14 16:30   ` Laurent Pinchart
@ 2016-12-15 11:50     ` Kieran Bingham
  2017-01-06 11:11         ` Kieran Bingham
  0 siblings, 1 reply; 18+ messages in thread
From: Kieran Bingham @ 2016-12-15 11:50 UTC (permalink / raw)
  To: Laurent Pinchart, Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

On 14/12/16 16:30, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Tuesday 13 Dec 2016 17:59:42 Kieran Bingham wrote:
>> With multiple inputs through the BRU it is feasible for the streams to
>> race each other at stream-on.
> 
> Could you please explain the race condition in the commit message ? The issue 
> is that multiple VIDIOC_STREAMON calls racing each other could have process 
> N-1 skipping over the pipeline setup section and then start the pipeline, if 
> videobuf2 has already enqueued buffers to the driver for process N but not 
> called the .start_streaming() operation yet.
> 
>> In the case of the video pipelines, this
>> can present two serious issues.
>>
>>  1) A null-dereference if the pipe->dl is committed at the same time as
>>     the vsp1_video_setup_pipeline() is processing
>>
>>  2) A hardware hang, where a display list is committed without having
>>     called vsp1_video_setup_pipeline() first
>>
>> Along side these race conditions, the work done by
>> vsp1_video_setup_pipeline() is undone by the re-initialisation during a
>> suspend resume cycle, and an active pipeline does not attempt to
>> reconfigure the correct routing and init parameters for the entities.
>>
>> To repair all of these issues, we can move the call to a conditional
>> inside vsp1_video_pipeline_run() and ensure that this can only be called
>> on the last stream which calls into vsp1_video_start_streaming()
>>
>> As a convenient side effect of this, by specifying that the
>> configuration has been lost during suspend/resume actions - the
>> vsp1_video_pipeline_run() call can re-initialise pipelines when
>> necessary thus repairing resume actions for active m2m pipelines.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>
>> ---
>> v3:
>>  - Move 'flag reset' to be inside the vsp1_reset_wpf() function call
>>  - Tidy up the wpf->pipe reference for the configured flag
>>
>>  drivers/media/platform/vsp1/vsp1_drv.c   |  4 ++++
>>  drivers/media/platform/vsp1/vsp1_pipe.c  |  1 +
>>  drivers/media/platform/vsp1/vsp1_pipe.h  |  2 ++
>>  drivers/media/platform/vsp1/vsp1_video.c | 20 +++++++++-----------
>>  4 files changed, 16 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/media/platform/vsp1/vsp1_drv.c
>> b/drivers/media/platform/vsp1/vsp1_drv.c index 57c713a4e1df..1dc3726c4e83
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_drv.c
>> +++ b/drivers/media/platform/vsp1/vsp1_drv.c
>> @@ -413,6 +413,7 @@ static int vsp1_create_entities(struct vsp1_device
>> *vsp1)
>>
>>  int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned int index)
>>  {
>> +	struct vsp1_rwpf *wpf = vsp1->wpf[index];
>>  	unsigned int timeout;
>>  	u32 status;
>>
>> @@ -429,6 +430,9 @@ int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned
>> int index) usleep_range(1000, 2000);
>>  	}
>>
>> +	if (wpf->pipe)
>> +		wpf->pipe->configured = false;
>> +
>>  	if (!timeout) {
>>  		dev_err(vsp1->dev, "failed to reset wpf.%u\n", index);
>>  		return -ETIMEDOUT;
>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c
>> b/drivers/media/platform/vsp1/vsp1_pipe.c index 756ca4ea7668..7ddf862ee403
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
>> @@ -208,6 +208,7 @@ void vsp1_pipeline_init(struct vsp1_pipeline *pipe)
>>
>>  	INIT_LIST_HEAD(&pipe->entities);
>>  	pipe->state = VSP1_PIPELINE_STOPPED;
>> +	pipe->configured = false;
>>  }
>>
>>  /* Must be called with the pipe irqlock held. */
>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h
>> b/drivers/media/platform/vsp1/vsp1_pipe.h index ac4ad2655551..0743b9fcb655
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
>> @@ -61,6 +61,7 @@ enum vsp1_pipeline_state {
>>   * @pipe: the media pipeline
>>   * @irqlock: protects the pipeline state
>>   * @state: current state
>> + * @configured: determines routing configuration active on cell.
> 
> I'm not sure to understand that. How about "true if the pipeline has been set 
> up" ? Or maybe "true if the pipeline has been set up for video streaming" as 
> it only applies to pipelines handled through the V4L2 API ?


Yes, Reading it now - I have no idea what context I was writing that in.
I hope it was late and I was tired ... otherwise I have no excuse :D



>>   * @wq: wait queue to wait for state change completion
>>   * @frame_end: frame end interrupt handler
>>   * @lock: protects the pipeline use count and stream count
>> @@ -86,6 +87,7 @@ struct vsp1_pipeline {
>>
>>  	spinlock_t irqlock;
>>  	enum vsp1_pipeline_state state;
>> +	bool configured;
>>  	wait_queue_head_t wq;
>>
>>  	void (*frame_end)(struct vsp1_pipeline *pipe);
>> diff --git a/drivers/media/platform/vsp1/vsp1_video.c
>> b/drivers/media/platform/vsp1/vsp1_video.c index 44b687c0b8df..7ff9f4c19ff0
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_video.c
>> +++ b/drivers/media/platform/vsp1/vsp1_video.c
>> @@ -388,6 +388,8 @@ static int vsp1_video_setup_pipeline(struct
>> vsp1_pipeline *pipe) VSP1_ENTITY_PARAMS_INIT);
>>  	}
>>
>> +	pipe->configured = true;
>> +
>>  	return 0;
>>  }
>>
>> @@ -411,6 +413,9 @@ static void vsp1_video_pipeline_run(struct vsp1_pipeline
>> *pipe) struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
>>  	struct vsp1_entity *entity;
>>
>> +	if (!pipe->configured)
>> +		vsp1_video_setup_pipeline(pipe);
>> +
> 
> I don't like this much. The vsp1_video_pipeline_run() is called with a 
> spinlock held. We should avoid operations as time consuming as 
> vsp1_video_setup_pipeline() here.


I'm going to argue my case here, as I thought this was a more elegant
solution (and I really seem to dislike that poorly protected global
pipe->dl);

However I will back down quickly/happily if needed. Especially knowing
that we are trying to reduce the recalculations of the DL, and I can see
where those changes will lead us anyway.

Regardless of the outcome, I felt it was useful (at least to me) to
measure the call times on these functions to get an understanding, thus
I have used ftrace to measure/monitor a full run of the vsp-test suite:

 trace-cmd record \
        -p function_graph \
        -l 'vsp1_video_setup_pipeline' \
        -l 'vsp1_irq_handler' \
        \
        ./vsp-tests.sh

* I had to mark vsp1_video_setup_pipeline() as noinline to be able to
measure it with ftrace

A) Call frequency

Whilst vsp1_video_pipeline_run() is called for every frame, it should
only be on the first frame of each stream, (or the first frame following
a resume operation) where the flag pipe->configured == false.

 - Measuring in the entirety of the vsp-test suite,  I count 121
   calls here.
   (trace-cmd report | grep vsp1_video_setup_pipeline | wc -l)


B) Function duration

The log of each of the function durations is available at:
   http://paste.ubuntu.com/23632986/

Some quick and dirty statistical analysis shows the following duration
times for this function.

trace-cmd report \
	| grep vsp1_video_setup_pipeline \
	| sed 's/.*funcgraph_entry:\W*\([0-9]*\.[0-9]*\) us.*/\1/' \
	| st

## Numbers in uS ...
N       min     max     sum     mean    stddev
121     4.68    17.76   887.885 7.33789 2.3195


So we do hit nearly 18 uS in this function, which could be considered a
lot for interrupt context, but when it is only once at the beginning of
stream start?

(Incidentally, running the same measurements on the whole of
vsp1_irq_handler nets the following statistical results:

trace-cmd report \
	| grep vsp1_irq_handler \
	| sed 's/.*funcgraph_entry:\W*\([0-9]*\.[0-9]*\) us.*/\1/' \
	| st

N       min     max     sum     mean    stddev
12360   1.32    165.481 125934  10.1889 15.0117

Of course the coverage on the IRQ handler is more broad, and I'm not
sure how to only measure the IRQ handler calls that result in calling
vsp1_video_setup_pipeline().

Anyway, like I said - possibly a moot point anyway - but I wanted to
actually measure the function times to see what we're up against.

>>  	if (!pipe->dl)
>>  		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>>
>> @@ -793,25 +798,18 @@ static int vsp1_video_start_streaming(struct vb2_queue
>> *vq, unsigned int count) struct vsp1_video *video = vb2_get_drv_priv(vq);
>>  	struct vsp1_pipeline *pipe = video->rwpf->pipe;
>>  	unsigned long flags;
>> -	int ret;
>>
>>  	mutex_lock(&pipe->lock);
>>  	if (pipe->stream_count == pipe->num_inputs) {
>> -		ret = vsp1_video_setup_pipeline(pipe);
>> -		if (ret < 0) {
>> -			mutex_unlock(&pipe->lock);
>> -			return ret;
>> -		}
>> +		spin_lock_irqsave(&pipe->irqlock, flags);
>> +		if (vsp1_pipeline_ready(pipe))
>> +			vsp1_video_pipeline_run(pipe);
>> +		spin_unlock_irqrestore(&pipe->irqlock, flags);
>>  	}
>>
>>  	pipe->stream_count++;
>>  	mutex_unlock(&pipe->lock);
>>
>> -	spin_lock_irqsave(&pipe->irqlock, flags);
>> -	if (vsp1_pipeline_ready(pipe))
>> -		vsp1_video_pipeline_run(pipe);
>> -	spin_unlock_irqrestore(&pipe->irqlock, flags);
>> -
> 
> How about the following ?
> 
> 	bool start_pipeline = false;
> 
>  	mutex_lock(&pipe->lock);
>  	if (pipe->stream_count == pipe->num_inputs) {
> 		ret = vsp1_video_setup_pipeline(pipe);
> 		if (ret < 0) {
> 			mutex_unlock(&pipe->lock);
> 			return ret;
> 		}
> 
> 		start_pipeline = true;
>  	}
> 
>  	pipe->stream_count++;
>  	mutex_unlock(&pipe->lock);
> 
> 	/*
> 	 * Don't attempt to start the pipeline if we haven't configured it
> 	 * explicitly, as otherwise multiple streamon calls could race each
> 	 * other and one of them try to start the pipeline unconfigured.
> 	 */
> 	if (!start_pipeline)
> 		return 0;
> 
> 	spin_lock_irqsave(&pipe->irqlock, flags);
> 	if (vsp1_pipeline_ready(pipe))
> 		vsp1_video_pipeline_run(pipe);
> 	spin_unlock_irqrestore(&pipe->irqlock, flags);

Ok, I'm happy with that, it's effectively v1 of my patchset, before I
discovered that the suspend resume issue was closely related.


> This won't fix the suspend/resume issue, but I think splitting that to a 
> separate patch would be a good idea anyway.

I'll have to look again to consider how this could be done separately.


> I'm also wondering whether we could have a streamon/suspend race. If the 
> pipeline is setup and the DL allocated but not committed at suspend time I 
> think we'll leak the DL at resume time.

If there is, then I believe I resolved this in :
 "v4l: vsp1: Use local display lists and remove global pipe->dl"

but yes, if we end up not being able to use that patch then I'll take a
look and see if there's anything more needs to be done here.


>>  	return 0;
>>  }
> 

-- 
Regards

Kieran Bingham

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 2/4] v4l: vsp1: Refactor video pipeline configuration
  2016-12-15 11:50     ` Kieran Bingham
@ 2017-01-06 11:11         ` Kieran Bingham
  0 siblings, 0 replies; 18+ messages in thread
From: Kieran Bingham @ 2017-01-06 11:11 UTC (permalink / raw)
  To: Laurent Pinchart, Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

I've been reworking this series to split things out and adapt for the
comments you've provided, but I have the following queries outstanding:

On 15/12/16 11:50, Kieran Bingham wrote:
> Hi Laurent,
> 
> On 14/12/16 16:30, Laurent Pinchart wrote:
>> Hi Kieran,
>>
>> Thank you for the patch.
>>
>> On Tuesday 13 Dec 2016 17:59:42 Kieran Bingham wrote:
>>> With multiple inputs through the BRU it is feasible for the streams to
>>> race each other at stream-on.
>>
>> Could you please explain the race condition in the commit message ? The issue 
>> is that multiple VIDIOC_STREAMON calls racing each other could have process 
>> N-1 skipping over the pipeline setup section and then start the pipeline, if 
>> videobuf2 has already enqueued buffers to the driver for process N but not 
>> called the .start_streaming() operation yet.
>>
>>> In the case of the video pipelines, this
>>> can present two serious issues.
>>>
>>>  1) A null-dereference if the pipe->dl is committed at the same time as
>>>     the vsp1_video_setup_pipeline() is processing
>>>
>>>  2) A hardware hang, where a display list is committed without having
>>>     called vsp1_video_setup_pipeline() first
>>>
>>> Along side these race conditions, the work done by
>>> vsp1_video_setup_pipeline() is undone by the re-initialisation during a
>>> suspend resume cycle, and an active pipeline does not attempt to
>>> reconfigure the correct routing and init parameters for the entities.
>>>
>>> To repair all of these issues, we can move the call to a conditional
>>> inside vsp1_video_pipeline_run() and ensure that this can only be called
>>> on the last stream which calls into vsp1_video_start_streaming()
>>>
>>> As a convenient side effect of this, by specifying that the
>>> configuration has been lost during suspend/resume actions - the
>>> vsp1_video_pipeline_run() call can re-initialise pipelines when
>>> necessary thus repairing resume actions for active m2m pipelines.
>>>
>>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>>
>>> ---
>>> v3:
>>>  - Move 'flag reset' to be inside the vsp1_reset_wpf() function call
>>>  - Tidy up the wpf->pipe reference for the configured flag
>>>
>>>  drivers/media/platform/vsp1/vsp1_drv.c   |  4 ++++
>>>  drivers/media/platform/vsp1/vsp1_pipe.c  |  1 +
>>>  drivers/media/platform/vsp1/vsp1_pipe.h  |  2 ++
>>>  drivers/media/platform/vsp1/vsp1_video.c | 20 +++++++++-----------
>>>  4 files changed, 16 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/media/platform/vsp1/vsp1_drv.c
>>> b/drivers/media/platform/vsp1/vsp1_drv.c index 57c713a4e1df..1dc3726c4e83
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_drv.c
>>> +++ b/drivers/media/platform/vsp1/vsp1_drv.c
>>> @@ -413,6 +413,7 @@ static int vsp1_create_entities(struct vsp1_device
>>> *vsp1)
>>>
>>>  int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned int index)
>>>  {
>>> +	struct vsp1_rwpf *wpf = vsp1->wpf[index];
>>>  	unsigned int timeout;
>>>  	u32 status;
>>>
>>> @@ -429,6 +430,9 @@ int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned
>>> int index) usleep_range(1000, 2000);
>>>  	}
>>>
>>> +	if (wpf->pipe)
>>> +		wpf->pipe->configured = false;
>>> +
>>>  	if (!timeout) {
>>>  		dev_err(vsp1->dev, "failed to reset wpf.%u\n", index);
>>>  		return -ETIMEDOUT;
>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c
>>> b/drivers/media/platform/vsp1/vsp1_pipe.c index 756ca4ea7668..7ddf862ee403
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
>>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
>>> @@ -208,6 +208,7 @@ void vsp1_pipeline_init(struct vsp1_pipeline *pipe)
>>>
>>>  	INIT_LIST_HEAD(&pipe->entities);
>>>  	pipe->state = VSP1_PIPELINE_STOPPED;
>>> +	pipe->configured = false;
>>>  }
>>>
>>>  /* Must be called with the pipe irqlock held. */
>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h
>>> b/drivers/media/platform/vsp1/vsp1_pipe.h index ac4ad2655551..0743b9fcb655
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
>>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
>>> @@ -61,6 +61,7 @@ enum vsp1_pipeline_state {
>>>   * @pipe: the media pipeline
>>>   * @irqlock: protects the pipeline state
>>>   * @state: current state
>>> + * @configured: determines routing configuration active on cell.
>>
>> I'm not sure to understand that. How about "true if the pipeline has been set 
>> up" ? Or maybe "true if the pipeline has been set up for video streaming" as 
>> it only applies to pipelines handled through the V4L2 API ?
> 
> 
> Yes, Reading it now - I have no idea what context I was writing that in.
> I hope it was late and I was tired ... otherwise I have no excuse :D
> 
> 
> 
>>>   * @wq: wait queue to wait for state change completion
>>>   * @frame_end: frame end interrupt handler
>>>   * @lock: protects the pipeline use count and stream count
>>> @@ -86,6 +87,7 @@ struct vsp1_pipeline {
>>>
>>>  	spinlock_t irqlock;
>>>  	enum vsp1_pipeline_state state;
>>> +	bool configured;
>>>  	wait_queue_head_t wq;
>>>
>>>  	void (*frame_end)(struct vsp1_pipeline *pipe);
>>> diff --git a/drivers/media/platform/vsp1/vsp1_video.c
>>> b/drivers/media/platform/vsp1/vsp1_video.c index 44b687c0b8df..7ff9f4c19ff0
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_video.c
>>> +++ b/drivers/media/platform/vsp1/vsp1_video.c
>>> @@ -388,6 +388,8 @@ static int vsp1_video_setup_pipeline(struct
>>> vsp1_pipeline *pipe) VSP1_ENTITY_PARAMS_INIT);
>>>  	}
>>>
>>> +	pipe->configured = true;
>>> +
>>>  	return 0;
>>>  }
>>>
>>> @@ -411,6 +413,9 @@ static void vsp1_video_pipeline_run(struct vsp1_pipeline
>>> *pipe) struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
>>>  	struct vsp1_entity *entity;
>>>
>>> +	if (!pipe->configured)
>>> +		vsp1_video_setup_pipeline(pipe);
>>> +
>>
>> I don't like this much. The vsp1_video_pipeline_run() is called with a 
>> spinlock held. We should avoid operations as time consuming as 
>> vsp1_video_setup_pipeline() here.


I'm wondering where you stand on this now that the measurements have
been taken.

Is 18uS, once per stream start (/resume) still against your desires here?

Reading the comments from, struct vsp1_pipeline

 * @irqlock: protects the pipeline state
 * @lock: protects the pipeline use count and stream count

Doesn't this infer that calling vsp1_video_setup_pipeline, is better run
with the @irqlock held rather than the @lock anyway?


> I'm going to argue my case here, as I thought this was a more elegant
> solution (and I really seem to dislike that poorly protected global
> pipe->dl);
> 
> However I will back down quickly/happily if needed. Especially knowing
> that we are trying to reduce the recalculations of the DL, and I can see
> where those changes will lead us anyway.
> 
> Regardless of the outcome, I felt it was useful (at least to me) to
> measure the call times on these functions to get an understanding, thus
> I have used ftrace to measure/monitor a full run of the vsp-test suite:
> 
>  trace-cmd record \
>         -p function_graph \
>         -l 'vsp1_video_setup_pipeline' \
>         -l 'vsp1_irq_handler' \
>         \
>         ./vsp-tests.sh
> 
> * I had to mark vsp1_video_setup_pipeline() as noinline to be able to
> measure it with ftrace
> 
> A) Call frequency
> 
> Whilst vsp1_video_pipeline_run() is called for every frame, it should
> only be on the first frame of each stream, (or the first frame following
> a resume operation) where the flag pipe->configured == false.
> 
>  - Measuring in the entirety of the vsp-test suite,  I count 121
>    calls here.
>    (trace-cmd report | grep vsp1_video_setup_pipeline | wc -l)
> 
> 
> B) Function duration
> 
> The log of each of the function durations is available at:
>    http://paste.ubuntu.com/23632986/
> 
> Some quick and dirty statistical analysis shows the following duration
> times for this function.
> 
> trace-cmd report \
> 	| grep vsp1_video_setup_pipeline \
> 	| sed 's/.*funcgraph_entry:\W*\([0-9]*\.[0-9]*\) us.*/\1/' \
> 	| st
> 
> ## Numbers in uS ...
> N       min     max     sum     mean    stddev
> 121     4.68    17.76   887.885 7.33789 2.3195
> 
> 
> So we do hit nearly 18 uS in this function, which could be considered a
> lot for interrupt context, but when it is only once at the beginning of
> stream start?
>
> (Incidentally, running the same measurements on the whole of
> vsp1_irq_handler nets the following statistical results:
> 
> trace-cmd report \
> 	| grep vsp1_irq_handler \
> 	| sed 's/.*funcgraph_entry:\W*\([0-9]*\.[0-9]*\) us.*/\1/' \
> 	| st
> 
> N       min     max     sum     mean    stddev
> 12360   1.32    165.481 125934  10.1889 15.0117
> 
> Of course the coverage on the IRQ handler is more broad, and I'm not
> sure how to only measure the IRQ handler calls that result in calling
> vsp1_video_setup_pipeline().
> 

Those interrupt context comparisons are irrelevant now that you've
cleared up the reference was due to the spinlock being held, rather than
the calls being in interrupt context :)


> Anyway, like I said - possibly a moot point anyway - but I wanted to
> actually measure the function times to see what we're up against.
> 
>>>  	if (!pipe->dl)
>>>  		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>>>
>>> @@ -793,25 +798,18 @@ static int vsp1_video_start_streaming(struct vb2_queue
>>> *vq, unsigned int count) struct vsp1_video *video = vb2_get_drv_priv(vq);
>>>  	struct vsp1_pipeline *pipe = video->rwpf->pipe;
>>>  	unsigned long flags;
>>> -	int ret;
>>>
>>>  	mutex_lock(&pipe->lock);
>>>  	if (pipe->stream_count == pipe->num_inputs) {
>>> -		ret = vsp1_video_setup_pipeline(pipe);
>>> -		if (ret < 0) {
>>> -			mutex_unlock(&pipe->lock);
>>> -			return ret;
>>> -		}
>>> +		spin_lock_irqsave(&pipe->irqlock, flags);
>>> +		if (vsp1_pipeline_ready(pipe))
>>> +			vsp1_video_pipeline_run(pipe);
>>> +		spin_unlock_irqrestore(&pipe->irqlock, flags);
>>>  	}
>>>
>>>  	pipe->stream_count++;
>>>  	mutex_unlock(&pipe->lock);
>>>
>>> -	spin_lock_irqsave(&pipe->irqlock, flags);
>>> -	if (vsp1_pipeline_ready(pipe))
>>> -		vsp1_video_pipeline_run(pipe);
>>> -	spin_unlock_irqrestore(&pipe->irqlock, flags);
>>> -
>>
>> How about the following ?
>>
>> 	bool start_pipeline = false;
>>
>>  	mutex_lock(&pipe->lock);
>>  	if (pipe->stream_count == pipe->num_inputs) {
>> 		ret = vsp1_video_setup_pipeline(pipe);
>> 		if (ret < 0) {
>> 			mutex_unlock(&pipe->lock);
>> 			return ret;
>> 		}
>>
>> 		start_pipeline = true;
>>  	}
>>
>>  	pipe->stream_count++;
>>  	mutex_unlock(&pipe->lock);
>>
>> 	/*
>> 	 * Don't attempt to start the pipeline if we haven't configured it
>> 	 * explicitly, as otherwise multiple streamon calls could race each
>> 	 * other and one of them try to start the pipeline unconfigured.
>> 	 */
>> 	if (!start_pipeline)
>> 		return 0;
>>
>> 	spin_lock_irqsave(&pipe->irqlock, flags);
>> 	if (vsp1_pipeline_ready(pipe))
>> 		vsp1_video_pipeline_run(pipe);
>> 	spin_unlock_irqrestore(&pipe->irqlock, flags);
> 
> Ok, I'm happy with that, it's effectively v1 of my patchset, before I
> discovered that the suspend resume issue was closely related.
> 
> 
>> This won't fix the suspend/resume issue, but I think splitting that to a 
>> separate patch would be a good idea anyway.
> 

Reverting back to the 'v1' style, and then moving the call to setup the
pipeline, into the pipeline_run() against a configured flag to fix for
suspend/resume cycles, netts the following code :

	mutex_lock(&pipe->lock);
	if (pipe->stream_count == pipe->num_inputs)
		start_pipeline = true;

	pipe->stream_count++;
	mutex_unlock(&pipe->lock);

	/*
	 * vsp1_pipeline_ready() is not sufficient to establish that all streams
	 * are prepared and the pipeline is configured, as multiple streams
	 * can race through streamon with buffers already queued; Therefore we
	 * don't even attempt to start the pipeline until the last stream has
	 * called through here.
	 */
	if (!start_pipeline)
		return 0;

	spin_lock_irqsave(&pipe->irqlock, flags);
	if (vsp1_pipeline_ready(pipe))
		vsp1_video_pipeline_run(pipe);
	spin_unlock_irqrestore(&pipe->irqlock, flags);



> I'll have to look again to consider how this could be done separately.
> 
> 
>> I'm also wondering whether we could have a streamon/suspend race. If the 
>> pipeline is setup and the DL allocated but not committed at suspend time I 
>> think we'll leak the DL at resume time.
> 
> If there is, then I believe I resolved this in :
>  "v4l: vsp1: Use local display lists and remove global pipe->dl"
> 
> but yes, if we end up not being able to use that patch then I'll take a
> look and see if there's anything more needs to be done here.
> 
>>>  	return 0;
>>>  }
>>
> 

-- 
Regards

Kieran Bingham

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv3 2/4] v4l: vsp1: Refactor video pipeline configuration
@ 2017-01-06 11:11         ` Kieran Bingham
  0 siblings, 0 replies; 18+ messages in thread
From: Kieran Bingham @ 2017-01-06 11:11 UTC (permalink / raw)
  To: Laurent Pinchart, Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

I've been reworking this series to split things out and adapt for the
comments you've provided, but I have the following queries outstanding:

On 15/12/16 11:50, Kieran Bingham wrote:
> Hi Laurent,
> 
> On 14/12/16 16:30, Laurent Pinchart wrote:
>> Hi Kieran,
>>
>> Thank you for the patch.
>>
>> On Tuesday 13 Dec 2016 17:59:42 Kieran Bingham wrote:
>>> With multiple inputs through the BRU it is feasible for the streams to
>>> race each other at stream-on.
>>
>> Could you please explain the race condition in the commit message ? The issue 
>> is that multiple VIDIOC_STREAMON calls racing each other could have process 
>> N-1 skipping over the pipeline setup section and then start the pipeline, if 
>> videobuf2 has already enqueued buffers to the driver for process N but not 
>> called the .start_streaming() operation yet.
>>
>>> In the case of the video pipelines, this
>>> can present two serious issues.
>>>
>>>  1) A null-dereference if the pipe->dl is committed at the same time as
>>>     the vsp1_video_setup_pipeline() is processing
>>>
>>>  2) A hardware hang, where a display list is committed without having
>>>     called vsp1_video_setup_pipeline() first
>>>
>>> Along side these race conditions, the work done by
>>> vsp1_video_setup_pipeline() is undone by the re-initialisation during a
>>> suspend resume cycle, and an active pipeline does not attempt to
>>> reconfigure the correct routing and init parameters for the entities.
>>>
>>> To repair all of these issues, we can move the call to a conditional
>>> inside vsp1_video_pipeline_run() and ensure that this can only be called
>>> on the last stream which calls into vsp1_video_start_streaming()
>>>
>>> As a convenient side effect of this, by specifying that the
>>> configuration has been lost during suspend/resume actions - the
>>> vsp1_video_pipeline_run() call can re-initialise pipelines when
>>> necessary thus repairing resume actions for active m2m pipelines.
>>>
>>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>>
>>> ---
>>> v3:
>>>  - Move 'flag reset' to be inside the vsp1_reset_wpf() function call
>>>  - Tidy up the wpf->pipe reference for the configured flag
>>>
>>>  drivers/media/platform/vsp1/vsp1_drv.c   |  4 ++++
>>>  drivers/media/platform/vsp1/vsp1_pipe.c  |  1 +
>>>  drivers/media/platform/vsp1/vsp1_pipe.h  |  2 ++
>>>  drivers/media/platform/vsp1/vsp1_video.c | 20 +++++++++-----------
>>>  4 files changed, 16 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/media/platform/vsp1/vsp1_drv.c
>>> b/drivers/media/platform/vsp1/vsp1_drv.c index 57c713a4e1df..1dc3726c4e83
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_drv.c
>>> +++ b/drivers/media/platform/vsp1/vsp1_drv.c
>>> @@ -413,6 +413,7 @@ static int vsp1_create_entities(struct vsp1_device
>>> *vsp1)
>>>
>>>  int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned int index)
>>>  {
>>> +	struct vsp1_rwpf *wpf = vsp1->wpf[index];
>>>  	unsigned int timeout;
>>>  	u32 status;
>>>
>>> @@ -429,6 +430,9 @@ int vsp1_reset_wpf(struct vsp1_device *vsp1, unsigned
>>> int index) usleep_range(1000, 2000);
>>>  	}
>>>
>>> +	if (wpf->pipe)
>>> +		wpf->pipe->configured = false;
>>> +
>>>  	if (!timeout) {
>>>  		dev_err(vsp1->dev, "failed to reset wpf.%u\n", index);
>>>  		return -ETIMEDOUT;
>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c
>>> b/drivers/media/platform/vsp1/vsp1_pipe.c index 756ca4ea7668..7ddf862ee403
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
>>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
>>> @@ -208,6 +208,7 @@ void vsp1_pipeline_init(struct vsp1_pipeline *pipe)
>>>
>>>  	INIT_LIST_HEAD(&pipe->entities);
>>>  	pipe->state = VSP1_PIPELINE_STOPPED;
>>> +	pipe->configured = false;
>>>  }
>>>
>>>  /* Must be called with the pipe irqlock held. */
>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h
>>> b/drivers/media/platform/vsp1/vsp1_pipe.h index ac4ad2655551..0743b9fcb655
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
>>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
>>> @@ -61,6 +61,7 @@ enum vsp1_pipeline_state {
>>>   * @pipe: the media pipeline
>>>   * @irqlock: protects the pipeline state
>>>   * @state: current state
>>> + * @configured: determines routing configuration active on cell.
>>
>> I'm not sure to understand that. How about "true if the pipeline has been set 
>> up" ? Or maybe "true if the pipeline has been set up for video streaming" as 
>> it only applies to pipelines handled through the V4L2 API ?
> 
> 
> Yes, Reading it now - I have no idea what context I was writing that in.
> I hope it was late and I was tired ... otherwise I have no excuse :D
> 
> 
> 
>>>   * @wq: wait queue to wait for state change completion
>>>   * @frame_end: frame end interrupt handler
>>>   * @lock: protects the pipeline use count and stream count
>>> @@ -86,6 +87,7 @@ struct vsp1_pipeline {
>>>
>>>  	spinlock_t irqlock;
>>>  	enum vsp1_pipeline_state state;
>>> +	bool configured;
>>>  	wait_queue_head_t wq;
>>>
>>>  	void (*frame_end)(struct vsp1_pipeline *pipe);
>>> diff --git a/drivers/media/platform/vsp1/vsp1_video.c
>>> b/drivers/media/platform/vsp1/vsp1_video.c index 44b687c0b8df..7ff9f4c19ff0
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_video.c
>>> +++ b/drivers/media/platform/vsp1/vsp1_video.c
>>> @@ -388,6 +388,8 @@ static int vsp1_video_setup_pipeline(struct
>>> vsp1_pipeline *pipe) VSP1_ENTITY_PARAMS_INIT);
>>>  	}
>>>
>>> +	pipe->configured = true;
>>> +
>>>  	return 0;
>>>  }
>>>
>>> @@ -411,6 +413,9 @@ static void vsp1_video_pipeline_run(struct vsp1_pipeline
>>> *pipe) struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
>>>  	struct vsp1_entity *entity;
>>>
>>> +	if (!pipe->configured)
>>> +		vsp1_video_setup_pipeline(pipe);
>>> +
>>
>> I don't like this much. The vsp1_video_pipeline_run() is called with a 
>> spinlock held. We should avoid operations as time consuming as 
>> vsp1_video_setup_pipeline() here.


I'm wondering where you stand on this now that the measurements have
been taken.

Is 18uS, once per stream start (/resume) still against your desires here?

Reading the comments from, struct vsp1_pipeline

 * @irqlock: protects the pipeline state
 * @lock: protects the pipeline use count and stream count

Doesn't this infer that calling vsp1_video_setup_pipeline, is better run
with the @irqlock held rather than the @lock anyway?


> I'm going to argue my case here, as I thought this was a more elegant
> solution (and I really seem to dislike that poorly protected global
> pipe->dl);
> 
> However I will back down quickly/happily if needed. Especially knowing
> that we are trying to reduce the recalculations of the DL, and I can see
> where those changes will lead us anyway.
> 
> Regardless of the outcome, I felt it was useful (at least to me) to
> measure the call times on these functions to get an understanding, thus
> I have used ftrace to measure/monitor a full run of the vsp-test suite:
> 
>  trace-cmd record \
>         -p function_graph \
>         -l 'vsp1_video_setup_pipeline' \
>         -l 'vsp1_irq_handler' \
>         \
>         ./vsp-tests.sh
> 
> * I had to mark vsp1_video_setup_pipeline() as noinline to be able to
> measure it with ftrace
> 
> A) Call frequency
> 
> Whilst vsp1_video_pipeline_run() is called for every frame, it should
> only be on the first frame of each stream, (or the first frame following
> a resume operation) where the flag pipe->configured == false.
> 
>  - Measuring in the entirety of the vsp-test suite,  I count 121
>    calls here.
>    (trace-cmd report | grep vsp1_video_setup_pipeline | wc -l)
> 
> 
> B) Function duration
> 
> The log of each of the function durations is available at:
>    http://paste.ubuntu.com/23632986/
> 
> Some quick and dirty statistical analysis shows the following duration
> times for this function.
> 
> trace-cmd report \
> 	| grep vsp1_video_setup_pipeline \
> 	| sed 's/.*funcgraph_entry:\W*\([0-9]*\.[0-9]*\) us.*/\1/' \
> 	| st
> 
> ## Numbers in uS ...
> N       min     max     sum     mean    stddev
> 121     4.68    17.76   887.885 7.33789 2.3195
> 
> 
> So we do hit nearly 18 uS in this function, which could be considered a
> lot for interrupt context, but when it is only once at the beginning of
> stream start?
>
> (Incidentally, running the same measurements on the whole of
> vsp1_irq_handler nets the following statistical results:
> 
> trace-cmd report \
> 	| grep vsp1_irq_handler \
> 	| sed 's/.*funcgraph_entry:\W*\([0-9]*\.[0-9]*\) us.*/\1/' \
> 	| st
> 
> N       min     max     sum     mean    stddev
> 12360   1.32    165.481 125934  10.1889 15.0117
> 
> Of course the coverage on the IRQ handler is more broad, and I'm not
> sure how to only measure the IRQ handler calls that result in calling
> vsp1_video_setup_pipeline().
> 

Those interrupt context comparisons are irrelevant now that you've
cleared up the reference was due to the spinlock being held, rather than
the calls being in interrupt context :)


> Anyway, like I said - possibly a moot point anyway - but I wanted to
> actually measure the function times to see what we're up against.
> 
>>>  	if (!pipe->dl)
>>>  		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>>>
>>> @@ -793,25 +798,18 @@ static int vsp1_video_start_streaming(struct vb2_queue
>>> *vq, unsigned int count) struct vsp1_video *video = vb2_get_drv_priv(vq);
>>>  	struct vsp1_pipeline *pipe = video->rwpf->pipe;
>>>  	unsigned long flags;
>>> -	int ret;
>>>
>>>  	mutex_lock(&pipe->lock);
>>>  	if (pipe->stream_count == pipe->num_inputs) {
>>> -		ret = vsp1_video_setup_pipeline(pipe);
>>> -		if (ret < 0) {
>>> -			mutex_unlock(&pipe->lock);
>>> -			return ret;
>>> -		}
>>> +		spin_lock_irqsave(&pipe->irqlock, flags);
>>> +		if (vsp1_pipeline_ready(pipe))
>>> +			vsp1_video_pipeline_run(pipe);
>>> +		spin_unlock_irqrestore(&pipe->irqlock, flags);
>>>  	}
>>>
>>>  	pipe->stream_count++;
>>>  	mutex_unlock(&pipe->lock);
>>>
>>> -	spin_lock_irqsave(&pipe->irqlock, flags);
>>> -	if (vsp1_pipeline_ready(pipe))
>>> -		vsp1_video_pipeline_run(pipe);
>>> -	spin_unlock_irqrestore(&pipe->irqlock, flags);
>>> -
>>
>> How about the following ?
>>
>> 	bool start_pipeline = false;
>>
>>  	mutex_lock(&pipe->lock);
>>  	if (pipe->stream_count == pipe->num_inputs) {
>> 		ret = vsp1_video_setup_pipeline(pipe);
>> 		if (ret < 0) {
>> 			mutex_unlock(&pipe->lock);
>> 			return ret;
>> 		}
>>
>> 		start_pipeline = true;
>>  	}
>>
>>  	pipe->stream_count++;
>>  	mutex_unlock(&pipe->lock);
>>
>> 	/*
>> 	 * Don't attempt to start the pipeline if we haven't configured it
>> 	 * explicitly, as otherwise multiple streamon calls could race each
>> 	 * other and one of them try to start the pipeline unconfigured.
>> 	 */
>> 	if (!start_pipeline)
>> 		return 0;
>>
>> 	spin_lock_irqsave(&pipe->irqlock, flags);
>> 	if (vsp1_pipeline_ready(pipe))
>> 		vsp1_video_pipeline_run(pipe);
>> 	spin_unlock_irqrestore(&pipe->irqlock, flags);
> 
> Ok, I'm happy with that, it's effectively v1 of my patchset, before I
> discovered that the suspend resume issue was closely related.
> 
> 
>> This won't fix the suspend/resume issue, but I think splitting that to a 
>> separate patch would be a good idea anyway.
> 

Reverting back to the 'v1' style, and then moving the call to setup the
pipeline, into the pipeline_run() against a configured flag to fix for
suspend/resume cycles, netts the following code :

	mutex_lock(&pipe->lock);
	if (pipe->stream_count == pipe->num_inputs)
		start_pipeline = true;

	pipe->stream_count++;
	mutex_unlock(&pipe->lock);

	/*
	 * vsp1_pipeline_ready() is not sufficient to establish that all streams
	 * are prepared and the pipeline is configured, as multiple streams
	 * can race through streamon with buffers already queued; Therefore we
	 * don't even attempt to start the pipeline until the last stream has
	 * called through here.
	 */
	if (!start_pipeline)
		return 0;

	spin_lock_irqsave(&pipe->irqlock, flags);
	if (vsp1_pipeline_ready(pipe))
		vsp1_video_pipeline_run(pipe);
	spin_unlock_irqrestore(&pipe->irqlock, flags);



> I'll have to look again to consider how this could be done separately.
> 
> 
>> I'm also wondering whether we could have a streamon/suspend race. If the 
>> pipeline is setup and the DL allocated but not committed at suspend time I 
>> think we'll leak the DL at resume time.
> 
> If there is, then I believe I resolved this in :
>  "v4l: vsp1: Use local display lists and remove global pipe->dl"
> 
> but yes, if we end up not being able to use that patch then I'll take a
> look and see if there's anything more needs to be done here.
> 
>>>  	return 0;
>>>  }
>>
> 

-- 
Regards

Kieran Bingham

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-01-06 12:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-13 17:59 [PATCHv3 0/4] v4l: vsp1: Fix suspend/resume and race on M2M pipelines Kieran Bingham
2016-12-13 17:59 ` [PATCHv3 1/4] v4l: vsp1: Move vsp1_video_setup_pipeline() Kieran Bingham
2016-12-14 20:21   ` Laurent Pinchart
2016-12-13 17:59 ` [PATCHv3 2/4] v4l: vsp1: Refactor video pipeline configuration Kieran Bingham
2016-12-14 16:30   ` Laurent Pinchart
2016-12-15 11:50     ` Kieran Bingham
2017-01-06 11:11       ` Kieran Bingham
2017-01-06 11:11         ` Kieran Bingham
2016-12-13 17:59 ` [PATCHv3 3/4] v4l: vsp1: Use local display lists and remove global pipe->dl Kieran Bingham
2016-12-13 17:59 ` [PATCHv3 RFC 4/4] media: Catch null pipes on pipeline stop Kieran Bingham
2016-12-14  7:28   ` Sakari Ailus
2016-12-14 12:27     ` Kieran Bingham
2016-12-14 12:27       ` Kieran Bingham
2016-12-14 12:43       ` Sakari Ailus
2016-12-14 17:53         ` Kieran Bingham
2016-12-15  7:14           ` Sakari Ailus
2016-12-14 19:56   ` Kieran Bingham
2016-12-14 21:16     ` Laurent Pinchart

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.