* [PATCH v2 0/8] vsp1: TLB optimisation and DL caching
@ 2017-08-14 15:13 Kieran Bingham
  2017-08-14 15:13 ` [PATCH v2 1/8] v4l: vsp1: Protect fragments against overflow Kieran Bingham
                   ` (7 more replies)
  0 siblings, 8 replies; 32+ messages in thread
From: Kieran Bingham @ 2017-08-14 15:13 UTC (permalink / raw)
  To: laurent.pinchart, linux-renesas-soc, linux-media; +Cc: Kieran Bingham

Each display list currently allocates an area of DMA memory to store register
settings for the VSP1 to process. Each of these allocations adds pressure to
the IPMMU TLB entries.

We can reduce the pressure by pre-allocating larger areas and dividing the area
across multiple bodies represented as a pool.

With this reconfiguration of bodies, we can adapt the configuration code to
separate out constant hardware configuration and cache it for re-use.

Patch 1 adds protection to ensure that the display list body does not overflow
and will allow us to reduce the size of the body allocations in the future (it
has already helped me catch an overflow during the development of this series,
so I thought it was a worthwhile addition).

Patch 2 implements the fragment pool object and provides function helpers to
interact with the pool.

Patch 3 converts the existing allocations to use the new fragment pool.

From patch 4 to 7, we then refactor the display list handling code to separate
out the two stages of stream setup and frame configuration and then configure
directly into display list bodies. This allows us to cache the constant stream
configuration in a reusable display list body which also repairs suspend/resume
cycles for the video pipelines.

Finally in patch 8, the size of the internal display list body is reduced down
to 64 entries, as the maximum used is now 41 slots. The cached video pipeline
stream configuration appears to use a maximum of 64 entries, but to allow for
expansion this is set to 128 for now to prevent unexpected overflows.

Kieran Bingham (8):
  v4l: vsp1: Protect fragments against overflow
  v4l: vsp1: Provide a fragment pool
  v4l: vsp1: Convert display lists to use new fragment pool
  v4l: vsp1: Use reference counting for fragments
  v4l: vsp1: Refactor display list configure operations
  v4l: vsp1: Adapt entities to configure into a body
  v4l: vsp1: Move video configuration to a cached dlb
  v4l: vsp1: Reduce display list body size

 drivers/media/platform/vsp1/vsp1_bru.c    |  32 +--
 drivers/media/platform/vsp1/vsp1_clu.c    |  86 +++---
 drivers/media/platform/vsp1/vsp1_clu.h    |   1 +-
 drivers/media/platform/vsp1/vsp1_dl.c     | 331 ++++++++++++-----------
 drivers/media/platform/vsp1/vsp1_dl.h     |  13 +-
 drivers/media/platform/vsp1/vsp1_drm.c    |  21 +-
 drivers/media/platform/vsp1/vsp1_entity.c |  23 +-
 drivers/media/platform/vsp1/vsp1_entity.h |  31 +--
 drivers/media/platform/vsp1/vsp1_hgo.c    |  26 +--
 drivers/media/platform/vsp1/vsp1_hgt.c    |  28 +--
 drivers/media/platform/vsp1/vsp1_hsit.c   |  20 +-
 drivers/media/platform/vsp1/vsp1_lif.c    |  23 +--
 drivers/media/platform/vsp1/vsp1_lut.c    |  65 +++--
 drivers/media/platform/vsp1/vsp1_lut.h    |   1 +-
 drivers/media/platform/vsp1/vsp1_pipe.c   |   8 +-
 drivers/media/platform/vsp1/vsp1_pipe.h   |   7 +-
 drivers/media/platform/vsp1/vsp1_rpf.c    | 179 ++++++------
 drivers/media/platform/vsp1/vsp1_sru.c    |  24 +--
 drivers/media/platform/vsp1/vsp1_uds.c    |  73 ++---
 drivers/media/platform/vsp1/vsp1_uds.h    |   2 +-
 drivers/media/platform/vsp1/vsp1_video.c  |  82 +++---
 drivers/media/platform/vsp1/vsp1_video.h  |   2 +-
 drivers/media/platform/vsp1/vsp1_wpf.c    | 325 ++++++++++++-----------
 23 files changed, 753 insertions(+), 650 deletions(-)

base-commit: f44bd631453bf7dcbe57f79b924db3a6dd038bff
-- 
git-series 0.9.1


* [PATCH v2 1/8] v4l: vsp1: Protect fragments against overflow
  2017-08-14 15:13 [PATCH v2 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
@ 2017-08-14 15:13 ` Kieran Bingham
  2017-08-16 21:53   ` Laurent Pinchart
  2017-08-14 15:13 ` [PATCH v2 2/8] v4l: vsp1: Provide a fragment pool Kieran Bingham
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-08-14 15:13 UTC (permalink / raw)
  To: laurent.pinchart, linux-renesas-soc, linux-media; +Cc: Kieran Bingham

The fragment write function relies on the code never asking it to
write more than the entries available in the list.

Currently, with each list body containing 256 entries, this is fine,
but we can greatly reduce this number to save memory.

In preparation for this, add a level of protection to catch any
buffer overflows.
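
As a rough illustration (not part of the patch), the bounded write can be
modelled in plain userspace C. The `model_*` names, the tiny capacity, and the
`dropped` counter are assumptions made for this sketch; the driver itself uses
WARN_ONCE() rather than a counter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_MAX_ENTRIES 4	/* hypothetical, deliberately tiny capacity */

/* One register write: an address/value pair, as in a display list body. */
struct model_entry {
	uint32_t addr;
	uint32_t data;
};

struct model_body {
	struct model_entry entries[MODEL_MAX_ENTRIES];
	unsigned int num_entries;	/* entries stored so far */
	unsigned int max_entries;	/* entries available */
	unsigned int dropped;		/* writes refused (sketch only) */
};

static void model_body_init(struct model_body *dlb)
{
	assert(dlb != NULL);
	dlb->num_entries = 0;
	dlb->max_entries = MODEL_MAX_ENTRIES;
	dlb->dropped = 0;
}

/* Refuse the write instead of running past the end of the array. */
static void model_body_write(struct model_body *dlb, uint32_t reg,
			     uint32_t data)
{
	if (dlb->num_entries >= dlb->max_entries) {
		/* Report only the first overflow, similar in spirit to WARN_ONCE(). */
		if (dlb->dropped++ == 0)
			fprintf(stderr, "body size exceeded (max %u)\n",
				dlb->max_entries);
		return;
	}

	dlb->entries[dlb->num_entries].addr = reg;
	dlb->entries[dlb->num_entries].data = data;
	dlb->num_entries++;
}
```

Writing six entries into this four-entry body stores the first four and
silently discards the remaining two after a single warning, which is the
behaviour the check is meant to guarantee.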

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
---
 drivers/media/platform/vsp1/vsp1_dl.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index 8b5cbb6b7a70..cb4625ae13c2 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -50,6 +50,7 @@ struct vsp1_dl_entry {
  * @dma: DMA address of the entries
  * @size: size of the DMA memory in bytes
  * @num_entries: number of stored entries
+ * @max_entries: number of entries available
  */
 struct vsp1_dl_body {
 	struct list_head list;
@@ -60,6 +61,7 @@ struct vsp1_dl_body {
 	size_t size;
 
 	unsigned int num_entries;
+	unsigned int max_entries;
 };
 
 /**
@@ -138,6 +140,7 @@ static int vsp1_dl_body_init(struct vsp1_device *vsp1,
 
 	dlb->vsp1 = vsp1;
 	dlb->size = size;
+	dlb->max_entries = num_entries;
 
 	dlb->entries = dma_alloc_wc(vsp1->bus_master, dlb->size, &dlb->dma,
 				    GFP_KERNEL);
@@ -220,6 +223,11 @@ void vsp1_dl_fragment_free(struct vsp1_dl_body *dlb)
  */
 void vsp1_dl_fragment_write(struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
+	if (unlikely(dlb->num_entries >= dlb->max_entries)) {
+		WARN_ONCE(true, "DLB size exceeded (max %u)", dlb->max_entries);
+		return;
+	}
+
 	dlb->entries[dlb->num_entries].addr = reg;
 	dlb->entries[dlb->num_entries].data = data;
 	dlb->num_entries++;
-- 
git-series 0.9.1


* [PATCH v2 2/8] v4l: vsp1: Provide a fragment pool
  2017-08-14 15:13 [PATCH v2 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
  2017-08-14 15:13 ` [PATCH v2 1/8] v4l: vsp1: Protect fragments against overflow Kieran Bingham
@ 2017-08-14 15:13 ` Kieran Bingham
  2017-08-17 12:13   ` Laurent Pinchart
  2017-08-14 15:13 ` [PATCH v2 3/8] v4l: vsp1: Convert display lists to use new " Kieran Bingham
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-08-14 15:13 UTC (permalink / raw)
  To: laurent.pinchart, linux-renesas-soc, linux-media; +Cc: Kieran Bingham

Each display list allocates a body to store register values in a
DMA-accessible buffer from a dma_alloc_wc() allocation. Each of these
results in an entry in the TLB, and a large number of display list
allocations adds pressure to this resource.

Reduce TLB pressure on the IPMMUs by allocating multiple display list
bodies in a single allocation, and providing these to the display list
through a 'fragment pool'. A pool can be allocated by the display list
manager or entities which require their own body allocations.
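
The carving of one allocation into several bodies can be sketched in userspace
C as below. This is only a model under stated assumptions: malloc() stands in
for dma_alloc_wc(), the free list is a simple LIFO stack rather than the
list_head FIFO used in the patch, there is no locking, and all `model_*` names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct model_entry {
	uint32_t addr;
	uint32_t data;
};

struct model_pool;

struct model_body {
	struct model_body *next_free;	/* link in the pool free stack */
	struct model_pool *pool;	/* pool this body belongs to */
	struct model_entry *entries;	/* slice of the pool memory */
	unsigned int num_entries;
	unsigned int max_entries;
};

struct model_pool {
	void *mem;			/* the single backing allocation */
	size_t size;
	struct model_body *bodies;
	struct model_body *free;
};

static struct model_pool *model_pool_alloc(unsigned int qty,
					   unsigned int num_entries,
					   size_t extra_size)
{
	size_t body_size = num_entries * sizeof(struct model_entry) + extra_size;
	struct model_pool *pool = calloc(1, sizeof(*pool));
	unsigned int i;

	if (!pool)
		return NULL;

	pool->size = body_size * qty;
	pool->bodies = calloc(qty, sizeof(*pool->bodies));
	pool->mem = malloc(pool->size);
	if (!pool->bodies || !pool->mem) {
		free(pool->mem);
		free(pool->bodies);
		free(pool);
		return NULL;
	}

	/* Push in reverse order so that body 0 is handed out first. */
	for (i = qty; i-- > 0; ) {
		struct model_body *dlb = &pool->bodies[i];

		dlb->pool = pool;
		dlb->max_entries = num_entries;
		dlb->entries = (struct model_entry *)
			       ((char *)pool->mem + i * body_size);
		dlb->next_free = pool->free;
		pool->free = dlb;
	}

	return pool;
}

static struct model_body *model_body_get(struct model_pool *pool)
{
	struct model_body *dlb = pool->free;

	if (dlb)
		pool->free = dlb->next_free;
	return dlb;
}

static void model_body_put(struct model_body *dlb)
{
	if (!dlb)
		return;

	assert(dlb->pool != NULL);
	dlb->num_entries = 0;		/* body is empty when reused */
	dlb->next_free = dlb->pool->free;
	dlb->pool->free = dlb;
}

static void model_pool_free(struct model_pool *pool)
{
	if (!pool)
		return;

	free(pool->mem);
	free(pool->bodies);
	free(pool);
}
```

However many bodies are requested, the model performs exactly one backing
allocation, which is the property that reduces the number of TLB entries in
the real driver. The sketch assumes `extra_size` keeps each slice suitably
aligned for `struct model_entry`.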

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

---
v2:
 - assign dlb->dma correctly
---
 drivers/media/platform/vsp1/vsp1_dl.c | 129 +++++++++++++++++++++++++++-
 drivers/media/platform/vsp1/vsp1_dl.h |   8 ++-
 2 files changed, 137 insertions(+)

diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index cb4625ae13c2..aab9dd6ec0eb 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -45,6 +45,8 @@ struct vsp1_dl_entry {
 /**
  * struct vsp1_dl_body - Display list body
  * @list: entry in the display list list of bodies
+ * @free: entry in the pool free body list
+ * @pool: pool to which this body belongs
  * @vsp1: the VSP1 device
  * @entries: array of entries
  * @dma: DMA address of the entries
@@ -54,6 +56,9 @@ struct vsp1_dl_entry {
  */
 struct vsp1_dl_body {
 	struct list_head list;
+	struct list_head free;
+
+	struct vsp1_dl_fragment_pool *pool;
 	struct vsp1_device *vsp1;
 
 	struct vsp1_dl_entry *entries;
@@ -65,6 +70,30 @@ struct vsp1_dl_body {
 };
 
 /**
+ * struct vsp1_dl_fragment_pool - display list body/fragment pool
+ * @dma: DMA address of the entries
+ * @size: size of the full DMA memory pool in bytes
+ * @mem: CPU memory pointer for the pool
+ * @bodies: Array of DLB structures for the pool
+ * @free: List of free DLB entries
+ * @lock: Protects the pool and free list
+ * @vsp1: the VSP1 device
+ */
+struct vsp1_dl_fragment_pool {
+	/* DMA allocation */
+	dma_addr_t dma;
+	size_t size;
+	void *mem;
+
+	/* Body management */
+	struct vsp1_dl_body *bodies;
+	struct list_head free;
+	spinlock_t lock;
+
+	struct vsp1_device *vsp1;
+};
+
+/**
  * struct vsp1_dl_list - Display list
  * @list: entry in the display list manager lists
  * @dlm: the display list manager
@@ -104,6 +133,7 @@ enum vsp1_dl_mode {
  * @active: list currently being processed (loaded) by hardware
  * @queued: list queued to the hardware (written to the DL registers)
  * @pending: list waiting to be queued to the hardware
+ * @pool: fragment pool for the display list bodies
  * @gc_work: fragments garbage collector work struct
  * @gc_fragments: array of display list fragments waiting to be freed
  */
@@ -119,6 +149,8 @@ struct vsp1_dl_manager {
 	struct vsp1_dl_list *queued;
 	struct vsp1_dl_list *pending;
 
+	struct vsp1_dl_fragment_pool *pool;
+
 	struct work_struct gc_work;
 	struct list_head gc_fragments;
 };
@@ -128,6 +160,103 @@ struct vsp1_dl_manager {
  */
 
 /*
+ * Fragment pools reduce the pressure on the IOMMU TLB by allocating a single
+ * large area of DMA memory and allocating it as a pool of fragment bodies
+ */
+struct vsp1_dl_fragment_pool *
+vsp1_dl_fragment_pool_alloc(struct vsp1_device *vsp1, unsigned int qty,
+			    unsigned int num_entries, size_t extra_size)
+{
+	struct vsp1_dl_fragment_pool *pool;
+	size_t dlb_size;
+	unsigned int i;
+
+	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+	if (!pool)
+		return NULL;
+
+	pool->vsp1 = vsp1;
+
+	dlb_size = num_entries * sizeof(struct vsp1_dl_entry) + extra_size;
+	pool->size = dlb_size * qty;
+
+	pool->bodies = kcalloc(qty, sizeof(*pool->bodies), GFP_KERNEL);
+	if (!pool->bodies) {
+		kfree(pool);
+		return NULL;
+	}
+
+	pool->mem = dma_alloc_wc(vsp1->bus_master, pool->size, &pool->dma,
+					    GFP_KERNEL);
+	if (!pool->mem) {
+		kfree(pool->bodies);
+		kfree(pool);
+		return NULL;
+	}
+
+	spin_lock_init(&pool->lock);
+	INIT_LIST_HEAD(&pool->free);
+
+	for (i = 0; i < qty; ++i) {
+		struct vsp1_dl_body *dlb = &pool->bodies[i];
+
+		dlb->pool = pool;
+		dlb->max_entries = num_entries;
+
+		dlb->dma = pool->dma + i * dlb_size;
+		dlb->entries = pool->mem + i * dlb_size;
+
+		list_add_tail(&dlb->free, &pool->free);
+	}
+
+	return pool;
+}
+
+void vsp1_dl_fragment_pool_free(struct vsp1_dl_fragment_pool *pool)
+{
+	if (!pool)
+		return;
+
+	if (pool->mem)
+		dma_free_wc(pool->vsp1->bus_master, pool->size, pool->mem,
+			    pool->dma);
+
+	kfree(pool->bodies);
+	kfree(pool);
+}
+
+struct vsp1_dl_body *vsp1_dl_fragment_get(struct vsp1_dl_fragment_pool *pool)
+{
+	struct vsp1_dl_body *dlb = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pool->lock, flags);
+
+	if (!list_empty(&pool->free)) {
+		dlb = list_first_entry(&pool->free, struct vsp1_dl_body, free);
+		list_del(&dlb->free);
+	}
+
+	spin_unlock_irqrestore(&pool->lock, flags);
+
+	return dlb;
+}
+
+void vsp1_dl_fragment_put(struct vsp1_dl_body *dlb)
+{
+	unsigned long flags;
+
+	if (!dlb)
+		return;
+
+	dlb->num_entries = 0;
+
+	spin_lock_irqsave(&dlb->pool->lock, flags);
+	list_add_tail(&dlb->free, &dlb->pool->free);
+	spin_unlock_irqrestore(&dlb->pool->lock, flags);
+}
+
+/*
  * Initialize a display list body object and allocate DMA memory for the body
  * data. The display list body object is expected to have been initialized to
  * 0 when allocated.
diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
index ee3508172f0a..9528484a8a34 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.h
+++ b/drivers/media/platform/vsp1/vsp1_dl.h
@@ -17,6 +17,7 @@
 
 struct vsp1_device;
 struct vsp1_dl_fragment;
+struct vsp1_dl_fragment_pool;
 struct vsp1_dl_list;
 struct vsp1_dl_manager;
 
@@ -34,6 +35,13 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl);
 void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data);
 void vsp1_dl_list_commit(struct vsp1_dl_list *dl);
 
+struct vsp1_dl_fragment_pool *
+vsp1_dl_fragment_pool_alloc(struct vsp1_device *vsp1, unsigned int qty,
+			    unsigned int num_entries, size_t extra_size);
+void vsp1_dl_fragment_pool_free(struct vsp1_dl_fragment_pool *pool);
+struct vsp1_dl_body *vsp1_dl_fragment_get(struct vsp1_dl_fragment_pool *pool);
+void vsp1_dl_fragment_put(struct vsp1_dl_body *dlb);
+
 struct vsp1_dl_body *vsp1_dl_fragment_alloc(struct vsp1_device *vsp1,
 					    unsigned int num_entries);
 void vsp1_dl_fragment_free(struct vsp1_dl_body *dlb);
-- 
git-series 0.9.1


* [PATCH v2 3/8] v4l: vsp1: Convert display lists to use new fragment pool
  2017-08-14 15:13 [PATCH v2 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
  2017-08-14 15:13 ` [PATCH v2 1/8] v4l: vsp1: Protect fragments against overflow Kieran Bingham
  2017-08-14 15:13 ` [PATCH v2 2/8] v4l: vsp1: Provide a fragment pool Kieran Bingham
@ 2017-08-14 15:13 ` Kieran Bingham
  2017-08-17 12:13   ` Laurent Pinchart
  2017-08-14 15:13 ` [PATCH v2 4/8] v4l: vsp1: Use reference counting for fragments Kieran Bingham
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-08-14 15:13 UTC (permalink / raw)
  To: laurent.pinchart, linux-renesas-soc, linux-media; +Cc: Kieran Bingham

Adapt the dl->body0 object to use an object from the fragment pool.
This greatly reduces the pressure on the TLB for IPMMU use cases, as
all of the lists use a single allocation for the main body.

The CLU and LUT objects pre-allocate a pool containing two bodies,
allowing a userspace update before the hardware has committed a previous
set of tables.

Fragments are no longer 'freed' in interrupt context, but instead
released back to their respective pools.  This allows us to remove the
garbage collector in the DLM.
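
The two-body pools described above for the CLU and LUT can be modelled with a
small userspace sketch. It is an illustration only: the `model_*` names and
the four-entry table are hypothetical, locking is omitted, and
`model_take_pending()` merely stands in for the configure path handing the
table to a display list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 4	/* hypothetical, far smaller than a real LUT */

struct model_body {
	unsigned int values[TABLE_SIZE];
	int in_pool;			/* 1 while the body sits in the pool */
};

struct model_table {
	struct model_body bodies[2];	/* the two-body pool */
	struct model_body *pending;	/* table waiting to be applied */
};

static void model_table_init(struct model_table *t)
{
	t->bodies[0].in_pool = 1;
	t->bodies[1].in_pool = 1;
	t->pending = NULL;
}

static struct model_body *model_get(struct model_table *t)
{
	size_t i;

	for (i = 0; i < 2; i++) {
		if (t->bodies[i].in_pool) {
			t->bodies[i].in_pool = 0;
			return &t->bodies[i];
		}
	}
	return NULL;
}

static void model_put(struct model_body *b)
{
	if (!b)
		return;

	assert(!b->in_pool);		/* catch a double put */
	b->in_pool = 1;
}

/* Userspace update: fill a fresh body, then swap it with the pending one. */
static int model_set_table(struct model_table *t, const unsigned int *values)
{
	struct model_body *dlb = model_get(t);
	struct model_body *old;

	if (!dlb)
		return -1;

	memcpy(dlb->values, values, sizeof(dlb->values));

	old = t->pending;
	t->pending = dlb;
	model_put(old);			/* released to the pool, not freed */
	return 0;
}

/* The consumer takes ownership of the pending table, if any. */
static struct model_body *model_take_pending(struct model_table *t)
{
	struct model_body *b = t->pending;

	t->pending = NULL;
	return b;
}
```

In this model a second update succeeds while the consumer still holds the
first body, and a third update fails until one of the two bodies has been put
back, since nothing is allocated after initialisation.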

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

---
v2:
 - Use dl->body0->max_entries to determine header offset, instead of the
   global constant VSP1_DL_NUM_ENTRIES which is incorrect.
 - squash updates for LUT, CLU, and fragment cleanup into single patch.
   (Not fully bisectable when separated)
---
 drivers/media/platform/vsp1/vsp1_clu.c |  22 ++-
 drivers/media/platform/vsp1/vsp1_clu.h |   1 +-
 drivers/media/platform/vsp1/vsp1_dl.c  | 223 +++++---------------------
 drivers/media/platform/vsp1/vsp1_dl.h  |   3 +-
 drivers/media/platform/vsp1/vsp1_lut.c |  23 ++-
 drivers/media/platform/vsp1/vsp1_lut.h |   1 +-
 6 files changed, 90 insertions(+), 183 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
index f2fb26e5ab4e..52c523625e2f 100644
--- a/drivers/media/platform/vsp1/vsp1_clu.c
+++ b/drivers/media/platform/vsp1/vsp1_clu.c
@@ -23,6 +23,8 @@
 #define CLU_MIN_SIZE				4U
 #define CLU_MAX_SIZE				8190U
 
+#define CLU_SIZE				(17 * 17 * 17)
+
 /* -----------------------------------------------------------------------------
  * Device Access
  */
@@ -47,19 +49,19 @@ static int clu_set_table(struct vsp1_clu *clu, struct v4l2_ctrl *ctrl)
 	struct vsp1_dl_body *dlb;
 	unsigned int i;
 
-	dlb = vsp1_dl_fragment_alloc(clu->entity.vsp1, 1 + 17 * 17 * 17);
+	dlb = vsp1_dl_fragment_get(clu->pool);
 	if (!dlb)
 		return -ENOMEM;
 
 	vsp1_dl_fragment_write(dlb, VI6_CLU_ADDR, 0);
-	for (i = 0; i < 17 * 17 * 17; ++i)
+	for (i = 0; i < CLU_SIZE; ++i)
 		vsp1_dl_fragment_write(dlb, VI6_CLU_DATA, ctrl->p_new.p_u32[i]);
 
 	spin_lock_irq(&clu->lock);
 	swap(clu->clu, dlb);
 	spin_unlock_irq(&clu->lock);
 
-	vsp1_dl_fragment_free(dlb);
+	vsp1_dl_fragment_put(dlb);
 	return 0;
 }
 
@@ -261,8 +263,16 @@ static void clu_configure(struct vsp1_entity *entity,
 	}
 }
 
+static void clu_destroy(struct vsp1_entity *entity)
+{
+	struct vsp1_clu *clu = to_clu(&entity->subdev);
+
+	vsp1_dl_fragment_pool_free(clu->pool);
+}
+
 static const struct vsp1_entity_operations clu_entity_ops = {
 	.configure = clu_configure,
+	.destroy = clu_destroy,
 };
 
 /* -----------------------------------------------------------------------------
@@ -288,6 +298,12 @@ struct vsp1_clu *vsp1_clu_create(struct vsp1_device *vsp1)
 	if (ret < 0)
 		return ERR_PTR(ret);
 
+	/* Allocate a fragment pool */
+	clu->pool = vsp1_dl_fragment_pool_alloc(clu->entity.vsp1, 2,
+						CLU_SIZE + 1, 0);
+	if (!clu->pool)
+		return ERR_PTR(-ENOMEM);
+
 	/* Initialize the control handler. */
 	v4l2_ctrl_handler_init(&clu->ctrls, 2);
 	v4l2_ctrl_new_custom(&clu->ctrls, &clu_table_control, NULL);
diff --git a/drivers/media/platform/vsp1/vsp1_clu.h b/drivers/media/platform/vsp1/vsp1_clu.h
index 036e0a2f1a42..601ffb558e30 100644
--- a/drivers/media/platform/vsp1/vsp1_clu.h
+++ b/drivers/media/platform/vsp1/vsp1_clu.h
@@ -36,6 +36,7 @@ struct vsp1_clu {
 	spinlock_t lock;
 	unsigned int mode;
 	struct vsp1_dl_body *clu;
+	struct vsp1_dl_fragment_pool *pool;
 };
 
 static inline struct vsp1_clu *to_clu(struct v4l2_subdev *subdev)
diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index aab9dd6ec0eb..6ffdc3549283 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -110,7 +110,7 @@ struct vsp1_dl_list {
 	struct vsp1_dl_header *header;
 	dma_addr_t dma;
 
-	struct vsp1_dl_body body0;
+	struct vsp1_dl_body *body0;
 	struct list_head fragments;
 
 	bool has_chain;
@@ -134,8 +134,6 @@ enum vsp1_dl_mode {
  * @queued: list queued to the hardware (written to the DL registers)
  * @pending: list waiting to be queued to the hardware
  * @pool: fragment pool for the display list bodies
- * @gc_work: fragments garbage collector work struct
- * @gc_fragments: array of display list fragments waiting to be freed
  */
 struct vsp1_dl_manager {
 	unsigned int index;
@@ -150,9 +148,6 @@ struct vsp1_dl_manager {
 	struct vsp1_dl_list *pending;
 
 	struct vsp1_dl_fragment_pool *pool;
-
-	struct work_struct gc_work;
-	struct list_head gc_fragments;
 };
 
 /* -----------------------------------------------------------------------------
@@ -256,90 +251,6 @@ void vsp1_dl_fragment_put(struct vsp1_dl_body *dlb)
 	spin_unlock_irqrestore(&dlb->pool->lock, flags);
 }
 
-/*
- * Initialize a display list body object and allocate DMA memory for the body
- * data. The display list body object is expected to have been initialized to
- * 0 when allocated.
- */
-static int vsp1_dl_body_init(struct vsp1_device *vsp1,
-			     struct vsp1_dl_body *dlb, unsigned int num_entries,
-			     size_t extra_size)
-{
-	size_t size = num_entries * sizeof(*dlb->entries) + extra_size;
-
-	dlb->vsp1 = vsp1;
-	dlb->size = size;
-	dlb->max_entries = num_entries;
-
-	dlb->entries = dma_alloc_wc(vsp1->bus_master, dlb->size, &dlb->dma,
-				    GFP_KERNEL);
-	if (!dlb->entries)
-		return -ENOMEM;
-
-	return 0;
-}
-
-/*
- * Cleanup a display list body and free allocated DMA memory allocated.
- */
-static void vsp1_dl_body_cleanup(struct vsp1_dl_body *dlb)
-{
-	dma_free_wc(dlb->vsp1->bus_master, dlb->size, dlb->entries, dlb->dma);
-}
-
-/**
- * vsp1_dl_fragment_alloc - Allocate a display list fragment
- * @vsp1: The VSP1 device
- * @num_entries: The maximum number of entries that the fragment can contain
- *
- * Allocate a display list fragment with enough memory to contain the requested
- * number of entries.
- *
- * Return a pointer to a fragment on success or NULL if memory can't be
- * allocated.
- */
-struct vsp1_dl_body *vsp1_dl_fragment_alloc(struct vsp1_device *vsp1,
-					    unsigned int num_entries)
-{
-	struct vsp1_dl_body *dlb;
-	int ret;
-
-	dlb = kzalloc(sizeof(*dlb), GFP_KERNEL);
-	if (!dlb)
-		return NULL;
-
-	ret = vsp1_dl_body_init(vsp1, dlb, num_entries, 0);
-	if (ret < 0) {
-		kfree(dlb);
-		return NULL;
-	}
-
-	return dlb;
-}
-
-/**
- * vsp1_dl_fragment_free - Free a display list fragment
- * @dlb: The fragment
- *
- * Free the given display list fragment and the associated DMA memory.
- *
- * Fragments must only be freed explicitly if they are not added to a display
- * list, as the display list will take ownership of them and free them
- * otherwise. Manual free typically happens at cleanup time for fragments that
- * have been allocated but not used.
- *
- * Passing a NULL pointer to this function is safe, in that case no operation
- * will be performed.
- */
-void vsp1_dl_fragment_free(struct vsp1_dl_body *dlb)
-{
-	if (!dlb)
-		return;
-
-	vsp1_dl_body_cleanup(dlb);
-	kfree(dlb);
-}
-
 /**
  * vsp1_dl_fragment_write - Write a register to a display list fragment
  * @dlb: The fragment
@@ -366,11 +277,10 @@ void vsp1_dl_fragment_write(struct vsp1_dl_body *dlb, u32 reg, u32 data)
  * Display List Transaction Management
  */
 
-static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm)
+static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm,
+					struct vsp1_dl_fragment_pool *pool)
 {
 	struct vsp1_dl_list *dl;
-	size_t header_size;
-	int ret;
 
 	dl = kzalloc(sizeof(*dl), GFP_KERNEL);
 	if (!dl)
@@ -379,41 +289,39 @@ static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm)
 	INIT_LIST_HEAD(&dl->fragments);
 	dl->dlm = dlm;
 
-	/*
-	 * Initialize the display list body and allocate DMA memory for the body
-	 * and the optional header. Both are allocated together to avoid memory
-	 * fragmentation, with the header located right after the body in
-	 * memory.
-	 */
-	header_size = dlm->mode == VSP1_DL_MODE_HEADER
-		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
-		    : 0;
-
-	ret = vsp1_dl_body_init(dlm->vsp1, &dl->body0, VSP1_DL_NUM_ENTRIES,
-				header_size);
-	if (ret < 0) {
-		kfree(dl);
+	/* Retrieve a body from our DLM body pool */
+	dl->body0 = vsp1_dl_fragment_get(pool);
+	if (!dl->body0)
 		return NULL;
-	}
-
 	if (dlm->mode == VSP1_DL_MODE_HEADER) {
-		size_t header_offset = VSP1_DL_NUM_ENTRIES
-				     * sizeof(*dl->body0.entries);
+		size_t header_offset = dl->body0->max_entries
+				     * sizeof(*dl->body0->entries);
 
-		dl->header = ((void *)dl->body0.entries) + header_offset;
-		dl->dma = dl->body0.dma + header_offset;
+		dl->header = ((void *)dl->body0->entries) + header_offset;
+		dl->dma = dl->body0->dma + header_offset;
 
 		memset(dl->header, 0, sizeof(*dl->header));
-		dl->header->lists[0].addr = dl->body0.dma;
+		dl->header->lists[0].addr = dl->body0->dma;
 	}
 
 	return dl;
 }
 
+static void vsp1_dl_list_fragments_free(struct vsp1_dl_list *dl)
+{
+	struct vsp1_dl_body *dlb, *tmp;
+
+	list_for_each_entry_safe(dlb, tmp, &dl->fragments, list) {
+		list_del(&dlb->list);
+		vsp1_dl_fragment_put(dlb);
+	}
+}
+
 static void vsp1_dl_list_free(struct vsp1_dl_list *dl)
 {
-	vsp1_dl_body_cleanup(&dl->body0);
-	list_splice_init(&dl->fragments, &dl->dlm->gc_fragments);
+	vsp1_dl_fragment_put(dl->body0);
+	vsp1_dl_list_fragments_free(dl);
+
 	kfree(dl);
 }
 
@@ -467,18 +375,10 @@ static void __vsp1_dl_list_put(struct vsp1_dl_list *dl)
 
 	dl->has_chain = false;
 
-	/*
-	 * We can't free fragments here as DMA memory can only be freed in
-	 * interruptible context. Move all fragments to the display list
-	 * manager's list of fragments to be freed, they will be
-	 * garbage-collected by the work queue.
-	 */
-	if (!list_empty(&dl->fragments)) {
-		list_splice_init(&dl->fragments, &dl->dlm->gc_fragments);
-		schedule_work(&dl->dlm->gc_work);
-	}
+	vsp1_dl_list_fragments_free(dl);
 
-	dl->body0.num_entries = 0;
+	/* body0 is reused */
+	dl->body0->num_entries = 0;
 
 	list_add_tail(&dl->list, &dl->dlm->free);
 }
@@ -515,7 +415,7 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl)
  */
 void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
 {
-	vsp1_dl_fragment_write(&dl->body0, reg, data);
+	vsp1_dl_fragment_write(dl->body0, reg, data);
 }
 
 /**
@@ -528,8 +428,7 @@ void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
  * list, in the order in which fragments are added.
  *
  * Adding a fragment to a display list passes ownership of the fragment to the
- * list. The caller must not touch the fragment after this call, and must not
- * free it explicitly with vsp1_dl_fragment_free().
+ * list. The caller must not touch the fragment after this call.
  *
  * Fragments are only usable for display lists in header mode. Attempt to
  * add a fragment to a header-less display list will return an error.
@@ -587,7 +486,7 @@ static void vsp1_dl_list_fill_header(struct vsp1_dl_list *dl, bool is_last)
 	 * list was allocated.
 	 */
 
-	hdr->num_bytes = dl->body0.num_entries
+	hdr->num_bytes = dl->body0->num_entries
 		       * sizeof(*dl->header->lists);
 
 	list_for_each_entry(dlb, &dl->fragments, list) {
@@ -660,9 +559,9 @@ static void vsp1_dl_list_hw_enqueue(struct vsp1_dl_list *dl)
 		 * bit will be cleared by the hardware when the display list
 		 * processing starts.
 		 */
-		vsp1_write(vsp1, VI6_DL_HDR_ADDR(0), dl->body0.dma);
+		vsp1_write(vsp1, VI6_DL_HDR_ADDR(0), dl->body0->dma);
 		vsp1_write(vsp1, VI6_DL_BODY_SIZE, VI6_DL_BODY_SIZE_UPD |
-			   (dl->body0.num_entries * sizeof(*dl->header->lists)));
+			   (dl->body0->num_entries * sizeof(*dl->header->lists)));
 	} else {
 		/*
 		 * In header mode, program the display list header address. If
@@ -845,45 +744,12 @@ void vsp1_dlm_reset(struct vsp1_dl_manager *dlm)
 	dlm->pending = NULL;
 }
 
-/*
- * Free all fragments awaiting to be garbage-collected.
- *
- * This function must be called without the display list manager lock held.
- */
-static void vsp1_dlm_fragments_free(struct vsp1_dl_manager *dlm)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&dlm->lock, flags);
-
-	while (!list_empty(&dlm->gc_fragments)) {
-		struct vsp1_dl_body *dlb;
-
-		dlb = list_first_entry(&dlm->gc_fragments, struct vsp1_dl_body,
-				       list);
-		list_del(&dlb->list);
-
-		spin_unlock_irqrestore(&dlm->lock, flags);
-		vsp1_dl_fragment_free(dlb);
-		spin_lock_irqsave(&dlm->lock, flags);
-	}
-
-	spin_unlock_irqrestore(&dlm->lock, flags);
-}
-
-static void vsp1_dlm_garbage_collect(struct work_struct *work)
-{
-	struct vsp1_dl_manager *dlm =
-		container_of(work, struct vsp1_dl_manager, gc_work);
-
-	vsp1_dlm_fragments_free(dlm);
-}
-
 struct vsp1_dl_manager *vsp1_dlm_create(struct vsp1_device *vsp1,
 					unsigned int index,
 					unsigned int prealloc)
 {
 	struct vsp1_dl_manager *dlm;
+	size_t header_size;
 	unsigned int i;
 
 	dlm = devm_kzalloc(vsp1->dev, sizeof(*dlm), GFP_KERNEL);
@@ -898,13 +764,26 @@ struct vsp1_dl_manager *vsp1_dlm_create(struct vsp1_device *vsp1,
 
 	spin_lock_init(&dlm->lock);
 	INIT_LIST_HEAD(&dlm->free);
-	INIT_LIST_HEAD(&dlm->gc_fragments);
-	INIT_WORK(&dlm->gc_work, vsp1_dlm_garbage_collect);
+
+	/*
+	 * Initialize the display list body and allocate DMA memory for the body
+	 * and the optional header. Both are allocated together to avoid memory
+	 * fragmentation, with the header located right after the body in
+	 * memory.
+	 */
+	header_size = dlm->mode == VSP1_DL_MODE_HEADER
+		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
+		    : 0;
+
+	dlm->pool = vsp1_dl_fragment_pool_alloc(vsp1, prealloc,
+					VSP1_DL_NUM_ENTRIES, header_size);
+	if (!dlm->pool)
+		return NULL;
 
 	for (i = 0; i < prealloc; ++i) {
 		struct vsp1_dl_list *dl;
 
-		dl = vsp1_dl_list_alloc(dlm);
+		dl = vsp1_dl_list_alloc(dlm, dlm->pool);
 		if (!dl)
 			return NULL;
 
@@ -921,12 +800,10 @@ void vsp1_dlm_destroy(struct vsp1_dl_manager *dlm)
 	if (!dlm)
 		return;
 
-	cancel_work_sync(&dlm->gc_work);
-
 	list_for_each_entry_safe(dl, next, &dlm->free, list) {
 		list_del(&dl->list);
 		vsp1_dl_list_free(dl);
 	}
 
-	vsp1_dlm_fragments_free(dlm);
+	vsp1_dl_fragment_pool_free(dlm->pool);
 }
diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
index 9528484a8a34..e1718c3cbb7b 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.h
+++ b/drivers/media/platform/vsp1/vsp1_dl.h
@@ -42,9 +42,6 @@ void vsp1_dl_fragment_pool_free(struct vsp1_dl_fragment_pool *pool);
 struct vsp1_dl_body *vsp1_dl_fragment_get(struct vsp1_dl_fragment_pool *pool);
 void vsp1_dl_fragment_put(struct vsp1_dl_body *dlb);
 
-struct vsp1_dl_body *vsp1_dl_fragment_alloc(struct vsp1_device *vsp1,
-					    unsigned int num_entries);
-void vsp1_dl_fragment_free(struct vsp1_dl_body *dlb);
 void vsp1_dl_fragment_write(struct vsp1_dl_body *dlb, u32 reg, u32 data);
 int vsp1_dl_list_add_fragment(struct vsp1_dl_list *dl,
 			      struct vsp1_dl_body *dlb);
diff --git a/drivers/media/platform/vsp1/vsp1_lut.c b/drivers/media/platform/vsp1/vsp1_lut.c
index c67cc60db0db..57482e057e54 100644
--- a/drivers/media/platform/vsp1/vsp1_lut.c
+++ b/drivers/media/platform/vsp1/vsp1_lut.c
@@ -23,6 +23,8 @@
 #define LUT_MIN_SIZE				4U
 #define LUT_MAX_SIZE				8190U
 
+#define LUT_SIZE				256
+
 /* -----------------------------------------------------------------------------
  * Device Access
  */
@@ -44,11 +46,11 @@ static int lut_set_table(struct vsp1_lut *lut, struct v4l2_ctrl *ctrl)
 	struct vsp1_dl_body *dlb;
 	unsigned int i;
 
-	dlb = vsp1_dl_fragment_alloc(lut->entity.vsp1, 256);
+	dlb = vsp1_dl_fragment_get(lut->pool);
 	if (!dlb)
 		return -ENOMEM;
 
-	for (i = 0; i < 256; ++i)
+	for (i = 0; i < LUT_SIZE; ++i)
 		vsp1_dl_fragment_write(dlb, VI6_LUT_TABLE + 4 * i,
 				       ctrl->p_new.p_u32[i]);
 
@@ -56,7 +58,7 @@ static int lut_set_table(struct vsp1_lut *lut, struct v4l2_ctrl *ctrl)
 	swap(lut->lut, dlb);
 	spin_unlock_irq(&lut->lock);
 
-	vsp1_dl_fragment_free(dlb);
+	vsp1_dl_fragment_put(dlb);
 	return 0;
 }
 
@@ -87,7 +89,7 @@ static const struct v4l2_ctrl_config lut_table_control = {
 	.max = 0x00ffffff,
 	.step = 1,
 	.def = 0,
-	.dims = { 256},
+	.dims = { LUT_SIZE },
 };
 
 /* -----------------------------------------------------------------------------
@@ -217,8 +219,16 @@ static void lut_configure(struct vsp1_entity *entity,
 	}
 }
 
+static void lut_destroy(struct vsp1_entity *entity)
+{
+	struct vsp1_lut *lut = to_lut(&entity->subdev);
+
+	vsp1_dl_fragment_pool_free(lut->pool);
+}
+
 static const struct vsp1_entity_operations lut_entity_ops = {
 	.configure = lut_configure,
+	.destroy = lut_destroy,
 };
 
 /* -----------------------------------------------------------------------------
@@ -244,6 +254,11 @@ struct vsp1_lut *vsp1_lut_create(struct vsp1_device *vsp1)
 	if (ret < 0)
 		return ERR_PTR(ret);
 
+	/* Allocate a fragment pool */
+	lut->pool = vsp1_dl_fragment_pool_alloc(vsp1, 2, LUT_SIZE, 0);
+	if (!lut->pool)
+		return ERR_PTR(-ENOMEM);
+
 	/* Initialize the control handler. */
 	v4l2_ctrl_handler_init(&lut->ctrls, 1);
 	v4l2_ctrl_new_custom(&lut->ctrls, &lut_table_control, NULL);
diff --git a/drivers/media/platform/vsp1/vsp1_lut.h b/drivers/media/platform/vsp1/vsp1_lut.h
index f8c4e8f0a79d..538563d57454 100644
--- a/drivers/media/platform/vsp1/vsp1_lut.h
+++ b/drivers/media/platform/vsp1/vsp1_lut.h
@@ -33,6 +33,7 @@ struct vsp1_lut {
 
 	spinlock_t lock;
 	struct vsp1_dl_body *lut;
+	struct vsp1_dl_fragment_pool *pool;
 };
 
 static inline struct vsp1_lut *to_lut(struct v4l2_subdev *subdev)
-- 
git-series 0.9.1


* [PATCH v2 4/8] v4l: vsp1: Use reference counting for fragments
  2017-08-14 15:13 [PATCH v2 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
                   ` (2 preceding siblings ...)
  2017-08-14 15:13 ` [PATCH v2 3/8] v4l: vsp1: Convert display lists to use new " Kieran Bingham
@ 2017-08-14 15:13 ` Kieran Bingham
  2017-08-17 12:53   ` Laurent Pinchart
  2017-08-14 15:13 ` [PATCH v2 5/8] v4l: vsp1: Refactor display list configure operations Kieran Bingham
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-08-14 15:13 UTC (permalink / raw)
  To: laurent.pinchart, linux-renesas-soc, linux-media; +Cc: Kieran Bingham

Extend the display list body with a reference count, allowing bodies to
be kept as long as a reference is maintained. This provides the ability
to keep a cached copy of bodies which will not change, so that they can
be re-applied to multiple display lists.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

---
This could be squashed into the fragment update code, but it's not a
straightforward squash as the refcounts will affect both:
  v4l: vsp1: Provide a fragment pool
and
  v4l: vsp1: Convert display lists to use new fragment pool
therefore, I have kept this separate to prevent breaking bisectability
of the vsp-tests.
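
For illustration only, the ownership rule this patch introduces can be
sketched in plain userspace C. Everything below is a hypothetical stand-in
(struct body, body_get(), body_put(), list_add_body()); it is not the driver
code, and it uses a bare counter where the driver uses refcount_t under the
pool lock. It only shows the intended sequence: get takes the first
reference, attaching to a list takes a second, and the body returns to the
pool when the last reference is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a display list body. */
struct body {
	unsigned int refcnt;		/* plain counter, not refcount_t */
	unsigned int num_entries;
	int in_pool;			/* 1 while the body is free in the pool */
};

/* Take a free body; the caller owns the first reference. */
static struct body *body_get(struct body *b)
{
	if (!b->in_pool)
		return NULL;

	b->in_pool = 0;
	b->refcnt = 1;
	return b;
}

/* Drop one reference; the last put resets the body and frees it. */
static void body_put(struct body *b)
{
	if (!b)
		return;

	assert(b->refcnt > 0);
	if (--b->refcnt)
		return;

	b->num_entries = 0;
	b->in_pool = 1;
}

/* Attaching to a list takes an extra reference on behalf of the list. */
static void list_add_body(struct body **slot, struct body *b)
{
	b->refcnt++;
	*slot = b;
}
```

A caller that does not want to cache the body would call body_get(),
list_add_body() and then body_put() on its own pointer, leaving the list as
the only holder. A caller that wants to re-apply the body to later lists
would simply keep its reference.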
---
 drivers/media/platform/vsp1/vsp1_clu.c |  7 ++++++-
 drivers/media/platform/vsp1/vsp1_dl.c  | 15 ++++++++++++++-
 drivers/media/platform/vsp1/vsp1_lut.c |  7 ++++++-
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
index 52c523625e2f..175717018e11 100644
--- a/drivers/media/platform/vsp1/vsp1_clu.c
+++ b/drivers/media/platform/vsp1/vsp1_clu.c
@@ -257,8 +257,13 @@ static void clu_configure(struct vsp1_entity *entity,
 		clu->clu = NULL;
 		spin_unlock_irqrestore(&clu->lock, flags);
 
-		if (dlb)
+		if (dlb) {
 			vsp1_dl_list_add_fragment(dl, dlb);
+
+			/* release our local reference */
+			vsp1_dl_fragment_put(dlb);
+		}
+
 		break;
 	}
 }
diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index 6ffdc3549283..37feda248946 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -14,6 +14,7 @@
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
 #include <linux/gfp.h>
+#include <linux/refcount.h>
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 
@@ -58,6 +59,8 @@ struct vsp1_dl_body {
 	struct list_head list;
 	struct list_head free;
 
+	refcount_t refcnt;
+
 	struct vsp1_dl_fragment_pool *pool;
 	struct vsp1_device *vsp1;
 
@@ -230,6 +233,7 @@ struct vsp1_dl_body *vsp1_dl_fragment_get(struct vsp1_dl_fragment_pool *pool)
 	if (!list_empty(&pool->free)) {
 		dlb = list_first_entry(&pool->free, struct vsp1_dl_body, free);
 		list_del(&dlb->free);
+		refcount_set(&dlb->refcnt, 1);
 	}
 
 	spin_unlock_irqrestore(&pool->lock, flags);
@@ -244,6 +248,9 @@ void vsp1_dl_fragment_put(struct vsp1_dl_body *dlb)
 	if (!dlb)
 		return;
 
+	if (!refcount_dec_and_test(&dlb->refcnt))
+		return;
+
 	dlb->num_entries = 0;
 
 	spin_lock_irqsave(&dlb->pool->lock, flags);
@@ -428,7 +435,11 @@ void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
  * list, in the order in which fragments are added.
  *
  * Adding a fragment to a display list passes ownership of the fragment to the
- * list. The caller must not touch the fragment after this call.
+ * list. The caller must not modify the fragment after this call, but can retain
+ * a reference to it for future use if necessary, to add to subsequent lists.
+ *
+ * The reference count of the body is incremented by this attachment, so the
+ * caller should release its reference if it does not want to cache the body.
  *
  * Fragments are only usable for display lists in header mode. Attempt to
  * add a fragment to a header-less display list will return an error.
@@ -440,6 +451,8 @@ int vsp1_dl_list_add_fragment(struct vsp1_dl_list *dl,
 	if (dl->dlm->mode != VSP1_DL_MODE_HEADER)
 		return -EINVAL;
 
+	refcount_inc(&dlb->refcnt);
+
 	list_add_tail(&dlb->list, &dl->fragments);
 	return 0;
 }
diff --git a/drivers/media/platform/vsp1/vsp1_lut.c b/drivers/media/platform/vsp1/vsp1_lut.c
index 57482e057e54..388bd89ade0b 100644
--- a/drivers/media/platform/vsp1/vsp1_lut.c
+++ b/drivers/media/platform/vsp1/vsp1_lut.c
@@ -213,8 +213,13 @@ static void lut_configure(struct vsp1_entity *entity,
 		lut->lut = NULL;
 		spin_unlock_irqrestore(&lut->lock, flags);
 
-		if (dlb)
+		if (dlb) {
 			vsp1_dl_list_add_fragment(dl, dlb);
+
+			/* release our local reference */
+			vsp1_dl_fragment_put(dlb);
+		}
+
 		break;
 	}
 }
-- 
git-series 0.9.1

* [PATCH v2 5/8] v4l: vsp1: Refactor display list configure operations
  2017-08-14 15:13 [PATCH v2 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
                   ` (3 preceding siblings ...)
  2017-08-14 15:13 ` [PATCH v2 4/8] v4l: vsp1: Use reference counting for fragments Kieran Bingham
@ 2017-08-14 15:13 ` Kieran Bingham
  2017-08-17 18:13   ` Laurent Pinchart
  2017-08-14 15:13 ` [PATCH v2 6/8] v4l: vsp1: Adapt entities to configure into a body Kieran Bingham
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-08-14 15:13 UTC (permalink / raw)
  To: laurent.pinchart, linux-renesas-soc, linux-media; +Cc: Kieran Bingham

The entities provide a single .configure operation which configures the
object into the target display list, based on the vsp1_entity_params
selection.

This restricts us to a single function prototype for both static
configuration (the pre-stream INIT stage) and the dynamic runtime stages
for each frame, and for each partition therein.

Split the configure function into two parts, '.prepare()' and
'.configure()', merging the VSP1_ENTITY_PARAMS_RUNTIME and
VSP1_ENTITY_PARAMS_PARTITION stages into a single call through
.configure(). Individual partitions are handled by passing the partition
number to the configure call, and any runtime stage actions are
processed on the first partition only.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
---
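As a rough illustration of the resulting call pattern, here is a cut-down
userspace sketch. The names (struct entity, entity_prepare(),
entity_configure(), demo_ops, run_frame()) are hypothetical and only mirror
the shape of the new operations; the real callbacks also take the pipeline
and the display list. It shows prepare running once per stream, configure
running once per partition, and per-frame work gated on partition 0.

```c
#include <assert.h>
#include <stddef.h>

struct entity;

/* Hypothetical, cut-down operations table mirroring the split. */
struct entity_ops {
	void (*prepare)(struct entity *e);
	void (*configure)(struct entity *e, unsigned int partition);
};

struct entity {
	const struct entity_ops *ops;
	unsigned int prepared;		/* stream setup calls */
	unsigned int frame_writes;	/* per-frame work */
	unsigned int partition_writes;	/* per-partition work */
};

/* Wrappers skip entities that do not implement an operation. */
static void entity_prepare(struct entity *e)
{
	assert(e && e->ops);
	if (e->ops->prepare)
		e->ops->prepare(e);
}

static void entity_configure(struct entity *e, unsigned int partition)
{
	assert(e && e->ops);
	if (e->ops->configure)
		e->ops->configure(e, partition);
}

static void demo_prepare(struct entity *e)
{
	e->prepared++;
}

static void demo_configure(struct entity *e, unsigned int partition)
{
	/* Runtime (per-frame) settings are applied on the first partition. */
	if (partition == 0)
		e->frame_writes++;

	/* Partition-specific settings are applied on every call. */
	e->partition_writes++;
}

static const struct entity_ops demo_ops = {
	.prepare = demo_prepare,
	.configure = demo_configure,
};

/* Process one frame split into the given number of partitions. */
static void run_frame(struct entity *e, unsigned int partitions)
{
	unsigned int i;

	for (i = 0; i < partitions; i++)
		entity_configure(e, i);
}
```

With one entity_prepare() call followed by two frames of three partitions
each, the per-frame branch runs twice and the per-partition part runs six
times.
---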
 drivers/media/platform/vsp1/vsp1_bru.c    |  12 +-
 drivers/media/platform/vsp1/vsp1_clu.c    |  43 +--
 drivers/media/platform/vsp1/vsp1_drm.c    |  11 +-
 drivers/media/platform/vsp1/vsp1_entity.c |  15 +-
 drivers/media/platform/vsp1/vsp1_entity.h |  27 +--
 drivers/media/platform/vsp1/vsp1_hgo.c    |  12 +-
 drivers/media/platform/vsp1/vsp1_hgt.c    |  12 +-
 drivers/media/platform/vsp1/vsp1_hsit.c   |  12 +-
 drivers/media/platform/vsp1/vsp1_lif.c    |  12 +-
 drivers/media/platform/vsp1/vsp1_lut.c    |  24 +-
 drivers/media/platform/vsp1/vsp1_rpf.c    | 162 ++++++-------
 drivers/media/platform/vsp1/vsp1_sru.c    |  12 +-
 drivers/media/platform/vsp1/vsp1_uds.c    |  55 ++--
 drivers/media/platform/vsp1/vsp1_video.c  |  24 +--
 drivers/media/platform/vsp1/vsp1_wpf.c    | 297 ++++++++++++-----------
 15 files changed, 359 insertions(+), 371 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_bru.c b/drivers/media/platform/vsp1/vsp1_bru.c
index e8fd2ae3b3eb..b9ff96f76b3e 100644
--- a/drivers/media/platform/vsp1/vsp1_bru.c
+++ b/drivers/media/platform/vsp1/vsp1_bru.c
@@ -285,19 +285,15 @@ static const struct v4l2_subdev_ops bru_ops = {
  * VSP1 Entity Operations
  */
 
-static void bru_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void bru_prepare(struct vsp1_entity *entity,
+			struct vsp1_pipeline *pipe,
+			struct vsp1_dl_list *dl)
 {
 	struct vsp1_bru *bru = to_bru(&entity->subdev);
 	struct v4l2_mbus_framefmt *format;
 	unsigned int flags;
 	unsigned int i;
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	format = vsp1_entity_get_pad_format(&bru->entity, bru->entity.config,
 					    bru->entity.source_pad);
 
@@ -404,7 +400,7 @@ static void bru_configure(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations bru_entity_ops = {
-	.configure = bru_configure,
+	.prepare = bru_prepare,
 };
 
 /* -----------------------------------------------------------------------------
diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
index 175717018e11..5f65ce3ad97f 100644
--- a/drivers/media/platform/vsp1/vsp1_clu.c
+++ b/drivers/media/platform/vsp1/vsp1_clu.c
@@ -213,37 +213,37 @@ static const struct v4l2_subdev_ops clu_ops = {
 /* -----------------------------------------------------------------------------
  * VSP1 Entity Operations
  */
+static void clu_prepare(struct vsp1_entity *entity,
+			struct vsp1_pipeline *pipe,
+			struct vsp1_dl_list *dl)
+{
+	struct vsp1_clu *clu = to_clu(&entity->subdev);
+
+	/*
+	 * The format can't be changed during streaming, only verify it
+	 * at setup time and store the information internally for future
+	 * runtime configuration calls.
+	 */
+	struct v4l2_mbus_framefmt *format;
+
+	format = vsp1_entity_get_pad_format(&clu->entity,
+					    clu->entity.config,
+					    CLU_PAD_SINK);
+	clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
+}
 
 static void clu_configure(struct vsp1_entity *entity,
 			  struct vsp1_pipeline *pipe,
 			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+			  unsigned int partition)
 {
 	struct vsp1_clu *clu = to_clu(&entity->subdev);
 	struct vsp1_dl_body *dlb;
 	unsigned long flags;
 	u32 ctrl = VI6_CLU_CTRL_AAI | VI6_CLU_CTRL_MVS | VI6_CLU_CTRL_EN;
 
-	switch (params) {
-	case VSP1_ENTITY_PARAMS_INIT: {
-		/*
-		 * The format can't be changed during streaming, only verify it
-		 * at setup time and store the information internally for future
-		 * runtime configuration calls.
-		 */
-		struct v4l2_mbus_framefmt *format;
-
-		format = vsp1_entity_get_pad_format(&clu->entity,
-						    clu->entity.config,
-						    CLU_PAD_SINK);
-		clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
-		break;
-	}
-
-	case VSP1_ENTITY_PARAMS_PARTITION:
-		break;
 
-	case VSP1_ENTITY_PARAMS_RUNTIME:
+	if (partition == 0) {
 		/* 2D mode can only be used with the YCbCr pixel encoding. */
 		if (clu->mode == V4L2_CID_VSP1_CLU_MODE_2D && clu->yuv_mode)
 			ctrl |= VI6_CLU_CTRL_AX1I_2D | VI6_CLU_CTRL_AX2I_2D
@@ -263,8 +263,6 @@ static void clu_configure(struct vsp1_entity *entity,
 			/* release our local reference */
 			vsp1_dl_fragment_put(dlb);
 		}
-
-		break;
 	}
 }
 
@@ -276,6 +274,7 @@ static void clu_destroy(struct vsp1_entity *entity)
 }
 
 static const struct vsp1_entity_operations clu_entity_ops = {
+	.prepare = clu_prepare,
 	.configure = clu_configure,
 	.destroy = clu_destroy,
 };
diff --git a/drivers/media/platform/vsp1/vsp1_drm.c b/drivers/media/platform/vsp1/vsp1_drm.c
index 4dfbeac8f42c..2a4fcb866629 100644
--- a/drivers/media/platform/vsp1/vsp1_drm.c
+++ b/drivers/media/platform/vsp1/vsp1_drm.c
@@ -558,15 +558,8 @@ void vsp1_du_atomic_flush(struct device *dev, unsigned int pipe_index)
 		}
 
 		vsp1_entity_route_setup(entity, pipe, dl);
-
-		if (entity->ops->configure) {
-			entity->ops->configure(entity, pipe, dl,
-					       VSP1_ENTITY_PARAMS_INIT);
-			entity->ops->configure(entity, pipe, dl,
-					       VSP1_ENTITY_PARAMS_RUNTIME);
-			entity->ops->configure(entity, pipe, dl,
-					       VSP1_ENTITY_PARAMS_PARTITION);
-		}
+		vsp1_entity_prepare(entity, pipe, dl);
+		vsp1_entity_configure(entity, pipe, dl, 0);
 	}
 
 	vsp1_dl_list_commit(dl);
diff --git a/drivers/media/platform/vsp1/vsp1_entity.c b/drivers/media/platform/vsp1/vsp1_entity.c
index 54de15095709..76f240f005af 100644
--- a/drivers/media/platform/vsp1/vsp1_entity.c
+++ b/drivers/media/platform/vsp1/vsp1_entity.c
@@ -73,6 +73,21 @@ void vsp1_entity_route_setup(struct vsp1_entity *entity,
 	vsp1_dl_list_write(dl, source->route->reg, route);
 }
 
+void vsp1_entity_prepare(struct vsp1_entity *entity, struct vsp1_pipeline *pipe,
+			 struct vsp1_dl_list *dl)
+{
+	if (entity->ops->prepare)
+		entity->ops->prepare(entity, pipe, dl);
+}
+
+void vsp1_entity_configure(struct vsp1_entity *entity,
+			   struct vsp1_pipeline *pipe, struct vsp1_dl_list *dl,
+			   unsigned int partition)
+{
+	if (entity->ops->configure)
+		entity->ops->configure(entity, pipe, dl, partition);
+}
+
 /* -----------------------------------------------------------------------------
  * V4L2 Subdevice Operations
  */
diff --git a/drivers/media/platform/vsp1/vsp1_entity.h b/drivers/media/platform/vsp1/vsp1_entity.h
index 408602ebeb97..2f33e343ccc6 100644
--- a/drivers/media/platform/vsp1/vsp1_entity.h
+++ b/drivers/media/platform/vsp1/vsp1_entity.h
@@ -40,18 +40,6 @@ enum vsp1_entity_type {
 	VSP1_ENTITY_WPF,
 };
 
-/**
- * enum vsp1_entity_params - Entity configuration parameters class
- * @VSP1_ENTITY_PARAMS_INIT - Initial parameters
- * @VSP1_ENTITY_PARAMS_PARTITION - Per-image partition parameters
- * @VSP1_ENTITY_PARAMS_RUNTIME - Runtime-configurable parameters
- */
-enum vsp1_entity_params {
-	VSP1_ENTITY_PARAMS_INIT,
-	VSP1_ENTITY_PARAMS_PARTITION,
-	VSP1_ENTITY_PARAMS_RUNTIME,
-};
-
 #define VSP1_ENTITY_MAX_INPUTS		5	/* For the BRU */
 
 /*
@@ -80,8 +68,10 @@ struct vsp1_route {
 /**
  * struct vsp1_entity_operations - Entity operations
  * @destroy:	Destroy the entity.
- * @configure:	Setup the hardware based on the entity state (pipeline, formats,
- *		selection rectangles, ...)
+ * @prepare:	Setup the initial hardware parameters for the stream (pipeline,
+ *		formats)
+ * @configure:	Configure the runtime parameters for each partition (rectangles,
+ *		buffer addresses, ...)
  * @max_width:	Return the max supported width of data that the entity can
  *		process in a single operation.
  * @partition:	Process the partition construction based on this entity's
@@ -89,8 +79,10 @@ struct vsp1_route {
  */
 struct vsp1_entity_operations {
 	void (*destroy)(struct vsp1_entity *);
+	void (*prepare)(struct vsp1_entity *, struct vsp1_pipeline *,
+			struct vsp1_dl_list *);
 	void (*configure)(struct vsp1_entity *, struct vsp1_pipeline *,
-			  struct vsp1_dl_list *, enum vsp1_entity_params);
+			  struct vsp1_dl_list *, unsigned int partition);
 	unsigned int (*max_width)(struct vsp1_entity *, struct vsp1_pipeline *);
 	void (*partition)(struct vsp1_entity *, struct vsp1_pipeline *,
 			  struct vsp1_partition *, unsigned int,
@@ -156,6 +148,11 @@ int vsp1_entity_init_cfg(struct v4l2_subdev *subdev,
 void vsp1_entity_route_setup(struct vsp1_entity *entity,
 			     struct vsp1_pipeline *pipe,
 			     struct vsp1_dl_list *dl);
+void vsp1_entity_prepare(struct vsp1_entity *entity, struct vsp1_pipeline *pipe,
+			 struct vsp1_dl_list *dl);
+void vsp1_entity_configure(struct vsp1_entity *entity,
+			   struct vsp1_pipeline *pipe, struct vsp1_dl_list *dl,
+			   unsigned int partition);
 
 struct media_pad *vsp1_entity_remote_pad(struct media_pad *pad);
 
diff --git a/drivers/media/platform/vsp1/vsp1_hgo.c b/drivers/media/platform/vsp1/vsp1_hgo.c
index 50309c053b78..5705ba67dbc8 100644
--- a/drivers/media/platform/vsp1/vsp1_hgo.c
+++ b/drivers/media/platform/vsp1/vsp1_hgo.c
@@ -133,10 +133,9 @@ static const struct v4l2_ctrl_config hgo_num_bins_control = {
  * VSP1 Entity Operations
  */
 
-static void hgo_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void hgo_prepare(struct vsp1_entity *entity,
+			struct vsp1_pipeline *pipe,
+			struct vsp1_dl_list *dl)
 {
 	struct vsp1_hgo *hgo = to_hgo(&entity->subdev);
 	struct v4l2_rect *compose;
@@ -144,9 +143,6 @@ static void hgo_configure(struct vsp1_entity *entity,
 	unsigned int hratio;
 	unsigned int vratio;
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	crop = vsp1_entity_get_pad_selection(entity, entity->config,
 					     HISTO_PAD_SINK, V4L2_SEL_TGT_CROP);
 	compose = vsp1_entity_get_pad_selection(entity, entity->config,
@@ -178,7 +174,7 @@ static void hgo_configure(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations hgo_entity_ops = {
-	.configure = hgo_configure,
+	.prepare = hgo_prepare,
 	.destroy = vsp1_histogram_destroy,
 };
 
diff --git a/drivers/media/platform/vsp1/vsp1_hgt.c b/drivers/media/platform/vsp1/vsp1_hgt.c
index b5ce305e3e6f..bdd1247e090f 100644
--- a/drivers/media/platform/vsp1/vsp1_hgt.c
+++ b/drivers/media/platform/vsp1/vsp1_hgt.c
@@ -129,10 +129,9 @@ static const struct v4l2_ctrl_config hgt_hue_areas = {
  * VSP1 Entity Operations
  */
 
-static void hgt_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void hgt_prepare(struct vsp1_entity *entity,
+			struct vsp1_pipeline *pipe,
+			struct vsp1_dl_list *dl)
 {
 	struct vsp1_hgt *hgt = to_hgt(&entity->subdev);
 	struct v4l2_rect *compose;
@@ -143,9 +142,6 @@ static void hgt_configure(struct vsp1_entity *entity,
 	u8 upper;
 	unsigned int i;
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	crop = vsp1_entity_get_pad_selection(entity, entity->config,
 					     HISTO_PAD_SINK, V4L2_SEL_TGT_CROP);
 	compose = vsp1_entity_get_pad_selection(entity, entity->config,
@@ -179,7 +175,7 @@ static void hgt_configure(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations hgt_entity_ops = {
-	.configure = hgt_configure,
+	.prepare = hgt_prepare,
 	.destroy = vsp1_histogram_destroy,
 };
 
diff --git a/drivers/media/platform/vsp1/vsp1_hsit.c b/drivers/media/platform/vsp1/vsp1_hsit.c
index 764d405345ee..cf96ce2c6da9 100644
--- a/drivers/media/platform/vsp1/vsp1_hsit.c
+++ b/drivers/media/platform/vsp1/vsp1_hsit.c
@@ -131,16 +131,12 @@ static const struct v4l2_subdev_ops hsit_ops = {
  * VSP1 Entity Operations
  */
 
-static void hsit_configure(struct vsp1_entity *entity,
-			   struct vsp1_pipeline *pipe,
-			   struct vsp1_dl_list *dl,
-			   enum vsp1_entity_params params)
+static void hsit_prepare(struct vsp1_entity *entity,
+			 struct vsp1_pipeline *pipe,
+			 struct vsp1_dl_list *dl)
 {
 	struct vsp1_hsit *hsit = to_hsit(&entity->subdev);
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	if (hsit->inverse)
 		vsp1_hsit_write(hsit, dl, VI6_HSI_CTRL, VI6_HSI_CTRL_EN);
 	else
@@ -148,7 +144,7 @@ static void hsit_configure(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations hsit_entity_ops = {
-	.configure = hsit_configure,
+	.prepare = hsit_prepare,
 };
 
 /* -----------------------------------------------------------------------------
diff --git a/drivers/media/platform/vsp1/vsp1_lif.c b/drivers/media/platform/vsp1/vsp1_lif.c
index e6fa16d7fda8..0141bce92c2f 100644
--- a/drivers/media/platform/vsp1/vsp1_lif.c
+++ b/drivers/media/platform/vsp1/vsp1_lif.c
@@ -128,10 +128,9 @@ static const struct v4l2_subdev_ops lif_ops = {
  * VSP1 Entity Operations
  */
 
-static void lif_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void lif_prepare(struct vsp1_entity *entity,
+			struct vsp1_pipeline *pipe,
+			struct vsp1_dl_list *dl)
 {
 	const struct v4l2_mbus_framefmt *format;
 	struct vsp1_lif *lif = to_lif(&entity->subdev);
@@ -139,9 +138,6 @@ static void lif_configure(struct vsp1_entity *entity,
 	unsigned int obth = 400;
 	unsigned int lbth = 200;
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	format = vsp1_entity_get_pad_format(&lif->entity, lif->entity.config,
 					    LIF_PAD_SOURCE);
 
@@ -158,7 +154,7 @@ static void lif_configure(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations lif_entity_ops = {
-	.configure = lif_configure,
+	.prepare = lif_prepare,
 };
 
 /* -----------------------------------------------------------------------------
diff --git a/drivers/media/platform/vsp1/vsp1_lut.c b/drivers/media/platform/vsp1/vsp1_lut.c
index 388bd89ade0b..0af074e65457 100644
--- a/drivers/media/platform/vsp1/vsp1_lut.c
+++ b/drivers/media/platform/vsp1/vsp1_lut.c
@@ -190,24 +190,25 @@ static const struct v4l2_subdev_ops lut_ops = {
  * VSP1 Entity Operations
  */
 
+static void lut_prepare(struct vsp1_entity *entity,
+			struct vsp1_pipeline *pipe,
+			struct vsp1_dl_list *dl)
+{
+	struct vsp1_lut *lut = to_lut(&entity->subdev);
+
+	vsp1_lut_write(lut, dl, VI6_LUT_CTRL, VI6_LUT_CTRL_EN);
+}
+
 static void lut_configure(struct vsp1_entity *entity,
 			  struct vsp1_pipeline *pipe,
 			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+			  unsigned int partition)
 {
 	struct vsp1_lut *lut = to_lut(&entity->subdev);
 	struct vsp1_dl_body *dlb;
 	unsigned long flags;
 
-	switch (params) {
-	case VSP1_ENTITY_PARAMS_INIT:
-		vsp1_lut_write(lut, dl, VI6_LUT_CTRL, VI6_LUT_CTRL_EN);
-		break;
-
-	case VSP1_ENTITY_PARAMS_PARTITION:
-		break;
-
-	case VSP1_ENTITY_PARAMS_RUNTIME:
+	if (partition == 0) {
 		spin_lock_irqsave(&lut->lock, flags);
 		dlb = lut->lut;
 		lut->lut = NULL;
@@ -219,8 +220,6 @@ static void lut_configure(struct vsp1_entity *entity,
 			/* release our local reference */
 			vsp1_dl_fragment_put(dlb);
 		}
-
-		break;
 	}
 }
 
@@ -232,6 +231,7 @@ static void lut_destroy(struct vsp1_entity *entity)
 }
 
 static const struct vsp1_entity_operations lut_entity_ops = {
+	.prepare = lut_prepare,
 	.configure = lut_configure,
 	.destroy = lut_destroy,
 };
diff --git a/drivers/media/platform/vsp1/vsp1_rpf.c b/drivers/media/platform/vsp1/vsp1_rpf.c
index fe0633da5a5f..87a47997a086 100644
--- a/drivers/media/platform/vsp1/vsp1_rpf.c
+++ b/drivers/media/platform/vsp1/vsp1_rpf.c
@@ -46,10 +46,9 @@ static const struct v4l2_subdev_ops rpf_ops = {
  * VSP1 Entity Operations
  */
 
-static void rpf_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void rpf_prepare(struct vsp1_entity *entity,
+			struct vsp1_pipeline *pipe,
+			struct vsp1_dl_list *dl)
 {
 	struct vsp1_rwpf *rpf = to_rwpf(&entity->subdev);
 	const struct vsp1_format_info *fmtinfo = rpf->fmtinfo;
@@ -61,80 +60,6 @@ static void rpf_configure(struct vsp1_entity *entity,
 	u32 pstride;
 	u32 infmt;
 
-	if (params == VSP1_ENTITY_PARAMS_RUNTIME) {
-		vsp1_rpf_write(rpf, dl, VI6_RPF_VRTCOL_SET,
-			       rpf->alpha << VI6_RPF_VRTCOL_SET_LAYA_SHIFT);
-		vsp1_rpf_write(rpf, dl, VI6_RPF_MULT_ALPHA, rpf->mult_alpha |
-			       (rpf->alpha << VI6_RPF_MULT_ALPHA_RATIO_SHIFT));
-
-		vsp1_pipeline_propagate_alpha(pipe, dl, rpf->alpha);
-		return;
-	}
-
-	if (params == VSP1_ENTITY_PARAMS_PARTITION) {
-		struct vsp1_device *vsp1 = rpf->entity.vsp1;
-		struct vsp1_rwpf_memory mem = rpf->mem;
-		struct v4l2_rect crop;
-
-		/*
-		 * Source size and crop offsets.
-		 *
-		 * The crop offsets correspond to the location of the crop
-		 * rectangle top left corner in the plane buffer. Only two
-		 * offsets are needed, as planes 2 and 3 always have identical
-		 * strides.
-		 */
-		crop = *vsp1_rwpf_get_crop(rpf, rpf->entity.config);
-
-		/*
-		 * Partition Algorithm Control
-		 *
-		 * The partition algorithm can split this frame into multiple
-		 * slices. We must scale our partition window based on the pipe
-		 * configuration to match the destination partition window.
-		 * To achieve this, we adjust our crop to provide a 'sub-crop'
-		 * matching the expected partition window. Only 'left' and
-		 * 'width' need to be adjusted.
-		 */
-		if (pipe->partitions > 1) {
-			crop.width = pipe->partition->rpf.width;
-			crop.left += pipe->partition->rpf.left;
-		}
-
-		vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_BSIZE,
-			       (crop.width << VI6_RPF_SRC_BSIZE_BHSIZE_SHIFT) |
-			       (crop.height << VI6_RPF_SRC_BSIZE_BVSIZE_SHIFT));
-		vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_ESIZE,
-			       (crop.width << VI6_RPF_SRC_ESIZE_EHSIZE_SHIFT) |
-			       (crop.height << VI6_RPF_SRC_ESIZE_EVSIZE_SHIFT));
-
-		mem.addr[0] += crop.top * format->plane_fmt[0].bytesperline
-			     + crop.left * fmtinfo->bpp[0] / 8;
-
-		if (format->num_planes > 1) {
-			unsigned int offset;
-
-			offset = crop.top * format->plane_fmt[1].bytesperline
-			       + crop.left / fmtinfo->hsub
-			       * fmtinfo->bpp[1] / 8;
-			mem.addr[1] += offset;
-			mem.addr[2] += offset;
-		}
-
-		/*
-		 * On Gen3 hardware the SPUVS bit has no effect on 3-planar
-		 * formats. Swap the U and V planes manually in that case.
-		 */
-		if (vsp1->info->gen == 3 && format->num_planes == 3 &&
-		    fmtinfo->swap_uv)
-			swap(mem.addr[1], mem.addr[2]);
-
-		vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_Y, mem.addr[0]);
-		vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C0, mem.addr[1]);
-		vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C1, mem.addr[2]);
-		return;
-	}
-
 	/* Stride */
 	pstride = format->plane_fmt[0].bytesperline
 		<< VI6_RPF_SRCM_PSTRIDE_Y_SHIFT;
@@ -247,6 +172,86 @@ static void rpf_configure(struct vsp1_entity *entity,
 
 }
 
+static void rpf_configure(struct vsp1_entity *entity,
+			  struct vsp1_pipeline *pipe,
+			  struct vsp1_dl_list *dl,
+			  unsigned int partition)
+{
+	struct vsp1_rwpf *rpf = to_rwpf(&entity->subdev);
+	struct vsp1_rwpf_memory mem = rpf->mem;
+	struct vsp1_device *vsp1 = rpf->entity.vsp1;
+	const struct vsp1_format_info *fmtinfo = rpf->fmtinfo;
+	const struct v4l2_pix_format_mplane *format = &rpf->format;
+	struct v4l2_rect crop;
+
+	if (partition == 0) {
+		vsp1_rpf_write(rpf, dl, VI6_RPF_VRTCOL_SET,
+			       rpf->alpha << VI6_RPF_VRTCOL_SET_LAYA_SHIFT);
+		vsp1_rpf_write(rpf, dl, VI6_RPF_MULT_ALPHA, rpf->mult_alpha |
+			       (rpf->alpha << VI6_RPF_MULT_ALPHA_RATIO_SHIFT));
+
+		vsp1_pipeline_propagate_alpha(pipe, dl, rpf->alpha);
+	}
+
+
+	/*
+	 * Source size and crop offsets.
+	 *
+	 * The crop offsets correspond to the location of the crop
+	 * rectangle top left corner in the plane buffer. Only two
+	 * offsets are needed, as planes 2 and 3 always have identical
+	 * strides.
+	 */
+	crop = *vsp1_rwpf_get_crop(rpf, rpf->entity.config);
+
+	/*
+	 * Partition Algorithm Control
+	 *
+	 * The partition algorithm can split this frame into multiple
+	 * slices. We must scale our partition window based on the pipe
+	 * configuration to match the destination partition window.
+	 * To achieve this, we adjust our crop to provide a 'sub-crop'
+	 * matching the expected partition window. Only 'left' and
+	 * 'width' need to be adjusted.
+	 */
+	if (pipe->partitions > 1) {
+		crop.width = pipe->partition->rpf.width;
+		crop.left += pipe->partition->rpf.left;
+	}
+
+	vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_BSIZE,
+		       (crop.width << VI6_RPF_SRC_BSIZE_BHSIZE_SHIFT) |
+		       (crop.height << VI6_RPF_SRC_BSIZE_BVSIZE_SHIFT));
+	vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_ESIZE,
+		       (crop.width << VI6_RPF_SRC_ESIZE_EHSIZE_SHIFT) |
+		       (crop.height << VI6_RPF_SRC_ESIZE_EVSIZE_SHIFT));
+
+	mem.addr[0] += crop.top * format->plane_fmt[0].bytesperline
+		     + crop.left * fmtinfo->bpp[0] / 8;
+
+	if (format->num_planes > 1) {
+		unsigned int offset;
+
+		offset = crop.top * format->plane_fmt[1].bytesperline
+		       + crop.left / fmtinfo->hsub
+		       * fmtinfo->bpp[1] / 8;
+		mem.addr[1] += offset;
+		mem.addr[2] += offset;
+	}
+
+	/*
+	 * On Gen3 hardware the SPUVS bit has no effect on 3-planar
+	 * formats. Swap the U and V planes manually in that case.
+	 */
+	if (vsp1->info->gen == 3 && format->num_planes == 3 &&
+	    fmtinfo->swap_uv)
+		swap(mem.addr[1], mem.addr[2]);
+
+	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_Y, mem.addr[0]);
+	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C0, mem.addr[1]);
+	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C1, mem.addr[2]);
+}
+
 static void rpf_partition(struct vsp1_entity *entity,
 			  struct vsp1_pipeline *pipe,
 			  struct vsp1_partition *partition,
@@ -257,6 +262,7 @@ static void rpf_partition(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations rpf_entity_ops = {
+	.prepare = rpf_prepare,
 	.configure = rpf_configure,
 	.partition = rpf_partition,
 };
diff --git a/drivers/media/platform/vsp1/vsp1_sru.c b/drivers/media/platform/vsp1/vsp1_sru.c
index 51e5691187c3..0a24bc59bc2f 100644
--- a/drivers/media/platform/vsp1/vsp1_sru.c
+++ b/drivers/media/platform/vsp1/vsp1_sru.c
@@ -271,10 +271,9 @@ static const struct v4l2_subdev_ops sru_ops = {
  * VSP1 Entity Operations
  */
 
-static void sru_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void sru_prepare(struct vsp1_entity *entity,
+			struct vsp1_pipeline *pipe,
+			struct vsp1_dl_list *dl)
 {
 	const struct vsp1_sru_param *param;
 	struct vsp1_sru *sru = to_sru(&entity->subdev);
@@ -282,9 +281,6 @@ static void sru_configure(struct vsp1_entity *entity,
 	struct v4l2_mbus_framefmt *output;
 	u32 ctrl0;
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	input = vsp1_entity_get_pad_format(&sru->entity, sru->entity.config,
 					   SRU_PAD_SINK);
 	output = vsp1_entity_get_pad_format(&sru->entity, sru->entity.config,
@@ -351,7 +347,7 @@ static void sru_partition(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations sru_entity_ops = {
-	.configure = sru_configure,
+	.prepare = sru_prepare,
 	.max_width = sru_max_width,
 	.partition = sru_partition,
 };
diff --git a/drivers/media/platform/vsp1/vsp1_uds.c b/drivers/media/platform/vsp1/vsp1_uds.c
index 72f72a9d2152..84be962a33b1 100644
--- a/drivers/media/platform/vsp1/vsp1_uds.c
+++ b/drivers/media/platform/vsp1/vsp1_uds.c
@@ -259,10 +259,9 @@ static const struct v4l2_subdev_ops uds_ops = {
  * VSP1 Entity Operations
  */
 
-static void uds_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void uds_prepare(struct vsp1_entity *entity,
+			struct vsp1_pipeline *pipe,
+			struct vsp1_dl_list *dl)
 {
 	struct vsp1_uds *uds = to_uds(&entity->subdev);
 	const struct v4l2_mbus_framefmt *output;
@@ -276,27 +275,6 @@ static void uds_configure(struct vsp1_entity *entity,
 	output = vsp1_entity_get_pad_format(&uds->entity, uds->entity.config,
 					    UDS_PAD_SOURCE);
 
-	if (params == VSP1_ENTITY_PARAMS_PARTITION) {
-		struct vsp1_partition *partition = pipe->partition;
-
-		/* Input size clipping */
-		vsp1_uds_write(uds, dl, VI6_UDS_HSZCLIP, VI6_UDS_HSZCLIP_HCEN |
-			       (0 << VI6_UDS_HSZCLIP_HCL_OFST_SHIFT) |
-			       (partition->uds_sink.width
-					<< VI6_UDS_HSZCLIP_HCL_SIZE_SHIFT));
-
-		/* Output size clipping */
-		vsp1_uds_write(uds, dl, VI6_UDS_CLIP_SIZE,
-			       (partition->uds_source.width
-					<< VI6_UDS_CLIP_SIZE_HSIZE_SHIFT) |
-			       (output->height
-					<< VI6_UDS_CLIP_SIZE_VSIZE_SHIFT));
-		return;
-	}
-
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	hscale = uds_compute_ratio(input->width, output->width);
 	vscale = uds_compute_ratio(input->height, output->height);
 
@@ -328,6 +306,32 @@ static void uds_configure(struct vsp1_entity *entity,
 		       (vscale << VI6_UDS_SCALE_VFRAC_SHIFT));
 }
 
+static void uds_configure(struct vsp1_entity *entity,
+			  struct vsp1_pipeline *pipe,
+			  struct vsp1_dl_list *dl,
+			  unsigned int pindex)
+{
+	struct vsp1_uds *uds = to_uds(&entity->subdev);
+	struct vsp1_partition *partition = pipe->partition;
+	const struct v4l2_mbus_framefmt *output;
+
+	output = vsp1_entity_get_pad_format(&uds->entity, uds->entity.config,
+					    UDS_PAD_SOURCE);
+
+	/* Input size clipping */
+	vsp1_uds_write(uds, dl, VI6_UDS_HSZCLIP, VI6_UDS_HSZCLIP_HCEN |
+		       (0 << VI6_UDS_HSZCLIP_HCL_OFST_SHIFT) |
+		       (partition->uds_sink.width
+				<< VI6_UDS_HSZCLIP_HCL_SIZE_SHIFT));
+
+	/* Output size clipping */
+	vsp1_uds_write(uds, dl, VI6_UDS_CLIP_SIZE,
+		       (partition->uds_source.width
+				<< VI6_UDS_CLIP_SIZE_HSIZE_SHIFT) |
+		       (output->height
+				<< VI6_UDS_CLIP_SIZE_VSIZE_SHIFT));
+}
+
 static unsigned int uds_max_width(struct vsp1_entity *entity,
 				  struct vsp1_pipeline *pipe)
 {
@@ -384,6 +388,7 @@ static void uds_partition(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations uds_entity_ops = {
+	.prepare = uds_prepare,
 	.configure = uds_configure,
 	.max_width = uds_max_width,
 	.partition = uds_partition,
diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
index c2d3b8f0f487..bd5403f24dda 100644
--- a/drivers/media/platform/vsp1/vsp1_video.c
+++ b/drivers/media/platform/vsp1/vsp1_video.c
@@ -386,33 +386,18 @@ static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
 
 	pipe->partition = &pipe->part_table[partition];
 
-	list_for_each_entry(entity, &pipe->entities, list_pipe) {
-		if (entity->ops->configure)
-			entity->ops->configure(entity, pipe, dl,
-					       VSP1_ENTITY_PARAMS_PARTITION);
-	}
+	list_for_each_entry(entity, &pipe->entities, list_pipe)
+		vsp1_entity_configure(entity, pipe, dl, partition);
 }
 
 static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
 {
 	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
-	struct vsp1_entity *entity;
 	unsigned int partition;
 
 	if (!pipe->dl)
 		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
 
-	/*
-	 * Start with the runtime parameters as the configure operation can
-	 * compute/cache information needed when configuring partitions. This
-	 * is the case with flipping in the WPF.
-	 */
-	list_for_each_entry(entity, &pipe->entities, list_pipe) {
-		if (entity->ops->configure)
-			entity->ops->configure(entity, pipe, pipe->dl,
-					       VSP1_ENTITY_PARAMS_RUNTIME);
-	}
-
 	/* Run the first partition */
 	vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0);
 
@@ -840,10 +825,7 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 
 	list_for_each_entry(entity, &pipe->entities, list_pipe) {
 		vsp1_entity_route_setup(entity, pipe, pipe->dl);
-
-		if (entity->ops->configure)
-			entity->ops->configure(entity, pipe, pipe->dl,
-					       VSP1_ENTITY_PARAMS_INIT);
+		vsp1_entity_prepare(entity, pipe, pipe->dl);
 	}
 
 	return 0;
diff --git a/drivers/media/platform/vsp1/vsp1_wpf.c b/drivers/media/platform/vsp1/vsp1_wpf.c
index f7f3b4b2c2de..d6dd7e783d27 100644
--- a/drivers/media/platform/vsp1/vsp1_wpf.c
+++ b/drivers/media/platform/vsp1/vsp1_wpf.c
@@ -236,10 +236,9 @@ static void vsp1_wpf_destroy(struct vsp1_entity *entity)
 	vsp1_dlm_destroy(wpf->dlm);
 }
 
-static void wpf_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void wpf_prepare(struct vsp1_entity *entity,
+			struct vsp1_pipeline *pipe,
+			struct vsp1_dl_list *dl)
 {
 	struct vsp1_rwpf *wpf = to_rwpf(&entity->subdev);
 	struct vsp1_device *vsp1 = wpf->entity.vsp1;
@@ -249,149 +248,12 @@ static void wpf_configure(struct vsp1_entity *entity,
 	u32 outfmt = 0;
 	u32 srcrpf = 0;
 
-	if (params == VSP1_ENTITY_PARAMS_RUNTIME) {
-		const unsigned int mask = BIT(WPF_CTRL_VFLIP)
-					| BIT(WPF_CTRL_HFLIP);
-		unsigned long flags;
-
-		spin_lock_irqsave(&wpf->flip.lock, flags);
-		wpf->flip.active = (wpf->flip.active & ~mask)
-				 | (wpf->flip.pending & mask);
-		spin_unlock_irqrestore(&wpf->flip.lock, flags);
-
-		outfmt = (wpf->alpha << VI6_WPF_OUTFMT_PDV_SHIFT) | wpf->outfmt;
-
-		if (wpf->flip.active & BIT(WPF_CTRL_VFLIP))
-			outfmt |= VI6_WPF_OUTFMT_FLP;
-		if (wpf->flip.active & BIT(WPF_CTRL_HFLIP))
-			outfmt |= VI6_WPF_OUTFMT_HFLP;
-
-		vsp1_wpf_write(wpf, dl, VI6_WPF_OUTFMT, outfmt);
-		return;
-	}
-
 	sink_format = vsp1_entity_get_pad_format(&wpf->entity,
 						 wpf->entity.config,
 						 RWPF_PAD_SINK);
 	source_format = vsp1_entity_get_pad_format(&wpf->entity,
 						   wpf->entity.config,
 						   RWPF_PAD_SOURCE);
-
-	if (params == VSP1_ENTITY_PARAMS_PARTITION) {
-		const struct v4l2_pix_format_mplane *format = &wpf->format;
-		const struct vsp1_format_info *fmtinfo = wpf->fmtinfo;
-		struct vsp1_rwpf_memory mem = wpf->mem;
-		unsigned int flip = wpf->flip.active;
-		unsigned int width = sink_format->width;
-		unsigned int height = sink_format->height;
-		unsigned int offset;
-
-		/*
-		 * Cropping. The partition algorithm can split the image into
-		 * multiple slices.
-		 */
-		if (pipe->partitions > 1)
-			width = pipe->partition->wpf.width;
-
-		vsp1_wpf_write(wpf, dl, VI6_WPF_HSZCLIP, VI6_WPF_SZCLIP_EN |
-			       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
-			       (width << VI6_WPF_SZCLIP_SIZE_SHIFT));
-		vsp1_wpf_write(wpf, dl, VI6_WPF_VSZCLIP, VI6_WPF_SZCLIP_EN |
-			       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
-			       (height << VI6_WPF_SZCLIP_SIZE_SHIFT));
-
-		if (pipe->lif)
-			return;
-
-		/*
-		 * Update the memory offsets based on flipping configuration.
-		 * The destination addresses point to the locations where the
-		 * VSP starts writing to memory, which can be any corner of the
-		 * image depending on the combination of flipping and rotation.
-		 */
-
-		/*
-		 * First take the partition left coordinate into account.
-		 * Compute the offset to order the partitions correctly on the
-		 * output based on whether flipping is enabled. Consider
-		 * horizontal flipping when rotation is disabled but vertical
-		 * flipping when rotation is enabled, as rotating the image
-		 * switches the horizontal and vertical directions. The offset
-		 * is applied horizontally or vertically accordingly.
-		 */
-		if (flip & BIT(WPF_CTRL_HFLIP) && !wpf->flip.rotate)
-			offset = format->width - pipe->partition->wpf.left
-				- pipe->partition->wpf.width;
-		else if (flip & BIT(WPF_CTRL_VFLIP) && wpf->flip.rotate)
-			offset = format->height - pipe->partition->wpf.left
-				- pipe->partition->wpf.width;
-		else
-			offset = pipe->partition->wpf.left;
-
-		for (i = 0; i < format->num_planes; ++i) {
-			unsigned int hsub = i > 0 ? fmtinfo->hsub : 1;
-			unsigned int vsub = i > 0 ? fmtinfo->vsub : 1;
-
-			if (wpf->flip.rotate)
-				mem.addr[i] += offset / vsub
-					     * format->plane_fmt[i].bytesperline;
-			else
-				mem.addr[i] += offset / hsub
-					     * fmtinfo->bpp[i] / 8;
-		}
-
-		if (flip & BIT(WPF_CTRL_VFLIP)) {
-			/*
-			 * When rotating the output (after rotation) image
-			 * height is equal to the partition width (before
-			 * rotation). Otherwise it is equal to the output
-			 * image height.
-			 */
-			if (wpf->flip.rotate)
-				height = pipe->partition->wpf.width;
-			else
-				height = format->height;
-
-			mem.addr[0] += (height - 1)
-				     * format->plane_fmt[0].bytesperline;
-
-			if (format->num_planes > 1) {
-				offset = (height / fmtinfo->vsub - 1)
-				       * format->plane_fmt[1].bytesperline;
-				mem.addr[1] += offset;
-				mem.addr[2] += offset;
-			}
-		}
-
-		if (wpf->flip.rotate && !(flip & BIT(WPF_CTRL_HFLIP))) {
-			unsigned int hoffset = max(0, (int)format->width - 16);
-
-			/*
-			 * Compute the output coordinate. The partition
-			 * horizontal (left) offset becomes a vertical offset.
-			 */
-			for (i = 0; i < format->num_planes; ++i) {
-				unsigned int hsub = i > 0 ? fmtinfo->hsub : 1;
-
-				mem.addr[i] += hoffset / hsub
-					     * fmtinfo->bpp[i] / 8;
-			}
-		}
-
-		/*
-		 * On Gen3 hardware the SPUVS bit has no effect on 3-planar
-		 * formats. Swap the U and V planes manually in that case.
-		 */
-		if (vsp1->info->gen == 3 && format->num_planes == 3 &&
-		    fmtinfo->swap_uv)
-			swap(mem.addr[1], mem.addr[2]);
-
-		vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_Y, mem.addr[0]);
-		vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C0, mem.addr[1]);
-		vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C1, mem.addr[2]);
-		return;
-	}
-
 	/* Format */
 	if (!pipe->lif) {
 		const struct v4l2_pix_format_mplane *format = &wpf->format;
@@ -465,6 +327,158 @@ static void wpf_configure(struct vsp1_entity *entity,
 			   VI6_WFP_IRQ_ENB_DFEE);
 }
 
+static void wpf_configure(struct vsp1_entity *entity,
+			  struct vsp1_pipeline *pipe,
+			  struct vsp1_dl_list *dl,
+			  unsigned int partition)
+{
+	struct vsp1_rwpf *wpf = to_rwpf(&entity->subdev);
+	struct vsp1_device *vsp1 = wpf->entity.vsp1;
+	struct vsp1_rwpf_memory mem = wpf->mem;
+	const struct v4l2_mbus_framefmt *sink_format;
+	const struct v4l2_pix_format_mplane *format = &wpf->format;
+	const struct vsp1_format_info *fmtinfo = wpf->fmtinfo;
+	unsigned int flip;
+	unsigned int i;
+	unsigned int width;
+	unsigned int height;
+	unsigned int offset;
+	u32 outfmt = 0;
+
+	/* Handle the per frame constants */
+	if (partition == 0) {
+		const unsigned int mask = BIT(WPF_CTRL_VFLIP)
+					| BIT(WPF_CTRL_HFLIP);
+		unsigned long flags;
+
+		spin_lock_irqsave(&wpf->flip.lock, flags);
+		wpf->flip.active = (wpf->flip.active & ~mask)
+				 | (wpf->flip.pending & mask);
+		spin_unlock_irqrestore(&wpf->flip.lock, flags);
+
+		outfmt = (wpf->alpha << VI6_WPF_OUTFMT_PDV_SHIFT) | wpf->outfmt;
+
+		if (wpf->flip.active & BIT(WPF_CTRL_VFLIP))
+			outfmt |= VI6_WPF_OUTFMT_FLP;
+		if (wpf->flip.active & BIT(WPF_CTRL_HFLIP))
+			outfmt |= VI6_WPF_OUTFMT_HFLP;
+
+		vsp1_wpf_write(wpf, dl, VI6_WPF_OUTFMT, outfmt);
+	}
+
+	sink_format = vsp1_entity_get_pad_format(&wpf->entity,
+						 wpf->entity.config,
+						 RWPF_PAD_SINK);
+	width = sink_format->width;
+	height = sink_format->height;
+
+	/*
+	 * Cropping. The partition algorithm can split the image into
+	 * multiple slices.
+	 */
+	if (pipe->partitions > 1)
+		width = pipe->partition->wpf.width;
+
+	vsp1_wpf_write(wpf, dl, VI6_WPF_HSZCLIP, VI6_WPF_SZCLIP_EN |
+		       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
+		       (width << VI6_WPF_SZCLIP_SIZE_SHIFT));
+	vsp1_wpf_write(wpf, dl, VI6_WPF_VSZCLIP, VI6_WPF_SZCLIP_EN |
+		       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
+		       (height << VI6_WPF_SZCLIP_SIZE_SHIFT));
+
+	if (pipe->lif)
+		return;
+
+	/*
+	 * Update the memory offsets based on flipping configuration.
+	 * The destination addresses point to the locations where the
+	 * VSP starts writing to memory, which can be any corner of the
+	 * image depending on the combination of flipping and rotation.
+	 */
+
+	/*
+	 * First take the partition left coordinate into account.
+	 * Compute the offset to order the partitions correctly on the
+	 * output based on whether flipping is enabled. Consider
+	 * horizontal flipping when rotation is disabled but vertical
+	 * flipping when rotation is enabled, as rotating the image
+	 * switches the horizontal and vertical directions. The offset
+	 * is applied horizontally or vertically accordingly.
+	 */
+	flip = wpf->flip.active;
+
+	if (flip & BIT(WPF_CTRL_HFLIP) && !wpf->flip.rotate)
+		offset = format->width - pipe->partition->wpf.left
+			- pipe->partition->wpf.width;
+	else if (flip & BIT(WPF_CTRL_VFLIP) && wpf->flip.rotate)
+		offset = format->height - pipe->partition->wpf.left
+			- pipe->partition->wpf.width;
+	else
+		offset = pipe->partition->wpf.left;
+
+	for (i = 0; i < format->num_planes; ++i) {
+		unsigned int hsub = i > 0 ? fmtinfo->hsub : 1;
+		unsigned int vsub = i > 0 ? fmtinfo->vsub : 1;
+
+		if (wpf->flip.rotate)
+			mem.addr[i] += offset / vsub
+				     * format->plane_fmt[i].bytesperline;
+		else
+			mem.addr[i] += offset / hsub
+				     * fmtinfo->bpp[i] / 8;
+	}
+
+	if (flip & BIT(WPF_CTRL_VFLIP)) {
+		/*
+		 * When rotating the output (after rotation) image
+		 * height is equal to the partition width (before
+		 * rotation). Otherwise it is equal to the output
+		 * image height.
+		 */
+		if (wpf->flip.rotate)
+			height = pipe->partition->wpf.width;
+		else
+			height = format->height;
+
+		mem.addr[0] += (height - 1)
+			     * format->plane_fmt[0].bytesperline;
+
+		if (format->num_planes > 1) {
+			offset = (height / fmtinfo->vsub - 1)
+			       * format->plane_fmt[1].bytesperline;
+			mem.addr[1] += offset;
+			mem.addr[2] += offset;
+		}
+	}
+
+	if (wpf->flip.rotate && !(flip & BIT(WPF_CTRL_HFLIP))) {
+		unsigned int hoffset = max(0, (int)format->width - 16);
+
+		/*
+		 * Compute the output coordinate. The partition
+		 * horizontal (left) offset becomes a vertical offset.
+		 */
+		for (i = 0; i < format->num_planes; ++i) {
+			unsigned int hsub = i > 0 ? fmtinfo->hsub : 1;
+
+			mem.addr[i] += hoffset / hsub
+				     * fmtinfo->bpp[i] / 8;
+		}
+	}
+
+	/*
+	 * On Gen3 hardware the SPUVS bit has no effect on 3-planar
+	 * formats. Swap the U and V planes manually in that case.
+	 */
+	if (vsp1->info->gen == 3 && format->num_planes == 3 &&
+	    fmtinfo->swap_uv)
+		swap(mem.addr[1], mem.addr[2]);
+
+	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_Y, mem.addr[0]);
+	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C0, mem.addr[1]);
+	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C1, mem.addr[2]);
+}
+
 static unsigned int wpf_max_width(struct vsp1_entity *entity,
 				  struct vsp1_pipeline *pipe)
 {
@@ -484,6 +498,7 @@ static void wpf_partition(struct vsp1_entity *entity,
 
 static const struct vsp1_entity_operations wpf_entity_ops = {
 	.destroy = vsp1_wpf_destroy,
+	.prepare = wpf_prepare,
 	.configure = wpf_configure,
 	.max_width = wpf_max_width,
 	.partition = wpf_partition,
-- 
git-series 0.9.1
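
For readers following the split above, here is a minimal user-space sketch
of the call ordering this patch sets up: prepare runs once at stream setup,
configure runs once per partition, and per-frame constants are handled only
when partition == 0 (as wpf_configure() does). All sketch_* names and the
counters are hypothetical stand-ins, not the driver's types or API.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration only; not the vsp1 structures. */
struct sketch_entity;

struct sketch_entity_ops {
	void (*prepare)(struct sketch_entity *e);
	void (*configure)(struct sketch_entity *e, unsigned int partition);
};

struct sketch_entity {
	const struct sketch_entity_ops *ops;
	unsigned int stream_writes;	/* constant stream configuration */
	unsigned int frame_writes;	/* per-frame constants */
	unsigned int partition_writes;	/* per-partition parameters */
};

static void sketch_prepare(struct sketch_entity *e)
{
	e->stream_writes++;
}

static void sketch_configure(struct sketch_entity *e, unsigned int partition)
{
	/* Per-frame constants are only handled for the first partition. */
	if (partition == 0)
		e->frame_writes++;

	e->partition_writes++;
}

static const struct sketch_entity_ops sketch_ops = {
	.prepare = sketch_prepare,
	.configure = sketch_configure,
};

/* Wrappers that tolerate entities without the operation. */
static void sketch_entity_prepare(struct sketch_entity *e)
{
	if (e->ops->prepare)
		e->ops->prepare(e);
}

static void sketch_entity_configure(struct sketch_entity *e,
				    unsigned int partition)
{
	if (e->ops->configure)
		e->ops->configure(e, partition);
}

/* One stream setup, then 'frames' frames of 'partitions' partitions each. */
static void sketch_run_stream(struct sketch_entity *e, unsigned int frames,
			      unsigned int partitions)
{
	unsigned int f, p;

	sketch_entity_prepare(e);

	for (f = 0; f < frames; ++f)
		for (p = 0; p < partitions; ++p)
			sketch_entity_configure(e, p);
}
```

In this model the stream configuration is written once however many frames
are processed, which is the property a cached body would rely on.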

* [PATCH v2 6/8] v4l: vsp1: Adapt entities to configure into a body
  2017-08-14 15:13 [PATCH v2 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
                   ` (4 preceding siblings ...)
  2017-08-14 15:13 ` [PATCH v2 5/8] v4l: vsp1: Refactor display list configure operations Kieran Bingham
@ 2017-08-14 15:13 ` Kieran Bingham
  2017-08-17 17:58   ` Laurent Pinchart
  2017-08-14 15:13 ` [PATCH v2 7/8] v4l: vsp1: Move video configuration to a cached dlb Kieran Bingham
  2017-08-14 15:13 ` [PATCH v2 8/8] v4l: vsp1: Reduce display list body size Kieran Bingham
  7 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-08-14 15:13 UTC (permalink / raw)
  To: laurent.pinchart, linux-renesas-soc, linux-media; +Cc: Kieran Bingham

Currently the entities write their configuration into a display list.
Adapt them to write directly into a display list body fragment instead,
allowing greater flexibility and control over the content.

All users of vsp1_dl_list_write() are converted in the process, so the
function itself is removed.

A helper, vsp1_dl_list_body(), is provided to access the internal body0
of the display list.
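
As a rough illustration of the resulting calling convention, the sketch
below models a body, a list holding a default body, and an entity that
writes registers into whichever body it is handed. It is a simplified
user-space model: the demo_* names, the capacity and the register offsets
are assumptions made for the example and do not match the driver.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_ENTRIES 8	/* hypothetical capacity, not the driver's */

/* Hypothetical stand-ins; the real structures live in vsp1_dl.c. */
struct demo_dl_entry {
	uint32_t addr;
	uint32_t data;
};

struct demo_dl_body {
	struct demo_dl_entry entries[DEMO_MAX_ENTRIES];
	unsigned int num_entries;
};

struct demo_dl_list {
	struct demo_dl_body *body0;
};

/* Models vsp1_dl_fragment_write(): append one register write to a body. */
static void demo_fragment_write(struct demo_dl_body *dlb, uint32_t reg,
				uint32_t data)
{
	/* Assumed overflow guard: silently drop writes beyond capacity. */
	if (dlb->num_entries >= DEMO_MAX_ENTRIES)
		return;

	dlb->entries[dlb->num_entries].addr = reg;
	dlb->entries[dlb->num_entries].data = data;
	dlb->num_entries++;
}

/* Models vsp1_dl_list_body(): expose the list's default body. */
static struct demo_dl_body *demo_list_body(struct demo_dl_list *dl)
{
	return dl->body0;
}

/* An entity operation now takes a body rather than a list. */
static void demo_entity_prepare(struct demo_dl_body *dlb)
{
	demo_fragment_write(dlb, 0x0100, 0x1);	/* made-up register */
	demo_fragment_write(dlb, 0x0104, 0x2);	/* made-up register */
}

/* Caller side: fetch the default body and pass it to the entity. */
static unsigned int demo_usage(void)
{
	struct demo_dl_body body = { .num_entries = 0 };
	struct demo_dl_list dl = { .body0 = &body };

	demo_entity_prepare(demo_list_body(&dl));

	return body.num_entries;
}
```

Because the entity only sees a body, the caller is free to pass the list's
default body or any other body it owns.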

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
---
 drivers/media/platform/vsp1/vsp1_bru.c    | 22 ++++++------
 drivers/media/platform/vsp1/vsp1_clu.c    | 22 ++++++------
 drivers/media/platform/vsp1/vsp1_dl.c     | 12 ++-----
 drivers/media/platform/vsp1/vsp1_dl.h     |  2 +-
 drivers/media/platform/vsp1/vsp1_drm.c    | 14 +++++---
 drivers/media/platform/vsp1/vsp1_entity.c | 16 ++++-----
 drivers/media/platform/vsp1/vsp1_entity.h | 12 ++++---
 drivers/media/platform/vsp1/vsp1_hgo.c    | 16 ++++-----
 drivers/media/platform/vsp1/vsp1_hgt.c    | 18 +++++-----
 drivers/media/platform/vsp1/vsp1_hsit.c   | 10 +++---
 drivers/media/platform/vsp1/vsp1_lif.c    | 13 +++----
 drivers/media/platform/vsp1/vsp1_lut.c    | 21 ++++++------
 drivers/media/platform/vsp1/vsp1_pipe.c   |  4 +-
 drivers/media/platform/vsp1/vsp1_pipe.h   |  3 +-
 drivers/media/platform/vsp1/vsp1_rpf.c    | 43 +++++++++++-------------
 drivers/media/platform/vsp1/vsp1_sru.c    | 14 ++++----
 drivers/media/platform/vsp1/vsp1_uds.c    | 24 +++++++------
 drivers/media/platform/vsp1/vsp1_uds.h    |  2 +-
 drivers/media/platform/vsp1/vsp1_video.c  | 11 ++++--
 drivers/media/platform/vsp1/vsp1_wpf.c    | 42 ++++++++++++-----------
 20 files changed, 168 insertions(+), 153 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_bru.c b/drivers/media/platform/vsp1/vsp1_bru.c
index b9ff96f76b3e..652b42e3ec2d 100644
--- a/drivers/media/platform/vsp1/vsp1_bru.c
+++ b/drivers/media/platform/vsp1/vsp1_bru.c
@@ -30,10 +30,10 @@
  * Device Access
  */
 
-static inline void vsp1_bru_write(struct vsp1_bru *bru, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_bru_write(struct vsp1_bru *bru,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, bru->base + reg, data);
+	vsp1_dl_fragment_write(dlb, bru->base + reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -287,7 +287,7 @@ static const struct v4l2_subdev_ops bru_ops = {
 
 static void bru_prepare(struct vsp1_entity *entity,
 			struct vsp1_pipeline *pipe,
-			struct vsp1_dl_list *dl)
+			struct vsp1_dl_body *dlb)
 {
 	struct vsp1_bru *bru = to_bru(&entity->subdev);
 	struct v4l2_mbus_framefmt *format;
@@ -309,7 +309,7 @@ static void bru_prepare(struct vsp1_entity *entity,
 	 * format at the pipeline output is premultiplied.
 	 */
 	flags = pipe->output ? pipe->output->format.flags : 0;
-	vsp1_bru_write(bru, dl, VI6_BRU_INCTRL,
+	vsp1_bru_write(bru, dlb, VI6_BRU_INCTRL,
 		       flags & V4L2_PIX_FMT_FLAG_PREMUL_ALPHA ?
 		       0 : VI6_BRU_INCTRL_NRM);
 
@@ -317,12 +317,12 @@ static void bru_prepare(struct vsp1_entity *entity,
 	 * Set the background position to cover the whole output image and
 	 * configure its color.
 	 */
-	vsp1_bru_write(bru, dl, VI6_BRU_VIRRPF_SIZE,
+	vsp1_bru_write(bru, dlb, VI6_BRU_VIRRPF_SIZE,
 		       (format->width << VI6_BRU_VIRRPF_SIZE_HSIZE_SHIFT) |
 		       (format->height << VI6_BRU_VIRRPF_SIZE_VSIZE_SHIFT));
-	vsp1_bru_write(bru, dl, VI6_BRU_VIRRPF_LOC, 0);
+	vsp1_bru_write(bru, dlb, VI6_BRU_VIRRPF_LOC, 0);
 
-	vsp1_bru_write(bru, dl, VI6_BRU_VIRRPF_COL, bru->bgcolor |
+	vsp1_bru_write(bru, dlb, VI6_BRU_VIRRPF_COL, bru->bgcolor |
 		       (0xff << VI6_BRU_VIRRPF_COL_A_SHIFT));
 
 	/*
@@ -332,7 +332,7 @@ static void bru_prepare(struct vsp1_entity *entity,
 	 * unit.
 	 */
 	if (entity->type == VSP1_ENTITY_BRU)
-		vsp1_bru_write(bru, dl, VI6_BRU_ROP,
+		vsp1_bru_write(bru, dlb, VI6_BRU_ROP,
 			       VI6_BRU_ROP_DSTSEL_BRUIN(1) |
 			       VI6_BRU_ROP_CROP(VI6_ROP_NOP) |
 			       VI6_BRU_ROP_AROP(VI6_ROP_NOP));
@@ -374,7 +374,7 @@ static void bru_prepare(struct vsp1_entity *entity,
 		if (!(entity->type == VSP1_ENTITY_BRU && i == 1))
 			ctrl |= VI6_BRU_CTRL_SRCSEL_BRUIN(i);
 
-		vsp1_bru_write(bru, dl, VI6_BRU_CTRL(i), ctrl);
+		vsp1_bru_write(bru, dlb, VI6_BRU_CTRL(i), ctrl);
 
 		/*
 		 * Harcode the blending formula to
@@ -389,7 +389,7 @@ static void bru_prepare(struct vsp1_entity *entity,
 		 *
 		 * otherwise.
 		 */
-		vsp1_bru_write(bru, dl, VI6_BRU_BLD(i),
+		vsp1_bru_write(bru, dlb, VI6_BRU_BLD(i),
 			       VI6_BRU_BLD_CCMDX_255_SRC_A |
 			       (premultiplied ? VI6_BRU_BLD_CCMDY_COEFY :
 						VI6_BRU_BLD_CCMDY_SRC_A) |
diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
index 5f65ce3ad97f..b59f68bbb259 100644
--- a/drivers/media/platform/vsp1/vsp1_clu.c
+++ b/drivers/media/platform/vsp1/vsp1_clu.c
@@ -29,10 +29,10 @@
  * Device Access
  */
 
-static inline void vsp1_clu_write(struct vsp1_clu *clu, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_clu_write(struct vsp1_clu *clu,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_fragment_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -215,7 +215,7 @@ static const struct v4l2_subdev_ops clu_ops = {
  */
 static void clu_prepare(struct vsp1_entity *entity,
 			struct vsp1_pipeline *pipe,
-			struct vsp1_dl_list *dl)
+			struct vsp1_dl_body *dlb)
 {
 	struct vsp1_clu *clu = to_clu(&entity->subdev);
 
@@ -235,14 +235,14 @@ static void clu_prepare(struct vsp1_entity *entity,
 static void clu_configure(struct vsp1_entity *entity,
 			  struct vsp1_pipeline *pipe,
 			  struct vsp1_dl_list *dl,
+			  struct vsp1_dl_body *dlb,
 			  unsigned int partition)
 {
 	struct vsp1_clu *clu = to_clu(&entity->subdev);
-	struct vsp1_dl_body *dlb;
+	struct vsp1_dl_body *clu_dlb;
 	unsigned long flags;
 	u32 ctrl = VI6_CLU_CTRL_AAI | VI6_CLU_CTRL_MVS | VI6_CLU_CTRL_EN;
 
-
 	if (partition == 0) {
 		/* 2D mode can only be used with the YCbCr pixel encoding. */
 		if (clu->mode == V4L2_CID_VSP1_CLU_MODE_2D && clu->yuv_mode)
@@ -250,18 +250,18 @@ static void clu_configure(struct vsp1_entity *entity,
 			     |  VI6_CLU_CTRL_OS0_2D | VI6_CLU_CTRL_OS1_2D
 			     |  VI6_CLU_CTRL_OS2_2D | VI6_CLU_CTRL_M2D;
 
-		vsp1_clu_write(clu, dl, VI6_CLU_CTRL, ctrl);
+		vsp1_clu_write(clu, dlb, VI6_CLU_CTRL, ctrl);
 
 		spin_lock_irqsave(&clu->lock, flags);
-		dlb = clu->clu;
+		clu_dlb = clu->clu;
 		clu->clu = NULL;
 		spin_unlock_irqrestore(&clu->lock, flags);
 
-		if (dlb) {
-			vsp1_dl_list_add_fragment(dl, dlb);
+		if (clu_dlb) {
+			vsp1_dl_list_add_fragment(dl, clu_dlb);
 
 			/* release our local reference */
-			vsp1_dl_fragment_put(dlb);
+			vsp1_dl_fragment_put(clu_dlb);
 		}
 	}
 }
diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index 37feda248946..176a258146ac 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -412,17 +412,15 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl)
 }
 
 /**
- * vsp1_dl_list_write - Write a register to the display list
+ * vsp1_dl_list_body - Obtain the default body for the display list
  * @dl: The display list
- * @reg: The register address
- * @data: The register value
  *
- * Write the given register and value to the display list. Up to 256 registers
- * can be written per display list.
+ * Obtain a pointer to the internal display list body allowing this to be passed
+ * directly to configure operations.
  */
-void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
+struct vsp1_dl_body *vsp1_dl_list_body(struct vsp1_dl_list *dl)
 {
-	vsp1_dl_fragment_write(dl->body0, reg, data);
+	return dl->body0;
 }
 
 /**
diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
index e1718c3cbb7b..5310821175d5 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.h
+++ b/drivers/media/platform/vsp1/vsp1_dl.h
@@ -32,7 +32,7 @@ bool vsp1_dlm_irq_frame_end(struct vsp1_dl_manager *dlm);
 
 struct vsp1_dl_list *vsp1_dl_list_get(struct vsp1_dl_manager *dlm);
 void vsp1_dl_list_put(struct vsp1_dl_list *dl);
-void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data);
+struct vsp1_dl_body *vsp1_dl_list_body(struct vsp1_dl_list *dl);
 void vsp1_dl_list_commit(struct vsp1_dl_list *dl);
 
 struct vsp1_dl_fragment_pool *
diff --git a/drivers/media/platform/vsp1/vsp1_drm.c b/drivers/media/platform/vsp1/vsp1_drm.c
index 2a4fcb866629..08fb41717d1b 100644
--- a/drivers/media/platform/vsp1/vsp1_drm.c
+++ b/drivers/media/platform/vsp1/vsp1_drm.c
@@ -487,6 +487,7 @@ void vsp1_du_atomic_flush(struct device *dev, unsigned int pipe_index)
 	struct vsp1_entity *entity;
 	struct vsp1_entity *next;
 	struct vsp1_dl_list *dl;
+	struct vsp1_dl_body *dlb;
 	const char *bru_name;
 	unsigned long flags;
 	unsigned int i;
@@ -497,6 +498,9 @@ void vsp1_du_atomic_flush(struct device *dev, unsigned int pipe_index)
 	/* Prepare the display list. */
 	dl = vsp1_dl_list_get(pipe->output->dlm);
 
+	/* Retrieve the default DLB from the list */
+	dlb = vsp1_dl_list_body(dl);
+
 	/* Count the number of enabled inputs and sort them by Z-order. */
 	pipe->num_inputs = 0;
 
@@ -549,17 +553,17 @@ void vsp1_du_atomic_flush(struct device *dev, unsigned int pipe_index)
 		/* Disconnect unused RPFs from the pipeline. */
 		if (entity->type == VSP1_ENTITY_RPF &&
 		    !pipe->inputs[entity->index]) {
-			vsp1_dl_list_write(dl, entity->route->reg,
-					   VI6_DPR_NODE_UNUSED);
+			vsp1_dl_fragment_write(dlb, entity->route->reg,
+					       VI6_DPR_NODE_UNUSED);
 
 			list_del_init(&entity->list_pipe);
 
 			continue;
 		}
 
-		vsp1_entity_route_setup(entity, pipe, dl);
-		vsp1_entity_prepare(entity, pipe, dl);
-		vsp1_entity_configure(entity, pipe, dl, 0);
+		vsp1_entity_route_setup(entity, pipe, dlb);
+		vsp1_entity_prepare(entity, pipe, dlb);
+		vsp1_entity_configure(entity, pipe, dl, dlb, 0);
 	}
 
 	vsp1_dl_list_commit(dl);
diff --git a/drivers/media/platform/vsp1/vsp1_entity.c b/drivers/media/platform/vsp1/vsp1_entity.c
index 76f240f005af..012654347be2 100644
--- a/drivers/media/platform/vsp1/vsp1_entity.c
+++ b/drivers/media/platform/vsp1/vsp1_entity.c
@@ -26,7 +26,7 @@
 
 void vsp1_entity_route_setup(struct vsp1_entity *entity,
 			     struct vsp1_pipeline *pipe,
-			     struct vsp1_dl_list *dl)
+			     struct vsp1_dl_body *dlb)
 {
 	struct vsp1_entity *source;
 	u32 route;
@@ -42,7 +42,7 @@ void vsp1_entity_route_setup(struct vsp1_entity *entity,
 		smppt = (pipe->output->entity.index << VI6_DPR_SMPPT_TGW_SHIFT)
 		      | (source->route->output << VI6_DPR_SMPPT_PT_SHIFT);
 
-		vsp1_dl_list_write(dl, VI6_DPR_HGO_SMPPT, smppt);
+		vsp1_dl_fragment_write(dlb, VI6_DPR_HGO_SMPPT, smppt);
 		return;
 	} else if (entity->type == VSP1_ENTITY_HGT) {
 		u32 smppt;
@@ -55,7 +55,7 @@ void vsp1_entity_route_setup(struct vsp1_entity *entity,
 		smppt = (pipe->output->entity.index << VI6_DPR_SMPPT_TGW_SHIFT)
 		      | (source->route->output << VI6_DPR_SMPPT_PT_SHIFT);
 
-		vsp1_dl_list_write(dl, VI6_DPR_HGT_SMPPT, smppt);
+		vsp1_dl_fragment_write(dlb, VI6_DPR_HGT_SMPPT, smppt);
 		return;
 	}
 
@@ -70,22 +70,22 @@ void vsp1_entity_route_setup(struct vsp1_entity *entity,
 	 */
 	if (source->type == VSP1_ENTITY_BRS)
 		route |= VI6_DPR_ROUTE_BRSSEL;
-	vsp1_dl_list_write(dl, source->route->reg, route);
+	vsp1_dl_fragment_write(dlb, source->route->reg, route);
 }
 
 void vsp1_entity_prepare(struct vsp1_entity *entity, struct vsp1_pipeline *pipe,
-			 struct vsp1_dl_list *dl)
+			 struct vsp1_dl_body *dlb)
 {
 	if (entity->ops->prepare)
-		entity->ops->prepare(entity, pipe, dl);
+		entity->ops->prepare(entity, pipe, dlb);
 }
 
 void vsp1_entity_configure(struct vsp1_entity *entity,
 			   struct vsp1_pipeline *pipe, struct vsp1_dl_list *dl,
-			   unsigned int partition)
+			   struct vsp1_dl_body *dlb, unsigned int partition)
 {
 	if (entity->ops->configure)
-		entity->ops->configure(entity, pipe, dl, partition);
+		entity->ops->configure(entity, pipe, dl, dlb, partition);
 }
 
 /* -----------------------------------------------------------------------------
diff --git a/drivers/media/platform/vsp1/vsp1_entity.h b/drivers/media/platform/vsp1/vsp1_entity.h
index 2f33e343ccc6..4eb8afd7e402 100644
--- a/drivers/media/platform/vsp1/vsp1_entity.h
+++ b/drivers/media/platform/vsp1/vsp1_entity.h
@@ -19,6 +19,7 @@
 #include <media/v4l2-subdev.h>
 
 struct vsp1_device;
+struct vsp1_dl_body;
 struct vsp1_dl_list;
 struct vsp1_pipeline;
 struct vsp1_partition;
@@ -80,9 +81,10 @@ struct vsp1_route {
 struct vsp1_entity_operations {
 	void (*destroy)(struct vsp1_entity *);
 	void (*prepare)(struct vsp1_entity *, struct vsp1_pipeline *,
-			struct vsp1_dl_list *);
+			struct vsp1_dl_body *);
 	void (*configure)(struct vsp1_entity *, struct vsp1_pipeline *,
-			  struct vsp1_dl_list *, unsigned int partition);
+			  struct vsp1_dl_list *, struct vsp1_dl_body *,
+			  unsigned int partition);
 	unsigned int (*max_width)(struct vsp1_entity *, struct vsp1_pipeline *);
 	void (*partition)(struct vsp1_entity *, struct vsp1_pipeline *,
 			  struct vsp1_partition *, unsigned int,
@@ -147,12 +149,12 @@ int vsp1_entity_init_cfg(struct v4l2_subdev *subdev,
 
 void vsp1_entity_route_setup(struct vsp1_entity *entity,
 			     struct vsp1_pipeline *pipe,
-			     struct vsp1_dl_list *dl);
+			     struct vsp1_dl_body *dlb);
 void vsp1_entity_prepare(struct vsp1_entity *entity, struct vsp1_pipeline *pipe,
-			 struct vsp1_dl_list *dl);
+			 struct vsp1_dl_body *dlb);
 void vsp1_entity_configure(struct vsp1_entity *entity,
 			   struct vsp1_pipeline *pipe, struct vsp1_dl_list *dl,
-			   unsigned int partition);
+			   struct vsp1_dl_body *dlb, unsigned int partition);
 
 struct media_pad *vsp1_entity_remote_pad(struct media_pad *pad);
 
diff --git a/drivers/media/platform/vsp1/vsp1_hgo.c b/drivers/media/platform/vsp1/vsp1_hgo.c
index 5705ba67dbc8..89ccc2a02155 100644
--- a/drivers/media/platform/vsp1/vsp1_hgo.c
+++ b/drivers/media/platform/vsp1/vsp1_hgo.c
@@ -32,10 +32,10 @@ static inline u32 vsp1_hgo_read(struct vsp1_hgo *hgo, u32 reg)
 	return vsp1_read(hgo->histo.entity.vsp1, reg);
 }
 
-static inline void vsp1_hgo_write(struct vsp1_hgo *hgo, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_hgo_write(struct vsp1_hgo *hgo,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_fragment_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -135,7 +135,7 @@ static const struct v4l2_ctrl_config hgo_num_bins_control = {
 
 static void hgo_prepare(struct vsp1_entity *entity,
 			struct vsp1_pipeline *pipe,
-			struct vsp1_dl_list *dl)
+			struct vsp1_dl_body *dlb)
 {
 	struct vsp1_hgo *hgo = to_hgo(&entity->subdev);
 	struct v4l2_rect *compose;
@@ -149,12 +149,12 @@ static void hgo_prepare(struct vsp1_entity *entity,
 						HISTO_PAD_SINK,
 						V4L2_SEL_TGT_COMPOSE);
 
-	vsp1_hgo_write(hgo, dl, VI6_HGO_REGRST, VI6_HGO_REGRST_RCLEA);
+	vsp1_hgo_write(hgo, dlb, VI6_HGO_REGRST, VI6_HGO_REGRST_RCLEA);
 
-	vsp1_hgo_write(hgo, dl, VI6_HGO_OFFSET,
+	vsp1_hgo_write(hgo, dlb, VI6_HGO_OFFSET,
 		       (crop->left << VI6_HGO_OFFSET_HOFFSET_SHIFT) |
 		       (crop->top << VI6_HGO_OFFSET_VOFFSET_SHIFT));
-	vsp1_hgo_write(hgo, dl, VI6_HGO_SIZE,
+	vsp1_hgo_write(hgo, dlb, VI6_HGO_SIZE,
 		       (crop->width << VI6_HGO_SIZE_HSIZE_SHIFT) |
 		       (crop->height << VI6_HGO_SIZE_VSIZE_SHIFT));
 
@@ -166,7 +166,7 @@ static void hgo_prepare(struct vsp1_entity *entity,
 
 	hratio = crop->width * 2 / compose->width / 3;
 	vratio = crop->height * 2 / compose->height / 3;
-	vsp1_hgo_write(hgo, dl, VI6_HGO_MODE,
+	vsp1_hgo_write(hgo, dlb, VI6_HGO_MODE,
 		       (hgo->num_bins == 256 ? VI6_HGO_MODE_STEP : 0) |
 		       (hgo->max_rgb ? VI6_HGO_MODE_MAXRGB : 0) |
 		       (hratio << VI6_HGO_MODE_HRATIO_SHIFT) |
diff --git a/drivers/media/platform/vsp1/vsp1_hgt.c b/drivers/media/platform/vsp1/vsp1_hgt.c
index bdd1247e090f..ae92cacdf95e 100644
--- a/drivers/media/platform/vsp1/vsp1_hgt.c
+++ b/drivers/media/platform/vsp1/vsp1_hgt.c
@@ -32,10 +32,10 @@ static inline u32 vsp1_hgt_read(struct vsp1_hgt *hgt, u32 reg)
 	return vsp1_read(hgt->histo.entity.vsp1, reg);
 }
 
-static inline void vsp1_hgt_write(struct vsp1_hgt *hgt, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_hgt_write(struct vsp1_hgt *hgt,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_fragment_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -131,7 +131,7 @@ static const struct v4l2_ctrl_config hgt_hue_areas = {
 
 static void hgt_prepare(struct vsp1_entity *entity,
 			struct vsp1_pipeline *pipe,
-			struct vsp1_dl_list *dl)
+			struct vsp1_dl_body *dlb)
 {
 	struct vsp1_hgt *hgt = to_hgt(&entity->subdev);
 	struct v4l2_rect *compose;
@@ -148,12 +148,12 @@ static void hgt_prepare(struct vsp1_entity *entity,
 						HISTO_PAD_SINK,
 						V4L2_SEL_TGT_COMPOSE);
 
-	vsp1_hgt_write(hgt, dl, VI6_HGT_REGRST, VI6_HGT_REGRST_RCLEA);
+	vsp1_hgt_write(hgt, dlb, VI6_HGT_REGRST, VI6_HGT_REGRST_RCLEA);
 
-	vsp1_hgt_write(hgt, dl, VI6_HGT_OFFSET,
+	vsp1_hgt_write(hgt, dlb, VI6_HGT_OFFSET,
 		       (crop->left << VI6_HGT_OFFSET_HOFFSET_SHIFT) |
 		       (crop->top << VI6_HGT_OFFSET_VOFFSET_SHIFT));
-	vsp1_hgt_write(hgt, dl, VI6_HGT_SIZE,
+	vsp1_hgt_write(hgt, dlb, VI6_HGT_SIZE,
 		       (crop->width << VI6_HGT_SIZE_HSIZE_SHIFT) |
 		       (crop->height << VI6_HGT_SIZE_VSIZE_SHIFT));
 
@@ -161,7 +161,7 @@ static void hgt_prepare(struct vsp1_entity *entity,
 	for (i = 0; i < HGT_NUM_HUE_AREAS; ++i) {
 		lower = hgt->hue_areas[i*2 + 0];
 		upper = hgt->hue_areas[i*2 + 1];
-		vsp1_hgt_write(hgt, dl, VI6_HGT_HUE_AREA(i),
+		vsp1_hgt_write(hgt, dlb, VI6_HGT_HUE_AREA(i),
 			       (lower << VI6_HGT_HUE_AREA_LOWER_SHIFT) |
 			       (upper << VI6_HGT_HUE_AREA_UPPER_SHIFT));
 	}
@@ -169,7 +169,7 @@ static void hgt_prepare(struct vsp1_entity *entity,
 
 	hratio = crop->width * 2 / compose->width / 3;
 	vratio = crop->height * 2 / compose->height / 3;
-	vsp1_hgt_write(hgt, dl, VI6_HGT_MODE,
+	vsp1_hgt_write(hgt, dlb, VI6_HGT_MODE,
 		       (hratio << VI6_HGT_MODE_HRATIO_SHIFT) |
 		       (vratio << VI6_HGT_MODE_VRATIO_SHIFT));
 }
diff --git a/drivers/media/platform/vsp1/vsp1_hsit.c b/drivers/media/platform/vsp1/vsp1_hsit.c
index cf96ce2c6da9..dcb62d10240f 100644
--- a/drivers/media/platform/vsp1/vsp1_hsit.c
+++ b/drivers/media/platform/vsp1/vsp1_hsit.c
@@ -28,9 +28,9 @@
  */
 
 static inline void vsp1_hsit_write(struct vsp1_hsit *hsit,
-				   struct vsp1_dl_list *dl, u32 reg, u32 data)
+				   struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_fragment_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -133,14 +133,14 @@ static const struct v4l2_subdev_ops hsit_ops = {
 
 static void hsit_prepare(struct vsp1_entity *entity,
 			 struct vsp1_pipeline *pipe,
-			 struct vsp1_dl_list *dl)
+			 struct vsp1_dl_body *dlb)
 {
 	struct vsp1_hsit *hsit = to_hsit(&entity->subdev);
 
 	if (hsit->inverse)
-		vsp1_hsit_write(hsit, dl, VI6_HSI_CTRL, VI6_HSI_CTRL_EN);
+		vsp1_hsit_write(hsit, dlb, VI6_HSI_CTRL, VI6_HSI_CTRL_EN);
 	else
-		vsp1_hsit_write(hsit, dl, VI6_HST_CTRL, VI6_HST_CTRL_EN);
+		vsp1_hsit_write(hsit, dlb, VI6_HST_CTRL, VI6_HST_CTRL_EN);
 }
 
 static const struct vsp1_entity_operations hsit_entity_ops = {
diff --git a/drivers/media/platform/vsp1/vsp1_lif.c b/drivers/media/platform/vsp1/vsp1_lif.c
index 0141bce92c2f..4ed55f4e9d03 100644
--- a/drivers/media/platform/vsp1/vsp1_lif.c
+++ b/drivers/media/platform/vsp1/vsp1_lif.c
@@ -27,10 +27,11 @@
  * Device Access
  */
 
-static inline void vsp1_lif_write(struct vsp1_lif *lif, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_lif_write(struct vsp1_lif *lif,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg + lif->entity.index * VI6_LIF_OFFSET, data);
+	vsp1_dl_fragment_write(dlb, reg + lif->entity.index * VI6_LIF_OFFSET,
+			       data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -130,7 +131,7 @@ static const struct v4l2_subdev_ops lif_ops = {
 
 static void lif_prepare(struct vsp1_entity *entity,
 			struct vsp1_pipeline *pipe,
-			struct vsp1_dl_list *dl)
+			struct vsp1_dl_body *dlb)
 {
 	const struct v4l2_mbus_framefmt *format;
 	struct vsp1_lif *lif = to_lif(&entity->subdev);
@@ -143,11 +144,11 @@ static void lif_prepare(struct vsp1_entity *entity,
 
 	obth = min(obth, (format->width + 1) / 2 * format->height - 4);
 
-	vsp1_lif_write(lif, dl, VI6_LIF_CSBTH,
+	vsp1_lif_write(lif, dlb, VI6_LIF_CSBTH,
 			(hbth << VI6_LIF_CSBTH_HBTH_SHIFT) |
 			(lbth << VI6_LIF_CSBTH_LBTH_SHIFT));
 
-	vsp1_lif_write(lif, dl, VI6_LIF_CTRL,
+	vsp1_lif_write(lif, dlb, VI6_LIF_CTRL,
 			(obth << VI6_LIF_CTRL_OBTH_SHIFT) |
 			(format->code == 0 ? VI6_LIF_CTRL_CFMT : 0) |
 			VI6_LIF_CTRL_REQSEL | VI6_LIF_CTRL_LIF_EN);
diff --git a/drivers/media/platform/vsp1/vsp1_lut.c b/drivers/media/platform/vsp1/vsp1_lut.c
index 0af074e65457..0c48c6071186 100644
--- a/drivers/media/platform/vsp1/vsp1_lut.c
+++ b/drivers/media/platform/vsp1/vsp1_lut.c
@@ -29,10 +29,10 @@
  * Device Access
  */
 
-static inline void vsp1_lut_write(struct vsp1_lut *lut, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_lut_write(struct vsp1_lut *lut,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_fragment_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -192,33 +192,34 @@ static const struct v4l2_subdev_ops lut_ops = {
 
 static void lut_prepare(struct vsp1_entity *entity,
 			struct vsp1_pipeline *pipe,
-			struct vsp1_dl_list *dl)
+			struct vsp1_dl_body *dlb)
 {
 	struct vsp1_lut *lut = to_lut(&entity->subdev);
 
-	vsp1_lut_write(lut, dl, VI6_LUT_CTRL, VI6_LUT_CTRL_EN);
+	vsp1_lut_write(lut, dlb, VI6_LUT_CTRL, VI6_LUT_CTRL_EN);
 }
 
 static void lut_configure(struct vsp1_entity *entity,
 			  struct vsp1_pipeline *pipe,
 			  struct vsp1_dl_list *dl,
+			  struct vsp1_dl_body *dlb,
 			  unsigned int partition)
 {
 	struct vsp1_lut *lut = to_lut(&entity->subdev);
-	struct vsp1_dl_body *dlb;
+	struct vsp1_dl_body *lut_dlb;
 	unsigned long flags;
 
 	if (partition == 0) {
 		spin_lock_irqsave(&lut->lock, flags);
-		dlb = lut->lut;
+		lut_dlb = lut->lut;
 		lut->lut = NULL;
 		spin_unlock_irqrestore(&lut->lock, flags);
 
-		if (dlb) {
-			vsp1_dl_list_add_fragment(dl, dlb);
+		if (lut_dlb) {
+			vsp1_dl_list_add_fragment(dl, lut_dlb);
 
 			/* release our local reference */
-			vsp1_dl_fragment_put(dlb);
+			vsp1_dl_fragment_put(lut_dlb);
 		}
 	}
 }
diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c b/drivers/media/platform/vsp1/vsp1_pipe.c
index 44944ac86d9b..5012643583b6 100644
--- a/drivers/media/platform/vsp1/vsp1_pipe.c
+++ b/drivers/media/platform/vsp1/vsp1_pipe.c
@@ -367,7 +367,7 @@ void vsp1_pipeline_frame_end(struct vsp1_pipeline *pipe)
  * from the input RPF alpha.
  */
 void vsp1_pipeline_propagate_alpha(struct vsp1_pipeline *pipe,
-				   struct vsp1_dl_list *dl, unsigned int alpha)
+				   struct vsp1_dl_body *dlb, unsigned int alpha)
 {
 	if (!pipe->uds)
 		return;
@@ -380,7 +380,7 @@ void vsp1_pipeline_propagate_alpha(struct vsp1_pipeline *pipe,
 	    pipe->uds_input->type == VSP1_ENTITY_BRS)
 		alpha = 255;
 
-	vsp1_uds_set_alpha(pipe->uds, dl, alpha);
+	vsp1_uds_set_alpha(pipe->uds, dlb, alpha);
 }
 
 /*
diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h b/drivers/media/platform/vsp1/vsp1_pipe.h
index dfff9b5685fe..90d29492b9b9 100644
--- a/drivers/media/platform/vsp1/vsp1_pipe.h
+++ b/drivers/media/platform/vsp1/vsp1_pipe.h
@@ -161,7 +161,8 @@ bool vsp1_pipeline_ready(struct vsp1_pipeline *pipe);
 void vsp1_pipeline_frame_end(struct vsp1_pipeline *pipe);
 
 void vsp1_pipeline_propagate_alpha(struct vsp1_pipeline *pipe,
-				   struct vsp1_dl_list *dl, unsigned int alpha);
+				   struct vsp1_dl_body *dlb,
+				   unsigned int alpha);
 
 void vsp1_pipeline_propagate_partition(struct vsp1_pipeline *pipe,
 				       struct vsp1_partition *partition,
diff --git a/drivers/media/platform/vsp1/vsp1_rpf.c b/drivers/media/platform/vsp1/vsp1_rpf.c
index 87a47997a086..cc234929ca54 100644
--- a/drivers/media/platform/vsp1/vsp1_rpf.c
+++ b/drivers/media/platform/vsp1/vsp1_rpf.c
@@ -29,9 +29,10 @@
  */
 
 static inline void vsp1_rpf_write(struct vsp1_rwpf *rpf,
-				  struct vsp1_dl_list *dl, u32 reg, u32 data)
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg + rpf->entity.index * VI6_RPF_OFFSET, data);
+	vsp1_dl_fragment_write(dlb, reg + rpf->entity.index * VI6_RPF_OFFSET,
+			       data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -48,7 +49,7 @@ static const struct v4l2_subdev_ops rpf_ops = {
 
 static void rpf_prepare(struct vsp1_entity *entity,
 			struct vsp1_pipeline *pipe,
-			struct vsp1_dl_list *dl)
+			struct vsp1_dl_body *dlb)
 {
 	struct vsp1_rwpf *rpf = to_rwpf(&entity->subdev);
 	const struct vsp1_format_info *fmtinfo = rpf->fmtinfo;
@@ -67,7 +68,7 @@ static void rpf_prepare(struct vsp1_entity *entity,
 		pstride |= format->plane_fmt[1].bytesperline
 			<< VI6_RPF_SRCM_PSTRIDE_C_SHIFT;
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_PSTRIDE, pstride);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRCM_PSTRIDE, pstride);
 
 	/* Format */
 	sink_format = vsp1_entity_get_pad_format(&rpf->entity,
@@ -88,8 +89,8 @@ static void rpf_prepare(struct vsp1_entity *entity,
 	if (sink_format->code != source_format->code)
 		infmt |= VI6_RPF_INFMT_CSC;
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_INFMT, infmt);
-	vsp1_rpf_write(rpf, dl, VI6_RPF_DSWAP, fmtinfo->swap);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_INFMT, infmt);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_DSWAP, fmtinfo->swap);
 
 	/* Output location */
 	if (pipe->bru) {
@@ -103,7 +104,7 @@ static void rpf_prepare(struct vsp1_entity *entity,
 		top = compose->top;
 	}
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_LOC,
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_LOC,
 		       (left << VI6_RPF_LOC_HCOORD_SHIFT) |
 		       (top << VI6_RPF_LOC_VCOORD_SHIFT));
 
@@ -130,7 +131,7 @@ static void rpf_prepare(struct vsp1_entity *entity,
 	 *
 	 * In all cases, disable color keying.
 	 */
-	vsp1_rpf_write(rpf, dl, VI6_RPF_ALPH_SEL, VI6_RPF_ALPH_SEL_AEXT_EXT |
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_ALPH_SEL, VI6_RPF_ALPH_SEL_AEXT_EXT |
 		       (fmtinfo->alpha ? VI6_RPF_ALPH_SEL_ASEL_PACKED
 				       : VI6_RPF_ALPH_SEL_ASEL_FIXED));
 
@@ -167,15 +168,14 @@ static void rpf_prepare(struct vsp1_entity *entity,
 		rpf->mult_alpha = mult;
 	}
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_MSK_CTRL, 0);
-	vsp1_rpf_write(rpf, dl, VI6_RPF_CKEY_CTRL, 0);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_MSK_CTRL, 0);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_CKEY_CTRL, 0);
 
 }
 
 static void rpf_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  unsigned int partition)
+			  struct vsp1_pipeline *pipe, struct vsp1_dl_list *dl,
+			  struct vsp1_dl_body *dlb, unsigned int partition)
 {
 	struct vsp1_rwpf *rpf = to_rwpf(&entity->subdev);
 	struct vsp1_rwpf_memory mem = rpf->mem;
@@ -185,15 +185,14 @@ static void rpf_configure(struct vsp1_entity *entity,
 	struct v4l2_rect crop;
 
 	if (partition == 0) {
-		vsp1_rpf_write(rpf, dl, VI6_RPF_VRTCOL_SET,
+		vsp1_rpf_write(rpf, dlb, VI6_RPF_VRTCOL_SET,
 			       rpf->alpha << VI6_RPF_VRTCOL_SET_LAYA_SHIFT);
-		vsp1_rpf_write(rpf, dl, VI6_RPF_MULT_ALPHA, rpf->mult_alpha |
+		vsp1_rpf_write(rpf, dlb, VI6_RPF_MULT_ALPHA, rpf->mult_alpha |
 			       (rpf->alpha << VI6_RPF_MULT_ALPHA_RATIO_SHIFT));
 
-		vsp1_pipeline_propagate_alpha(pipe, dl, rpf->alpha);
+		vsp1_pipeline_propagate_alpha(pipe, dlb, rpf->alpha);
 	}
 
-
 	/*
 	 * Source size and crop offsets.
 	 *
@@ -219,10 +218,10 @@ static void rpf_configure(struct vsp1_entity *entity,
 		crop.left += pipe->partition->rpf.left;
 	}
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_BSIZE,
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRC_BSIZE,
 		       (crop.width << VI6_RPF_SRC_BSIZE_BHSIZE_SHIFT) |
 		       (crop.height << VI6_RPF_SRC_BSIZE_BVSIZE_SHIFT));
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_ESIZE,
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRC_ESIZE,
 		       (crop.width << VI6_RPF_SRC_ESIZE_EHSIZE_SHIFT) |
 		       (crop.height << VI6_RPF_SRC_ESIZE_EVSIZE_SHIFT));
 
@@ -247,9 +246,9 @@ static void rpf_configure(struct vsp1_entity *entity,
 	    fmtinfo->swap_uv)
 		swap(mem.addr[1], mem.addr[2]);
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_Y, mem.addr[0]);
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C0, mem.addr[1]);
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C1, mem.addr[2]);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRCM_ADDR_Y, mem.addr[0]);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRCM_ADDR_C0, mem.addr[1]);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRCM_ADDR_C1, mem.addr[2]);
 }
 
 static void rpf_partition(struct vsp1_entity *entity,
diff --git a/drivers/media/platform/vsp1/vsp1_sru.c b/drivers/media/platform/vsp1/vsp1_sru.c
index 0a24bc59bc2f..15212b4b36cf 100644
--- a/drivers/media/platform/vsp1/vsp1_sru.c
+++ b/drivers/media/platform/vsp1/vsp1_sru.c
@@ -28,10 +28,10 @@
  * Device Access
  */
 
-static inline void vsp1_sru_write(struct vsp1_sru *sru, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_sru_write(struct vsp1_sru *sru,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_fragment_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -273,7 +273,7 @@ static const struct v4l2_subdev_ops sru_ops = {
 
 static void sru_prepare(struct vsp1_entity *entity,
 			struct vsp1_pipeline *pipe,
-			struct vsp1_dl_list *dl)
+			struct vsp1_dl_body *dlb)
 {
 	const struct vsp1_sru_param *param;
 	struct vsp1_sru *sru = to_sru(&entity->subdev);
@@ -299,9 +299,9 @@ static void sru_prepare(struct vsp1_entity *entity,
 
 	ctrl0 |= param->ctrl0;
 
-	vsp1_sru_write(sru, dl, VI6_SRU_CTRL0, ctrl0);
-	vsp1_sru_write(sru, dl, VI6_SRU_CTRL1, VI6_SRU_CTRL1_PARAM5);
-	vsp1_sru_write(sru, dl, VI6_SRU_CTRL2, param->ctrl2);
+	vsp1_sru_write(sru, dlb, VI6_SRU_CTRL0, ctrl0);
+	vsp1_sru_write(sru, dlb, VI6_SRU_CTRL1, VI6_SRU_CTRL1_PARAM5);
+	vsp1_sru_write(sru, dlb, VI6_SRU_CTRL2, param->ctrl2);
 }
 
 static unsigned int sru_max_width(struct vsp1_entity *entity,
diff --git a/drivers/media/platform/vsp1/vsp1_uds.c b/drivers/media/platform/vsp1/vsp1_uds.c
index 84be962a33b1..bbe7e03b805d 100644
--- a/drivers/media/platform/vsp1/vsp1_uds.c
+++ b/drivers/media/platform/vsp1/vsp1_uds.c
@@ -31,22 +31,23 @@
  * Device Access
  */
 
-static inline void vsp1_uds_write(struct vsp1_uds *uds, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_uds_write(struct vsp1_uds *uds,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg + uds->entity.index * VI6_UDS_OFFSET, data);
+	vsp1_dl_fragment_write(dlb, reg + uds->entity.index * VI6_UDS_OFFSET,
+			       data);
 }
 
 /* -----------------------------------------------------------------------------
  * Scaling Computation
  */
 
-void vsp1_uds_set_alpha(struct vsp1_entity *entity, struct vsp1_dl_list *dl,
+void vsp1_uds_set_alpha(struct vsp1_entity *entity, struct vsp1_dl_body *dlb,
 			unsigned int alpha)
 {
 	struct vsp1_uds *uds = to_uds(&entity->subdev);
 
-	vsp1_uds_write(uds, dl, VI6_UDS_ALPVAL,
+	vsp1_uds_write(uds, dlb, VI6_UDS_ALPVAL,
 		       alpha << VI6_UDS_ALPVAL_VAL0_SHIFT);
 }
 
@@ -261,7 +262,7 @@ static const struct v4l2_subdev_ops uds_ops = {
 
 static void uds_prepare(struct vsp1_entity *entity,
 			struct vsp1_pipeline *pipe,
-			struct vsp1_dl_list *dl)
+			struct vsp1_dl_body *dlb)
 {
 	struct vsp1_uds *uds = to_uds(&entity->subdev);
 	const struct v4l2_mbus_framefmt *output;
@@ -290,18 +291,18 @@ static void uds_prepare(struct vsp1_entity *entity,
 	else
 		multitap = true;
 
-	vsp1_uds_write(uds, dl, VI6_UDS_CTRL,
+	vsp1_uds_write(uds, dlb, VI6_UDS_CTRL,
 		       (uds->scale_alpha ? VI6_UDS_CTRL_AON : 0) |
 		       (multitap ? VI6_UDS_CTRL_BC : 0));
 
-	vsp1_uds_write(uds, dl, VI6_UDS_PASS_BWIDTH,
+	vsp1_uds_write(uds, dlb, VI6_UDS_PASS_BWIDTH,
 		       (uds_passband_width(hscale)
 				<< VI6_UDS_PASS_BWIDTH_H_SHIFT) |
 		       (uds_passband_width(vscale)
 				<< VI6_UDS_PASS_BWIDTH_V_SHIFT));
 
 	/* Set the scaling ratios. */
-	vsp1_uds_write(uds, dl, VI6_UDS_SCALE,
+	vsp1_uds_write(uds, dlb, VI6_UDS_SCALE,
 		       (hscale << VI6_UDS_SCALE_HFRAC_SHIFT) |
 		       (vscale << VI6_UDS_SCALE_VFRAC_SHIFT));
 }
@@ -309,6 +310,7 @@ static void uds_prepare(struct vsp1_entity *entity,
 static void uds_configure(struct vsp1_entity *entity,
 			  struct vsp1_pipeline *pipe,
 			  struct vsp1_dl_list *dl,
+			  struct vsp1_dl_body *dlb,
 			  unsigned int pindex)
 {
 	struct vsp1_uds *uds = to_uds(&entity->subdev);
@@ -319,13 +321,13 @@ static void uds_configure(struct vsp1_entity *entity,
 					    UDS_PAD_SOURCE);
 
 	/* Input size clipping */
-	vsp1_uds_write(uds, dl, VI6_UDS_HSZCLIP, VI6_UDS_HSZCLIP_HCEN |
+	vsp1_uds_write(uds, dlb, VI6_UDS_HSZCLIP, VI6_UDS_HSZCLIP_HCEN |
 		       (0 << VI6_UDS_HSZCLIP_HCL_OFST_SHIFT) |
 		       (partition->uds_sink.width
 				<< VI6_UDS_HSZCLIP_HCL_SIZE_SHIFT));
 
 	/* Output size clipping */
-	vsp1_uds_write(uds, dl, VI6_UDS_CLIP_SIZE,
+	vsp1_uds_write(uds, dlb, VI6_UDS_CLIP_SIZE,
 		       (partition->uds_source.width
 				<< VI6_UDS_CLIP_SIZE_HSIZE_SHIFT) |
 		       (output->height
diff --git a/drivers/media/platform/vsp1/vsp1_uds.h b/drivers/media/platform/vsp1/vsp1_uds.h
index 7bf3cdcffc65..d99997f3b28d 100644
--- a/drivers/media/platform/vsp1/vsp1_uds.h
+++ b/drivers/media/platform/vsp1/vsp1_uds.h
@@ -35,7 +35,7 @@ static inline struct vsp1_uds *to_uds(struct v4l2_subdev *subdev)
 
 struct vsp1_uds *vsp1_uds_create(struct vsp1_device *vsp1, unsigned int index);
 
-void vsp1_uds_set_alpha(struct vsp1_entity *uds, struct vsp1_dl_list *dl,
+void vsp1_uds_set_alpha(struct vsp1_entity *uds, struct vsp1_dl_body *dlb,
 			unsigned int alpha);
 
 #endif /* __VSP1_UDS_H__ */
diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
index bd5403f24dda..7e825f3360bf 100644
--- a/drivers/media/platform/vsp1/vsp1_video.c
+++ b/drivers/media/platform/vsp1/vsp1_video.c
@@ -383,11 +383,12 @@ static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
 					      unsigned int partition)
 {
 	struct vsp1_entity *entity;
+	struct vsp1_dl_body *dlb = vsp1_dl_list_body(dl);
 
 	pipe->partition = &pipe->part_table[partition];
 
 	list_for_each_entry(entity, &pipe->entities, list_pipe)
-		vsp1_entity_configure(entity, pipe, dl, partition);
+		vsp1_entity_configure(entity, pipe, dl, dlb, partition);
 }
 
 static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
@@ -790,6 +791,7 @@ static void vsp1_video_buffer_queue(struct vb2_buffer *vb)
 static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 {
 	struct vsp1_entity *entity;
+	struct vsp1_dl_body *dlb;
 	int ret;
 
 	/* Determine this pipelines sizes for image partitioning support. */
@@ -802,6 +804,9 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 	if (!pipe->dl)
 		return -ENOMEM;
 
+	/* Retrieve the default DLB from the list */
+	dlb = vsp1_dl_list_get_body(pipe->dl);
+
 	if (pipe->uds) {
 		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
 
@@ -824,8 +829,8 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 	}
 
 	list_for_each_entry(entity, &pipe->entities, list_pipe) {
-		vsp1_entity_route_setup(entity, pipe, pipe->dl);
-		vsp1_entity_prepare(entity, pipe, pipe->dl);
+		vsp1_entity_route_setup(entity, pipe, dlb);
+		vsp1_entity_prepare(entity, pipe, dlb);
 	}
 
 	return 0;
diff --git a/drivers/media/platform/vsp1/vsp1_wpf.c b/drivers/media/platform/vsp1/vsp1_wpf.c
index d6dd7e783d27..b02c17fb3b88 100644
--- a/drivers/media/platform/vsp1/vsp1_wpf.c
+++ b/drivers/media/platform/vsp1/vsp1_wpf.c
@@ -31,9 +31,10 @@
  */
 
 static inline void vsp1_wpf_write(struct vsp1_rwpf *wpf,
-				  struct vsp1_dl_list *dl, u32 reg, u32 data)
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg + wpf->entity.index * VI6_WPF_OFFSET, data);
+	vsp1_dl_fragment_write(dlb, reg + wpf->entity.index * VI6_WPF_OFFSET,
+			       data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -238,7 +239,7 @@ static void vsp1_wpf_destroy(struct vsp1_entity *entity)
 
 static void wpf_prepare(struct vsp1_entity *entity,
 			struct vsp1_pipeline *pipe,
-			struct vsp1_dl_list *dl)
+			struct vsp1_dl_body *dlb)
 {
 	struct vsp1_rwpf *wpf = to_rwpf(&entity->subdev);
 	struct vsp1_device *vsp1 = wpf->entity.vsp1;
@@ -272,17 +273,17 @@ static void wpf_prepare(struct vsp1_entity *entity,
 			outfmt |= VI6_WPF_OUTFMT_SPUVS;
 
 		/* Destination stride and byte swapping. */
-		vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_STRIDE_Y,
+		vsp1_wpf_write(wpf, dlb, VI6_WPF_DSTM_STRIDE_Y,
 			       format->plane_fmt[0].bytesperline);
 		if (format->num_planes > 1)
-			vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_STRIDE_C,
+			vsp1_wpf_write(wpf, dlb, VI6_WPF_DSTM_STRIDE_C,
 				       format->plane_fmt[1].bytesperline);
 
-		vsp1_wpf_write(wpf, dl, VI6_WPF_DSWAP, fmtinfo->swap);
+		vsp1_wpf_write(wpf, dlb, VI6_WPF_DSWAP, fmtinfo->swap);
 
 		if (vsp1->info->features & VSP1_HAS_WPF_HFLIP &&
 		    wpf->entity.index == 0)
-			vsp1_wpf_write(wpf, dl, VI6_WPF_ROT_CTRL,
+			vsp1_wpf_write(wpf, dlb, VI6_WPF_ROT_CTRL,
 				       VI6_WPF_ROT_CTRL_LN16 |
 				       (256 << VI6_WPF_ROT_CTRL_LMEM_WD_SHIFT));
 	}
@@ -292,10 +293,10 @@ static void wpf_prepare(struct vsp1_entity *entity,
 
 	wpf->outfmt = outfmt;
 
-	vsp1_dl_list_write(dl, VI6_DPR_WPF_FPORCH(wpf->entity.index),
-			   VI6_DPR_WPF_FPORCH_FP_WPFN);
+	vsp1_dl_fragment_write(dlb, VI6_DPR_WPF_FPORCH(wpf->entity.index),
+			       VI6_DPR_WPF_FPORCH_FP_WPFN);
 
-	vsp1_dl_list_write(dl, VI6_WPF_WRBCK_CTRL, 0);
+	vsp1_dl_fragment_write(dlb, VI6_WPF_WRBCK_CTRL, 0);
 
 	/*
 	 * Sources. If the pipeline has a single input and BRU is not used,
@@ -319,17 +320,18 @@ static void wpf_prepare(struct vsp1_entity *entity,
 			? VI6_WPF_SRCRPF_VIRACT_MST
 			: VI6_WPF_SRCRPF_VIRACT2_MST;
 
-	vsp1_wpf_write(wpf, dl, VI6_WPF_SRCRPF, srcrpf);
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_SRCRPF, srcrpf);
 
 	/* Enable interrupts */
-	vsp1_dl_list_write(dl, VI6_WPF_IRQ_STA(wpf->entity.index), 0);
-	vsp1_dl_list_write(dl, VI6_WPF_IRQ_ENB(wpf->entity.index),
-			   VI6_WFP_IRQ_ENB_DFEE);
+	vsp1_dl_fragment_write(dlb, VI6_WPF_IRQ_STA(wpf->entity.index), 0);
+	vsp1_dl_fragment_write(dlb, VI6_WPF_IRQ_ENB(wpf->entity.index),
+			       VI6_WFP_IRQ_ENB_DFEE);
 }
 
 static void wpf_configure(struct vsp1_entity *entity,
 			  struct vsp1_pipeline *pipe,
 			  struct vsp1_dl_list *dl,
+			  struct vsp1_dl_body *dlb,
 			  unsigned int partition)
 {
 	struct vsp1_rwpf *wpf = to_rwpf(&entity->subdev);
@@ -363,7 +365,7 @@ static void wpf_configure(struct vsp1_entity *entity,
 		if (wpf->flip.active & BIT(WPF_CTRL_HFLIP))
 			outfmt |= VI6_WPF_OUTFMT_HFLP;
 
-		vsp1_wpf_write(wpf, dl, VI6_WPF_OUTFMT, outfmt);
+		vsp1_wpf_write(wpf, dlb, VI6_WPF_OUTFMT, outfmt);
 	}
 
 	sink_format = vsp1_entity_get_pad_format(&wpf->entity,
@@ -379,10 +381,10 @@ static void wpf_configure(struct vsp1_entity *entity,
 	if (pipe->partitions > 1)
 		width = pipe->partition->wpf.width;
 
-	vsp1_wpf_write(wpf, dl, VI6_WPF_HSZCLIP, VI6_WPF_SZCLIP_EN |
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_HSZCLIP, VI6_WPF_SZCLIP_EN |
 		       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
 		       (width << VI6_WPF_SZCLIP_SIZE_SHIFT));
-	vsp1_wpf_write(wpf, dl, VI6_WPF_VSZCLIP, VI6_WPF_SZCLIP_EN |
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_VSZCLIP, VI6_WPF_SZCLIP_EN |
 		       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
 		       (height << VI6_WPF_SZCLIP_SIZE_SHIFT));
 
@@ -474,9 +476,9 @@ static void wpf_configure(struct vsp1_entity *entity,
 	    fmtinfo->swap_uv)
 		swap(mem.addr[1], mem.addr[2]);
 
-	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_Y, mem.addr[0]);
-	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C0, mem.addr[1]);
-	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C1, mem.addr[2]);
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_DSTM_ADDR_Y, mem.addr[0]);
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_DSTM_ADDR_C0, mem.addr[1]);
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_DSTM_ADDR_C1, mem.addr[2]);
 }
 
 static unsigned int wpf_max_width(struct vsp1_entity *entity,
-- 
git-series 0.9.1

* [PATCH v2 7/8] v4l: vsp1: Move video configuration to a cached dlb
  2017-08-14 15:13 [PATCH v2 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
                   ` (5 preceding siblings ...)
  2017-08-14 15:13 ` [PATCH v2 6/8] v4l: vsp1: Adapt entities to configure into a body Kieran Bingham
@ 2017-08-14 15:13 ` Kieran Bingham
  2017-08-17 18:10   ` Laurent Pinchart
  2017-08-14 15:13 ` [PATCH v2 8/8] v4l: vsp1: Reduce display list body size Kieran Bingham
  7 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-08-14 15:13 UTC (permalink / raw)
  To: laurent.pinchart, linux-renesas-soc, linux-media; +Cc: Kieran Bingham

We are now able to configure a pipeline directly into a local display
list body. Take advantage of this fact, and create a cacheable body to
store the configuration of the pipeline in the video object.

vsp1_video_pipeline_run() is now the last user of the pipe->dl object.
Convert this function to use the cached video->pipe_config body and
obtain a local display list reference.

Attach the video->pipe_config body to the display list when needed
before committing to hardware.

The pipe object is marked as unconfigured when entering suspend. This
ensures that upon resume, where the hardware has been reset, our cached
configuration is re-attached to the next committed DL.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
---

Our video DL usage now looks like the output below:

dl->body0 contains our disposable runtime configuration. Max 41.
dl_child->body0 is our partition specific configuration. Max 12.
dl->fragments shows our constant configuration and LUTs.

  These two are LUT/CLU:
     * dl->fragments[x]->num_entries 256 / max 256
     * dl->fragments[x]->num_entries 4914 / max 4914

Leaving the LUT/CLU bodies aside, this shows that our 'constant'
configuration cache currently uses a maximum of 64 entries.

trace-cmd report | \
    grep max | sed 's/.*vsp1_dl_list_commit://g' | sort | uniq;

  dl->body0->num_entries 13 / max 128
  dl->body0->num_entries 14 / max 128
  dl->body0->num_entries 16 / max 128
  dl->body0->num_entries 20 / max 128
  dl->body0->num_entries 27 / max 128
  dl->body0->num_entries 34 / max 128
  dl->body0->num_entries 41 / max 128
  dl_child->body0->num_entries 10 / max 128
  dl_child->body0->num_entries 12 / max 128
  dl->fragments[x]->num_entries 15 / max 128
  dl->fragments[x]->num_entries 16 / max 128
  dl->fragments[x]->num_entries 17 / max 128
  dl->fragments[x]->num_entries 18 / max 128
  dl->fragments[x]->num_entries 20 / max 128
  dl->fragments[x]->num_entries 21 / max 128
  dl->fragments[x]->num_entries 256 / max 256
  dl->fragments[x]->num_entries 31 / max 128
  dl->fragments[x]->num_entries 32 / max 128
  dl->fragments[x]->num_entries 39 / max 128
  dl->fragments[x]->num_entries 40 / max 128
  dl->fragments[x]->num_entries 47 / max 128
  dl->fragments[x]->num_entries 48 / max 128
  dl->fragments[x]->num_entries 4914 / max 4914
  dl->fragments[x]->num_entries 55 / max 128
  dl->fragments[x]->num_entries 56 / max 128
  dl->fragments[x]->num_entries 63 / max 128
  dl->fragments[x]->num_entries 64 / max 128
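
The attach-once behaviour described in the commit message can be
sketched as a small stand-alone model. This is only an illustration
under stated assumptions: every model_* name and MODEL_MAX_FRAGMENTS is
hypothetical and is not part of the vsp1 driver, and the model ignores
locking, partitions and the hardware commit. It shows the cached body
being attached to the first display list after setup or suspend, and
skipped on later frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity, chosen only for this model. */
#define MODEL_MAX_FRAGMENTS 4

/* Stand-in for a display list body holding the cached configuration. */
struct model_body {
	int id;
};

/* Stand-in for a display list: it only records the attached bodies. */
struct model_list {
	const struct model_body *fragments[MODEL_MAX_FRAGMENTS];
	size_t num_fragments;
};

/* Stand-in for the pipeline state that this patch adds. */
struct model_pipe {
	bool configured;                 /* models pipe->configured */
	const struct model_body *config; /* models video->pipe_config */
};

/* Attach a body to the list, refusing to overflow the model's array. */
static void model_list_add_fragment(struct model_list *dl,
				    const struct model_body *body)
{
	assert(dl->num_fragments < MODEL_MAX_FRAGMENTS);
	dl->fragments[dl->num_fragments++] = body;
}

/*
 * Build the display list for one frame. The cached configuration is
 * attached only when the hardware has not yet received it.
 */
static struct model_list model_run(struct model_pipe *pipe)
{
	struct model_list dl = { .num_fragments = 0 };

	if (!pipe->configured) {
		model_list_add_fragment(&dl, pipe->config);
		pipe->configured = true;
	}

	return dl;
}

/* The hardware is reset across suspend, so force a re-attach on resume. */
static void model_suspend(struct model_pipe *pipe)
{
	pipe->configured = false;
}
```

In this model, the first model_run() after setup or after
model_suspend() returns a list carrying the cached body, and every
other call returns a list without it.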
---
 drivers/media/platform/vsp1/vsp1_pipe.c  |  4 +-
 drivers/media/platform/vsp1/vsp1_pipe.h  |  4 +-
 drivers/media/platform/vsp1/vsp1_video.c | 67 ++++++++++++++++---------
 drivers/media/platform/vsp1/vsp1_video.h |  2 +-
 4 files changed, 51 insertions(+), 26 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c b/drivers/media/platform/vsp1/vsp1_pipe.c
index 5012643583b6..7d1f7ba43060 100644
--- a/drivers/media/platform/vsp1/vsp1_pipe.c
+++ b/drivers/media/platform/vsp1/vsp1_pipe.c
@@ -249,6 +249,7 @@ void vsp1_pipeline_run(struct vsp1_pipeline *pipe)
 		vsp1_write(vsp1, VI6_CMD(pipe->output->entity.index),
 			   VI6_CMD_STRCMD);
 		pipe->state = VSP1_PIPELINE_RUNNING;
+		pipe->configured = true;
 	}
 
 	pipe->buffers_ready = 0;
@@ -430,6 +431,9 @@ void vsp1_pipelines_suspend(struct vsp1_device *vsp1)
 		spin_lock_irqsave(&pipe->irqlock, flags);
 		if (pipe->state == VSP1_PIPELINE_RUNNING)
 			pipe->state = VSP1_PIPELINE_STOPPING;
+
+		/* After a suspend, the hardware will be reset */
+		pipe->configured = false;
 		spin_unlock_irqrestore(&pipe->irqlock, flags);
 	}
 
diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h b/drivers/media/platform/vsp1/vsp1_pipe.h
index 90d29492b9b9..e7ad6211b4d0 100644
--- a/drivers/media/platform/vsp1/vsp1_pipe.h
+++ b/drivers/media/platform/vsp1/vsp1_pipe.h
@@ -90,6 +90,7 @@ struct vsp1_partition {
  * @irqlock: protects the pipeline state
  * @state: current state
  * @wq: wait queue to wait for state change completion
+ * @configured: flag determining if the hardware has run since reset
  * @frame_end: frame end interrupt handler
  * @lock: protects the pipeline use count and stream count
  * @kref: pipeline reference count
@@ -117,6 +118,7 @@ struct vsp1_pipeline {
 	spinlock_t irqlock;
 	enum vsp1_pipeline_state state;
 	wait_queue_head_t wq;
+	bool configured;
 
 	void (*frame_end)(struct vsp1_pipeline *pipe, bool completed);
 
@@ -143,8 +145,6 @@ struct vsp1_pipeline {
 	 */
 	struct list_head entities;
 
-	struct vsp1_dl_list *dl;
-
 	unsigned int partitions;
 	struct vsp1_partition *partition;
 	struct vsp1_partition *part_table;
diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
index 7e825f3360bf..42b70b8465ba 100644
--- a/drivers/media/platform/vsp1/vsp1_video.c
+++ b/drivers/media/platform/vsp1/vsp1_video.c
@@ -394,37 +394,43 @@ static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
 static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
 {
 	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
+	struct vsp1_video *video = pipe->output->video;
 	unsigned int partition;
+	struct vsp1_dl_list *dl;
+
+	dl = vsp1_dl_list_get(pipe->output->dlm);
 
-	if (!pipe->dl)
-		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
+	/* Attach our pipe configuration to fully initialise the hardware */
+	if (!pipe->configured) {
+		vsp1_dl_list_add_fragment(dl, video->pipe_config);
+		pipe->configured = true;
+	}
 
 	/* Run the first partition */
-	vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0);
+	vsp1_video_pipeline_run_partition(pipe, dl, 0);
 
 	/* Process consecutive partitions as necessary */
 	for (partition = 1; partition < pipe->partitions; ++partition) {
-		struct vsp1_dl_list *dl;
+		struct vsp1_dl_list *dl_child;
 
-		dl = vsp1_dl_list_get(pipe->output->dlm);
+		dl_child = vsp1_dl_list_get(pipe->output->dlm);
 
 		/*
 		 * An incomplete chain will still function, but output only
 		 * the partitions that had a dl available. The frame end
 		 * interrupt will be marked on the last dl in the chain.
 		 */
-		if (!dl) {
+		if (!dl_child) {
 			dev_err(vsp1->dev, "Failed to obtain a dl list. Frame will be incomplete\n");
 			break;
 		}
 
-		vsp1_video_pipeline_run_partition(pipe, dl, partition);
-		vsp1_dl_list_add_chain(pipe->dl, dl);
+		vsp1_video_pipeline_run_partition(pipe, dl_child, partition);
+		vsp1_dl_list_add_chain(dl, dl_child);
 	}
 
 	/* Complete, and commit the head display list. */
-	vsp1_dl_list_commit(pipe->dl);
-	pipe->dl = NULL;
+	vsp1_dl_list_commit(dl);
 
 	vsp1_pipeline_run(pipe);
 }
@@ -790,8 +796,8 @@ static void vsp1_video_buffer_queue(struct vb2_buffer *vb)
 
 static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 {
+	struct vsp1_video *video = pipe->output->video;
 	struct vsp1_entity *entity;
-	struct vsp1_dl_body *dlb;
 	int ret;
 
 	/* Determine this pipelines sizes for image partitioning support. */
@@ -799,14 +805,6 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 	if (ret < 0)
 		return ret;
 
-	/* Prepare the display list. */
-	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
-	if (!pipe->dl)
-		return -ENOMEM;
-
-	/* Retrieve the default DLB from the list */
-	dlb = vsp1_dl_list_get_body(pipe->dl);
-
 	if (pipe->uds) {
 		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
 
@@ -828,11 +826,20 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 		}
 	}
 
+	/* Obtain a clean body from our pool */
+	video->pipe_config = vsp1_dl_fragment_get(video->dlbs);
+	if (!video->pipe_config)
+		return -ENOMEM;
+
+	/* Configure the entities into our cached pipe configuration */
 	list_for_each_entry(entity, &pipe->entities, list_pipe) {
-		vsp1_entity_route_setup(entity, pipe, dlb);
-		vsp1_entity_prepare(entity, pipe, dlb);
+		vsp1_entity_route_setup(entity, pipe, video->pipe_config);
+		vsp1_entity_prepare(entity, pipe, video->pipe_config);
 	}
 
+	/* Ensure that our cached configuration is updated in the next DL */
+	pipe->configured = false;
+
 	return 0;
 }
 
@@ -842,6 +849,9 @@ static void vsp1_video_cleanup_pipeline(struct vsp1_pipeline *pipe)
 	struct vsp1_vb2_buffer *buffer;
 	unsigned long flags;
 
+	/* Release any cached configuration */
+	vsp1_dl_fragment_put(video->pipe_config);
+
 	/* Remove all buffers from the IRQ queue. */
 	spin_lock_irqsave(&video->irqlock, flags);
 	list_for_each_entry(buffer, &video->irqqueue, queue)
@@ -918,9 +928,6 @@ static void vsp1_video_stop_streaming(struct vb2_queue *vq)
 		ret = vsp1_pipeline_stop(pipe);
 		if (ret == -ETIMEDOUT)
 			dev_err(video->vsp1->dev, "pipeline stop timeout\n");
-
-		vsp1_dl_list_put(pipe->dl);
-		pipe->dl = NULL;
 	}
 	mutex_unlock(&pipe->lock);
 
@@ -1240,6 +1247,16 @@ struct vsp1_video *vsp1_video_create(struct vsp1_device *vsp1,
 		goto error;
 	}
 
+	/*
+	 * Create a fragment pool to cache the constant configuration of the
+	 * pipeline object
+	 */
+	video->dlbs = vsp1_dl_fragment_pool_alloc(vsp1, 2, 128, 0);
+	if (!video->dlbs) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
 	return video;
 
 error:
@@ -1249,6 +1266,8 @@ struct vsp1_video *vsp1_video_create(struct vsp1_device *vsp1,
 
 void vsp1_video_cleanup(struct vsp1_video *video)
 {
+	vsp1_dl_fragment_pool_free(video->dlbs);
+
 	if (video_is_registered(&video->video))
 		video_unregister_device(&video->video);
 
diff --git a/drivers/media/platform/vsp1/vsp1_video.h b/drivers/media/platform/vsp1/vsp1_video.h
index 50ea7f02205f..2499d3d792b4 100644
--- a/drivers/media/platform/vsp1/vsp1_video.h
+++ b/drivers/media/platform/vsp1/vsp1_video.h
@@ -43,6 +43,8 @@ struct vsp1_video {
 
 	struct mutex lock;
 
+	struct vsp1_dl_fragment_pool *dlbs;
+	struct vsp1_dl_body *pipe_config;
 	unsigned int pipe_index;
 
 	struct vb2_queue queue;
-- 
git-series 0.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 8/8] v4l: vsp1: Reduce display list body size
  2017-08-14 15:13 [PATCH v2 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
                   ` (6 preceding siblings ...)
  2017-08-14 15:13 ` [PATCH v2 7/8] v4l: vsp1: Move video configuration to a cached dlb Kieran Bingham
@ 2017-08-14 15:13 ` Kieran Bingham
  2017-08-17 16:11   ` Laurent Pinchart
  7 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-08-14 15:13 UTC (permalink / raw)
  To: laurent.pinchart, linux-renesas-soc, linux-media; +Cc: Kieran Bingham

The display list originally allocated a body of 256 entries to store all
of the register lists required for each frame.

This has now been separated into fragments for constant stream setup, and
runtime updates.

Empirical testing shows that the body0 now uses a maximum of 41
registers for each frame, for both DRM and Video API pipelines thus a
rounded 64 entries provides a suitable allocation.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
---
 drivers/media/platform/vsp1/vsp1_dl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index 176a258146ac..b3f5eb2f9a4f 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -21,7 +21,7 @@
 #include "vsp1.h"
 #include "vsp1_dl.h"
 
-#define VSP1_DL_NUM_ENTRIES		256
+#define VSP1_DL_NUM_ENTRIES		64
 
 #define VSP1_DLH_INT_ENABLE		(1 << 1)
 #define VSP1_DLH_AUTO_START		(1 << 0)
-- 
git-series 0.9.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/8] v4l: vsp1: Protect fragments against overflow
  2017-08-14 15:13 ` [PATCH v2 1/8] v4l: vsp1: Protect fragments against overflow Kieran Bingham
@ 2017-08-16 21:53   ` Laurent Pinchart
  2017-08-17  8:16     ` Kieran Bingham
  0 siblings, 1 reply; 32+ messages in thread
From: Laurent Pinchart @ 2017-08-16 21:53 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

Thank you for the patch.

On Monday 14 Aug 2017 16:13:24 Kieran Bingham wrote:
> The fragment write function relies on the code never asking it to
> write more than the entries available in the list.
> 
> Currently with each list body containing 256 entries, this is fine,
> but we can reduce this number greatly saving memory.
> 
> In preparation of this - add a level of protection to catch any
> buffer overflows.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> ---
>  drivers/media/platform/vsp1/vsp1_dl.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
> index 8b5cbb6b7a70..cb4625ae13c2 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
> @@ -50,6 +50,7 @@ struct vsp1_dl_entry {
>   * @dma: DMA address of the entries
>   * @size: size of the DMA memory in bytes
>   * @num_entries: number of stored entries
> + * @max_entries: number of entries available
>   */
>  struct vsp1_dl_body {
>  	struct list_head list;
> @@ -60,6 +61,7 @@ struct vsp1_dl_body {
>  	size_t size;
> 
>  	unsigned int num_entries;
> +	unsigned int max_entries;
>  };
> 
>  /**
> @@ -138,6 +140,7 @@ static int vsp1_dl_body_init(struct vsp1_device *vsp1,
> 
>  	dlb->vsp1 = vsp1;
>  	dlb->size = size;
> +	dlb->max_entries = num_entries;
> 
>  	dlb->entries = dma_alloc_wc(vsp1->bus_master, dlb->size, &dlb->dma,
>  				    GFP_KERNEL);
> @@ -220,6 +223,11 @@ void vsp1_dl_fragment_free(struct vsp1_dl_body *dlb)
>   */
>  void vsp1_dl_fragment_write(struct vsp1_dl_body *dlb, u32 reg, u32 data)
>  {
> +	if (unlikely(dlb->num_entries >= dlb->max_entries)) {
> +		WARN_ONCE(true, "DLB size exceeded (max %u)", dlb->max_entries);
> +		return;
> +	}

How about

	if (WARN_ONCE(dlb->num_entries >= dlb->max_entries,
		      "DLB size exceeded (max %u)", dlb->max_entries))
		return;

(WARN_ONCE contains the unlikely() already)
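
To make the shape of that guard concrete, here is a minimal userspace sketch, not the driver code. `warn_once_sketch()` is a hypothetical stand-in for the kernel's WARN_ONCE() (it reports the first hit only and returns the condition), and the `dl_entry`/`dl_body` types are simplified assumptions modelled on the fields visible in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the driver structures (assumption). */
struct dl_entry {
	uint32_t addr;
	uint32_t data;
};

struct dl_body {
	struct dl_entry *entries;
	unsigned int num_entries;
	unsigned int max_entries;
};

/*
 * Hypothetical userspace replacement for WARN_ONCE(): prints the message
 * the first time the condition is true and returns the condition, so the
 * caller can test it directly.
 */
static bool warn_once_sketch(bool cond, const char *msg, unsigned int max)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "%s (max %u)\n", msg, max);
	}

	return cond;
}

/* Bounded write: silently drops the entry once the body is full. */
static void dl_body_write(struct dl_body *dlb, uint32_t reg, uint32_t data)
{
	assert(dlb && dlb->entries);

	if (warn_once_sketch(dlb->num_entries >= dlb->max_entries,
			     "DLB size exceeded", dlb->max_entries))
		return;

	dlb->entries[dlb->num_entries].addr = reg;
	dlb->entries[dlb->num_entries].data = data;
	dlb->num_entries++;
}
```

Writing three entries into a two-entry body leaves num_entries at 2 and prints the warning a single time.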

I'm not fussed either way,

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

>  	dlb->entries[dlb->num_entries].addr = reg;
>  	dlb->entries[dlb->num_entries].data = data;
>  	dlb->num_entries++;

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/8] v4l: vsp1: Protect fragments against overflow
  2017-08-16 21:53   ` Laurent Pinchart
@ 2017-08-17  8:16     ` Kieran Bingham
  0 siblings, 0 replies; 32+ messages in thread
From: Kieran Bingham @ 2017-08-17  8:16 UTC (permalink / raw)
  To: Laurent Pinchart, Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

Thanks for your review,

On 16/08/17 22:53, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.

> How about
> 
> 	if (WARN_ONCE(dlb->num_entries >= dlb->max_entries,
> 		      "DLB size exceeded (max %u)", dlb->max_entries))
> 		return;
> 
> (WARN_ONCE contains the unlikely() already)
> 
> I'm not fussed either way,

That does seem cleaner. Updated ready for any repost.

Thanks
--
Kieran

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/8] v4l: vsp1: Convert display lists to use new fragment pool
  2017-08-14 15:13 ` [PATCH v2 3/8] v4l: vsp1: Convert display lists to use new " Kieran Bingham
@ 2017-08-17 12:13   ` Laurent Pinchart
  2017-09-11 20:27     ` Kieran Bingham
  0 siblings, 1 reply; 32+ messages in thread
From: Laurent Pinchart @ 2017-08-17 12:13 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

Thank you for the patch.

On Monday 14 Aug 2017 16:13:26 Kieran Bingham wrote:
> Adapt the dl->body0 object to use an object from the fragment pool.
> This greatly reduces the pressure on the TLB for IPMMU use cases, as
> all of the lists use a single allocation for the main body.
> 
> The CLU and LUT objects pre-allocate a pool containing two bodies,
> allowing a userspace update before the hardware has committed a previous
> set of tables.

I think you'll need three bodies, one for the DL queued to the hardware, one 
for the pending DL and one for the new DL needed when you update the LUT/CLU. 
Given that the VSP test suite hasn't caught this problem, we also need a new 
test :-)
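
A toy model of that scenario, as a rough sketch only: the pool below is a plain array with an in-use flag, and all names (`body_pool`, `pool_get()`, `update_possible_with()`) are hypothetical. It only models the accounting described above, one body held by the queued DL, one by the pending DL, and one wanted for the new table; it does not reproduce the driver's locking or timing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_MAX 8

struct body {
	bool in_use;
};

struct body_pool {
	struct body bodies[POOL_MAX];
	unsigned int count;
};

static void pool_init(struct body_pool *pool, unsigned int count)
{
	unsigned int i;

	assert(count <= POOL_MAX);

	pool->count = count;
	for (i = 0; i < POOL_MAX; i++)
		pool->bodies[i].in_use = false;
}

/* Returns a free body, or NULL when the pool is exhausted. */
static struct body *pool_get(struct body_pool *pool)
{
	unsigned int i;

	for (i = 0; i < pool->count; i++) {
		if (!pool->bodies[i].in_use) {
			pool->bodies[i].in_use = true;
			return &pool->bodies[i];
		}
	}

	return NULL;
}

/*
 * One body is attached to the DL queued to the hardware, one to the
 * pending DL. Can a further table update still obtain a body?
 */
static bool update_possible_with(unsigned int num_bodies)
{
	struct body_pool pool;
	struct body *queued, *pending, *update;

	pool_init(&pool, num_bodies);

	queued = pool_get(&pool);
	pending = pool_get(&pool);
	update = pool_get(&pool);

	(void)queued;
	(void)pending;

	return update != NULL;
}
```

With a pool of two the third request returns NULL; with three it succeeds.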

> Fragments are no longer 'freed' in interrupt context, but instead
> released back to their respective pools.  This allows us to remove the
> garbage collector in the DLM.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> 
> ---
> v2:
>  - Use dl->body0->max_entries to determine header offset, instead of the
>    global constant VSP1_DL_NUM_ENTRIES which is incorrect.
>  - squash updates for LUT, CLU, and fragment cleanup into single patch.
>    (Not fully bisectable when separated)
> ---
>  drivers/media/platform/vsp1/vsp1_clu.c |  22 ++-
>  drivers/media/platform/vsp1/vsp1_clu.h |   1 +-
>  drivers/media/platform/vsp1/vsp1_dl.c  | 223 +++++---------------------
>  drivers/media/platform/vsp1/vsp1_dl.h  |   3 +-
>  drivers/media/platform/vsp1/vsp1_lut.c |  23 ++-
>  drivers/media/platform/vsp1/vsp1_lut.h |   1 +-
>  6 files changed, 90 insertions(+), 183 deletions(-)

This is a nice diffstat, but only if you add kerneldoc for the new functions 
introduced in patch 2/8, otherwise the overall documentation diffstat looks 
bad :-)

> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
> index f2fb26e5ab4e..52c523625e2f 100644
> --- a/drivers/media/platform/vsp1/vsp1_clu.c
> +++ b/drivers/media/platform/vsp1/vsp1_clu.c

[snip]

> @@ -288,6 +298,12 @@ struct vsp1_clu *vsp1_clu_create(struct vsp1_device *vsp1)
>  	if (ret < 0)
>  		return ERR_PTR(ret);
> 
> +	/* Allocate a fragment pool */

The comment would be more useful if you explained why you need to allocate a 
pool here. Same comment for the LUT.

> +	clu->pool = vsp1_dl_fragment_pool_alloc(clu->entity.vsp1, 2,
> +						CLU_SIZE + 1, 0);
> +	if (!clu->pool)
> +		return ERR_PTR(-ENOMEM);
> +
>  	/* Initialize the control handler. */
>  	v4l2_ctrl_handler_init(&clu->ctrls, 2);
>  	v4l2_ctrl_new_custom(&clu->ctrls, &clu_table_control, NULL);

[snip]

> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
> index aab9dd6ec0eb..6ffdc3549283 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> +++ b/drivers/media/platform/vsp1/vsp1_dl.c

[snip]


> @@ -379,41 +289,39 @@ static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm)
>  	INIT_LIST_HEAD(&dl->fragments);
>  	dl->dlm = dlm;
> 
> -	/*
> -	 * Initialize the display list body and allocate DMA memory for the body
> -	 * and the optional header. Both are allocated together to avoid memory
> -	 * fragmentation, with the header located right after the body in
> -	 * memory.
> -	 */
> -	header_size = dlm->mode == VSP1_DL_MODE_HEADER
> -		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
> -		    : 0;
> -
> -	ret = vsp1_dl_body_init(dlm->vsp1, &dl->body0, VSP1_DL_NUM_ENTRIES,
> -				header_size);
> -	if (ret < 0) {
> -		kfree(dl);
> +	/* Retrieve a body from our DLM body pool */
> +	dl->body0 = vsp1_dl_fragment_get(pool);
> +	if (!dl->body0)
>  		return NULL;
> -	}
> -
>  	if (dlm->mode == VSP1_DL_MODE_HEADER) {
> -		size_t header_offset = VSP1_DL_NUM_ENTRIES
> -				     * sizeof(*dl->body0.entries);
> +		size_t header_offset = dl->body0->max_entries
> +				     * sizeof(*dl->body0->entries);
> 
> -		dl->header = ((void *)dl->body0.entries) + header_offset;
> -		dl->dma = dl->body0.dma + header_offset;
> +		dl->header = ((void *)dl->body0->entries) + header_offset;
> +		dl->dma = dl->body0->dma + header_offset;
> 
>  		memset(dl->header, 0, sizeof(*dl->header));
> -		dl->header->lists[0].addr = dl->body0.dma;
> +		dl->header->lists[0].addr = dl->body0->dma;
>  	}
> 
>  	return dl;
>  }
> 
> +static void vsp1_dl_list_fragments_free(struct vsp1_dl_list *dl)

This function doesn't free fragments but puts them back to the free list. I'd
call it vsp1_dl_list_fragments_put().

> +{
> +	struct vsp1_dl_body *dlb, *tmp;
> +
> +	list_for_each_entry_safe(dlb, tmp, &dl->fragments, list) {
> +		list_del(&dlb->list);
> +		vsp1_dl_fragment_put(dlb);
> +	}
> +}
> +
>  static void vsp1_dl_list_free(struct vsp1_dl_list *dl)
>  {
> -	vsp1_dl_body_cleanup(&dl->body0);
> -	list_splice_init(&dl->fragments, &dl->dlm->gc_fragments);
> +	vsp1_dl_fragment_put(dl->body0);
> +	vsp1_dl_list_fragments_free(dl);

I wonder whether the second line is actually needed. vsp1_dl_list_free() is 
called from vsp1_dlm_destroy() for every entry in the dlm->free list. A DL can 
only be put in that list by vsp1_dlm_create() or __vsp1_dl_list_put(). The 
former creates lists with no fragment, while the latter calls 
vsp1_dl_list_fragments_free() already.

If you're not entirely sure, you could add a
WARN_ON(!list_empty(&dl->fragments)) and run the test suite. A comment
explaining why the fragments list should already be empty here would be
useful too.

> +
>  	kfree(dl);
>  }
> 
> @@ -467,18 +375,10 @@ static void __vsp1_dl_list_put(struct vsp1_dl_list *dl)
> 
>  	dl->has_chain = false;
> 
> -	/*
> -	 * We can't free fragments here as DMA memory can only be freed in
> -	 * interruptible context. Move all fragments to the display list
> -	 * manager's list of fragments to be freed, they will be
> -	 * garbage-collected by the work queue.
> -	 */
> -	if (!list_empty(&dl->fragments)) {
> -		list_splice_init(&dl->fragments, &dl->dlm->gc_fragments);
> -		schedule_work(&dl->dlm->gc_work);
> -	}
> +	vsp1_dl_list_fragments_free(dl);
> 
> -	dl->body0.num_entries = 0;
> +	/* body0 is reused */

It would be useful to explain why. Maybe something like "body0 is reused as an
optimization as every display list needs at least one body." ? And now I'm
wondering if it's really a useful optimization :-)

> +	dl->body0->num_entries = 0;
> 
>  	list_add_tail(&dl->list, &dl->dlm->free);
>  }

[snip]

> @@ -898,13 +764,26 @@ struct vsp1_dl_manager *vsp1_dlm_create(struct vsp1_device *vsp1,
> 
>  	spin_lock_init(&dlm->lock);
>  	INIT_LIST_HEAD(&dlm->free);
> -	INIT_LIST_HEAD(&dlm->gc_fragments);
> -	INIT_WORK(&dlm->gc_work, vsp1_dlm_garbage_collect);
> +
> +	/*
> +	 * Initialize the display list body and allocate DMA memory for the body
> +	 * and the optional header. Both are allocated together to avoid memory
> +	 * fragmentation, with the header located right after the body in
> +	 * memory.
> +	 */

Nice to see you're keeping this comment, but maybe you want to update it 
according to the code changes ;-)

> +	header_size = dlm->mode == VSP1_DL_MODE_HEADER
> +		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
> +		    : 0;
> +
> +	dlm->pool = vsp1_dl_fragment_pool_alloc(vsp1, prealloc,
> +					VSP1_DL_NUM_ENTRIES, header_size);
> +	if (!dlm->pool)
> +		return NULL;
> 
>  	for (i = 0; i < prealloc; ++i) {
>  		struct vsp1_dl_list *dl;
> 
> -		dl = vsp1_dl_list_alloc(dlm);
> +		dl = vsp1_dl_list_alloc(dlm, dlm->pool);
>  		if (!dl)
>  			return NULL;
> 

[snip]

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/8] v4l: vsp1: Provide a fragment pool
  2017-08-14 15:13 ` [PATCH v2 2/8] v4l: vsp1: Provide a fragment pool Kieran Bingham
@ 2017-08-17 12:13   ` Laurent Pinchart
  2017-09-11 20:30     ` Kieran Bingham
  0 siblings, 1 reply; 32+ messages in thread
From: Laurent Pinchart @ 2017-08-17 12:13 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

Thank you for the patch.

On Monday 14 Aug 2017 16:13:25 Kieran Bingham wrote:
> Each display list allocates a body to store register values in a dma
> accessible buffer from a dma_alloc_wc() allocation. Each of these
> results in an entry in the TLB, and a large number of display list
> allocations adds pressure to this resource.
> 
> Reduce TLB pressure on the IPMMUs by allocating multiple display list
> bodies in a single allocation, and providing these to the display list
> through a 'fragment pool'. A pool can be allocated by the display list
> manager or entities which require their own body allocations.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> 
> ---
> v2:
>  - assign dlb->dma correctly
> ---
>  drivers/media/platform/vsp1/vsp1_dl.c | 129 +++++++++++++++++++++++++++-
>  drivers/media/platform/vsp1/vsp1_dl.h |   8 ++-
>  2 files changed, 137 insertions(+)
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
> index cb4625ae13c2..aab9dd6ec0eb 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
> @@ -45,6 +45,8 @@ struct vsp1_dl_entry {
>  /**
>   * struct vsp1_dl_body - Display list body
>   * @list: entry in the display list list of bodies
> + * @free: entry in the pool free body list
> + * @pool: pool to which this body belongs
>   * @vsp1: the VSP1 device
>   * @entries: array of entries
>   * @dma: DMA address of the entries
> @@ -54,6 +56,9 @@ struct vsp1_dl_entry {
>   */
>  struct vsp1_dl_body {
>  	struct list_head list;
> +	struct list_head free;
> +
> +	struct vsp1_dl_fragment_pool *pool;
>  	struct vsp1_device *vsp1;
> 
>  	struct vsp1_dl_entry *entries;
> @@ -65,6 +70,30 @@ struct vsp1_dl_body {
>  };
> 
>  /**
> + * struct vsp1_dl_fragment_pool - display list body/fragment pool
> + * @dma: DMA address of the entries
> + * @size: size of the full DMA memory pool in bytes
> + * @mem: CPU memory pointer for the pool
> + * @bodies: Array of DLB structures for the pool
> + * @free: List of free DLB entries
> + * @lock: Protects the pool and free list
> + * @vsp1: the VSP1 device
> + */
> +struct vsp1_dl_fragment_pool {
> +	/* DMA allocation */
> +	dma_addr_t dma;
> +	size_t size;
> +	void *mem;
> +
> +	/* Body management */
> +	struct vsp1_dl_body *bodies;
> +	struct list_head free;
> +	spinlock_t lock;
> +
> +	struct vsp1_device *vsp1;
> +};
> +
> +/**
>   * struct vsp1_dl_list - Display list
>   * @list: entry in the display list manager lists
>   * @dlm: the display list manager
> @@ -104,6 +133,7 @@ enum vsp1_dl_mode {
>   * @active: list currently being processed (loaded) by hardware
>   * @queued: list queued to the hardware (written to the DL registers)
>   * @pending: list waiting to be queued to the hardware
> + * @pool: fragment pool for the display list bodies
>   * @gc_work: fragments garbage collector work struct
>   * @gc_fragments: array of display list fragments waiting to be freed
>   */
> @@ -119,6 +149,8 @@ struct vsp1_dl_manager {
>  	struct vsp1_dl_list *queued;
>  	struct vsp1_dl_list *pending;
> 
> +	struct vsp1_dl_fragment_pool *pool;
> +
>  	struct work_struct gc_work;
>  	struct list_head gc_fragments;
>  };
> @@ -128,6 +160,103 @@ struct vsp1_dl_manager {
>   */
> 
>  /*
> + * Fragment pool's reduce the pressure on the iommu TLB by allocating a single
> + * large area of DMA memory and allocating it as a pool of fragment bodies
> + */

Could you document non-static functions using kerneldoc ? Parameters to this 
function would benefit from some documentation. I'd also like to see the 
fragment get/put functions documented, as you remove existing kerneldoc for 
the alloc/free existing functions in patch 3/8.

> +struct vsp1_dl_fragment_pool *
> +vsp1_dl_fragment_pool_alloc(struct vsp1_device *vsp1, unsigned int qty,

I think I would name this function vsp1_dl_fragment_pool_create(), as it does 
more than just allocating memory. Similarly I'd call the free function 
vsp1_dl_fragment_pool_destroy().

qty is a bit vague, I'd rename it to num_fragments.

> +			    unsigned int num_entries, size_t extra_size)
> +{
> +	struct vsp1_dl_fragment_pool *pool;
> +	size_t dlb_size;
> +	unsigned int i;
> +
> +	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
> +	if (!pool)
> +		return NULL;
> +
> +	pool->vsp1 = vsp1;
> +
> +	dlb_size = num_entries * sizeof(struct vsp1_dl_entry) + extra_size;

extra_size is only used by vsp1_dlm_create(), to allocate extra memory for the 
display list header. We need one header per display list, not per display list 
body.
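
To illustrate the layout being discussed, here is a small sketch of the arithmetic only, with no DMA involved. The helper names are hypothetical; `struct dl_entry` assumes two 32-bit fields (addr and data) as written by the fragment write function. Because extra_size is folded into the per-body stride, it is paid once per body in the pool, which is the mismatch pointed out above when the header is needed once per display list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One display list entry: register address and value (assumption). */
struct dl_entry {
	uint32_t addr;
	uint32_t data;
};

/* Bytes reserved for each body, including any per-body extra space. */
static size_t body_stride(unsigned int num_entries, size_t extra_size)
{
	return num_entries * sizeof(struct dl_entry) + extra_size;
}

/* Total size of the single allocation backing the whole pool. */
static size_t pool_size(unsigned int num_bodies, unsigned int num_entries,
			size_t extra_size)
{
	assert(num_bodies > 0);

	return body_stride(num_entries, extra_size) * num_bodies;
}

/* Offset of body 'index' from the start of the pool memory. */
static size_t body_offset(unsigned int index, unsigned int num_entries,
			  size_t extra_size)
{
	return index * body_stride(num_entries, extra_size);
}
```

For example, a pool of two 128-entry bodies with no extra space occupies 2048 bytes, with the second body starting at offset 1024. The same offset is applied to both the CPU pointer and the DMA address in the patch.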

> +	pool->size = dlb_size * qty;
> +
> +	pool->bodies = kcalloc(qty, sizeof(*pool->bodies), GFP_KERNEL);
> +	if (!pool->bodies) {
> +		kfree(pool);
> +		return NULL;
> +	}
> +
> +	pool->mem = dma_alloc_wc(vsp1->bus_master, pool->size, &pool->dma,
> +					    GFP_KERNEL);

This is a weird indentation.

> +	if (!pool->mem) {
> +		kfree(pool->bodies);
> +		kfree(pool);
> +		return NULL;
> +	}
> +
> +	spin_lock_init(&pool->lock);
> +	INIT_LIST_HEAD(&pool->free);
> +
> +	for (i = 0; i < qty; ++i) {
> +		struct vsp1_dl_body *dlb = &pool->bodies[i];
> +
> +		dlb->pool = pool;
> +		dlb->max_entries = num_entries;
> +
> +		dlb->dma = pool->dma + i * dlb_size;
> +		dlb->entries = pool->mem + i * dlb_size;
> +
> +		list_add_tail(&dlb->free, &pool->free);
> +	}
> +
> +	return pool;
> +}
> +
> +void vsp1_dl_fragment_pool_free(struct vsp1_dl_fragment_pool *pool)
> +{
> +	if (!pool)
> +		return;

Can this happen ?

> +
> +	if (pool->mem)
> +		dma_free_wc(pool->vsp1->bus_master, pool->size, pool->mem,
> +			    pool->dma);
> +
> +	kfree(pool->bodies);
> +	kfree(pool);
> +}
> +
> +struct vsp1_dl_body *vsp1_dl_fragment_get(struct vsp1_dl_fragment_pool *pool)
> +{
> +	struct vsp1_dl_body *dlb = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pool->lock, flags);
> +
> +	if (!list_empty(&pool->free)) {
> +		dlb = list_first_entry(&pool->free, struct vsp1_dl_body, free);
> +		list_del(&dlb->free);
> +	}
> +
> +	spin_unlock_irqrestore(&pool->lock, flags);
> +
> +	return dlb;
> +}
> +
> +void vsp1_dl_fragment_put(struct vsp1_dl_body *dlb)
> +{
> +	unsigned long flags;
> +
> +	if (!dlb)
> +		return;
> +
> +	dlb->num_entries = 0;
> +
> +	spin_lock_irqsave(&dlb->pool->lock, flags);
> +	list_add_tail(&dlb->free, &dlb->pool->free);
> +	spin_unlock_irqrestore(&dlb->pool->lock, flags);
> +}
> +
> +/*
>   * Initialize a display list body object and allocate DMA memory for the body
>   * data. The display list body object is expected to have been initialized to
>   * 0 when allocated.
> diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
> index ee3508172f0a..9528484a8a34 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.h
> +++ b/drivers/media/platform/vsp1/vsp1_dl.h
> @@ -17,6 +17,7 @@
> 
>  struct vsp1_device;
>  struct vsp1_dl_fragment;
> +struct vsp1_dl_fragment_pool;

I noticed that the vsp1_dl_fragment structure is declared here but never 
defined or used. The vsp1_dl_fragment_* functions all operate on vsp1_dl_body 
structures.

The name body is used in the datasheet, so I think it would make sense to 
s/fragments/bodies/ and s/fragment/body/ through the code as a prerequisite 
for this patch, and rebasing it accordingly.

>  struct vsp1_dl_list;
>  struct vsp1_dl_manager;
> 
> @@ -34,6 +35,13 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl);
>  void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data);
>  void vsp1_dl_list_commit(struct vsp1_dl_list *dl);
> 
> +struct vsp1_dl_fragment_pool *
> +vsp1_dl_fragment_pool_alloc(struct vsp1_device *vsp1, unsigned int qty,
> +			    unsigned int num_entries, size_t extra_size);
> +void vsp1_dl_fragment_pool_free(struct vsp1_dl_fragment_pool *pool);
> +struct vsp1_dl_body *vsp1_dl_fragment_get(struct vsp1_dl_fragment_pool *pool);
> +void vsp1_dl_fragment_put(struct vsp1_dl_body *dlb);
> +
>  struct vsp1_dl_body *vsp1_dl_fragment_alloc(struct vsp1_device *vsp1,
>  					    unsigned int num_entries);
>  void vsp1_dl_fragment_free(struct vsp1_dl_body *dlb);
-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 4/8] v4l: vsp1: Use reference counting for fragments
  2017-08-14 15:13 ` [PATCH v2 4/8] v4l: vsp1: Use reference counting for fragments Kieran Bingham
@ 2017-08-17 12:53   ` Laurent Pinchart
  0 siblings, 0 replies; 32+ messages in thread
From: Laurent Pinchart @ 2017-08-17 12:53 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

Thank you for the patch.

On Monday 14 Aug 2017 16:13:27 Kieran Bingham wrote:
> Extend the display list body with a reference count, allowing bodies to
> be kept as long as a reference is maintained. This provides the ability
> to keep a cached copy of bodies which will not change, so that they can
> be re-applied to multiple display lists.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> 
> ---
> This could be squashed into the fragment update code, but it's not a
> straightforward squash as the refcounts will affect both:
>   v4l: vsp1: Provide a fragment pool
> and
>   v4l: vsp1: Convert display lists to use new fragment pool
> therefore, I have kept this separate to prevent breaking bisectability
> of the vsp-tests.

Sounds good to me.

> ---
>  drivers/media/platform/vsp1/vsp1_clu.c |  7 ++++++-
>  drivers/media/platform/vsp1/vsp1_dl.c  | 15 ++++++++++++++-
>  drivers/media/platform/vsp1/vsp1_lut.c |  7 ++++++-
>  3 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
> index 52c523625e2f..175717018e11 100644
> --- a/drivers/media/platform/vsp1/vsp1_clu.c
> +++ b/drivers/media/platform/vsp1/vsp1_clu.c
> @@ -257,8 +257,13 @@ static void clu_configure(struct vsp1_entity *entity,
>  		clu->clu = NULL;
>  		spin_unlock_irqrestore(&clu->lock, flags);
> 
> -		if (dlb)
> +		if (dlb) {
>  			vsp1_dl_list_add_fragment(dl, dlb);
> +
> +			/* release our local reference */
> +			vsp1_dl_fragment_put(dlb);
> +		}
> +
>  		break;
>  	}
>  }
> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
> index 6ffdc3549283..37feda248946 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
> @@ -14,6 +14,7 @@
>  #include <linux/device.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/gfp.h>
> +#include <linux/refcount.h>
>  #include <linux/slab.h>
>  #include <linux/workqueue.h>
> 
> @@ -58,6 +59,8 @@ struct vsp1_dl_body {
>  	struct list_head list;
>  	struct list_head free;
> 
> +	refcount_t refcnt;
> +
>  	struct vsp1_dl_fragment_pool *pool;
>  	struct vsp1_device *vsp1;
> 
> @@ -230,6 +233,7 @@ struct vsp1_dl_body *vsp1_dl_fragment_get(struct vsp1_dl_fragment_pool *pool)
>  	if (!list_empty(&pool->free)) {
>  		dlb = list_first_entry(&pool->free, struct vsp1_dl_body, free);
>  		list_del(&dlb->free);
> +		refcount_set(&dlb->refcnt, 1);
>  	}
> 
>  	spin_unlock_irqrestore(&pool->lock, flags);
> @@ -244,6 +248,9 @@ void vsp1_dl_fragment_put(struct vsp1_dl_body *dlb)
>  	if (!dlb)
>  		return;
> 
> +	if (!refcount_dec_and_test(&dlb->refcnt))
> +		return;
> +
>  	dlb->num_entries = 0;
> 
>  	spin_lock_irqsave(&dlb->pool->lock, flags);
> @@ -428,7 +435,11 @@ void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
>   * list, in the order in which fragments are added.
>   *
>   * Adding a fragment to a display list passes ownership of the fragment to the
> - * list. The caller must not touch the fragment after this call.
> + * list. The caller must not modify the fragment after this call, but can retain
> + * a reference to it for future use if necessary, to add to subsequent lists.

I think there's a bit of contradiction here, if the ownership passes to the 
list then the caller shouldn't touch it anymore. How about stating it as 
follows ?

 * The caller retains its reference to the fragment when adding it to a
 * display list, but is not allowed to add new entries to the fragment.
 * The reference must be explicitly released by a call to
 * vsp1_dl_fragment_put() when the fragment isn't needed anymore.
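
The reference flow described by that wording can be sketched as follows. This is a simplified userspace model under stated assumptions: a plain integer stands in for the kernel's atomic refcount_t, the `on_free_list` flag stands in for the pool's free list, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct body {
	unsigned int refcnt;	/* plain counter; the driver uses refcount_t */
	unsigned int num_entries;
	bool on_free_list;	/* stands in for the pool free list */
};

/* Take a body out of the pool: the caller owns the first reference. */
static void body_get_from_pool(struct body *b)
{
	assert(b && b->on_free_list);

	b->on_free_list = false;
	b->refcnt = 1;
}

/* Attaching the body to a display list takes an additional reference. */
static void list_add_body(struct body *b)
{
	assert(b && b->refcnt > 0);

	b->refcnt++;
}

/* Drop one reference; the last put resets the body and returns it. */
static void body_put(struct body *b)
{
	if (!b)
		return;

	assert(b->refcnt > 0);

	if (--b->refcnt)
		return;

	b->num_entries = 0;
	b->on_free_list = true;
}
```

A caller that does not want to cache the body puts it right after adding it to the list; the body then returns to the pool only when the list drops its own reference. A caller that caches the body simply keeps its reference until the configuration is no longer needed.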

> + *
> + * The reference count of the body is incremented by this attachment, and thus
> + * the caller should release it's reference if does not want to cache the body.
>   *
>   * Fragments are only usable for display lists in header mode. Attempt to
>   * add a fragment to a header-less display list will return an error.
> @@ -440,6 +451,8 @@ int vsp1_dl_list_add_fragment(struct vsp1_dl_list *dl,
>  	if (dl->dlm->mode != VSP1_DL_MODE_HEADER)
>  		return -EINVAL;
> 
> +	refcount_inc(&dlb->refcnt);
> +
>  	list_add_tail(&dlb->list, &dl->fragments);
>  	return 0;
>  }
> diff --git a/drivers/media/platform/vsp1/vsp1_lut.c b/drivers/media/platform/vsp1/vsp1_lut.c
> index 57482e057e54..388bd89ade0b 100644
> --- a/drivers/media/platform/vsp1/vsp1_lut.c
> +++ b/drivers/media/platform/vsp1/vsp1_lut.c
> @@ -213,8 +213,13 @@ static void lut_configure(struct vsp1_entity *entity,
>  		lut->lut = NULL;
>  		spin_unlock_irqrestore(&lut->lock, flags);
> 
> -		if (dlb)
> +		if (dlb) {
>  			vsp1_dl_list_add_fragment(dl, dlb);
> +
> +			/* release our local reference */
> +			vsp1_dl_fragment_put(dlb);
> +		}
> +
>  		break;
>  	}
>  }

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 8/8] v4l: vsp1: Reduce display list body size
  2017-08-14 15:13 ` [PATCH v2 8/8] v4l: vsp1: Reduce display list body size Kieran Bingham
@ 2017-08-17 16:11   ` Laurent Pinchart
  2017-09-11 21:15     ` Kieran Bingham
  0 siblings, 1 reply; 32+ messages in thread
From: Laurent Pinchart @ 2017-08-17 16:11 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

Thank you for the patch.

On Monday 14 Aug 2017 16:13:31 Kieran Bingham wrote:
> The display list originally allocated a body of 256 entries to store all
> of the register lists required for each frame.
> 
> This has now been separated into fragments for constant stream setup, and
> runtime updates.
> 
> Empirical testing shows that the body0 now uses a maximum of 41
> registers for each frame, for both DRM and Video API pipelines thus a
> rounded 64 entries provides a suitable allocation.

Didn't you mention in patch 7/8 that one of the fragments uses exactly 64 
entries ? Which one is it, and is there a risk it could use more ? 

> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> ---
>  drivers/media/platform/vsp1/vsp1_dl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
> index 176a258146ac..b3f5eb2f9a4f 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
> @@ -21,7 +21,7 @@
>  #include "vsp1.h"
>  #include "vsp1_dl.h"
> 
> -#define VSP1_DL_NUM_ENTRIES		256
> +#define VSP1_DL_NUM_ENTRIES		64
> 
>  #define VSP1_DLH_INT_ENABLE		(1 << 1)
>  #define VSP1_DLH_AUTO_START		(1 << 0)

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 6/8] v4l: vsp1: Adapt entities to configure into a body
  2017-08-14 15:13 ` [PATCH v2 6/8] v4l: vsp1: Adapt entities to configure into a body Kieran Bingham
@ 2017-08-17 17:58   ` Laurent Pinchart
  2017-09-11 21:42     ` Kieran Bingham
  0 siblings, 1 reply; 32+ messages in thread
From: Laurent Pinchart @ 2017-08-17 17:58 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

Thank you for the patch.

On Monday 14 Aug 2017 16:13:29 Kieran Bingham wrote:
> Currently the entities store their configurations into a display list.
> Adapt this such that the code can be configured into a body fragment
> directly, allowing greater flexibility and control of the content.
> 
> All users of vsp1_dl_list_write() are removed in this process, thus it
> too is removed.
> 
> A helper, vsp1_dl_list_body() is provided to access the internal body0
> from the display list.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> ---
>  drivers/media/platform/vsp1/vsp1_bru.c    | 22 ++++++------
>  drivers/media/platform/vsp1/vsp1_clu.c    | 22 ++++++------
>  drivers/media/platform/vsp1/vsp1_dl.c     | 12 ++-----
>  drivers/media/platform/vsp1/vsp1_dl.h     |  2 +-
>  drivers/media/platform/vsp1/vsp1_drm.c    | 14 +++++---
>  drivers/media/platform/vsp1/vsp1_entity.c | 16 ++++-----
>  drivers/media/platform/vsp1/vsp1_entity.h | 12 ++++---
>  drivers/media/platform/vsp1/vsp1_hgo.c    | 16 ++++-----
>  drivers/media/platform/vsp1/vsp1_hgt.c    | 18 +++++-----
>  drivers/media/platform/vsp1/vsp1_hsit.c   | 10 +++---
>  drivers/media/platform/vsp1/vsp1_lif.c    | 13 +++----
>  drivers/media/platform/vsp1/vsp1_lut.c    | 21 ++++++------
>  drivers/media/platform/vsp1/vsp1_pipe.c   |  4 +-
>  drivers/media/platform/vsp1/vsp1_pipe.h   |  3 +-
>  drivers/media/platform/vsp1/vsp1_rpf.c    | 43 +++++++++++-------------
>  drivers/media/platform/vsp1/vsp1_sru.c    | 14 ++++----
>  drivers/media/platform/vsp1/vsp1_uds.c    | 24 +++++++------
>  drivers/media/platform/vsp1/vsp1_uds.h    |  2 +-
>  drivers/media/platform/vsp1/vsp1_video.c  | 11 ++++--
>  drivers/media/platform/vsp1/vsp1_wpf.c    | 42 ++++++++++++-----------
>  20 files changed, 168 insertions(+), 153 deletions(-)

This is quite intrusive, and it bothers me slightly that we need to pass both 
the DL and the DLB to the configure function in order to add fragments to the 
DL in the CLU and LUT modules. Wouldn't it be simpler to add a pointer to the 
current body in the DL structure, and modify vsp1_dl_list_write() to write to 
the current fragment ?
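
As a rough illustration of that suggestion, here is a minimal userspace sketch, 
not driver code. The list carries a pointer to the body that currently receives 
writes, so entities keep a single write helper. The struct layouts, the 
cur_body field and the dl_list_set_current() helper are all assumptions made up 
for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DL_BODY_MAX_ENTRIES	8	/* arbitrary size for this sketch */

/* Simplified stand-ins for the driver structures (assumed layout). */
struct dl_entry {
	uint32_t addr;
	uint32_t data;
};

struct dl_body {
	struct dl_entry entries[DL_BODY_MAX_ENTRIES];
	unsigned int num_entries;
};

struct dl_list {
	struct dl_body body0;
	struct dl_body *cur_body;	/* hypothetical: body receiving writes */
};

static void dl_body_init(struct dl_body *dlb)
{
	dlb->num_entries = 0;
}

static void dl_list_init(struct dl_list *dl)
{
	dl_body_init(&dl->body0);
	dl->cur_body = &dl->body0;
}

/* Redirect later writes to another body; NULL selects body0 again. */
static void dl_list_set_current(struct dl_list *dl, struct dl_body *dlb)
{
	dl->cur_body = dlb ? dlb : &dl->body0;
}

/* Append one register write to the current body; -1 on overflow. */
static int dl_list_write(struct dl_list *dl, uint32_t reg, uint32_t data)
{
	struct dl_body *dlb = dl->cur_body;

	assert(dlb != NULL);

	if (dlb->num_entries >= DL_BODY_MAX_ENTRIES)
		return -1;

	dlb->entries[dlb->num_entries].addr = reg;
	dlb->entries[dlb->num_entries].data = data;
	dlb->num_entries++;

	return 0;
}
```

With such a pointer, the configure operations would only need the DL argument, 
and a caller would switch the target body before invoking them.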

-- 
Regards,

Laurent Pinchart


* Re: [PATCH v2 7/8] v4l: vsp1: Move video configuration to a cached dlb
  2017-08-14 15:13 ` [PATCH v2 7/8] v4l: vsp1: Move video configuration to a cached dlb Kieran Bingham
@ 2017-08-17 18:10   ` Laurent Pinchart
  2017-11-16 18:19     ` Kieran Bingham
  0 siblings, 1 reply; 32+ messages in thread
From: Laurent Pinchart @ 2017-08-17 18:10 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

Thank you for the patch.

On Monday 14 Aug 2017 16:13:30 Kieran Bingham wrote:
> We are now able to configure a pipeline directly into a local display
> list body. Take advantage of this fact, and create a cacheable body to
> store the configuration of the pipeline in the video object.
> 
> vsp1_video_pipeline_run() is now the last user of the pipe->dl object.
> Convert this function to use the cached video->config body and obtain a
> local display list reference.
> 
> Attach the video->config body to the display list when needed before
> committing to hardware.
> 
> The pipe object is marked as un-configured when entering a suspend. This
> ensures that upon resume, where the hardware is reset - our cached
> configuration will be re-attached to the next committed DL.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> ---
> 
> Our video DL usage now looks like the below output:
> 
> dl->body0 contains our disposable runtime configuration. Max 41.
> dl_child->body0 is our partition specific configuration. Max 12.
> dl->fragments shows our constant configuration and LUTs.
> 
>   These two are LUT/CLU:
>      * dl->fragments[x]->num_entries 256 / max 256
>      * dl->fragments[x]->num_entries 4914 / max 4914
> 
> Which shows that our 'constant' configuration cache is currently
> utilised to a maximum of 64 entries.
> 
> trace-cmd report | \
>     grep max | sed 's/.*vsp1_dl_list_commit://g' | sort | uniq;
> 
>   dl->body0->num_entries 13 / max 128
>   dl->body0->num_entries 14 / max 128
>   dl->body0->num_entries 16 / max 128
>   dl->body0->num_entries 20 / max 128
>   dl->body0->num_entries 27 / max 128
>   dl->body0->num_entries 34 / max 128
>   dl->body0->num_entries 41 / max 128
>   dl_child->body0->num_entries 10 / max 128
>   dl_child->body0->num_entries 12 / max 128
>   dl->fragments[x]->num_entries 15 / max 128
>   dl->fragments[x]->num_entries 16 / max 128
>   dl->fragments[x]->num_entries 17 / max 128
>   dl->fragments[x]->num_entries 18 / max 128
>   dl->fragments[x]->num_entries 20 / max 128
>   dl->fragments[x]->num_entries 21 / max 128
>   dl->fragments[x]->num_entries 256 / max 256
>   dl->fragments[x]->num_entries 31 / max 128
>   dl->fragments[x]->num_entries 32 / max 128
>   dl->fragments[x]->num_entries 39 / max 128
>   dl->fragments[x]->num_entries 40 / max 128
>   dl->fragments[x]->num_entries 47 / max 128
>   dl->fragments[x]->num_entries 48 / max 128
>   dl->fragments[x]->num_entries 4914 / max 4914
>   dl->fragments[x]->num_entries 55 / max 128
>   dl->fragments[x]->num_entries 56 / max 128
>   dl->fragments[x]->num_entries 63 / max 128
>   dl->fragments[x]->num_entries 64 / max 128
> ---
>  drivers/media/platform/vsp1/vsp1_pipe.c  |  4 +-
>  drivers/media/platform/vsp1/vsp1_pipe.h  |  4 +-
>  drivers/media/platform/vsp1/vsp1_video.c | 67 ++++++++++++++++---------
>  drivers/media/platform/vsp1/vsp1_video.h |  2 +-
>  4 files changed, 51 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c
> b/drivers/media/platform/vsp1/vsp1_pipe.c index 5012643583b6..7d1f7ba43060
> 100644
> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
> @@ -249,6 +249,7 @@ void vsp1_pipeline_run(struct vsp1_pipeline *pipe)
>  		vsp1_write(vsp1, VI6_CMD(pipe->output->entity.index),
>  			   VI6_CMD_STRCMD);
>  		pipe->state = VSP1_PIPELINE_RUNNING;
> +		pipe->configured = true;
>  	}
> 
>  	pipe->buffers_ready = 0;
> @@ -430,6 +431,9 @@ void vsp1_pipelines_suspend(struct vsp1_device *vsp1)
>  		spin_lock_irqsave(&pipe->irqlock, flags);
>  		if (pipe->state == VSP1_PIPELINE_RUNNING)
>  			pipe->state = VSP1_PIPELINE_STOPPING;
> +
> +		/* After a suspend, the hardware will be reset */
> +		pipe->configured = false;

It shouldn't make a difference in practice, but I think it would be more 
logical to set the configured field to false after the hardware has been 
reset. I'd move this to the resume handler and update the comment to "The 
hardware might have been reset during suspend and need a full 
reconfiguration". 
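
To make the intended life cycle of the flag concrete, here is a small userspace 
model (a sketch only, not the driver code). The pipe_model structure and its 
helpers are hypothetical names; attach_count simply counts how often the cached 
configuration would be attached to a display list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the pipeline state relevant to the cached config. */
struct pipe_model {
	bool configured;		/* has the hardware been fully configured? */
	unsigned int attach_count;	/* times the cached config was attached */
	unsigned int frames;		/* frames submitted */
};

/* Models one run: attach the cached config only when it is needed. */
static void pipe_model_run(struct pipe_model *pipe)
{
	assert(pipe != NULL);

	if (!pipe->configured) {
		pipe->attach_count++;
		pipe->configured = true;
	}

	pipe->frames++;
}

/*
 * Models the resume handler: the hardware might have been reset during
 * suspend and need a full reconfiguration.
 */
static void pipe_model_resume(struct pipe_model *pipe)
{
	assert(pipe != NULL);

	pipe->configured = false;
}
```

In this model the configuration is attached once for a run of frames, and once 
more for the first frame after a resume.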

>  		spin_unlock_irqrestore(&pipe->irqlock, flags);
>  	}
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h
> b/drivers/media/platform/vsp1/vsp1_pipe.h index 90d29492b9b9..e7ad6211b4d0
> 100644
> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
> @@ -90,6 +90,7 @@ struct vsp1_partition {
>   * @irqlock: protects the pipeline state
>   * @state: current state
>   * @wq: wait queue to wait for state change completion
> + * @configured: flag determining if the hardware has run since reset
>   * @frame_end: frame end interrupt handler
>   * @lock: protects the pipeline use count and stream count
>   * @kref: pipeline reference count
> @@ -117,6 +118,7 @@ struct vsp1_pipeline {
>  	spinlock_t irqlock;
>  	enum vsp1_pipeline_state state;
>  	wait_queue_head_t wq;
> +	bool configured;
> 
>  	void (*frame_end)(struct vsp1_pipeline *pipe, bool completed);
> 
> @@ -143,8 +145,6 @@ struct vsp1_pipeline {
>  	 */
>  	struct list_head entities;
> 
> -	struct vsp1_dl_list *dl;
> -
>  	unsigned int partitions;
>  	struct vsp1_partition *partition;
>  	struct vsp1_partition *part_table;
> diff --git a/drivers/media/platform/vsp1/vsp1_video.c
> b/drivers/media/platform/vsp1/vsp1_video.c index 7e825f3360bf..42b70b8465ba
> 100644
> --- a/drivers/media/platform/vsp1/vsp1_video.c
> +++ b/drivers/media/platform/vsp1/vsp1_video.c
> @@ -394,37 +394,43 @@ static void vsp1_video_pipeline_run_partition(struct
> vsp1_pipeline *pipe, static void vsp1_video_pipeline_run(struct
> vsp1_pipeline *pipe)
>  {
>  	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
> +	struct vsp1_video *video = pipe->output->video;
>  	unsigned int partition;
> +	struct vsp1_dl_list *dl;
> +
> +	dl = vsp1_dl_list_get(pipe->output->dlm);
> 
> -	if (!pipe->dl)
> -		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
> +	/* Attach our pipe configuration to fully initialise the hardware */
> +	if (!pipe->configured) {
> +		vsp1_dl_list_add_fragment(dl, video->pipe_config);
> +		pipe->configured = true;
> +	}
> 
>  	/* Run the first partition */
> -	vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0);
> +	vsp1_video_pipeline_run_partition(pipe, dl, 0);
> 
>  	/* Process consecutive partitions as necessary */
>  	for (partition = 1; partition < pipe->partitions; ++partition) {
> -		struct vsp1_dl_list *dl;
> +		struct vsp1_dl_list *dl_child;

Is this really a child ? From a chaining point of view, it's more of a 
sibling. Maybe dl_next or dl_partition ?

> 
> -		dl = vsp1_dl_list_get(pipe->output->dlm);
> +		dl_child = vsp1_dl_list_get(pipe->output->dlm);
> 
>  		/*
>  		 * An incomplete chain will still function, but output only
>  		 * the partitions that had a dl available. The frame end
>  		 * interrupt will be marked on the last dl in the chain.
>  		 */
> -		if (!dl) {
> +		if (!dl_child) {
>  			dev_err(vsp1->dev, "Failed to obtain a dl list. Frame
> will be incomplete\n");
>  			break;
>  		}
> 
> -		vsp1_video_pipeline_run_partition(pipe, dl, partition);
> -		vsp1_dl_list_add_chain(pipe->dl, dl);
> +		vsp1_video_pipeline_run_partition(pipe, dl_child, partition);
> +		vsp1_dl_list_add_chain(dl, dl_child);
>  	}
> 
>  	/* Complete, and commit the head display list. */
> -	vsp1_dl_list_commit(pipe->dl);
> -	pipe->dl = NULL;
> +	vsp1_dl_list_commit(dl);
> 
>  	vsp1_pipeline_run(pipe);
>  }
> @@ -790,8 +796,8 @@ static void vsp1_video_buffer_queue(struct vb2_buffer
> *vb)
> 
>  static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>  {
> +	struct vsp1_video *video = pipe->output->video;
>  	struct vsp1_entity *entity;
> -	struct vsp1_dl_body *dlb;
>  	int ret;
> 
>  	/* Determine this pipelines sizes for image partitioning support. */
> @@ -799,14 +805,6 @@ static int vsp1_video_setup_pipeline(struct
> vsp1_pipeline *pipe)
>  	if (ret < 0)
>  		return ret;
> 
> -	/* Prepare the display list. */
> -	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
> -	if (!pipe->dl)
> -		return -ENOMEM;
> -
> -	/* Retrieve the default DLB from the list */
> -	dlb = vsp1_dl_list_get_body(pipe->dl);
> -
>  	if (pipe->uds) {
>  		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
> 
> @@ -828,11 +826,20 @@ static int vsp1_video_setup_pipeline(struct
> vsp1_pipeline *pipe) }
>  	}
> 
> +	/* Obtain a clean body from our pool */
> +	video->pipe_config = vsp1_dl_fragment_get(video->dlbs);
> +	if (!video->pipe_config)
> +		return -ENOMEM;

Is there a reason to store the pipe configuration in the video object instead 
of the pipeline object ?

> +	/* Configure the entities into our cached pipe configuration */
>  	list_for_each_entry(entity, &pipe->entities, list_pipe) {
> -		vsp1_entity_route_setup(entity, pipe, dlb);
> -		vsp1_entity_prepare(entity, pipe, dlb);
> +		vsp1_entity_route_setup(entity, pipe, video->pipe_config);
> +		vsp1_entity_prepare(entity, pipe, video->pipe_config);
>  	}
> 
> +	/* Ensure that our cached configuration is updated in the next DL */
> +	pipe->configured = false;

I'm tempted to move this to pipeline stop time (either to 
vsp1_video_stop_streaming() right after the vsp1_pipeline_stop() call, or in 
vsp1_pipeline_stop() itself), possibly with a WARN_ON() here to catch bugs in 
the driver.

>  	return 0;
>  }
> 
> @@ -842,6 +849,9 @@ static void vsp1_video_cleanup_pipeline(struct
> vsp1_pipeline *pipe) struct vsp1_vb2_buffer *buffer;
>  	unsigned long flags;
> 
> +	/* Release any cached configuration */
> +	vsp1_dl_fragment_put(video->pipe_config);
> +
>  	/* Remove all buffers from the IRQ queue. */
>  	spin_lock_irqsave(&video->irqlock, flags);
>  	list_for_each_entry(buffer, &video->irqqueue, queue)
> @@ -918,9 +928,6 @@ static void vsp1_video_stop_streaming(struct vb2_queue
> *vq) ret = vsp1_pipeline_stop(pipe);
>  		if (ret == -ETIMEDOUT)
>  			dev_err(video->vsp1->dev, "pipeline stop timeout\n");
> -
> -		vsp1_dl_list_put(pipe->dl);
> -		pipe->dl = NULL;
>  	}
>  	mutex_unlock(&pipe->lock);
> 
> @@ -1240,6 +1247,16 @@ struct vsp1_video *vsp1_video_create(struct
> vsp1_device *vsp1, goto error;
>  	}
> 
> +	/*
> +	 * Create a fragment pool to cache the constant configuration of the
> +	 * pipeline object
> +	 */
> +	video->dlbs = vsp1_dl_fragment_pool_alloc(vsp1, 2, 128, 0);
> +	if (!video->dlbs) {
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +
>  	return video;
> 
>  error:
> @@ -1249,6 +1266,8 @@ struct vsp1_video *vsp1_video_create(struct
> vsp1_device *vsp1,
> 
>  void vsp1_video_cleanup(struct vsp1_video *video)
>  {
> +	vsp1_dl_fragment_pool_free(video->dlbs);
> +
>  	if (video_is_registered(&video->video))
>  		video_unregister_device(&video->video);
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_video.h
> b/drivers/media/platform/vsp1/vsp1_video.h index 50ea7f02205f..2499d3d792b4
> 100644
> --- a/drivers/media/platform/vsp1/vsp1_video.h
> +++ b/drivers/media/platform/vsp1/vsp1_video.h
> @@ -43,6 +43,8 @@ struct vsp1_video {
> 
>  	struct mutex lock;
> 
> +	struct vsp1_dl_fragment_pool *dlbs;
> +	struct vsp1_dl_body *pipe_config;
>  	unsigned int pipe_index;
> 
>  	struct vb2_queue queue;

-- 
Regards,

Laurent Pinchart


* Re: [PATCH v2 5/8] v4l: vsp1: Refactor display list configure operations
  2017-08-14 15:13 ` [PATCH v2 5/8] v4l: vsp1: Refactor display list configure operations Kieran Bingham
@ 2017-08-17 18:13   ` Laurent Pinchart
  2017-09-11 21:16     ` Kieran Bingham
  0 siblings, 1 reply; 32+ messages in thread
From: Laurent Pinchart @ 2017-08-17 18:13 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

Thank you for the patch.

On Monday 14 Aug 2017 16:13:28 Kieran Bingham wrote:
> The entities provide a single .configure operation which configures the
> object into the target display list, based on the vsp1_entity_params
> selection.
> 
> This restricts us to a single function prototype for both static
> configuration (the pre-stream INIT stage) and the dynamic runtime stages
> for both each frame - and each partition therein.
> 
> Split the configure function into two parts, '.prepare()' and
> '.configure()', merging both the VSP1_ENTITY_PARAMS_RUNTIME and
> VSP1_ENTITY_PARAMS_PARTITION stages into a single call through the
> .configure(). The configuration for individual partitions is handled by
> passing the partition number to the configure call, and processing any
> runtime stage actions on the first partition only.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> ---
>  drivers/media/platform/vsp1/vsp1_bru.c    |  12 +-
>  drivers/media/platform/vsp1/vsp1_clu.c    |  43 +--
>  drivers/media/platform/vsp1/vsp1_drm.c    |  11 +-
>  drivers/media/platform/vsp1/vsp1_entity.c |  15 +-
>  drivers/media/platform/vsp1/vsp1_entity.h |  27 +--
>  drivers/media/platform/vsp1/vsp1_hgo.c    |  12 +-
>  drivers/media/platform/vsp1/vsp1_hgt.c    |  12 +-
>  drivers/media/platform/vsp1/vsp1_hsit.c   |  12 +-
>  drivers/media/platform/vsp1/vsp1_lif.c    |  12 +-
>  drivers/media/platform/vsp1/vsp1_lut.c    |  24 +-
>  drivers/media/platform/vsp1/vsp1_rpf.c    | 162 ++++++-------
>  drivers/media/platform/vsp1/vsp1_sru.c    |  12 +-
>  drivers/media/platform/vsp1/vsp1_uds.c    |  55 ++--
>  drivers/media/platform/vsp1/vsp1_video.c  |  24 +--
>  drivers/media/platform/vsp1/vsp1_wpf.c    | 297 ++++++++++++-----------
>  15 files changed, 359 insertions(+), 371 deletions(-)

[snip]

> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c
> b/drivers/media/platform/vsp1/vsp1_clu.c index 175717018e11..5f65ce3ad97f
> 100644
> --- a/drivers/media/platform/vsp1/vsp1_clu.c
> +++ b/drivers/media/platform/vsp1/vsp1_clu.c
> @@ -213,37 +213,37 @@ static const struct v4l2_subdev_ops clu_ops = {
>  /* ------------------------------------------------------------------------
>   * VSP1 Entity Operations
>   */
> +static void clu_prepare(struct vsp1_entity *entity,
> +			struct vsp1_pipeline *pipe,
> +			struct vsp1_dl_list *dl)
> +{
> +	struct vsp1_clu *clu = to_clu(&entity->subdev);
> +
> +	/*
> +	 * The format can't be changed during streaming, only verify it
> +	 * at setup time and store the information internally for future
> +	 * runtime configuration calls.
> +	 */

I know you're just moving the comment around, but let's fix it at the same 
time. There's no verification here (and no "setup time" either). I'd write it 
as

	/*
	 * The format can't be changed during streaming. Cache it internally
	 * for future runtime configuration calls.
	 */

> +	struct v4l2_mbus_framefmt *format;
> +
> +	format = vsp1_entity_get_pad_format(&clu->entity,
> +					    clu->entity.config,
> +					    CLU_PAD_SINK);
> +	clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
> +}

[snip]

> diff --git a/drivers/media/platform/vsp1/vsp1_entity.h
> b/drivers/media/platform/vsp1/vsp1_entity.h index
> 408602ebeb97..2f33e343ccc6 100644
> --- a/drivers/media/platform/vsp1/vsp1_entity.h
> +++ b/drivers/media/platform/vsp1/vsp1_entity.h

[snip]

> @@ -80,8 +68,10 @@ struct vsp1_route {
>  /**
>   * struct vsp1_entity_operations - Entity operations
>   * @destroy:	Destroy the entity.
> - * @configure:	Setup the hardware based on the entity state
> (pipeline, formats,
> - *		selection rectangles, ...)
> + * @prepare:	Setup the initial hardware parameters for the stream
> (pipeline,
> + *		formats)
> + * @configure:	Configure the runtime parameters for each partition
> (rectangles,
> + *		buffer addresses, ...)

Now moving to the bikeshedding territory, I'm not sure if prepare and 
configure are the best names for those operations. I'd like to also point out 
that we could go one step further by caching the partition-related parameters 
too, in which case we would need a third operation (or possibly passing the 
partition number to the prepare operation). While I won't mind if you 
implement this now, the issue could also be addressed later, but I'd like the 
operations to already support that use case to avoid yet another painful 
rename patch.
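
For reference, the split described in the commit message can be modelled in 
plain C as below. This is only a sketch under assumptions: the entity_model 
types and the counters are invented for illustration, and the real operations 
take pipeline and display list arguments that are omitted here.

```c
#include <assert.h>
#include <stddef.h>

struct entity_model;

/* Two-stage operations: once per stream, then once per partition. */
struct entity_model_ops {
	void (*prepare)(struct entity_model *entity);
	void (*configure)(struct entity_model *entity, unsigned int partition);
};

struct entity_model {
	const struct entity_model_ops *ops;
	unsigned int prepared;		/* stream setup calls */
	unsigned int frame_setups;	/* per-frame (runtime) work */
	unsigned int partition_setups;	/* per-partition work */
};

static void demo_prepare(struct entity_model *entity)
{
	entity->prepared++;
}

static void demo_configure(struct entity_model *entity, unsigned int partition)
{
	/* Runtime stage actions are processed on the first partition only. */
	if (partition == 0)
		entity->frame_setups++;

	entity->partition_setups++;
}

static const struct entity_model_ops demo_ops = {
	.prepare = demo_prepare,
	.configure = demo_configure,
};

/* Stream start: constant configuration, done once. */
static void stream_start(struct entity_model *entity)
{
	assert(entity->ops != NULL);
	entity->ops->prepare(entity);
}

/* One frame: call configure for every partition of the frame. */
static void run_frame(struct entity_model *entity, unsigned int partitions)
{
	unsigned int partition;

	for (partition = 0; partition < partitions; ++partition)
		entity->ops->configure(entity, partition);
}
```

Caching partition parameters as well would move the per-partition part of 
configure into a stage that runs once per stream, which is why the operation 
signatures matter now.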

>   * @max_width:	Return the max supported width of data that the entity
> can
>   *		process in a single operation.
>   * @partition:	Process the partition construction based on this
> entity's

[snip]

The rest of the patch looks good to me.

-- 
Regards,

Laurent Pinchart


* Re: [PATCH v2 3/8] v4l: vsp1: Convert display lists to use new fragment pool
  2017-08-17 12:13   ` Laurent Pinchart
@ 2017-09-11 20:27     ` Kieran Bingham
  2017-09-13  2:26       ` Laurent Pinchart
  0 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-09-11 20:27 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

Thanks for the review

On 17/08/17 13:13, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Monday 14 Aug 2017 16:13:26 Kieran Bingham wrote:
>> Adapt the dl->body0 object to use an object from the fragment pool.
>> This greatly reduces the pressure on the TLB for IPMMU use cases, as
>> all of the lists use a single allocation for the main body.
>>
>> The CLU and LUT objects pre-allocate a pool containing two bodies,
>> allowing a userspace update before the hardware has committed a previous
>> set of tables.
> 
> I think you'll need three bodies, one for the DL queued to the hardware, one 
> for the pending DL and one for the new DL needed when you update the LUT/CLU. 
> Given that the VSP test suite hasn't caught this problem, we also need a new 
> test :-)
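
The counting argument can be checked with a tiny userspace model. It is a 
sketch that assumes the worst case described above, where the queued DL and the 
pending DL each still hold a table body when userspace writes a new table. The 
body_pool_model type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool model: only the number of free bodies matters here. */
struct body_pool_model {
	unsigned int free;
};

/* Take one body from the pool; -1 if none is left. */
static int body_get(struct body_pool_model *pool)
{
	assert(pool != NULL);

	if (pool->free == 0)
		return -1;

	pool->free--;
	return 0;
}

/* Returns 0 if a table update still finds a body in the worst case. */
static int update_while_busy(unsigned int num_bodies)
{
	struct body_pool_model pool = { num_bodies };

	if (body_get(&pool))	/* table held by the DL queued to the hardware */
		return -1;
	if (body_get(&pool))	/* table held by the pending DL */
		return -1;

	return body_get(&pool);	/* table for the new userspace update */
}
```

In this model update_while_busy(2) fails and update_while_busy(3) succeeds.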
> 
>> Fragments are no longer 'freed' in interrupt context, but instead
>> released back to their respective pools.  This allows us to remove the
>> garbage collector in the DLM.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>
>> ---
>> v2:
>>  - Use dl->body0->max_entries to determine header offset, instead of the
>>    global constant VSP1_DL_NUM_ENTRIES which is incorrect.
>>  - squash updates for LUT, CLU, and fragment cleanup into single patch.
>>    (Not fully bisectable when separated)
>> ---
>>  drivers/media/platform/vsp1/vsp1_clu.c |  22 ++-
>>  drivers/media/platform/vsp1/vsp1_clu.h |   1 +-
>>  drivers/media/platform/vsp1/vsp1_dl.c  | 223 +++++---------------------
>>  drivers/media/platform/vsp1/vsp1_dl.h  |   3 +-
>>  drivers/media/platform/vsp1/vsp1_lut.c |  23 ++-
>>  drivers/media/platform/vsp1/vsp1_lut.h |   1 +-
>>  6 files changed, 90 insertions(+), 183 deletions(-)
> 
> This is a nice diffstat, but only if you add kerneldoc for the new functions 
> introduced in patch 2/8, otherwise the overall documentation diffstat looks 
> bad :-)
> 
>> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c
>> b/drivers/media/platform/vsp1/vsp1_clu.c index f2fb26e5ab4e..52c523625e2f
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_clu.c
>> +++ b/drivers/media/platform/vsp1/vsp1_clu.c
> 
> [snip]
> 
>> @@ -288,6 +298,12 @@ struct vsp1_clu *vsp1_clu_create(struct vsp1_device
>> *vsp1) if (ret < 0)
>>  		return ERR_PTR(ret);
>>
>> +	/* Allocate a fragment pool */
> 
> The comment would be more useful if you explained why you need to allocate a 
> pool here. Same comment for the LUT.

Done

> 
>> +	clu->pool = vsp1_dl_fragment_pool_alloc(clu->entity.vsp1, 2,
>> +						CLU_SIZE + 1, 0);
>> +	if (!clu->pool)
>> +		return ERR_PTR(-ENOMEM);
>> +
>>  	/* Initialize the control handler. */
>>  	v4l2_ctrl_handler_init(&clu->ctrls, 2);
>>  	v4l2_ctrl_new_custom(&clu->ctrls, &clu_table_control, NULL);
> 
> [snip]
> 
>> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c
>> b/drivers/media/platform/vsp1/vsp1_dl.c index aab9dd6ec0eb..6ffdc3549283
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_dl.c
>> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
> 
> [snip]
> 
> 
>> @@ -379,41 +289,39 @@ static struct vsp1_dl_list *vsp1_dl_list_alloc(struct
>> vsp1_dl_manager *dlm) INIT_LIST_HEAD(&dl->fragments);
>>  	dl->dlm = dlm;
>>
>> -	/*
>> -	 * Initialize the display list body and allocate DMA memory for the body
>> -	 * and the optional header. Both are allocated together to avoid memory
>> -	 * fragmentation, with the header located right after the body in
>> -	 * memory.
>> -	 */
>> -	header_size = dlm->mode == VSP1_DL_MODE_HEADER
>> -		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
>> -		    : 0;
>> -
>> -	ret = vsp1_dl_body_init(dlm->vsp1, &dl->body0, VSP1_DL_NUM_ENTRIES,
>> -				header_size);
>> -	if (ret < 0) {
>> -		kfree(dl);
>> +	/* Retrieve a body from our DLM body pool */
>> +	dl->body0 = vsp1_dl_fragment_get(pool);
>> +	if (!dl->body0)
>>  		return NULL;
>> -	}
>> -
>>  	if (dlm->mode == VSP1_DL_MODE_HEADER) {
>> -		size_t header_offset = VSP1_DL_NUM_ENTRIES
>> -				     * sizeof(*dl->body0.entries);
>> +		size_t header_offset = dl->body0->max_entries
>> +				     * sizeof(*dl->body0->entries);
>>
>> -		dl->header = ((void *)dl->body0.entries) + header_offset;
>> -		dl->dma = dl->body0.dma + header_offset;
>> +		dl->header = ((void *)dl->body0->entries) + header_offset;
>> +		dl->dma = dl->body0->dma + header_offset;
>>
>>  		memset(dl->header, 0, sizeof(*dl->header));
>> -		dl->header->lists[0].addr = dl->body0.dma;
>> +		dl->header->lists[0].addr = dl->body0->dma;
>>  	}
>>
>>  	return dl;
>>  }
>>
>> +static void vsp1_dl_list_fragments_free(struct vsp1_dl_list *dl)
> 
> This function doesn't free fragments but puts them back to the free list. I'd 
> call it vsp1_dl_list_fragments_put().
> 

Done

>> +{
>> +	struct vsp1_dl_body *dlb, *tmp;
>> +
>> +	list_for_each_entry_safe(dlb, tmp, &dl->fragments, list) {
>> +		list_del(&dlb->list);
>> +		vsp1_dl_fragment_put(dlb);
>> +	}
>> +}
>> +
>>  static void vsp1_dl_list_free(struct vsp1_dl_list *dl)
>>  {
>> -	vsp1_dl_body_cleanup(&dl->body0);
>> -	list_splice_init(&dl->fragments, &dl->dlm->gc_fragments);
>> +	vsp1_dl_fragment_put(dl->body0);
>> +	vsp1_dl_list_fragments_free(dl);
> 
> I wonder whether the second line is actually needed. vsp1_dl_list_free() is 
> called from vsp1_dlm_destroy() for every entry in the dlm->free list. A DL can 
> only be put in that list by vsp1_dlm_create() or __vsp1_dl_list_put(). The 
> former creates lists with no fragment, while the latter calls 
> vsp1_dl_list_fragments_free() already.
> 
> If you're not entirely sure you could add a 
> WARN_ON(!list_empty(&dl->fragments)) and run the test suite. A comment 
> explaining why the fragments list should already be empty here would be 
> useful too.
> 

You may be right here, but would you object to leaving it in ?

Isn't it correct to ensure that the list is completely cleaned up on release?

Furthermore - I would anticipate that in the future - 'body0' could be removed,
(becoming a fragment) and thus this line would then be required.

## /where 's/fragments/bodies/g' applies to the above text. ##

>> +
>>  	kfree(dl);
>>  }
>>
>> @@ -467,18 +375,10 @@ static void __vsp1_dl_list_put(struct vsp1_dl_list
>> *dl)
>>
>>  	dl->has_chain = false;
>>
>> -	/*
>> -	 * We can't free fragments here as DMA memory can only be freed in
>> -	 * interruptible context. Move all fragments to the display list
>> -	 * manager's list of fragments to be freed, they will be
>> -	 * garbage-collected by the work queue.
>> -	 */
>> -	if (!list_empty(&dl->fragments)) {
>> -		list_splice_init(&dl->fragments, &dl->dlm->gc_fragments);
>> -		schedule_work(&dl->dlm->gc_work);
>> -	}
>> +	vsp1_dl_list_fragments_free(dl);
>>
>> -	dl->body0.num_entries = 0;
>> +	/* body0 is reused */
> 
> It would be useful to explain why. Maybe something like "body0 is reused as an 
> optimization as every display list needs at least one body." ? And now I'm 
> wondering if it's really a useful optimization :-)

Yes, currently each list has at least one body, == body0 - but I can foresee
that being 'optimised' out soon.


>> +	dl->body0->num_entries = 0;
>>
>>  	list_add_tail(&dl->list, &dl->dlm->free);
>>  }
> 
> [snip]
> 
>> @@ -898,13 +764,26 @@ struct vsp1_dl_manager *vsp1_dlm_create(struct
>> vsp1_device *vsp1,
>>
>>  	spin_lock_init(&dlm->lock);
>>  	INIT_LIST_HEAD(&dlm->free);
>> -	INIT_LIST_HEAD(&dlm->gc_fragments);
>> -	INIT_WORK(&dlm->gc_work, vsp1_dlm_garbage_collect);
>> +
>> +	/*
>> +	 * Initialize the display list body and allocate DMA memory for the body
>> +	 * and the optional header. Both are allocated together to avoid memory
>> +	 * fragmentation, with the header located right after the body in
>> +	 * memory.
>> +	 */
> 
> Nice to see you're keeping this comment, but maybe you want to update it 
> according to the code changes ;-)

Ahh yes, - of course this needs adjusting so that we only allocate a single
header per display list as well - I'll catch that in the next version.

I'm using this current rebase to clean up comments and rebase to mainline.


> 
>> +	header_size = dlm->mode == VSP1_DL_MODE_HEADER
>> +		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
>> +		    : 0;
>> +
>> +	dlm->pool = vsp1_dl_fragment_pool_alloc(vsp1, prealloc,
>> +					VSP1_DL_NUM_ENTRIES, header_size);
>> +	if (!dlm->pool)
>> +		return NULL;
>>
>>  	for (i = 0; i < prealloc; ++i) {
>>  		struct vsp1_dl_list *dl;
>>
>> -		dl = vsp1_dl_list_alloc(dlm);
>> +		dl = vsp1_dl_list_alloc(dlm, dlm->pool);
>>  		if (!dl)
>>  			return NULL;
>>
> 
> [snip]
> 


* Re: [PATCH v2 2/8] v4l: vsp1: Provide a fragment pool
  2017-08-17 12:13   ` Laurent Pinchart
@ 2017-09-11 20:30     ` Kieran Bingham
  2017-09-13  2:15       ` Laurent Pinchart
  0 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-09-11 20:30 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

Thanks for your review,

On 17/08/17 13:13, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Monday 14 Aug 2017 16:13:25 Kieran Bingham wrote:
>> Each display list allocates a body to store register values in a dma
>> accessible buffer from a dma_alloc_wc() allocation. Each of these
>> results in an entry in the TLB, and a large number of display list
>> allocations adds pressure to this resource.
>>
>> Reduce TLB pressure on the IPMMUs by allocating multiple display list
>> bodies in a single allocation, and providing these to the display list
>> through a 'fragment pool'. A pool can be allocated by the display list
>> manager or entities which require their own body allocations.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>
>> ---
>> v2:
>>  - assign dlb->dma correctly
>> ---
>>  drivers/media/platform/vsp1/vsp1_dl.c | 129 +++++++++++++++++++++++++++-
>>  drivers/media/platform/vsp1/vsp1_dl.h |   8 ++-
>>  2 files changed, 137 insertions(+)
>>
>> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c
>> b/drivers/media/platform/vsp1/vsp1_dl.c index cb4625ae13c2..aab9dd6ec0eb
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_dl.c
>> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
>> @@ -45,6 +45,8 @@ struct vsp1_dl_entry {
>>  /**
>>   * struct vsp1_dl_body - Display list body
>>   * @list: entry in the display list list of bodies
>> + * @free: entry in the pool free body list
>> + * @pool: pool to which this body belongs
>>   * @vsp1: the VSP1 device
>>   * @entries: array of entries
>>   * @dma: DMA address of the entries
>> @@ -54,6 +56,9 @@ struct vsp1_dl_entry {
>>   */
>>  struct vsp1_dl_body {
>>  	struct list_head list;
>> +	struct list_head free;
>> +
>> +	struct vsp1_dl_fragment_pool *pool;
>>  	struct vsp1_device *vsp1;
>>
>>  	struct vsp1_dl_entry *entries;
>> @@ -65,6 +70,30 @@ struct vsp1_dl_body {
>>  };
>>
>>  /**
>> + * struct vsp1_dl_fragment_pool - display list body/fragment pool
>> + * @dma: DMA address of the entries
>> + * @size: size of the full DMA memory pool in bytes
>> + * @mem: CPU memory pointer for the pool
>> + * @bodies: Array of DLB structures for the pool
>> + * @free: List of free DLB entries
>> + * @lock: Protects the pool and free list
>> + * @vsp1: the VSP1 device
>> + */
>> +struct vsp1_dl_fragment_pool {
>> +	/* DMA allocation */
>> +	dma_addr_t dma;
>> +	size_t size;
>> +	void *mem;
>> +
>> +	/* Body management */
>> +	struct vsp1_dl_body *bodies;
>> +	struct list_head free;
>> +	spinlock_t lock;
>> +
>> +	struct vsp1_device *vsp1;
>> +};
>> +
>> +/**
>>   * struct vsp1_dl_list - Display list
>>   * @list: entry in the display list manager lists
>>   * @dlm: the display list manager
>> @@ -104,6 +133,7 @@ enum vsp1_dl_mode {
>>   * @active: list currently being processed (loaded) by hardware
>>   * @queued: list queued to the hardware (written to the DL registers)
>>   * @pending: list waiting to be queued to the hardware
>> + * @pool: fragment pool for the display list bodies
>>   * @gc_work: fragments garbage collector work struct
>>   * @gc_fragments: array of display list fragments waiting to be freed
>>   */
>> @@ -119,6 +149,8 @@ struct vsp1_dl_manager {
>>  	struct vsp1_dl_list *queued;
>>  	struct vsp1_dl_list *pending;
>>
>> +	struct vsp1_dl_fragment_pool *pool;
>> +
>>  	struct work_struct gc_work;
>>  	struct list_head gc_fragments;
>>  };
>> @@ -128,6 +160,103 @@ struct vsp1_dl_manager {
>>   */
>>
>>  /*
>> + * Fragment pool's reduce the pressure on the iommu TLB by allocating a
>> single
>> + * large area of DMA memory and allocating it as a pool of fragment bodies
>> + */
> 
> Could you document non-static function using kerneldoc ? Parameters to this 
> function would benefit from some documentation. I'd also like to see the 
> fragment get/put functions documented, as you remove existing kerneldoc for 
> the alloc/free existing functions in patch 3/8.

Ah yes of course.

>> +struct vsp1_dl_fragment_pool *
>> +vsp1_dl_fragment_pool_alloc(struct vsp1_device *vsp1, unsigned int qty,
> 
> I think I would name this function vsp1_dl_fragment_pool_create(), as it does 
> more than just allocating memory. Similarly I'd call the free function 
> vsp1_dl_fragment_pool_destroy().

That sounds reasonable. Done.

> qty is a bit vague, I'd rename it to num_fragments.

Ok with me.

> 
>> +			    unsigned int num_entries, size_t extra_size)
>> +{
>> +	struct vsp1_dl_fragment_pool *pool;
>> +	size_t dlb_size;
>> +	unsigned int i;
>> +
>> +	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
>> +	if (!pool)
>> +		return NULL;
>> +
>> +	pool->vsp1 = vsp1;
>> +
>> +	dlb_size = num_entries * sizeof(struct vsp1_dl_entry) + extra_size;
> 
> extra_size is only used by vsp1_dlm_create(), to allocate extra memory for the 
> display list header. We need one header per display list, not per display list 
> body.

Good catch, that will take a little bit of reworking.
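
To make the cost of the per-body extra_size concrete, here is a minimal
userspace sketch of the size arithmetic. It is not driver code: the 8-byte
entry stands in for a register address/value pair, and the helper names and
the header size used in any call are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct vsp1_dl_entry: one register address/value pair. */
struct dl_entry {
	uint32_t addr;
	uint32_t data;
};

/* Pool size as computed in the v2 patch: extra_size is added to every body. */
static size_t pool_size_per_body(unsigned int num_bodies,
				 unsigned int num_entries, size_t extra_size)
{
	size_t dlb_size = num_entries * sizeof(struct dl_entry) + extra_size;

	return dlb_size * num_bodies;
}

/*
 * Bytes wasted when only num_lists of the bodies back a display list and
 * therefore actually need room for a header (hypothetical helper).
 */
static size_t pool_header_waste(unsigned int num_bodies,
				unsigned int num_lists, size_t extra_size)
{
	assert(num_lists <= num_bodies);

	return (size_t)(num_bodies - num_lists) * extra_size;
}
```

With four bodies of 64 entries and an assumed 76-byte header, only one body
backing a display list means three headers' worth of memory goes unused.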

>> +	pool->size = dlb_size * qty;
>> +
>> +	pool->bodies = kcalloc(qty, sizeof(*pool->bodies), GFP_KERNEL);
>> +	if (!pool->bodies) {
>> +		kfree(pool);
>> +		return NULL;
>> +	}
>> +
>> +	pool->mem = dma_alloc_wc(vsp1->bus_master, pool->size, &pool->dma,
>> +					    GFP_KERNEL);
> 
> This is a weird indentation.

I know! - Not sure how that slipped by :)

> 
>> +	if (!pool->mem) {
>> +		kfree(pool->bodies);
>> +		kfree(pool);
>> +		return NULL;
>> +	}
>> +
>> +	spin_lock_init(&pool->lock);
>> +	INIT_LIST_HEAD(&pool->free);
>> +
>> +	for (i = 0; i < qty; ++i) {
>> +		struct vsp1_dl_body *dlb = &pool->bodies[i];
>> +
>> +		dlb->pool = pool;
>> +		dlb->max_entries = num_entries;
>> +
>> +		dlb->dma = pool->dma + i * dlb_size;
>> +		dlb->entries = pool->mem + i * dlb_size;
>> +
>> +		list_add_tail(&dlb->free, &pool->free);
>> +	}
>> +
>> +	return pool;
>> +}
>> +
>> +void vsp1_dl_fragment_pool_free(struct vsp1_dl_fragment_pool *pool)
>> +{
>> +	if (!pool)
>> +		return;
> 
> Can this happen ?

I was mirroring kfree() behaviour here, so that error paths can stay simple.

Would you prefer to require a valid (non-NULL) pointer?

Actually, I think it is better to leave this for now, as we now call this
function from the .destroy() entity functions ...
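
As a rough illustration of why a NULL-tolerant free keeps error paths short,
here is a userspace model of the pool. It is only a sketch under stated
assumptions: plain calloc() replaces dma_alloc_wc(), there is no locking, the
free list is a LIFO singly linked list rather than a list_head, and all
body_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct body_pool;

struct body {
	struct body *next_free;		/* link in the pool's free list */
	struct body_pool *pool;		/* owner, so put() needs no pool argument */
	unsigned int num_entries;	/* entries written so far */
	unsigned int max_entries;	/* capacity of this body */
	uint64_t *entries;		/* slice of the pool's single buffer */
};

struct body_pool {
	struct body *bodies;		/* array of body descriptors */
	struct body *free;		/* head of the free list */
	uint64_t *mem;			/* one allocation shared by all bodies */
};

/* Like kfree(), accept NULL so callers can clean up unconditionally. */
static void body_pool_destroy(struct body_pool *pool)
{
	if (!pool)
		return;

	free(pool->mem);
	free(pool->bodies);
	free(pool);
}

static struct body_pool *body_pool_create(unsigned int num_bodies,
					  unsigned int num_entries)
{
	struct body_pool *pool = calloc(1, sizeof(*pool));
	unsigned int i;

	if (!pool)
		return NULL;

	pool->bodies = calloc(num_bodies, sizeof(*pool->bodies));
	pool->mem = calloc((size_t)num_bodies * num_entries, sizeof(*pool->mem));
	if (!pool->bodies || !pool->mem) {
		body_pool_destroy(pool);	/* copes with partial setup */
		return NULL;
	}

	/* Push in reverse so that bodies[0] ends up at the head of the list. */
	for (i = num_bodies; i-- > 0; ) {
		struct body *dlb = &pool->bodies[i];

		dlb->pool = pool;
		dlb->max_entries = num_entries;
		dlb->entries = pool->mem + (size_t)i * num_entries;
		dlb->next_free = pool->free;
		pool->free = dlb;
	}

	return pool;
}

/* Take a body from the pool, or NULL if the pool is exhausted. */
static struct body *body_get(struct body_pool *pool)
{
	struct body *dlb = pool->free;

	if (dlb)
		pool->free = dlb->next_free;

	return dlb;
}

/* Reset a body and return it to its pool. NULL is ignored. */
static void body_put(struct body *dlb)
{
	if (!dlb)
		return;

	dlb->num_entries = 0;
	dlb->next_free = dlb->pool->free;
	dlb->pool->free = dlb;
}
```

The create function has a single cleanup call regardless of which allocation
failed, which is the simplification the NULL check buys.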

>> +
>> +	if (pool->mem)
>> +		dma_free_wc(pool->vsp1->bus_master, pool->size, pool->mem,
>> +			    pool->dma);
>> +
>> +	kfree(pool->bodies);
>> +	kfree(pool);
>> +}
>> +
>> +struct vsp1_dl_body *vsp1_dl_fragment_get(struct vsp1_dl_fragment_pool *pool)
>> +{
>> +	struct vsp1_dl_body *dlb = NULL;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&pool->lock, flags);
>> +
>> +	if (!list_empty(&pool->free)) {
>> +		dlb = list_first_entry(&pool->free, struct vsp1_dl_body, free);
>> +		list_del(&dlb->free);
>> +	}
>> +
>> +	spin_unlock_irqrestore(&pool->lock, flags);
>> +
>> +	return dlb;
>> +}
>> +
>> +void vsp1_dl_fragment_put(struct vsp1_dl_body *dlb)
>> +{
>> +	unsigned long flags;
>> +
>> +	if (!dlb)
>> +		return;
>> +
>> +	dlb->num_entries = 0;
>> +
>> +	spin_lock_irqsave(&dlb->pool->lock, flags);
>> +	list_add_tail(&dlb->free, &dlb->pool->free);
>> +	spin_unlock_irqrestore(&dlb->pool->lock, flags);
>> +}
>> +
>> +/*
>>   * Initialize a display list body object and allocate DMA memory for the body
>>   * data. The display list body object is expected to have been initialized to
>>   * 0 when allocated.
>> diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
>> index ee3508172f0a..9528484a8a34 100644
>> --- a/drivers/media/platform/vsp1/vsp1_dl.h
>> +++ b/drivers/media/platform/vsp1/vsp1_dl.h
>> @@ -17,6 +17,7 @@
>>
>>  struct vsp1_device;
>>  struct vsp1_dl_fragment;
>> +struct vsp1_dl_fragment_pool;
> 
> I noticed that the vsp1_dl_fragment structure is declared here but never 
> defined or used. The vsp1_dl_fragment_* functions all operate on vsp1_dl_body 
> structures.
> 
> The name body is used in the datasheet, so I think it would make sense to 
> s/fragments/bodies/ and s/fragment/body/ through the code as a prerequisite 
> for this patch, and rebasing it accordingly.

I agree, we work with bodies.

Patch created, and I'm rebasing this series on top. (of course this breaks all
of these patches .. /me takes a deep breath while fixing up :D )

# editing this mail 2 weeks later and I must be still holding my breath! - But
it's done :)

>>  struct vsp1_dl_list;
>>  struct vsp1_dl_manager;
>>
>> @@ -34,6 +35,13 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl);
>>  void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data);
>>  void vsp1_dl_list_commit(struct vsp1_dl_list *dl);
>>
>> +struct vsp1_dl_fragment_pool *
>> +vsp1_dl_fragment_pool_alloc(struct vsp1_device *vsp1, unsigned int qty,
>> +			    unsigned int num_entries, size_t extra_size);
>> +void vsp1_dl_fragment_pool_free(struct vsp1_dl_fragment_pool *pool);
>> +struct vsp1_dl_body *vsp1_dl_fragment_get(struct vsp1_dl_fragment_pool *pool);
>> +void vsp1_dl_fragment_put(struct vsp1_dl_body *dlb);
>> +
>>  struct vsp1_dl_body *vsp1_dl_fragment_alloc(struct vsp1_device *vsp1,
>>  					    unsigned int num_entries);
>>  void vsp1_dl_fragment_free(struct vsp1_dl_body *dlb);
> 

* Re: [PATCH v2 8/8] v4l: vsp1: Reduce display list body size
  2017-08-17 16:11   ` Laurent Pinchart
@ 2017-09-11 21:15     ` Kieran Bingham
  0 siblings, 0 replies; 32+ messages in thread
From: Kieran Bingham @ 2017-09-11 21:15 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-renesas-soc, linux-media

On 17/08/17 17:11, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Monday 14 Aug 2017 16:13:31 Kieran Bingham wrote:
>> The display list originally allocated a body of 256 entries to store all
>> of the register lists required for each frame.
>>
>> This has now been separated into fragments for constant stream setup, and
>> runtime updates.
>>
>> Empirical testing shows that the body0 now uses a maximum of 41
>> registers for each frame, for both DRM and Video API pipelines thus a
>> rounded 64 entries provides a suitable allocation.
> 
> Didn't you mention in patch 7/8 that one of the fragments uses exactly 64 
> entries ? Which one is it, and is there a risk it could use more ? 

No, that referred to the fragments (bodies) which had been attached. This
change refers only to the body0 allocation, which has a maximum of 41 entries
written.

The fragment and partition allocations, which reach 64 entries, are currently
allocated with room for 128.

< yes, this can be revisited >
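
For a sense of scale, the sketch below works through the body0 sizing. It is
an illustration only: it assumes 8-byte entries (one address/value pair) and
assumes that "rounded" means rounding up to the next power of two, which the
patch does not state. Both helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct vsp1_dl_entry: one register address/value pair. */
struct dl_entry {
	uint32_t addr;
	uint32_t data;
};

/* Round a measured peak up to the next power of two to leave headroom. */
static unsigned int round_up_pow2(unsigned int n)
{
	unsigned int p = 1;

	/* Beyond this bound the shift below would wrap to zero. */
	assert(n <= UINT_MAX / 2 + 1);

	while (p < n)
		p <<= 1;

	return p;
}

/* Memory taken by the entries of one body. */
static size_t body_bytes(unsigned int num_entries)
{
	return num_entries * sizeof(struct dl_entry);
}
```

Under these assumptions a peak of 41 entries rounds to 64, and the entry
storage of body0 drops from 2048 bytes (256 entries) to 512 bytes (64 entries).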

>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>> ---
>>  drivers/media/platform/vsp1/vsp1_dl.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
>> index 176a258146ac..b3f5eb2f9a4f 100644
>> --- a/drivers/media/platform/vsp1/vsp1_dl.c
>> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
>> @@ -21,7 +21,7 @@
>>  #include "vsp1.h"
>>  #include "vsp1_dl.h"
>>
>> -#define VSP1_DL_NUM_ENTRIES		256
>> +#define VSP1_DL_NUM_ENTRIES		64

This now only defines the size of body0, which is the de facto list of entries
in a display list.

This too could / should be removed at some point I believe, leaving allocations
only where they are needed.
>>  #define VSP1_DLH_INT_ENABLE		(1 << 1)
>>  #define VSP1_DLH_AUTO_START		(1 << 0)
> 

* Re: [PATCH v2 5/8] v4l: vsp1: Refactor display list configure operations
  2017-08-17 18:13   ` Laurent Pinchart
@ 2017-09-11 21:16     ` Kieran Bingham
  2017-09-12 19:19       ` Laurent Pinchart
  0 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-09-11 21:16 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

On 17/08/17 19:13, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Monday 14 Aug 2017 16:13:28 Kieran Bingham wrote:
>> The entities provide a single .configure operation which configures the
>> object into the target display list, based on the vsp1_entity_params
>> selection.
>>
>> This restricts us to a single function prototype for both static
>> configuration (the pre-stream INIT stage) and the dynamic runtime stages
>> for both each frame - and each partition therein.
>>
>> Split the configure function into two parts, '.prepare()' and
>> '.configure()', merging both the VSP1_ENTITY_PARAMS_RUNTIME and
>> VSP1_ENTITY_PARAMS_PARTITION stages into a single call through the
>> .configure(). The configuration for individual partitions is handled by
>> passing the partition number to the configure call, and processing any
>> runtime stage actions on the first partition only.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>> ---
>>  drivers/media/platform/vsp1/vsp1_bru.c    |  12 +-
>>  drivers/media/platform/vsp1/vsp1_clu.c    |  43 +--
>>  drivers/media/platform/vsp1/vsp1_drm.c    |  11 +-
>>  drivers/media/platform/vsp1/vsp1_entity.c |  15 +-
>>  drivers/media/platform/vsp1/vsp1_entity.h |  27 +--
>>  drivers/media/platform/vsp1/vsp1_hgo.c    |  12 +-
>>  drivers/media/platform/vsp1/vsp1_hgt.c    |  12 +-
>>  drivers/media/platform/vsp1/vsp1_hsit.c   |  12 +-
>>  drivers/media/platform/vsp1/vsp1_lif.c    |  12 +-
>>  drivers/media/platform/vsp1/vsp1_lut.c    |  24 +-
>>  drivers/media/platform/vsp1/vsp1_rpf.c    | 162 ++++++-------
>>  drivers/media/platform/vsp1/vsp1_sru.c    |  12 +-
>>  drivers/media/platform/vsp1/vsp1_uds.c    |  55 ++--
>>  drivers/media/platform/vsp1/vsp1_video.c  |  24 +--
>>  drivers/media/platform/vsp1/vsp1_wpf.c    | 297 ++++++++++++-----------
>>  15 files changed, 359 insertions(+), 371 deletions(-)
> 
> [snip]
> 
>> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
>> index 175717018e11..5f65ce3ad97f 100644
>> --- a/drivers/media/platform/vsp1/vsp1_clu.c
>> +++ b/drivers/media/platform/vsp1/vsp1_clu.c
>> @@ -213,37 +213,37 @@ static const struct v4l2_subdev_ops clu_ops = {
>>  /* ------------------------------------------------------------------------
>>   * VSP1 Entity Operations
>>   */
>> +static void clu_prepare(struct vsp1_entity *entity,
>> +			struct vsp1_pipeline *pipe,
>> +			struct vsp1_dl_list *dl)
>> +{
>> +	struct vsp1_clu *clu = to_clu(&entity->subdev);
>> +
>> +	/*
>> +	 * The format can't be changed during streaming, only verify it
>> +	 * at setup time and store the information internally for future
>> +	 * runtime configuration calls.
>> +	 */
> 
> I know you're just moving the comment around, but let's fix it at the same 
> time. There's no verification here (and no "setup time" either). I'd write it 
> as
> 
> 	/*
> 	 * The format can't be changed during streaming. Cache it internally
> 	 * for future runtime configuration calls.
> 	 */

I think I'm ok with that and I've updated the patch - but I'm not sure we are
really caching the 'format' here, as much as the yuv_mode ...

I'll ponder ...

> 
>> +	struct v4l2_mbus_framefmt *format;
>> +
>> +	format = vsp1_entity_get_pad_format(&clu->entity,
>> +					    clu->entity.config,
>> +					    CLU_PAD_SINK);
>> +	clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
>> +}
> 
> [snip]
> 
>> diff --git a/drivers/media/platform/vsp1/vsp1_entity.h b/drivers/media/platform/vsp1/vsp1_entity.h
>> index 408602ebeb97..2f33e343ccc6 100644
>> --- a/drivers/media/platform/vsp1/vsp1_entity.h
>> +++ b/drivers/media/platform/vsp1/vsp1_entity.h
> 
> [snip]
> 
>> @@ -80,8 +68,10 @@ struct vsp1_route {
>>  /**
>>   * struct vsp1_entity_operations - Entity operations
>>   * @destroy:	Destroy the entity.
>> - * @configure:	Setup the hardware based on the entity state (pipeline, formats,
>> - *		selection rectangles, ...)
>> + * @prepare:	Setup the initial hardware parameters for the stream (pipeline,
>> + *		formats)
>> + * @configure:	Configure the runtime parameters for each partition (rectangles,
>> + *		buffer addresses, ...)
> 
> Now moving to the bikeshedding territory, I'm not sure if prepare and 
> configure are the best names for those operations. I'd like to also point out 
> that we could go one step further by caching the partition-related parameters 
> too, in which case we would need a third operation (or possibly passing the 
> partition number to the prepare operation). While I won't mind if you 
> implement this now, the issue could also be addressed later, but I'd like the 
> operations to already support that use case to avoid yet another painful 
> rename patch.

Ok, understood - but I think I'll have to defer to a v4 for now ... I'm running
out of time.

>>   * @max_width:	Return the max supported width of data that the entity can
>>   *		process in a single operation.
>>   * @partition:	Process the partition construction based on this entity's
> 
> [snip]
> 
> The rest of the patch looks good to me.
> 

* Re: [PATCH v2 6/8] v4l: vsp1: Adapt entities to configure into a body
  2017-08-17 17:58   ` Laurent Pinchart
@ 2017-09-11 21:42     ` Kieran Bingham
  2017-09-12 19:18       ` Laurent Pinchart
  0 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-09-11 21:42 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

On 17/08/17 18:58, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Monday 14 Aug 2017 16:13:29 Kieran Bingham wrote:
>> Currently the entities store their configurations into a display list.
>> Adapt this such that the code can be configured into a body fragment
>> directly, allowing greater flexibility and control of the content.
>>
>> All users of vsp1_dl_list_write() are removed in this process, thus it
>> too is removed.
>>
>> A helper, vsp1_dl_list_body() is provided to access the internal body0
>> from the display list.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>> ---
>>  drivers/media/platform/vsp1/vsp1_bru.c    | 22 ++++++------
>>  drivers/media/platform/vsp1/vsp1_clu.c    | 22 ++++++------
>>  drivers/media/platform/vsp1/vsp1_dl.c     | 12 ++-----
>>  drivers/media/platform/vsp1/vsp1_dl.h     |  2 +-
>>  drivers/media/platform/vsp1/vsp1_drm.c    | 14 +++++---
>>  drivers/media/platform/vsp1/vsp1_entity.c | 16 ++++-----
>>  drivers/media/platform/vsp1/vsp1_entity.h | 12 ++++---
>>  drivers/media/platform/vsp1/vsp1_hgo.c    | 16 ++++-----
>>  drivers/media/platform/vsp1/vsp1_hgt.c    | 18 +++++-----
>>  drivers/media/platform/vsp1/vsp1_hsit.c   | 10 +++---
>>  drivers/media/platform/vsp1/vsp1_lif.c    | 13 +++----
>>  drivers/media/platform/vsp1/vsp1_lut.c    | 21 ++++++------
>>  drivers/media/platform/vsp1/vsp1_pipe.c   |  4 +-
>>  drivers/media/platform/vsp1/vsp1_pipe.h   |  3 +-
>>  drivers/media/platform/vsp1/vsp1_rpf.c    | 43 +++++++++++-------------
>>  drivers/media/platform/vsp1/vsp1_sru.c    | 14 ++++----
>>  drivers/media/platform/vsp1/vsp1_uds.c    | 24 +++++++------
>>  drivers/media/platform/vsp1/vsp1_uds.h    |  2 +-
>>  drivers/media/platform/vsp1/vsp1_video.c  | 11 ++++--
>>  drivers/media/platform/vsp1/vsp1_wpf.c    | 42 ++++++++++++-----------
>>  20 files changed, 168 insertions(+), 153 deletions(-)
> 
> This is quite intrusive, and it bothers me slightly that we need to pass both 
> the DL and the DLB to the configure function in order to add fragments to the 
> DL in the CLU and LUT modules. Wouldn't it be simpler to add a pointer to the 
> current body in the DL structure, and modify vsp1_dl_list_write() to write to 
> the current fragment ?
> 

No doubt about it, 168+, 153- is certainly intrusive.

Yes, looking back at this now, I think this part is not quite the right
approach.

That stalls the series until I have time to reconsider. I will likely repost
the work I have done on the earlier patches, including the 's/fragment/body/g'
changes, and ready myself for a v4 which will contain the
heavier reworks.
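
For reference while reworking, the "current body" idea suggested in the
review could look roughly like the userspace model below. This is only a
sketch, not the driver's API: the structures are simplified stand-ins, the
tiny MODEL_MAX_ENTRIES limit is arbitrary, and dl_list_set_body() is a
hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_ENTRIES 4	/* hypothetical, tiny on purpose */

struct dl_entry {
	uint32_t addr;
	uint32_t data;
};

struct dl_body {
	struct dl_entry entries[MODEL_MAX_ENTRIES];
	unsigned int num_entries;
};

struct dl_list {
	struct dl_body body0;		/* default body of the list */
	struct dl_body *current;	/* where dl_list_write() lands */
};

static void dl_list_init(struct dl_list *dl)
{
	dl->body0.num_entries = 0;
	dl->current = &dl->body0;
}

/* Redirect later writes; passing NULL selects body0 again. */
static void dl_list_set_body(struct dl_list *dl, struct dl_body *dlb)
{
	dl->current = dlb ? dlb : &dl->body0;
}

/* Returns 0 on success, -1 if the current body is already full. */
static int dl_list_write(struct dl_list *dl, uint32_t reg, uint32_t data)
{
	struct dl_body *dlb = dl->current;

	if (dlb->num_entries >= MODEL_MAX_ENTRIES)
		return -1;

	dlb->entries[dlb->num_entries].addr = reg;
	dlb->entries[dlb->num_entries].data = data;
	dlb->num_entries++;

	return 0;
}
```

With this shape the entities keep a single write call taking only the display
list, and the caller decides which body receives the registers by switching
the current pointer before invoking them.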

--
Kieran

* Re: [PATCH v2 6/8] v4l: vsp1: Adapt entities to configure into a body
  2017-09-11 21:42     ` Kieran Bingham
@ 2017-09-12 19:18       ` Laurent Pinchart
  2017-11-17 13:40         ` Kieran Bingham
  0 siblings, 1 reply; 32+ messages in thread
From: Laurent Pinchart @ 2017-09-12 19:18 UTC (permalink / raw)
  To: kieran.bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

On Tuesday, 12 September 2017 00:42:09 EEST Kieran Bingham wrote:
> On 17/08/17 18:58, Laurent Pinchart wrote:
> > On Monday 14 Aug 2017 16:13:29 Kieran Bingham wrote:
> >> Currently the entities store their configurations into a display list.
> >> Adapt this such that the code can be configured into a body fragment
> >> directly, allowing greater flexibility and control of the content.
> >> 
> >> All users of vsp1_dl_list_write() are removed in this process, thus it
> >> too is removed.
> >> 
> >> A helper, vsp1_dl_list_body() is provided to access the internal body0
> >> from the display list.
> >> 
> >> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> >> ---
> >> 
> >>  drivers/media/platform/vsp1/vsp1_bru.c    | 22 ++++++------
> >>  drivers/media/platform/vsp1/vsp1_clu.c    | 22 ++++++------
> >>  drivers/media/platform/vsp1/vsp1_dl.c     | 12 ++-----
> >>  drivers/media/platform/vsp1/vsp1_dl.h     |  2 +-
> >>  drivers/media/platform/vsp1/vsp1_drm.c    | 14 +++++---
> >>  drivers/media/platform/vsp1/vsp1_entity.c | 16 ++++-----
> >>  drivers/media/platform/vsp1/vsp1_entity.h | 12 ++++---
> >>  drivers/media/platform/vsp1/vsp1_hgo.c    | 16 ++++-----
> >>  drivers/media/platform/vsp1/vsp1_hgt.c    | 18 +++++-----
> >>  drivers/media/platform/vsp1/vsp1_hsit.c   | 10 +++---
> >>  drivers/media/platform/vsp1/vsp1_lif.c    | 13 +++----
> >>  drivers/media/platform/vsp1/vsp1_lut.c    | 21 ++++++------
> >>  drivers/media/platform/vsp1/vsp1_pipe.c   |  4 +-
> >>  drivers/media/platform/vsp1/vsp1_pipe.h   |  3 +-
> >>  drivers/media/platform/vsp1/vsp1_rpf.c    | 43 +++++++++++-------------
> >>  drivers/media/platform/vsp1/vsp1_sru.c    | 14 ++++----
> >>  drivers/media/platform/vsp1/vsp1_uds.c    | 24 +++++++------
> >>  drivers/media/platform/vsp1/vsp1_uds.h    |  2 +-
> >>  drivers/media/platform/vsp1/vsp1_video.c  | 11 ++++--
> >>  drivers/media/platform/vsp1/vsp1_wpf.c    | 42 ++++++++++++-----------
> >>  20 files changed, 168 insertions(+), 153 deletions(-)
> > 
> > This is quite intrusive, and it bothers me slightly that we need to pass
> > both the DL and the DLB to the configure function in order to add
> > fragments to the DL in the CLU and LUT modules. Wouldn't it be simpler to
> > add a pointer to the current body in the DL structure, and modify
> > vsp1_dl_list_write() to write to the current fragment ?
> 
> No doubt about it, 168+, 153- is certainly intrusive.
> 
> Yes, looking back at this now, I think this part is not quite the right
> approach.
> 
> That stalls the series until I have time to reconsider. I will likely
> repost the work I have done on the earlier patches, including the
> 's/fragment/body/g' changes, and ready myself for a v4 which will contain
> the heavier reworks.

Fine with me. Could you make sure to mention the open issues in the cover 
letter ? I want to avoid commenting on them if you know already that you will 
rework them later.

-- 
Regards,

Laurent Pinchart

* Re: [PATCH v2 5/8] v4l: vsp1: Refactor display list configure operations
  2017-09-11 21:16     ` Kieran Bingham
@ 2017-09-12 19:19       ` Laurent Pinchart
  2017-11-17 15:07         ` Kieran Bingham
  0 siblings, 1 reply; 32+ messages in thread
From: Laurent Pinchart @ 2017-09-12 19:19 UTC (permalink / raw)
  To: kieran.bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

On Tuesday, 12 September 2017 00:16:50 EEST Kieran Bingham wrote:
> On 17/08/17 19:13, Laurent Pinchart wrote:
> > On Monday 14 Aug 2017 16:13:28 Kieran Bingham wrote:
> >> The entities provide a single .configure operation which configures the
> >> object into the target display list, based on the vsp1_entity_params
> >> selection.
> >> 
> >> This restricts us to a single function prototype for both static
> >> configuration (the pre-stream INIT stage) and the dynamic runtime stages
> >> for both each frame - and each partition therein.
> >> 
> >> Split the configure function into two parts, '.prepare()' and
> >> '.configure()', merging both the VSP1_ENTITY_PARAMS_RUNTIME and
> >> VSP1_ENTITY_PARAMS_PARTITION stages into a single call through the
> >> .configure(). The configuration for individual partitions is handled by
> >> passing the partition number to the configure call, and processing any
> >> runtime stage actions on the first partition only.
> >> 
> >> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> >> ---
> >> 
> >>  drivers/media/platform/vsp1/vsp1_bru.c    |  12 +-
> >>  drivers/media/platform/vsp1/vsp1_clu.c    |  43 +--
> >>  drivers/media/platform/vsp1/vsp1_drm.c    |  11 +-
> >>  drivers/media/platform/vsp1/vsp1_entity.c |  15 +-
> >>  drivers/media/platform/vsp1/vsp1_entity.h |  27 +--
> >>  drivers/media/platform/vsp1/vsp1_hgo.c    |  12 +-
> >>  drivers/media/platform/vsp1/vsp1_hgt.c    |  12 +-
> >>  drivers/media/platform/vsp1/vsp1_hsit.c   |  12 +-
> >>  drivers/media/platform/vsp1/vsp1_lif.c    |  12 +-
> >>  drivers/media/platform/vsp1/vsp1_lut.c    |  24 +-
> >>  drivers/media/platform/vsp1/vsp1_rpf.c    | 162 ++++++-------
> >>  drivers/media/platform/vsp1/vsp1_sru.c    |  12 +-
> >>  drivers/media/platform/vsp1/vsp1_uds.c    |  55 ++--
> >>  drivers/media/platform/vsp1/vsp1_video.c  |  24 +--
> >>  drivers/media/platform/vsp1/vsp1_wpf.c    | 297 ++++++++++++-----------
> >>  15 files changed, 359 insertions(+), 371 deletions(-)
> > 
> > [snip]
> > 
> >> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
> >> index 175717018e11..5f65ce3ad97f 100644
> >> --- a/drivers/media/platform/vsp1/vsp1_clu.c
> >> +++ b/drivers/media/platform/vsp1/vsp1_clu.c
> >> @@ -213,37 +213,37 @@ static const struct v4l2_subdev_ops clu_ops = {
> >>  /* ------------------------------------------------------------------------
> >>   * VSP1 Entity Operations
> >>   */
> >> +static void clu_prepare(struct vsp1_entity *entity,
> >> +			struct vsp1_pipeline *pipe,
> >> +			struct vsp1_dl_list *dl)
> >> +{
> >> +	struct vsp1_clu *clu = to_clu(&entity->subdev);
> >> +
> >> +	/*
> >> +	 * The format can't be changed during streaming, only verify it
> >> +	 * at setup time and store the information internally for future
> >> +	 * runtime configuration calls.
> >> +	 */
> > 
> > I know you're just moving the comment around, but let's fix it at the same
> > time. There's no verification here (and no "setup time" either). I'd write
> > it as
> > 
> > 	/*
> > 	 * The format can't be changed during streaming. Cache it internally
> > 	 * for future runtime configuration calls.
> > 	 */
> 
> I think I'm ok with that and I've updated the patch - but I'm not sure we
> are really caching the 'format' here, as much as the yuv_mode ...

Yes, it's the YUV mode we're caching, feel free to update the comment.
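
To illustrate the split being discussed, here is a small userspace model of a
prepare/configure pair. It is a sketch under assumptions, not the driver's
code: the enum values stand in for MEDIA_BUS_FMT_* codes, the counters exist
only to make the call pattern observable, and all clu_model_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum bus_code { CODE_ARGB, CODE_AYUV };	/* stand-ins for MEDIA_BUS_FMT_* */

struct clu_model {
	enum bus_code sink_code;	/* negotiated before streaming starts */
	bool yuv_mode;			/* cached by prepare() */
	unsigned int prepare_calls;	/* instrumentation for the sketch */
	unsigned int runtime_updates;	/* instrumentation for the sketch */
};

/* Stream setup: runs once and caches what cannot change while streaming. */
static void clu_model_prepare(struct clu_model *clu)
{
	clu->yuv_mode = clu->sink_code == CODE_AYUV;
	clu->prepare_calls++;
}

/* Per-partition configuration; per-frame work is done on partition 0 only. */
static void clu_model_configure(struct clu_model *clu, unsigned int partition)
{
	if (partition == 0)
		clu->runtime_updates++;

	/* Partition-specific register writes would go here. */
}

/* Configure one frame that is split into the given number of partitions. */
static void process_frame(struct clu_model *clu, unsigned int partitions)
{
	unsigned int i;

	for (i = 0; i < partitions; ++i)
		clu_model_configure(clu, i);
}
```

The cached value is the YUV mode derived from the format, not the format
itself, which is the distinction the comment wording should reflect.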

> I'll ponder ...
> 
> >> +	struct v4l2_mbus_framefmt *format;
> >> +
> >> +	format = vsp1_entity_get_pad_format(&clu->entity,
> >> +					    clu->entity.config,
> >> +					    CLU_PAD_SINK);
> >> +	clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
> >> +}
> > 
> > [snip]
> > 
> >> diff --git a/drivers/media/platform/vsp1/vsp1_entity.h b/drivers/media/platform/vsp1/vsp1_entity.h
> >> index 408602ebeb97..2f33e343ccc6 100644
> >> --- a/drivers/media/platform/vsp1/vsp1_entity.h
> >> +++ b/drivers/media/platform/vsp1/vsp1_entity.h
> > 
> > [snip]
> > 
> >> @@ -80,8 +68,10 @@ struct vsp1_route {
> >>  /**
> >>   * struct vsp1_entity_operations - Entity operations
> >>   * @destroy:	Destroy the entity.
> >> - * @configure:	Setup the hardware based on the entity state (pipeline, formats,
> >> - *		selection rectangles, ...)
> >> + * @prepare:	Setup the initial hardware parameters for the stream (pipeline,
> >> + *		formats)
> >> + * @configure:	Configure the runtime parameters for each partition (rectangles,
> >> + *		buffer addresses, ...)
> > 
> > Now moving to the bikeshedding territory, I'm not sure if prepare and
> > configure are the best names for those operations. I'd like to also point
> > out that we could go one step further by caching the partition-related
> > parameters too, in which case we would need a third operation (or
> > possibly passing the partition number to the prepare operation). While I
> > won't mind if you implement this now, the issue could also be addressed
> > later, but I'd like the operations to already support that use case to
> > avoid yet another painful rename patch.
> 
> Ok, understood - but I think I'll have to defer to a v4 for now ... I'm
> running out of time.
> 
> >>   * @max_width:	Return the max supported width of data that the entity can
> >>   *		process in a single operation.
> >>   * @partition:	Process the partition construction based on this entity's
> > 
> > [snip]
> > 
> > The rest of the patch looks good to me.

-- 
Regards,

Laurent Pinchart

* Re: [PATCH v2 2/8] v4l: vsp1: Provide a fragment pool
  2017-09-11 20:30     ` Kieran Bingham
@ 2017-09-13  2:15       ` Laurent Pinchart
  0 siblings, 0 replies; 32+ messages in thread
From: Laurent Pinchart @ 2017-09-13  2:15 UTC (permalink / raw)
  To: kieran.bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

On Monday, 11 September 2017 23:30:25 EEST Kieran Bingham wrote:
> On 17/08/17 13:13, Laurent Pinchart wrote:
> > On Monday 14 Aug 2017 16:13:25 Kieran Bingham wrote:
> >> Each display list allocates a body to store register values in a dma
> >> accessible buffer from a dma_alloc_wc() allocation. Each of these
> >> results in an entry in the TLB, and a large number of display list
> >> allocations adds pressure to this resource.
> >> 
> >> Reduce TLB pressure on the IPMMUs by allocating multiple display list
> >> bodies in a single allocation, and providing these to the display list
> >> through a 'fragment pool'. A pool can be allocated by the display list
> >> manager or entities which require their own body allocations.
> >> 
> >> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> >> 
> >> ---
> >> 
> >> v2:
> >>  - assign dlb->dma correctly
> >> 
> >> ---
> >> 
> >>  drivers/media/platform/vsp1/vsp1_dl.c | 129 +++++++++++++++++++++++++++-
> >>  drivers/media/platform/vsp1/vsp1_dl.h |   8 ++-
> >>  2 files changed, 137 insertions(+)
> >> 
> >> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
> >> index cb4625ae13c2..aab9dd6ec0eb 100644
> >> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> >> +++ b/drivers/media/platform/vsp1/vsp1_dl.c

[snip]

> >>  /*
> >> + * Fragment pool's reduce the pressure on the iommu TLB by allocating a single
> >> + * large area of DMA memory and allocating it as a pool of fragment bodies
> >> + */
> > 
> > Could you document the non-static functions using kerneldoc ? Parameters to
> > this function would benefit from some documentation. I'd also like to see
> > the fragment get/put functions documented, as you remove the kerneldoc of
> > the existing alloc/free functions in patch 3/8.
> 
> Ah yes of course.
> 
> >> +struct vsp1_dl_fragment_pool *
> >> +vsp1_dl_fragment_pool_alloc(struct vsp1_device *vsp1, unsigned int qty,
> > 
> > I think I would name this function vsp1_dl_fragment_pool_create(), as it
> > does more than just allocating memory. Similarly I'd call the free
> > function vsp1_dl_fragment_pool_destroy().
> 
> That sounds reasonable. Done.
> 
> > qty is a bit vague, I'd rename it to num_fragments.
> 
> Ok with me.
> 
> >> +			    unsigned int num_entries, size_t extra_size)
> >> +{
> >> +	struct vsp1_dl_fragment_pool *pool;
> >> +	size_t dlb_size;
> >> +	unsigned int i;
> >> +
> >> +	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
> >> +	if (!pool)
> >> +		return NULL;
> >> +
> >> +	pool->vsp1 = vsp1;
> >> +
> >> +	dlb_size = num_entries * sizeof(struct vsp1_dl_entry) + extra_size;
> > 
> > extra_size is only used by vsp1_dlm_create(), to allocate extra memory for
> > the display list header. We need one header per display list, not per
> > display list body.
> 
> Good catch, that will take a little bit of reworking.

I didn't propose a fix for this as I wasn't sure how to fix it properly. I 
thus won't complain too loudly if you can't fix it either and waste a bit of 
memory :-) But in that case please add a comment to explain what's going on.

> >> +	pool->size = dlb_size * qty;
> >> +
> >> +	pool->bodies = kcalloc(qty, sizeof(*pool->bodies), GFP_KERNEL);
> >> +	if (!pool->bodies) {
> >> +		kfree(pool);
> >> +		return NULL;
> >> +	}
> >> +
> >> +	pool->mem = dma_alloc_wc(vsp1->bus_master, pool->size, &pool->dma,
> >> +					    GFP_KERNEL);
> > 
> > This is a weird indentation.
> 
> I know! - Not sure how that slipped by :)
> 
> >> +	if (!pool->mem) {
> >> +		kfree(pool->bodies);
> >> +		kfree(pool);
> >> +		return NULL;
> >> +	}
> >> +
> >> +	spin_lock_init(&pool->lock);
> >> +	INIT_LIST_HEAD(&pool->free);
> >> +
> >> +	for (i = 0; i < qty; ++i) {
> >> +		struct vsp1_dl_body *dlb = &pool->bodies[i];
> >> +
> >> +		dlb->pool = pool;
> >> +		dlb->max_entries = num_entries;
> >> +
> >> +		dlb->dma = pool->dma + i * dlb_size;
> >> +		dlb->entries = pool->mem + i * dlb_size;
> >> +
> >> +		list_add_tail(&dlb->free, &pool->free);
> >> +	}
> >> +
> >> +	return pool;
> >> +}
> >> +
> >> +void vsp1_dl_fragment_pool_free(struct vsp1_dl_fragment_pool *pool)
> >> +{
> >> +	if (!pool)
> >> +		return;
> > 
> > Can this happen ?
> 
> > I was mirroring kfree() behaviour here, so that error paths can stay
> > simple.
> > 
> > Would you prefer to require a valid (non-NULL) pointer?
> > 
> > Actually, I think it is better to leave this for now, as we now call this
> > function from the .destroy() entity functions ...

It was a genuine question :-) We have more control over the 
vsp1_dl_fragment_pool_free() callers as the function is internal to the 
driver. If we have real use cases for pool being NULL then let's keep the 
check.

> >> +
> >> +	if (pool->mem)
> >> +		dma_free_wc(pool->vsp1->bus_master, pool->size, pool->mem,
> >> +			    pool->dma);
> >> +
> >> +	kfree(pool->bodies);
> >> +	kfree(pool);
> >> +}

[snip]

-- 
Regards,

Laurent Pinchart

* Re: [PATCH v2 3/8] v4l: vsp1: Convert display lists to use new fragment pool
  2017-09-11 20:27     ` Kieran Bingham
@ 2017-09-13  2:26       ` Laurent Pinchart
  0 siblings, 0 replies; 32+ messages in thread
From: Laurent Pinchart @ 2017-09-13  2:26 UTC (permalink / raw)
  To: kieran.bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

On Monday, 11 September 2017 23:27:39 EEST Kieran Bingham wrote:
> On 17/08/17 13:13, Laurent Pinchart wrote:
> > On Monday 14 Aug 2017 16:13:26 Kieran Bingham wrote:
> >> Adapt the dl->body0 object to use an object from the fragment pool.
> >> This greatly reduces the pressure on the TLB for IPMMU use cases, as
> >> all of the lists use a single allocation for the main body.
> >> 
> >> The CLU and LUT objects pre-allocate a pool containing two bodies,
> >> allowing a userspace update before the hardware has committed a previous
> >> set of tables.
> > 
> > I think you'll need three bodies, one for the DL queued to the hardware,
> > one for the pending DL and one for the new DL needed when you update the
> > LUT/CLU. Given that the VSP test suite hasn't caught this problem, we
> > also need a new test :-)
> > 
> >> Fragments are no longer 'freed' in interrupt context, but instead
> >> released back to their respective pools.  This allows us to remove the
> >> garbage collector in the DLM.
> >> 
> >> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> >> 
> >> ---
> >> 
> >> v2:
> >>  - Use dl->body0->max_entries to determine header offset, instead of the
> >>    global constant VSP1_DL_NUM_ENTRIES which is incorrect.
> >>  
> >>  - squash updates for LUT, CLU, and fragment cleanup into single patch.
> >>    (Not fully bisectable when separated)
> >> 
> >> ---
> >> 
> >>  drivers/media/platform/vsp1/vsp1_clu.c |  22 ++-
> >>  drivers/media/platform/vsp1/vsp1_clu.h |   1 +-
> >>  drivers/media/platform/vsp1/vsp1_dl.c  | 223 +++++---------------------
> >>  drivers/media/platform/vsp1/vsp1_dl.h  |   3 +-
> >>  drivers/media/platform/vsp1/vsp1_lut.c |  23 ++-
> >>  drivers/media/platform/vsp1/vsp1_lut.h |   1 +-
> >>  6 files changed, 90 insertions(+), 183 deletions(-)
> > 
> > This is a nice diffstat, but only if you add kerneldoc for the new
> > functions introduced in patch 2/8, otherwise the overall documentation
> > diffstat looks bad :-)

[snip]

> >> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c
> >> b/drivers/media/platform/vsp1/vsp1_dl.c index aab9dd6ec0eb..6ffdc3549283
> >> 100644
> >> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> >> +++ b/drivers/media/platform/vsp1/vsp1_dl.c

[snip]

> >>  static void vsp1_dl_list_free(struct vsp1_dl_list *dl)
> >>  {
> >> 
> >> -	vsp1_dl_body_cleanup(&dl->body0);
> >> -	list_splice_init(&dl->fragments, &dl->dlm->gc_fragments);
> >> +	vsp1_dl_fragment_put(dl->body0);
> >> +	vsp1_dl_list_fragments_free(dl);
> > 
> > I wonder whether the second line is actually needed. vsp1_dl_list_free()
> > is called from vsp1_dlm_destroy() for every entry in the dlm->free list. A
> > DL can only be put in that list by vsp1_dlm_create() or
> > __vsp1_dl_list_put(). The former creates lists with no fragment, while
> > the latter calls vsp1_dl_list_fragments_free() already.
> > 
> > > If you're not entirely sure you could add a
> > > WARN_ON(!list_empty(&dl->fragments)) and run the test suite. A comment
> > > explaining why the fragments list should already be empty here would be
> > > useful too.
> 
> You may be right here, but would you object to leaving it in ?
> 
> Isn't it correct to ensure that the list is completely cleaned up on
> release?
> 
> Furthermore - I would anticipate that in the future - 'body0' could be
> removed, (becoming a fragment) and thus this line would then be required.
> 
> ## /where 's/fragments/bodies/g' applies to the above text. ##

I'm fine with that for now.

> >> +
> >> 
> >>  	kfree(dl);
> >>  }

-- 
Regards,

Laurent Pinchart

* Re: [PATCH v2 7/8] v4l: vsp1: Move video configuration to a cached dlb
  2017-08-17 18:10   ` Laurent Pinchart
@ 2017-11-16 18:19     ` Kieran Bingham
  0 siblings, 0 replies; 32+ messages in thread
From: Kieran Bingham @ 2017-11-16 18:19 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

Thank you for the review, and for your patience with the long-awaited response
on these remaining patches.

On 17/08/17 19:10, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Monday 14 Aug 2017 16:13:30 Kieran Bingham wrote:
>> We are now able to configure a pipeline directly into a local display
>> list body. Take advantage of this fact, and create a cacheable body to
>> store the configuration of the pipeline in the video object.
>>
>> vsp1_video_pipeline_run() is now the last user of the pipe->dl object.
>> Convert this function to use the cached video->config body and obtain a
>> local display list reference.
>>
>> Attach the video->config body to the display list when needed before
>> committing to hardware.
>>
>> The pipe object is marked as un-configured when entering a suspend. This
>> ensures that upon resume, where the hardware is reset - our cached
>> configuration will be re-attached to the next committed DL.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>> ---
>>
>> Our video DL usage now looks like the below output:
>>
>> dl->body0 contains our disposable runtime configuration. Max 41.
>> dl_child->body0 is our partition specific configuration. Max 12.
>> dl->fragments shows our constant configuration and LUTs.
>>
>>   These two are LUT/CLU:
>>      * dl->fragments[x]->num_entries 256 / max 256
>>      * dl->fragments[x]->num_entries 4914 / max 4914
>>
>> Which shows that our 'constant' configuration cache is currently
>> utilised to a maximum of 64 entries.
>>
>> trace-cmd report | \
>>     grep max | sed 's/.*vsp1_dl_list_commit://g' | sort | uniq;
>>
>>   dl->body0->num_entries 13 / max 128
>>   dl->body0->num_entries 14 / max 128
>>   dl->body0->num_entries 16 / max 128
>>   dl->body0->num_entries 20 / max 128
>>   dl->body0->num_entries 27 / max 128
>>   dl->body0->num_entries 34 / max 128
>>   dl->body0->num_entries 41 / max 128
>>   dl_child->body0->num_entries 10 / max 128
>>   dl_child->body0->num_entries 12 / max 128
>>   dl->fragments[x]->num_entries 15 / max 128
>>   dl->fragments[x]->num_entries 16 / max 128
>>   dl->fragments[x]->num_entries 17 / max 128
>>   dl->fragments[x]->num_entries 18 / max 128
>>   dl->fragments[x]->num_entries 20 / max 128
>>   dl->fragments[x]->num_entries 21 / max 128
>>   dl->fragments[x]->num_entries 256 / max 256
>>   dl->fragments[x]->num_entries 31 / max 128
>>   dl->fragments[x]->num_entries 32 / max 128
>>   dl->fragments[x]->num_entries 39 / max 128
>>   dl->fragments[x]->num_entries 40 / max 128
>>   dl->fragments[x]->num_entries 47 / max 128
>>   dl->fragments[x]->num_entries 48 / max 128
>>   dl->fragments[x]->num_entries 4914 / max 4914
>>   dl->fragments[x]->num_entries 55 / max 128
>>   dl->fragments[x]->num_entries 56 / max 128
>>   dl->fragments[x]->num_entries 63 / max 128
>>   dl->fragments[x]->num_entries 64 / max 128
>> ---
>>  drivers/media/platform/vsp1/vsp1_pipe.c  |  4 +-
>>  drivers/media/platform/vsp1/vsp1_pipe.h  |  4 +-
>>  drivers/media/platform/vsp1/vsp1_video.c | 67 ++++++++++++++++---------
>>  drivers/media/platform/vsp1/vsp1_video.h |  2 +-
>>  4 files changed, 51 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c
>> b/drivers/media/platform/vsp1/vsp1_pipe.c index 5012643583b6..7d1f7ba43060
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
>> @@ -249,6 +249,7 @@ void vsp1_pipeline_run(struct vsp1_pipeline *pipe)
>>  		vsp1_write(vsp1, VI6_CMD(pipe->output->entity.index),
>>  			   VI6_CMD_STRCMD);
>>  		pipe->state = VSP1_PIPELINE_RUNNING;
>> +		pipe->configured = true;
>>  	}
>>
>>  	pipe->buffers_ready = 0;
>> @@ -430,6 +431,9 @@ void vsp1_pipelines_suspend(struct vsp1_device *vsp1)
>>  		spin_lock_irqsave(&pipe->irqlock, flags);
>>  		if (pipe->state == VSP1_PIPELINE_RUNNING)
>>  			pipe->state = VSP1_PIPELINE_STOPPING;
>> +
>> +		/* After a suspend, the hardware will be reset */
>> +		pipe->configured = false;
> 
> It shouldn't make a difference in practice, but I think it would be more 
> logical to set the configured field to false after the hardware has been 
> reset. I'd move this to the resume handler and update the comment to "The 
> hardware might have been reset during suspend and need a full 
> reconfiguration". 

Agreed, and Done.


> 
>>  		spin_unlock_irqrestore(&pipe->irqlock, flags);
>>  	}
>>
>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h
>> b/drivers/media/platform/vsp1/vsp1_pipe.h index 90d29492b9b9..e7ad6211b4d0
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
>> @@ -90,6 +90,7 @@ struct vsp1_partition {
>>   * @irqlock: protects the pipeline state
>>   * @state: current state
>>   * @wq: wait queue to wait for state change completion
>> + * @configured: flag determining if the hardware has run since reset
>>   * @frame_end: frame end interrupt handler
>>   * @lock: protects the pipeline use count and stream count
>>   * @kref: pipeline reference count
>> @@ -117,6 +118,7 @@ struct vsp1_pipeline {
>>  	spinlock_t irqlock;
>>  	enum vsp1_pipeline_state state;
>>  	wait_queue_head_t wq;
>> +	bool configured;
>>
>>  	void (*frame_end)(struct vsp1_pipeline *pipe, bool completed);
>>
>> @@ -143,8 +145,6 @@ struct vsp1_pipeline {
>>  	 */
>>  	struct list_head entities;
>>
>> -	struct vsp1_dl_list *dl;
>> -
>>  	unsigned int partitions;
>>  	struct vsp1_partition *partition;
>>  	struct vsp1_partition *part_table;
>> diff --git a/drivers/media/platform/vsp1/vsp1_video.c
>> b/drivers/media/platform/vsp1/vsp1_video.c index 7e825f3360bf..42b70b8465ba
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_video.c
>> +++ b/drivers/media/platform/vsp1/vsp1_video.c
>> @@ -394,37 +394,43 @@ static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
>>  static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
>>  {
>>  	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
>> +	struct vsp1_video *video = pipe->output->video;
>>  	unsigned int partition;
>> +	struct vsp1_dl_list *dl;
>> +
>> +	dl = vsp1_dl_list_get(pipe->output->dlm);
>>
>> -	if (!pipe->dl)
>> -		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>> +	/* Attach our pipe configuration to fully initialise the hardware */
>> +	if (!pipe->configured) {
>> +		vsp1_dl_list_add_fragment(dl, video->pipe_config);
>> +		pipe->configured = true;
>> +	}
>>
>>  	/* Run the first partition */
>> -	vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0);
>> +	vsp1_video_pipeline_run_partition(pipe, dl, 0);
>>
>>  	/* Process consecutive partitions as necessary */
>>  	for (partition = 1; partition < pipe->partitions; ++partition) {
>> -		struct vsp1_dl_list *dl;
>> +		struct vsp1_dl_list *dl_child;
> 
> Is this really a child ? From a chaining point of view, it's more of a 
> sibling. Maybe dl_next or dl_partition ?
> 

Updated to dl_next.

>>
>> -		dl = vsp1_dl_list_get(pipe->output->dlm);
>> +		dl_child = vsp1_dl_list_get(pipe->output->dlm);
>>
>>  		/*
>>  		 * An incomplete chain will still function, but output only
>>  		 * the partitions that had a dl available. The frame end
>>  		 * interrupt will be marked on the last dl in the chain.
>>  		 */
>> -		if (!dl) {
>> +		if (!dl_child) {
>>  			dev_err(vsp1->dev, "Failed to obtain a dl list. Frame will be incomplete\n");
>>  			break;
>>  		}
>>
>> -		vsp1_video_pipeline_run_partition(pipe, dl, partition);
>> -		vsp1_dl_list_add_chain(pipe->dl, dl);
>> +		vsp1_video_pipeline_run_partition(pipe, dl_child, partition);
>> +		vsp1_dl_list_add_chain(dl, dl_child);
>>  	}
>>
>>  	/* Complete, and commit the head display list. */
>> -	vsp1_dl_list_commit(pipe->dl);
>> -	pipe->dl = NULL;
>> +	vsp1_dl_list_commit(dl);
>>
>>  	vsp1_pipeline_run(pipe);
>>  }
>> @@ -790,8 +796,8 @@ static void vsp1_video_buffer_queue(struct vb2_buffer
>> *vb)
>>
>>  static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>>  {
>> +	struct vsp1_video *video = pipe->output->video;
>>  	struct vsp1_entity *entity;
>> -	struct vsp1_dl_body *dlb;
>>  	int ret;
>>
>>  	/* Determine this pipelines sizes for image partitioning support. */
>> @@ -799,14 +805,6 @@ static int vsp1_video_setup_pipeline(struct
>> vsp1_pipeline *pipe)
>>  	if (ret < 0)
>>  		return ret;
>>
>> -	/* Prepare the display list. */
>> -	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>> -	if (!pipe->dl)
>> -		return -ENOMEM;
>> -
>> -	/* Retrieve the default DLB from the list */
>> -	dlb = vsp1_dl_list_get_body(pipe->dl);
>> -
>>  	if (pipe->uds) {
>>  		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
>>
>> @@ -828,11 +826,20 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>>  		}
>>  	}
>>
>> +	/* Obtain a clean body from our pool */
>> +	video->pipe_config = vsp1_dl_fragment_get(video->dlbs);
>> +	if (!video->pipe_config)
>> +		return -ENOMEM;
> 
> Is there a reason to store the pipe configuration in the video object instead 
> of the pipeline object ?

At the moment, yes.

If this is allocated as part of the pipe, then vsp1_pipeline_init() changes
from a function that simply initialises the existing structures to a function
that can fail to allocate. I didn't want this to be that invasive.

Also - the pipe_configuration cache is local only to the vsp1_video object - not
the vsp1_drm object.

If you'd prefer this to be allocated as part of the pipe object I can update
this, however it then means that we are allocating these bodies for the DRM use
case as well.


>> +	/* Configure the entities into our cached pipe configuration */
>>  	list_for_each_entry(entity, &pipe->entities, list_pipe) {
>> -		vsp1_entity_route_setup(entity, pipe, dlb);
>> -		vsp1_entity_prepare(entity, pipe, dlb);
>> +		vsp1_entity_route_setup(entity, pipe, video->pipe_config);
>> +		vsp1_entity_prepare(entity, pipe, video->pipe_config);
>>  	}
>>
>> +	/* Ensure that our cached configuration is updated in the next DL */
>> +	pipe->configured = false;
> 
> I'm tempted to move this at pipeline stop time (either to 
> vsp1_video_stop_streaming() right after the vsp1_pipeline_stop() call, or in 
> vsp1_pipeline_stop() itself), possibly with a WARN_ON() here to catch bugs in 
> the driver.

Do you mean just setting the flag, or the pipe_configuration? This is a setup
task - not a stop task ... ? We are doing this as part of
vsp1_video_start_streaming().

IMO, the flag should only be updated after the configuration has been updated,
to signal that the new configuration should be written out to the hardware.

Unless you mean to mark the pipe->configured = false; at vsp1_pipeline_stop()
time because we reset the pipe to halt it ?
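
For readers following the flag discussion, the intended life cycle can be
modelled in isolation. This is only a sketch, not driver code: the names
model_pipe, model_setup(), model_resume() and model_run() are hypothetical,
and only the 'configured' field name comes from the patch. It assumes the
cached configuration body is attached to a display list only on the first run
after stream setup or after a resume.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the pipeline; only 'configured' mirrors the patch. */
struct model_pipe {
	bool configured;		/* has the cached config reached the hardware? */
	unsigned int attach_count;	/* times the cached body was attached */
};

/* Stream setup: the cached body was just rewritten, so force a re-attach. */
void model_setup(struct model_pipe *pipe)
{
	pipe->configured = false;
}

/* Resume: the hardware might have been reset and need a full reconfiguration. */
void model_resume(struct model_pipe *pipe)
{
	pipe->configured = false;
}

/* Per-frame run: attach the cached configuration only when it is needed. */
void model_run(struct model_pipe *pipe)
{
	if (!pipe->configured) {
		pipe->attach_count++;	/* stands in for attaching the cached body */
		pipe->configured = true;
	}
}
```

In this model, consecutive frames after setup attach the cached body once, and
a resume causes exactly one further attach on the next frame.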

> 
>>  	return 0;
>>  }
>>
>> @@ -842,6 +849,9 @@ static void vsp1_video_cleanup_pipeline(struct vsp1_pipeline *pipe)
>>  	struct vsp1_vb2_buffer *buffer;
>>  	unsigned long flags;
>>
>> +	/* Release any cached configuration */
>> +	vsp1_dl_fragment_put(video->pipe_config);
>> +
>>  	/* Remove all buffers from the IRQ queue. */
>>  	spin_lock_irqsave(&video->irqlock, flags);
>>  	list_for_each_entry(buffer, &video->irqqueue, queue)
>> @@ -918,9 +928,6 @@ static void vsp1_video_stop_streaming(struct vb2_queue *vq)
>>  		ret = vsp1_pipeline_stop(pipe);
>>  		if (ret == -ETIMEDOUT)
>>  			dev_err(video->vsp1->dev, "pipeline stop timeout\n");
>> -
>> -		vsp1_dl_list_put(pipe->dl);
>> -		pipe->dl = NULL;
>>  	}
>>  	mutex_unlock(&pipe->lock);
>>
>> @@ -1240,6 +1247,16 @@ struct vsp1_video *vsp1_video_create(struct vsp1_device *vsp1,
>>  		goto error;
>>  	}
>>
>> +	/*
>> +	 * Create a fragment pool to cache the constant configuration of the
>> +	 * pipeline object
>> +	 */
>> +	video->dlbs = vsp1_dl_fragment_pool_alloc(vsp1, 2, 128, 0);
>> +	if (!video->dlbs) {
>> +		ret = -ENOMEM;
>> +		goto error;
>> +	}
>> +
>>  	return video;
>>
>>  error:
>> @@ -1249,6 +1266,8 @@ struct vsp1_video *vsp1_video_create(struct
>> vsp1_device *vsp1,
>>
>>  void vsp1_video_cleanup(struct vsp1_video *video)
>>  {
>> +	vsp1_dl_fragment_pool_free(video->dlbs);
>> +
>>  	if (video_is_registered(&video->video))
>>  		video_unregister_device(&video->video);
>>
>> diff --git a/drivers/media/platform/vsp1/vsp1_video.h
>> b/drivers/media/platform/vsp1/vsp1_video.h index 50ea7f02205f..2499d3d792b4
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_video.h
>> +++ b/drivers/media/platform/vsp1/vsp1_video.h
>> @@ -43,6 +43,8 @@ struct vsp1_video {
>>
>>  	struct mutex lock;
>>
>> +	struct vsp1_dl_fragment_pool *dlbs;
>> +	struct vsp1_dl_body *pipe_config;
>>  	unsigned int pipe_index;
>>
>>  	struct vb2_queue queue;
> 

* Re: [PATCH v2 6/8] v4l: vsp1: Adapt entities to configure into a body
  2017-09-12 19:18       ` Laurent Pinchart
@ 2017-11-17 13:40         ` Kieran Bingham
  0 siblings, 0 replies; 32+ messages in thread
From: Kieran Bingham @ 2017-11-17 13:40 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

On 12/09/17 20:18, Laurent Pinchart wrote:
> Hi Kieran,
> 
> On Tuesday, 12 September 2017 00:42:09 EEST Kieran Bingham wrote:
>> On 17/08/17 18:58, Laurent Pinchart wrote:
>>> On Monday 14 Aug 2017 16:13:29 Kieran Bingham wrote:
>>>> Currently the entities store their configurations into a display list.
>>>> Adapt this such that the code can be configured into a body fragment
>>>> directly, allowing greater flexibility and control of the content.
>>>>
>>>> All users of vsp1_dl_list_write() are removed in this process, thus it
>>>> too is removed.
>>>>
>>>> A helper, vsp1_dl_list_body() is provided to access the internal body0
>>>> from the display list.
>>>>
>>>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>>> ---
>>>>
>>>>  drivers/media/platform/vsp1/vsp1_bru.c    | 22 ++++++------
>>>>  drivers/media/platform/vsp1/vsp1_clu.c    | 22 ++++++------
>>>>  drivers/media/platform/vsp1/vsp1_dl.c     | 12 ++-----
>>>>  drivers/media/platform/vsp1/vsp1_dl.h     |  2 +-
>>>>  drivers/media/platform/vsp1/vsp1_drm.c    | 14 +++++---
>>>>  drivers/media/platform/vsp1/vsp1_entity.c | 16 ++++-----
>>>>  drivers/media/platform/vsp1/vsp1_entity.h | 12 ++++---
>>>>  drivers/media/platform/vsp1/vsp1_hgo.c    | 16 ++++-----
>>>>  drivers/media/platform/vsp1/vsp1_hgt.c    | 18 +++++-----
>>>>  drivers/media/platform/vsp1/vsp1_hsit.c   | 10 +++---
>>>>  drivers/media/platform/vsp1/vsp1_lif.c    | 13 +++----
>>>>  drivers/media/platform/vsp1/vsp1_lut.c    | 21 ++++++------
>>>>  drivers/media/platform/vsp1/vsp1_pipe.c   |  4 +-
>>>>  drivers/media/platform/vsp1/vsp1_pipe.h   |  3 +-
>>>>  drivers/media/platform/vsp1/vsp1_rpf.c    | 43 +++++++++++-------------
>>>>  drivers/media/platform/vsp1/vsp1_sru.c    | 14 ++++----
>>>>  drivers/media/platform/vsp1/vsp1_uds.c    | 24 +++++++------
>>>>  drivers/media/platform/vsp1/vsp1_uds.h    |  2 +-
>>>>  drivers/media/platform/vsp1/vsp1_video.c  | 11 ++++--
>>>>  drivers/media/platform/vsp1/vsp1_wpf.c    | 42 ++++++++++++-----------
>>>>  20 files changed, 168 insertions(+), 153 deletions(-)
>>>
>>> This is quite intrusive, and it bothers me slightly that we need to pass
>>> both the DL and the DLB to the configure function in order to add
>>> fragments to the DL in the CLU and LUT modules. Wouldn't it be simpler to
>>> add a pointer to the current body in the DL structure, and modify
>>> vsp1_dl_list_write() to write to the current fragment ?
>>
>> No doubt about it, 168+, 153- is certainly intrusive.
>>
>> Yes, now I'm looking back at this, I think this does look like this part is
>> not quite the right approach.
>>
>> Which otherwise stalls the series until I have time to reconsider. I will
>> likely repost the work I have done on the earlier patches, including the
>> 's/fragment/body/g' changes and ready myself for a v4 which will contain the
>> heavier reworks.
> 
> Fine with me. Could you make sure to mention the open issues in the cover 
> letter ? I want to avoid commenting on them if you know already that you will 
> rework them later.

I've been trying to tackle this today, but I think I've got a bit stuck on a
key part.

The reason for this patch is to allow the functions to configure directly into
a display list body, even when that body *is not part* of a display list.

So - converting vsp1_dl_list_write() to configure into the 'current' body (was
fragment) of a display list would not work for writing to the cached objects -
which do not have a display list. They are simply body objects.

It seems a bit extraneous to create holding display lists to contain a single
body, when the display list itself will never be used, but I can't think of an
alternative.

Would you prefer this 'container display list' approach, or do you have another
idea?
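
To make the distinction concrete, here is a minimal standalone sketch of
writing into a body that has no owning display list. It is an illustration
under assumptions, not the driver's implementation: model_entry, model_body
and model_body_write() are hypothetical names, and the address/value entry
layout is assumed. Only the num_entries and max_entries names are taken from
the trace output quoted in this thread.

```c
#include <stdint.h>

/* One register write; the address/value pair layout is an assumption. */
struct model_entry {
	uint32_t addr;
	uint32_t data;
};

/* A body is a bounded array of entries and needs no display list to exist. */
struct model_body {
	struct model_entry *entries;
	unsigned int num_entries;
	unsigned int max_entries;
};

/*
 * Append one register write to the body.
 * Returns 0 on success, or -1 if the body is already full.
 */
int model_body_write(struct model_body *dlb, uint32_t reg, uint32_t data)
{
	if (dlb->num_entries >= dlb->max_entries)
		return -1;

	dlb->entries[dlb->num_entries].addr = reg;
	dlb->entries[dlb->num_entries].data = data;
	dlb->num_entries++;

	return 0;
}
```

A writer of this shape takes the body directly, so a cached body can be
filled without first wrapping it in a holding display list.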

--
Regards

Kieran

* Re: [PATCH v2 5/8] v4l: vsp1: Refactor display list configure operations
  2017-09-12 19:19       ` Laurent Pinchart
@ 2017-11-17 15:07         ` Kieran Bingham
  2018-02-28 16:41           ` Kieran Bingham
  0 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2017-11-17 15:07 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

Just a query on your bikeshedding here.

Choose your colours wisely :)

--
Kieran

On 12/09/17 20:19, Laurent Pinchart wrote:
> Hi Kieran,
> 
> On Tuesday, 12 September 2017 00:16:50 EEST Kieran Bingham wrote:
>> On 17/08/17 19:13, Laurent Pinchart wrote:
>>> On Monday 14 Aug 2017 16:13:28 Kieran Bingham wrote:
>>>> The entities provide a single .configure operation which configures the
>>>> object into the target display list, based on the vsp1_entity_params
>>>> selection.
>>>>
>>>> This restricts us to a single function prototype for both static
>>>> configuration (the pre-stream INIT stage) and the dynamic runtime stages
>>>> for both each frame - and each partition therein.
>>>>
>>>> Split the configure function into two parts, '.prepare()' and
>>>> '.configure()', merging both the VSP1_ENTITY_PARAMS_RUNTIME and
>>>> VSP1_ENTITY_PARAMS_PARTITION stages into a single call through the
>>>> .configure(). The configuration for individual partitions is handled by
>>>> passing the partition number to the configure call, and processing any
>>>> runtime stage actions on the first partition only.
>>>>
>>>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>>> ---
>>>>
>>>>  drivers/media/platform/vsp1/vsp1_bru.c    |  12 +-
>>>>  drivers/media/platform/vsp1/vsp1_clu.c    |  43 +--
>>>>  drivers/media/platform/vsp1/vsp1_drm.c    |  11 +-
>>>>  drivers/media/platform/vsp1/vsp1_entity.c |  15 +-
>>>>  drivers/media/platform/vsp1/vsp1_entity.h |  27 +--
>>>>  drivers/media/platform/vsp1/vsp1_hgo.c    |  12 +-
>>>>  drivers/media/platform/vsp1/vsp1_hgt.c    |  12 +-
>>>>  drivers/media/platform/vsp1/vsp1_hsit.c   |  12 +-
>>>>  drivers/media/platform/vsp1/vsp1_lif.c    |  12 +-
>>>>  drivers/media/platform/vsp1/vsp1_lut.c    |  24 +-
>>>>  drivers/media/platform/vsp1/vsp1_rpf.c    | 162 ++++++-------
>>>>  drivers/media/platform/vsp1/vsp1_sru.c    |  12 +-
>>>>  drivers/media/platform/vsp1/vsp1_uds.c    |  55 ++--
>>>>  drivers/media/platform/vsp1/vsp1_video.c  |  24 +--
>>>>  drivers/media/platform/vsp1/vsp1_wpf.c    | 297 ++++++++++++-----------
>>>>  15 files changed, 359 insertions(+), 371 deletions(-)
>>>
>>> [snip]
>>>
>>>> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c
>>>> b/drivers/media/platform/vsp1/vsp1_clu.c index 175717018e11..5f65ce3ad97f
>>>> 100644
>>>> --- a/drivers/media/platform/vsp1/vsp1_clu.c
>>>> +++ b/drivers/media/platform/vsp1/vsp1_clu.c
>>>> @@ -213,37 +213,37 @@ static const struct v4l2_subdev_ops clu_ops = {
>>>>
>>>>  /*
>>>>  -----------------------------------------------------------------------
>>>>  -
>>>>  
>>>>   * VSP1 Entity Operations
>>>>   */
>>>>
>>>> +static void clu_prepare(struct vsp1_entity *entity,
>>>> +			struct vsp1_pipeline *pipe,
>>>> +			struct vsp1_dl_list *dl)
>>>> +{
>>>> +	struct vsp1_clu *clu = to_clu(&entity->subdev);
>>>> +
>>>> +	/*
>>>> +	 * The format can't be changed during streaming, only verify it
>>>> +	 * at setup time and store the information internally for future
>>>> +	 * runtime configuration calls.
>>>> +	 */
>>>
>>> I know you're just moving the comment around, but let's fix it at the same
>>> time. There's no verification here (and no "setup time" either). I'd write
>>> it as
>>>
>>> 	/*
>>> 	
>>> 	 * The format can't be changed during streaming. Cache it internally
>>> 	 * for future runtime configuration calls.
>>> 	 */
>>
>> I think I'm ok with that and I've updated the patch - but I'm not sure we
>> are really caching the 'format' here, as much as the yuv_mode ...
> 
> Yes, it's the YUV mode we're caching, feel free to update the comment.

Done.

> 
>> I'll ponder ...
>>
>>>> +	struct v4l2_mbus_framefmt *format;
>>>> +
>>>> +	format = vsp1_entity_get_pad_format(&clu->entity,
>>>> +					    clu->entity.config,
>>>> +					    CLU_PAD_SINK);
>>>> +	clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
>>>> +}
>>>
>>> [snip]
>>>
>>>> diff --git a/drivers/media/platform/vsp1/vsp1_entity.h
>>>> b/drivers/media/platform/vsp1/vsp1_entity.h index
>>>> 408602ebeb97..2f33e343ccc6 100644
>>>> --- a/drivers/media/platform/vsp1/vsp1_entity.h
>>>> +++ b/drivers/media/platform/vsp1/vsp1_entity.h
>>>
>>> [snip]
>>>
>>>> @@ -80,8 +68,10 @@ struct vsp1_route {
>>>>
>>>>  /**
>>>>  
>>>>   * struct vsp1_entity_operations - Entity operations
>>>>   * @destroy:	Destroy the entity.
>>>>
>>>> - * @configure:	Setup the hardware based on the entity state
>>>> (pipeline, formats,
>>>> - *		selection rectangles, ...)
>>>> + * @prepare:	Setup the initial hardware parameters for the stream
>>>> (pipeline,
>>>> + *		formats)
>>>> + * @configure:	Configure the runtime parameters for each partition
>>>> (rectangles,
>>>> + *		buffer addresses, ...)
>>>
>>> Now moving to the bikeshedding territory, I'm not sure if prepare and
>>> configure are the best names for those operations.


Would init() and configure() be more suitable for you ?

Or 'setup()' and 'configure()', or perhaps 'runtime()' ?

I'm not convinced on either init() or setup() yet, as they might refer to
'initialising' the object, rather than portraying the configuration of the
object into a body...
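
Whatever the names end up being, the split under discussion can be sketched
as a two-entry operations table. This is a simplified model, not the driver's
structures: model_entity, model_entity_ops and the counters are hypothetical.
It assumes, as the quoted commit message describes, that the stream-constant
operation runs once and that per-frame actions are processed on the first
partition only.

```c
struct model_entity;

/* Hypothetical two-stage operations table. */
struct model_entity_ops {
	/* Stream-constant setup, called once before streaming starts. */
	void (*prepare)(struct model_entity *entity);
	/* Runtime setup, called for every partition of every frame. */
	void (*configure)(struct model_entity *entity, unsigned int partition);
};

struct model_entity {
	const struct model_entity_ops *ops;
	unsigned int prepare_calls;
	unsigned int frame_calls;	/* per-frame work, first partition only */
	unsigned int partition_calls;	/* per-partition work */
};

void model_prepare(struct model_entity *entity)
{
	entity->prepare_calls++;
}

void model_configure(struct model_entity *entity, unsigned int partition)
{
	/* Per-frame actions are processed on the first partition only. */
	if (partition == 0)
		entity->frame_calls++;

	entity->partition_calls++;
}

const struct model_entity_ops model_ops = {
	.prepare = model_prepare,
	.configure = model_configure,
};

/* Configure every partition of one frame. */
void model_run_frame(struct model_entity *entity, unsigned int partitions)
{
	unsigned int i;

	for (i = 0; i < partitions; i++)
		entity->ops->configure(entity, i);
}
```

Caching partition parameters as well, as suggested in the review, would need
either a third operation or a partition argument on the first one; the sketch
does not attempt that.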

>>> I'd like to also point
>>> out that we could go one step further by caching the partition-related
>>> parameters too, in which case we would need a third operation (or
>>> possibly passing the partition number to the prepare operation). While I
>>> won't mind if you implement this now, the issue could also be addressed
>>> later, but I'd like the operations to already support that use case to
>>> avoid yet another painful rename patch.
>>
>> Ok, understood - but I think I'll have to defer to a v4 for now ... I'm
>> running out of time.
>>>
>>>>   * @max_width:	Return the max supported width of data that the entity
>>>>
>>>> can
>>>>
>>>>   *		process in a single operation.
>>>>   * @partition:	Process the partition construction based on this
>>>>
>>>> entity's
>>>
>>> [snip]
>>>
>>> The rest of the patch looks good to me.
> 

* Re: [PATCH v2 5/8] v4l: vsp1: Refactor display list configure operations
  2017-11-17 15:07         ` Kieran Bingham
@ 2018-02-28 16:41           ` Kieran Bingham
  2018-02-28 21:04             ` Laurent Pinchart
  0 siblings, 1 reply; 32+ messages in thread
From: Kieran Bingham @ 2018-02-28 16:41 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-renesas-soc, linux-media

Hi Laurent,

This series has a pending question below:

On 17/11/17 15:07, Kieran Bingham wrote:
> Hi Laurent,
> 
> Just a query on your bikeshedding here.
> 
> Choose your colours wisely :)
> 
> --
> Kieran
> 
> On 12/09/17 20:19, Laurent Pinchart wrote:
>> Hi Kieran,
>>
>> On Tuesday, 12 September 2017 00:16:50 EEST Kieran Bingham wrote:
>>> On 17/08/17 19:13, Laurent Pinchart wrote:
>>>> On Monday 14 Aug 2017 16:13:28 Kieran Bingham wrote:
>>>>> The entities provide a single .configure operation which configures the
>>>>> object into the target display list, based on the vsp1_entity_params
>>>>> selection.
>>>>>
>>>>> This restricts us to a single function prototype for both static
>>>>> configuration (the pre-stream INIT stage) and the dynamic runtime stages
>>>>> for both each frame - and each partition therein.
>>>>>
>>>>> Split the configure function into two parts, '.prepare()' and
>>>>> '.configure()', merging both the VSP1_ENTITY_PARAMS_RUNTIME and
>>>>> VSP1_ENTITY_PARAMS_PARTITION stages into a single call through the
>>>>> .configure(). The configuration for individual partitions is handled by
>>>>> passing the partition number to the configure call, and processing any
>>>>> runtime stage actions on the first partition only.
>>>>>
>>>>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>>>> ---
>>>>>
>>>>>  drivers/media/platform/vsp1/vsp1_bru.c    |  12 +-
>>>>>  drivers/media/platform/vsp1/vsp1_clu.c    |  43 +--
>>>>>  drivers/media/platform/vsp1/vsp1_drm.c    |  11 +-
>>>>>  drivers/media/platform/vsp1/vsp1_entity.c |  15 +-
>>>>>  drivers/media/platform/vsp1/vsp1_entity.h |  27 +--
>>>>>  drivers/media/platform/vsp1/vsp1_hgo.c    |  12 +-
>>>>>  drivers/media/platform/vsp1/vsp1_hgt.c    |  12 +-
>>>>>  drivers/media/platform/vsp1/vsp1_hsit.c   |  12 +-
>>>>>  drivers/media/platform/vsp1/vsp1_lif.c    |  12 +-
>>>>>  drivers/media/platform/vsp1/vsp1_lut.c    |  24 +-
>>>>>  drivers/media/platform/vsp1/vsp1_rpf.c    | 162 ++++++-------
>>>>>  drivers/media/platform/vsp1/vsp1_sru.c    |  12 +-
>>>>>  drivers/media/platform/vsp1/vsp1_uds.c    |  55 ++--
>>>>>  drivers/media/platform/vsp1/vsp1_video.c  |  24 +--
>>>>>  drivers/media/platform/vsp1/vsp1_wpf.c    | 297 ++++++++++++-----------
>>>>>  15 files changed, 359 insertions(+), 371 deletions(-)
>>>>
>>>> [snip]
>>>>
>>>>> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c
>>>>> b/drivers/media/platform/vsp1/vsp1_clu.c index 175717018e11..5f65ce3ad97f
>>>>> 100644
>>>>> --- a/drivers/media/platform/vsp1/vsp1_clu.c
>>>>> +++ b/drivers/media/platform/vsp1/vsp1_clu.c
>>>>> @@ -213,37 +213,37 @@ static const struct v4l2_subdev_ops clu_ops = {
>>>>>
>>>>>  /*
>>>>>  -----------------------------------------------------------------------
>>>>>  -
>>>>>  
>>>>>   * VSP1 Entity Operations
>>>>>   */
>>>>>
>>>>> +static void clu_prepare(struct vsp1_entity *entity,
>>>>> +			struct vsp1_pipeline *pipe,
>>>>> +			struct vsp1_dl_list *dl)
>>>>> +{
>>>>> +	struct vsp1_clu *clu = to_clu(&entity->subdev);
>>>>> +
>>>>> +	/*
>>>>> +	 * The format can't be changed during streaming, only verify it
>>>>> +	 * at setup time and store the information internally for future
>>>>> +	 * runtime configuration calls.
>>>>> +	 */
>>>>
>>>> I know you're just moving the comment around, but let's fix it at the same
>>>> time. There's no verification here (and no "setup time" either). I'd write
>>>> it as
>>>>
>>>> 	/*
>>>> 	
>>>> 	 * The format can't be changed during streaming. Cache it internally
>>>> 	 * for future runtime configuration calls.
>>>> 	 */
>>>
>>> I think I'm ok with that and I've updated the patch - but I'm not sure we
>>> are really caching the 'format' here, as much as the yuv_mode ...
>>
>> Yes, it's the YUV mode we're caching, feel free to update the comment.
> 
> Done.
> 
>>
>>> I'll ponder ...
>>>
>>>>> +	struct v4l2_mbus_framefmt *format;
>>>>> +
>>>>> +	format = vsp1_entity_get_pad_format(&clu->entity,
>>>>> +					    clu->entity.config,
>>>>> +					    CLU_PAD_SINK);
>>>>> +	clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
>>>>> +}
>>>>
>>>> [snip]
>>>>
>>>>> diff --git a/drivers/media/platform/vsp1/vsp1_entity.h
>>>>> b/drivers/media/platform/vsp1/vsp1_entity.h index
>>>>> 408602ebeb97..2f33e343ccc6 100644
>>>>> --- a/drivers/media/platform/vsp1/vsp1_entity.h
>>>>> +++ b/drivers/media/platform/vsp1/vsp1_entity.h
>>>>
>>>> [snip]
>>>>
>>>>> @@ -80,8 +68,10 @@ struct vsp1_route {
>>>>>
>>>>>  /**
>>>>>  
>>>>>   * struct vsp1_entity_operations - Entity operations
>>>>>   * @destroy:	Destroy the entity.
>>>>>
>>>>> - * @configure:	Setup the hardware based on the entity state
>>>>> (pipeline, formats,
>>>>> - *		selection rectangles, ...)
>>>>> + * @prepare:	Setup the initial hardware parameters for the stream
>>>>> (pipeline,
>>>>> + *		formats)
>>>>> + * @configure:	Configure the runtime parameters for each partition
>>>>> (rectangles,
>>>>> + *		buffer addresses, ...)
>>>>
>>>> Now moving to the bikeshedding territory, I'm not sure if prepare and
>>>> configure are the best names for those operations.
> 
> 
> Would init() and configure() be more suitable for you ?
> 
> Or 'setup()' and 'configure()', or perhaps 'runtime()' ?
> 
> I'm not convinced on either init() or setup() yet, as they might refer to
> 'initialising' the object, rather than portraying the configuration of the
> object into a body...

Any preference or alternative for the namings on the above topic?



>>>> I'd like to also point
>>>> out that we could go one step further by caching the partition-related
>>>> parameters too, in which case we would need a third operation (or
>>>> possibly passing the partition number to the prepare operation). While I
>>>> won't mind if you implement this now, the issue could also be addressed
>>>> later, but I'd like the operations to already support that use case to
>>>> avoid yet another painful rename patch.

Or based on the above - would you prefer a different approach to handling this?

I think the reason for the split was to prevent passing a display list when not
available or required. This could be passed as NULL on operations where it is
not used.

And in fact, with this series - it looks like the only use for passing the
display list now, is to handle the LUT and CLU body swaps.

Any ideas how we could improve this so that we didn't need to pass a display
list ?

--
Kieran



>>>
>>> Ok, understood - but I think I'll have to defer to a v4 for now ... I'm
>>> running out of time.
>>>
>>>>>   * @max_width:	Return the max supported width of data that the entity can
>>>>>   *		process in a single operation.
>>>>>   * @partition:	Process the partition construction based on this entity's
>>>>
>>>> [snip]
>>>>
>>>> The rest of the patch looks good to me.
>>


* Re: [PATCH v2 5/8] v4l: vsp1: Refactor display list configure operations
  2018-02-28 16:41           ` Kieran Bingham
@ 2018-02-28 21:04             ` Laurent Pinchart
  0 siblings, 0 replies; 32+ messages in thread
From: Laurent Pinchart @ 2018-02-28 21:04 UTC (permalink / raw)
  To: kieran.bingham; +Cc: linux-renesas-soc, linux-media

Hi Kieran,

On Wednesday, 28 February 2018 18:41:31 EET Kieran Bingham wrote:
> Hi Laurent,
> 
> This series has a pending question below:
> 
> On 17/11/17 15:07, Kieran Bingham wrote:
> > Hi Laurent,
> > 
> > Just a query on your bikeshedding here.
> > 
> > Choose your colours wisely :)
> > 
> > On 12/09/17 20:19, Laurent Pinchart wrote:
> >> On Tuesday, 12 September 2017 00:16:50 EEST Kieran Bingham wrote:
> >>> On 17/08/17 19:13, Laurent Pinchart wrote:
> >>>> On Monday 14 Aug 2017 16:13:28 Kieran Bingham wrote:
> >>>>> The entities provide a single .configure operation which configures
> >>>>> the object into the target display list, based on the
> >>>>> vsp1_entity_params selection.
> >>>>> 
> >>>>> This restricts us to a single function prototype for both static
> >>>>> configuration (the pre-stream INIT stage) and the dynamic runtime
> >>>>> stages for both each frame and each partition therein.
> >>>>> 
> >>>>> Split the configure function into two parts, '.prepare()' and
> >>>>> '.configure()', merging both the VSP1_ENTITY_PARAMS_RUNTIME and
> >>>>> VSP1_ENTITY_PARAMS_PARTITION stages into a single call through the
> >>>>> .configure(). The configuration for individual partitions is handled
> >>>>> by passing the partition number to the configure call, and processing
> >>>>> any runtime stage actions on the first partition only.
> >>>>> 
> >>>>> Signed-off-by: Kieran Bingham
> >>>>> <kieran.bingham+renesas@ideasonboard.com>
> >>>>> ---
> >>>>> 
> >>>>>  drivers/media/platform/vsp1/vsp1_bru.c    |  12 +-
> >>>>>  drivers/media/platform/vsp1/vsp1_clu.c    |  43 +--
> >>>>>  drivers/media/platform/vsp1/vsp1_drm.c    |  11 +-
> >>>>>  drivers/media/platform/vsp1/vsp1_entity.c |  15 +-
> >>>>>  drivers/media/platform/vsp1/vsp1_entity.h |  27 +--
> >>>>>  drivers/media/platform/vsp1/vsp1_hgo.c    |  12 +-
> >>>>>  drivers/media/platform/vsp1/vsp1_hgt.c    |  12 +-
> >>>>>  drivers/media/platform/vsp1/vsp1_hsit.c   |  12 +-
> >>>>>  drivers/media/platform/vsp1/vsp1_lif.c    |  12 +-
> >>>>>  drivers/media/platform/vsp1/vsp1_lut.c    |  24 +-
> >>>>>  drivers/media/platform/vsp1/vsp1_rpf.c    | 162 ++++++-------
> >>>>>  drivers/media/platform/vsp1/vsp1_sru.c    |  12 +-
> >>>>>  drivers/media/platform/vsp1/vsp1_uds.c    |  55 ++--
> >>>>>  drivers/media/platform/vsp1/vsp1_video.c  |  24 +--
> >>>>>  drivers/media/platform/vsp1/vsp1_wpf.c    | 297 +++++++++++----------
> >>>>>  15 files changed, 359 insertions(+), 371 deletions(-)
> >>>> 
> >>>> [snip]
> >>>> 
> >>>>> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
> >>>>> index 175717018e11..5f65ce3ad97f 100644
> >>>>> --- a/drivers/media/platform/vsp1/vsp1_clu.c
> >>>>> +++ b/drivers/media/platform/vsp1/vsp1_clu.c
> >>>>> @@ -213,37 +213,37 @@ static const struct v4l2_subdev_ops clu_ops = {
> >>>>>  /* ------------------------------------------------------------------
> >>>>>   * VSP1 Entity Operations
> >>>>>   */
> >>>>> +static void clu_prepare(struct vsp1_entity *entity,
> >>>>> +			struct vsp1_pipeline *pipe,
> >>>>> +			struct vsp1_dl_list *dl)
> >>>>> +{
> >>>>> +	struct vsp1_clu *clu = to_clu(&entity->subdev);
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * The format can't be changed during streaming, only verify it
> >>>>> +	 * at setup time and store the information internally for future
> >>>>> +	 * runtime configuration calls.
> >>>>> +	 */
> >>>> 
> >>>> I know you're just moving the comment around, but let's fix it at the
> >>>> same time. There's no verification here (and no "setup time" either).
> >>>> I'd write it as
> >>>> 
> >>>> 	/*
> >>>> 	 * The format can't be changed during streaming. Cache it internally
> >>>> 	 * for future runtime configuration calls.
> >>>> 	 */
> >>> 
> >>> I think I'm ok with that and I've updated the patch - but I'm not sure
> >>> we are really caching the 'format' here, as much as the yuv_mode ...
> >> 
> >> Yes, it's the YUV mode we're caching, feel free to update the comment.
> > 
> > Done.
> > 
> >>> I'll ponder ...
> >>> 
> >>>>> +	struct v4l2_mbus_framefmt *format;
> >>>>> +
> >>>>> +	format = vsp1_entity_get_pad_format(&clu->entity,
> >>>>> +					    clu->entity.config,
> >>>>> +					    CLU_PAD_SINK);
> >>>>> +	clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
> >>>>> +}
> >>>> 
> >>>> [snip]
> >>>> 
> >>>>> diff --git a/drivers/media/platform/vsp1/vsp1_entity.h b/drivers/media/platform/vsp1/vsp1_entity.h
> >>>>> index 408602ebeb97..2f33e343ccc6 100644
> >>>>> --- a/drivers/media/platform/vsp1/vsp1_entity.h
> >>>>> +++ b/drivers/media/platform/vsp1/vsp1_entity.h
> >>>> 
> >>>> [snip]
> >>>> 
> >>>>> @@ -80,8 +68,10 @@ struct vsp1_route {
> >>>>>  /**
> >>>>>   * struct vsp1_entity_operations - Entity operations
> >>>>>   * @destroy:	Destroy the entity.
> >>>>> - * @configure:	Setup the hardware based on the entity state (pipeline, formats,
> >>>>> - *		selection rectangles, ...)
> >>>>> + * @prepare:	Setup the initial hardware parameters for the stream (pipeline,
> >>>>> + *		formats)
> >>>>> + * @configure:	Configure the runtime parameters for each partition (rectangles,
> >>>>> + *		buffer addresses, ...)
> >>>> 
> >>>> Now moving to the bikeshedding territory, I'm not sure if prepare and
> >>>> configure are the best names for those operations.
> > 
> > Would init() and configure() be more suitable for you ?
> > 
> > Or 'setup()' and 'configure()', or perhaps 'runtime()' ?
> > 
> > I'm not convinced on either init() or setup() yet, as they might refer to
> > 'initialising' the object, rather than portraying the configuration of the
> > object into a body...
> 
> Any preference or alternative for the namings on the above topic?

I'd like the names to convey the fact that the functions fill display lists 
(or rather parts thereof) for the purpose of hardware configuration from the 
software configuration of the entity.
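
For readers following the naming discussion, the two-stage split described in
the quoted commit message can be sketched as a small standalone program. This
is only an illustration under stated assumptions, not the vsp1 driver code:
every demo_* name, the register offsets and the body layout are hypothetical.
It shows one operation filling a cacheable body once per stream, and a second
operation filling a per-frame body, with runtime settings emitted on the first
partition only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a display list body: a bounded list of register writes. */
#define DEMO_BODY_MAX_ENTRIES 8

/* Made-up register offsets, for illustration only. */
#define DEMO_REG_FORMAT		0x00
#define DEMO_REG_FRAME_ADDR	0x04
#define DEMO_REG_PART_OFFSET	0x08

struct demo_body {
	unsigned int reg[DEMO_BODY_MAX_ENTRIES];
	unsigned int val[DEMO_BODY_MAX_ENTRIES];
	size_t num_entries;
};

struct demo_entity;

struct demo_entity_ops {
	/* Fill 'body' with stream-constant settings; called once per stream. */
	void (*prepare)(struct demo_entity *entity, struct demo_body *body);
	/* Fill 'body' with per-frame and per-partition settings. */
	void (*configure)(struct demo_entity *entity, struct demo_body *body,
			  unsigned int partition);
};

struct demo_entity {
	const struct demo_entity_ops *ops;
	int yuv_mode;			/* fixed for the lifetime of the stream */
	unsigned int frame_addr;	/* changes every frame */
	unsigned int partition_width;
};

static void demo_body_write(struct demo_body *body, unsigned int reg,
			    unsigned int val)
{
	/* Overflow check, in the spirit of patch 1 of the series. */
	assert(body->num_entries < DEMO_BODY_MAX_ENTRIES);
	body->reg[body->num_entries] = reg;
	body->val[body->num_entries] = val;
	body->num_entries++;
}

static void demo_prepare(struct demo_entity *entity, struct demo_body *body)
{
	demo_body_write(body, DEMO_REG_FORMAT, entity->yuv_mode ? 1 : 0);
}

static void demo_configure(struct demo_entity *entity, struct demo_body *body,
			   unsigned int partition)
{
	/* Per-frame (runtime) settings are emitted on the first partition only. */
	if (partition == 0)
		demo_body_write(body, DEMO_REG_FRAME_ADDR, entity->frame_addr);

	/* Per-partition settings are emitted for every partition. */
	demo_body_write(body, DEMO_REG_PART_OFFSET,
			partition * entity->partition_width);
}

const struct demo_entity_ops demo_ops = {
	.prepare = demo_prepare,
	.configure = demo_configure,
};

/* Fill the cacheable body once when the stream starts. */
void demo_stream_setup(struct demo_entity *entity, struct demo_body *cached)
{
	entity->ops->prepare(entity, cached);
}

/* Fill a per-frame body, one configure call per partition. */
void demo_frame_setup(struct demo_entity *entity, struct demo_body *frame,
		      unsigned int num_partitions)
{
	unsigned int i;

	for (i = 0; i < num_partitions; i++)
		entity->ops->configure(entity, frame, i);
}
```

A caller would invoke demo_stream_setup() once with the cached body, then
demo_frame_setup() for each frame with a fresh body and the partition count.
Caching the partition-related parameters as well, as suggested in the quoted
review, would need either a third operation or a partition argument on the
prepare step; the sketch does not attempt that.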

> >>>> I'd like to also point out that we could go one step further by caching
> >>>> the partition-related parameters too, in which case we would need a
> >>>> third operation (or possibly passing the partition number to the
> >>>> prepare operation). While I won't mind if you implement this now, the
> >>>> issue could also be addressed later, but I'd like the operations to
> >>>> already support that use case to avoid yet another painful rename
> >>>> patch.
> 
> Or based on the above - would you prefer a different approach to handling
> this?
> 
> I think the reason for the split was to prevent passing a display list when
> not available or required. This could be passed as NULL on operations where
> it is not used.
> 
> And in fact, with this series - it looks like the only use for passing the
> display list now, is to handle the LUT and CLU body swaps.
> 
> Any ideas how we could improve this so that we didn't need to pass a display
> list ?

Sorry, it's been too long, I can't remember. I'm not sure when I'll have time 
to dive into this again.

> >>> Ok, understood - but I think I'll have to defer to a v4 for now ... I'm
> >>> running out of time.
> >>> 
> >>>>>   * @max_width:	Return the max supported width of data that the entity can
> >>>>>   *		process in a single operation.
> >>>>>   * @partition:	Process the partition construction based on this entity's
> >>>> 
> >>>> [snip]
> >>>> 
> >>>> The rest of the patch looks good to me.

-- 
Regards,

Laurent Pinchart


Thread overview: 32+ messages
2017-08-14 15:13 [PATCH v2 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
2017-08-14 15:13 ` [PATCH v2 1/8] v4l: vsp1: Protect fragments against overflow Kieran Bingham
2017-08-16 21:53   ` Laurent Pinchart
2017-08-17  8:16     ` Kieran Bingham
2017-08-14 15:13 ` [PATCH v2 2/8] v4l: vsp1: Provide a fragment pool Kieran Bingham
2017-08-17 12:13   ` Laurent Pinchart
2017-09-11 20:30     ` Kieran Bingham
2017-09-13  2:15       ` Laurent Pinchart
2017-08-14 15:13 ` [PATCH v2 3/8] v4l: vsp1: Convert display lists to use new " Kieran Bingham
2017-08-17 12:13   ` Laurent Pinchart
2017-09-11 20:27     ` Kieran Bingham
2017-09-13  2:26       ` Laurent Pinchart
2017-08-14 15:13 ` [PATCH v2 4/8] v4l: vsp1: Use reference counting for fragments Kieran Bingham
2017-08-17 12:53   ` Laurent Pinchart
2017-08-14 15:13 ` [PATCH v2 5/8] v4l: vsp1: Refactor display list configure operations Kieran Bingham
2017-08-17 18:13   ` Laurent Pinchart
2017-09-11 21:16     ` Kieran Bingham
2017-09-12 19:19       ` Laurent Pinchart
2017-11-17 15:07         ` Kieran Bingham
2018-02-28 16:41           ` Kieran Bingham
2018-02-28 21:04             ` Laurent Pinchart
2017-08-14 15:13 ` [PATCH v2 6/8] v4l: vsp1: Adapt entities to configure into a body Kieran Bingham
2017-08-17 17:58   ` Laurent Pinchart
2017-09-11 21:42     ` Kieran Bingham
2017-09-12 19:18       ` Laurent Pinchart
2017-11-17 13:40         ` Kieran Bingham
2017-08-14 15:13 ` [PATCH v2 7/8] v4l: vsp1: Move video configuration to a cached dlb Kieran Bingham
2017-08-17 18:10   ` Laurent Pinchart
2017-11-16 18:19     ` Kieran Bingham
2017-08-14 15:13 ` [PATCH v2 8/8] v4l: vsp1: Reduce display list body size Kieran Bingham
2017-08-17 16:11   ` Laurent Pinchart
2017-09-11 21:15     ` Kieran Bingham
