* [PATCH v7 0/8] vsp1: TLB optimisation and DL caching
@ 2018-03-08  0:05 Kieran Bingham
  2018-03-08  0:05 ` [PATCH v7 1/8] media: vsp1: Reword uses of 'fragment' as 'body' Kieran Bingham
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Kieran Bingham @ 2018-03-08  0:05 UTC (permalink / raw)
  To: linux-media, linux-renesas-soc, Laurent Pinchart
  Cc: Kieran Bingham, Kieran Bingham

Each display list currently allocates an area of DMA memory to store register
settings for the VSP1 to process. Each of these allocations adds pressure to
the IPMMU TLB entries.

We can reduce the pressure by pre-allocating larger areas and dividing the area
across multiple bodies represented as a pool.
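
As a rough userspace illustration of the idea above (not the driver code), the
sketch below makes one backing allocation and divides it across several bodies.
The names body_pool, pool_create() and pool_destroy() are hypothetical, and
calloc() merely stands in for the DMA allocation the driver would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One register write: an address and a value. */
struct entry {
	uint32_t addr;
	uint32_t data;
};

/* Hypothetical body: a window into the pool's single backing buffer. */
struct body {
	struct entry *entries;
	unsigned int num_entries;
	unsigned int max_entries;
};

/* Hypothetical pool: one allocation shared by num_bodies bodies. */
struct body_pool {
	struct entry *mem;
	struct body *bodies;
	unsigned int num_bodies;
};

/* Allocate one buffer and slice it into equally sized bodies. */
static struct body_pool *pool_create(unsigned int num_bodies,
				     unsigned int num_entries)
{
	struct body_pool *pool;
	unsigned int i;

	assert(num_bodies > 0 && num_entries > 0);

	pool = calloc(1, sizeof(*pool));
	if (!pool)
		return NULL;

	pool->bodies = calloc(num_bodies, sizeof(*pool->bodies));
	/* Single backing allocation (stand-in for the DMA buffer). */
	pool->mem = calloc((size_t)num_bodies * num_entries,
			   sizeof(*pool->mem));
	if (!pool->bodies || !pool->mem) {
		free(pool->bodies);
		free(pool->mem);
		free(pool);
		return NULL;
	}

	pool->num_bodies = num_bodies;
	for (i = 0; i < num_bodies; ++i) {
		pool->bodies[i].entries = pool->mem + (size_t)i * num_entries;
		pool->bodies[i].max_entries = num_entries;
	}

	return pool;
}

static void pool_destroy(struct body_pool *pool)
{
	if (!pool)
		return;

	free(pool->bodies);
	free(pool->mem);
	free(pool);
}
```

With this layout, N bodies cost one mapping of the backing buffer instead of N
separate mappings, which is where the reduction in TLB pressure would come from.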

With this reconfiguration of bodies, we can adapt the configuration code to
separate out constant hardware configuration and cache it for re-use.
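
The split between constant and per-frame configuration could look roughly like
the sketch below. It is a simplified userspace model under stated assumptions:
the struct layout, the queue_frame() helper and the register offsets are made
up for illustration, and only the configure_stream/configure_frame naming
follows the changelog of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ENTRIES 8

struct entry {
	uint32_t addr;
	uint32_t data;
};

struct body {
	struct entry entries[MAX_ENTRIES];
	unsigned int num_entries;
};

/* Hypothetical pipeline state holding a cached body of constant registers. */
struct pipeline {
	struct body stream_body;
	bool configured;
	unsigned int stream_builds;	/* how often the constant part was built */
};

static void body_write(struct body *b, uint32_t reg, uint32_t data)
{
	assert(b->num_entries < MAX_ENTRIES);
	b->entries[b->num_entries].addr = reg;
	b->entries[b->num_entries].data = data;
	b->num_entries++;
}

/* Constant for the whole stream: built once, then reused for every frame. */
static void configure_stream(struct pipeline *pipe)
{
	pipe->stream_body.num_entries = 0;
	body_write(&pipe->stream_body, 0x0300, 0x1);	/* made-up offset */
	body_write(&pipe->stream_body, 0x0304, (640u << 16) | 480u);
	pipe->stream_builds++;
	pipe->configured = true;
}

/* Changes with every frame: written into a per-frame body. */
static void configure_frame(struct body *frame, uint32_t buffer_addr)
{
	frame->num_entries = 0;
	body_write(frame, 0x033c, buffer_addr);		/* made-up offset */
}

/* Rebuild the cached part only when it has not been configured yet. */
static void queue_frame(struct pipeline *pipe, struct body *frame,
			uint32_t buffer_addr)
{
	if (!pipe->configured)
		configure_stream(pipe);

	configure_frame(frame, buffer_addr);
}
```

Clearing the configured flag (for example on resume) would force the cached
body to be rebuilt on the next frame.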

The patches provided in this series can be found at:
  git://git.kernel.org/pub/scm/linux/kernel/git/kbingham/rcar.git  tags/vsp1/tlb-optimise/v7

I hope that this series is at a stage where it could be integrated now. It has
had thorough testing and is already integrated in both renesas-drivers and
renesas-bsp (except for the minor changes in v7).

Please note that checkpatch complains on patch 6/8 in this series:

v7-0006-media-vsp1-Refactor-display-list-configure-operations.patch
------------------------------------------------------------------------------------------------------
WARNING: function definition argument 'struct vsp1_entity *' should also have an identifier name
#290: FILE: drivers/media/platform/vsp1/vsp1_entity.h:82:
+       void (*configure_stream)(struct vsp1_entity *, struct vsp1_pipeline *,

However, this complaint concerns pre-existing code; I have only renamed
the function pointers. I also disagree with checkpatch here: there is no
need to provide an identifier name, and it does not improve readability in
this instance to state:
	...(struct vsp1_entity *entity, struct vsp1_pipeline *pipe)

I have therefore ignored these warnings.
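
For readers unfamiliar with the warning: in C, parameter names in a function
pointer declaration are optional and do not affect the type. The sketch below
uses hypothetical stand-in structs (entity, pipeline, ops_unnamed, ops_named)
rather than the driver's types to show that both spellings accept the same
function.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's entity and pipeline types. */
struct entity {
	int configured;
};

struct pipeline {
	int id;
};

/* The form checkpatch warns about: parameter types only. */
struct ops_unnamed {
	void (*configure_stream)(struct entity *, struct pipeline *);
};

/* The form checkpatch asks for: types plus identifier names. */
struct ops_named {
	void (*configure_stream)(struct entity *entity, struct pipeline *pipe);
};

/* A single implementation is assignable to either member. */
static void configure_stream(struct entity *entity, struct pipeline *pipe)
{
	assert(entity && pipe);
	entity->configured = pipe->id;
}
```

Both members have the type void (*)(struct entity *, struct pipeline *), so
the choice is purely one of style.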


Changelog:
----------

v7:
 - Rebased on to linux-media/master (v4.16-rc4)
 - Clean up the formatting of the vsp1_dl_list_add_body()
 - Fix formatting and white space
 -  s/prepare/configure_stream/
 -  s/configure/configure_frame/

v6:
 - Rebased on to linux-media/master (v4.16-rc1)
 - Removed DRM/UIF (DISCOM/ColorKey) updates

v5:
 - Rebased on to renesas-drivers-2018-01-09-v4.15-rc7 to fix conflicts
   with DRM and UIF updates on VSP1 driver

v4:
 - Rebased to v4.14
 * v4l: vsp1: Use reference counting for bodies
   - Fix up reference handling comments

 * v4l: vsp1: Provide a body pool
   - Provide comment explaining extra allocation on body pool
     highlighting area for optimisation later.

 * v4l: vsp1: Refactor display list configure operations
   - Fix up comment to describe yuv_mode caching rather than format

 * vsp1: Adapt entities to configure into a body
   - Rename vsp1_dl_list_get_body() to vsp1_dl_list_get_body0()

 * v4l: vsp1: Move video configuration to a cached dlb
   - Adjust pipe configured flag to be reset on resume rather than suspend
   - rename dl_child, dl_next

Testing:
--------
The VSP unit tests have been run on this patch set with the following results:

--- Test loop 1 ---
- vsp-unit-test-0000.sh
Test Conditions:
  Platform          Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+
  Kernel release    4.16.0-rc4-arm64-renesas-01067-g397eb3811ec0
  convert           /usr/bin/convert
  compare           /usr/bin/compare
  killall           /usr/bin/killall
  raw2rgbpnm        /usr/bin/raw2rgbpnm
  stress            /usr/bin/stress
  yavta             /usr/bin/yavta
- vsp-unit-test-0001.sh
Testing WPF packing in RGB332: pass
Testing WPF packing in ARGB555: pass
Testing WPF packing in XRGB555: pass
Testing WPF packing in RGB565: pass
Testing WPF packing in BGR24: pass
Testing WPF packing in RGB24: pass
Testing WPF packing in ABGR32: pass
Testing WPF packing in ARGB32: pass
Testing WPF packing in XBGR32: pass
Testing WPF packing in XRGB32: pass
- vsp-unit-test-0002.sh
Testing WPF packing in NV12M: pass
Testing WPF packing in NV16M: pass
Testing WPF packing in NV21M: pass
Testing WPF packing in NV61M: pass
Testing WPF packing in UYVY: pass
Testing WPF packing in VYUY: skip
Testing WPF packing in YUV420M: pass
Testing WPF packing in YUV422M: pass
Testing WPF packing in YUV444M: pass
Testing WPF packing in YVU420M: pass
Testing WPF packing in YVU422M: pass
Testing WPF packing in YVU444M: pass
Testing WPF packing in YUYV: pass
Testing WPF packing in YVYU: pass
- vsp-unit-test-0003.sh
Testing scaling from 640x640 to 640x480 in RGB24: pass
Testing scaling from 1024x768 to 640x480 in RGB24: pass
Testing scaling from 640x480 to 1024x768 in RGB24: pass
Testing scaling from 640x640 to 640x480 in YUV444M: pass
Testing scaling from 1024x768 to 640x480 in YUV444M: pass
Testing scaling from 640x480 to 1024x768 in YUV444M: pass
- vsp-unit-test-0004.sh
Testing histogram in RGB24: pass
Testing histogram in YUV444M: pass
- vsp-unit-test-0005.sh
Testing RPF.0: pass
Testing RPF.1: pass
Testing RPF.2: pass
Testing RPF.3: pass
Testing RPF.4: pass
- vsp-unit-test-0006.sh
Testing invalid pipeline with no RPF: pass
Testing invalid pipeline with no WPF: pass
- vsp-unit-test-0007.sh
Testing BRU in RGB24 with 1 inputs: pass
Testing BRU in RGB24 with 2 inputs: pass
Testing BRU in RGB24 with 3 inputs: pass
Testing BRU in RGB24 with 4 inputs: pass
Testing BRU in RGB24 with 5 inputs: pass
Testing BRU in YUV444M with 1 inputs: pass
Testing BRU in YUV444M with 2 inputs: pass
Testing BRU in YUV444M with 3 inputs: pass
Testing BRU in YUV444M with 4 inputs: pass
Testing BRU in YUV444M with 5 inputs: pass
- vsp-unit-test-0008.sh
Test requires unavailable feature set `bru rpf.0 uds wpf.0': skipped
- vsp-unit-test-0009.sh
Test requires unavailable feature set `rpf.0 wpf.0 wpf.1': skipped
- vsp-unit-test-0010.sh
Testing CLU in RGB24 with zero configuration: pass
Testing CLU in RGB24 with identity configuration: pass
Testing CLU in RGB24 with wave configuration: pass
Testing CLU in YUV444M with zero configuration: pass
Testing CLU in YUV444M with identity configuration: pass
Testing CLU in YUV444M with wave configuration: pass
Testing LUT in RGB24 with zero configuration: pass
Testing LUT in RGB24 with identity configuration: pass
Testing LUT in RGB24 with gamma configuration: pass
Testing LUT in YUV444M with zero configuration: pass
Testing LUT in YUV444M with identity configuration: pass
Testing LUT in YUV444M with gamma configuration: pass
- vsp-unit-test-0011.sh
Testing  hflip=0 vflip=0 rotate=0: pass
Testing  hflip=1 vflip=0 rotate=0: pass
Testing  hflip=0 vflip=1 rotate=0: pass
Testing  hflip=1 vflip=1 rotate=0: pass
Testing  hflip=0 vflip=0 rotate=90: pass
Testing  hflip=1 vflip=0 rotate=90: pass
Testing  hflip=0 vflip=1 rotate=90: pass
Testing  hflip=1 vflip=1 rotate=90: pass
- vsp-unit-test-0012.sh
Testing hflip: pass
Testing vflip: pass
- vsp-unit-test-0013.sh
Testing RPF unpacking in RGB332: pass
Testing RPF unpacking in ARGB555: pass
Testing RPF unpacking in XRGB555: pass
Testing RPF unpacking in RGB565: pass
Testing RPF unpacking in BGR24: pass
Testing RPF unpacking in RGB24: pass
Testing RPF unpacking in ABGR32: pass
Testing RPF unpacking in ARGB32: pass
Testing RPF unpacking in XBGR32: pass
Testing RPF unpacking in XRGB32: pass
- vsp-unit-test-0014.sh
Testing RPF unpacking in NV12M: pass
Testing RPF unpacking in NV16M: pass
Testing RPF unpacking in NV21M: pass
Testing RPF unpacking in NV61M: pass
Testing RPF unpacking in UYVY: pass
Testing RPF unpacking in VYUY: skip
Testing RPF unpacking in YUV420M: pass
Testing RPF unpacking in YUV422M: pass
Testing RPF unpacking in YUV444M: pass
Testing RPF unpacking in YVU420M: pass
Testing RPF unpacking in YVU422M: pass
Testing RPF unpacking in YVU444M: pass
Testing RPF unpacking in YUYV: pass
Testing RPF unpacking in YVYU: pass
- vsp-unit-test-0015.sh
Testing SRU scaling from 1024x768 to 1024x768 in RGB24: pass
Testing SRU scaling from 1024x768 to 2048x1536 in RGB24: pass
Testing SRU scaling from 1024x768 to 1024x768 in YUV444M: pass
Testing SRU scaling from 1024x768 to 2048x1536 in YUV444M: pass
- vsp-unit-test-0016.sh
Testing  hflip=0 vflip=0 rotate=0 640x480 -> 640x480: pass
Testing  hflip=0 vflip=0 rotate=0 640x480 -> 1024x768: pass
Testing  hflip=0 vflip=0 rotate=0 1024x768 -> 640x480: pass
Testing  hflip=1 vflip=0 rotate=0 640x480 -> 640x480: pass
Testing  hflip=1 vflip=0 rotate=0 640x480 -> 1024x768: pass
Testing  hflip=1 vflip=0 rotate=0 1024x768 -> 640x480: pass
Testing  hflip=0 vflip=1 rotate=0 640x480 -> 640x480: pass
Testing  hflip=0 vflip=1 rotate=0 640x480 -> 1024x768: pass
Testing  hflip=0 vflip=1 rotate=0 1024x768 -> 640x480: pass
Testing  hflip=1 vflip=1 rotate=0 640x480 -> 640x480: pass
Testing  hflip=1 vflip=1 rotate=0 640x480 -> 1024x768: pass
Testing  hflip=1 vflip=1 rotate=0 1024x768 -> 640x480: pass
Testing  hflip=0 vflip=0 rotate=90 640x480 -> 640x480: pass
Testing  hflip=0 vflip=0 rotate=90 640x480 -> 1024x768: pass
Testing  hflip=0 vflip=0 rotate=90 1024x768 -> 640x480: pass
Testing  hflip=1 vflip=0 rotate=90 640x480 -> 640x480: pass
Testing  hflip=1 vflip=0 rotate=90 640x480 -> 1024x768: pass
Testing  hflip=1 vflip=0 rotate=90 1024x768 -> 640x480: pass
Testing  hflip=0 vflip=1 rotate=90 640x480 -> 640x480: pass
Testing  hflip=0 vflip=1 rotate=90 640x480 -> 1024x768: pass
Testing  hflip=0 vflip=1 rotate=90 1024x768 -> 640x480: pass
Testing  hflip=1 vflip=1 rotate=90 640x480 -> 640x480: pass
Testing  hflip=1 vflip=1 rotate=90 640x480 -> 1024x768: pass
Testing  hflip=1 vflip=1 rotate=90 1024x768 -> 640x480: pass
- vsp-unit-test-0017.sh
- vsp-unit-test-0018.sh
Testing RPF crop from (0,0)/512x384: pass
Testing RPF crop from (32,32)/512x384: pass
Testing RPF crop from (32,64)/512x384: pass
Testing RPF crop from (64,32)/512x384: pass
- vsp-unit-test-0019.sh
- vsp-unit-test-0020.sh
- vsp-unit-test-0021.sh
Testing WPF packing in RGB332 during stress testing: pass
Testing WPF packing in ARGB555 during stress testing: pass
Testing WPF packing in XRGB555 during stress testing: pass
Testing WPF packing in RGB565 during stress testing: pass
Testing WPF packing in BGR24 during stress testing: pass
Testing WPF packing in RGB24 during stress testing: pass
Testing WPF packing in ABGR32 during stress testing: pass
Testing WPF packing in ARGB32 during stress testing: pass
Testing WPF packing in XBGR32 during stress testing: pass
Testing WPF packing in XRGB32 during stress testing: pass
./vsp-unit-test-0021.sh: line 34:  4489 Killed                  stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
- vsp-unit-test-0022.sh
Testing long duration pipelines under stress: pass
./vsp-unit-test-0022.sh: line 38:  6457 Killed                  stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
- vsp-unit-test-0023.sh
Testing histogram HGT with hue areas 0,255,255,255,255,255,255,255,255,255,255,255: pass
Testing histogram HGT with hue areas 0,40,40,80,80,120,120,160,160,200,200,255: pass
Testing histogram HGT with hue areas 220,40,40,80,80,120,120,160,160,200,200,220: pass
Testing histogram HGT with hue areas 0,10,50,60,100,110,150,160,200,210,250,255: pass
Testing histogram HGT with hue areas 10,20,50,60,100,110,150,160,200,210,230,240: pass
Testing histogram HGT with hue areas 240,20,60,80,100,120,140,160,180,200,210,220: pass
- vsp-unit-test-0024.sh
Test requires unavailable feature set `rpf.0 rpf.1 brs wpf.0': skipped
158 tests: 142 passed, 0 failed, 3 skipped

Kieran Bingham (8):
  media: vsp1: Reword uses of 'fragment' as 'body'
  media: vsp1: Protect bodies against overflow
  media: vsp1: Provide a body pool
  media: vsp1: Convert display lists to use new body pool
  media: vsp1: Use reference counting for bodies
  media: vsp1: Refactor display list configure operations
  media: vsp1: Adapt entities to configure into a body
  media: vsp1: Move video configuration to a cached dlb

 drivers/media/platform/vsp1/vsp1_bru.c    |  32 +--
 drivers/media/platform/vsp1/vsp1_clu.c    | 102 +++---
 drivers/media/platform/vsp1/vsp1_clu.h    |   1 +-
 drivers/media/platform/vsp1/vsp1_dl.c     | 393 +++++++++++++----------
 drivers/media/platform/vsp1/vsp1_dl.h     |  19 +-
 drivers/media/platform/vsp1/vsp1_drm.c    |  35 +--
 drivers/media/platform/vsp1/vsp1_entity.c |  26 +-
 drivers/media/platform/vsp1/vsp1_entity.h |  38 +-
 drivers/media/platform/vsp1/vsp1_hgo.c    |  26 +--
 drivers/media/platform/vsp1/vsp1_hgt.c    |  28 +--
 drivers/media/platform/vsp1/vsp1_hsit.c   |  20 +-
 drivers/media/platform/vsp1/vsp1_lif.c    |  25 +-
 drivers/media/platform/vsp1/vsp1_lut.c    |  77 +++--
 drivers/media/platform/vsp1/vsp1_lut.h    |   1 +-
 drivers/media/platform/vsp1/vsp1_pipe.c   |  11 +-
 drivers/media/platform/vsp1/vsp1_pipe.h   |   7 +-
 drivers/media/platform/vsp1/vsp1_rpf.c    | 183 +++++------
 drivers/media/platform/vsp1/vsp1_sru.c    |  24 +-
 drivers/media/platform/vsp1/vsp1_uds.c    |  75 ++--
 drivers/media/platform/vsp1/vsp1_uds.h    |   2 +-
 drivers/media/platform/vsp1/vsp1_video.c  |  82 ++---
 drivers/media/platform/vsp1/vsp1_video.h  |   2 +-
 drivers/media/platform/vsp1/vsp1_wpf.c    | 327 +++++++++----------
 23 files changed, 845 insertions(+), 691 deletions(-)

base-commit: 8514509ba5933f4e4ade0d5d81be117f18c1ebd2
-- 
git-series 0.9.1


* [PATCH v7 1/8] media: vsp1: Reword uses of 'fragment' as 'body'
  2018-03-08  0:05 [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
@ 2018-03-08  0:05 ` Kieran Bingham
  2018-04-06 21:38   ` Laurent Pinchart
  2018-03-08  0:05 ` [PATCH v7 2/8] media: vsp1: Protect bodies against overflow Kieran Bingham
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Kieran Bingham @ 2018-03-08  0:05 UTC (permalink / raw)
  To: linux-media, linux-renesas-soc, Laurent Pinchart
  Cc: Kieran Bingham, Kieran Bingham

Throughout the codebase, the term 'fragment' is used to represent a
display list body. This term duplicates the 'body' which is already in
use.

The datasheet references these objects as a body, therefore replace all
mentions of a fragment with a body, along with the corresponding
pluralised terms.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

---
v7
 - Clean up the formatting of the vsp1_dl_list_add_body()

 drivers/media/platform/vsp1/vsp1_clu.c |  10 +-
 drivers/media/platform/vsp1/vsp1_dl.c  | 109 ++++++++++++--------------
 drivers/media/platform/vsp1/vsp1_dl.h  |  13 +--
 drivers/media/platform/vsp1/vsp1_lut.c |   8 +-
 4 files changed, 69 insertions(+), 71 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
index f2fb26e5ab4e..9621afa3658c 100644
--- a/drivers/media/platform/vsp1/vsp1_clu.c
+++ b/drivers/media/platform/vsp1/vsp1_clu.c
@@ -47,19 +47,19 @@ static int clu_set_table(struct vsp1_clu *clu, struct v4l2_ctrl *ctrl)
 	struct vsp1_dl_body *dlb;
 	unsigned int i;
 
-	dlb = vsp1_dl_fragment_alloc(clu->entity.vsp1, 1 + 17 * 17 * 17);
+	dlb = vsp1_dl_body_alloc(clu->entity.vsp1, 1 + 17 * 17 * 17);
 	if (!dlb)
 		return -ENOMEM;
 
-	vsp1_dl_fragment_write(dlb, VI6_CLU_ADDR, 0);
+	vsp1_dl_body_write(dlb, VI6_CLU_ADDR, 0);
 	for (i = 0; i < 17 * 17 * 17; ++i)
-		vsp1_dl_fragment_write(dlb, VI6_CLU_DATA, ctrl->p_new.p_u32[i]);
+		vsp1_dl_body_write(dlb, VI6_CLU_DATA, ctrl->p_new.p_u32[i]);
 
 	spin_lock_irq(&clu->lock);
 	swap(clu->clu, dlb);
 	spin_unlock_irq(&clu->lock);
 
-	vsp1_dl_fragment_free(dlb);
+	vsp1_dl_body_free(dlb);
 	return 0;
 }
 
@@ -256,7 +256,7 @@ static void clu_configure(struct vsp1_entity *entity,
 		spin_unlock_irqrestore(&clu->lock, flags);
 
 		if (dlb)
-			vsp1_dl_list_add_fragment(dl, dlb);
+			vsp1_dl_list_add_body(dl, dlb);
 		break;
 	}
 }
diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index 0b86ed01e85d..caed441f5f0c 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -69,7 +69,7 @@ struct vsp1_dl_body {
  * @header: display list header, NULL for headerless lists
  * @dma: DMA address for the header
  * @body0: first display list body
- * @fragments: list of extra display list bodies
+ * @bodies: list of extra display list bodies
  * @has_chain: if true, indicates that there's a partition chain
  * @chain: entry in the display list partition chain
  */
@@ -81,7 +81,7 @@ struct vsp1_dl_list {
 	dma_addr_t dma;
 
 	struct vsp1_dl_body body0;
-	struct list_head fragments;
+	struct list_head bodies;
 
 	bool has_chain;
 	struct list_head chain;
@@ -98,13 +98,13 @@ enum vsp1_dl_mode {
  * @mode: display list operation mode (header or headerless)
  * @singleshot: execute the display list in single-shot mode
  * @vsp1: the VSP1 device
- * @lock: protects the free, active, queued, pending and gc_fragments lists
+ * @lock: protects the free, active, queued, pending and gc_bodies lists
  * @free: array of all free display lists
  * @active: list currently being processed (loaded) by hardware
  * @queued: list queued to the hardware (written to the DL registers)
  * @pending: list waiting to be queued to the hardware
- * @gc_work: fragments garbage collector work struct
- * @gc_fragments: array of display list fragments waiting to be freed
+ * @gc_work: bodies garbage collector work struct
+ * @gc_bodies: array of display list bodies waiting to be freed
  */
 struct vsp1_dl_manager {
 	unsigned int index;
@@ -119,7 +119,7 @@ struct vsp1_dl_manager {
 	struct vsp1_dl_list *pending;
 
 	struct work_struct gc_work;
-	struct list_head gc_fragments;
+	struct list_head gc_bodies;
 };
 
 /* -----------------------------------------------------------------------------
@@ -157,17 +157,16 @@ static void vsp1_dl_body_cleanup(struct vsp1_dl_body *dlb)
 }
 
 /**
- * vsp1_dl_fragment_alloc - Allocate a display list fragment
+ * vsp1_dl_body_alloc - Allocate a display list body
  * @vsp1: The VSP1 device
- * @num_entries: The maximum number of entries that the fragment can contain
+ * @num_entries: The maximum number of entries that the body can contain
  *
- * Allocate a display list fragment with enough memory to contain the requested
+ * Allocate a display list body with enough memory to contain the requested
  * number of entries.
  *
- * Return a pointer to a fragment on success or NULL if memory can't be
- * allocated.
+ * Return a pointer to a body on success or NULL if memory can't be allocated.
  */
-struct vsp1_dl_body *vsp1_dl_fragment_alloc(struct vsp1_device *vsp1,
+struct vsp1_dl_body *vsp1_dl_body_alloc(struct vsp1_device *vsp1,
 					    unsigned int num_entries)
 {
 	struct vsp1_dl_body *dlb;
@@ -187,20 +186,20 @@ struct vsp1_dl_body *vsp1_dl_fragment_alloc(struct vsp1_device *vsp1,
 }
 
 /**
- * vsp1_dl_fragment_free - Free a display list fragment
- * @dlb: The fragment
+ * vsp1_dl_body_free - Free a display list body
+ * @dlb: The body
  *
- * Free the given display list fragment and the associated DMA memory.
+ * Free the given display list body and the associated DMA memory.
  *
- * Fragments must only be freed explicitly if they are not added to a display
+ * Bodies must only be freed explicitly if they are not added to a display
  * list, as the display list will take ownership of them and free them
- * otherwise. Manual free typically happens at cleanup time for fragments that
+ * otherwise. Manual free typically happens at cleanup time for bodies that
  * have been allocated but not used.
  *
  * Passing a NULL pointer to this function is safe, in that case no operation
  * will be performed.
  */
-void vsp1_dl_fragment_free(struct vsp1_dl_body *dlb)
+void vsp1_dl_body_free(struct vsp1_dl_body *dlb)
 {
 	if (!dlb)
 		return;
@@ -210,16 +209,16 @@ void vsp1_dl_fragment_free(struct vsp1_dl_body *dlb)
 }
 
 /**
- * vsp1_dl_fragment_write - Write a register to a display list fragment
- * @dlb: The fragment
+ * vsp1_dl_body_write - Write a register to a display list body
+ * @dlb: The body
  * @reg: The register address
  * @data: The register value
  *
- * Write the given register and value to the display list fragment. The maximum
- * number of entries that can be written in a fragment is specified when the
- * fragment is allocated by vsp1_dl_fragment_alloc().
+ * Write the given register and value to the display list body. The maximum
+ * number of entries that can be written in a body is specified when the body is
+ * allocated by vsp1_dl_body_alloc().
  */
-void vsp1_dl_fragment_write(struct vsp1_dl_body *dlb, u32 reg, u32 data)
+void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
 	dlb->entries[dlb->num_entries].addr = reg;
 	dlb->entries[dlb->num_entries].data = data;
@@ -240,7 +239,7 @@ static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm)
 	if (!dl)
 		return NULL;
 
-	INIT_LIST_HEAD(&dl->fragments);
+	INIT_LIST_HEAD(&dl->bodies);
 	dl->dlm = dlm;
 
 	/*
@@ -277,7 +276,7 @@ static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm)
 static void vsp1_dl_list_free(struct vsp1_dl_list *dl)
 {
 	vsp1_dl_body_cleanup(&dl->body0);
-	list_splice_init(&dl->fragments, &dl->dlm->gc_fragments);
+	list_splice_init(&dl->bodies, &dl->dlm->gc_bodies);
 	kfree(dl);
 }
 
@@ -332,13 +331,13 @@ static void __vsp1_dl_list_put(struct vsp1_dl_list *dl)
 	dl->has_chain = false;
 
 	/*
-	 * We can't free fragments here as DMA memory can only be freed in
-	 * interruptible context. Move all fragments to the display list
-	 * manager's list of fragments to be freed, they will be
-	 * garbage-collected by the work queue.
+	 * We can't free bodies here as DMA memory can only be freed in
+	 * interruptible context. Move all bodies to the display list manager's
+	 * list of bodies to be freed, they will be garbage-collected by the
+	 * work queue.
 	 */
-	if (!list_empty(&dl->fragments)) {
-		list_splice_init(&dl->fragments, &dl->dlm->gc_fragments);
+	if (!list_empty(&dl->bodies)) {
+		list_splice_init(&dl->bodies, &dl->dlm->gc_bodies);
 		schedule_work(&dl->dlm->gc_work);
 	}
 
@@ -379,33 +378,33 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl)
  */
 void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
 {
-	vsp1_dl_fragment_write(&dl->body0, reg, data);
+	vsp1_dl_body_write(&dl->body0, reg, data);
 }
 
 /**
- * vsp1_dl_list_add_fragment - Add a fragment to the display list
+ * vsp1_dl_list_add_body - Add a body to the display list
  * @dl: The display list
- * @dlb: The fragment
+ * @dlb: The body
  *
- * Add a display list body as a fragment to a display list. Registers contained
- * in fragments are processed after registers contained in the main display
- * list, in the order in which fragments are added.
+ * Add a display list body to a display list. Registers contained
+ * in bodies are processed after registers contained in the main display list,
+ * in the order in which bodies are added.
  *
- * Adding a fragment to a display list passes ownership of the fragment to the
- * list. The caller must not touch the fragment after this call, and must not
- * free it explicitly with vsp1_dl_fragment_free().
+ * Adding a body to a display list passes ownership of the body to the list. The
+ * caller must not touch the body after this call, and must not free it
+ * explicitly with vsp1_dl_body_free().
  *
- * Fragments are only usable for display lists in header mode. Attempt to
- * add a fragment to a header-less display list will return an error.
+ * Additional bodies are only usable for display lists in header mode.
+ * Attempting to add a body to a header-less display list will return an error.
  */
-int vsp1_dl_list_add_fragment(struct vsp1_dl_list *dl,
-			      struct vsp1_dl_body *dlb)
+int vsp1_dl_list_add_body(struct vsp1_dl_list *dl, struct vsp1_dl_body *dlb)
 {
 	/* Multi-body lists are only available in header mode. */
 	if (dl->dlm->mode != VSP1_DL_MODE_HEADER)
 		return -EINVAL;
 
-	list_add_tail(&dlb->list, &dl->fragments);
+	list_add_tail(&dlb->list, &dl->bodies);
+
 	return 0;
 }
 
@@ -454,7 +453,7 @@ static void vsp1_dl_list_fill_header(struct vsp1_dl_list *dl, bool is_last)
 	hdr->num_bytes = dl->body0.num_entries
 		       * sizeof(*dl->header->lists);
 
-	list_for_each_entry(dlb, &dl->fragments, list) {
+	list_for_each_entry(dlb, &dl->bodies, list) {
 		num_lists++;
 		hdr++;
 
@@ -711,25 +710,25 @@ void vsp1_dlm_reset(struct vsp1_dl_manager *dlm)
 }
 
 /*
- * Free all fragments awaiting to be garbage-collected.
+ * Free all bodies awaiting to be garbage-collected.
  *
  * This function must be called without the display list manager lock held.
  */
-static void vsp1_dlm_fragments_free(struct vsp1_dl_manager *dlm)
+static void vsp1_dlm_bodies_free(struct vsp1_dl_manager *dlm)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&dlm->lock, flags);
 
-	while (!list_empty(&dlm->gc_fragments)) {
+	while (!list_empty(&dlm->gc_bodies)) {
 		struct vsp1_dl_body *dlb;
 
-		dlb = list_first_entry(&dlm->gc_fragments, struct vsp1_dl_body,
+		dlb = list_first_entry(&dlm->gc_bodies, struct vsp1_dl_body,
 				       list);
 		list_del(&dlb->list);
 
 		spin_unlock_irqrestore(&dlm->lock, flags);
-		vsp1_dl_fragment_free(dlb);
+		vsp1_dl_body_free(dlb);
 		spin_lock_irqsave(&dlm->lock, flags);
 	}
 
@@ -741,7 +740,7 @@ static void vsp1_dlm_garbage_collect(struct work_struct *work)
 	struct vsp1_dl_manager *dlm =
 		container_of(work, struct vsp1_dl_manager, gc_work);
 
-	vsp1_dlm_fragments_free(dlm);
+	vsp1_dlm_bodies_free(dlm);
 }
 
 struct vsp1_dl_manager *vsp1_dlm_create(struct vsp1_device *vsp1,
@@ -763,7 +762,7 @@ struct vsp1_dl_manager *vsp1_dlm_create(struct vsp1_device *vsp1,
 
 	spin_lock_init(&dlm->lock);
 	INIT_LIST_HEAD(&dlm->free);
-	INIT_LIST_HEAD(&dlm->gc_fragments);
+	INIT_LIST_HEAD(&dlm->gc_bodies);
 	INIT_WORK(&dlm->gc_work, vsp1_dlm_garbage_collect);
 
 	for (i = 0; i < prealloc; ++i) {
@@ -793,5 +792,5 @@ void vsp1_dlm_destroy(struct vsp1_dl_manager *dlm)
 		vsp1_dl_list_free(dl);
 	}
 
-	vsp1_dlm_fragments_free(dlm);
+	vsp1_dlm_bodies_free(dlm);
 }
diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
index ee3508172f0a..cf57f986b69a 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.h
+++ b/drivers/media/platform/vsp1/vsp1_dl.h
@@ -16,7 +16,7 @@
 #include <linux/types.h>
 
 struct vsp1_device;
-struct vsp1_dl_fragment;
+struct vsp1_dl_body;
 struct vsp1_dl_list;
 struct vsp1_dl_manager;
 
@@ -34,12 +34,11 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl);
 void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data);
 void vsp1_dl_list_commit(struct vsp1_dl_list *dl);
 
-struct vsp1_dl_body *vsp1_dl_fragment_alloc(struct vsp1_device *vsp1,
-					    unsigned int num_entries);
-void vsp1_dl_fragment_free(struct vsp1_dl_body *dlb);
-void vsp1_dl_fragment_write(struct vsp1_dl_body *dlb, u32 reg, u32 data);
-int vsp1_dl_list_add_fragment(struct vsp1_dl_list *dl,
-			      struct vsp1_dl_body *dlb);
+struct vsp1_dl_body *vsp1_dl_body_alloc(struct vsp1_device *vsp1,
+					unsigned int num_entries);
+void vsp1_dl_body_free(struct vsp1_dl_body *dlb);
+void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32 reg, u32 data);
+int vsp1_dl_list_add_body(struct vsp1_dl_list *dl, struct vsp1_dl_body *dlb);
 int vsp1_dl_list_add_chain(struct vsp1_dl_list *head, struct vsp1_dl_list *dl);
 
 #endif /* __VSP1_DL_H__ */
diff --git a/drivers/media/platform/vsp1/vsp1_lut.c b/drivers/media/platform/vsp1/vsp1_lut.c
index c67cc60db0db..aa2b40327529 100644
--- a/drivers/media/platform/vsp1/vsp1_lut.c
+++ b/drivers/media/platform/vsp1/vsp1_lut.c
@@ -44,19 +44,19 @@ static int lut_set_table(struct vsp1_lut *lut, struct v4l2_ctrl *ctrl)
 	struct vsp1_dl_body *dlb;
 	unsigned int i;
 
-	dlb = vsp1_dl_fragment_alloc(lut->entity.vsp1, 256);
+	dlb = vsp1_dl_body_alloc(lut->entity.vsp1, 256);
 	if (!dlb)
 		return -ENOMEM;
 
 	for (i = 0; i < 256; ++i)
-		vsp1_dl_fragment_write(dlb, VI6_LUT_TABLE + 4 * i,
+		vsp1_dl_body_write(dlb, VI6_LUT_TABLE + 4 * i,
 				       ctrl->p_new.p_u32[i]);
 
 	spin_lock_irq(&lut->lock);
 	swap(lut->lut, dlb);
 	spin_unlock_irq(&lut->lock);
 
-	vsp1_dl_fragment_free(dlb);
+	vsp1_dl_body_free(dlb);
 	return 0;
 }
 
@@ -212,7 +212,7 @@ static void lut_configure(struct vsp1_entity *entity,
 		spin_unlock_irqrestore(&lut->lock, flags);
 
 		if (dlb)
-			vsp1_dl_list_add_fragment(dl, dlb);
+			vsp1_dl_list_add_body(dl, dlb);
 		break;
 	}
 }
-- 
git-series 0.9.1


* [PATCH v7 2/8] media: vsp1: Protect bodies against overflow
  2018-03-08  0:05 [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
  2018-03-08  0:05 ` [PATCH v7 1/8] media: vsp1: Reword uses of 'fragment' as 'body' Kieran Bingham
@ 2018-03-08  0:05 ` Kieran Bingham
  2018-03-08  0:05 ` [PATCH v7 3/8] media: vsp1: Provide a body pool Kieran Bingham
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Kieran Bingham @ 2018-03-08  0:05 UTC (permalink / raw)
  To: linux-media, linux-renesas-soc, Laurent Pinchart
  Cc: Kieran Bingham, Kieran Bingham

The body write function relies on the code never asking it to write more
than the entries available in the list.

Currently, with each list body containing 256 entries, this is fine, but
we can greatly reduce this number to save memory. In preparation for this,
add a level of protection to catch any buffer overflows.
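
The guard can be modelled in userspace as follows. This is only a sketch of
the technique: body_init() is hypothetical, a one-time fprintf() stands in for
the kernel's WARN_ONCE(), and the write returns a bool here purely so the
behaviour is easy to observe, whereas the driver function returns void.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ENTRIES 4

struct entry {
	uint32_t addr;
	uint32_t data;
};

struct body {
	struct entry entries[MAX_ENTRIES];
	unsigned int num_entries;	/* entries stored so far */
	unsigned int max_entries;	/* capacity of the entries array */
};

static void body_init(struct body *b)
{
	b->num_entries = 0;
	b->max_entries = MAX_ENTRIES;
}

/* Store one register write; drop it and warn once if the body is full. */
static bool body_write(struct body *b, uint32_t reg, uint32_t data)
{
	static bool warned;

	assert(b->max_entries <= MAX_ENTRIES);

	if (b->num_entries >= b->max_entries) {
		if (!warned) {
			fprintf(stderr, "body size exceeded (max %u)\n",
				b->max_entries);
			warned = true;
		}
		return false;
	}

	b->entries[b->num_entries].addr = reg;
	b->entries[b->num_entries].data = data;
	b->num_entries++;

	return true;
}
```

The excess write is discarded rather than written past the end of the buffer,
so an undersized body produces a visible warning instead of memory corruption.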

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

---

v3:
 - adapt for new 'body' terminology
 - simplify WARN_ON macro usage

 drivers/media/platform/vsp1/vsp1_dl.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index caed441f5f0c..67cc16c1b8e3 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -50,6 +50,7 @@ struct vsp1_dl_entry {
  * @dma: DMA address of the entries
  * @size: size of the DMA memory in bytes
  * @num_entries: number of stored entries
+ * @max_entries: number of entries available
  */
 struct vsp1_dl_body {
 	struct list_head list;
@@ -60,6 +61,7 @@ struct vsp1_dl_body {
 	size_t size;
 
 	unsigned int num_entries;
+	unsigned int max_entries;
 };
 
 /**
@@ -139,6 +141,7 @@ static int vsp1_dl_body_init(struct vsp1_device *vsp1,
 
 	dlb->vsp1 = vsp1;
 	dlb->size = size;
+	dlb->max_entries = num_entries;
 
 	dlb->entries = dma_alloc_wc(vsp1->bus_master, dlb->size, &dlb->dma,
 				    GFP_KERNEL);
@@ -220,6 +223,10 @@ void vsp1_dl_body_free(struct vsp1_dl_body *dlb)
  */
 void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
+	if (WARN_ONCE(dlb->num_entries >= dlb->max_entries,
+		      "DLB size exceeded (max %u)", dlb->max_entries))
+		return;
+
 	dlb->entries[dlb->num_entries].addr = reg;
 	dlb->entries[dlb->num_entries].data = data;
 	dlb->num_entries++;
-- 
git-series 0.9.1


* [PATCH v7 3/8] media: vsp1: Provide a body pool
  2018-03-08  0:05 [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
  2018-03-08  0:05 ` [PATCH v7 1/8] media: vsp1: Reword uses of 'fragment' as 'body' Kieran Bingham
  2018-03-08  0:05 ` [PATCH v7 2/8] media: vsp1: Protect bodies against overflow Kieran Bingham
@ 2018-03-08  0:05 ` Kieran Bingham
  2018-04-06 22:33   ` Laurent Pinchart
  2018-03-08  0:05 ` [PATCH v7 4/8] media: vsp1: Convert display lists to use new " Kieran Bingham
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Kieran Bingham @ 2018-03-08  0:05 UTC (permalink / raw)
  To: linux-media, linux-renesas-soc, Laurent Pinchart
  Cc: Kieran Bingham, Kieran Bingham

Each display list allocates a body to store register values in a
DMA-accessible buffer obtained from dma_alloc_wc(). Each such allocation
consumes an entry in the TLB, and a large number of display list
allocations adds pressure to this resource.

Reduce TLB pressure on the IPMMUs by allocating multiple display list
bodies in a single allocation, and providing these to the display list
through a 'body pool'. A pool can be allocated by the display list
manager or entities which require their own body allocations.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

---
v4:
 - Provide comment explaining extra allocation on body pool
   highlighting area for optimisation later.

v3:
 - s/fragment/body/, s/fragments/bodies/
 - qty -> num_bodies
 - indentation fix
 - s/vsp1_dl_body_pool_{alloc,free}/vsp1_dl_body_pool_{create,destroy}/
 - Add kerneldoc to non-static functions

v2:
 - assign dlb->dma correctly
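
For readers following along, the pooling scheme described above can be
modelled in userspace roughly as follows. This is only a sketch and not
the driver code: malloc() stands in for dma_alloc_wc(), a singly linked
free list stands in for the list_head and spinlock, and every name here
(struct pool, pool_create(), body_get(), ...) is made up for
illustration. It also hands bodies out in LIFO order, whereas the patch
appends to the tail of the free list.

```c
#include <stdint.h>
#include <stdlib.h>

struct entry { uint32_t addr; uint32_t data; };

struct pool;

struct body {
	struct body *next_free;		/* link in the pool free list */
	struct pool *pool;		/* pool this body belongs to */
	struct entry *entries;		/* slice of the pool allocation */
	unsigned int num_entries;
	unsigned int max_entries;
};

struct pool {
	void *mem;			/* the single backing allocation */
	size_t size;
	struct body *bodies;
	struct body *free;
};

static struct pool *pool_create(unsigned int num_bodies,
				unsigned int num_entries)
{
	size_t body_size = num_entries * sizeof(struct entry);
	struct pool *pool = calloc(1, sizeof(*pool));
	unsigned int i;

	if (!pool)
		return NULL;

	pool->size = body_size * num_bodies;
	pool->bodies = calloc(num_bodies, sizeof(*pool->bodies));
	pool->mem = malloc(pool->size);
	if (!pool->bodies || !pool->mem) {
		free(pool->bodies);
		free(pool->mem);
		free(pool);
		return NULL;
	}

	/* Carve the allocation into equal slices, one per body. */
	for (i = 0; i < num_bodies; ++i) {
		struct body *b = &pool->bodies[i];

		b->pool = pool;
		b->max_entries = num_entries;
		b->entries = (struct entry *)((char *)pool->mem + i * body_size);
		b->next_free = pool->free;
		pool->free = b;
	}

	return pool;
}

/* Take a body from the pool, or NULL if none is available. */
static struct body *body_get(struct pool *pool)
{
	struct body *b = pool->free;

	if (b)
		pool->free = b->next_free;
	return b;
}

/* Reset a body and hand it back to the pool it came from. */
static void body_put(struct body *b)
{
	if (!b)
		return;
	b->num_entries = 0;
	b->next_free = b->pool->free;
	b->pool->free = b;
}

static void pool_destroy(struct pool *pool)
{
	if (!pool)
		return;
	free(pool->mem);
	free(pool->bodies);
	free(pool);
}
```

The point of the exercise is that get and put never allocate: however
many bodies are handed out, there is only one backing allocation, which
is what keeps the number of TLB entries down in the real driver.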

 drivers/media/platform/vsp1/vsp1_dl.c | 163 +++++++++++++++++++++++++++-
 drivers/media/platform/vsp1/vsp1_dl.h |   8 +-
 2 files changed, 171 insertions(+)

diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index 67cc16c1b8e3..0208e72cb356 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -45,6 +45,8 @@ struct vsp1_dl_entry {
 /**
  * struct vsp1_dl_body - Display list body
  * @list: entry in the display list list of bodies
+ * @free: entry in the pool free body list
+ * @pool: pool to which this body belongs
  * @vsp1: the VSP1 device
  * @entries: array of entries
  * @dma: DMA address of the entries
@@ -54,6 +56,9 @@ struct vsp1_dl_entry {
  */
 struct vsp1_dl_body {
 	struct list_head list;
+	struct list_head free;
+
+	struct vsp1_dl_body_pool *pool;
 	struct vsp1_device *vsp1;
 
 	struct vsp1_dl_entry *entries;
@@ -65,6 +70,30 @@ struct vsp1_dl_body {
 };
 
 /**
+ * struct vsp1_dl_body_pool - display list body pool
+ * @dma: DMA address of the entries
+ * @size: size of the full DMA memory pool in bytes
+ * @mem: CPU memory pointer for the pool
+ * @bodies: Array of DLB structures for the pool
+ * @free: List of free DLB entries
+ * @lock: Protects the pool and free list
+ * @vsp1: the VSP1 device
+ */
+struct vsp1_dl_body_pool {
+	/* DMA allocation */
+	dma_addr_t dma;
+	size_t size;
+	void *mem;
+
+	/* Body management */
+	struct vsp1_dl_body *bodies;
+	struct list_head free;
+	spinlock_t lock;
+
+	struct vsp1_device *vsp1;
+};
+
+/**
  * struct vsp1_dl_list - Display list
  * @list: entry in the display list manager lists
  * @dlm: the display list manager
@@ -105,6 +134,7 @@ enum vsp1_dl_mode {
  * @active: list currently being processed (loaded) by hardware
  * @queued: list queued to the hardware (written to the DL registers)
  * @pending: list waiting to be queued to the hardware
+ * @pool: body pool for the display list bodies
  * @gc_work: bodies garbage collector work struct
  * @gc_bodies: array of display list bodies waiting to be freed
  */
@@ -120,6 +150,8 @@ struct vsp1_dl_manager {
 	struct vsp1_dl_list *queued;
 	struct vsp1_dl_list *pending;
 
+	struct vsp1_dl_body_pool *pool;
+
 	struct work_struct gc_work;
 	struct list_head gc_bodies;
 };
@@ -128,6 +160,137 @@ struct vsp1_dl_manager {
  * Display List Body Management
  */
 
+/**
+ * vsp1_dl_body_pool_create - Create a pool of bodies from a single allocation
+ * @vsp1: The VSP1 device
+ * @num_bodies: The quantity of bodies to allocate
+ * @num_entries: The maximum number of entries that the body can contain
+ * @extra_size: Extra allocation provided for the bodies
+ *
+ * Allocate a pool of display list bodies each with enough memory to contain the
+ * requested number of entries.
+ *
+ * Return a pointer to a pool on success or NULL if memory can't be allocated.
+ */
+struct vsp1_dl_body_pool *
+vsp1_dl_body_pool_create(struct vsp1_device *vsp1, unsigned int num_bodies,
+			 unsigned int num_entries, size_t extra_size)
+{
+	struct vsp1_dl_body_pool *pool;
+	size_t dlb_size;
+	unsigned int i;
+
+	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+	if (!pool)
+		return NULL;
+
+	pool->vsp1 = vsp1;
+
+	/*
+	 * Todo: 'extra_size' is only used by vsp1_dlm_create(), to allocate
+	 * extra memory for the display list header. We need only one header per
+	 * display list, not per display list body, thus this allocation is
+	 * extraneous and should be reworked in the future.
+	 */
+	dlb_size = num_entries * sizeof(struct vsp1_dl_entry) + extra_size;
+	pool->size = dlb_size * num_bodies;
+
+	pool->bodies = kcalloc(num_bodies, sizeof(*pool->bodies), GFP_KERNEL);
+	if (!pool->bodies) {
+		kfree(pool);
+		return NULL;
+	}
+
+	pool->mem = dma_alloc_wc(vsp1->bus_master, pool->size, &pool->dma,
+				 GFP_KERNEL);
+	if (!pool->mem) {
+		kfree(pool->bodies);
+		kfree(pool);
+		return NULL;
+	}
+
+	spin_lock_init(&pool->lock);
+	INIT_LIST_HEAD(&pool->free);
+
+	for (i = 0; i < num_bodies; ++i) {
+		struct vsp1_dl_body *dlb = &pool->bodies[i];
+
+		dlb->pool = pool;
+		dlb->max_entries = num_entries;
+
+		dlb->dma = pool->dma + i * dlb_size;
+		dlb->entries = pool->mem + i * dlb_size;
+
+		list_add_tail(&dlb->free, &pool->free);
+	}
+
+	return pool;
+}
+
+/**
+ * vsp1_dl_body_pool_destroy - Release a body pool
+ * @pool: The body pool
+ *
+ * Release all components of a pool allocation.
+ */
+void vsp1_dl_body_pool_destroy(struct vsp1_dl_body_pool *pool)
+{
+	if (!pool)
+		return;
+
+	if (pool->mem)
+		dma_free_wc(pool->vsp1->bus_master, pool->size, pool->mem,
+			    pool->dma);
+
+	kfree(pool->bodies);
+	kfree(pool);
+}
+
+/**
+ * vsp1_dl_body_get - Obtain a body from a pool
+ * @pool: The body pool
+ *
+ * Obtain a body from the pool allocation without blocking.
+ *
+ * Returns a display list body or NULL if there are none available.
+ */
+struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool)
+{
+	struct vsp1_dl_body *dlb = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pool->lock, flags);
+
+	if (!list_empty(&pool->free)) {
+		dlb = list_first_entry(&pool->free, struct vsp1_dl_body, free);
+		list_del(&dlb->free);
+	}
+
+	spin_unlock_irqrestore(&pool->lock, flags);
+
+	return dlb;
+}
+
+/**
+ * vsp1_dl_body_put - Return a body back to its pool
+ * @dlb: The display list body
+ *
+ * Return a body back to the pool, and reset the num_entries to clear the list.
+ */
+void vsp1_dl_body_put(struct vsp1_dl_body *dlb)
+{
+	unsigned long flags;
+
+	if (!dlb)
+		return;
+
+	dlb->num_entries = 0;
+
+	spin_lock_irqsave(&dlb->pool->lock, flags);
+	list_add_tail(&dlb->free, &dlb->pool->free);
+	spin_unlock_irqrestore(&dlb->pool->lock, flags);
+}
+
 /*
  * Initialize a display list body object and allocate DMA memory for the body
  * data. The display list body object is expected to have been initialized to
diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
index cf57f986b69a..031032e304d2 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.h
+++ b/drivers/media/platform/vsp1/vsp1_dl.h
@@ -17,6 +17,7 @@
 
 struct vsp1_device;
 struct vsp1_dl_body;
+struct vsp1_dl_body_pool;
 struct vsp1_dl_list;
 struct vsp1_dl_manager;
 
@@ -34,6 +35,13 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl);
 void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data);
 void vsp1_dl_list_commit(struct vsp1_dl_list *dl);
 
+struct vsp1_dl_body_pool *
+vsp1_dl_body_pool_create(struct vsp1_device *vsp1, unsigned int num_bodies,
+			 unsigned int num_entries, size_t extra_size);
+void vsp1_dl_body_pool_destroy(struct vsp1_dl_body_pool *pool);
+struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool);
+void vsp1_dl_body_put(struct vsp1_dl_body *dlb);
+
 struct vsp1_dl_body *vsp1_dl_body_alloc(struct vsp1_device *vsp1,
 					unsigned int num_entries);
 void vsp1_dl_body_free(struct vsp1_dl_body *dlb);
-- 
git-series 0.9.1


* [PATCH v7 4/8] media: vsp1: Convert display lists to use new body pool
  2018-03-08  0:05 [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
                   ` (2 preceding siblings ...)
  2018-03-08  0:05 ` [PATCH v7 3/8] media: vsp1: Provide a body pool Kieran Bingham
@ 2018-03-08  0:05 ` Kieran Bingham
  2018-04-06 22:55   ` Laurent Pinchart
  2018-03-08  0:05 ` [PATCH v7 5/8] media: vsp1: Use reference counting for bodies Kieran Bingham
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Kieran Bingham @ 2018-03-08  0:05 UTC (permalink / raw)
  To: linux-media, linux-renesas-soc, Laurent Pinchart
  Cc: Kieran Bingham, Kieran Bingham

Adapt the dl->body0 object to use an object from the body pool. This
greatly reduces the pressure on the TLB for IPMMU use cases, as all of
the lists use a single allocation for the main body.

The CLU and LUT objects pre-allocate a pool containing three bodies,
allowing a userspace update before the hardware has committed a previous
set of tables.

Bodies are no longer 'freed' in interrupt context, but instead released
back to their respective pools. This allows us to remove the garbage
collector in the DLM.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

---
v3:
 - s/fragment/body/, s/fragments/bodies/
 - CLU/LUT now allocate 3 bodies
 - vsp1_dl_list_fragments_free -> vsp1_dl_list_bodies_put

v2:
 - Use dl->body0->max_entries to determine header offset, instead of the
   global constant VSP1_DL_NUM_ENTRIES which is incorrect.
 - squash updates for LUT, CLU, and fragment cleanup into single patch.
   (Not fully bisectable when separated)
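
As a rough aid for checking the sizes involved, the slice and header
layout used by the pool can be written out as plain arithmetic. This is
a sketch under assumptions: an entry is taken to be two 32-bit words
(address and data), and the helper names below are hypothetical rather
than anything in the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout: one register address and one value. */
struct entry { uint32_t addr; uint32_t data; };

/* Bytes reserved for one body slice: the entries plus any trailing extra. */
static size_t slice_size(unsigned int max_entries, size_t extra_size)
{
	return max_entries * sizeof(struct entry) + extra_size;
}

/* Offset of body i from the start of the pool allocation. */
static size_t body_offset(unsigned int i, unsigned int max_entries,
			  size_t extra_size)
{
	return i * slice_size(max_entries, extra_size);
}

/* The header, when present, sits immediately after the body's entries. */
static size_t header_offset(unsigned int i, unsigned int max_entries,
			    size_t extra_size)
{
	return body_offset(i, max_entries, extra_size)
	     + max_entries * sizeof(struct entry);
}
```

With 8-byte entries, the CLU pool in this patch (3 bodies of
CLU_SIZE + 1 entries, no extra size) works out to 4914 * 8 = 39312 bytes
per body, or 117936 bytes in a single allocation.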

 drivers/media/platform/vsp1/vsp1_clu.c |  27 ++-
 drivers/media/platform/vsp1/vsp1_clu.h |   1 +-
 drivers/media/platform/vsp1/vsp1_dl.c  | 223 ++++++--------------------
 drivers/media/platform/vsp1/vsp1_dl.h  |   3 +-
 drivers/media/platform/vsp1/vsp1_lut.c |  27 ++-
 drivers/media/platform/vsp1/vsp1_lut.h |   1 +-
 6 files changed, 101 insertions(+), 181 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
index 9621afa3658c..2018144470c5 100644
--- a/drivers/media/platform/vsp1/vsp1_clu.c
+++ b/drivers/media/platform/vsp1/vsp1_clu.c
@@ -23,6 +23,8 @@
 #define CLU_MIN_SIZE				4U
 #define CLU_MAX_SIZE				8190U
 
+#define CLU_SIZE				(17 * 17 * 17)
+
 /* -----------------------------------------------------------------------------
  * Device Access
  */
@@ -47,19 +49,19 @@ static int clu_set_table(struct vsp1_clu *clu, struct v4l2_ctrl *ctrl)
 	struct vsp1_dl_body *dlb;
 	unsigned int i;
 
-	dlb = vsp1_dl_body_alloc(clu->entity.vsp1, 1 + 17 * 17 * 17);
+	dlb = vsp1_dl_body_get(clu->pool);
 	if (!dlb)
 		return -ENOMEM;
 
 	vsp1_dl_body_write(dlb, VI6_CLU_ADDR, 0);
-	for (i = 0; i < 17 * 17 * 17; ++i)
+	for (i = 0; i < CLU_SIZE; ++i)
 		vsp1_dl_body_write(dlb, VI6_CLU_DATA, ctrl->p_new.p_u32[i]);
 
 	spin_lock_irq(&clu->lock);
 	swap(clu->clu, dlb);
 	spin_unlock_irq(&clu->lock);
 
-	vsp1_dl_body_free(dlb);
+	vsp1_dl_body_put(dlb);
 	return 0;
 }
 
@@ -261,8 +263,16 @@ static void clu_configure(struct vsp1_entity *entity,
 	}
 }
 
+static void clu_destroy(struct vsp1_entity *entity)
+{
+	struct vsp1_clu *clu = to_clu(&entity->subdev);
+
+	vsp1_dl_body_pool_destroy(clu->pool);
+}
+
 static const struct vsp1_entity_operations clu_entity_ops = {
 	.configure = clu_configure,
+	.destroy = clu_destroy,
 };
 
 /* -----------------------------------------------------------------------------
@@ -288,6 +298,17 @@ struct vsp1_clu *vsp1_clu_create(struct vsp1_device *vsp1)
 	if (ret < 0)
 		return ERR_PTR(ret);
 
+	/*
+	 * Pre-allocate a body pool, with 3 bodies allowing a userspace update
+	 * before the hardware has committed a previous set of tables, handling
+	 * both the queued and pending dl entries. One extra entry is added to
+	 * the CLU_SIZE to allow for the VI6_CLU_ADDR header.
+	 */
+	clu->pool = vsp1_dl_body_pool_create(clu->entity.vsp1, 3, CLU_SIZE + 1,
+					     0);
+	if (!clu->pool)
+		return ERR_PTR(-ENOMEM);
+
 	/* Initialize the control handler. */
 	v4l2_ctrl_handler_init(&clu->ctrls, 2);
 	v4l2_ctrl_new_custom(&clu->ctrls, &clu_table_control, NULL);
diff --git a/drivers/media/platform/vsp1/vsp1_clu.h b/drivers/media/platform/vsp1/vsp1_clu.h
index 036e0a2f1a42..fa3fe856725b 100644
--- a/drivers/media/platform/vsp1/vsp1_clu.h
+++ b/drivers/media/platform/vsp1/vsp1_clu.h
@@ -36,6 +36,7 @@ struct vsp1_clu {
 	spinlock_t lock;
 	unsigned int mode;
 	struct vsp1_dl_body *clu;
+	struct vsp1_dl_body_pool *pool;
 };
 
 static inline struct vsp1_clu *to_clu(struct v4l2_subdev *subdev)
diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index 0208e72cb356..74476726451c 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -111,7 +111,7 @@ struct vsp1_dl_list {
 	struct vsp1_dl_header *header;
 	dma_addr_t dma;
 
-	struct vsp1_dl_body body0;
+	struct vsp1_dl_body *body0;
 	struct list_head bodies;
 
 	bool has_chain;
@@ -135,8 +135,6 @@ enum vsp1_dl_mode {
  * @queued: list queued to the hardware (written to the DL registers)
  * @pending: list waiting to be queued to the hardware
  * @pool: body pool for the display list bodies
- * @gc_work: bodies garbage collector work struct
- * @gc_bodies: array of display list bodies waiting to be freed
  */
 struct vsp1_dl_manager {
 	unsigned int index;
@@ -151,9 +149,6 @@ struct vsp1_dl_manager {
 	struct vsp1_dl_list *pending;
 
 	struct vsp1_dl_body_pool *pool;
-
-	struct work_struct gc_work;
-	struct list_head gc_bodies;
 };
 
 /* -----------------------------------------------------------------------------
@@ -291,89 +286,6 @@ void vsp1_dl_body_put(struct vsp1_dl_body *dlb)
 	spin_unlock_irqrestore(&dlb->pool->lock, flags);
 }
 
-/*
- * Initialize a display list body object and allocate DMA memory for the body
- * data. The display list body object is expected to have been initialized to
- * 0 when allocated.
- */
-static int vsp1_dl_body_init(struct vsp1_device *vsp1,
-			     struct vsp1_dl_body *dlb, unsigned int num_entries,
-			     size_t extra_size)
-{
-	size_t size = num_entries * sizeof(*dlb->entries) + extra_size;
-
-	dlb->vsp1 = vsp1;
-	dlb->size = size;
-	dlb->max_entries = num_entries;
-
-	dlb->entries = dma_alloc_wc(vsp1->bus_master, dlb->size, &dlb->dma,
-				    GFP_KERNEL);
-	if (!dlb->entries)
-		return -ENOMEM;
-
-	return 0;
-}
-
-/*
- * Cleanup a display list body and free allocated DMA memory allocated.
- */
-static void vsp1_dl_body_cleanup(struct vsp1_dl_body *dlb)
-{
-	dma_free_wc(dlb->vsp1->bus_master, dlb->size, dlb->entries, dlb->dma);
-}
-
-/**
- * vsp1_dl_body_alloc - Allocate a display list body
- * @vsp1: The VSP1 device
- * @num_entries: The maximum number of entries that the body can contain
- *
- * Allocate a display list body with enough memory to contain the requested
- * number of entries.
- *
- * Return a pointer to a body on success or NULL if memory can't be allocated.
- */
-struct vsp1_dl_body *vsp1_dl_body_alloc(struct vsp1_device *vsp1,
-					    unsigned int num_entries)
-{
-	struct vsp1_dl_body *dlb;
-	int ret;
-
-	dlb = kzalloc(sizeof(*dlb), GFP_KERNEL);
-	if (!dlb)
-		return NULL;
-
-	ret = vsp1_dl_body_init(vsp1, dlb, num_entries, 0);
-	if (ret < 0) {
-		kfree(dlb);
-		return NULL;
-	}
-
-	return dlb;
-}
-
-/**
- * vsp1_dl_body_free - Free a display list body
- * @dlb: The body
- *
- * Free the given display list body and the associated DMA memory.
- *
- * Bodies must only be freed explicitly if they are not added to a display
- * list, as the display list will take ownership of them and free them
- * otherwise. Manual free typically happens at cleanup time for bodies that
- * have been allocated but not used.
- *
- * Passing a NULL pointer to this function is safe, in that case no operation
- * will be performed.
- */
-void vsp1_dl_body_free(struct vsp1_dl_body *dlb)
-{
-	if (!dlb)
-		return;
-
-	vsp1_dl_body_cleanup(dlb);
-	kfree(dlb);
-}
-
 /**
  * vsp1_dl_body_write - Write a register to a display list body
  * @dlb: The body
@@ -399,11 +311,10 @@ void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32 reg, u32 data)
  * Display List Transaction Management
  */
 
-static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm)
+static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm,
+					       struct vsp1_dl_body_pool *pool)
 {
 	struct vsp1_dl_list *dl;
-	size_t header_size;
-	int ret;
 
 	dl = kzalloc(sizeof(*dl), GFP_KERNEL);
 	if (!dl)
@@ -412,41 +323,39 @@ static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm)
 	INIT_LIST_HEAD(&dl->bodies);
 	dl->dlm = dlm;
 
-	/*
-	 * Initialize the display list body and allocate DMA memory for the body
-	 * and the optional header. Both are allocated together to avoid memory
-	 * fragmentation, with the header located right after the body in
-	 * memory.
-	 */
-	header_size = dlm->mode == VSP1_DL_MODE_HEADER
-		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
-		    : 0;
-
-	ret = vsp1_dl_body_init(dlm->vsp1, &dl->body0, VSP1_DL_NUM_ENTRIES,
-				header_size);
-	if (ret < 0) {
-		kfree(dl);
+	/* Retrieve a body from our DLM body pool */
+	dl->body0 = vsp1_dl_body_get(pool);
+	if (!dl->body0)
 		return NULL;
-	}
-
 	if (dlm->mode == VSP1_DL_MODE_HEADER) {
-		size_t header_offset = VSP1_DL_NUM_ENTRIES
-				     * sizeof(*dl->body0.entries);
+		size_t header_offset = dl->body0->max_entries
+				     * sizeof(*dl->body0->entries);
 
-		dl->header = ((void *)dl->body0.entries) + header_offset;
-		dl->dma = dl->body0.dma + header_offset;
+		dl->header = ((void *)dl->body0->entries) + header_offset;
+		dl->dma = dl->body0->dma + header_offset;
 
 		memset(dl->header, 0, sizeof(*dl->header));
-		dl->header->lists[0].addr = dl->body0.dma;
+		dl->header->lists[0].addr = dl->body0->dma;
 	}
 
 	return dl;
 }
 
+static void vsp1_dl_list_bodies_put(struct vsp1_dl_list *dl)
+{
+	struct vsp1_dl_body *dlb, *tmp;
+
+	list_for_each_entry_safe(dlb, tmp, &dl->bodies, list) {
+		list_del(&dlb->list);
+		vsp1_dl_body_put(dlb);
+	}
+}
+
 static void vsp1_dl_list_free(struct vsp1_dl_list *dl)
 {
-	vsp1_dl_body_cleanup(&dl->body0);
-	list_splice_init(&dl->bodies, &dl->dlm->gc_bodies);
+	vsp1_dl_body_put(dl->body0);
+	vsp1_dl_list_bodies_put(dl);
+
 	kfree(dl);
 }
 
@@ -500,18 +409,13 @@ static void __vsp1_dl_list_put(struct vsp1_dl_list *dl)
 
 	dl->has_chain = false;
 
+	vsp1_dl_list_bodies_put(dl);
+
 	/*
-	 * We can't free bodies here as DMA memory can only be freed in
-	 * interruptible context. Move all bodies to the display list manager's
-	 * list of bodies to be freed, they will be garbage-collected by the
-	 * work queue.
+	 * body0 is reused as an optimisation as presently every display list
+	 * has at least one body, thus we reinitialise the entries list.
 	 */
-	if (!list_empty(&dl->bodies)) {
-		list_splice_init(&dl->bodies, &dl->dlm->gc_bodies);
-		schedule_work(&dl->dlm->gc_work);
-	}
-
-	dl->body0.num_entries = 0;
+	dl->body0->num_entries = 0;
 
 	list_add_tail(&dl->list, &dl->dlm->free);
 }
@@ -548,7 +452,7 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl)
  */
 void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
 {
-	vsp1_dl_body_write(&dl->body0, reg, data);
+	vsp1_dl_body_write(dl->body0, reg, data);
 }
 
 /**
@@ -561,8 +465,7 @@ void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
  * in the order in which bodies are added.
  *
  * Adding a body to a display list passes ownership of the body to the list. The
- * caller must not touch the body after this call, and must not free it
- * explicitly with vsp1_dl_body_free().
+ * caller must not touch the body after this call.
  *
  * Additional bodies are only usable for display lists in header mode.
  * Attempting to add a body to a header-less display list will return an error.
@@ -620,7 +523,7 @@ static void vsp1_dl_list_fill_header(struct vsp1_dl_list *dl, bool is_last)
 	 * list was allocated.
 	 */
 
-	hdr->num_bytes = dl->body0.num_entries
+	hdr->num_bytes = dl->body0->num_entries
 		       * sizeof(*dl->header->lists);
 
 	list_for_each_entry(dlb, &dl->bodies, list) {
@@ -694,9 +597,9 @@ static void vsp1_dl_list_hw_enqueue(struct vsp1_dl_list *dl)
 		 * bit will be cleared by the hardware when the display list
 		 * processing starts.
 		 */
-		vsp1_write(vsp1, VI6_DL_HDR_ADDR(0), dl->body0.dma);
+		vsp1_write(vsp1, VI6_DL_HDR_ADDR(0), dl->body0->dma);
 		vsp1_write(vsp1, VI6_DL_BODY_SIZE, VI6_DL_BODY_SIZE_UPD |
-			   (dl->body0.num_entries * sizeof(*dl->header->lists)));
+			(dl->body0->num_entries * sizeof(*dl->header->lists)));
 	} else {
 		/*
 		 * In header mode, program the display list header address. If
@@ -879,45 +782,12 @@ void vsp1_dlm_reset(struct vsp1_dl_manager *dlm)
 	dlm->pending = NULL;
 }
 
-/*
- * Free all bodies awaiting to be garbage-collected.
- *
- * This function must be called without the display list manager lock held.
- */
-static void vsp1_dlm_bodies_free(struct vsp1_dl_manager *dlm)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&dlm->lock, flags);
-
-	while (!list_empty(&dlm->gc_bodies)) {
-		struct vsp1_dl_body *dlb;
-
-		dlb = list_first_entry(&dlm->gc_bodies, struct vsp1_dl_body,
-				       list);
-		list_del(&dlb->list);
-
-		spin_unlock_irqrestore(&dlm->lock, flags);
-		vsp1_dl_body_free(dlb);
-		spin_lock_irqsave(&dlm->lock, flags);
-	}
-
-	spin_unlock_irqrestore(&dlm->lock, flags);
-}
-
-static void vsp1_dlm_garbage_collect(struct work_struct *work)
-{
-	struct vsp1_dl_manager *dlm =
-		container_of(work, struct vsp1_dl_manager, gc_work);
-
-	vsp1_dlm_bodies_free(dlm);
-}
-
 struct vsp1_dl_manager *vsp1_dlm_create(struct vsp1_device *vsp1,
 					unsigned int index,
 					unsigned int prealloc)
 {
 	struct vsp1_dl_manager *dlm;
+	size_t header_size;
 	unsigned int i;
 
 	dlm = devm_kzalloc(vsp1->dev, sizeof(*dlm), GFP_KERNEL);
@@ -932,13 +802,26 @@ struct vsp1_dl_manager *vsp1_dlm_create(struct vsp1_device *vsp1,
 
 	spin_lock_init(&dlm->lock);
 	INIT_LIST_HEAD(&dlm->free);
-	INIT_LIST_HEAD(&dlm->gc_bodies);
-	INIT_WORK(&dlm->gc_work, vsp1_dlm_garbage_collect);
+
+	/*
+	 * Initialize the display list body and allocate DMA memory for the body
+	 * and the optional header. Both are allocated together to avoid memory
+	 * fragmentation, with the header located right after the body in
+	 * memory.
+	 */
+	header_size = dlm->mode == VSP1_DL_MODE_HEADER
+		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
+		    : 0;
+
+	dlm->pool = vsp1_dl_body_pool_create(vsp1, prealloc,
+					     VSP1_DL_NUM_ENTRIES, header_size);
+	if (!dlm->pool)
+		return NULL;
 
 	for (i = 0; i < prealloc; ++i) {
 		struct vsp1_dl_list *dl;
 
-		dl = vsp1_dl_list_alloc(dlm);
+		dl = vsp1_dl_list_alloc(dlm, dlm->pool);
 		if (!dl)
 			return NULL;
 
@@ -955,12 +838,10 @@ void vsp1_dlm_destroy(struct vsp1_dl_manager *dlm)
 	if (!dlm)
 		return;
 
-	cancel_work_sync(&dlm->gc_work);
-
 	list_for_each_entry_safe(dl, next, &dlm->free, list) {
 		list_del(&dl->list);
 		vsp1_dl_list_free(dl);
 	}
 
-	vsp1_dlm_bodies_free(dlm);
+	vsp1_dl_body_pool_destroy(dlm->pool);
 }
diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
index 031032e304d2..7e820ac6865a 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.h
+++ b/drivers/media/platform/vsp1/vsp1_dl.h
@@ -42,9 +42,6 @@ void vsp1_dl_body_pool_destroy(struct vsp1_dl_body_pool *pool);
 struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool);
 void vsp1_dl_body_put(struct vsp1_dl_body *dlb);
 
-struct vsp1_dl_body *vsp1_dl_body_alloc(struct vsp1_device *vsp1,
-					unsigned int num_entries);
-void vsp1_dl_body_free(struct vsp1_dl_body *dlb);
 void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32 reg, u32 data);
 int vsp1_dl_list_add_body(struct vsp1_dl_list *dl, struct vsp1_dl_body *dlb);
 int vsp1_dl_list_add_chain(struct vsp1_dl_list *head, struct vsp1_dl_list *dl);
diff --git a/drivers/media/platform/vsp1/vsp1_lut.c b/drivers/media/platform/vsp1/vsp1_lut.c
index aa2b40327529..262cb72139d6 100644
--- a/drivers/media/platform/vsp1/vsp1_lut.c
+++ b/drivers/media/platform/vsp1/vsp1_lut.c
@@ -23,6 +23,8 @@
 #define LUT_MIN_SIZE				4U
 #define LUT_MAX_SIZE				8190U
 
+#define LUT_SIZE				256
+
 /* -----------------------------------------------------------------------------
  * Device Access
  */
@@ -44,11 +46,11 @@ static int lut_set_table(struct vsp1_lut *lut, struct v4l2_ctrl *ctrl)
 	struct vsp1_dl_body *dlb;
 	unsigned int i;
 
-	dlb = vsp1_dl_body_alloc(lut->entity.vsp1, 256);
+	dlb = vsp1_dl_body_get(lut->pool);
 	if (!dlb)
 		return -ENOMEM;
 
-	for (i = 0; i < 256; ++i)
+	for (i = 0; i < LUT_SIZE; ++i)
 		vsp1_dl_body_write(dlb, VI6_LUT_TABLE + 4 * i,
 				       ctrl->p_new.p_u32[i]);
 
@@ -56,7 +58,7 @@ static int lut_set_table(struct vsp1_lut *lut, struct v4l2_ctrl *ctrl)
 	swap(lut->lut, dlb);
 	spin_unlock_irq(&lut->lock);
 
-	vsp1_dl_body_free(dlb);
+	vsp1_dl_body_put(dlb);
 	return 0;
 }
 
@@ -87,7 +89,7 @@ static const struct v4l2_ctrl_config lut_table_control = {
 	.max = 0x00ffffff,
 	.step = 1,
 	.def = 0,
-	.dims = { 256},
+	.dims = { LUT_SIZE },
 };
 
 /* -----------------------------------------------------------------------------
@@ -217,8 +219,16 @@ static void lut_configure(struct vsp1_entity *entity,
 	}
 }
 
+static void lut_destroy(struct vsp1_entity *entity)
+{
+	struct vsp1_lut *lut = to_lut(&entity->subdev);
+
+	vsp1_dl_body_pool_destroy(lut->pool);
+}
+
 static const struct vsp1_entity_operations lut_entity_ops = {
 	.configure = lut_configure,
+	.destroy = lut_destroy,
 };
 
 /* -----------------------------------------------------------------------------
@@ -244,6 +254,15 @@ struct vsp1_lut *vsp1_lut_create(struct vsp1_device *vsp1)
 	if (ret < 0)
 		return ERR_PTR(ret);
 
+	/*
+	 * Pre-allocate a body pool, with 3 bodies allowing a userspace update
+	 * before the hardware has committed a previous set of tables, handling
+	 * both the queued and pending dl entries.
+	 */
+	lut->pool = vsp1_dl_body_pool_create(vsp1, 3, LUT_SIZE, 0);
+	if (!lut->pool)
+		return ERR_PTR(-ENOMEM);
+
 	/* Initialize the control handler. */
 	v4l2_ctrl_handler_init(&lut->ctrls, 1);
 	v4l2_ctrl_new_custom(&lut->ctrls, &lut_table_control, NULL);
diff --git a/drivers/media/platform/vsp1/vsp1_lut.h b/drivers/media/platform/vsp1/vsp1_lut.h
index f8c4e8f0a79d..499ed0070bd2 100644
--- a/drivers/media/platform/vsp1/vsp1_lut.h
+++ b/drivers/media/platform/vsp1/vsp1_lut.h
@@ -33,6 +33,7 @@ struct vsp1_lut {
 
 	spinlock_t lock;
 	struct vsp1_dl_body *lut;
+	struct vsp1_dl_body_pool *pool;
 };
 
 static inline struct vsp1_lut *to_lut(struct v4l2_subdev *subdev)
-- 
git-series 0.9.1


* [PATCH v7 5/8] media: vsp1: Use reference counting for bodies
  2018-03-08  0:05 [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
                   ` (3 preceding siblings ...)
  2018-03-08  0:05 ` [PATCH v7 4/8] media: vsp1: Convert display lists to use new " Kieran Bingham
@ 2018-03-08  0:05 ` Kieran Bingham
  2018-04-06 23:06   ` Laurent Pinchart
  2018-03-08  0:05 ` [PATCH v7 6/8] media: vsp1: Refactor display list configure operations Kieran Bingham
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Kieran Bingham @ 2018-03-08  0:05 UTC (permalink / raw)
  To: linux-media, linux-renesas-soc, Laurent Pinchart
  Cc: Kieran Bingham, Kieran Bingham

Extend the display list body with a reference count, allowing bodies to
be kept as long as a reference is maintained. This provides the ability
to keep a cached copy of bodies which will not change, so that they can
be re-applied to multiple display lists.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

---
This could be squashed into the body update code, but it's not a
straightforward squash as the refcounts will affect both:
  media: vsp1: Provide a body pool
and
  media: vsp1: Convert display lists to use new body pool
Therefore, I have kept this separate to prevent breaking bisectability
of the vsp-tests.

v3:
 - 's/fragment/body/'

v4:
 - Fix up reference handling comments.
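
The reference flow this patch introduces (get, add to a display list,
put) can be modelled in a few lines. This is a simplified sketch, not
the driver code: a plain counter stands in for refcount_t, an 'in_pool'
flag stands in for the pool free list, and all names are hypothetical.

```c
#include <stddef.h>

struct body {
	unsigned int refcnt;	/* plain counter standing in for refcount_t */
	unsigned int num_entries;
	int in_pool;		/* 1 while the body sits in the pool */
};

/* Take a body out of the pool: the caller holds the first reference. */
static struct body *body_get(struct body *b)
{
	if (!b->in_pool)
		return NULL;
	b->in_pool = 0;
	b->refcnt = 1;
	return b;
}

/* Adding a body to a display list takes a second reference for the list. */
static void list_add_body(struct body *b)
{
	b->refcnt++;
}

/* Drop one reference; only the last put resets the body and returns it. */
static void body_put(struct body *b)
{
	if (!b || --b->refcnt)
		return;
	b->num_entries = 0;
	b->in_pool = 1;
}
```

In this model an entity that adds its body to a list and then drops its
own reference leaves the list holding the last one, so the body only
goes back to the pool when the list releases it.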

 drivers/media/platform/vsp1/vsp1_clu.c |  7 ++++++-
 drivers/media/platform/vsp1/vsp1_dl.c  | 15 ++++++++++++++-
 drivers/media/platform/vsp1/vsp1_lut.c |  7 ++++++-
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
index 2018144470c5..b2a39a6ef7e4 100644
--- a/drivers/media/platform/vsp1/vsp1_clu.c
+++ b/drivers/media/platform/vsp1/vsp1_clu.c
@@ -257,8 +257,13 @@ static void clu_configure(struct vsp1_entity *entity,
 		clu->clu = NULL;
 		spin_unlock_irqrestore(&clu->lock, flags);
 
-		if (dlb)
+		if (dlb) {
 			vsp1_dl_list_add_body(dl, dlb);
+
+			/* release our local reference */
+			vsp1_dl_body_put(dlb);
+		}
+
 		break;
 	}
 }
diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index 74476726451c..134865287c02 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -14,6 +14,7 @@
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
 #include <linux/gfp.h>
+#include <linux/refcount.h>
 #include <linux/slab.h>
 #include <linux/workqueue.h>
 
@@ -58,6 +59,8 @@ struct vsp1_dl_body {
 	struct list_head list;
 	struct list_head free;
 
+	refcount_t refcnt;
+
 	struct vsp1_dl_body_pool *pool;
 	struct vsp1_device *vsp1;
 
@@ -259,6 +262,7 @@ struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool)
 	if (!list_empty(&pool->free)) {
 		dlb = list_first_entry(&pool->free, struct vsp1_dl_body, free);
 		list_del(&dlb->free);
+		refcount_set(&dlb->refcnt, 1);
 	}
 
 	spin_unlock_irqrestore(&pool->lock, flags);
@@ -279,6 +283,9 @@ void vsp1_dl_body_put(struct vsp1_dl_body *dlb)
 	if (!dlb)
 		return;
 
+	if (!refcount_dec_and_test(&dlb->refcnt))
+		return;
+
 	dlb->num_entries = 0;
 
 	spin_lock_irqsave(&dlb->pool->lock, flags);
@@ -465,7 +472,11 @@ void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
  * in the order in which bodies are added.
  *
  * Adding a body to a display list passes ownership of the body to the list. The
- * caller must not touch the body after this call.
+ * caller retains its reference to the body when adding it to the display
+ * list, but is not allowed to add new entries to the body.
+ *
+ * The reference must be explicitly released by a call to vsp1_dl_body_put()
+ * when the body isn't needed anymore.
  *
  * Additional bodies are only usable for display lists in header mode.
  * Attempting to add a body to a header-less display list will return an error.
@@ -476,6 +487,8 @@ int vsp1_dl_list_add_body(struct vsp1_dl_list *dl, struct vsp1_dl_body *dlb)
 	if (dl->dlm->mode != VSP1_DL_MODE_HEADER)
 		return -EINVAL;
 
+	refcount_inc(&dlb->refcnt);
+
 	list_add_tail(&dlb->list, &dl->bodies);
 
 	return 0;
diff --git a/drivers/media/platform/vsp1/vsp1_lut.c b/drivers/media/platform/vsp1/vsp1_lut.c
index 262cb72139d6..77cf7137a0f2 100644
--- a/drivers/media/platform/vsp1/vsp1_lut.c
+++ b/drivers/media/platform/vsp1/vsp1_lut.c
@@ -213,8 +213,13 @@ static void lut_configure(struct vsp1_entity *entity,
 		lut->lut = NULL;
 		spin_unlock_irqrestore(&lut->lock, flags);
 
-		if (dlb)
+		if (dlb) {
 			vsp1_dl_list_add_body(dl, dlb);
+
+			/* release our local reference */
+			vsp1_dl_body_put(dlb);
+		}
+
 		break;
 	}
 }
-- 
git-series 0.9.1


* [PATCH v7 6/8] media: vsp1: Refactor display list configure operations
  2018-03-08  0:05 [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
                   ` (4 preceding siblings ...)
  2018-03-08  0:05 ` [PATCH v7 5/8] media: vsp1: Use reference counting for bodies Kieran Bingham
@ 2018-03-08  0:05 ` Kieran Bingham
  2018-04-06 23:38   ` Laurent Pinchart
  2018-03-08  0:05 ` [PATCH v7 7/8] media: vsp1: Adapt entities to configure into a body Kieran Bingham
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Kieran Bingham @ 2018-03-08  0:05 UTC (permalink / raw)
  To: linux-media, linux-renesas-soc, Laurent Pinchart
  Cc: Kieran Bingham, Kieran Bingham

The entities provide a single .configure operation which configures the
object into the target display list, based on the vsp1_entity_params
selection.

This restricts us to a single function prototype for both the static
configuration (the pre-stream INIT stage) and the dynamic runtime stages
for each frame, and for each partition therein.

Split the configure function into two parts, .configure_stream() and
.configure_frame(), merging the VSP1_ENTITY_PARAMS_RUNTIME and
VSP1_ENTITY_PARAMS_PARTITION stages into a single call through
.configure_frame(). The configuration for individual partitions is
handled by passing the partition number to the configure call, and
processing any runtime stage actions on the first partition only.
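
As a rough illustration of the split described above, the following
standalone sketch models the two operations outside the kernel. All of the
types and names here (struct entity, struct entity_ops, the call counters,
run_pipeline()) are hypothetical stand-ins invented for the example, not
the driver's real structures; only the shape of the dispatch (optional
callbacks, per-frame work gated on partition 0) follows the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types, for illustration only. */
struct entity;

struct entity_ops {
	/* Static, once-per-stream configuration (the old INIT stage). */
	void (*configure_stream)(struct entity *e);
	/* Per-partition configuration (the old RUNTIME + PARTITION stages). */
	void (*configure_frame)(struct entity *e, unsigned int partition);
};

struct entity {
	const struct entity_ops *ops;
	unsigned int stream_calls;	/* times stream config ran */
	unsigned int runtime_calls;	/* times per-frame runtime work ran */
	unsigned int partition_calls;	/* times per-partition work ran */
};

/* Wrappers hide the NULL checks, so callers need not test the pointers. */
static void entity_configure_stream(struct entity *e)
{
	if (e->ops->configure_stream)
		e->ops->configure_stream(e);
}

static void entity_configure_frame(struct entity *e, unsigned int partition)
{
	if (e->ops->configure_frame)
		e->ops->configure_frame(e, partition);
}

static void demo_configure_stream(struct entity *e)
{
	e->stream_calls++;
}

static void demo_configure_frame(struct entity *e, unsigned int partition)
{
	/* Runtime-stage actions are applied once per frame only. */
	if (partition == 0)
		e->runtime_calls++;

	/* Partition-stage actions are applied for every partition. */
	e->partition_calls++;
}

static const struct entity_ops demo_ops = {
	.configure_stream = demo_configure_stream,
	.configure_frame = demo_configure_frame,
};

/* An entity with static configuration only, like the BRU or LIF here. */
static const struct entity_ops stream_only_ops = {
	.configure_stream = demo_configure_stream,
};

/* Configure the stream once, then one frame split into N partitions. */
static void run_pipeline(struct entity *e, unsigned int partitions)
{
	unsigned int i;

	entity_configure_stream(e);

	for (i = 0; i < partitions; i++)
		entity_configure_frame(e, i);
}
```

Calling run_pipeline() on an entity using demo_ops with three partitions
runs the stream configuration once, the runtime actions once and the
partition actions three times. An entity using stream_only_ops is simply
skipped by the frame wrapper.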

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

---
v7
 - Fix formatting and white space
 - s/prepare/configure_stream/
 - s/configure/configure_frame/

 drivers/media/platform/vsp1/vsp1_bru.c    |  12 +-
 drivers/media/platform/vsp1/vsp1_clu.c    |  50 +---
 drivers/media/platform/vsp1/vsp1_dl.h     |   1 +-
 drivers/media/platform/vsp1/vsp1_drm.c    |  21 +--
 drivers/media/platform/vsp1/vsp1_entity.c |  17 +-
 drivers/media/platform/vsp1/vsp1_entity.h |  33 +--
 drivers/media/platform/vsp1/vsp1_hgo.c    |  12 +-
 drivers/media/platform/vsp1/vsp1_hgt.c    |  12 +-
 drivers/media/platform/vsp1/vsp1_hsit.c   |  12 +-
 drivers/media/platform/vsp1/vsp1_lif.c    |  12 +-
 drivers/media/platform/vsp1/vsp1_lut.c    |  32 +-
 drivers/media/platform/vsp1/vsp1_rpf.c    | 164 ++++++-------
 drivers/media/platform/vsp1/vsp1_sru.c    |  12 +-
 drivers/media/platform/vsp1/vsp1_uds.c    |  57 ++--
 drivers/media/platform/vsp1/vsp1_video.c  |  24 +--
 drivers/media/platform/vsp1/vsp1_wpf.c    | 299 ++++++++++++-----------
 16 files changed, 378 insertions(+), 392 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_bru.c b/drivers/media/platform/vsp1/vsp1_bru.c
index e8fd2ae3b3eb..d6fd265eaccb 100644
--- a/drivers/media/platform/vsp1/vsp1_bru.c
+++ b/drivers/media/platform/vsp1/vsp1_bru.c
@@ -285,19 +285,15 @@ static const struct v4l2_subdev_ops bru_ops = {
  * VSP1 Entity Operations
  */
 
-static void bru_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void bru_configure_stream(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl)
 {
 	struct vsp1_bru *bru = to_bru(&entity->subdev);
 	struct v4l2_mbus_framefmt *format;
 	unsigned int flags;
 	unsigned int i;
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	format = vsp1_entity_get_pad_format(&bru->entity, bru->entity.config,
 					    bru->entity.source_pad);
 
@@ -404,7 +400,7 @@ static void bru_configure(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations bru_entity_ops = {
-	.configure = bru_configure,
+	.configure_stream = bru_configure_stream,
 };
 
 /* -----------------------------------------------------------------------------
diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
index b2a39a6ef7e4..b8d8af6d4910 100644
--- a/drivers/media/platform/vsp1/vsp1_clu.c
+++ b/drivers/media/platform/vsp1/vsp1_clu.c
@@ -213,37 +213,36 @@ static const struct v4l2_subdev_ops clu_ops = {
 /* -----------------------------------------------------------------------------
  * VSP1 Entity Operations
  */
+static void clu_configure_stream(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl)
+{
+	struct vsp1_clu *clu = to_clu(&entity->subdev);
+
+	/*
+	 * The yuv_mode can't be changed during streaming. Cache it internally
+	 * for future runtime configuration calls.
+	 */
+	struct v4l2_mbus_framefmt *format;
+
+	format = vsp1_entity_get_pad_format(&clu->entity,
+					    clu->entity.config,
+					    CLU_PAD_SINK);
+	clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
+}
 
-static void clu_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void clu_configure_frame(struct vsp1_entity *entity,
+				struct vsp1_pipeline *pipe,
+				struct vsp1_dl_list *dl,
+				unsigned int partition)
 {
 	struct vsp1_clu *clu = to_clu(&entity->subdev);
 	struct vsp1_dl_body *dlb;
 	unsigned long flags;
 	u32 ctrl = VI6_CLU_CTRL_AAI | VI6_CLU_CTRL_MVS | VI6_CLU_CTRL_EN;
 
-	switch (params) {
-	case VSP1_ENTITY_PARAMS_INIT: {
-		/*
-		 * The format can't be changed during streaming, only verify it
-		 * at setup time and store the information internally for future
-		 * runtime configuration calls.
-		 */
-		struct v4l2_mbus_framefmt *format;
-
-		format = vsp1_entity_get_pad_format(&clu->entity,
-						    clu->entity.config,
-						    CLU_PAD_SINK);
-		clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
-		break;
-	}
 
-	case VSP1_ENTITY_PARAMS_PARTITION:
-		break;
-
-	case VSP1_ENTITY_PARAMS_RUNTIME:
+	if (partition == 0) {
 		/* 2D mode can only be used with the YCbCr pixel encoding. */
 		if (clu->mode == V4L2_CID_VSP1_CLU_MODE_2D && clu->yuv_mode)
 			ctrl |= VI6_CLU_CTRL_AX1I_2D | VI6_CLU_CTRL_AX2I_2D
@@ -263,8 +262,6 @@ static void clu_configure(struct vsp1_entity *entity,
 			/* release our local reference */
 			vsp1_dl_body_put(dlb);
 		}
-
-		break;
 	}
 }
 
@@ -276,7 +273,8 @@ static void clu_destroy(struct vsp1_entity *entity)
 }
 
 static const struct vsp1_entity_operations clu_entity_ops = {
-	.configure = clu_configure,
+	.configure_stream = clu_configure_stream,
+	.configure_frame = clu_configure_frame,
 	.destroy = clu_destroy,
 };
 
diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
index 7e820ac6865a..f45083251644 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.h
+++ b/drivers/media/platform/vsp1/vsp1_dl.h
@@ -41,7 +41,6 @@ vsp1_dl_body_pool_create(struct vsp1_device *vsp1, unsigned int num_bodies,
 void vsp1_dl_body_pool_destroy(struct vsp1_dl_body_pool *pool);
 struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool);
 void vsp1_dl_body_put(struct vsp1_dl_body *dlb);
-
 void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32 reg, u32 data);
 int vsp1_dl_list_add_body(struct vsp1_dl_list *dl, struct vsp1_dl_body *dlb);
 int vsp1_dl_list_add_chain(struct vsp1_dl_list *head, struct vsp1_dl_list *dl);
diff --git a/drivers/media/platform/vsp1/vsp1_drm.c b/drivers/media/platform/vsp1/vsp1_drm.c
index b8fee1834253..12db55d43d01 100644
--- a/drivers/media/platform/vsp1/vsp1_drm.c
+++ b/drivers/media/platform/vsp1/vsp1_drm.c
@@ -259,14 +259,8 @@ int vsp1_du_setup_lif(struct device *dev, unsigned int pipe_index,
 	list_for_each_entry_safe(entity, next, &pipe->entities, list_pipe) {
 		vsp1_entity_route_setup(entity, pipe, dl);
 
-		if (entity->ops->configure) {
-			entity->ops->configure(entity, pipe, dl,
-					       VSP1_ENTITY_PARAMS_INIT);
-			entity->ops->configure(entity, pipe, dl,
-					       VSP1_ENTITY_PARAMS_RUNTIME);
-			entity->ops->configure(entity, pipe, dl,
-					       VSP1_ENTITY_PARAMS_PARTITION);
-		}
+		vsp1_entity_configure_stream(entity, pipe, dl);
+		vsp1_entity_configure_frame(entity, pipe, dl, 0);
 	}
 
 	vsp1_dl_list_commit(dl);
@@ -588,15 +582,8 @@ void vsp1_du_atomic_flush(struct device *dev, unsigned int pipe_index)
 		}
 
 		vsp1_entity_route_setup(entity, pipe, dl);
-
-		if (entity->ops->configure) {
-			entity->ops->configure(entity, pipe, dl,
-					       VSP1_ENTITY_PARAMS_INIT);
-			entity->ops->configure(entity, pipe, dl,
-					       VSP1_ENTITY_PARAMS_RUNTIME);
-			entity->ops->configure(entity, pipe, dl,
-					       VSP1_ENTITY_PARAMS_PARTITION);
-		}
+		vsp1_entity_configure_stream(entity, pipe, dl);
+		vsp1_entity_configure_frame(entity, pipe, dl, 0);
 	}
 
 	vsp1_dl_list_commit(dl);
diff --git a/drivers/media/platform/vsp1/vsp1_entity.c b/drivers/media/platform/vsp1/vsp1_entity.c
index 54de15095709..472284b638bc 100644
--- a/drivers/media/platform/vsp1/vsp1_entity.c
+++ b/drivers/media/platform/vsp1/vsp1_entity.c
@@ -73,6 +73,23 @@ void vsp1_entity_route_setup(struct vsp1_entity *entity,
 	vsp1_dl_list_write(dl, source->route->reg, route);
 }
 
+void vsp1_entity_configure_stream(struct vsp1_entity *entity,
+				  struct vsp1_pipeline *pipe,
+				  struct vsp1_dl_list *dl)
+{
+	if (entity->ops->configure_stream)
+		entity->ops->configure_stream(entity, pipe, dl);
+}
+
+void vsp1_entity_configure_frame(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl,
+				 unsigned int partition)
+{
+	if (entity->ops->configure_frame)
+		entity->ops->configure_frame(entity, pipe, dl, partition);
+}
+
 /* -----------------------------------------------------------------------------
  * V4L2 Subdevice Operations
  */
diff --git a/drivers/media/platform/vsp1/vsp1_entity.h b/drivers/media/platform/vsp1/vsp1_entity.h
index 408602ebeb97..b44ed5414fc3 100644
--- a/drivers/media/platform/vsp1/vsp1_entity.h
+++ b/drivers/media/platform/vsp1/vsp1_entity.h
@@ -40,18 +40,6 @@ enum vsp1_entity_type {
 	VSP1_ENTITY_WPF,
 };
 
-/**
- * enum vsp1_entity_params - Entity configuration parameters class
- * @VSP1_ENTITY_PARAMS_INIT - Initial parameters
- * @VSP1_ENTITY_PARAMS_PARTITION - Per-image partition parameters
- * @VSP1_ENTITY_PARAMS_RUNTIME - Runtime-configurable parameters
- */
-enum vsp1_entity_params {
-	VSP1_ENTITY_PARAMS_INIT,
-	VSP1_ENTITY_PARAMS_PARTITION,
-	VSP1_ENTITY_PARAMS_RUNTIME,
-};
-
 #define VSP1_ENTITY_MAX_INPUTS		5	/* For the BRU */
 
 /*
@@ -80,8 +68,10 @@ struct vsp1_route {
 /**
  * struct vsp1_entity_operations - Entity operations
  * @destroy:	Destroy the entity.
- * @configure:	Setup the hardware based on the entity state (pipeline, formats,
- *		selection rectangles, ...)
+ * @configure_stream:	Set up the initial hardware parameters for the stream
+ *			(pipeline, formats)
+ * @configure_frame:	Configure the runtime parameters for each partition
+ *			(rectangles, buffer addresses, ...)
  * @max_width:	Return the max supported width of data that the entity can
  *		process in a single operation.
  * @partition:	Process the partition construction based on this entity's
@@ -89,8 +79,10 @@ struct vsp1_route {
  */
 struct vsp1_entity_operations {
 	void (*destroy)(struct vsp1_entity *);
-	void (*configure)(struct vsp1_entity *, struct vsp1_pipeline *,
-			  struct vsp1_dl_list *, enum vsp1_entity_params);
+	void (*configure_stream)(struct vsp1_entity *, struct vsp1_pipeline *,
+				 struct vsp1_dl_list *);
+	void (*configure_frame)(struct vsp1_entity *, struct vsp1_pipeline *,
+				struct vsp1_dl_list *, unsigned int partition);
 	unsigned int (*max_width)(struct vsp1_entity *, struct vsp1_pipeline *);
 	void (*partition)(struct vsp1_entity *, struct vsp1_pipeline *,
 			  struct vsp1_partition *, unsigned int,
@@ -157,6 +149,15 @@ void vsp1_entity_route_setup(struct vsp1_entity *entity,
 			     struct vsp1_pipeline *pipe,
 			     struct vsp1_dl_list *dl);
 
+void vsp1_entity_configure_stream(struct vsp1_entity *entity,
+				  struct vsp1_pipeline *pipe,
+				  struct vsp1_dl_list *dl);
+
+void vsp1_entity_configure_frame(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl,
+				 unsigned int partition);
+
 struct media_pad *vsp1_entity_remote_pad(struct media_pad *pad);
 
 int vsp1_subdev_get_pad_format(struct v4l2_subdev *subdev,
diff --git a/drivers/media/platform/vsp1/vsp1_hgo.c b/drivers/media/platform/vsp1/vsp1_hgo.c
index 50309c053b78..ddba9f83ac7d 100644
--- a/drivers/media/platform/vsp1/vsp1_hgo.c
+++ b/drivers/media/platform/vsp1/vsp1_hgo.c
@@ -133,10 +133,9 @@ static const struct v4l2_ctrl_config hgo_num_bins_control = {
  * VSP1 Entity Operations
  */
 
-static void hgo_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void hgo_configure_stream(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl)
 {
 	struct vsp1_hgo *hgo = to_hgo(&entity->subdev);
 	struct v4l2_rect *compose;
@@ -144,9 +143,6 @@ static void hgo_configure(struct vsp1_entity *entity,
 	unsigned int hratio;
 	unsigned int vratio;
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	crop = vsp1_entity_get_pad_selection(entity, entity->config,
 					     HISTO_PAD_SINK, V4L2_SEL_TGT_CROP);
 	compose = vsp1_entity_get_pad_selection(entity, entity->config,
@@ -178,7 +174,7 @@ static void hgo_configure(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations hgo_entity_ops = {
-	.configure = hgo_configure,
+	.configure_stream = hgo_configure_stream,
 	.destroy = vsp1_histogram_destroy,
 };
 
diff --git a/drivers/media/platform/vsp1/vsp1_hgt.c b/drivers/media/platform/vsp1/vsp1_hgt.c
index b5ce305e3e6f..c7cde8e90029 100644
--- a/drivers/media/platform/vsp1/vsp1_hgt.c
+++ b/drivers/media/platform/vsp1/vsp1_hgt.c
@@ -129,10 +129,9 @@ static const struct v4l2_ctrl_config hgt_hue_areas = {
  * VSP1 Entity Operations
  */
 
-static void hgt_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void hgt_configure_stream(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl)
 {
 	struct vsp1_hgt *hgt = to_hgt(&entity->subdev);
 	struct v4l2_rect *compose;
@@ -143,9 +142,6 @@ static void hgt_configure(struct vsp1_entity *entity,
 	u8 upper;
 	unsigned int i;
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	crop = vsp1_entity_get_pad_selection(entity, entity->config,
 					     HISTO_PAD_SINK, V4L2_SEL_TGT_CROP);
 	compose = vsp1_entity_get_pad_selection(entity, entity->config,
@@ -179,7 +175,7 @@ static void hgt_configure(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations hgt_entity_ops = {
-	.configure = hgt_configure,
+	.configure_stream = hgt_configure_stream,
 	.destroy = vsp1_histogram_destroy,
 };
 
diff --git a/drivers/media/platform/vsp1/vsp1_hsit.c b/drivers/media/platform/vsp1/vsp1_hsit.c
index 764d405345ee..0452f99592f8 100644
--- a/drivers/media/platform/vsp1/vsp1_hsit.c
+++ b/drivers/media/platform/vsp1/vsp1_hsit.c
@@ -131,16 +131,12 @@ static const struct v4l2_subdev_ops hsit_ops = {
  * VSP1 Entity Operations
  */
 
-static void hsit_configure(struct vsp1_entity *entity,
-			   struct vsp1_pipeline *pipe,
-			   struct vsp1_dl_list *dl,
-			   enum vsp1_entity_params params)
+static void hsit_configure_stream(struct vsp1_entity *entity,
+				  struct vsp1_pipeline *pipe,
+				  struct vsp1_dl_list *dl)
 {
 	struct vsp1_hsit *hsit = to_hsit(&entity->subdev);
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	if (hsit->inverse)
 		vsp1_hsit_write(hsit, dl, VI6_HSI_CTRL, VI6_HSI_CTRL_EN);
 	else
@@ -148,7 +144,7 @@ static void hsit_configure(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations hsit_entity_ops = {
-	.configure = hsit_configure,
+	.configure_stream = hsit_configure_stream,
 };
 
 /* -----------------------------------------------------------------------------
diff --git a/drivers/media/platform/vsp1/vsp1_lif.c b/drivers/media/platform/vsp1/vsp1_lif.c
index 704920753998..9d6a77586285 100644
--- a/drivers/media/platform/vsp1/vsp1_lif.c
+++ b/drivers/media/platform/vsp1/vsp1_lif.c
@@ -128,10 +128,9 @@ static const struct v4l2_subdev_ops lif_ops = {
  * VSP1 Entity Operations
  */
 
-static void lif_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void lif_configure_stream(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl)
 {
 	const struct v4l2_mbus_framefmt *format;
 	struct vsp1_lif *lif = to_lif(&entity->subdev);
@@ -139,9 +138,6 @@ static void lif_configure(struct vsp1_entity *entity,
 	unsigned int obth = 400;
 	unsigned int lbth = 200;
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	format = vsp1_entity_get_pad_format(&lif->entity, lif->entity.config,
 					    LIF_PAD_SOURCE);
 
@@ -170,7 +166,7 @@ static void lif_configure(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations lif_entity_ops = {
-	.configure = lif_configure,
+	.configure_stream = lif_configure_stream,
 };
 
 /* -----------------------------------------------------------------------------
diff --git a/drivers/media/platform/vsp1/vsp1_lut.c b/drivers/media/platform/vsp1/vsp1_lut.c
index 77cf7137a0f2..6d160aabb185 100644
--- a/drivers/media/platform/vsp1/vsp1_lut.c
+++ b/drivers/media/platform/vsp1/vsp1_lut.c
@@ -190,24 +190,25 @@ static const struct v4l2_subdev_ops lut_ops = {
  * VSP1 Entity Operations
  */
 
-static void lut_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void lut_configure_stream(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl)
 {
 	struct vsp1_lut *lut = to_lut(&entity->subdev);
-	struct vsp1_dl_body *dlb;
-	unsigned long flags;
 
-	switch (params) {
-	case VSP1_ENTITY_PARAMS_INIT:
-		vsp1_lut_write(lut, dl, VI6_LUT_CTRL, VI6_LUT_CTRL_EN);
-		break;
+	vsp1_lut_write(lut, dl, VI6_LUT_CTRL, VI6_LUT_CTRL_EN);
+}
 
-	case VSP1_ENTITY_PARAMS_PARTITION:
-		break;
+static void lut_configure_frame(struct vsp1_entity *entity,
+				struct vsp1_pipeline *pipe,
+				struct vsp1_dl_list *dl,
+				unsigned int partition)
+{
+	struct vsp1_lut *lut = to_lut(&entity->subdev);
+	struct vsp1_dl_body *dlb;
+	unsigned long flags;
 
-	case VSP1_ENTITY_PARAMS_RUNTIME:
+	if (partition == 0) {
 		spin_lock_irqsave(&lut->lock, flags);
 		dlb = lut->lut;
 		lut->lut = NULL;
@@ -219,8 +220,6 @@ static void lut_configure(struct vsp1_entity *entity,
 			/* release our local reference */
 			vsp1_dl_body_put(dlb);
 		}
-
-		break;
 	}
 }
 
@@ -232,7 +231,8 @@ static void lut_destroy(struct vsp1_entity *entity)
 }
 
 static const struct vsp1_entity_operations lut_entity_ops = {
-	.configure = lut_configure,
+	.configure_stream = lut_configure_stream,
+	.configure_frame = lut_configure_frame,
 	.destroy = lut_destroy,
 };
 
diff --git a/drivers/media/platform/vsp1/vsp1_rpf.c b/drivers/media/platform/vsp1/vsp1_rpf.c
index fe0633da5a5f..48c65e4a8546 100644
--- a/drivers/media/platform/vsp1/vsp1_rpf.c
+++ b/drivers/media/platform/vsp1/vsp1_rpf.c
@@ -46,10 +46,9 @@ static const struct v4l2_subdev_ops rpf_ops = {
  * VSP1 Entity Operations
  */
 
-static void rpf_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void rpf_configure_stream(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl)
 {
 	struct vsp1_rwpf *rpf = to_rwpf(&entity->subdev);
 	const struct vsp1_format_info *fmtinfo = rpf->fmtinfo;
@@ -61,80 +60,6 @@ static void rpf_configure(struct vsp1_entity *entity,
 	u32 pstride;
 	u32 infmt;
 
-	if (params == VSP1_ENTITY_PARAMS_RUNTIME) {
-		vsp1_rpf_write(rpf, dl, VI6_RPF_VRTCOL_SET,
-			       rpf->alpha << VI6_RPF_VRTCOL_SET_LAYA_SHIFT);
-		vsp1_rpf_write(rpf, dl, VI6_RPF_MULT_ALPHA, rpf->mult_alpha |
-			       (rpf->alpha << VI6_RPF_MULT_ALPHA_RATIO_SHIFT));
-
-		vsp1_pipeline_propagate_alpha(pipe, dl, rpf->alpha);
-		return;
-	}
-
-	if (params == VSP1_ENTITY_PARAMS_PARTITION) {
-		struct vsp1_device *vsp1 = rpf->entity.vsp1;
-		struct vsp1_rwpf_memory mem = rpf->mem;
-		struct v4l2_rect crop;
-
-		/*
-		 * Source size and crop offsets.
-		 *
-		 * The crop offsets correspond to the location of the crop
-		 * rectangle top left corner in the plane buffer. Only two
-		 * offsets are needed, as planes 2 and 3 always have identical
-		 * strides.
-		 */
-		crop = *vsp1_rwpf_get_crop(rpf, rpf->entity.config);
-
-		/*
-		 * Partition Algorithm Control
-		 *
-		 * The partition algorithm can split this frame into multiple
-		 * slices. We must scale our partition window based on the pipe
-		 * configuration to match the destination partition window.
-		 * To achieve this, we adjust our crop to provide a 'sub-crop'
-		 * matching the expected partition window. Only 'left' and
-		 * 'width' need to be adjusted.
-		 */
-		if (pipe->partitions > 1) {
-			crop.width = pipe->partition->rpf.width;
-			crop.left += pipe->partition->rpf.left;
-		}
-
-		vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_BSIZE,
-			       (crop.width << VI6_RPF_SRC_BSIZE_BHSIZE_SHIFT) |
-			       (crop.height << VI6_RPF_SRC_BSIZE_BVSIZE_SHIFT));
-		vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_ESIZE,
-			       (crop.width << VI6_RPF_SRC_ESIZE_EHSIZE_SHIFT) |
-			       (crop.height << VI6_RPF_SRC_ESIZE_EVSIZE_SHIFT));
-
-		mem.addr[0] += crop.top * format->plane_fmt[0].bytesperline
-			     + crop.left * fmtinfo->bpp[0] / 8;
-
-		if (format->num_planes > 1) {
-			unsigned int offset;
-
-			offset = crop.top * format->plane_fmt[1].bytesperline
-			       + crop.left / fmtinfo->hsub
-			       * fmtinfo->bpp[1] / 8;
-			mem.addr[1] += offset;
-			mem.addr[2] += offset;
-		}
-
-		/*
-		 * On Gen3 hardware the SPUVS bit has no effect on 3-planar
-		 * formats. Swap the U and V planes manually in that case.
-		 */
-		if (vsp1->info->gen == 3 && format->num_planes == 3 &&
-		    fmtinfo->swap_uv)
-			swap(mem.addr[1], mem.addr[2]);
-
-		vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_Y, mem.addr[0]);
-		vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C0, mem.addr[1]);
-		vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C1, mem.addr[2]);
-		return;
-	}
-
 	/* Stride */
 	pstride = format->plane_fmt[0].bytesperline
 		<< VI6_RPF_SRCM_PSTRIDE_Y_SHIFT;
@@ -247,6 +172,86 @@ static void rpf_configure(struct vsp1_entity *entity,
 
 }
 
+static void rpf_configure_frame(struct vsp1_entity *entity,
+				struct vsp1_pipeline *pipe,
+				struct vsp1_dl_list *dl,
+				unsigned int partition)
+{
+	struct vsp1_rwpf *rpf = to_rwpf(&entity->subdev);
+	struct vsp1_rwpf_memory mem = rpf->mem;
+	struct vsp1_device *vsp1 = rpf->entity.vsp1;
+	const struct vsp1_format_info *fmtinfo = rpf->fmtinfo;
+	const struct v4l2_pix_format_mplane *format = &rpf->format;
+	struct v4l2_rect crop;
+
+	if (partition == 0) {
+		vsp1_rpf_write(rpf, dl, VI6_RPF_VRTCOL_SET,
+			       rpf->alpha << VI6_RPF_VRTCOL_SET_LAYA_SHIFT);
+		vsp1_rpf_write(rpf, dl, VI6_RPF_MULT_ALPHA, rpf->mult_alpha |
+			       (rpf->alpha << VI6_RPF_MULT_ALPHA_RATIO_SHIFT));
+
+		vsp1_pipeline_propagate_alpha(pipe, dl, rpf->alpha);
+	}
+
+
+	/*
+	 * Source size and crop offsets.
+	 *
+	 * The crop offsets correspond to the location of the crop
+	 * rectangle top left corner in the plane buffer. Only two
+	 * offsets are needed, as planes 2 and 3 always have identical
+	 * strides.
+	 */
+	crop = *vsp1_rwpf_get_crop(rpf, rpf->entity.config);
+
+	/*
+	 * Partition Algorithm Control
+	 *
+	 * The partition algorithm can split this frame into multiple
+	 * slices. We must scale our partition window based on the pipe
+	 * configuration to match the destination partition window.
+	 * To achieve this, we adjust our crop to provide a 'sub-crop'
+	 * matching the expected partition window. Only 'left' and
+	 * 'width' need to be adjusted.
+	 */
+	if (pipe->partitions > 1) {
+		crop.width = pipe->partition->rpf.width;
+		crop.left += pipe->partition->rpf.left;
+	}
+
+	vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_BSIZE,
+		       (crop.width << VI6_RPF_SRC_BSIZE_BHSIZE_SHIFT) |
+		       (crop.height << VI6_RPF_SRC_BSIZE_BVSIZE_SHIFT));
+	vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_ESIZE,
+		       (crop.width << VI6_RPF_SRC_ESIZE_EHSIZE_SHIFT) |
+		       (crop.height << VI6_RPF_SRC_ESIZE_EVSIZE_SHIFT));
+
+	mem.addr[0] += crop.top * format->plane_fmt[0].bytesperline
+		     + crop.left * fmtinfo->bpp[0] / 8;
+
+	if (format->num_planes > 1) {
+		unsigned int offset;
+
+		offset = crop.top * format->plane_fmt[1].bytesperline
+		       + crop.left / fmtinfo->hsub
+		       * fmtinfo->bpp[1] / 8;
+		mem.addr[1] += offset;
+		mem.addr[2] += offset;
+	}
+
+	/*
+	 * On Gen3 hardware the SPUVS bit has no effect on 3-planar
+	 * formats. Swap the U and V planes manually in that case.
+	 */
+	if (vsp1->info->gen == 3 && format->num_planes == 3 &&
+	    fmtinfo->swap_uv)
+		swap(mem.addr[1], mem.addr[2]);
+
+	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_Y, mem.addr[0]);
+	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C0, mem.addr[1]);
+	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C1, mem.addr[2]);
+}
+
 static void rpf_partition(struct vsp1_entity *entity,
 			  struct vsp1_pipeline *pipe,
 			  struct vsp1_partition *partition,
@@ -257,7 +262,8 @@ static void rpf_partition(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations rpf_entity_ops = {
-	.configure = rpf_configure,
+	.configure_stream = rpf_configure_stream,
+	.configure_frame = rpf_configure_frame,
 	.partition = rpf_partition,
 };
 
diff --git a/drivers/media/platform/vsp1/vsp1_sru.c b/drivers/media/platform/vsp1/vsp1_sru.c
index 51e5691187c3..485b2820c8cd 100644
--- a/drivers/media/platform/vsp1/vsp1_sru.c
+++ b/drivers/media/platform/vsp1/vsp1_sru.c
@@ -271,10 +271,9 @@ static const struct v4l2_subdev_ops sru_ops = {
  * VSP1 Entity Operations
  */
 
-static void sru_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void sru_configure_stream(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl)
 {
 	const struct vsp1_sru_param *param;
 	struct vsp1_sru *sru = to_sru(&entity->subdev);
@@ -282,9 +281,6 @@ static void sru_configure(struct vsp1_entity *entity,
 	struct v4l2_mbus_framefmt *output;
 	u32 ctrl0;
 
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	input = vsp1_entity_get_pad_format(&sru->entity, sru->entity.config,
 					   SRU_PAD_SINK);
 	output = vsp1_entity_get_pad_format(&sru->entity, sru->entity.config,
@@ -351,7 +347,7 @@ static void sru_partition(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations sru_entity_ops = {
-	.configure = sru_configure,
+	.configure_stream = sru_configure_stream,
 	.max_width = sru_max_width,
 	.partition = sru_partition,
 };
diff --git a/drivers/media/platform/vsp1/vsp1_uds.c b/drivers/media/platform/vsp1/vsp1_uds.c
index 72f72a9d2152..ce1731c2b3a9 100644
--- a/drivers/media/platform/vsp1/vsp1_uds.c
+++ b/drivers/media/platform/vsp1/vsp1_uds.c
@@ -259,10 +259,9 @@ static const struct v4l2_subdev_ops uds_ops = {
  * VSP1 Entity Operations
  */
 
-static void uds_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void uds_configure_stream(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl)
 {
 	struct vsp1_uds *uds = to_uds(&entity->subdev);
 	const struct v4l2_mbus_framefmt *output;
@@ -276,27 +275,6 @@ static void uds_configure(struct vsp1_entity *entity,
 	output = vsp1_entity_get_pad_format(&uds->entity, uds->entity.config,
 					    UDS_PAD_SOURCE);
 
-	if (params == VSP1_ENTITY_PARAMS_PARTITION) {
-		struct vsp1_partition *partition = pipe->partition;
-
-		/* Input size clipping */
-		vsp1_uds_write(uds, dl, VI6_UDS_HSZCLIP, VI6_UDS_HSZCLIP_HCEN |
-			       (0 << VI6_UDS_HSZCLIP_HCL_OFST_SHIFT) |
-			       (partition->uds_sink.width
-					<< VI6_UDS_HSZCLIP_HCL_SIZE_SHIFT));
-
-		/* Output size clipping */
-		vsp1_uds_write(uds, dl, VI6_UDS_CLIP_SIZE,
-			       (partition->uds_source.width
-					<< VI6_UDS_CLIP_SIZE_HSIZE_SHIFT) |
-			       (output->height
-					<< VI6_UDS_CLIP_SIZE_VSIZE_SHIFT));
-		return;
-	}
-
-	if (params != VSP1_ENTITY_PARAMS_INIT)
-		return;
-
 	hscale = uds_compute_ratio(input->width, output->width);
 	vscale = uds_compute_ratio(input->height, output->height);
 
@@ -328,6 +306,32 @@ static void uds_configure(struct vsp1_entity *entity,
 		       (vscale << VI6_UDS_SCALE_VFRAC_SHIFT));
 }
 
+static void uds_configure_frame(struct vsp1_entity *entity,
+				struct vsp1_pipeline *pipe,
+				struct vsp1_dl_list *dl,
+				unsigned int pindex)
+{
+	struct vsp1_uds *uds = to_uds(&entity->subdev);
+	struct vsp1_partition *partition = pipe->partition;
+	const struct v4l2_mbus_framefmt *output;
+
+	output = vsp1_entity_get_pad_format(&uds->entity, uds->entity.config,
+					    UDS_PAD_SOURCE);
+
+	/* Input size clipping */
+	vsp1_uds_write(uds, dl, VI6_UDS_HSZCLIP, VI6_UDS_HSZCLIP_HCEN |
+		       (0 << VI6_UDS_HSZCLIP_HCL_OFST_SHIFT) |
+		       (partition->uds_sink.width
+				<< VI6_UDS_HSZCLIP_HCL_SIZE_SHIFT));
+
+	/* Output size clipping */
+	vsp1_uds_write(uds, dl, VI6_UDS_CLIP_SIZE,
+		       (partition->uds_source.width
+				<< VI6_UDS_CLIP_SIZE_HSIZE_SHIFT) |
+		       (output->height
+				<< VI6_UDS_CLIP_SIZE_VSIZE_SHIFT));
+}
+
 static unsigned int uds_max_width(struct vsp1_entity *entity,
 				  struct vsp1_pipeline *pipe)
 {
@@ -384,7 +388,8 @@ static void uds_partition(struct vsp1_entity *entity,
 }
 
 static const struct vsp1_entity_operations uds_entity_ops = {
-	.configure = uds_configure,
+	.configure_stream = uds_configure_stream,
+	.configure_frame = uds_configure_frame,
 	.max_width = uds_max_width,
 	.partition = uds_partition,
 };
diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
index c2d3b8f0f487..1b5a31734834 100644
--- a/drivers/media/platform/vsp1/vsp1_video.c
+++ b/drivers/media/platform/vsp1/vsp1_video.c
@@ -386,33 +386,18 @@ static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
 
 	pipe->partition = &pipe->part_table[partition];
 
-	list_for_each_entry(entity, &pipe->entities, list_pipe) {
-		if (entity->ops->configure)
-			entity->ops->configure(entity, pipe, dl,
-					       VSP1_ENTITY_PARAMS_PARTITION);
-	}
+	list_for_each_entry(entity, &pipe->entities, list_pipe)
+		vsp1_entity_configure_frame(entity, pipe, dl, partition);
 }
 
 static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
 {
 	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
-	struct vsp1_entity *entity;
 	unsigned int partition;
 
 	if (!pipe->dl)
 		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
 
-	/*
-	 * Start with the runtime parameters as the configure operation can
-	 * compute/cache information needed when configuring partitions. This
-	 * is the case with flipping in the WPF.
-	 */
-	list_for_each_entry(entity, &pipe->entities, list_pipe) {
-		if (entity->ops->configure)
-			entity->ops->configure(entity, pipe, pipe->dl,
-					       VSP1_ENTITY_PARAMS_RUNTIME);
-	}
-
 	/* Run the first partition */
 	vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0);
 
@@ -840,10 +825,7 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 
 	list_for_each_entry(entity, &pipe->entities, list_pipe) {
 		vsp1_entity_route_setup(entity, pipe, pipe->dl);
-
-		if (entity->ops->configure)
-			entity->ops->configure(entity, pipe, pipe->dl,
-					       VSP1_ENTITY_PARAMS_INIT);
+		vsp1_entity_configure_stream(entity, pipe, pipe->dl);
 	}
 
 	return 0;
diff --git a/drivers/media/platform/vsp1/vsp1_wpf.c b/drivers/media/platform/vsp1/vsp1_wpf.c
index f7f3b4b2c2de..6a6cdf0fb5f1 100644
--- a/drivers/media/platform/vsp1/vsp1_wpf.c
+++ b/drivers/media/platform/vsp1/vsp1_wpf.c
@@ -236,10 +236,9 @@ static void vsp1_wpf_destroy(struct vsp1_entity *entity)
 	vsp1_dlm_destroy(wpf->dlm);
 }
 
-static void wpf_configure(struct vsp1_entity *entity,
-			  struct vsp1_pipeline *pipe,
-			  struct vsp1_dl_list *dl,
-			  enum vsp1_entity_params params)
+static void wpf_configure_stream(struct vsp1_entity *entity,
+				 struct vsp1_pipeline *pipe,
+				 struct vsp1_dl_list *dl)
 {
 	struct vsp1_rwpf *wpf = to_rwpf(&entity->subdev);
 	struct vsp1_device *vsp1 = wpf->entity.vsp1;
@@ -249,149 +248,12 @@ static void wpf_configure(struct vsp1_entity *entity,
 	u32 outfmt = 0;
 	u32 srcrpf = 0;
 
-	if (params == VSP1_ENTITY_PARAMS_RUNTIME) {
-		const unsigned int mask = BIT(WPF_CTRL_VFLIP)
-					| BIT(WPF_CTRL_HFLIP);
-		unsigned long flags;
-
-		spin_lock_irqsave(&wpf->flip.lock, flags);
-		wpf->flip.active = (wpf->flip.active & ~mask)
-				 | (wpf->flip.pending & mask);
-		spin_unlock_irqrestore(&wpf->flip.lock, flags);
-
-		outfmt = (wpf->alpha << VI6_WPF_OUTFMT_PDV_SHIFT) | wpf->outfmt;
-
-		if (wpf->flip.active & BIT(WPF_CTRL_VFLIP))
-			outfmt |= VI6_WPF_OUTFMT_FLP;
-		if (wpf->flip.active & BIT(WPF_CTRL_HFLIP))
-			outfmt |= VI6_WPF_OUTFMT_HFLP;
-
-		vsp1_wpf_write(wpf, dl, VI6_WPF_OUTFMT, outfmt);
-		return;
-	}
-
 	sink_format = vsp1_entity_get_pad_format(&wpf->entity,
 						 wpf->entity.config,
 						 RWPF_PAD_SINK);
 	source_format = vsp1_entity_get_pad_format(&wpf->entity,
 						   wpf->entity.config,
 						   RWPF_PAD_SOURCE);
-
-	if (params == VSP1_ENTITY_PARAMS_PARTITION) {
-		const struct v4l2_pix_format_mplane *format = &wpf->format;
-		const struct vsp1_format_info *fmtinfo = wpf->fmtinfo;
-		struct vsp1_rwpf_memory mem = wpf->mem;
-		unsigned int flip = wpf->flip.active;
-		unsigned int width = sink_format->width;
-		unsigned int height = sink_format->height;
-		unsigned int offset;
-
-		/*
-		 * Cropping. The partition algorithm can split the image into
-		 * multiple slices.
-		 */
-		if (pipe->partitions > 1)
-			width = pipe->partition->wpf.width;
-
-		vsp1_wpf_write(wpf, dl, VI6_WPF_HSZCLIP, VI6_WPF_SZCLIP_EN |
-			       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
-			       (width << VI6_WPF_SZCLIP_SIZE_SHIFT));
-		vsp1_wpf_write(wpf, dl, VI6_WPF_VSZCLIP, VI6_WPF_SZCLIP_EN |
-			       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
-			       (height << VI6_WPF_SZCLIP_SIZE_SHIFT));
-
-		if (pipe->lif)
-			return;
-
-		/*
-		 * Update the memory offsets based on flipping configuration.
-		 * The destination addresses point to the locations where the
-		 * VSP starts writing to memory, which can be any corner of the
-		 * image depending on the combination of flipping and rotation.
-		 */
-
-		/*
-		 * First take the partition left coordinate into account.
-		 * Compute the offset to order the partitions correctly on the
-		 * output based on whether flipping is enabled. Consider
-		 * horizontal flipping when rotation is disabled but vertical
-		 * flipping when rotation is enabled, as rotating the image
-		 * switches the horizontal and vertical directions. The offset
-		 * is applied horizontally or vertically accordingly.
-		 */
-		if (flip & BIT(WPF_CTRL_HFLIP) && !wpf->flip.rotate)
-			offset = format->width - pipe->partition->wpf.left
-				- pipe->partition->wpf.width;
-		else if (flip & BIT(WPF_CTRL_VFLIP) && wpf->flip.rotate)
-			offset = format->height - pipe->partition->wpf.left
-				- pipe->partition->wpf.width;
-		else
-			offset = pipe->partition->wpf.left;
-
-		for (i = 0; i < format->num_planes; ++i) {
-			unsigned int hsub = i > 0 ? fmtinfo->hsub : 1;
-			unsigned int vsub = i > 0 ? fmtinfo->vsub : 1;
-
-			if (wpf->flip.rotate)
-				mem.addr[i] += offset / vsub
-					     * format->plane_fmt[i].bytesperline;
-			else
-				mem.addr[i] += offset / hsub
-					     * fmtinfo->bpp[i] / 8;
-		}
-
-		if (flip & BIT(WPF_CTRL_VFLIP)) {
-			/*
-			 * When rotating the output (after rotation) image
-			 * height is equal to the partition width (before
-			 * rotation). Otherwise it is equal to the output
-			 * image height.
-			 */
-			if (wpf->flip.rotate)
-				height = pipe->partition->wpf.width;
-			else
-				height = format->height;
-
-			mem.addr[0] += (height - 1)
-				     * format->plane_fmt[0].bytesperline;
-
-			if (format->num_planes > 1) {
-				offset = (height / fmtinfo->vsub - 1)
-				       * format->plane_fmt[1].bytesperline;
-				mem.addr[1] += offset;
-				mem.addr[2] += offset;
-			}
-		}
-
-		if (wpf->flip.rotate && !(flip & BIT(WPF_CTRL_HFLIP))) {
-			unsigned int hoffset = max(0, (int)format->width - 16);
-
-			/*
-			 * Compute the output coordinate. The partition
-			 * horizontal (left) offset becomes a vertical offset.
-			 */
-			for (i = 0; i < format->num_planes; ++i) {
-				unsigned int hsub = i > 0 ? fmtinfo->hsub : 1;
-
-				mem.addr[i] += hoffset / hsub
-					     * fmtinfo->bpp[i] / 8;
-			}
-		}
-
-		/*
-		 * On Gen3 hardware the SPUVS bit has no effect on 3-planar
-		 * formats. Swap the U and V planes manually in that case.
-		 */
-		if (vsp1->info->gen == 3 && format->num_planes == 3 &&
-		    fmtinfo->swap_uv)
-			swap(mem.addr[1], mem.addr[2]);
-
-		vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_Y, mem.addr[0]);
-		vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C0, mem.addr[1]);
-		vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C1, mem.addr[2]);
-		return;
-	}
-
 	/* Format */
 	if (!pipe->lif) {
 		const struct v4l2_pix_format_mplane *format = &wpf->format;
@@ -465,6 +327,158 @@ static void wpf_configure(struct vsp1_entity *entity,
 			   VI6_WFP_IRQ_ENB_DFEE);
 }
 
+static void wpf_configure_frame(struct vsp1_entity *entity,
+				struct vsp1_pipeline *pipe,
+				struct vsp1_dl_list *dl,
+				unsigned int partition)
+{
+	struct vsp1_rwpf *wpf = to_rwpf(&entity->subdev);
+	struct vsp1_device *vsp1 = wpf->entity.vsp1;
+	struct vsp1_rwpf_memory mem = wpf->mem;
+	const struct v4l2_mbus_framefmt *sink_format;
+	const struct v4l2_pix_format_mplane *format = &wpf->format;
+	const struct vsp1_format_info *fmtinfo = wpf->fmtinfo;
+	unsigned int flip;
+	unsigned int i;
+	unsigned int width;
+	unsigned int height;
+	unsigned int offset;
+	u32 outfmt = 0;
+
+	/* Handle the per frame constants */
+	if (partition == 0) {
+		const unsigned int mask = BIT(WPF_CTRL_VFLIP)
+					| BIT(WPF_CTRL_HFLIP);
+		unsigned long flags;
+
+		spin_lock_irqsave(&wpf->flip.lock, flags);
+		wpf->flip.active = (wpf->flip.active & ~mask)
+				 | (wpf->flip.pending & mask);
+		spin_unlock_irqrestore(&wpf->flip.lock, flags);
+
+		outfmt = (wpf->alpha << VI6_WPF_OUTFMT_PDV_SHIFT) | wpf->outfmt;
+
+		if (wpf->flip.active & BIT(WPF_CTRL_VFLIP))
+			outfmt |= VI6_WPF_OUTFMT_FLP;
+		if (wpf->flip.active & BIT(WPF_CTRL_HFLIP))
+			outfmt |= VI6_WPF_OUTFMT_HFLP;
+
+		vsp1_wpf_write(wpf, dl, VI6_WPF_OUTFMT, outfmt);
+	}
+
+	sink_format = vsp1_entity_get_pad_format(&wpf->entity,
+						 wpf->entity.config,
+						 RWPF_PAD_SINK);
+	width = sink_format->width;
+	height = sink_format->height;
+
+	/*
+	 * Cropping. The partition algorithm can split the image into
+	 * multiple slices.
+	 */
+	if (pipe->partitions > 1)
+		width = pipe->partition->wpf.width;
+
+	vsp1_wpf_write(wpf, dl, VI6_WPF_HSZCLIP, VI6_WPF_SZCLIP_EN |
+		       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
+		       (width << VI6_WPF_SZCLIP_SIZE_SHIFT));
+	vsp1_wpf_write(wpf, dl, VI6_WPF_VSZCLIP, VI6_WPF_SZCLIP_EN |
+		       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
+		       (height << VI6_WPF_SZCLIP_SIZE_SHIFT));
+
+	if (pipe->lif)
+		return;
+
+	/*
+	 * Update the memory offsets based on flipping configuration.
+	 * The destination addresses point to the locations where the
+	 * VSP starts writing to memory, which can be any corner of the
+	 * image depending on the combination of flipping and rotation.
+	 */
+
+	/*
+	 * First take the partition left coordinate into account.
+	 * Compute the offset to order the partitions correctly on the
+	 * output based on whether flipping is enabled. Consider
+	 * horizontal flipping when rotation is disabled but vertical
+	 * flipping when rotation is enabled, as rotating the image
+	 * switches the horizontal and vertical directions. The offset
+	 * is applied horizontally or vertically accordingly.
+	 */
+	flip = wpf->flip.active;
+
+	if (flip & BIT(WPF_CTRL_HFLIP) && !wpf->flip.rotate)
+		offset = format->width - pipe->partition->wpf.left
+			- pipe->partition->wpf.width;
+	else if (flip & BIT(WPF_CTRL_VFLIP) && wpf->flip.rotate)
+		offset = format->height - pipe->partition->wpf.left
+			- pipe->partition->wpf.width;
+	else
+		offset = pipe->partition->wpf.left;
+
+	for (i = 0; i < format->num_planes; ++i) {
+		unsigned int hsub = i > 0 ? fmtinfo->hsub : 1;
+		unsigned int vsub = i > 0 ? fmtinfo->vsub : 1;
+
+		if (wpf->flip.rotate)
+			mem.addr[i] += offset / vsub
+				     * format->plane_fmt[i].bytesperline;
+		else
+			mem.addr[i] += offset / hsub
+				     * fmtinfo->bpp[i] / 8;
+	}
+
+	if (flip & BIT(WPF_CTRL_VFLIP)) {
+		/*
+		 * When rotating the output (after rotation) image
+		 * height is equal to the partition width (before
+		 * rotation). Otherwise it is equal to the output
+		 * image height.
+		 */
+		if (wpf->flip.rotate)
+			height = pipe->partition->wpf.width;
+		else
+			height = format->height;
+
+		mem.addr[0] += (height - 1)
+			     * format->plane_fmt[0].bytesperline;
+
+		if (format->num_planes > 1) {
+			offset = (height / fmtinfo->vsub - 1)
+			       * format->plane_fmt[1].bytesperline;
+			mem.addr[1] += offset;
+			mem.addr[2] += offset;
+		}
+	}
+
+	if (wpf->flip.rotate && !(flip & BIT(WPF_CTRL_HFLIP))) {
+		unsigned int hoffset = max(0, (int)format->width - 16);
+
+		/*
+		 * Compute the output coordinate. The partition
+		 * horizontal (left) offset becomes a vertical offset.
+		 */
+		for (i = 0; i < format->num_planes; ++i) {
+			unsigned int hsub = i > 0 ? fmtinfo->hsub : 1;
+
+			mem.addr[i] += hoffset / hsub
+				     * fmtinfo->bpp[i] / 8;
+		}
+	}
+
+	/*
+	 * On Gen3 hardware the SPUVS bit has no effect on 3-planar
+	 * formats. Swap the U and V planes manually in that case.
+	 */
+	if (vsp1->info->gen == 3 && format->num_planes == 3 &&
+	    fmtinfo->swap_uv)
+		swap(mem.addr[1], mem.addr[2]);
+
+	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_Y, mem.addr[0]);
+	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C0, mem.addr[1]);
+	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C1, mem.addr[2]);
+}
+
 static unsigned int wpf_max_width(struct vsp1_entity *entity,
 				  struct vsp1_pipeline *pipe)
 {
@@ -484,7 +498,8 @@ static void wpf_partition(struct vsp1_entity *entity,
 
 static const struct vsp1_entity_operations wpf_entity_ops = {
 	.destroy = vsp1_wpf_destroy,
-	.configure = wpf_configure,
+	.configure_stream = wpf_configure_stream,
+	.configure_frame = wpf_configure_frame,
 	.max_width = wpf_max_width,
 	.partition = wpf_partition,
 };
-- 
git-series 0.9.1

* [PATCH v7 7/8] media: vsp1: Adapt entities to configure into a body
  2018-03-08  0:05 [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
                   ` (5 preceding siblings ...)
  2018-03-08  0:05 ` [PATCH v7 6/8] media: vsp1: Refactor display list configure operations Kieran Bingham
@ 2018-03-08  0:05 ` Kieran Bingham
  2018-04-06 23:55   ` Laurent Pinchart
  2018-03-08  0:05 ` [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb Kieran Bingham
  2018-04-07  0:30 ` [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Laurent Pinchart
  8 siblings, 1 reply; 26+ messages in thread
From: Kieran Bingham @ 2018-03-08  0:05 UTC (permalink / raw)
  To: linux-media, linux-renesas-soc, Laurent Pinchart
  Cc: Kieran Bingham, Kieran Bingham

Currently the entities write their configuration into a display list.
Adapt the entity operations so that they write into a display list body
directly, allowing greater flexibility and control over the content.

This removes all users of vsp1_dl_list_write(), so the function itself
is removed as well.

A helper, vsp1_dl_list_get_body0(), is provided to access the display
list's internal body0.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>

---
v7
 - Rebase
 - s/prepare/configure_stream/
 - s/configure/configure_frame/
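
For readers skimming the series, the calling convention described in the
commit message can be sketched in user space roughly as follows. This is
only an illustrative model, not the driver code: the dl_body, dl_list and
dl_entry types, the dl_body_write() and dl_list_get_body0() helpers, the
register offsets and the 256-entry limit are all hypothetical stand-ins
loosely modelled on vsp1_dl_body_write() and vsp1_dl_list_get_body0().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver types; not the kernel definitions. */
struct dl_entry {
	uint32_t addr;
	uint32_t data;
};

#define BODY_MAX_ENTRIES 256	/* illustrative limit only */

struct dl_body {
	struct dl_entry entries[BODY_MAX_ENTRIES];
	unsigned int num_entries;
};

struct dl_list {
	struct dl_body *body0;	/* default body owned by the list */
};

/* Models vsp1_dl_body_write(): append one register write to a body. */
static void dl_body_write(struct dl_body *dlb, uint32_t reg, uint32_t data)
{
	assert(dlb->num_entries < BODY_MAX_ENTRIES);
	dlb->entries[dlb->num_entries].addr = reg;
	dlb->entries[dlb->num_entries].data = data;
	dlb->num_entries++;
}

/* Models vsp1_dl_list_get_body0(): expose the list's default body. */
static struct dl_body *dl_list_get_body0(struct dl_list *dl)
{
	return dl->body0;
}

/* An entity-style configure operation that only knows about a body. */
static void entity_configure_stream(struct dl_body *dlb)
{
	dl_body_write(dlb, 0x0100, 0x1);	/* placeholder register/value */
	dl_body_write(dlb, 0x0104, 0xff);	/* placeholder register/value */
}

/* Caller side: fetch body0 once, then hand it to the configure operation. */
static unsigned int demo_configure(struct dl_list *dl)
{
	struct dl_body *dlb = dl_list_get_body0(dl);

	entity_configure_stream(dlb);
	return dlb->num_entries;
}
```

Because the entity operation receives only a body, the caller is free to
pass body0 (as above) or any other body it owns, which is the added
flexibility this patch is after.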

 drivers/media/platform/vsp1/vsp1_bru.c    | 22 ++++++-------
 drivers/media/platform/vsp1/vsp1_clu.c    | 22 ++++++-------
 drivers/media/platform/vsp1/vsp1_dl.c     | 12 ++-----
 drivers/media/platform/vsp1/vsp1_dl.h     |  2 +-
 drivers/media/platform/vsp1/vsp1_drm.c    | 20 +++++++----
 drivers/media/platform/vsp1/vsp1_entity.c | 15 ++++-----
 drivers/media/platform/vsp1/vsp1_entity.h | 11 +++---
 drivers/media/platform/vsp1/vsp1_hgo.c    | 16 ++++-----
 drivers/media/platform/vsp1/vsp1_hgt.c    | 18 +++++-----
 drivers/media/platform/vsp1/vsp1_hsit.c   | 10 +++---
 drivers/media/platform/vsp1/vsp1_lif.c    | 15 ++++-----
 drivers/media/platform/vsp1/vsp1_lut.c    | 21 ++++++------
 drivers/media/platform/vsp1/vsp1_pipe.c   |  4 +-
 drivers/media/platform/vsp1/vsp1_pipe.h   |  3 +-
 drivers/media/platform/vsp1/vsp1_rpf.c    | 39 +++++++++++-----------
 drivers/media/platform/vsp1/vsp1_sru.c    | 14 ++++----
 drivers/media/platform/vsp1/vsp1_uds.c    | 24 +++++++-------
 drivers/media/platform/vsp1/vsp1_uds.h    |  2 +-
 drivers/media/platform/vsp1/vsp1_video.c  | 11 ++++--
 drivers/media/platform/vsp1/vsp1_wpf.c    | 42 ++++++++++++------------
 20 files changed, 172 insertions(+), 151 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_bru.c b/drivers/media/platform/vsp1/vsp1_bru.c
index d6fd265eaccb..7b9cf78b4be8 100644
--- a/drivers/media/platform/vsp1/vsp1_bru.c
+++ b/drivers/media/platform/vsp1/vsp1_bru.c
@@ -30,10 +30,10 @@
  * Device Access
  */
 
-static inline void vsp1_bru_write(struct vsp1_bru *bru, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_bru_write(struct vsp1_bru *bru,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, bru->base + reg, data);
+	vsp1_dl_body_write(dlb, bru->base + reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -287,7 +287,7 @@ static const struct v4l2_subdev_ops bru_ops = {
 
 static void bru_configure_stream(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
-				 struct vsp1_dl_list *dl)
+				 struct vsp1_dl_body *dlb)
 {
 	struct vsp1_bru *bru = to_bru(&entity->subdev);
 	struct v4l2_mbus_framefmt *format;
@@ -309,7 +309,7 @@ static void bru_configure_stream(struct vsp1_entity *entity,
 	 * format at the pipeline output is premultiplied.
 	 */
 	flags = pipe->output ? pipe->output->format.flags : 0;
-	vsp1_bru_write(bru, dl, VI6_BRU_INCTRL,
+	vsp1_bru_write(bru, dlb, VI6_BRU_INCTRL,
 		       flags & V4L2_PIX_FMT_FLAG_PREMUL_ALPHA ?
 		       0 : VI6_BRU_INCTRL_NRM);
 
@@ -317,12 +317,12 @@ static void bru_configure_stream(struct vsp1_entity *entity,
 	 * Set the background position to cover the whole output image and
 	 * configure its color.
 	 */
-	vsp1_bru_write(bru, dl, VI6_BRU_VIRRPF_SIZE,
+	vsp1_bru_write(bru, dlb, VI6_BRU_VIRRPF_SIZE,
 		       (format->width << VI6_BRU_VIRRPF_SIZE_HSIZE_SHIFT) |
 		       (format->height << VI6_BRU_VIRRPF_SIZE_VSIZE_SHIFT));
-	vsp1_bru_write(bru, dl, VI6_BRU_VIRRPF_LOC, 0);
+	vsp1_bru_write(bru, dlb, VI6_BRU_VIRRPF_LOC, 0);
 
-	vsp1_bru_write(bru, dl, VI6_BRU_VIRRPF_COL, bru->bgcolor |
+	vsp1_bru_write(bru, dlb, VI6_BRU_VIRRPF_COL, bru->bgcolor |
 		       (0xff << VI6_BRU_VIRRPF_COL_A_SHIFT));
 
 	/*
@@ -332,7 +332,7 @@ static void bru_configure_stream(struct vsp1_entity *entity,
 	 * unit.
 	 */
 	if (entity->type == VSP1_ENTITY_BRU)
-		vsp1_bru_write(bru, dl, VI6_BRU_ROP,
+		vsp1_bru_write(bru, dlb, VI6_BRU_ROP,
 			       VI6_BRU_ROP_DSTSEL_BRUIN(1) |
 			       VI6_BRU_ROP_CROP(VI6_ROP_NOP) |
 			       VI6_BRU_ROP_AROP(VI6_ROP_NOP));
@@ -374,7 +374,7 @@ static void bru_configure_stream(struct vsp1_entity *entity,
 		if (!(entity->type == VSP1_ENTITY_BRU && i == 1))
 			ctrl |= VI6_BRU_CTRL_SRCSEL_BRUIN(i);
 
-		vsp1_bru_write(bru, dl, VI6_BRU_CTRL(i), ctrl);
+		vsp1_bru_write(bru, dlb, VI6_BRU_CTRL(i), ctrl);
 
 		/*
 		 * Harcode the blending formula to
@@ -389,7 +389,7 @@ static void bru_configure_stream(struct vsp1_entity *entity,
 		 *
 		 * otherwise.
 		 */
-		vsp1_bru_write(bru, dl, VI6_BRU_BLD(i),
+		vsp1_bru_write(bru, dlb, VI6_BRU_BLD(i),
 			       VI6_BRU_BLD_CCMDX_255_SRC_A |
 			       (premultiplied ? VI6_BRU_BLD_CCMDY_COEFY :
 						VI6_BRU_BLD_CCMDY_SRC_A) |
diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
index b8d8af6d4910..d30f8ad4687c 100644
--- a/drivers/media/platform/vsp1/vsp1_clu.c
+++ b/drivers/media/platform/vsp1/vsp1_clu.c
@@ -29,10 +29,10 @@
  * Device Access
  */
 
-static inline void vsp1_clu_write(struct vsp1_clu *clu, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_clu_write(struct vsp1_clu *clu,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_body_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -215,7 +215,7 @@ static const struct v4l2_subdev_ops clu_ops = {
  */
 static void clu_configure_stream(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
-				 struct vsp1_dl_list *dl)
+				 struct vsp1_dl_body *dlb)
 {
 	struct vsp1_clu *clu = to_clu(&entity->subdev);
 
@@ -234,14 +234,14 @@ static void clu_configure_stream(struct vsp1_entity *entity,
 static void clu_configure_frame(struct vsp1_entity *entity,
 				struct vsp1_pipeline *pipe,
 				struct vsp1_dl_list *dl,
+				struct vsp1_dl_body *dlb,
 				unsigned int partition)
 {
 	struct vsp1_clu *clu = to_clu(&entity->subdev);
-	struct vsp1_dl_body *dlb;
+	struct vsp1_dl_body *clu_dlb;
 	unsigned long flags;
 	u32 ctrl = VI6_CLU_CTRL_AAI | VI6_CLU_CTRL_MVS | VI6_CLU_CTRL_EN;
 
-
 	if (partition == 0) {
 		/* 2D mode can only be used with the YCbCr pixel encoding. */
 		if (clu->mode == V4L2_CID_VSP1_CLU_MODE_2D && clu->yuv_mode)
@@ -249,18 +249,18 @@ static void clu_configure_frame(struct vsp1_entity *entity,
 			     |  VI6_CLU_CTRL_OS0_2D | VI6_CLU_CTRL_OS1_2D
 			     |  VI6_CLU_CTRL_OS2_2D | VI6_CLU_CTRL_M2D;
 
-		vsp1_clu_write(clu, dl, VI6_CLU_CTRL, ctrl);
+		vsp1_clu_write(clu, dlb, VI6_CLU_CTRL, ctrl);
 
 		spin_lock_irqsave(&clu->lock, flags);
-		dlb = clu->clu;
+		clu_dlb = clu->clu;
 		clu->clu = NULL;
 		spin_unlock_irqrestore(&clu->lock, flags);
 
-		if (dlb) {
-			vsp1_dl_list_add_body(dl, dlb);
+		if (clu_dlb) {
+			vsp1_dl_list_add_body(dl, clu_dlb);
 
 			/* release our local reference */
-			vsp1_dl_body_put(dlb);
+			vsp1_dl_body_put(clu_dlb);
 		}
 	}
 }
diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
index 134865287c02..37e2c984fbf3 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.c
+++ b/drivers/media/platform/vsp1/vsp1_dl.c
@@ -449,17 +449,15 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl)
 }
 
 /**
- * vsp1_dl_list_write - Write a register to the display list
+ * vsp1_dl_list_get_body0 - Obtain the default body for the display list
  * @dl: The display list
- * @reg: The register address
- * @data: The register value
  *
- * Write the given register and value to the display list. Up to 256 registers
- * can be written per display list.
+ * Obtain a pointer to the internal display list body allowing this to be passed
+ * directly to configure operations.
  */
-void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
+struct vsp1_dl_body *vsp1_dl_list_get_body0(struct vsp1_dl_list *dl)
 {
-	vsp1_dl_body_write(dl->body0, reg, data);
+	return dl->body0;
 }
 
 /**
diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
index f45083251644..5ad2cec5cad9 100644
--- a/drivers/media/platform/vsp1/vsp1_dl.h
+++ b/drivers/media/platform/vsp1/vsp1_dl.h
@@ -32,7 +32,7 @@ bool vsp1_dlm_irq_frame_end(struct vsp1_dl_manager *dlm);
 
 struct vsp1_dl_list *vsp1_dl_list_get(struct vsp1_dl_manager *dlm);
 void vsp1_dl_list_put(struct vsp1_dl_list *dl);
-void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data);
+struct vsp1_dl_body *vsp1_dl_list_get_body0(struct vsp1_dl_list *dl);
 void vsp1_dl_list_commit(struct vsp1_dl_list *dl);
 
 struct vsp1_dl_body_pool *
diff --git a/drivers/media/platform/vsp1/vsp1_drm.c b/drivers/media/platform/vsp1/vsp1_drm.c
index 12db55d43d01..3c8b1952799d 100644
--- a/drivers/media/platform/vsp1/vsp1_drm.c
+++ b/drivers/media/platform/vsp1/vsp1_drm.c
@@ -88,6 +88,7 @@ int vsp1_du_setup_lif(struct device *dev, unsigned int pipe_index,
 	struct vsp1_entity *entity;
 	struct vsp1_entity *next;
 	struct vsp1_dl_list *dl;
+	struct vsp1_dl_body *dlb;
 	struct v4l2_subdev_format format;
 	unsigned long flags;
 	unsigned int i;
@@ -255,12 +256,13 @@ int vsp1_du_setup_lif(struct device *dev, unsigned int pipe_index,
 
 	/* Configure all entities in the pipeline. */
 	dl = vsp1_dl_list_get(pipe->output->dlm);
+	dlb = vsp1_dl_list_get_body0(dl);
 
 	list_for_each_entry_safe(entity, next, &pipe->entities, list_pipe) {
-		vsp1_entity_route_setup(entity, pipe, dl);
+		vsp1_entity_route_setup(entity, pipe, dlb);
 
-		vsp1_entity_configure_stream(entity, pipe, dl);
-		vsp1_entity_configure_frame(entity, pipe, dl, 0);
+		vsp1_entity_configure_stream(entity, pipe, dlb);
+		vsp1_entity_configure_frame(entity, pipe, dl, dlb, 0);
 	}
 
 	vsp1_dl_list_commit(dl);
@@ -506,12 +508,16 @@ void vsp1_du_atomic_flush(struct device *dev, unsigned int pipe_index)
 	struct vsp1_entity *entity;
 	struct vsp1_entity *next;
 	struct vsp1_dl_list *dl;
+	struct vsp1_dl_body *dlb;
 	unsigned int i;
 	int ret;
 
 	/* Prepare the display list. */
 	dl = vsp1_dl_list_get(pipe->output->dlm);
 
+	/* Retrieve the default DLB from the list */
+	dlb = vsp1_dl_list_get_body0(dl);
+
 	/* Count the number of enabled inputs and sort them by Z-order. */
 	pipe->num_inputs = 0;
 
@@ -573,7 +579,7 @@ void vsp1_du_atomic_flush(struct device *dev, unsigned int pipe_index)
 		/* Disconnect unused RPFs from the pipeline. */
 		if (entity->type == VSP1_ENTITY_RPF &&
 		    !pipe->inputs[entity->index]) {
-			vsp1_dl_list_write(dl, entity->route->reg,
+			vsp1_dl_body_write(dlb, entity->route->reg,
 					   VI6_DPR_NODE_UNUSED);
 
 			list_del_init(&entity->list_pipe);
@@ -581,9 +587,9 @@ void vsp1_du_atomic_flush(struct device *dev, unsigned int pipe_index)
 			continue;
 		}
 
-		vsp1_entity_route_setup(entity, pipe, dl);
-		vsp1_entity_configure_stream(entity, pipe, dl);
-		vsp1_entity_configure_frame(entity, pipe, dl, 0);
+		vsp1_entity_route_setup(entity, pipe, dlb);
+		vsp1_entity_configure_stream(entity, pipe, dlb);
+		vsp1_entity_configure_frame(entity, pipe, dl, dlb, 0);
 	}
 
 	vsp1_dl_list_commit(dl);
diff --git a/drivers/media/platform/vsp1/vsp1_entity.c b/drivers/media/platform/vsp1/vsp1_entity.c
index 472284b638bc..185ca770deb7 100644
--- a/drivers/media/platform/vsp1/vsp1_entity.c
+++ b/drivers/media/platform/vsp1/vsp1_entity.c
@@ -26,7 +26,7 @@
 
 void vsp1_entity_route_setup(struct vsp1_entity *entity,
 			     struct vsp1_pipeline *pipe,
-			     struct vsp1_dl_list *dl)
+			     struct vsp1_dl_body *dlb)
 {
 	struct vsp1_entity *source;
 	u32 route;
@@ -42,7 +42,7 @@ void vsp1_entity_route_setup(struct vsp1_entity *entity,
 		smppt = (pipe->output->entity.index << VI6_DPR_SMPPT_TGW_SHIFT)
 		      | (source->route->output << VI6_DPR_SMPPT_PT_SHIFT);
 
-		vsp1_dl_list_write(dl, VI6_DPR_HGO_SMPPT, smppt);
+		vsp1_dl_body_write(dlb, VI6_DPR_HGO_SMPPT, smppt);
 		return;
 	} else if (entity->type == VSP1_ENTITY_HGT) {
 		u32 smppt;
@@ -55,7 +55,7 @@ void vsp1_entity_route_setup(struct vsp1_entity *entity,
 		smppt = (pipe->output->entity.index << VI6_DPR_SMPPT_TGW_SHIFT)
 		      | (source->route->output << VI6_DPR_SMPPT_PT_SHIFT);
 
-		vsp1_dl_list_write(dl, VI6_DPR_HGT_SMPPT, smppt);
+		vsp1_dl_body_write(dlb, VI6_DPR_HGT_SMPPT, smppt);
 		return;
 	}
 
@@ -70,24 +70,25 @@ void vsp1_entity_route_setup(struct vsp1_entity *entity,
 	 */
 	if (source->type == VSP1_ENTITY_BRS)
 		route |= VI6_DPR_ROUTE_BRSSEL;
-	vsp1_dl_list_write(dl, source->route->reg, route);
+	vsp1_dl_body_write(dlb, source->route->reg, route);
 }
 
 void vsp1_entity_configure_stream(struct vsp1_entity *entity,
 				  struct vsp1_pipeline *pipe,
-				  struct vsp1_dl_list *dl)
+				  struct vsp1_dl_body *dlb)
 {
 	if (entity->ops->configure_stream)
-		entity->ops->configure_stream(entity, pipe, dl);
+		entity->ops->configure_stream(entity, pipe, dlb);
 }
 
 void vsp1_entity_configure_frame(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
 				 struct vsp1_dl_list *dl,
+				 struct vsp1_dl_body *dlb,
 				 unsigned int partition)
 {
 	if (entity->ops->configure_frame)
-		entity->ops->configure_frame(entity, pipe, dl, partition);
+		entity->ops->configure_frame(entity, pipe, dl, dlb, partition);
 }
 
 /* -----------------------------------------------------------------------------
diff --git a/drivers/media/platform/vsp1/vsp1_entity.h b/drivers/media/platform/vsp1/vsp1_entity.h
index b44ed5414fc3..4a3602d3919b 100644
--- a/drivers/media/platform/vsp1/vsp1_entity.h
+++ b/drivers/media/platform/vsp1/vsp1_entity.h
@@ -19,6 +19,7 @@
 #include <media/v4l2-subdev.h>
 
 struct vsp1_device;
+struct vsp1_dl_body;
 struct vsp1_dl_list;
 struct vsp1_pipeline;
 struct vsp1_partition;
@@ -80,9 +81,10 @@ struct vsp1_route {
 struct vsp1_entity_operations {
 	void (*destroy)(struct vsp1_entity *);
 	void (*configure_stream)(struct vsp1_entity *, struct vsp1_pipeline *,
-				 struct vsp1_dl_list *);
+				 struct vsp1_dl_body *);
 	void (*configure_frame)(struct vsp1_entity *, struct vsp1_pipeline *,
-				struct vsp1_dl_list *, unsigned int partition);
+				struct vsp1_dl_list *, struct vsp1_dl_body *,
+				unsigned int partition);
 	unsigned int (*max_width)(struct vsp1_entity *, struct vsp1_pipeline *);
 	void (*partition)(struct vsp1_entity *, struct vsp1_pipeline *,
 			  struct vsp1_partition *, unsigned int,
@@ -147,15 +149,16 @@ int vsp1_entity_init_cfg(struct v4l2_subdev *subdev,
 
 void vsp1_entity_route_setup(struct vsp1_entity *entity,
 			     struct vsp1_pipeline *pipe,
-			     struct vsp1_dl_list *dl);
+			     struct vsp1_dl_body *dlb);
 
 void vsp1_entity_configure_stream(struct vsp1_entity *entity,
 				  struct vsp1_pipeline *pipe,
-				  struct vsp1_dl_list *dl);
+				  struct vsp1_dl_body *dlb);
 
 void vsp1_entity_configure_frame(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
 				 struct vsp1_dl_list *dl,
+				 struct vsp1_dl_body *dlb,
 				 unsigned int partition);
 
 struct media_pad *vsp1_entity_remote_pad(struct media_pad *pad);
diff --git a/drivers/media/platform/vsp1/vsp1_hgo.c b/drivers/media/platform/vsp1/vsp1_hgo.c
index ddba9f83ac7d..ec8e6d8c47e8 100644
--- a/drivers/media/platform/vsp1/vsp1_hgo.c
+++ b/drivers/media/platform/vsp1/vsp1_hgo.c
@@ -32,10 +32,10 @@ static inline u32 vsp1_hgo_read(struct vsp1_hgo *hgo, u32 reg)
 	return vsp1_read(hgo->histo.entity.vsp1, reg);
 }
 
-static inline void vsp1_hgo_write(struct vsp1_hgo *hgo, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_hgo_write(struct vsp1_hgo *hgo,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_body_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -135,7 +135,7 @@ static const struct v4l2_ctrl_config hgo_num_bins_control = {
 
 static void hgo_configure_stream(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
-				 struct vsp1_dl_list *dl)
+				 struct vsp1_dl_body *dlb)
 {
 	struct vsp1_hgo *hgo = to_hgo(&entity->subdev);
 	struct v4l2_rect *compose;
@@ -149,12 +149,12 @@ static void hgo_configure_stream(struct vsp1_entity *entity,
 						HISTO_PAD_SINK,
 						V4L2_SEL_TGT_COMPOSE);
 
-	vsp1_hgo_write(hgo, dl, VI6_HGO_REGRST, VI6_HGO_REGRST_RCLEA);
+	vsp1_hgo_write(hgo, dlb, VI6_HGO_REGRST, VI6_HGO_REGRST_RCLEA);
 
-	vsp1_hgo_write(hgo, dl, VI6_HGO_OFFSET,
+	vsp1_hgo_write(hgo, dlb, VI6_HGO_OFFSET,
 		       (crop->left << VI6_HGO_OFFSET_HOFFSET_SHIFT) |
 		       (crop->top << VI6_HGO_OFFSET_VOFFSET_SHIFT));
-	vsp1_hgo_write(hgo, dl, VI6_HGO_SIZE,
+	vsp1_hgo_write(hgo, dlb, VI6_HGO_SIZE,
 		       (crop->width << VI6_HGO_SIZE_HSIZE_SHIFT) |
 		       (crop->height << VI6_HGO_SIZE_VSIZE_SHIFT));
 
@@ -166,7 +166,7 @@ static void hgo_configure_stream(struct vsp1_entity *entity,
 
 	hratio = crop->width * 2 / compose->width / 3;
 	vratio = crop->height * 2 / compose->height / 3;
-	vsp1_hgo_write(hgo, dl, VI6_HGO_MODE,
+	vsp1_hgo_write(hgo, dlb, VI6_HGO_MODE,
 		       (hgo->num_bins == 256 ? VI6_HGO_MODE_STEP : 0) |
 		       (hgo->max_rgb ? VI6_HGO_MODE_MAXRGB : 0) |
 		       (hratio << VI6_HGO_MODE_HRATIO_SHIFT) |
diff --git a/drivers/media/platform/vsp1/vsp1_hgt.c b/drivers/media/platform/vsp1/vsp1_hgt.c
index c7cde8e90029..e780357f979b 100644
--- a/drivers/media/platform/vsp1/vsp1_hgt.c
+++ b/drivers/media/platform/vsp1/vsp1_hgt.c
@@ -32,10 +32,10 @@ static inline u32 vsp1_hgt_read(struct vsp1_hgt *hgt, u32 reg)
 	return vsp1_read(hgt->histo.entity.vsp1, reg);
 }
 
-static inline void vsp1_hgt_write(struct vsp1_hgt *hgt, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_hgt_write(struct vsp1_hgt *hgt,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_body_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -131,7 +131,7 @@ static const struct v4l2_ctrl_config hgt_hue_areas = {
 
 static void hgt_configure_stream(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
-				 struct vsp1_dl_list *dl)
+				 struct vsp1_dl_body *dlb)
 {
 	struct vsp1_hgt *hgt = to_hgt(&entity->subdev);
 	struct v4l2_rect *compose;
@@ -148,12 +148,12 @@ static void hgt_configure_stream(struct vsp1_entity *entity,
 						HISTO_PAD_SINK,
 						V4L2_SEL_TGT_COMPOSE);
 
-	vsp1_hgt_write(hgt, dl, VI6_HGT_REGRST, VI6_HGT_REGRST_RCLEA);
+	vsp1_hgt_write(hgt, dlb, VI6_HGT_REGRST, VI6_HGT_REGRST_RCLEA);
 
-	vsp1_hgt_write(hgt, dl, VI6_HGT_OFFSET,
+	vsp1_hgt_write(hgt, dlb, VI6_HGT_OFFSET,
 		       (crop->left << VI6_HGT_OFFSET_HOFFSET_SHIFT) |
 		       (crop->top << VI6_HGT_OFFSET_VOFFSET_SHIFT));
-	vsp1_hgt_write(hgt, dl, VI6_HGT_SIZE,
+	vsp1_hgt_write(hgt, dlb, VI6_HGT_SIZE,
 		       (crop->width << VI6_HGT_SIZE_HSIZE_SHIFT) |
 		       (crop->height << VI6_HGT_SIZE_VSIZE_SHIFT));
 
@@ -161,7 +161,7 @@ static void hgt_configure_stream(struct vsp1_entity *entity,
 	for (i = 0; i < HGT_NUM_HUE_AREAS; ++i) {
 		lower = hgt->hue_areas[i*2 + 0];
 		upper = hgt->hue_areas[i*2 + 1];
-		vsp1_hgt_write(hgt, dl, VI6_HGT_HUE_AREA(i),
+		vsp1_hgt_write(hgt, dlb, VI6_HGT_HUE_AREA(i),
 			       (lower << VI6_HGT_HUE_AREA_LOWER_SHIFT) |
 			       (upper << VI6_HGT_HUE_AREA_UPPER_SHIFT));
 	}
@@ -169,7 +169,7 @@ static void hgt_configure_stream(struct vsp1_entity *entity,
 
 	hratio = crop->width * 2 / compose->width / 3;
 	vratio = crop->height * 2 / compose->height / 3;
-	vsp1_hgt_write(hgt, dl, VI6_HGT_MODE,
+	vsp1_hgt_write(hgt, dlb, VI6_HGT_MODE,
 		       (hratio << VI6_HGT_MODE_HRATIO_SHIFT) |
 		       (vratio << VI6_HGT_MODE_VRATIO_SHIFT));
 }
diff --git a/drivers/media/platform/vsp1/vsp1_hsit.c b/drivers/media/platform/vsp1/vsp1_hsit.c
index 0452f99592f8..b33e437b88b1 100644
--- a/drivers/media/platform/vsp1/vsp1_hsit.c
+++ b/drivers/media/platform/vsp1/vsp1_hsit.c
@@ -28,9 +28,9 @@
  */
 
 static inline void vsp1_hsit_write(struct vsp1_hsit *hsit,
-				   struct vsp1_dl_list *dl, u32 reg, u32 data)
+				   struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_body_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -133,14 +133,14 @@ static const struct v4l2_subdev_ops hsit_ops = {
 
 static void hsit_configure_stream(struct vsp1_entity *entity,
 				  struct vsp1_pipeline *pipe,
-				  struct vsp1_dl_list *dl)
+				  struct vsp1_dl_body *dlb)
 {
 	struct vsp1_hsit *hsit = to_hsit(&entity->subdev);
 
 	if (hsit->inverse)
-		vsp1_hsit_write(hsit, dl, VI6_HSI_CTRL, VI6_HSI_CTRL_EN);
+		vsp1_hsit_write(hsit, dlb, VI6_HSI_CTRL, VI6_HSI_CTRL_EN);
 	else
-		vsp1_hsit_write(hsit, dl, VI6_HST_CTRL, VI6_HST_CTRL_EN);
+		vsp1_hsit_write(hsit, dlb, VI6_HST_CTRL, VI6_HST_CTRL_EN);
 }
 
 static const struct vsp1_entity_operations hsit_entity_ops = {
diff --git a/drivers/media/platform/vsp1/vsp1_lif.c b/drivers/media/platform/vsp1/vsp1_lif.c
index 9d6a77586285..d9309133529d 100644
--- a/drivers/media/platform/vsp1/vsp1_lif.c
+++ b/drivers/media/platform/vsp1/vsp1_lif.c
@@ -27,10 +27,11 @@
  * Device Access
  */
 
-static inline void vsp1_lif_write(struct vsp1_lif *lif, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_lif_write(struct vsp1_lif *lif,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg + lif->entity.index * VI6_LIF_OFFSET, data);
+	vsp1_dl_body_write(dlb, reg + lif->entity.index * VI6_LIF_OFFSET,
+			       data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -130,7 +131,7 @@ static const struct v4l2_subdev_ops lif_ops = {
 
 static void lif_configure_stream(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
-				 struct vsp1_dl_list *dl)
+				 struct vsp1_dl_body *dlb)
 {
 	const struct v4l2_mbus_framefmt *format;
 	struct vsp1_lif *lif = to_lif(&entity->subdev);
@@ -143,11 +144,11 @@ static void lif_configure_stream(struct vsp1_entity *entity,
 
 	obth = min(obth, (format->width + 1) / 2 * format->height - 4);
 
-	vsp1_lif_write(lif, dl, VI6_LIF_CSBTH,
+	vsp1_lif_write(lif, dlb, VI6_LIF_CSBTH,
 			(hbth << VI6_LIF_CSBTH_HBTH_SHIFT) |
 			(lbth << VI6_LIF_CSBTH_LBTH_SHIFT));
 
-	vsp1_lif_write(lif, dl, VI6_LIF_CTRL,
+	vsp1_lif_write(lif, dlb, VI6_LIF_CTRL,
 			(obth << VI6_LIF_CTRL_OBTH_SHIFT) |
 			(format->code == 0 ? VI6_LIF_CTRL_CFMT : 0) |
 			VI6_LIF_CTRL_REQSEL | VI6_LIF_CTRL_LIF_EN);
@@ -160,7 +161,7 @@ static void lif_configure_stream(struct vsp1_entity *entity,
 	 */
 	if ((entity->vsp1->version & VI6_IP_VERSION_MASK) ==
 	    (VI6_IP_VERSION_MODEL_VSPD_V3 | VI6_IP_VERSION_SOC_V3M))
-		vsp1_lif_write(lif, dl, VI6_LIF_LBA,
+		vsp1_lif_write(lif, dlb, VI6_LIF_LBA,
 			       VI6_LIF_LBA_LBA0 |
 			       (1536 << VI6_LIF_LBA_LBA1_SHIFT));
 }
diff --git a/drivers/media/platform/vsp1/vsp1_lut.c b/drivers/media/platform/vsp1/vsp1_lut.c
index 6d160aabb185..ec07d5b0a0c0 100644
--- a/drivers/media/platform/vsp1/vsp1_lut.c
+++ b/drivers/media/platform/vsp1/vsp1_lut.c
@@ -29,10 +29,10 @@
  * Device Access
  */
 
-static inline void vsp1_lut_write(struct vsp1_lut *lut, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_lut_write(struct vsp1_lut *lut,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_body_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -192,33 +192,34 @@ static const struct v4l2_subdev_ops lut_ops = {
 
 static void lut_configure_stream(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
-				 struct vsp1_dl_list *dl)
+				 struct vsp1_dl_body *dlb)
 {
 	struct vsp1_lut *lut = to_lut(&entity->subdev);
 
-	vsp1_lut_write(lut, dl, VI6_LUT_CTRL, VI6_LUT_CTRL_EN);
+	vsp1_lut_write(lut, dlb, VI6_LUT_CTRL, VI6_LUT_CTRL_EN);
 }
 
 static void lut_configure_frame(struct vsp1_entity *entity,
 				struct vsp1_pipeline *pipe,
 				struct vsp1_dl_list *dl,
+				struct vsp1_dl_body *dlb,
 				unsigned int partition)
 {
 	struct vsp1_lut *lut = to_lut(&entity->subdev);
-	struct vsp1_dl_body *dlb;
+	struct vsp1_dl_body *lut_dlb;
 	unsigned long flags;
 
 	if (partition == 0) {
 		spin_lock_irqsave(&lut->lock, flags);
-		dlb = lut->lut;
+		lut_dlb = lut->lut;
 		lut->lut = NULL;
 		spin_unlock_irqrestore(&lut->lock, flags);
 
-		if (dlb) {
-			vsp1_dl_list_add_body(dl, dlb);
+		if (lut_dlb) {
+			vsp1_dl_list_add_body(dl, lut_dlb);
 
 			/* release our local reference */
-			vsp1_dl_body_put(dlb);
+			vsp1_dl_body_put(lut_dlb);
 		}
 	}
 }
diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c b/drivers/media/platform/vsp1/vsp1_pipe.c
index 44944ac86d9b..5012643583b6 100644
--- a/drivers/media/platform/vsp1/vsp1_pipe.c
+++ b/drivers/media/platform/vsp1/vsp1_pipe.c
@@ -367,7 +367,7 @@ void vsp1_pipeline_frame_end(struct vsp1_pipeline *pipe)
  * from the input RPF alpha.
  */
 void vsp1_pipeline_propagate_alpha(struct vsp1_pipeline *pipe,
-				   struct vsp1_dl_list *dl, unsigned int alpha)
+				   struct vsp1_dl_body *dlb, unsigned int alpha)
 {
 	if (!pipe->uds)
 		return;
@@ -380,7 +380,7 @@ void vsp1_pipeline_propagate_alpha(struct vsp1_pipeline *pipe,
 	    pipe->uds_input->type == VSP1_ENTITY_BRS)
 		alpha = 255;
 
-	vsp1_uds_set_alpha(pipe->uds, dl, alpha);
+	vsp1_uds_set_alpha(pipe->uds, dlb, alpha);
 }
 
 /*
diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h b/drivers/media/platform/vsp1/vsp1_pipe.h
index dfff9b5685fe..90d29492b9b9 100644
--- a/drivers/media/platform/vsp1/vsp1_pipe.h
+++ b/drivers/media/platform/vsp1/vsp1_pipe.h
@@ -161,7 +161,8 @@ bool vsp1_pipeline_ready(struct vsp1_pipeline *pipe);
 void vsp1_pipeline_frame_end(struct vsp1_pipeline *pipe);
 
 void vsp1_pipeline_propagate_alpha(struct vsp1_pipeline *pipe,
-				   struct vsp1_dl_list *dl, unsigned int alpha);
+				   struct vsp1_dl_body *dlb,
+				   unsigned int alpha);
 
 void vsp1_pipeline_propagate_partition(struct vsp1_pipeline *pipe,
 				       struct vsp1_partition *partition,
diff --git a/drivers/media/platform/vsp1/vsp1_rpf.c b/drivers/media/platform/vsp1/vsp1_rpf.c
index 48c65e4a8546..67f2fb3e0611 100644
--- a/drivers/media/platform/vsp1/vsp1_rpf.c
+++ b/drivers/media/platform/vsp1/vsp1_rpf.c
@@ -29,9 +29,10 @@
  */
 
 static inline void vsp1_rpf_write(struct vsp1_rwpf *rpf,
-				  struct vsp1_dl_list *dl, u32 reg, u32 data)
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg + rpf->entity.index * VI6_RPF_OFFSET, data);
+	vsp1_dl_body_write(dlb, reg + rpf->entity.index * VI6_RPF_OFFSET,
+			       data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -48,7 +49,7 @@ static const struct v4l2_subdev_ops rpf_ops = {
 
 static void rpf_configure_stream(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
-				 struct vsp1_dl_list *dl)
+				 struct vsp1_dl_body *dlb)
 {
 	struct vsp1_rwpf *rpf = to_rwpf(&entity->subdev);
 	const struct vsp1_format_info *fmtinfo = rpf->fmtinfo;
@@ -67,7 +68,7 @@ static void rpf_configure_stream(struct vsp1_entity *entity,
 		pstride |= format->plane_fmt[1].bytesperline
 			<< VI6_RPF_SRCM_PSTRIDE_C_SHIFT;
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_PSTRIDE, pstride);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRCM_PSTRIDE, pstride);
 
 	/* Format */
 	sink_format = vsp1_entity_get_pad_format(&rpf->entity,
@@ -88,8 +89,8 @@ static void rpf_configure_stream(struct vsp1_entity *entity,
 	if (sink_format->code != source_format->code)
 		infmt |= VI6_RPF_INFMT_CSC;
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_INFMT, infmt);
-	vsp1_rpf_write(rpf, dl, VI6_RPF_DSWAP, fmtinfo->swap);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_INFMT, infmt);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_DSWAP, fmtinfo->swap);
 
 	/* Output location */
 	if (pipe->bru) {
@@ -103,7 +104,7 @@ static void rpf_configure_stream(struct vsp1_entity *entity,
 		top = compose->top;
 	}
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_LOC,
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_LOC,
 		       (left << VI6_RPF_LOC_HCOORD_SHIFT) |
 		       (top << VI6_RPF_LOC_VCOORD_SHIFT));
 
@@ -130,7 +131,7 @@ static void rpf_configure_stream(struct vsp1_entity *entity,
 	 *
 	 * In all cases, disable color keying.
 	 */
-	vsp1_rpf_write(rpf, dl, VI6_RPF_ALPH_SEL, VI6_RPF_ALPH_SEL_AEXT_EXT |
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_ALPH_SEL, VI6_RPF_ALPH_SEL_AEXT_EXT |
 		       (fmtinfo->alpha ? VI6_RPF_ALPH_SEL_ASEL_PACKED
 				       : VI6_RPF_ALPH_SEL_ASEL_FIXED));
 
@@ -167,14 +168,15 @@ static void rpf_configure_stream(struct vsp1_entity *entity,
 		rpf->mult_alpha = mult;
 	}
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_MSK_CTRL, 0);
-	vsp1_rpf_write(rpf, dl, VI6_RPF_CKEY_CTRL, 0);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_MSK_CTRL, 0);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_CKEY_CTRL, 0);
 
 }
 
 static void rpf_configure_frame(struct vsp1_entity *entity,
 				struct vsp1_pipeline *pipe,
 				struct vsp1_dl_list *dl,
+				struct vsp1_dl_body *dlb,
 				unsigned int partition)
 {
 	struct vsp1_rwpf *rpf = to_rwpf(&entity->subdev);
@@ -185,15 +187,14 @@ static void rpf_configure_frame(struct vsp1_entity *entity,
 	struct v4l2_rect crop;
 
 	if (partition == 0) {
-		vsp1_rpf_write(rpf, dl, VI6_RPF_VRTCOL_SET,
+		vsp1_rpf_write(rpf, dlb, VI6_RPF_VRTCOL_SET,
 			       rpf->alpha << VI6_RPF_VRTCOL_SET_LAYA_SHIFT);
-		vsp1_rpf_write(rpf, dl, VI6_RPF_MULT_ALPHA, rpf->mult_alpha |
+		vsp1_rpf_write(rpf, dlb, VI6_RPF_MULT_ALPHA, rpf->mult_alpha |
 			       (rpf->alpha << VI6_RPF_MULT_ALPHA_RATIO_SHIFT));
 
-		vsp1_pipeline_propagate_alpha(pipe, dl, rpf->alpha);
+		vsp1_pipeline_propagate_alpha(pipe, dlb, rpf->alpha);
 	}
 
-
 	/*
 	 * Source size and crop offsets.
 	 *
@@ -219,10 +220,10 @@ static void rpf_configure_frame(struct vsp1_entity *entity,
 		crop.left += pipe->partition->rpf.left;
 	}
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_BSIZE,
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRC_BSIZE,
 		       (crop.width << VI6_RPF_SRC_BSIZE_BHSIZE_SHIFT) |
 		       (crop.height << VI6_RPF_SRC_BSIZE_BVSIZE_SHIFT));
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRC_ESIZE,
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRC_ESIZE,
 		       (crop.width << VI6_RPF_SRC_ESIZE_EHSIZE_SHIFT) |
 		       (crop.height << VI6_RPF_SRC_ESIZE_EVSIZE_SHIFT));
 
@@ -247,9 +248,9 @@ static void rpf_configure_frame(struct vsp1_entity *entity,
 	    fmtinfo->swap_uv)
 		swap(mem.addr[1], mem.addr[2]);
 
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_Y, mem.addr[0]);
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C0, mem.addr[1]);
-	vsp1_rpf_write(rpf, dl, VI6_RPF_SRCM_ADDR_C1, mem.addr[2]);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRCM_ADDR_Y, mem.addr[0]);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRCM_ADDR_C0, mem.addr[1]);
+	vsp1_rpf_write(rpf, dlb, VI6_RPF_SRCM_ADDR_C1, mem.addr[2]);
 }
 
 static void rpf_partition(struct vsp1_entity *entity,
diff --git a/drivers/media/platform/vsp1/vsp1_sru.c b/drivers/media/platform/vsp1/vsp1_sru.c
index 485b2820c8cd..7b6235655529 100644
--- a/drivers/media/platform/vsp1/vsp1_sru.c
+++ b/drivers/media/platform/vsp1/vsp1_sru.c
@@ -28,10 +28,10 @@
  * Device Access
  */
 
-static inline void vsp1_sru_write(struct vsp1_sru *sru, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_sru_write(struct vsp1_sru *sru,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg, data);
+	vsp1_dl_body_write(dlb, reg, data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -273,7 +273,7 @@ static const struct v4l2_subdev_ops sru_ops = {
 
 static void sru_configure_stream(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
-				 struct vsp1_dl_list *dl)
+				 struct vsp1_dl_body *dlb)
 {
 	const struct vsp1_sru_param *param;
 	struct vsp1_sru *sru = to_sru(&entity->subdev);
@@ -299,9 +299,9 @@ static void sru_configure_stream(struct vsp1_entity *entity,
 
 	ctrl0 |= param->ctrl0;
 
-	vsp1_sru_write(sru, dl, VI6_SRU_CTRL0, ctrl0);
-	vsp1_sru_write(sru, dl, VI6_SRU_CTRL1, VI6_SRU_CTRL1_PARAM5);
-	vsp1_sru_write(sru, dl, VI6_SRU_CTRL2, param->ctrl2);
+	vsp1_sru_write(sru, dlb, VI6_SRU_CTRL0, ctrl0);
+	vsp1_sru_write(sru, dlb, VI6_SRU_CTRL1, VI6_SRU_CTRL1_PARAM5);
+	vsp1_sru_write(sru, dlb, VI6_SRU_CTRL2, param->ctrl2);
 }
 
 static unsigned int sru_max_width(struct vsp1_entity *entity,
diff --git a/drivers/media/platform/vsp1/vsp1_uds.c b/drivers/media/platform/vsp1/vsp1_uds.c
index ce1731c2b3a9..6ddfce4bd095 100644
--- a/drivers/media/platform/vsp1/vsp1_uds.c
+++ b/drivers/media/platform/vsp1/vsp1_uds.c
@@ -31,22 +31,23 @@
  * Device Access
  */
 
-static inline void vsp1_uds_write(struct vsp1_uds *uds, struct vsp1_dl_list *dl,
-				  u32 reg, u32 data)
+static inline void vsp1_uds_write(struct vsp1_uds *uds,
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg + uds->entity.index * VI6_UDS_OFFSET, data);
+	vsp1_dl_body_write(dlb, reg + uds->entity.index * VI6_UDS_OFFSET,
+			       data);
 }
 
 /* -----------------------------------------------------------------------------
  * Scaling Computation
  */
 
-void vsp1_uds_set_alpha(struct vsp1_entity *entity, struct vsp1_dl_list *dl,
+void vsp1_uds_set_alpha(struct vsp1_entity *entity, struct vsp1_dl_body *dlb,
 			unsigned int alpha)
 {
 	struct vsp1_uds *uds = to_uds(&entity->subdev);
 
-	vsp1_uds_write(uds, dl, VI6_UDS_ALPVAL,
+	vsp1_uds_write(uds, dlb, VI6_UDS_ALPVAL,
 		       alpha << VI6_UDS_ALPVAL_VAL0_SHIFT);
 }
 
@@ -261,7 +262,7 @@ static const struct v4l2_subdev_ops uds_ops = {
 
 static void uds_configure_stream(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
-				 struct vsp1_dl_list *dl)
+				 struct vsp1_dl_body *dlb)
 {
 	struct vsp1_uds *uds = to_uds(&entity->subdev);
 	const struct v4l2_mbus_framefmt *output;
@@ -290,18 +291,18 @@ static void uds_configure_stream(struct vsp1_entity *entity,
 	else
 		multitap = true;
 
-	vsp1_uds_write(uds, dl, VI6_UDS_CTRL,
+	vsp1_uds_write(uds, dlb, VI6_UDS_CTRL,
 		       (uds->scale_alpha ? VI6_UDS_CTRL_AON : 0) |
 		       (multitap ? VI6_UDS_CTRL_BC : 0));
 
-	vsp1_uds_write(uds, dl, VI6_UDS_PASS_BWIDTH,
+	vsp1_uds_write(uds, dlb, VI6_UDS_PASS_BWIDTH,
 		       (uds_passband_width(hscale)
 				<< VI6_UDS_PASS_BWIDTH_H_SHIFT) |
 		       (uds_passband_width(vscale)
 				<< VI6_UDS_PASS_BWIDTH_V_SHIFT));
 
 	/* Set the scaling ratios. */
-	vsp1_uds_write(uds, dl, VI6_UDS_SCALE,
+	vsp1_uds_write(uds, dlb, VI6_UDS_SCALE,
 		       (hscale << VI6_UDS_SCALE_HFRAC_SHIFT) |
 		       (vscale << VI6_UDS_SCALE_VFRAC_SHIFT));
 }
@@ -309,6 +310,7 @@ static void uds_configure_stream(struct vsp1_entity *entity,
 static void uds_configure_frame(struct vsp1_entity *entity,
 				struct vsp1_pipeline *pipe,
 				struct vsp1_dl_list *dl,
+				struct vsp1_dl_body *dlb,
 				unsigned int pindex)
 {
 	struct vsp1_uds *uds = to_uds(&entity->subdev);
@@ -319,13 +321,13 @@ static void uds_configure_frame(struct vsp1_entity *entity,
 					    UDS_PAD_SOURCE);
 
 	/* Input size clipping */
-	vsp1_uds_write(uds, dl, VI6_UDS_HSZCLIP, VI6_UDS_HSZCLIP_HCEN |
+	vsp1_uds_write(uds, dlb, VI6_UDS_HSZCLIP, VI6_UDS_HSZCLIP_HCEN |
 		       (0 << VI6_UDS_HSZCLIP_HCL_OFST_SHIFT) |
 		       (partition->uds_sink.width
 				<< VI6_UDS_HSZCLIP_HCL_SIZE_SHIFT));
 
 	/* Output size clipping */
-	vsp1_uds_write(uds, dl, VI6_UDS_CLIP_SIZE,
+	vsp1_uds_write(uds, dlb, VI6_UDS_CLIP_SIZE,
 		       (partition->uds_source.width
 				<< VI6_UDS_CLIP_SIZE_HSIZE_SHIFT) |
 		       (output->height
diff --git a/drivers/media/platform/vsp1/vsp1_uds.h b/drivers/media/platform/vsp1/vsp1_uds.h
index 7bf3cdcffc65..d99997f3b28d 100644
--- a/drivers/media/platform/vsp1/vsp1_uds.h
+++ b/drivers/media/platform/vsp1/vsp1_uds.h
@@ -35,7 +35,7 @@ static inline struct vsp1_uds *to_uds(struct v4l2_subdev *subdev)
 
 struct vsp1_uds *vsp1_uds_create(struct vsp1_device *vsp1, unsigned int index);
 
-void vsp1_uds_set_alpha(struct vsp1_entity *uds, struct vsp1_dl_list *dl,
+void vsp1_uds_set_alpha(struct vsp1_entity *uds, struct vsp1_dl_body *dlb,
 			unsigned int alpha);
 
 #endif /* __VSP1_UDS_H__ */
diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
index 1b5a31734834..b47708660e53 100644
--- a/drivers/media/platform/vsp1/vsp1_video.c
+++ b/drivers/media/platform/vsp1/vsp1_video.c
@@ -383,11 +383,12 @@ static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
 					      unsigned int partition)
 {
 	struct vsp1_entity *entity;
+	struct vsp1_dl_body *dlb = vsp1_dl_list_get_body0(dl);
 
 	pipe->partition = &pipe->part_table[partition];
 
 	list_for_each_entry(entity, &pipe->entities, list_pipe)
-		vsp1_entity_configure_frame(entity, pipe, dl, partition);
+		vsp1_entity_configure_frame(entity, pipe, dl, dlb, partition);
 }
 
 static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
@@ -790,6 +791,7 @@ static void vsp1_video_buffer_queue(struct vb2_buffer *vb)
 static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 {
 	struct vsp1_entity *entity;
+	struct vsp1_dl_body *dlb;
 	int ret;
 
 	/* Determine this pipelines sizes for image partitioning support. */
@@ -802,6 +804,9 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 	if (!pipe->dl)
 		return -ENOMEM;
 
+	/* Retrieve the default DLB from the list */
+	dlb = vsp1_dl_list_get_body0(pipe->dl);
+
 	if (pipe->uds) {
 		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
 
@@ -824,8 +829,8 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 	}
 
 	list_for_each_entry(entity, &pipe->entities, list_pipe) {
-		vsp1_entity_route_setup(entity, pipe, pipe->dl);
-		vsp1_entity_configure_stream(entity, pipe, pipe->dl);
+		vsp1_entity_route_setup(entity, pipe, dlb);
+		vsp1_entity_configure_stream(entity, pipe, dlb);
 	}
 
 	return 0;
diff --git a/drivers/media/platform/vsp1/vsp1_wpf.c b/drivers/media/platform/vsp1/vsp1_wpf.c
index 6a6cdf0fb5f1..68218625549e 100644
--- a/drivers/media/platform/vsp1/vsp1_wpf.c
+++ b/drivers/media/platform/vsp1/vsp1_wpf.c
@@ -31,9 +31,10 @@
  */
 
 static inline void vsp1_wpf_write(struct vsp1_rwpf *wpf,
-				  struct vsp1_dl_list *dl, u32 reg, u32 data)
+				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
 {
-	vsp1_dl_list_write(dl, reg + wpf->entity.index * VI6_WPF_OFFSET, data);
+	vsp1_dl_body_write(dlb, reg + wpf->entity.index * VI6_WPF_OFFSET,
+			       data);
 }
 
 /* -----------------------------------------------------------------------------
@@ -238,7 +239,7 @@ static void vsp1_wpf_destroy(struct vsp1_entity *entity)
 
 static void wpf_configure_stream(struct vsp1_entity *entity,
 				 struct vsp1_pipeline *pipe,
-				 struct vsp1_dl_list *dl)
+				 struct vsp1_dl_body *dlb)
 {
 	struct vsp1_rwpf *wpf = to_rwpf(&entity->subdev);
 	struct vsp1_device *vsp1 = wpf->entity.vsp1;
@@ -272,17 +273,17 @@ static void wpf_configure_stream(struct vsp1_entity *entity,
 			outfmt |= VI6_WPF_OUTFMT_SPUVS;
 
 		/* Destination stride and byte swapping. */
-		vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_STRIDE_Y,
+		vsp1_wpf_write(wpf, dlb, VI6_WPF_DSTM_STRIDE_Y,
 			       format->plane_fmt[0].bytesperline);
 		if (format->num_planes > 1)
-			vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_STRIDE_C,
+			vsp1_wpf_write(wpf, dlb, VI6_WPF_DSTM_STRIDE_C,
 				       format->plane_fmt[1].bytesperline);
 
-		vsp1_wpf_write(wpf, dl, VI6_WPF_DSWAP, fmtinfo->swap);
+		vsp1_wpf_write(wpf, dlb, VI6_WPF_DSWAP, fmtinfo->swap);
 
 		if (vsp1->info->features & VSP1_HAS_WPF_HFLIP &&
 		    wpf->entity.index == 0)
-			vsp1_wpf_write(wpf, dl, VI6_WPF_ROT_CTRL,
+			vsp1_wpf_write(wpf, dlb, VI6_WPF_ROT_CTRL,
 				       VI6_WPF_ROT_CTRL_LN16 |
 				       (256 << VI6_WPF_ROT_CTRL_LMEM_WD_SHIFT));
 	}
@@ -292,10 +293,10 @@ static void wpf_configure_stream(struct vsp1_entity *entity,
 
 	wpf->outfmt = outfmt;
 
-	vsp1_dl_list_write(dl, VI6_DPR_WPF_FPORCH(wpf->entity.index),
-			   VI6_DPR_WPF_FPORCH_FP_WPFN);
+	vsp1_dl_body_write(dlb, VI6_DPR_WPF_FPORCH(wpf->entity.index),
+			       VI6_DPR_WPF_FPORCH_FP_WPFN);
 
-	vsp1_dl_list_write(dl, VI6_WPF_WRBCK_CTRL, 0);
+	vsp1_dl_body_write(dlb, VI6_WPF_WRBCK_CTRL, 0);
 
 	/*
 	 * Sources. If the pipeline has a single input and BRU is not used,
@@ -319,17 +320,18 @@ static void wpf_configure_stream(struct vsp1_entity *entity,
 			? VI6_WPF_SRCRPF_VIRACT_MST
 			: VI6_WPF_SRCRPF_VIRACT2_MST;
 
-	vsp1_wpf_write(wpf, dl, VI6_WPF_SRCRPF, srcrpf);
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_SRCRPF, srcrpf);
 
 	/* Enable interrupts */
-	vsp1_dl_list_write(dl, VI6_WPF_IRQ_STA(wpf->entity.index), 0);
-	vsp1_dl_list_write(dl, VI6_WPF_IRQ_ENB(wpf->entity.index),
-			   VI6_WFP_IRQ_ENB_DFEE);
+	vsp1_dl_body_write(dlb, VI6_WPF_IRQ_STA(wpf->entity.index), 0);
+	vsp1_dl_body_write(dlb, VI6_WPF_IRQ_ENB(wpf->entity.index),
+			       VI6_WFP_IRQ_ENB_DFEE);
 }
 
 static void wpf_configure_frame(struct vsp1_entity *entity,
 				struct vsp1_pipeline *pipe,
 				struct vsp1_dl_list *dl,
+				struct vsp1_dl_body *dlb,
 				unsigned int partition)
 {
 	struct vsp1_rwpf *wpf = to_rwpf(&entity->subdev);
@@ -363,7 +365,7 @@ static void wpf_configure_frame(struct vsp1_entity *entity,
 		if (wpf->flip.active & BIT(WPF_CTRL_HFLIP))
 			outfmt |= VI6_WPF_OUTFMT_HFLP;
 
-		vsp1_wpf_write(wpf, dl, VI6_WPF_OUTFMT, outfmt);
+		vsp1_wpf_write(wpf, dlb, VI6_WPF_OUTFMT, outfmt);
 	}
 
 	sink_format = vsp1_entity_get_pad_format(&wpf->entity,
@@ -379,10 +381,10 @@ static void wpf_configure_frame(struct vsp1_entity *entity,
 	if (pipe->partitions > 1)
 		width = pipe->partition->wpf.width;
 
-	vsp1_wpf_write(wpf, dl, VI6_WPF_HSZCLIP, VI6_WPF_SZCLIP_EN |
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_HSZCLIP, VI6_WPF_SZCLIP_EN |
 		       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
 		       (width << VI6_WPF_SZCLIP_SIZE_SHIFT));
-	vsp1_wpf_write(wpf, dl, VI6_WPF_VSZCLIP, VI6_WPF_SZCLIP_EN |
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_VSZCLIP, VI6_WPF_SZCLIP_EN |
 		       (0 << VI6_WPF_SZCLIP_OFST_SHIFT) |
 		       (height << VI6_WPF_SZCLIP_SIZE_SHIFT));
 
@@ -474,9 +476,9 @@ static void wpf_configure_frame(struct vsp1_entity *entity,
 	    fmtinfo->swap_uv)
 		swap(mem.addr[1], mem.addr[2]);
 
-	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_Y, mem.addr[0]);
-	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C0, mem.addr[1]);
-	vsp1_wpf_write(wpf, dl, VI6_WPF_DSTM_ADDR_C1, mem.addr[2]);
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_DSTM_ADDR_Y, mem.addr[0]);
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_DSTM_ADDR_C0, mem.addr[1]);
+	vsp1_wpf_write(wpf, dlb, VI6_WPF_DSTM_ADDR_C1, mem.addr[2]);
 }
 
 static unsigned int wpf_max_width(struct vsp1_entity *entity,
-- 
git-series 0.9.1


* [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb
  2018-03-08  0:05 [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
                   ` (6 preceding siblings ...)
  2018-03-08  0:05 ` [PATCH v7 7/8] media: vsp1: Adapt entities to configure into a body Kieran Bingham
@ 2018-03-08  0:05 ` Kieran Bingham
  2018-04-07  0:23   ` Laurent Pinchart
  2018-04-07  0:30 ` [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Laurent Pinchart
  8 siblings, 1 reply; 26+ messages in thread
From: Kieran Bingham @ 2018-03-08  0:05 UTC (permalink / raw)
  To: linux-media, linux-renesas-soc, Laurent Pinchart
  Cc: Kieran Bingham, Kieran Bingham

We are now able to configure a pipeline directly into a local display
list body. Take advantage of this fact, and create a cacheable body to
store the configuration of the pipeline in the video object.

vsp1_video_pipeline_run() is now the last user of the pipe->dl object.
Convert this function to use the cached video->pipe_config body and
obtain a local display list reference.

Attach the video->pipe_config body to the display list when needed
before committing to hardware.

The pipe object is marked as unconfigured when resuming from suspend.
This ensures that if the hardware was reset, our cached configuration
is re-attached to the next committed DL.

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
---

v3:
 - 's/fragment/body/', 's/fragments/bodies/'
 - video dlb cache allocation increased from 2 to 3 dlbs

Our video DL usage now looks like the below output:

dl->body0 contains our disposable runtime configuration. Max 41.
dl_child->body0 is our partition specific configuration. Max 12.
dl->bodies shows our constant configuration and LUTs.

  These two are LUT/CLU:
     * dl->bodies[x]->num_entries 256 / max 256
     * dl->bodies[x]->num_entries 4914 / max 4914

The output below shows that our 'constant' configuration cache
currently peaks at 64 of its 128 entries.

trace-cmd report | \
    grep max | sed 's/.*vsp1_dl_list_commit://g' | sort | uniq;

  dl->body0->num_entries 13 / max 128
  dl->body0->num_entries 14 / max 128
  dl->body0->num_entries 16 / max 128
  dl->body0->num_entries 20 / max 128
  dl->body0->num_entries 27 / max 128
  dl->body0->num_entries 34 / max 128
  dl->body0->num_entries 41 / max 128
  dl_child->body0->num_entries 10 / max 128
  dl_child->body0->num_entries 12 / max 128
  dl->bodies[x]->num_entries 15 / max 128
  dl->bodies[x]->num_entries 16 / max 128
  dl->bodies[x]->num_entries 17 / max 128
  dl->bodies[x]->num_entries 18 / max 128
  dl->bodies[x]->num_entries 20 / max 128
  dl->bodies[x]->num_entries 21 / max 128
  dl->bodies[x]->num_entries 256 / max 256
  dl->bodies[x]->num_entries 31 / max 128
  dl->bodies[x]->num_entries 32 / max 128
  dl->bodies[x]->num_entries 39 / max 128
  dl->bodies[x]->num_entries 40 / max 128
  dl->bodies[x]->num_entries 47 / max 128
  dl->bodies[x]->num_entries 48 / max 128
  dl->bodies[x]->num_entries 4914 / max 4914
  dl->bodies[x]->num_entries 55 / max 128
  dl->bodies[x]->num_entries 56 / max 128
  dl->bodies[x]->num_entries 63 / max 128
  dl->bodies[x]->num_entries 64 / max 128
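
To illustrate what the 'num_entries / max' figures above measure, here
is a minimal sketch. It is not the driver code: a body is modelled as a
fixed-capacity array of (register, value) pairs, and each write appends
one entry. The names dlb_sketch, dlb_sketch_init() and
dlb_sketch_write() are hypothetical; only the 128-entry capacity is
taken from the pool created in this patch.

```c
#include <stdint.h>

#define SKETCH_MAX_ENTRIES 128	/* same capacity as the pool in this patch */

/* Hypothetical stand-in for one display list entry: register + value. */
struct dlb_sketch_entry {
	uint32_t addr;
	uint32_t data;
};

/* Hypothetical stand-in for a display list body. */
struct dlb_sketch {
	struct dlb_sketch_entry entries[SKETCH_MAX_ENTRIES];
	unsigned int num_entries;
	unsigned int max_entries;
};

static inline void dlb_sketch_init(struct dlb_sketch *dlb)
{
	dlb->num_entries = 0;
	dlb->max_entries = SKETCH_MAX_ENTRIES;
}

/* Append one register write; returns 0 on success, -1 if the body is full. */
static inline int dlb_sketch_write(struct dlb_sketch *dlb, uint32_t reg,
				   uint32_t data)
{
	if (dlb->num_entries >= dlb->max_entries)
		return -1;

	dlb->entries[dlb->num_entries].addr = reg;
	dlb->entries[dlb->num_entries].data = data;
	dlb->num_entries++;

	return 0;
}
```

With this model, 64 writes leave num_entries at 64 out of 128, which is
the peak reported for the constant configuration, and a 129th write
would be rejected rather than overrunning the body.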

v4:
 - Adjust pipe configured flag to be reset on resume rather than suspend
 - Rename dl_child to dl_next
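
The intended life cycle of the configured flag can be sketched as
below. This is a simplified model under stated assumptions, not the
driver code: pipe_sketch and its helpers are hypothetical names, and
attach_count only stands in for attaching the cached body to a display
list.

```c
#include <stdbool.h>

/* Hypothetical model of the state this patch tracks; not the driver structs. */
struct pipe_sketch {
	bool configured;		/* has the cached config reached the hardware? */
	unsigned int attach_count;	/* number of DLs that carried the cached body */
};

/* Stream setup: the cached body was just rebuilt, so force a re-attach. */
static inline void pipe_sketch_setup(struct pipe_sketch *pipe)
{
	pipe->configured = false;
}

/* Per-frame run: attach the cached body only when the hardware lacks it. */
static inline void pipe_sketch_run(struct pipe_sketch *pipe)
{
	if (!pipe->configured) {
		pipe->attach_count++;
		pipe->configured = true;
	}

	/* The per-frame registers would be written and committed here. */
}

/* Resume: the hardware may have been reset, so the next run re-attaches. */
static inline void pipe_sketch_resume(struct pipe_sketch *pipe)
{
	pipe->configured = false;
}
```

In this model the cached body is attached once after setup, skipped on
every following frame, and attached exactly once more after a resume.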

 drivers/media/platform/vsp1/vsp1_pipe.c  |  7 +++-
 drivers/media/platform/vsp1/vsp1_pipe.h  |  4 +-
 drivers/media/platform/vsp1/vsp1_video.c | 67 ++++++++++++++++---------
 drivers/media/platform/vsp1/vsp1_video.h |  2 +-
 4 files changed, 54 insertions(+), 26 deletions(-)

diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c b/drivers/media/platform/vsp1/vsp1_pipe.c
index 5012643583b6..fa445b1a2e38 100644
--- a/drivers/media/platform/vsp1/vsp1_pipe.c
+++ b/drivers/media/platform/vsp1/vsp1_pipe.c
@@ -249,6 +249,7 @@ void vsp1_pipeline_run(struct vsp1_pipeline *pipe)
 		vsp1_write(vsp1, VI6_CMD(pipe->output->entity.index),
 			   VI6_CMD_STRCMD);
 		pipe->state = VSP1_PIPELINE_RUNNING;
+		pipe->configured = true;
 	}
 
 	pipe->buffers_ready = 0;
@@ -470,6 +471,12 @@ void vsp1_pipelines_resume(struct vsp1_device *vsp1)
 			continue;
 
 		spin_lock_irqsave(&pipe->irqlock, flags);
+		/*
+		 * The hardware may have been reset during a suspend and will
+		 * need a full reconfiguration
+		 */
+		pipe->configured = false;
+
 		if (vsp1_pipeline_ready(pipe))
 			vsp1_pipeline_run(pipe);
 		spin_unlock_irqrestore(&pipe->irqlock, flags);
diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h b/drivers/media/platform/vsp1/vsp1_pipe.h
index 90d29492b9b9..e7ad6211b4d0 100644
--- a/drivers/media/platform/vsp1/vsp1_pipe.h
+++ b/drivers/media/platform/vsp1/vsp1_pipe.h
@@ -90,6 +90,7 @@ struct vsp1_partition {
  * @irqlock: protects the pipeline state
  * @state: current state
  * @wq: wait queue to wait for state change completion
+ * @configured: flag determining if the hardware has run since reset
  * @frame_end: frame end interrupt handler
  * @lock: protects the pipeline use count and stream count
  * @kref: pipeline reference count
@@ -117,6 +118,7 @@ struct vsp1_pipeline {
 	spinlock_t irqlock;
 	enum vsp1_pipeline_state state;
 	wait_queue_head_t wq;
+	bool configured;
 
 	void (*frame_end)(struct vsp1_pipeline *pipe, bool completed);
 
@@ -143,8 +145,6 @@ struct vsp1_pipeline {
 	 */
 	struct list_head entities;
 
-	struct vsp1_dl_list *dl;
-
 	unsigned int partitions;
 	struct vsp1_partition *partition;
 	struct vsp1_partition *part_table;
diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
index b47708660e53..96d9872667d9 100644
--- a/drivers/media/platform/vsp1/vsp1_video.c
+++ b/drivers/media/platform/vsp1/vsp1_video.c
@@ -394,37 +394,43 @@ static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
 static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
 {
 	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
+	struct vsp1_video *video = pipe->output->video;
 	unsigned int partition;
+	struct vsp1_dl_list *dl;
+
+	dl = vsp1_dl_list_get(pipe->output->dlm);
 
-	if (!pipe->dl)
-		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
+	/* Attach our pipe configuration to fully initialise the hardware */
+	if (!pipe->configured) {
+		vsp1_dl_list_add_body(dl, video->pipe_config);
+		pipe->configured = true;
+	}
 
 	/* Run the first partition */
-	vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0);
+	vsp1_video_pipeline_run_partition(pipe, dl, 0);
 
 	/* Process consecutive partitions as necessary */
 	for (partition = 1; partition < pipe->partitions; ++partition) {
-		struct vsp1_dl_list *dl;
+		struct vsp1_dl_list *dl_next;
 
-		dl = vsp1_dl_list_get(pipe->output->dlm);
+		dl_next = vsp1_dl_list_get(pipe->output->dlm);
 
 		/*
 		 * An incomplete chain will still function, but output only
 		 * the partitions that had a dl available. The frame end
 		 * interrupt will be marked on the last dl in the chain.
 		 */
-		if (!dl) {
+		if (!dl_next) {
 			dev_err(vsp1->dev, "Failed to obtain a dl list. Frame will be incomplete\n");
 			break;
 		}
 
-		vsp1_video_pipeline_run_partition(pipe, dl, partition);
-		vsp1_dl_list_add_chain(pipe->dl, dl);
+		vsp1_video_pipeline_run_partition(pipe, dl_next, partition);
+		vsp1_dl_list_add_chain(dl, dl_next);
 	}
 
 	/* Complete, and commit the head display list. */
-	vsp1_dl_list_commit(pipe->dl);
-	pipe->dl = NULL;
+	vsp1_dl_list_commit(dl);
 
 	vsp1_pipeline_run(pipe);
 }
@@ -790,8 +796,8 @@ static void vsp1_video_buffer_queue(struct vb2_buffer *vb)
 
 static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 {
+	struct vsp1_video *video = pipe->output->video;
 	struct vsp1_entity *entity;
-	struct vsp1_dl_body *dlb;
 	int ret;
 
 	/* Determine this pipelines sizes for image partitioning support. */
@@ -799,14 +805,6 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 	if (ret < 0)
 		return ret;
 
-	/* Prepare the display list. */
-	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
-	if (!pipe->dl)
-		return -ENOMEM;
-
-	/* Retrieve the default DLB from the list */
-	dlb = vsp1_dl_list_get_body0(pipe->dl);
-
 	if (pipe->uds) {
 		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
 
@@ -828,11 +826,20 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
 		}
 	}
 
+	/* Obtain a clean body from our pool */
+	video->pipe_config = vsp1_dl_body_get(video->dlbs);
+	if (!video->pipe_config)
+		return -ENOMEM;
+
+	/* Configure the entities into our cached pipe configuration */
 	list_for_each_entry(entity, &pipe->entities, list_pipe) {
-		vsp1_entity_route_setup(entity, pipe, dlb);
-		vsp1_entity_configure_stream(entity, pipe, dlb);
+		vsp1_entity_route_setup(entity, pipe, video->pipe_config);
+		vsp1_entity_configure_stream(entity, pipe, video->pipe_config);
 	}
 
+	/* Ensure that our cached configuration is updated in the next DL */
+	pipe->configured = false;
+
 	return 0;
 }
 
@@ -842,6 +849,9 @@ static void vsp1_video_cleanup_pipeline(struct vsp1_pipeline *pipe)
 	struct vsp1_vb2_buffer *buffer;
 	unsigned long flags;
 
+	/* Release any cached configuration */
+	vsp1_dl_body_put(video->pipe_config);
+
 	/* Remove all buffers from the IRQ queue. */
 	spin_lock_irqsave(&video->irqlock, flags);
 	list_for_each_entry(buffer, &video->irqqueue, queue)
@@ -918,9 +928,6 @@ static void vsp1_video_stop_streaming(struct vb2_queue *vq)
 		ret = vsp1_pipeline_stop(pipe);
 		if (ret == -ETIMEDOUT)
 			dev_err(video->vsp1->dev, "pipeline stop timeout\n");
-
-		vsp1_dl_list_put(pipe->dl);
-		pipe->dl = NULL;
 	}
 	mutex_unlock(&pipe->lock);
 
@@ -1240,6 +1247,16 @@ struct vsp1_video *vsp1_video_create(struct vsp1_device *vsp1,
 		goto error;
 	}
 
+	/*
+	 * Utilise a body pool to cache the constant configuration of the
+	 * pipeline object.
+	 */
+	video->dlbs = vsp1_dl_body_pool_create(vsp1, 3, 128, 0);
+	if (!video->dlbs) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
 	return video;
 
 error:
@@ -1249,6 +1266,8 @@ struct vsp1_video *vsp1_video_create(struct vsp1_device *vsp1,
 
 void vsp1_video_cleanup(struct vsp1_video *video)
 {
+	vsp1_dl_body_pool_destroy(video->dlbs);
+
 	if (video_is_registered(&video->video))
 		video_unregister_device(&video->video);
 
diff --git a/drivers/media/platform/vsp1/vsp1_video.h b/drivers/media/platform/vsp1/vsp1_video.h
index 50ea7f02205f..e84f8ee902c1 100644
--- a/drivers/media/platform/vsp1/vsp1_video.h
+++ b/drivers/media/platform/vsp1/vsp1_video.h
@@ -43,6 +43,8 @@ struct vsp1_video {
 
 	struct mutex lock;
 
+	struct vsp1_dl_body_pool *dlbs;
+	struct vsp1_dl_body *pipe_config;
 	unsigned int pipe_index;
 
 	struct vb2_queue queue;
-- 
git-series 0.9.1

* Re: [PATCH v7 1/8] media: vsp1: Reword uses of 'fragment' as 'body'
  2018-03-08  0:05 ` [PATCH v7 1/8] media: vsp1: Reword uses of 'fragment' as 'body' Kieran Bingham
@ 2018-04-06 21:38   ` Laurent Pinchart
  0 siblings, 0 replies; 26+ messages in thread
From: Laurent Pinchart @ 2018-04-06 21:38 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Kieran,

Thank you for the patch.

On Thursday, 8 March 2018 02:05:24 EEST Kieran Bingham wrote:
> Throughout the codebase, the term 'fragment' is used to represent a
> display list body. This term duplicates the 'body' which is already in
> use.
> 
> The datasheet references these objects as a body, therefore replace all
> mentions of a fragment with a body, along with the corresponding
> pluralised terms.

I like this, the code seems less confusing to me this way. Please see below 
for a few minor comments.

> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> 
> ---
> v7
>  - Clean up the formatting of the vsp1_dl_list_add_body()
> 
>  drivers/media/platform/vsp1/vsp1_clu.c |  10 +-
>  drivers/media/platform/vsp1/vsp1_dl.c  | 109 ++++++++++++--------------
>  drivers/media/platform/vsp1/vsp1_dl.h  |  13 +--
>  drivers/media/platform/vsp1/vsp1_lut.c |   8 +-
>  4 files changed, 69 insertions(+), 71 deletions(-)

[snip]

> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
> index 0b86ed01e85d..caed441f5f0c 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> +++ b/drivers/media/platform/vsp1/vsp1_dl.c

[snip]

> @@ -157,17 +157,16 @@ static void vsp1_dl_body_cleanup(struct vsp1_dl_body *dlb)
>  }
> 
>  /**
> - * vsp1_dl_fragment_alloc - Allocate a display list fragment
> + * vsp1_dl_body_alloc - Allocate a display list body
>   * @vsp1: The VSP1 device
> - * @num_entries: The maximum number of entries that the fragment can contain
> + * @num_entries: The maximum number of entries that the body can contain
>   *
> - * Allocate a display list fragment with enough memory to contain the requested
> + * Allocate a display list body with enough memory to contain the requested
>   * number of entries.
>   *
> - * Return a pointer to a fragment on success or NULL if memory can't be
> - * allocated.
> + * Return a pointer to a body on success or NULL if memory can't be allocated.
>   */
> -struct vsp1_dl_body *vsp1_dl_fragment_alloc(struct vsp1_device *vsp1,
> +struct vsp1_dl_body *vsp1_dl_body_alloc(struct vsp1_device *vsp1,
>  					    unsigned int num_entries)

The indentation of the second line now looks wrong.

[snip]

> @@ -379,33 +378,33 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl)
>   */
>  void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
>  {
> -	vsp1_dl_fragment_write(&dl->body0, reg, data);
> +	vsp1_dl_body_write(&dl->body0, reg, data);
>  }
> 
>  /**
> - * vsp1_dl_list_add_fragment - Add a fragment to the display list
> + * vsp1_dl_list_add_body - Add a body to the display list
>   * @dl: The display list
> - * @dlb: The fragment
> + * @dlb: The body
>   *
> - * Add a display list body as a fragment to a display list. Registers contained
> - * in fragments are processed after registers contained in the main display
> - * list, in the order in which fragments are added.
> + * Add a display list body as a body to a display list. Registers contained

"body as a body" sounds strange. How about just "Add a display list body to 
the display list." ?

> + * in bodies are processed after registers contained in the main display list,
> + * in the order in which bodies are added.
>   *
> - * Adding a fragment to a display list passes ownership of the fragment to the
> - * list. The caller must not touch the fragment after this call, and must not
> - * free it explicitly with vsp1_dl_fragment_free().
> + * Adding a body to a display list passes ownership of the body to the list. The
> + * caller must not touch the body after this call, and must not free it
> + * explicitly with vsp1_dl_body_free().
>   *
> - * Fragments are only usable for display lists in header mode. Attempt to
> - * add a fragment to a header-less display list will return an error.
> + * Additional bodies are only usable for display lists in header mode.
> + * Attempting to add a body to a header-less display list will return an error.
>   */

[snip]

With those two small issues fixed,

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

-- 
Regards,

Laurent Pinchart

* Re: [PATCH v7 3/8] media: vsp1: Provide a body pool
  2018-03-08  0:05 ` [PATCH v7 3/8] media: vsp1: Provide a body pool Kieran Bingham
@ 2018-04-06 22:33   ` Laurent Pinchart
  2018-04-30 14:12     ` Kieran Bingham
  0 siblings, 1 reply; 26+ messages in thread
From: Laurent Pinchart @ 2018-04-06 22:33 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Kieran,

Thank you for the patch.

On Thursday, 8 March 2018 02:05:26 EEST Kieran Bingham wrote:
> Each display list allocates a body to store register values in a dma
> accessible buffer from a dma_alloc_wc() allocation. Each of these
> results in an entry in the TLB, and a large number of display list

I'd write it as "IOMMU TLB" to make it clear we're not concerned about CPU MMU 
TLB pressure.

> allocations adds pressure to this resource.
> 
> Reduce TLB pressure on the IPMMUs by allocating multiple display list
> bodies in a single allocation, and providing these to the display list
> through a 'body pool'. A pool can be allocated by the display list
> manager or entities which require their own body allocations.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> 
> ---
> v4:
>  - Provide comment explaining extra allocation on body pool
>    highlighting area for optimisation later.
> 
> v3:
>  - s/fragment/body/, s/fragments/bodies/
>  - qty -> num_bodies
>  - indentation fix
>  - s/vsp1_dl_body_pool_{alloc,free}/vsp1_dl_body_pool_{create,destroy}/'
>  - Add kerneldoc to non-static functions
> 
> v2:
>  - assign dlb->dma correctly
> 
>  drivers/media/platform/vsp1/vsp1_dl.c | 163 +++++++++++++++++++++++++++-
>  drivers/media/platform/vsp1/vsp1_dl.h |   8 +-
>  2 files changed, 171 insertions(+)
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
> index 67cc16c1b8e3..0208e72cb356 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
> @@ -45,6 +45,8 @@ struct vsp1_dl_entry {
>  /**
>   * struct vsp1_dl_body - Display list body
>   * @list: entry in the display list list of bodies
> + * @free: entry in the pool free body list

Could we reuse @list for this purpose ? Unless I'm mistaken, when a body is in 
a pool it doesn't belong to any particular display list, and when it is in a 
display list it isn't in the pool anymore.
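
To illustrate the suggestion above, here is a minimal userspace sketch, assuming a body is only ever on one list at a time. It is not the driver code: the list helpers are a hand-rolled stand-in for the kernel's list API, and the names struct body, struct pool, struct dlist, pool_get(), pool_put() and dlist_add_body() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (illustrative only). */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_node(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink a node and leave it self-linked so a second unlink is harmless. */
static void list_del_node(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Hypothetical body with a single entry shared by both lists. */
struct body {
	struct list_head list;	/* on pool->free OR on dl->bodies, never both */
	unsigned int num_entries;
};

struct pool {
	struct list_head free;
};

struct dlist {
	struct list_head bodies;
};

static struct body *body_from_node(struct list_head *n)
{
	return (struct body *)((char *)n - offsetof(struct body, list));
}

/* Take the first free body out of the pool, or NULL if none is left. */
static struct body *pool_get(struct pool *p)
{
	struct body *b;

	if (list_is_empty(&p->free))
		return NULL;

	b = body_from_node(p->free.next);
	list_del_node(&b->list);
	return b;
}

/* Attach a body to a display list, reusing the same list entry. */
static void dlist_add_body(struct dlist *dl, struct body *b)
{
	list_add_tail_node(&b->list, &dl->bodies);
}

/* Detach the body from whatever list holds it and return it to the pool. */
static void pool_put(struct pool *p, struct body *b)
{
	list_del_node(&b->list);
	b->num_entries = 0;
	list_add_tail_node(&b->list, &p->free);
}
```

The sketch leaves out the locking that the real pool would need around the free list.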

> + * @pool: pool to which this body belongs
>   * @vsp1: the VSP1 device
>   * @entries: array of entries
>   * @dma: DMA address of the entries
> @@ -54,6 +56,9 @@ struct vsp1_dl_entry {
>   */
>  struct vsp1_dl_body {
>  	struct list_head list;
> +	struct list_head free;
> +
> +	struct vsp1_dl_body_pool *pool;
>  	struct vsp1_device *vsp1;
> 
>  	struct vsp1_dl_entry *entries;
> @@ -65,6 +70,30 @@ struct vsp1_dl_body {
>  };
> 
>  /**
> + * struct vsp1_dl_body_pool - display list body pool
> + * @dma: DMA address of the entries
> + * @size: size of the full DMA memory pool in bytes
> + * @mem: CPU memory pointer for the pool
> + * @bodies: Array of DLB structures for the pool
> + * @free: List of free DLB entries
> + * @lock: Protects the pool and free list

The pool and free list ? As far as I can tell the lock only protects the free 
list.

> + * @vsp1: the VSP1 device
> + */
> +struct vsp1_dl_body_pool {
> +	/* DMA allocation */
> +	dma_addr_t dma;
> +	size_t size;
> +	void *mem;
> +
> +	/* Body management */
> +	struct vsp1_dl_body *bodies;
> +	struct list_head free;
> +	spinlock_t lock;
> +
> +	struct vsp1_device *vsp1;
> +};
> +
> +/**
>   * struct vsp1_dl_list - Display list
>   * @list: entry in the display list manager lists
>   * @dlm: the display list manager
> @@ -105,6 +134,7 @@ enum vsp1_dl_mode {
>   * @active: list currently being processed (loaded) by hardware
>   * @queued: list queued to the hardware (written to the DL registers)
>   * @pending: list waiting to be queued to the hardware
> + * @pool: body pool for the display list bodies
>   * @gc_work: bodies garbage collector work struct
>   * @gc_bodies: array of display list bodies waiting to be freed
>   */
> @@ -120,6 +150,8 @@ struct vsp1_dl_manager {
>  	struct vsp1_dl_list *queued;
>  	struct vsp1_dl_list *pending;
> 
> +	struct vsp1_dl_body_pool *pool;
> +
>  	struct work_struct gc_work;
>  	struct list_head gc_bodies;
>  };
> @@ -128,6 +160,137 @@ struct vsp1_dl_manager {
>   * Display List Body Management
>   */
> 
> +/**
> + * vsp1_dl_body_pool_create - Create a pool of bodies from a single allocation
> + * @vsp1: The VSP1 device
> + * @num_bodies: The quantity of bodies to allocate

For consistency, s/quantity/number/

> + * @num_entries: The maximum number of entries that the body can contain

Maybe s/the body/a body/ ?

> + * @extra_size: Extra allocation provided for the bodies
> + *
> + * Allocate a pool of display list bodies each with enough memory to contain the
> + * requested number of entries.

How about

the requested number of entries plus the @extra_size.
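
As a worked example of the sizing this comment would document, the following userspace sketch mirrors the arithmetic in the quoted vsp1_dl_body_pool_create(). It assumes an entry is two 32-bit words (register address and value); struct dl_entry and the helper names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: one display list entry is a register address plus a value. */
struct dl_entry {
	uint32_t addr;
	uint32_t data;
};

/* Size of one body: its entries plus the per-body extra area. */
static size_t body_size(unsigned int num_entries, size_t extra_size)
{
	return num_entries * sizeof(struct dl_entry) + extra_size;
}

/* Size of the single DMA allocation backing the whole pool. */
static size_t pool_size(unsigned int num_bodies, unsigned int num_entries,
			size_t extra_size)
{
	assert(num_bodies > 0);
	return body_size(num_entries, extra_size) * num_bodies;
}

/* Byte offset of body i inside the pool allocation. */
static size_t body_offset(unsigned int i, unsigned int num_entries,
			  size_t extra_size)
{
	return i * body_size(num_entries, extra_size);
}

/* Byte offset of the extra area (e.g. a header) inside one body. */
static size_t extra_offset(unsigned int num_entries)
{
	return num_entries * sizeof(struct dl_entry);
}
```

Under that 8-byte entry assumption, a pool of 3 bodies with 128 entries each and no extra size comes to 3 * 1024 = 3072 bytes in one allocation, instead of three separate DMA allocations.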

> + *
> + * Return a pointer to a pool on success or NULL if memory can't be allocated.
> + */
> +struct vsp1_dl_body_pool *
> +vsp1_dl_body_pool_create(struct vsp1_device *vsp1, unsigned int num_bodies,
> +			 unsigned int num_entries, size_t extra_size)
> +{
> +	struct vsp1_dl_body_pool *pool;
> +	size_t dlb_size;
> +	unsigned int i;
> +
> +	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
> +	if (!pool)
> +		return NULL;
> +
> +	pool->vsp1 = vsp1;
> +
> +	/*
> +	 * Todo: 'extra_size' is only used by vsp1_dlm_create(), to allocate

s/Todo/TODO/

> +	 * extra memory for the display list header. We need only one header per
> +	 * display list, not per display list body, thus this allocation is
> +	 * extraneous and should be reworked in the future.
> +	 */

Any plan to fix this ? :-)

> +	dlb_size = num_entries * sizeof(struct vsp1_dl_entry) + extra_size;
> +	pool->size = dlb_size * num_bodies;
> +
> +	pool->bodies = kcalloc(num_bodies, sizeof(*pool->bodies), GFP_KERNEL);
> +	if (!pool->bodies) {
> +		kfree(pool);
> +		return NULL;
> +	}
> +
> +	pool->mem = dma_alloc_wc(vsp1->bus_master, pool->size, &pool->dma,
> +				 GFP_KERNEL);
> +	if (!pool->mem) {
> +		kfree(pool->bodies);
> +		kfree(pool);
> +		return NULL;
> +	}
> +
> +	spin_lock_init(&pool->lock);
> +	INIT_LIST_HEAD(&pool->free);
> +
> +	for (i = 0; i < num_bodies; ++i) {
> +		struct vsp1_dl_body *dlb = &pool->bodies[i];
> +
> +		dlb->pool = pool;
> +		dlb->max_entries = num_entries;
> +
> +		dlb->dma = pool->dma + i * dlb_size;
> +		dlb->entries = pool->mem + i * dlb_size;
> +
> +		list_add_tail(&dlb->free, &pool->free);
> +	}
> +
> +	return pool;
> +}
> +
> +/**
> + * vsp1_dl_body_pool_destroy - Release a body pool
> + * @pool: The body pool
> + *
> + * Release all components of a pool allocation.
> + */
> +void vsp1_dl_body_pool_destroy(struct vsp1_dl_body_pool *pool)
> +{
> +	if (!pool)
> +		return;
> +
> +	if (pool->mem)
> +		dma_free_wc(pool->vsp1->bus_master, pool->size, pool->mem,
> +			    pool->dma);
> +
> +	kfree(pool->bodies);
> +	kfree(pool);
> +}
> +
> +/**
> + * vsp1_dl_body_get - Obtain a body from a pool
> + * @pool: The body pool
> + *
> + * Obtain a body from the pool allocation without blocking.

"the pool allocation" ? Did you mean just "the pool" ?

> + *
> + * Returns a display list body or NULL if there are none available.
> + */
> +struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool)
> +{
> +	struct vsp1_dl_body *dlb = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pool->lock, flags);
> +
> +	if (!list_empty(&pool->free)) {
> +		dlb = list_first_entry(&pool->free, struct vsp1_dl_body, free);
> +		list_del(&dlb->free);
> +	}
> +
> +	spin_unlock_irqrestore(&pool->lock, flags);
> +
> +	return dlb;
> +}
> +
> +/**
> + * vsp1_dl_body_put - Return a body back to its pool
> + * @dlb: The display list body
> + *
> + * Return a body back to the pool, and reset the num_entries to clear the list.
> + */
> +void vsp1_dl_body_put(struct vsp1_dl_body *dlb)
> +{
> +	unsigned long flags;
> +
> +	if (!dlb)
> +		return;
> +
> +	dlb->num_entries = 0;
> +
> +	spin_lock_irqsave(&dlb->pool->lock, flags);
> +	list_add_tail(&dlb->free, &dlb->pool->free);
> +	spin_unlock_irqrestore(&dlb->pool->lock, flags);
> +}
> +
>  /*
>   * Initialize a display list body object and allocate DMA memory for the body
>   * data. The display list body object is expected to have been initialized to
> diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
> index cf57f986b69a..031032e304d2 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.h
> +++ b/drivers/media/platform/vsp1/vsp1_dl.h
> @@ -17,6 +17,7 @@
> 
>  struct vsp1_device;
>  struct vsp1_dl_body;
> +struct vsp1_dl_body_pool;
>  struct vsp1_dl_list;
>  struct vsp1_dl_manager;
> 
> @@ -34,6 +35,13 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl);
>  void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data);
>  void vsp1_dl_list_commit(struct vsp1_dl_list *dl);
> 
> +struct vsp1_dl_body_pool *
> +vsp1_dl_body_pool_create(struct vsp1_device *vsp1, unsigned int num_bodies,
> +			 unsigned int num_entries, size_t extra_size);
> +void vsp1_dl_body_pool_destroy(struct vsp1_dl_body_pool *pool);
> +struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool);
> +void vsp1_dl_body_put(struct vsp1_dl_body *dlb);
> +
>  struct vsp1_dl_body *vsp1_dl_body_alloc(struct vsp1_device *vsp1,
>  					unsigned int num_entries);
>  void vsp1_dl_body_free(struct vsp1_dl_body *dlb);


-- 
Regards,

Laurent Pinchart

* Re: [PATCH v7 4/8] media: vsp1: Convert display lists to use new body pool
  2018-03-08  0:05 ` [PATCH v7 4/8] media: vsp1: Convert display lists to use new " Kieran Bingham
@ 2018-04-06 22:55   ` Laurent Pinchart
  2018-04-30 14:39     ` Kieran Bingham
  0 siblings, 1 reply; 26+ messages in thread
From: Laurent Pinchart @ 2018-04-06 22:55 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Kieran,

Thank you for the patch.

On Thursday, 8 March 2018 02:05:27 EEST Kieran Bingham wrote:
> Adapt the dl->body0 object to use an object from the body pool. This
> greatly reduces the pressure on the TLB for IPMMU use cases, as all of
> the lists use a single allocation for the main body.
> 
> The CLU and LUT objects pre-allocate a pool containing three bodies,
> allowing a userspace update before the hardware has committed a previous
> set of tables.
> 
> Bodies are no longer 'freed' in interrupt context, but instead released
> back to their respective pools. This allows us to remove the garbage
> collector in the DLM.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> 
> ---
> v3:
>  - 's/fragment/body', 's/fragments/bodies/'
>  - CLU/LUT now allocate 3 bodies
>  - vsp1_dl_list_fragments_free -> vsp1_dl_list_bodies_put
> 
> v2:
>  - Use dl->body0->max_entries to determine header offset, instead of the
>    global constant VSP1_DL_NUM_ENTRIES which is incorrect.
>  - squash updates for LUT, CLU, and fragment cleanup into single patch.
>    (Not fully bisectable when separated)
> 
>  drivers/media/platform/vsp1/vsp1_clu.c |  27 ++-
>  drivers/media/platform/vsp1/vsp1_clu.h |   1 +-
>  drivers/media/platform/vsp1/vsp1_dl.c  | 223 ++++++--------------------
>  drivers/media/platform/vsp1/vsp1_dl.h  |   3 +-
>  drivers/media/platform/vsp1/vsp1_lut.c |  27 ++-
>  drivers/media/platform/vsp1/vsp1_lut.h |   1 +-
>  6 files changed, 101 insertions(+), 181 deletions(-)

Still a nice diffstat :-)

[snip]

> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
> index 0208e72cb356..74476726451c 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> +++ b/drivers/media/platform/vsp1/vsp1_dl.c

[snip]

> @@ -399,11 +311,10 @@ void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32 reg, u32 data)
>   * Display List Transaction Management
>   */
> 
> -static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm)
> +static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm,
> +					       struct vsp1_dl_body_pool *pool)

Given that the only caller of this function passes dlm->pool as the second 
argument, can't you remove the second argument ?

>  {
>  	struct vsp1_dl_list *dl;
> -	size_t header_size;
> -	int ret;
> 
>  	dl = kzalloc(sizeof(*dl), GFP_KERNEL);
>  	if (!dl)
> @@ -412,41 +323,39 @@ static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm)
>  	INIT_LIST_HEAD(&dl->bodies);
>  	dl->dlm = dlm;
> 
> -	/*
> -	 * Initialize the display list body and allocate DMA memory for the body
> -	 * and the optional header. Both are allocated together to avoid memory
> -	 * fragmentation, with the header located right after the body in
> -	 * memory.
> -	 */
> -	header_size = dlm->mode == VSP1_DL_MODE_HEADER
> -		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
> -		    : 0;
> -
> -	ret = vsp1_dl_body_init(dlm->vsp1, &dl->body0, VSP1_DL_NUM_ENTRIES,
> -				header_size);
> -	if (ret < 0) {
> -		kfree(dl);
> +	/* Retrieve a body from our DLM body pool */

s/body pool/body pool./

(And I would have said "Get a body" but that's up to you)

> +	dl->body0 = vsp1_dl_body_get(pool);
> +	if (!dl->body0)
>  		return NULL;
> -	}
> -
>  	if (dlm->mode == VSP1_DL_MODE_HEADER) {
> -		size_t header_offset = VSP1_DL_NUM_ENTRIES
> -				     * sizeof(*dl->body0.entries);
> +		size_t header_offset = dl->body0->max_entries
> +				     * sizeof(*dl->body0->entries);
> 
> -		dl->header = ((void *)dl->body0.entries) + header_offset;
> -		dl->dma = dl->body0.dma + header_offset;
> +		dl->header = ((void *)dl->body0->entries) + header_offset;
> +		dl->dma = dl->body0->dma + header_offset;
> 
>  		memset(dl->header, 0, sizeof(*dl->header));
> -		dl->header->lists[0].addr = dl->body0.dma;
> +		dl->header->lists[0].addr = dl->body0->dma;
>  	}
> 
>  	return dl;
>  }
> 
> +static void vsp1_dl_list_bodies_put(struct vsp1_dl_list *dl)
> +{
> +	struct vsp1_dl_body *dlb, *tmp;
> +
> +	list_for_each_entry_safe(dlb, tmp, &dl->bodies, list) {
> +		list_del(&dlb->list);
> +		vsp1_dl_body_put(dlb);
> +	}
> +}
> +
>  static void vsp1_dl_list_free(struct vsp1_dl_list *dl)
>  {
> -	vsp1_dl_body_cleanup(&dl->body0);
> -	list_splice_init(&dl->bodies, &dl->dlm->gc_bodies);
> +	vsp1_dl_body_put(dl->body0);
> +	vsp1_dl_list_bodies_put(dl);

Too bad we can't keep the list splice, it's more efficient than iterating over 
the list, but I suppose it's unavoidable if we want to reset the number of 
used entries to 0 for each body. Besides, we should have a small number of 
bodies only, so hopefully it won't be a big deal.

> +
>  	kfree(dl);
>  }
> 
> @@ -500,18 +409,13 @@ static void __vsp1_dl_list_put(struct vsp1_dl_list *dl)
> 
>  	dl->has_chain = false;
> 
> +	vsp1_dl_list_bodies_put(dl);
> +
>  	/*
> -	 * We can't free bodies here as DMA memory can only be freed in
> -	 * interruptible context. Move all bodies to the display list manager's
> -	 * list of bodies to be freed, they will be garbage-collected by the
> -	 * work queue.
> +	 * body0 is reused as as an optimisation as presently every display list
> +	 * has at least one body, thus we reinitialise the entries list

s/entries list/entries list./

>  	 */
> -	if (!list_empty(&dl->bodies)) {
> -		list_splice_init(&dl->bodies, &dl->dlm->gc_bodies);
> -		schedule_work(&dl->dlm->gc_work);
> -	}

We can certainly do this synchronously now that we don't need to free memory 
anymore. I wonder however about the potential performance impact, as there's a 
kfree() in vsp1_dl_list_free(). Do you think it could have a noticeable impact 
on the time spent with interrupts disabled ?

> -
> -	dl->body0.num_entries = 0;
> +	dl->body0->num_entries = 0;
> 
>  	list_add_tail(&dl->list, &dl->dlm->free);
>  }
> @@ -548,7 +452,7 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl)
>   */
>  void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
>  {
> -	vsp1_dl_body_write(&dl->body0, reg, data);
> +	vsp1_dl_body_write(dl->body0, reg, data);
>  }
> 
>  /**
> @@ -561,8 +465,7 @@ void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
>   * in the order in which bodies are added.
>   *
>   * Adding a body to a display list passes ownership of the body to the list. The
> - * caller must not touch the body after this call, and must not free it
> - * explicitly with vsp1_dl_body_free().

Shouldn't we keep the last part of the sentence and adapt it ? Maybe something 
like

	and must not release it explicitly with vsp1_dl_body_put().

I know that you introduce a reference count in the next patches that would 
make this comment invalid, but up to this patch it should be correct. When 
introducing reference-counting you can update the comment to state that the 
reference must be released.
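
For reference, the ownership rule that the later reference-counting patch describes can be sketched as follows. This is a simplified userspace model, assuming a plain int counter in place of the kernel's refcount_t and a boolean in place of the pool free list; struct body and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified body: a plain counter stands in for the kernel's refcount_t. */
struct body {
	int refcnt;
	unsigned int num_entries;
	bool in_pool;	/* stands in for membership of the pool free list */
};

/* Taking a body out of the pool hands the caller the first reference. */
static void body_get_from_pool(struct body *b)
{
	assert(b->in_pool);
	b->in_pool = false;
	b->refcnt = 1;
}

/* Adding the body to a display list takes a second reference for the list. */
static void body_add_to_list(struct body *b)
{
	b->refcnt++;
}

/* Dropping the last reference resets the body and returns it to the pool. */
static void body_put(struct body *b)
{
	if (!b)
		return;

	if (--b->refcnt)
		return;

	b->num_entries = 0;
	b->in_pool = true;
}
```

In this model the caller's put after adding the body only drops its local reference; the body goes back to the pool once the display list has released its own reference as well.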

> + * caller must not touch the body after this call.
>   *
>   * Additional bodies are only usable for display lists in header mode.
>   * Attempting to add a body to a header-less display list will return an error.
> @@ -620,7 +523,7 @@ static void vsp1_dl_list_fill_header(struct vsp1_dl_list *dl, bool is_last)
>   * list was allocated.
>  	 */
> 
> -	hdr->num_bytes = dl->body0.num_entries
> +	hdr->num_bytes = dl->body0->num_entries
>  		       * sizeof(*dl->header->lists);
> 
>  	list_for_each_entry(dlb, &dl->bodies, list) {
> @@ -694,9 +597,9 @@ static void vsp1_dl_list_hw_enqueue(struct vsp1_dl_list *dl)
>  		 * bit will be cleared by the hardware when the display list
>  		 * processing starts.
>  		 */
> -		vsp1_write(vsp1, VI6_DL_HDR_ADDR(0), dl->body0.dma);
> +		vsp1_write(vsp1, VI6_DL_HDR_ADDR(0), dl->body0->dma);
>  		vsp1_write(vsp1, VI6_DL_BODY_SIZE, VI6_DL_BODY_SIZE_UPD |
> -			   (dl->body0.num_entries * sizeof(*dl->header->lists)));
> +			(dl->body0->num_entries * sizeof(*dl->header->lists)));
>  	} else {
>  		/*
>  		 * In header mode, program the display list header address. If
> @@ -879,45 +782,12 @@ void vsp1_dlm_reset(struct vsp1_dl_manager *dlm)
>  	dlm->pending = NULL;
>  }
> 
> -/*
> - * Free all bodies awaiting to be garbage-collected.
> - *
> - * This function must be called without the display list manager lock held.
> - */
> -static void vsp1_dlm_bodies_free(struct vsp1_dl_manager *dlm)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&dlm->lock, flags);
> -
> -	while (!list_empty(&dlm->gc_bodies)) {
> -		struct vsp1_dl_body *dlb;
> -
> -		dlb = list_first_entry(&dlm->gc_bodies, struct vsp1_dl_body,
> -				       list);
> -		list_del(&dlb->list);
> -
> -		spin_unlock_irqrestore(&dlm->lock, flags);
> -		vsp1_dl_body_free(dlb);
> -		spin_lock_irqsave(&dlm->lock, flags);
> -	}
> -
> -	spin_unlock_irqrestore(&dlm->lock, flags);
> -}
> -
> -static void vsp1_dlm_garbage_collect(struct work_struct *work)
> -{
> -	struct vsp1_dl_manager *dlm =
> -		container_of(work, struct vsp1_dl_manager, gc_work);
> -
> -	vsp1_dlm_bodies_free(dlm);
> -}
> -
>  struct vsp1_dl_manager *vsp1_dlm_create(struct vsp1_device *vsp1,
>  					unsigned int index,
>  					unsigned int prealloc)
>  {
>  	struct vsp1_dl_manager *dlm;
> +	size_t header_size;
>  	unsigned int i;
> 
>  	dlm = devm_kzalloc(vsp1->dev, sizeof(*dlm), GFP_KERNEL);
> @@ -932,13 +802,26 @@ struct vsp1_dl_manager *vsp1_dlm_create(struct vsp1_device *vsp1,
> 
>  	spin_lock_init(&dlm->lock);
>  	INIT_LIST_HEAD(&dlm->free);
> -	INIT_LIST_HEAD(&dlm->gc_bodies);
> -	INIT_WORK(&dlm->gc_work, vsp1_dlm_garbage_collect);
> +
> +	/*
> +	 * Initialize the display list body and allocate DMA memory for the body
> +	 * and the optional header. Both are allocated together to avoid memory
> +	 * fragmentation, with the header located right after the body in
> +	 * memory.
> +	 */
> +	header_size = dlm->mode == VSP1_DL_MODE_HEADER
> +		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
> +		    : 0;
> +
> +	dlm->pool = vsp1_dl_body_pool_create(vsp1, prealloc,
> +					     VSP1_DL_NUM_ENTRIES, header_size);
> +	if (!dlm->pool)
> +		return NULL;
> 
>  	for (i = 0; i < prealloc; ++i) {
>  		struct vsp1_dl_list *dl;
> 
> -		dl = vsp1_dl_list_alloc(dlm);
> +		dl = vsp1_dl_list_alloc(dlm, dlm->pool);
>  		if (!dl)
>  			return NULL;
> 
> @@ -955,12 +838,10 @@ void vsp1_dlm_destroy(struct vsp1_dl_manager *dlm)
>  	if (!dlm)
>  		return;
> 
> -	cancel_work_sync(&dlm->gc_work);
> -
>  	list_for_each_entry_safe(dl, next, &dlm->free, list) {
>  		list_del(&dl->list);
>  		vsp1_dl_list_free(dl);
>  	}
> 
> -	vsp1_dlm_bodies_free(dlm);
> +	vsp1_dl_body_pool_destroy(dlm->pool);
>  }

[snip]

-- 
Regards,

Laurent Pinchart

* Re: [PATCH v7 5/8] media: vsp1: Use reference counting for bodies
  2018-03-08  0:05 ` [PATCH v7 5/8] media: vsp1: Use reference counting for bodies Kieran Bingham
@ 2018-04-06 23:06   ` Laurent Pinchart
  0 siblings, 0 replies; 26+ messages in thread
From: Laurent Pinchart @ 2018-04-06 23:06 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Kieran,

Thank you for the patch.

On Thursday, 8 March 2018 02:05:28 EEST Kieran Bingham wrote:
> Extend the display list body with a reference count, allowing bodies to
> be kept as long as a reference is maintained. This provides the ability
> to keep a cached copy of bodies which will not change, so that they can
> be re-applied to multiple display lists.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> 
> ---
> This could be squashed into the body update code, but it's not a
> straightforward squash as the refcounts will affect both:
>   v4l: vsp1: Provide a body pool
> and
>   v4l: vsp1: Convert display lists to use new body pool
> therefore, I have kept this separate to prevent breaking bisectability
> of the vsp-tests.
> 
> v3:
>  - 's/fragment/body/'
> 
> v4:
>  - Fix up reference handling comments.
> 
>  drivers/media/platform/vsp1/vsp1_clu.c |  7 ++++++-
>  drivers/media/platform/vsp1/vsp1_dl.c  | 15 ++++++++++++++-
>  drivers/media/platform/vsp1/vsp1_lut.c |  7 ++++++-
>  3 files changed, 26 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
> index 2018144470c5..b2a39a6ef7e4 100644
> --- a/drivers/media/platform/vsp1/vsp1_clu.c
> +++ b/drivers/media/platform/vsp1/vsp1_clu.c
> @@ -257,8 +257,13 @@ static void clu_configure(struct vsp1_entity *entity,
>  		clu->clu = NULL;
>  		spin_unlock_irqrestore(&clu->lock, flags);
> 
> -		if (dlb)
> +		if (dlb) {
>  			vsp1_dl_list_add_body(dl, dlb);
> +
> +			/* release our local reference */

s/release/Release/
s/reference/reference./

> +			vsp1_dl_body_put(dlb);
> +		}
> +
>  		break;
>  	}
>  }
> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
> index 74476726451c..134865287c02 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
> @@ -14,6 +14,7 @@
>  #include <linux/device.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/gfp.h>
> +#include <linux/refcount.h>
>  #include <linux/slab.h>
>  #include <linux/workqueue.h>
> 
> @@ -58,6 +59,8 @@ struct vsp1_dl_body {
>  	struct list_head list;
>  	struct list_head free;
> 
> +	refcount_t refcnt;
> +
>  	struct vsp1_dl_body_pool *pool;
>  	struct vsp1_device *vsp1;
> 
> @@ -259,6 +262,7 @@ struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool)
>  	if (!list_empty(&pool->free)) {
>  		dlb = list_first_entry(&pool->free, struct vsp1_dl_body, free);
>  		list_del(&dlb->free);
> +		refcount_set(&dlb->refcnt, 1);
>  	}
> 
>  	spin_unlock_irqrestore(&pool->lock, flags);
> @@ -279,6 +283,9 @@ void vsp1_dl_body_put(struct vsp1_dl_body *dlb)
>  	if (!dlb)
>  		return;
> 
> +	if (!refcount_dec_and_test(&dlb->refcnt))
> +		return;
> +
>  	dlb->num_entries = 0;
> 
>  	spin_lock_irqsave(&dlb->pool->lock, flags);
> @@ -465,7 +472,11 @@ void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
>   * in the order in which bodies are added.
>   *
>   * Adding a body to a display list passes ownership of the body to the list. The
> - * caller must not touch the body after this call.
> + * caller retains its reference to the fragment when adding it to the display
> + * list, but is not allowed to add new entries to the body.
> + *
> + * The reference must be explicitly released by a call to vsp1_dl_body_put()
> + * when the body isn't needed anymore.
>   *
>   * Additional bodies are only usable for display lists in header mode.
>   * Attempting to add a body to a header-less display list will return an error.
> @@ -476,6 +487,8 @@ int vsp1_dl_list_add_body(struct vsp1_dl_list *dl, struct vsp1_dl_body *dlb)
>  	if (dl->dlm->mode != VSP1_DL_MODE_HEADER)
>  		return -EINVAL;
> 
> +	refcount_inc(&dlb->refcnt);
> +
>  	list_add_tail(&dlb->list, &dl->bodies);
> 
>  	return 0;
> diff --git a/drivers/media/platform/vsp1/vsp1_lut.c b/drivers/media/platform/vsp1/vsp1_lut.c
> index 262cb72139d6..77cf7137a0f2 100644
> --- a/drivers/media/platform/vsp1/vsp1_lut.c
> +++ b/drivers/media/platform/vsp1/vsp1_lut.c
> @@ -213,8 +213,13 @@ static void lut_configure(struct vsp1_entity *entity,
>  		lut->lut = NULL;
>  		spin_unlock_irqrestore(&lut->lock, flags);
> 
> -		if (dlb)
> +		if (dlb) {
>  			vsp1_dl_list_add_body(dl, dlb);
> +
> +			/* release our local reference */

s/release/Release/
s/reference/reference./

With these small issues fixed,

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

> +			vsp1_dl_body_put(dlb);
> +		}
> +
>  		break;
>  	}
>  }

-- 
Regards,

Laurent Pinchart

* Re: [PATCH v7 6/8] media: vsp1: Refactor display list configure operations
  2018-03-08  0:05 ` [PATCH v7 6/8] media: vsp1: Refactor display list configure operations Kieran Bingham
@ 2018-04-06 23:38   ` Laurent Pinchart
  2018-04-30 16:22     ` Kieran Bingham
  0 siblings, 1 reply; 26+ messages in thread
From: Laurent Pinchart @ 2018-04-06 23:38 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Kieran,

Thank you for the patch.

On Thursday, 8 March 2018 02:05:29 EEST Kieran Bingham wrote:
> The entities provide a single .configure operation which configures the
> object into the target display list, based on the vsp1_entity_params
> selection.
> 
> This restricts us to a single function prototype for both static
> configuration (the pre-stream INIT stage) and the dynamic runtime stages
> for both each frame - and each partition therein.
> 
> Split the configure function into two parts, '.configure_stream()' and
> '.configure_frame()', merging both the VSP1_ENTITY_PARAMS_RUNTIME and
> VSP1_ENTITY_PARAMS_PARTITION stages into a single call through the
> .configure_frame(). The configuration for individual partitions is
> handled by passing the partition number to the configure call, and
> processing any runtime stage actions on the first partition only.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> 
> ---
> v7
>  - Fix formatting and white space
>  - s/prepare/configure_stream/
>  - s/configure/configure_frame/
> 
>  drivers/media/platform/vsp1/vsp1_bru.c    |  12 +-
>  drivers/media/platform/vsp1/vsp1_clu.c    |  50 +---
>  drivers/media/platform/vsp1/vsp1_dl.h     |   1 +-
>  drivers/media/platform/vsp1/vsp1_drm.c    |  21 +--
>  drivers/media/platform/vsp1/vsp1_entity.c |  17 +-
>  drivers/media/platform/vsp1/vsp1_entity.h |  33 +--
>  drivers/media/platform/vsp1/vsp1_hgo.c    |  12 +-
>  drivers/media/platform/vsp1/vsp1_hgt.c    |  12 +-
>  drivers/media/platform/vsp1/vsp1_hsit.c   |  12 +-
>  drivers/media/platform/vsp1/vsp1_lif.c    |  12 +-
>  drivers/media/platform/vsp1/vsp1_lut.c    |  32 +-
>  drivers/media/platform/vsp1/vsp1_rpf.c    | 164 ++++++-------
>  drivers/media/platform/vsp1/vsp1_sru.c    |  12 +-
>  drivers/media/platform/vsp1/vsp1_uds.c    |  57 ++--
>  drivers/media/platform/vsp1/vsp1_video.c  |  24 +--
>  drivers/media/platform/vsp1/vsp1_wpf.c    | 299 ++++++++++++-----------
>  16 files changed, 378 insertions(+), 392 deletions(-)

[snip]

> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
> index b2a39a6ef7e4..b8d8af6d4910 100644
> --- a/drivers/media/platform/vsp1/vsp1_clu.c
> +++ b/drivers/media/platform/vsp1/vsp1_clu.c
> @@ -213,37 +213,36 @@ static const struct v4l2_subdev_ops clu_ops = {
>  /* -----------------------------------------------------------------------------
>   * VSP1 Entity Operations
>   */
> +static void clu_configure_stream(struct vsp1_entity *entity,
> +				 struct vsp1_pipeline *pipe,
> +				 struct vsp1_dl_list *dl)
> +{
> +	struct vsp1_clu *clu = to_clu(&entity->subdev);
> +
> +	/*
> +	 * The yuv_mode can't be changed during streaming. Cache it internally
> +	 * for future runtime configuration calls.
> +	 */

I'd move this comment right before the vsp1_entity_get_pad_format() call to 
keep all variable declarations together.

> +	struct v4l2_mbus_framefmt *format;
> +
> +	format = vsp1_entity_get_pad_format(&clu->entity,
> +					    clu->entity.config,
> +					    CLU_PAD_SINK);
> +	clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
> +}

[snip]


> diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
> index 7e820ac6865a..f45083251644 100644
> --- a/drivers/media/platform/vsp1/vsp1_dl.h
> +++ b/drivers/media/platform/vsp1/vsp1_dl.h
> @@ -41,7 +41,6 @@ vsp1_dl_body_pool_create(struct vsp1_device *vsp1, unsigned int num_bodies,
>  void vsp1_dl_body_pool_destroy(struct vsp1_dl_body_pool *pool);
>  struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool);
>  void vsp1_dl_body_put(struct vsp1_dl_body *dlb);
> -

This is an unrelated change.

>  void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32 reg, u32 data);
>  int vsp1_dl_list_add_body(struct vsp1_dl_list *dl, struct vsp1_dl_body *dlb);
>  int vsp1_dl_list_add_chain(struct vsp1_dl_list *head, struct vsp1_dl_list *dl);

[snip]

> diff --git a/drivers/media/platform/vsp1/vsp1_entity.h b/drivers/media/platform/vsp1/vsp1_entity.h
> index 408602ebeb97..b44ed5414fc3 100644
> --- a/drivers/media/platform/vsp1/vsp1_entity.h
> +++ b/drivers/media/platform/vsp1/vsp1_entity.h

[snip]

> @@ -80,8 +68,10 @@ struct vsp1_route {
>  /**
>   * struct vsp1_entity_operations - Entity operations
>   * @destroy:	Destroy the entity.
> - * @configure:	Setup the hardware based on the entity state (pipeline, formats,
> - *		selection rectangles, ...)
> + * @configure_stream:	Setup the initial hardware parameters for the stream
> + *			(pipeline, formats)

Instead of initial I would say "Setup hardware parameters that stay constant 
for the whole stream (pipeline, formats)", or possibly "that don't vary 
between frames" instead.

> + * @configure_frame:	Configure the runtime parameters for each partition
> + *			(rectangles, buffer addresses, ...)

Maybe "for each frame and each partition thereof" ?

I think we mentioned, when discussing naming, the option of also having a 
configure_partition() operation. Do you think that would make sense ? The 
fact that the partition parameter to the .configure_frame() operation is used 
for the sole purpose of checking whether to configure frame-related parameters 
when partition == 0 makes me think that having two separate operations could 
make sense.
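
To make the option concrete, here is a rough user-space sketch of what a
three-operation split could look like. It is an assumption for discussion, not
a proposal of the exact interface: struct entity, struct entity_operations,
the demo_* callbacks and run_frame() are hypothetical, and the real operations
take more parameters (pipeline, display list, ...).

```c
#include <assert.h>
#include <stddef.h>

struct entity;

/* Hypothetical, trimmed-down operations table with three stages. */
struct entity_operations {
	void (*configure_stream)(struct entity *e);
	void (*configure_frame)(struct entity *e);
	void (*configure_partition)(struct entity *e, unsigned int partition);
};

struct entity {
	const struct entity_operations *ops;
	unsigned int frame_calls;	/* counts configure_frame calls */
	unsigned int partition_calls;	/* counts configure_partition calls */
};

static void demo_configure_frame(struct entity *e)
{
	e->frame_calls++;
}

static void demo_configure_partition(struct entity *e, unsigned int partition)
{
	(void)partition;
	e->partition_calls++;
}

static const struct entity_operations demo_ops = {
	.configure_stream = NULL,
	.configure_frame = demo_configure_frame,
	.configure_partition = demo_configure_partition,
};

/*
 * Per frame: one configure_frame call, then one configure_partition call per
 * partition. No "partition == 0" check is needed inside the entity.
 */
static void run_frame(struct entity *e, unsigned int partitions)
{
	unsigned int i;

	assert(e->ops != NULL);

	if (e->ops->configure_frame)
		e->ops->configure_frame(e);

	for (i = 0; i < partitions; ++i) {
		if (e->ops->configure_partition)
			e->ops->configure_partition(e, i);
	}
}
```

The trade-off is one more callback per entity in exchange for dropping the
special case on the first partition.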

>   * @max_width:	Return the max supported width of data that the entity can
>   *		process in a single operation.
>   * @partition:	Process the partition construction based on this entity's

[snip]

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 7/8] media: vsp1: Adapt entities to configure into a body
  2018-03-08  0:05 ` [PATCH v7 7/8] media: vsp1: Adapt entities to configure into a body Kieran Bingham
@ 2018-04-06 23:55   ` Laurent Pinchart
  0 siblings, 0 replies; 26+ messages in thread
From: Laurent Pinchart @ 2018-04-06 23:55 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Kieran,

Thank you for the patch.

On Thursday, 8 March 2018 02:05:30 EEST Kieran Bingham wrote:
> Currently the entities store their configurations into a display list.
> Adapt this such that the code can be configured into a body directly,
> allowing greater flexibility and control of the content.
> 
> All users of vsp1_dl_list_write() are removed in this process, thus it
> too is removed.
> 
> A helper, vsp1_dl_list_get_body0() is provided to access the internal body0
> from the display list.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> 
> ---
> v7
>  - Rebase
>  - s/prepare/configure_stream/
>  - s/configure/configure_frame/
> 
>  drivers/media/platform/vsp1/vsp1_bru.c    | 22 ++++++-------
>  drivers/media/platform/vsp1/vsp1_clu.c    | 22 ++++++-------
>  drivers/media/platform/vsp1/vsp1_dl.c     | 12 ++-----
>  drivers/media/platform/vsp1/vsp1_dl.h     |  2 +-
>  drivers/media/platform/vsp1/vsp1_drm.c    | 20 +++++++----
>  drivers/media/platform/vsp1/vsp1_entity.c | 15 ++++-----
>  drivers/media/platform/vsp1/vsp1_entity.h | 11 +++---
>  drivers/media/platform/vsp1/vsp1_hgo.c    | 16 ++++-----
>  drivers/media/platform/vsp1/vsp1_hgt.c    | 18 +++++-----
>  drivers/media/platform/vsp1/vsp1_hsit.c   | 10 +++---
>  drivers/media/platform/vsp1/vsp1_lif.c    | 15 ++++-----
>  drivers/media/platform/vsp1/vsp1_lut.c    | 21 ++++++------
>  drivers/media/platform/vsp1/vsp1_pipe.c   |  4 +-
>  drivers/media/platform/vsp1/vsp1_pipe.h   |  3 +-
>  drivers/media/platform/vsp1/vsp1_rpf.c    | 39 +++++++++++-----------
>  drivers/media/platform/vsp1/vsp1_sru.c    | 14 ++++----
>  drivers/media/platform/vsp1/vsp1_uds.c    | 24 +++++++-------
>  drivers/media/platform/vsp1/vsp1_uds.h    |  2 +-
>  drivers/media/platform/vsp1/vsp1_video.c  | 11 ++++--
>  drivers/media/platform/vsp1/vsp1_wpf.c    | 42 ++++++++++++------------
>  20 files changed, 172 insertions(+), 151 deletions(-)

[snip]

> diff --git a/drivers/media/platform/vsp1/vsp1_uds.c b/drivers/media/platform/vsp1/vsp1_uds.c
> index ce1731c2b3a9..6ddfce4bd095 100644
> --- a/drivers/media/platform/vsp1/vsp1_uds.c
> +++ b/drivers/media/platform/vsp1/vsp1_uds.c
> @@ -31,22 +31,23 @@
>   * Device Access
>   */
> 
> -static inline void vsp1_uds_write(struct vsp1_uds *uds, struct vsp1_dl_list *dl,
> -				  u32 reg, u32 data)
> +static inline void vsp1_uds_write(struct vsp1_uds *uds,
> +				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
>  {
> -	vsp1_dl_list_write(dl, reg + uds->entity.index * VI6_UDS_OFFSET, data);
> +	vsp1_dl_body_write(dlb, reg + uds->entity.index * VI6_UDS_OFFSET,
> +			       data);

This can hold on a single line.

>  }

[snip]

> diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
> index 1b5a31734834..b47708660e53 100644
> --- a/drivers/media/platform/vsp1/vsp1_video.c
> +++ b/drivers/media/platform/vsp1/vsp1_video.c

[snip]

> @@ -802,6 +804,9 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>  	if (!pipe->dl)
>  		return -ENOMEM;
> 
> +	/* Retrieve the default DLB from the list */

s/list/list./

> +	dlb = vsp1_dl_list_get_body0(pipe->dl);
> +
>  	if (pipe->uds) {
>  		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
> 
> @@ -824,8 +829,8 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>  	}
> 
>  	list_for_each_entry(entity, &pipe->entities, list_pipe) {
> -		vsp1_entity_route_setup(entity, pipe, pipe->dl);
> -		vsp1_entity_configure_stream(entity, pipe, pipe->dl);
> +		vsp1_entity_route_setup(entity, pipe, dlb);
> +		vsp1_entity_configure_stream(entity, pipe, dlb);
>  	}
> 
>  	return 0;
> diff --git a/drivers/media/platform/vsp1/vsp1_wpf.c b/drivers/media/platform/vsp1/vsp1_wpf.c
> index 6a6cdf0fb5f1..68218625549e 100644
> --- a/drivers/media/platform/vsp1/vsp1_wpf.c
> +++ b/drivers/media/platform/vsp1/vsp1_wpf.c
> @@ -31,9 +31,10 @@
>   */
> 
>  static inline void vsp1_wpf_write(struct vsp1_rwpf *wpf,
> -				  struct vsp1_dl_list *dl, u32 reg, u32 data)
> +				  struct vsp1_dl_body *dlb, u32 reg, u32 data)
>  {
> -	vsp1_dl_list_write(dl, reg + wpf->entity.index * VI6_WPF_OFFSET, data);
> +	vsp1_dl_body_write(dlb, reg + wpf->entity.index * VI6_WPF_OFFSET,
> +			       data);

This can hold on a single line.

>  }

[snip]

> @@ -292,10 +293,10 @@ static void wpf_configure_stream(struct vsp1_entity *entity,
> 
>  	wpf->outfmt = outfmt;
> 
> -	vsp1_dl_list_write(dl, VI6_DPR_WPF_FPORCH(wpf->entity.index),
> -			   VI6_DPR_WPF_FPORCH_FP_WPFN);
> +	vsp1_dl_body_write(dlb, VI6_DPR_WPF_FPORCH(wpf->entity.index),
> +			       VI6_DPR_WPF_FPORCH_FP_WPFN);

Strange indentation.

> 
> -	vsp1_dl_list_write(dl, VI6_WPF_WRBCK_CTRL, 0);
> +	vsp1_dl_body_write(dlb, VI6_WPF_WRBCK_CTRL, 0);
> 
>  	/*
>  	 * Sources. If the pipeline has a single input and BRU is not used,
> @@ -319,17 +320,18 @@ static void wpf_configure_stream(struct vsp1_entity *entity,
>  			? VI6_WPF_SRCRPF_VIRACT_MST
>  			: VI6_WPF_SRCRPF_VIRACT2_MST;
> 
> -	vsp1_wpf_write(wpf, dl, VI6_WPF_SRCRPF, srcrpf);
> +	vsp1_wpf_write(wpf, dlb, VI6_WPF_SRCRPF, srcrpf);
> 
>  	/* Enable interrupts */
> -	vsp1_dl_list_write(dl, VI6_WPF_IRQ_STA(wpf->entity.index), 0);
> -	vsp1_dl_list_write(dl, VI6_WPF_IRQ_ENB(wpf->entity.index),
> -			   VI6_WFP_IRQ_ENB_DFEE);
> +	vsp1_dl_body_write(dlb, VI6_WPF_IRQ_STA(wpf->entity.index), 0);
> +	vsp1_dl_body_write(dlb, VI6_WPF_IRQ_ENB(wpf->entity.index),
> +			       VI6_WFP_IRQ_ENB_DFEE);

Here too.

>  }

[snip]

With those small issues fixed,

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb
  2018-03-08  0:05 ` [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb Kieran Bingham
@ 2018-04-07  0:23   ` Laurent Pinchart
  2018-04-30 17:48     ` Kieran Bingham
  0 siblings, 1 reply; 26+ messages in thread
From: Laurent Pinchart @ 2018-04-07  0:23 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Kieran,

Thank you for the patch.

On Thursday, 8 March 2018 02:05:31 EEST Kieran Bingham wrote:
> We are now able to configure a pipeline directly into a local display
> list body. Take advantage of this fact, and create a cacheable body to
> store the configuration of the pipeline in the video object.
> 
> vsp1_video_pipeline_run() is now the last user of the pipe->dl object.
> Convert this function to use the cached video->config body and obtain a
> local display list reference.
> 
> Attach the video->config body to the display list when needed before
> committing to hardware.
> 
> The pipe object is marked as un-configured when resuming from a suspend.
> This ensures that when the hardware is reset - our cached configuration
> will be re-attached to the next committed DL.
> 
> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> ---
> 
> v3:
>  - 's/fragment/body/', 's/fragments/bodies/'
>  - video dlb cache allocation increased from 2 to 3 dlbs
> 
> Our video DL usage now looks like the below output:
> 
> dl->body0 contains our disposable runtime configuration. Max 41.
> dl_child->body0 is our partition specific configuration. Max 12.
> dl->bodies shows our constant configuration and LUTs.
> 
>   These two are LUT/CLU:
>      * dl->bodies[x]->num_entries 256 / max 256
>      * dl->bodies[x]->num_entries 4914 / max 4914
> 
> Which shows that our 'constant' configuration cache is currently
> utilised to a maximum of 64 entries.
> 
> trace-cmd report | \
>     grep max | sed 's/.*vsp1_dl_list_commit://g' | sort | uniq;
> 
>   dl->body0->num_entries 13 / max 128
>   dl->body0->num_entries 14 / max 128
>   dl->body0->num_entries 16 / max 128
>   dl->body0->num_entries 20 / max 128
>   dl->body0->num_entries 27 / max 128
>   dl->body0->num_entries 34 / max 128
>   dl->body0->num_entries 41 / max 128
>   dl_child->body0->num_entries 10 / max 128
>   dl_child->body0->num_entries 12 / max 128
>   dl->bodies[x]->num_entries 15 / max 128
>   dl->bodies[x]->num_entries 16 / max 128
>   dl->bodies[x]->num_entries 17 / max 128
>   dl->bodies[x]->num_entries 18 / max 128
>   dl->bodies[x]->num_entries 20 / max 128
>   dl->bodies[x]->num_entries 21 / max 128
>   dl->bodies[x]->num_entries 256 / max 256
>   dl->bodies[x]->num_entries 31 / max 128
>   dl->bodies[x]->num_entries 32 / max 128
>   dl->bodies[x]->num_entries 39 / max 128
>   dl->bodies[x]->num_entries 40 / max 128
>   dl->bodies[x]->num_entries 47 / max 128
>   dl->bodies[x]->num_entries 48 / max 128
>   dl->bodies[x]->num_entries 4914 / max 4914
>   dl->bodies[x]->num_entries 55 / max 128
>   dl->bodies[x]->num_entries 56 / max 128
>   dl->bodies[x]->num_entries 63 / max 128
>   dl->bodies[x]->num_entries 64 / max 128

This might be useful to capture in the main part of the commit message.

> v4:
>  - Adjust pipe configured flag to be reset on resume rather than suspend
>  - rename dl_child, dl_next
> 
>  drivers/media/platform/vsp1/vsp1_pipe.c  |  7 +++-
>  drivers/media/platform/vsp1/vsp1_pipe.h  |  4 +-
>  drivers/media/platform/vsp1/vsp1_video.c | 67 ++++++++++++++++---------
>  drivers/media/platform/vsp1/vsp1_video.h |  2 +-
>  4 files changed, 54 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c b/drivers/media/platform/vsp1/vsp1_pipe.c
> index 5012643583b6..fa445b1a2e38 100644
> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
> @@ -249,6 +249,7 @@ void vsp1_pipeline_run(struct vsp1_pipeline *pipe)
>  		vsp1_write(vsp1, VI6_CMD(pipe->output->entity.index),
>  			   VI6_CMD_STRCMD);
>  		pipe->state = VSP1_PIPELINE_RUNNING;
> +		pipe->configured = true;
>  	}
> 
>  	pipe->buffers_ready = 0;
> @@ -470,6 +471,12 @@ void vsp1_pipelines_resume(struct vsp1_device *vsp1)
>  			continue;
> 
>  		spin_lock_irqsave(&pipe->irqlock, flags);
> +		/*
> +		 * The hardware may have been reset during a suspend and will
> +		 * need a full reconfiguration
> +		 */

s/reconfiguration/reconfiguration./

> +		pipe->configured = false;
> +

Where does that full reconfiguration occur, given that the vsp1_pipeline_run() 
right below sets pipe->configured to true without performing reconfiguration ?
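
To show the sequence in question, here is a small user-space model of the flag
handling as it appears in the hunks of this patch. It is a sketch with stated
assumptions: struct pipe_model and the model_* functions are hypothetical, the
pipeline is assumed to be in the stopped state when resume runs, and only the
flag and a counter of config attachments are tracked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the pipeline state relevant to the flag. */
struct pipe_model {
	bool stopped;			/* pipeline state is "stopped" */
	bool configured;		/* mirrors pipe->configured */
	unsigned int config_attached;	/* times the cached config joined a DL */
};

/* Models vsp1_pipeline_run(): starting the pipeline sets the flag. */
static void model_pipeline_run(struct pipe_model *p)
{
	if (p->stopped) {
		p->stopped = false;
		p->configured = true;
	}
}

/* Models vsp1_video_pipeline_run(): attach the config only when pending. */
static void model_video_pipeline_run(struct pipe_model *p)
{
	if (!p->configured) {
		p->config_attached++;
		p->configured = true;
	}
	model_pipeline_run(p);
}

/* Models the resume hunk: clear the flag, then run the pipeline directly. */
static void model_resume(struct pipe_model *p)
{
	p->configured = false;
	model_pipeline_run(p);
}

/* Returns how often the cached config was attached across a suspend cycle. */
static unsigned int attaches_across_resume(void)
{
	struct pipe_model p = { true, false, 0 };

	model_video_pipeline_run(&p);	/* first frame attaches the config */
	assert(p.config_attached == 1);

	p.stopped = true;		/* suspend stops the pipeline */
	model_resume(&p);		/* flag cleared, then set again */
	model_video_pipeline_run(&p);	/* flag already true: no attach */

	return p.config_attached;
}
```

In this model the count stays at one after resume, which is the behaviour the
question above is about. Whether the real driver takes the same path depends
on the assumption about the pipeline state at resume time.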

>  		if (vsp1_pipeline_ready(pipe))
>  			vsp1_pipeline_run(pipe);
>  		spin_unlock_irqrestore(&pipe->irqlock, flags);
> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h b/drivers/media/platform/vsp1/vsp1_pipe.h
> index 90d29492b9b9..e7ad6211b4d0 100644
> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
> @@ -90,6 +90,7 @@ struct vsp1_partition {
>   * @irqlock: protects the pipeline state
>   * @state: current state
>   * @wq: wait queue to wait for state change completion
> + * @configured: flag determining if the hardware has run since reset
>   * @frame_end: frame end interrupt handler
>   * @lock: protects the pipeline use count and stream count
>   * @kref: pipeline reference count
> @@ -117,6 +118,7 @@ struct vsp1_pipeline {
>  	spinlock_t irqlock;
>  	enum vsp1_pipeline_state state;
>  	wait_queue_head_t wq;
> +	bool configured;
> 
>  	void (*frame_end)(struct vsp1_pipeline *pipe, bool completed);
> 
> @@ -143,8 +145,6 @@ struct vsp1_pipeline {
>  	 */
>  	struct list_head entities;
> 
> -	struct vsp1_dl_list *dl;
> -

You should remove the corresponding line from the structure documentation.

>  	unsigned int partitions;
>  	struct vsp1_partition *partition;
>  	struct vsp1_partition *part_table;
> diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
> index b47708660e53..96d9872667d9 100644
> --- a/drivers/media/platform/vsp1/vsp1_video.c
> +++ b/drivers/media/platform/vsp1/vsp1_video.c
> @@ -394,37 +394,43 @@ static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
>  static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
>  {
>  	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
> +	struct vsp1_video *video = pipe->output->video;
>  	unsigned int partition;
> +	struct vsp1_dl_list *dl;
> +
> +	dl = vsp1_dl_list_get(pipe->output->dlm);
> 
> -	if (!pipe->dl)
> -		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
> +	/* Attach our pipe configuration to fully initialise the hardware */

s/hardware/hardware./

There are other similar comments in this patch.

> +	if (!pipe->configured) {
> +		vsp1_dl_list_add_body(dl, video->pipe_config);
> +		pipe->configured = true;
> +	}
> 
>  	/* Run the first partition */
> -	vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0);
> +	vsp1_video_pipeline_run_partition(pipe, dl, 0);
> 
>  	/* Process consecutive partitions as necessary */
>  	for (partition = 1; partition < pipe->partitions; ++partition) {
> -		struct vsp1_dl_list *dl;
> +		struct vsp1_dl_list *dl_next;
> 
> -		dl = vsp1_dl_list_get(pipe->output->dlm);
> +		dl_next = vsp1_dl_list_get(pipe->output->dlm);
> 
>  		/*
>  		 * An incomplete chain will still function, but output only
>  		 * the partitions that had a dl available. The frame end
>  		 * interrupt will be marked on the last dl in the chain.
>  		 */
> -		if (!dl) {
> +		if (!dl_next) {
>  			dev_err(vsp1->dev, "Failed to obtain a dl list. Frame will be incomplete\n");
>  			break;
>  		}
> 
> -		vsp1_video_pipeline_run_partition(pipe, dl, partition);
> -		vsp1_dl_list_add_chain(pipe->dl, dl);
> +		vsp1_video_pipeline_run_partition(pipe, dl_next, partition);
> +		vsp1_dl_list_add_chain(dl, dl_next);
>  	}
> 
>  	/* Complete, and commit the head display list. */
> -	vsp1_dl_list_commit(pipe->dl);
> -	pipe->dl = NULL;
> +	vsp1_dl_list_commit(dl);
> 
>  	vsp1_pipeline_run(pipe);
>  }
> @@ -790,8 +796,8 @@ static void vsp1_video_buffer_queue(struct vb2_buffer *vb)
> 
>  static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>  {
> +	struct vsp1_video *video = pipe->output->video;
>  	struct vsp1_entity *entity;
> -	struct vsp1_dl_body *dlb;
>  	int ret;
> 
>  	/* Determine this pipelines sizes for image partitioning support. */
> @@ -799,14 +805,6 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>  	if (ret < 0)
>  		return ret;
> 
> -	/* Prepare the display list. */
> -	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
> -	if (!pipe->dl)
> -		return -ENOMEM;
> -
> -	/* Retrieve the default DLB from the list */
> -	dlb = vsp1_dl_list_get_body0(pipe->dl);
> -
>  	if (pipe->uds) {
>  		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
> 
> @@ -828,11 +826,20 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>  		}
>  	}
> 
> +	/* Obtain a clean body from our pool */
> +	video->pipe_config = vsp1_dl_body_get(video->dlbs);
> +	if (!video->pipe_config)
> +		return -ENOMEM;
> +
> +	/* Configure the entities into our cached pipe configuration */
>  	list_for_each_entry(entity, &pipe->entities, list_pipe) {
> -		vsp1_entity_route_setup(entity, pipe, dlb);
> -		vsp1_entity_configure_stream(entity, pipe, dlb);
> +		vsp1_entity_route_setup(entity, pipe, video->pipe_config);
> +		vsp1_entity_configure_stream(entity, pipe, video->pipe_config);
>  	}
> 
> +	/* Ensure that our cached configuration is updated in the next DL */
> +	pipe->configured = false;

Quoting my comment to a previous version, and your reply to it which I have 
failed to answer,

> > I'm tempted to move this at pipeline stop time (either to
> > vsp1_video_stop_streaming() right after the vsp1_pipeline_stop() call, or
> > in vsp1_pipeline_stop() itself), possibly with a WARN_ON() here to catch
> > bugs in the driver.
> 
> Do you mean just setting the flag? or the pipe_configuration? This is a
> setup task - not a stop task ... ? We are doing this as part of
> vsp1_video_start_streaming().

I meant just setting the configured flag back to false.

> IMO, The flag should only be updated after the configuration has been
> updated to signal that the new configuration should be written out to the
> hardware.
> 
> Unless you mean to mark the pipe->configured = false; at
> vsp1_pipeline_stop() time because we reset the pipe to halt it ?

That's the idea, yes. And now that I think about it again, we could also set 
pipe->configured to false in vsp1_video_cleanup_pipeline() right after the 
vsp1_dl_body_put() call.

What bothers me here is that the pipe->configured flag is handled both in 
vsp1_pipe.c and vsp1_video.c. Coupled with my above comment about the full 
reconfiguration at resume time, I think we might not be abstracting this as we 
should. I wonder whether it would be possible to either make the flag local to 
vsp1_pipe.c, or local to vsp1_video.c and move it from the pipeline object to 
the video object. My gut feeling right now (and it might be too late to trust 
it) is that, as the pipe_config object is stored in vsp1_video, so should the 
configured flag.

Please feel free to challenge this.
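
As a rough illustration of that direction, the sketch below keeps the flag
next to the cached configuration in a single object. Everything in it is
hypothetical (struct video_model and the video_* helpers do not exist in the
driver); it only shows the flag being set and cleared from one place.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the cached body and its flag live side by side. */
struct video_model {
	bool has_config;		/* stand-in for a held pipe_config body */
	bool configured;		/* hardware already holds the config */
	unsigned int attach_count;	/* times the config joined a DL */
};

/* Start of streaming: build the cached configuration, mark it pending. */
static void video_setup(struct video_model *v)
{
	v->has_config = true;
	v->configured = false;
}

/* Per frame: attach the cached configuration only when it is pending. */
static void video_run(struct video_model *v)
{
	assert(v->has_config);

	if (!v->configured) {
		v->attach_count++;
		v->configured = true;
	}
}

/* Resume: the hardware may have been reset, so the config is pending. */
static void video_resume(struct video_model *v)
{
	v->configured = false;
}

/* Stop of streaming: drop the cached body and reset the flag with it. */
static void video_cleanup(struct video_model *v)
{
	v->has_config = false;
	v->configured = false;
}
```

With this layout a resume followed by the next frame attaches the
configuration again, and the pipeline code never touches the flag.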

> +
>  	return 0;
>  }
> 
> @@ -842,6 +849,9 @@ static void vsp1_video_cleanup_pipeline(struct vsp1_pipeline *pipe)
>  	struct vsp1_vb2_buffer *buffer;
>  	unsigned long flags;
> 
> +	/* Release any cached configuration */
> +	vsp1_dl_body_put(video->pipe_config);
> +
>  	/* Remove all buffers from the IRQ queue. */
>  	spin_lock_irqsave(&video->irqlock, flags);
>  	list_for_each_entry(buffer, &video->irqqueue, queue)
> @@ -918,9 +928,6 @@ static void vsp1_video_stop_streaming(struct vb2_queue *vq)
>  		ret = vsp1_pipeline_stop(pipe);
>  		if (ret == -ETIMEDOUT)
>  			dev_err(video->vsp1->dev, "pipeline stop timeout\n");
> -
> -		vsp1_dl_list_put(pipe->dl);
> -		pipe->dl = NULL;
>  	}
>  	mutex_unlock(&pipe->lock);
> 
> @@ -1240,6 +1247,16 @@ struct vsp1_video *vsp1_video_create(struct vsp1_device *vsp1,
>  		goto error;
>  	}
> 
> +	/*
> +	 * Utilise a body pool to cache the constant configuration of the
> +	 * pipeline object.
> +	 */
> +	video->dlbs = vsp1_dl_body_pool_create(vsp1, 3, 128, 0);
> +	if (!video->dlbs) {
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +
>  	return video;
> 
>  error:
> @@ -1249,6 +1266,8 @@ struct vsp1_video *vsp1_video_create(struct vsp1_device *vsp1,
> 
>  void vsp1_video_cleanup(struct vsp1_video *video)
>  {
> +	vsp1_dl_body_pool_destroy(video->dlbs);
> +
>  	if (video_is_registered(&video->video))
>  		video_unregister_device(&video->video);
> 
> diff --git a/drivers/media/platform/vsp1/vsp1_video.h b/drivers/media/platform/vsp1/vsp1_video.h
> index 50ea7f02205f..e84f8ee902c1 100644
> --- a/drivers/media/platform/vsp1/vsp1_video.h
> +++ b/drivers/media/platform/vsp1/vsp1_video.h
> @@ -43,6 +43,8 @@ struct vsp1_video {
> 
>  	struct mutex lock;
> 
> +	struct vsp1_dl_body_pool *dlbs;
> +	struct vsp1_dl_body *pipe_config;
>  	unsigned int pipe_index;
> 
>  	struct vb2_queue queue;

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 0/8] vsp1: TLB optimisation and DL caching
  2018-03-08  0:05 [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
                   ` (7 preceding siblings ...)
  2018-03-08  0:05 ` [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb Kieran Bingham
@ 2018-04-07  0:30 ` Laurent Pinchart
  8 siblings, 0 replies; 26+ messages in thread
From: Laurent Pinchart @ 2018-04-07  0:30 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Kieran,

I've finished reviewing the series. For your convenience, I've rebased it on 
top of the BRU/BRS dynamic allocation patches, and pushed the result to

	git://linuxtv.org/pinchartl/media.git v4l2/vsp1/tlb-optimise

(Please note it has been compile-tested only)

I have also taken the liberty to incorporate both my review comments and my 
Reviewed-by line for the patches that have received a conditional Reviewed-by, 
that is patches 1/8, 5/8 and 7/8. Patches 3/8, 4/8, 6/8 and 8/8 have open 
questions so I haven't touched them.

Please don't despair, v8 might well be the last version we will need :-)

On Thursday, 8 March 2018 02:05:23 EEST Kieran Bingham wrote:
> Each display list currently allocates an area of DMA memory to store
> register settings for the VSP1 to process. Each of these allocations adds
> pressure to the IPMMU TLB entries.
> 
> We can reduce the pressure by pre-allocating larger areas and dividing the
> area across multiple bodies represented as a pool.
> 
> With this reconfiguration of bodies, we can adapt the configuration code to
> separate out constant hardware configuration and cache it for re-use.
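
The pooling idea described above can be sketched in user space as follows.
This is only an illustration under loud assumptions: calloc() stands in for
the single DMA allocation, each entry is reduced to one 32-bit word, and
struct body_pool, struct pool_body and the pool_* helpers are hypothetical
names, not the driver API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One body: a fixed-size slice of the pool's single allocation. */
struct pool_body {
	uint32_t *entries;
	size_t max_entries;
	int in_use;
};

/* The pool owns one large allocation shared by all of its bodies. */
struct body_pool {
	uint32_t *mem;
	struct pool_body *bodies;
	size_t num_bodies;
};

/* Allocate once, then carve the area into num_bodies equal slices. */
static int pool_create(struct body_pool *pool, size_t num_bodies,
		       size_t num_entries)
{
	size_t i;

	pool->mem = calloc(num_bodies * num_entries, sizeof(*pool->mem));
	pool->bodies = calloc(num_bodies, sizeof(*pool->bodies));
	if (!pool->mem || !pool->bodies) {
		free(pool->mem);
		free(pool->bodies);
		return -1;
	}

	pool->num_bodies = num_bodies;
	for (i = 0; i < num_bodies; ++i) {
		pool->bodies[i].entries = pool->mem + i * num_entries;
		pool->bodies[i].max_entries = num_entries;
	}

	return 0;
}

/* Hand out the first free body, or NULL when the pool is exhausted. */
static struct pool_body *pool_get(struct body_pool *pool)
{
	size_t i;

	for (i = 0; i < pool->num_bodies; ++i) {
		if (!pool->bodies[i].in_use) {
			pool->bodies[i].in_use = 1;
			return &pool->bodies[i];
		}
	}

	return NULL;
}

/* Return a body to the pool; no memory is freed here. */
static void pool_put(struct pool_body *body)
{
	assert(body && body->in_use);
	body->in_use = 0;
}

/* Release the single backing allocation and the bookkeeping array. */
static void pool_destroy(struct body_pool *pool)
{
	free(pool->bodies);
	free(pool->mem);
}
```

Getting and putting bodies never allocates, so the number of separate mappings
stays constant for the lifetime of the pool.
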
> 
> The patches provided in this series can be found at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/kbingham/rcar.git 
> tags/vsp1/tlb-optimise/v7
> 
> I hope that this series is at a stage where it could be integrated now.  It
> has had some thorough testing and is already integrated in both
> renesas-drivers and renesas-bsp. (except for the minor changes in v7 that
> is...)
> 
> Please note that checkpatch complains on patch 6/8 in this series:
> 
> v7-0006-media-vsp1-Refactor-display-list-configure-operations.patch
> ------------------------------------------------------------------------------------------------------
> WARNING: function definition argument 'struct vsp1_entity *' should also have an identifier name
> #290: FILE: drivers/media/platform/vsp1/vsp1_entity.h:82:
> +       void (*configure_stream)(struct vsp1_entity *, struct vsp1_pipeline *,
> 
> However - this complaint is regarding pre-existing code. I have only renamed
> the function pointers.  I do also disagree with checkpatch here - as there
> is no need to provide an identifier name, and it does not improve
> readability in this instance to state:
> 	...(vsp1_entity *entity, struct vsp1_pipeline *pipe)
> 
> Thus - I have ignored these warnings.
> 
> 
> Changelog:
> ----------
> 
> v7:
>  - Rebased on to linux-media/master (v4.16-rc4)
>  - Clean up the formatting of the vsp1_dl_list_add_body()
>  - Fix formatting and white space
>  -  s/prepare/configure_stream/
>  -  s/configure/configure_frame/
> 
> v6:
>  - Rebased on to linux-media/master (v4.16-rc1)
>  - Removed DRM/UIF (DISCOM/ColorKey) updates
> 
> v5:
>  - Rebased on to renesas-drivers-2018-01-09-v4.15-rc7 to fix conflicts
>    with DRM and UIF updates on VSP1 driver
> 
> v4:
>  - Rebased to v4.14
>  * v4l: vsp1: Use reference counting for bodies
>    - Fix up reference handling comments
> 
>  * v4l: vsp1: Provide a body pool
>    - Provide comment explaining extra allocation on body pool
>      highlighting area for optimisation later.
> 
>  * v4l: vsp1: Refactor display list configure operations
>    - Fix up comment to describe yuv_mode caching rather than format
> 
>  * vsp1: Adapt entities to configure into a body
>    - Rename vsp1_dl_list_get_body() to vsp1_dl_list_get_body0()
> 
>  * v4l: vsp1: Move video configuration to a cached dlb
>    - Adjust pipe configured flag to be reset on resume rather than suspend
>    - rename dl_child, dl_next
> 
> Testing:
> --------
> The VSP unit tests have been run on this patch set with the following
> results:
> 
> --- Test loop 1 ---
> - vsp-unit-test-0000.sh
> Test Conditions:
>   Platform          Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+
>   Kernel release    4.16.0-rc4-arm64-renesas-01067-g397eb3811ec0
>   convert           /usr/bin/convert
>   compare           /usr/bin/compare
>   killall           /usr/bin/killall
>   raw2rgbpnm        /usr/bin/raw2rgbpnm
>   stress            /usr/bin/stress
>   yavta             /usr/bin/yavta
> - vsp-unit-test-0001.sh
> Testing WPF packing in RGB332: pass
> Testing WPF packing in ARGB555: pass
> Testing WPF packing in XRGB555: pass
> Testing WPF packing in RGB565: pass
> Testing WPF packing in BGR24: pass
> Testing WPF packing in RGB24: pass
> Testing WPF packing in ABGR32: pass
> Testing WPF packing in ARGB32: pass
> Testing WPF packing in XBGR32: pass
> Testing WPF packing in XRGB32: pass
> - vsp-unit-test-0002.sh
> Testing WPF packing in NV12M: pass
> Testing WPF packing in NV16M: pass
> Testing WPF packing in NV21M: pass
> Testing WPF packing in NV61M: pass
> Testing WPF packing in UYVY: pass
> Testing WPF packing in VYUY: skip
> Testing WPF packing in YUV420M: pass
> Testing WPF packing in YUV422M: pass
> Testing WPF packing in YUV444M: pass
> Testing WPF packing in YVU420M: pass
> Testing WPF packing in YVU422M: pass
> Testing WPF packing in YVU444M: pass
> Testing WPF packing in YUYV: pass
> Testing WPF packing in YVYU: pass
> - vsp-unit-test-0003.sh
> Testing scaling from 640x640 to 640x480 in RGB24: pass
> Testing scaling from 1024x768 to 640x480 in RGB24: pass
> Testing scaling from 640x480 to 1024x768 in RGB24: pass
> Testing scaling from 640x640 to 640x480 in YUV444M: pass
> Testing scaling from 1024x768 to 640x480 in YUV444M: pass
> Testing scaling from 640x480 to 1024x768 in YUV444M: pass
> - vsp-unit-test-0004.sh
> Testing histogram in RGB24: pass
> Testing histogram in YUV444M: pass
> - vsp-unit-test-0005.sh
> Testing RPF.0: pass
> Testing RPF.1: pass
> Testing RPF.2: pass
> Testing RPF.3: pass
> Testing RPF.4: pass
> - vsp-unit-test-0006.sh
> Testing invalid pipeline with no RPF: pass
> Testing invalid pipeline with no WPF: pass
> - vsp-unit-test-0007.sh
> Testing BRU in RGB24 with 1 inputs: pass
> Testing BRU in RGB24 with 2 inputs: pass
> Testing BRU in RGB24 with 3 inputs: pass
> Testing BRU in RGB24 with 4 inputs: pass
> Testing BRU in RGB24 with 5 inputs: pass
> Testing BRU in YUV444M with 1 inputs: pass
> Testing BRU in YUV444M with 2 inputs: pass
> Testing BRU in YUV444M with 3 inputs: pass
> Testing BRU in YUV444M with 4 inputs: pass
> Testing BRU in YUV444M with 5 inputs: pass
> - vsp-unit-test-0008.sh
> Test requires unavailable feature set `bru rpf.0 uds wpf.0': skipped
> - vsp-unit-test-0009.sh
> Test requires unavailable feature set `rpf.0 wpf.0 wpf.1': skipped
> - vsp-unit-test-0010.sh
> Testing CLU in RGB24 with zero configuration: pass
> Testing CLU in RGB24 with identity configuration: pass
> Testing CLU in RGB24 with wave configuration: pass
> Testing CLU in YUV444M with zero configuration: pass
> Testing CLU in YUV444M with identity configuration: pass
> Testing CLU in YUV444M with wave configuration: pass
> Testing LUT in RGB24 with zero configuration: pass
> Testing LUT in RGB24 with identity configuration: pass
> Testing LUT in RGB24 with gamma configuration: pass
> Testing LUT in YUV444M with zero configuration: pass
> Testing LUT in YUV444M with identity configuration: pass
> Testing LUT in YUV444M with gamma configuration: pass
> - vsp-unit-test-0011.sh
> Testing  hflip=0 vflip=0 rotate=0: pass
> Testing  hflip=1 vflip=0 rotate=0: pass
> Testing  hflip=0 vflip=1 rotate=0: pass
> Testing  hflip=1 vflip=1 rotate=0: pass
> Testing  hflip=0 vflip=0 rotate=90: pass
> Testing  hflip=1 vflip=0 rotate=90: pass
> Testing  hflip=0 vflip=1 rotate=90: pass
> Testing  hflip=1 vflip=1 rotate=90: pass
> - vsp-unit-test-0012.sh
> Testing hflip: pass
> Testing vflip: pass
> - vsp-unit-test-0013.sh
> Testing RPF unpacking in RGB332: pass
> Testing RPF unpacking in ARGB555: pass
> Testing RPF unpacking in XRGB555: pass
> Testing RPF unpacking in RGB565: pass
> Testing RPF unpacking in BGR24: pass
> Testing RPF unpacking in RGB24: pass
> Testing RPF unpacking in ABGR32: pass
> Testing RPF unpacking in ARGB32: pass
> Testing RPF unpacking in XBGR32: pass
> Testing RPF unpacking in XRGB32: pass
> - vsp-unit-test-0014.sh
> Testing RPF unpacking in NV12M: pass
> Testing RPF unpacking in NV16M: pass
> Testing RPF unpacking in NV21M: pass
> Testing RPF unpacking in NV61M: pass
> Testing RPF unpacking in UYVY: pass
> Testing RPF unpacking in VYUY: skip
> Testing RPF unpacking in YUV420M: pass
> Testing RPF unpacking in YUV422M: pass
> Testing RPF unpacking in YUV444M: pass
> Testing RPF unpacking in YVU420M: pass
> Testing RPF unpacking in YVU422M: pass
> Testing RPF unpacking in YVU444M: pass
> Testing RPF unpacking in YUYV: pass
> Testing RPF unpacking in YVYU: pass
> - vsp-unit-test-0015.sh
> Testing SRU scaling from 1024x768 to 1024x768 in RGB24: pass
> Testing SRU scaling from 1024x768 to 2048x1536 in RGB24: pass
> Testing SRU scaling from 1024x768 to 1024x768 in YUV444M: pass
> Testing SRU scaling from 1024x768 to 2048x1536 in YUV444M: pass
> - vsp-unit-test-0016.sh
> Testing  hflip=0 vflip=0 rotate=0 640x480 -> 640x480: pass
> Testing  hflip=0 vflip=0 rotate=0 640x480 -> 1024x768: pass
> Testing  hflip=0 vflip=0 rotate=0 1024x768 -> 640x480: pass
> Testing  hflip=1 vflip=0 rotate=0 640x480 -> 640x480: pass
> Testing  hflip=1 vflip=0 rotate=0 640x480 -> 1024x768: pass
> Testing  hflip=1 vflip=0 rotate=0 1024x768 -> 640x480: pass
> Testing  hflip=0 vflip=1 rotate=0 640x480 -> 640x480: pass
> Testing  hflip=0 vflip=1 rotate=0 640x480 -> 1024x768: pass
> Testing  hflip=0 vflip=1 rotate=0 1024x768 -> 640x480: pass
> Testing  hflip=1 vflip=1 rotate=0 640x480 -> 640x480: pass
> Testing  hflip=1 vflip=1 rotate=0 640x480 -> 1024x768: pass
> Testing  hflip=1 vflip=1 rotate=0 1024x768 -> 640x480: pass
> Testing  hflip=0 vflip=0 rotate=90 640x480 -> 640x480: pass
> Testing  hflip=0 vflip=0 rotate=90 640x480 -> 1024x768: pass
> Testing  hflip=0 vflip=0 rotate=90 1024x768 -> 640x480: pass
> Testing  hflip=1 vflip=0 rotate=90 640x480 -> 640x480: pass
> Testing  hflip=1 vflip=0 rotate=90 640x480 -> 1024x768: pass
> Testing  hflip=1 vflip=0 rotate=90 1024x768 -> 640x480: pass
> Testing  hflip=0 vflip=1 rotate=90 640x480 -> 640x480: pass
> Testing  hflip=0 vflip=1 rotate=90 640x480 -> 1024x768: pass
> Testing  hflip=0 vflip=1 rotate=90 1024x768 -> 640x480: pass
> Testing  hflip=1 vflip=1 rotate=90 640x480 -> 640x480: pass
> Testing  hflip=1 vflip=1 rotate=90 640x480 -> 1024x768: pass
> Testing  hflip=1 vflip=1 rotate=90 1024x768 -> 640x480: pass
> - vsp-unit-test-0017.sh
> - vsp-unit-test-0018.sh
> Testing RPF crop from (0,0)/512x384: pass
> Testing RPF crop from (32,32)/512x384: pass
> Testing RPF crop from (32,64)/512x384: pass
> Testing RPF crop from (64,32)/512x384: pass
> - vsp-unit-test-0019.sh
> - vsp-unit-test-0020.sh
> - vsp-unit-test-0021.sh
> Testing WPF packing in RGB332 during stress testing: pass
> Testing WPF packing in ARGB555 during stress testing: pass
> Testing WPF packing in XRGB555 during stress testing: pass
> Testing WPF packing in RGB565 during stress testing: pass
> Testing WPF packing in BGR24 during stress testing: pass
> Testing WPF packing in RGB24 during stress testing: pass
> Testing WPF packing in ABGR32 during stress testing: pass
> Testing WPF packing in ARGB32 during stress testing: pass
> Testing WPF packing in XBGR32 during stress testing: pass
> Testing WPF packing in XRGB32 during stress testing: pass
> ./vsp-unit-test-0021.sh: line 34:  4489 Killed                  stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
> - vsp-unit-test-0022.sh
> Testing long duration pipelines under stress: pass
> ./vsp-unit-test-0022.sh: line 38:  6457 Killed                  stress --cpu 8 --io 4 --vm 2 --vm-bytes 128M
> - vsp-unit-test-0023.sh
> Testing histogram HGT with hue areas 0,255,255,255,255,255,255,255,255,255,255,255: pass
> Testing histogram HGT with hue areas 0,40,40,80,80,120,120,160,160,200,200,255: pass
> Testing histogram HGT with hue areas 220,40,40,80,80,120,120,160,160,200,200,220: pass
> Testing histogram HGT with hue areas 0,10,50,60,100,110,150,160,200,210,250,255: pass
> Testing histogram HGT with hue areas 10,20,50,60,100,110,150,160,200,210,230,240: pass
> Testing histogram HGT with hue areas 240,20,60,80,100,120,140,160,180,200,210,220: pass
> - vsp-unit-test-0024.sh
> Test requires unavailable feature set `rpf.0 rpf.1 brs wpf.0': skipped
> 158 tests: 142 passed, 0 failed, 3 skipped
> 
> Kieran Bingham (8):
>   media: vsp1: Reword uses of 'fragment' as 'body'
>   media: vsp1: Protect bodies against overflow
>   media: vsp1: Provide a body pool
>   media: vsp1: Convert display lists to use new body pool
>   media: vsp1: Use reference counting for bodies
>   media: vsp1: Refactor display list configure operations
>   media: vsp1: Adapt entities to configure into a body
>   media: vsp1: Move video configuration to a cached dlb
> 
>  drivers/media/platform/vsp1/vsp1_bru.c    |  32 +--
>  drivers/media/platform/vsp1/vsp1_clu.c    | 102 +++---
>  drivers/media/platform/vsp1/vsp1_clu.h    |   1 +-
>  drivers/media/platform/vsp1/vsp1_dl.c     | 393 +++++++++++++----------
>  drivers/media/platform/vsp1/vsp1_dl.h     |  19 +-
>  drivers/media/platform/vsp1/vsp1_drm.c    |  35 +--
>  drivers/media/platform/vsp1/vsp1_entity.c |  26 +-
>  drivers/media/platform/vsp1/vsp1_entity.h |  38 +-
>  drivers/media/platform/vsp1/vsp1_hgo.c    |  26 +--
>  drivers/media/platform/vsp1/vsp1_hgt.c    |  28 +--
>  drivers/media/platform/vsp1/vsp1_hsit.c   |  20 +-
>  drivers/media/platform/vsp1/vsp1_lif.c    |  25 +-
>  drivers/media/platform/vsp1/vsp1_lut.c    |  77 +++--
>  drivers/media/platform/vsp1/vsp1_lut.h    |   1 +-
>  drivers/media/platform/vsp1/vsp1_pipe.c   |  11 +-
>  drivers/media/platform/vsp1/vsp1_pipe.h   |   7 +-
>  drivers/media/platform/vsp1/vsp1_rpf.c    | 183 +++++------
>  drivers/media/platform/vsp1/vsp1_sru.c    |  24 +-
>  drivers/media/platform/vsp1/vsp1_uds.c    |  75 ++--
>  drivers/media/platform/vsp1/vsp1_uds.h    |   2 +-
>  drivers/media/platform/vsp1/vsp1_video.c  |  82 ++---
>  drivers/media/platform/vsp1/vsp1_video.h  |   2 +-
>  drivers/media/platform/vsp1/vsp1_wpf.c    | 327 +++++++++----------
>  23 files changed, 845 insertions(+), 691 deletions(-)
> 
> base-commit: 8514509ba5933f4e4ade0d5d81be117f18c1ebd2

-- 
Regards,

Laurent Pinchart


* Re: [PATCH v7 3/8] media: vsp1: Provide a body pool
  2018-04-06 22:33   ` Laurent Pinchart
@ 2018-04-30 14:12     ` Kieran Bingham
  0 siblings, 0 replies; 26+ messages in thread
From: Kieran Bingham @ 2018-04-30 14:12 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Laurent,

On 06/04/18 23:33, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Thursday, 8 March 2018 02:05:26 EEST Kieran Bingham wrote:
>> Each display list allocates a body to store register values in a dma
>> accessible buffer from a dma_alloc_wc() allocation. Each of these
>> results in an entry in the TLB, and a large number of display list
> 
> I'd write it as "IOMMU TLB" to make it clear we're not concerned about CPU MMU 
> TLB pressure.

Yes, of course.

> 
>> allocations adds pressure to this resource.
>>
>> Reduce TLB pressure on the IPMMUs by allocating multiple display list
>> bodies in a single allocation, and providing these to the display list
>> through a 'body pool'. A pool can be allocated by the display list
>> manager or entities which require their own body allocations.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>
>> ---
>> v4:
>>  - Provide comment explaining extra allocation on body pool
>>    highlighting area for optimisation later.
>>
>> v3:
>>  - s/fragment/body/, s/fragments/bodies/
>>  - qty -> num_bodies
>>  - indentation fix
>>  - s/vsp1_dl_body_pool_{alloc,free}/vsp1_dl_body_pool_{create,destroy}/'
>>  - Add kerneldoc to non-static functions
>>
>> v2:
>>  - assign dlb->dma correctly
>>
>>  drivers/media/platform/vsp1/vsp1_dl.c | 163 +++++++++++++++++++++++++++-
>>  drivers/media/platform/vsp1/vsp1_dl.h |   8 +-
>>  2 files changed, 171 insertions(+)
>>
>> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c
>> b/drivers/media/platform/vsp1/vsp1_dl.c index 67cc16c1b8e3..0208e72cb356
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_dl.c
>> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
>> @@ -45,6 +45,8 @@ struct vsp1_dl_entry {
>>  /**
>>   * struct vsp1_dl_body - Display list body
>>   * @list: entry in the display list list of bodies
>> + * @free: entry in the pool free body list
> 
> Could we reuse @list for this purpose ? Unless I'm mistaken, when a body is in 
> a pool it doesn't belong to any particular display list, and when it is in a 
> display list it isn't in the pool anymore.
I've adapted the @list doc-string to read:

 * @list: entry in the display list list of bodies, or body pool free list


Actually, I think I'm tempted to leave this distinct and separate.

If we use the single @list field, then we have calls to vsp1_dl_body_get() which
'might not' add that body to a display list.

Consider the lut_set_table call, if it was called twice without a dl_commit() in
between.

The first call would get a dlb, and populate it with the LUT, setting it as the
active LUT in the entity, but not adding it to any list.

The call to vsp1_dl_body_get() *has* to remove it from the free list at that
point. But now the list structure '@list' is poisoned.


Now consider the general case for 'putting' the object back to the pool free
list when its reference count is zero:

vsp1_dl_body_put() must delete the structure from any existing list and add it
to the pool-free list.

We can use list_move_tail for this:

@@ -286,7 +284,7 @@ void vsp1_dl_body_put(struct vsp1_dl_body *dlb)
        dlb->num_entries = 0;

        spin_lock_irqsave(&dlb->pool->lock, flags);
-       list_add_tail(&dlb->free, &dlb->pool->free);
+       list_move_tail(&dlb->list, &dlb->pool->free);
        spin_unlock_irqrestore(&dlb->pool->lock, flags);


However - this fails in the instance above where the dlb did not make it onto a
list, because the call to __list_del_entry() will segfault on the poisoned values.

I don't believe we can expect the callers of vsp1_dl_body_put() to guarantee
that the entity is not on a list either, as that is intrinsic to the refcounting
associated with the object.


I think we have the same issue with the cached body in the vsp1_video object too.

So - leaving this as it is.
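
To illustrate the failure mode described above, here is a minimal userspace
sketch. It is not the kernel's list implementation: struct sketch_list, the
sketch_* helpers and the poison values are all hypothetical stand-ins that
only imitate how list_del() leaves poisoned pointers behind. Where the real
list_move_tail() would dereference those pointers, the sketch reports the
problem by returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct sketch_list {
	struct sketch_list *next, *prev;
};

/* Illustrative poison values; the kernel's real constants may differ. */
#define SKETCH_POISON1 ((struct sketch_list *)(uintptr_t)0x100)
#define SKETCH_POISON2 ((struct sketch_list *)(uintptr_t)0x122)

static void sketch_list_init(struct sketch_list *head)
{
	head->next = head;
	head->prev = head;
}

static void sketch_list_add_tail(struct sketch_list *entry,
				 struct sketch_list *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static bool sketch_list_is_poisoned(const struct sketch_list *entry)
{
	return entry->next == SKETCH_POISON1 || entry->prev == SKETCH_POISON2;
}

/* Unlink the entry and poison its pointers, as list_del() does. */
static void sketch_list_del(struct sketch_list *entry)
{
	assert(!sketch_list_is_poisoned(entry));
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = SKETCH_POISON1;
	entry->prev = SKETCH_POISON2;
}

/*
 * Move the entry to the tail of another list. The unlink step touches
 * entry->prev and entry->next, so a poisoned entry cannot be moved; the
 * sketch returns false where the kernel would fault.
 */
static bool sketch_list_move_tail(struct sketch_list *entry,
				  struct sketch_list *head)
{
	if (sketch_list_is_poisoned(entry))
		return false;
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	sketch_list_add_tail(entry, head);
	return true;
}

/* get() followed directly by put(): the body never joined a list. */
bool sketch_put_after_bare_get_would_fault(void)
{
	struct sketch_list pool_free, body;

	sketch_list_init(&pool_free);
	sketch_list_add_tail(&body, &pool_free);
	sketch_list_del(&body);		/* get: removed from the free list */

	return !sketch_list_move_tail(&body, &pool_free);	/* put */
}

/* get(), add to a display list, then put(): the move is well defined. */
bool sketch_put_after_list_add_is_safe(void)
{
	struct sketch_list pool_free, dl_bodies, body;

	sketch_list_init(&pool_free);
	sketch_list_init(&dl_bodies);
	sketch_list_add_tail(&body, &pool_free);
	sketch_list_del(&body);
	sketch_list_add_tail(&body, &dl_bodies);

	return sketch_list_move_tail(&body, &pool_free) &&
	       pool_free.next == &body && dl_bodies.next == &dl_bodies;
}
```

Both helper functions return true in this sketch. The first one is the
lut_set_table case: the body was taken from the pool but never added to a
display list, so a single shared list entry cannot safely be moved back.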


> 
>> + * @pool: pool to which this body belongs
>>   * @vsp1: the VSP1 device
>>   * @entries: array of entries
>>   * @dma: DMA address of the entries
>> @@ -54,6 +56,9 @@ struct vsp1_dl_entry {
>>   */
>>  struct vsp1_dl_body {
>>  	struct list_head list;
>> +	struct list_head free;
>> +
>> +	struct vsp1_dl_body_pool *pool;
>>  	struct vsp1_device *vsp1;
>>
>>  	struct vsp1_dl_entry *entries;
>> @@ -65,6 +70,30 @@ struct vsp1_dl_body {
>>  };
>>
>>  /**
>> + * struct vsp1_dl_body_pool - display list body pool
>> + * @dma: DMA address of the entries
>> + * @size: size of the full DMA memory pool in bytes
>> + * @mem: CPU memory pointer for the pool
>> + * @bodies: Array of DLB structures for the pool
>> + * @free: List of free DLB entries
>> + * @lock: Protects the pool and free list
> 
> The pool and free list ? As far as I can tell the lock only protects the free 
> list.

I've removed the reference to the pool for the lock. I don't think much else
needs protecting in the current code, so it should be fine.

> 
>> + * @vsp1: the VSP1 device
>> + */
>> +struct vsp1_dl_body_pool {
>> +	/* DMA allocation */
>> +	dma_addr_t dma;
>> +	size_t size;
>> +	void *mem;
>> +
>> +	/* Body management */
>> +	struct vsp1_dl_body *bodies;
>> +	struct list_head free;
>> +	spinlock_t lock;
>> +
>> +	struct vsp1_device *vsp1;
>> +};
>> +
>> +/**
>>   * struct vsp1_dl_list - Display list
>>   * @list: entry in the display list manager lists
>>   * @dlm: the display list manager
>> @@ -105,6 +134,7 @@ enum vsp1_dl_mode {
>>   * @active: list currently being processed (loaded) by hardware
>>   * @queued: list queued to the hardware (written to the DL registers)
>>   * @pending: list waiting to be queued to the hardware
>> + * @pool: body pool for the display list bodies
>>   * @gc_work: bodies garbage collector work struct
>>   * @gc_bodies: array of display list bodies waiting to be freed
>>   */
>> @@ -120,6 +150,8 @@ struct vsp1_dl_manager {
>>  	struct vsp1_dl_list *queued;
>>  	struct vsp1_dl_list *pending;
>>
>> +	struct vsp1_dl_body_pool *pool;
>> +
>>  	struct work_struct gc_work;
>>  	struct list_head gc_bodies;
>>  };
>> @@ -128,6 +160,137 @@ struct vsp1_dl_manager {
>>   * Display List Body Management
>>   */
>>
>> +/**
>> + * vsp1_dl_body_pool_create - Create a pool of bodies from a single
>> allocation
>> + * @vsp1: The VSP1 device
>> + * @num_bodies: The quantity of bodies to allocate
> 
> For consistency, s/quantity/number/

Done

> 
>> + * @num_entries: The maximum number of entries that the body can contain
> 
> Maybe s/the body/a body/ ?

Done

> 
>> + * @extra_size: Extra allocation provided for the bodies
>> + *
>> + * Allocate a pool of display list bodies each with enough memory to
>> contain the
>> + * requested number of entries.
> 
> How about
> 
> the requested number of entries plus the @extra_size.

Done

> 
>> + *
>> + * Return a pointer to a pool on success or NULL if memory can't be
>> allocated.
>> + */
>> +struct vsp1_dl_body_pool *
>> +vsp1_dl_body_pool_create(struct vsp1_device *vsp1, unsigned int num_bodies,
>> +			 unsigned int num_entries, size_t extra_size)
>> +{
>> +	struct vsp1_dl_body_pool *pool;
>> +	size_t dlb_size;
>> +	unsigned int i;
>> +
>> +	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
>> +	if (!pool)
>> +		return NULL;
>> +
>> +	pool->vsp1 = vsp1;
>> +
>> +	/*
>> +	 * Todo: 'extra_size' is only used by vsp1_dlm_create(), to allocate
> 
> s/Todo/TODO/
> 
>> +	 * extra memory for the display list header. We need only one header per
>> +	 * display list, not per display list body, thus this allocation is
>> +	 * extraneous and should be reworked in the future.
>> +	 */
> 
> Any plan to fix this ? :-)

Sort of - The DU interlaced work brings in further need to create pool type
memory - and creates an opportunity to further rework and separate the headers
from the display list body.

(I think something can be built on top of the DU Interlaced patches, they don't
need to be blocked by this TODO:)

But yes, this is in my mind for how to fix it.

Fixed the capitalisation.

> 
>> +	dlb_size = num_entries * sizeof(struct vsp1_dl_entry) + extra_size;
>> +	pool->size = dlb_size * num_bodies;
>> +
>> +	pool->bodies = kcalloc(num_bodies, sizeof(*pool->bodies), GFP_KERNEL);
>> +	if (!pool->bodies) {
>> +		kfree(pool);
>> +		return NULL;
>> +	}
>> +
>> +	pool->mem = dma_alloc_wc(vsp1->bus_master, pool->size, &pool->dma,
>> +				 GFP_KERNEL);
>> +	if (!pool->mem) {
>> +		kfree(pool->bodies);
>> +		kfree(pool);
>> +		return NULL;
>> +	}
>> +
>> +	spin_lock_init(&pool->lock);
>> +	INIT_LIST_HEAD(&pool->free);
>> +
>> +	for (i = 0; i < num_bodies; ++i) {
>> +		struct vsp1_dl_body *dlb = &pool->bodies[i];
>> +
>> +		dlb->pool = pool;
>> +		dlb->max_entries = num_entries;
>> +
>> +		dlb->dma = pool->dma + i * dlb_size;
>> +		dlb->entries = pool->mem + i * dlb_size;
>> +
>> +		list_add_tail(&dlb->free, &pool->free);
>> +	}
>> +
>> +	return pool;
>> +}
>> +
>> +/**
>> + * vsp1_dl_body_pool_destroy - Release a body pool
>> + * @pool: The body pool
>> + *
>> + * Release all components of a pool allocation.
>> + */
>> +void vsp1_dl_body_pool_destroy(struct vsp1_dl_body_pool *pool)
>> +{
>> +	if (!pool)
>> +		return;
>> +
>> +	if (pool->mem)
>> +		dma_free_wc(pool->vsp1->bus_master, pool->size, pool->mem,
>> +			    pool->dma);
>> +
>> +	kfree(pool->bodies);
>> +	kfree(pool);
>> +}
>> +
>> +/**
>> + * vsp1_dl_body_get - Obtain a body from a pool
>> + * @pool: The body pool
>> + *
>> + * Obtain a body from the pool allocation without blocking.
> 
> "the pool allocation" ? Did you mean just "the pool" ?

That sounds better :D Done.

> 
>> + *
>> + * Returns a display list body or NULL if there are none available.
>> + */
>> +struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool)
>> +{
>> +	struct vsp1_dl_body *dlb = NULL;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&pool->lock, flags);
>> +
>> +	if (!list_empty(&pool->free)) {
>> +		dlb = list_first_entry(&pool->free, struct vsp1_dl_body, free);
>> +		list_del(&dlb->free);
>> +	}
>> +
>> +	spin_unlock_irqrestore(&pool->lock, flags);
>> +
>> +	return dlb;
>> +}
>> +
>> +/**
>> + * vsp1_dl_body_put - Return a body back to its pool
>> + * @dlb: The display list body
>> + *
>> + * Return a body back to the pool, and reset the num_entries to clear the
>> list.
>> + */
>> +void vsp1_dl_body_put(struct vsp1_dl_body *dlb)
>> +{
>> +	unsigned long flags;
>> +
>> +	if (!dlb)
>> +		return;
>> +
>> +	dlb->num_entries = 0;
>> +
>> +	spin_lock_irqsave(&dlb->pool->lock, flags);
>> +	list_add_tail(&dlb->free, &dlb->pool->free);
>> +	spin_unlock_irqrestore(&dlb->pool->lock, flags);
>> +}
>> +
>>  /*
>>   * Initialize a display list body object and allocate DMA memory for the
>> body * data. The display list body object is expected to have been
>> initialized to diff --git a/drivers/media/platform/vsp1/vsp1_dl.h
>> b/drivers/media/platform/vsp1/vsp1_dl.h index cf57f986b69a..031032e304d2
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_dl.h
>> +++ b/drivers/media/platform/vsp1/vsp1_dl.h
>> @@ -17,6 +17,7 @@
>>
>>  struct vsp1_device;
>>  struct vsp1_dl_body;
>> +struct vsp1_dl_body_pool;
>>  struct vsp1_dl_list;
>>  struct vsp1_dl_manager;
>>
>> @@ -34,6 +35,13 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl);
>>  void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data);
>>  void vsp1_dl_list_commit(struct vsp1_dl_list *dl);
>>
>> +struct vsp1_dl_body_pool *
>> +vsp1_dl_body_pool_create(struct vsp1_device *vsp1, unsigned int num_bodies,
>> +			 unsigned int num_entries, size_t extra_size);
>> +void vsp1_dl_body_pool_destroy(struct vsp1_dl_body_pool *pool);
>> +struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool);
>> +void vsp1_dl_body_put(struct vsp1_dl_body *dlb);
>> +
>>  struct vsp1_dl_body *vsp1_dl_body_alloc(struct vsp1_device *vsp1,
>>  					unsigned int num_entries);
>>  void vsp1_dl_body_free(struct vsp1_dl_body *dlb);
> 
> 
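
For readers following the thread, the pool idea in the patch above can be
sketched in userspace roughly as follows. This is only an illustration under
stated assumptions: calloc() stands in for dma_alloc_wc(), a byte offset
stands in for the DMA address, a singly linked stack replaces the kernel list
(so bodies come back in LIFO rather than FIFO order), and there is no locking.
All sketch_* names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed to mirror the (address, data) pair of a display list entry. */
struct sketch_entry {
	uint32_t addr;
	uint32_t data;
};

struct sketch_pool;

struct sketch_body {
	struct sketch_body *next_free;	/* link in the pool free stack */
	struct sketch_pool *pool;	/* pool this body belongs to */
	struct sketch_entry *entries;	/* CPU pointer into the pool memory */
	size_t offset;			/* stands in for the DMA address */
	unsigned int num_entries;
	unsigned int max_entries;
};

struct sketch_pool {
	unsigned char *mem;		/* one allocation shared by all bodies */
	size_t size;
	struct sketch_body *bodies;
	struct sketch_body *free_list;
};

struct sketch_pool *sketch_pool_create(unsigned int num_bodies,
				       unsigned int num_entries,
				       size_t extra_size)
{
	size_t body_size = num_entries * sizeof(struct sketch_entry) + extra_size;
	struct sketch_pool *pool;
	unsigned int i;

	/* Keep every slot aligned for the entries array. */
	assert(extra_size % sizeof(uint32_t) == 0);

	if (num_bodies == 0 || num_entries == 0)
		return NULL;

	pool = calloc(1, sizeof(*pool));
	if (!pool)
		return NULL;

	pool->size = body_size * num_bodies;
	pool->bodies = calloc(num_bodies, sizeof(*pool->bodies));
	pool->mem = calloc(1, pool->size);
	if (!pool->bodies || !pool->mem) {
		free(pool->mem);
		free(pool->bodies);
		free(pool);
		return NULL;
	}

	/* Carve the allocation into slots; push in reverse so body 0 is first. */
	for (i = num_bodies; i-- > 0; ) {
		struct sketch_body *body = &pool->bodies[i];

		body->pool = pool;
		body->max_entries = num_entries;
		body->offset = i * body_size;
		body->entries = (struct sketch_entry *)(pool->mem + body->offset);
		body->next_free = pool->free_list;
		pool->free_list = body;
	}

	return pool;
}

void sketch_pool_destroy(struct sketch_pool *pool)
{
	if (!pool)
		return;
	free(pool->mem);
	free(pool->bodies);
	free(pool);
}

/* Take a body from the pool, or NULL if none is available. */
struct sketch_body *sketch_body_get(struct sketch_pool *pool)
{
	struct sketch_body *body = pool->free_list;

	if (body) {
		pool->free_list = body->next_free;
		body->next_free = NULL;
	}
	return body;
}

/* Return a body to its pool and reset its entry count. */
void sketch_body_put(struct sketch_body *body)
{
	if (!body)
		return;
	body->num_entries = 0;
	body->next_free = body->pool->free_list;
	body->pool->free_list = body;
}
```

The point of the layout is that however many bodies are requested, only one
backing allocation exists, and each body is just an offset into it.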


* Re: [PATCH v7 4/8] media: vsp1: Convert display lists to use new body pool
  2018-04-06 22:55   ` Laurent Pinchart
@ 2018-04-30 14:39     ` Kieran Bingham
  0 siblings, 0 replies; 26+ messages in thread
From: Kieran Bingham @ 2018-04-30 14:39 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Laurent,

On 06/04/18 23:55, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Thursday, 8 March 2018 02:05:27 EEST Kieran Bingham wrote:
>> Adapt the dl->body0 object to use an object from the body pool. This
>> greatly reduces the pressure on the TLB for IPMMU use cases, as all of
>> the lists use a single allocation for the main body.
>>
>> The CLU and LUT objects pre-allocate a pool containing three bodies,
>> allowing a userspace update before the hardware has committed a previous
>> set of tables.
>>
>> Bodies are no longer 'freed' in interrupt context, but instead released
>> back to their respective pools. This allows us to remove the garbage
>> collector in the DLM.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>
>> ---
>> v3:
>>  - 's/fragment/body', 's/fragments/bodies/'
>>  - CLU/LUT now allocate 3 bodies
>>  - vsp1_dl_list_fragments_free -> vsp1_dl_list_bodies_put
>>
>> v2:
>>  - Use dl->body0->max_entries to determine header offset, instead of the
>>    global constant VSP1_DL_NUM_ENTRIES which is incorrect.
>>  - squash updates for LUT, CLU, and fragment cleanup into single patch.
>>    (Not fully bisectable when separated)
>>
>>  drivers/media/platform/vsp1/vsp1_clu.c |  27 ++-
>>  drivers/media/platform/vsp1/vsp1_clu.h |   1 +-
>>  drivers/media/platform/vsp1/vsp1_dl.c  | 223 ++++++--------------------
>>  drivers/media/platform/vsp1/vsp1_dl.h  |   3 +-
>>  drivers/media/platform/vsp1/vsp1_lut.c |  27 ++-
>>  drivers/media/platform/vsp1/vsp1_lut.h |   1 +-
>>  6 files changed, 101 insertions(+), 181 deletions(-)
> 
> Still a nice diffstat :-)
> 
> [snip]
> 
>> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c
>> b/drivers/media/platform/vsp1/vsp1_dl.c index 0208e72cb356..74476726451c
>> 100644
>> --- a/drivers/media/platform/vsp1/vsp1_dl.c
>> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
> 
> [snip]
> 
>> @@ -399,11 +311,10 @@ void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32
>> reg, u32 data) * Display List Transaction Management
>>   */
>>
>> -static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager *dlm)
>> +static struct vsp1_dl_list *vsp1_dl_list_alloc(struct vsp1_dl_manager
>> *dlm,
>> +					       struct vsp1_dl_body_pool *pool)
> 
> Given that the only caller of this function passes dlm->pool as the second 
> argument, can't you remove the second argument ?

Hrm ... I thought there was going to be a use case where the pool will be
separated. But perhaps not.

So yes - Removing.

> 
>>  {
>>  	struct vsp1_dl_list *dl;
>> -	size_t header_size;
>> -	int ret;
>>
>>  	dl = kzalloc(sizeof(*dl), GFP_KERNEL);
>>  	if (!dl)
>> @@ -412,41 +323,39 @@ static struct vsp1_dl_list *vsp1_dl_list_alloc(struct
>> vsp1_dl_manager *dlm) INIT_LIST_HEAD(&dl->bodies);
>>  	dl->dlm = dlm;
>>
>> -	/*
>> -	 * Initialize the display list body and allocate DMA memory for the body
>> -	 * and the optional header. Both are allocated together to avoid memory
>> -	 * fragmentation, with the header located right after the body in
>> -	 * memory.
>> -	 */
>> -	header_size = dlm->mode == VSP1_DL_MODE_HEADER
>> -		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
>> -		    : 0;
>> -
>> -	ret = vsp1_dl_body_init(dlm->vsp1, &dl->body0, VSP1_DL_NUM_ENTRIES,
>> -				header_size);
>> -	if (ret < 0) {
>> -		kfree(dl);
>> +	/* Retrieve a body from our DLM body pool */
> 
> s/body pool/body pool./
> 
> (And I would have said "Get a body" but that's up to you)

I think that's evident by the function name "vsp1_dl_body_get()", thus I've
adapted this comment to be a bit more meaningful:
	/* Get a default body for our list. */

But I'm not opposed to dropping the comment. Also at some point, I think there's
scope to remove the dl->body0 so it may not matter.

> 
>> +	dl->body0 = vsp1_dl_body_get(pool);
>> +	if (!dl->body0)
>>  		return NULL;
>> -	}
>> -
>>  	if (dlm->mode == VSP1_DL_MODE_HEADER) {
>> -		size_t header_offset = VSP1_DL_NUM_ENTRIES
>> -				     * sizeof(*dl->body0.entries);
>> +		size_t header_offset = dl->body0->max_entries
>> +				     * sizeof(*dl->body0->entries);
>>
>> -		dl->header = ((void *)dl->body0.entries) + header_offset;
>> -		dl->dma = dl->body0.dma + header_offset;
>> +		dl->header = ((void *)dl->body0->entries) + header_offset;
>> +		dl->dma = dl->body0->dma + header_offset;
>>
>>  		memset(dl->header, 0, sizeof(*dl->header));
>> -		dl->header->lists[0].addr = dl->body0.dma;
>> +		dl->header->lists[0].addr = dl->body0->dma;
>>  	}
>>
>>  	return dl;
>>  }
>>
>> +static void vsp1_dl_list_bodies_put(struct vsp1_dl_list *dl)
>> +{
>> +	struct vsp1_dl_body *dlb, *tmp;
>> +
>> +	list_for_each_entry_safe(dlb, tmp, &dl->bodies, list) {
>> +		list_del(&dlb->list);
>> +		vsp1_dl_body_put(dlb);
>> +	}
>> +}
>> +
>>  static void vsp1_dl_list_free(struct vsp1_dl_list *dl)
>>  {
>> -	vsp1_dl_body_cleanup(&dl->body0);
>> -	list_splice_init(&dl->bodies, &dl->dlm->gc_bodies);
>> +	vsp1_dl_body_put(dl->body0);
>> +	vsp1_dl_list_bodies_put(dl);
> 
> Too bad we can't keep the list splice, it's more efficient than iterating over 
> the list, but I suppose it's unavoidable if we want to reset the number of 
> used entries to 0 for each body. Besides, we should have a small number of 
> bodies only, so hopefully it won't be a big deal.

Yes, plus reference counting needs to be tracked too ... so I think we need to
keep this.

> 
>> +
>>  	kfree(dl);
>>  }
>>
>> @@ -500,18 +409,13 @@ static void __vsp1_dl_list_put(struct vsp1_dl_list
>> *dl)
>>
>>  	dl->has_chain = false;
>>
>> +	vsp1_dl_list_bodies_put(dl);
>> +
>>  	/*
>> -	 * We can't free bodies here as DMA memory can only be freed in
>> -	 * interruptible context. Move all bodies to the display list manager's
>> -	 * list of bodies to be freed, they will be garbage-collected by the
>> -	 * work queue.
>> +	 * body0 is reused as an optimisation as presently every display list
>> +	 * has at least one body, thus we reinitialise the entries list
> 
> s/entries list/entries list./

Done

> 
>>  	 */
>> -	if (!list_empty(&dl->bodies)) {
>> -		list_splice_init(&dl->bodies, &dl->dlm->gc_bodies);
>> -		schedule_work(&dl->dlm->gc_work);
>> -	}
> 
> We can certainly do this synchronously now that we don't need to free memory 
> anymore. I wonder however about the potential performance impact, as there's a 
> kfree() in vsp1_dl_list_free().


Yes, but ...

> Do you think it could have a noticeable impact on the time spent with interrupts disabled ?

I doubt it ... vsp1_dl_list_free() is only called from vsp1_dlm_destroy(), which
is only called from vsp1_wpf_destroy().

That's the whole benefit of being able to remove the garbage collector.


>> -
>> -	dl->body0.num_entries = 0;
>> +	dl->body0->num_entries = 0;
>>
>>  	list_add_tail(&dl->list, &dl->dlm->free);
>>  }
>> @@ -548,7 +452,7 @@ void vsp1_dl_list_put(struct vsp1_dl_list *dl)
>>   */
>>  void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32 reg, u32 data)
>>  {
>> -	vsp1_dl_body_write(&dl->body0, reg, data);
>> +	vsp1_dl_body_write(dl->body0, reg, data);
>>  }
>>
>>  /**
>> @@ -561,8 +465,7 @@ void vsp1_dl_list_write(struct vsp1_dl_list *dl, u32
>> reg, u32 data)
>>   * in the order in which bodies are added.
>>   *
>>   * Adding a body to a display list passes ownership of the body to the
>> list. The
>> - * caller must not touch the body after this call, and must not free it
>> - * explicitly with vsp1_dl_body_free().
> 
> Shouldn't we keep the last part of the sentence and adapt it ? Maybe something 
> like
> 
> 	and must not release it explicitly with vsp1_dl_body_put().
> 
> I know that you introduce a reference count in the next patches that would 
> make this comment invalid, up to this patch it should be correct. When 
> introducing reference-counting you can update the comment to state that the 
> reference must be released.
> 

Sure ... It's a very temporary statement :-)

>> + * caller must not touch the body after this call.
>>   *
>>   * Additional bodies are only usable for display lists in header mode.
>>   * Attempting to add a body to a header-less display list will return an
>> error. @@ -620,7 +523,7 @@ static void vsp1_dl_list_fill_header(struct
>> vsp1_dl_list *dl, bool is_last)
>>   * list was allocated.
>>  	 */
>>
>> -	hdr->num_bytes = dl->body0.num_entries
>> +	hdr->num_bytes = dl->body0->num_entries
>>  		       * sizeof(*dl->header->lists);
>>
>>  	list_for_each_entry(dlb, &dl->bodies, list) {
>> @@ -694,9 +597,9 @@ static void vsp1_dl_list_hw_enqueue(struct vsp1_dl_list
>> *dl) * bit will be cleared by the hardware when the display list
>>  		 * processing starts.
>>  		 */
>> -		vsp1_write(vsp1, VI6_DL_HDR_ADDR(0), dl->body0.dma);
>> +		vsp1_write(vsp1, VI6_DL_HDR_ADDR(0), dl->body0->dma);
>>  		vsp1_write(vsp1, VI6_DL_BODY_SIZE, VI6_DL_BODY_SIZE_UPD |
>> -			   (dl->body0.num_entries * sizeof(*dl->header->lists)));
>> +			(dl->body0->num_entries * sizeof(*dl->header->lists)));
>>  	} else {
>>  		/*
>>  		 * In header mode, program the display list header address. If
>> @@ -879,45 +782,12 @@ void vsp1_dlm_reset(struct vsp1_dl_manager *dlm)
>>  	dlm->pending = NULL;
>>  }
>>
>> -/*
>> - * Free all bodies awaiting to be garbage-collected.
>> - *
>> - * This function must be called without the display list manager lock held.
>> - */
>> -static void vsp1_dlm_bodies_free(struct vsp1_dl_manager *dlm)
>> -{
>> -	unsigned long flags;
>> -
>> -	spin_lock_irqsave(&dlm->lock, flags);
>> -
>> -	while (!list_empty(&dlm->gc_bodies)) {
>> -		struct vsp1_dl_body *dlb;
>> -
>> -		dlb = list_first_entry(&dlm->gc_bodies, struct vsp1_dl_body,
>> -				       list);
>> -		list_del(&dlb->list);
>> -
>> -		spin_unlock_irqrestore(&dlm->lock, flags);
>> -		vsp1_dl_body_free(dlb);
>> -		spin_lock_irqsave(&dlm->lock, flags);
>> -	}
>> -
>> -	spin_unlock_irqrestore(&dlm->lock, flags);
>> -}
>> -
>> -static void vsp1_dlm_garbage_collect(struct work_struct *work)
>> -{
>> -	struct vsp1_dl_manager *dlm =
>> -		container_of(work, struct vsp1_dl_manager, gc_work);
>> -
>> -	vsp1_dlm_bodies_free(dlm);
>> -}
>> -
>>  struct vsp1_dl_manager *vsp1_dlm_create(struct vsp1_device *vsp1,
>>  					unsigned int index,
>>  					unsigned int prealloc)
>>  {
>>  	struct vsp1_dl_manager *dlm;
>> +	size_t header_size;
>>  	unsigned int i;
>>
>>  	dlm = devm_kzalloc(vsp1->dev, sizeof(*dlm), GFP_KERNEL);
>> @@ -932,13 +802,26 @@ struct vsp1_dl_manager *vsp1_dlm_create(struct
>> vsp1_device *vsp1,
>>
>>  	spin_lock_init(&dlm->lock);
>>  	INIT_LIST_HEAD(&dlm->free);
>> -	INIT_LIST_HEAD(&dlm->gc_bodies);
>> -	INIT_WORK(&dlm->gc_work, vsp1_dlm_garbage_collect);
>> +
>> +	/*
>> +	 * Initialize the display list body and allocate DMA memory for the body
>> +	 * and the optional header. Both are allocated together to avoid memory
>> +	 * fragmentation, with the header located right after the body in
>> +	 * memory.
>> +	 */
>> +	header_size = dlm->mode == VSP1_DL_MODE_HEADER
>> +		    ? ALIGN(sizeof(struct vsp1_dl_header), 8)
>> +		    : 0;
>> +
>> +	dlm->pool = vsp1_dl_body_pool_create(vsp1, prealloc,
>> +					     VSP1_DL_NUM_ENTRIES, header_size);
>> +	if (!dlm->pool)
>> +		return NULL;
>>
>>  	for (i = 0; i < prealloc; ++i) {
>>  		struct vsp1_dl_list *dl;
>>
>> -		dl = vsp1_dl_list_alloc(dlm);
>> +		dl = vsp1_dl_list_alloc(dlm, dlm->pool);
>>  		if (!dl)
>>  			return NULL;
>>
>> @@ -955,12 +838,10 @@ void vsp1_dlm_destroy(struct vsp1_dl_manager *dlm)
>>  	if (!dlm)
>>  		return;
>>
>> -	cancel_work_sync(&dlm->gc_work);
>> -
>>  	list_for_each_entry_safe(dl, next, &dlm->free, list) {
>>  		list_del(&dl->list);
>>  		vsp1_dl_list_free(dl);
>>  	}
>>
>> -	vsp1_dlm_bodies_free(dlm);
>> +	vsp1_dl_body_pool_destroy(dlm->pool);
>>  }
> 
> [snip]
> 
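
As a worked illustration of the header placement discussed in this patch (the
header sits right after the body entries, at max_entries * sizeof(entry) into
each pool slot), here is a small userspace sketch. The struct sketch_header
below is a made-up placeholder, since the real struct vsp1_dl_header is not
shown in this thread; only the offset arithmetic is the point. All sketch_*
names and SKETCH_ALIGN are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of a, like the kernel's ALIGN() for these uses. */
#define SKETCH_ALIGN(x, a) (((x) + (a) - 1) / (a) * (a))

/* Assumed to mirror the (address, data) pair of a display list entry. */
struct sketch_entry {
	uint32_t addr;
	uint32_t data;
};

/* Placeholder header; its contents and size are invented for the sketch. */
struct sketch_header {
	uint32_t num_lists;
	uint32_t payload[16];
};

struct sketch_slot {
	uint64_t body_dma;	/* start of the body entries */
	uint64_t header_dma;	/* start of the header, right after the body */
};

/* Size of one pool slot: the entries plus the aligned header. */
size_t sketch_slot_size(unsigned int max_entries)
{
	return max_entries * sizeof(struct sketch_entry) +
	       SKETCH_ALIGN(sizeof(struct sketch_header), 8);
}

/* Compute where body and header of slot 'index' land in the pool. */
struct sketch_slot sketch_slot_layout(uint64_t pool_dma, unsigned int index,
				      unsigned int max_entries)
{
	struct sketch_slot slot;

	assert(max_entries > 0);

	slot.body_dma = pool_dma + (uint64_t)index * sketch_slot_size(max_entries);
	slot.header_dma = slot.body_dma +
			  (uint64_t)max_entries * sizeof(struct sketch_entry);
	return slot;
}
```

Deriving the offset from the body's own max_entries, rather than from a global
constant, keeps the header inside the slot even if a pool is created with a
different entry count.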


* Re: [PATCH v7 6/8] media: vsp1: Refactor display list configure operations
  2018-04-06 23:38   ` Laurent Pinchart
@ 2018-04-30 16:22     ` Kieran Bingham
  0 siblings, 0 replies; 26+ messages in thread
From: Kieran Bingham @ 2018-04-30 16:22 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Laurent,

On 07/04/18 00:38, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Thursday, 8 March 2018 02:05:29 EEST Kieran Bingham wrote:
>> The entities provide a single .configure operation which configures the
>> object into the target display list, based on the vsp1_entity_params
>> selection.
>>
>> This restricts us to a single function prototype for both static
>> configuration (the pre-stream INIT stage) and the dynamic runtime stages
>> for each frame - and each partition therein.
>>
>> Split the configure function into two parts, '.configure_stream()' and
>> '.configure_frame()', merging both the VSP1_ENTITY_PARAMS_RUNTIME and
>> VSP1_ENTITY_PARAMS_PARTITION stages into a single call through the
>> .configure_frame(). The configuration for individual partitions is
>> handled by passing the partition number to the configure call, and
>> processing any runtime stage actions on the first partition only.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>
>> ---
>> v7
>>  - Fix formatting and white space
>>  - s/prepare/configure_stream/
>>  - s/configure/configure_frame/
>>
>>  drivers/media/platform/vsp1/vsp1_bru.c    |  12 +-
>>  drivers/media/platform/vsp1/vsp1_clu.c    |  50 +---
>>  drivers/media/platform/vsp1/vsp1_dl.h     |   1 +-
>>  drivers/media/platform/vsp1/vsp1_drm.c    |  21 +--
>>  drivers/media/platform/vsp1/vsp1_entity.c |  17 +-
>>  drivers/media/platform/vsp1/vsp1_entity.h |  33 +--
>>  drivers/media/platform/vsp1/vsp1_hgo.c    |  12 +-
>>  drivers/media/platform/vsp1/vsp1_hgt.c    |  12 +-
>>  drivers/media/platform/vsp1/vsp1_hsit.c   |  12 +-
>>  drivers/media/platform/vsp1/vsp1_lif.c    |  12 +-
>>  drivers/media/platform/vsp1/vsp1_lut.c    |  32 +-
>>  drivers/media/platform/vsp1/vsp1_rpf.c    | 164 ++++++-------
>>  drivers/media/platform/vsp1/vsp1_sru.c    |  12 +-
>>  drivers/media/platform/vsp1/vsp1_uds.c    |  57 ++--
>>  drivers/media/platform/vsp1/vsp1_video.c  |  24 +--
>>  drivers/media/platform/vsp1/vsp1_wpf.c    | 299 ++++++++++++-----------
>>  16 files changed, 378 insertions(+), 392 deletions(-)
> 
> [snip]
> 
>> diff --git a/drivers/media/platform/vsp1/vsp1_clu.c b/drivers/media/platform/vsp1/vsp1_clu.c
>> index b2a39a6ef7e4..b8d8af6d4910 100644
>> --- a/drivers/media/platform/vsp1/vsp1_clu.c
>> +++ b/drivers/media/platform/vsp1/vsp1_clu.c
>> @@ -213,37 +213,36 @@ static const struct v4l2_subdev_ops clu_ops = {
>>  /* -----------------------------------------------------------------------------
>>   * VSP1 Entity Operations
>>   */
>> +static void clu_configure_stream(struct vsp1_entity *entity,
>> +				 struct vsp1_pipeline *pipe,
>> +				 struct vsp1_dl_list *dl)
>> +{
>> +	struct vsp1_clu *clu = to_clu(&entity->subdev);
>> +
>> +	/*
>> +	 * The yuv_mode can't be changed during streaming. Cache it internally
>> +	 * for future runtime configuration calls.
>> +	 */
> 
> I'd move this comment right before the vsp1_entity_get_pad_format() call to 
> keep all variable declarations together.

Agreed, Done.

> 
>> +	struct v4l2_mbus_framefmt *format;
>> +
>> +	format = vsp1_entity_get_pad_format(&clu->entity,
>> +					    clu->entity.config,
>> +					    CLU_PAD_SINK);
>> +	clu->yuv_mode = format->code == MEDIA_BUS_FMT_AYUV8_1X32;
>> +}
> 
> [snip]
> 
> 
>> diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
>> index 7e820ac6865a..f45083251644 100644
>> --- a/drivers/media/platform/vsp1/vsp1_dl.h
>> +++ b/drivers/media/platform/vsp1/vsp1_dl.h
>> @@ -41,7 +41,6 @@ vsp1_dl_body_pool_create(struct vsp1_device *vsp1, unsigned int num_bodies,
>>  void vsp1_dl_body_pool_destroy(struct vsp1_dl_body_pool *pool);
>>  struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool);
>>  void vsp1_dl_body_put(struct vsp1_dl_body *dlb);
>> -
> 
> This is an unrelated change.
> 

Removed

>>  void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32 reg, u32 data);
>>  int vsp1_dl_list_add_body(struct vsp1_dl_list *dl, struct vsp1_dl_body *dlb);
>>  int vsp1_dl_list_add_chain(struct vsp1_dl_list *head, struct vsp1_dl_list *dl);
> 
> [snip]
> 
>> diff --git a/drivers/media/platform/vsp1/vsp1_entity.h b/drivers/media/platform/vsp1/vsp1_entity.h
>> index 408602ebeb97..b44ed5414fc3 100644
>> --- a/drivers/media/platform/vsp1/vsp1_entity.h
>> +++ b/drivers/media/platform/vsp1/vsp1_entity.h
> 
> [snip]
> 
>> @@ -80,8 +68,10 @@ struct vsp1_route {
>>  /**
>>   * struct vsp1_entity_operations - Entity operations
>>   * @destroy:	Destroy the entity.
>> - * @configure:	Setup the hardware based on the entity state (pipeline, formats,
>> - *		selection rectangles, ...)
>> + * @configure_stream:	Setup the initial hardware parameters for the stream
>> + *			(pipeline, formats)
> 
> Instead of initial I would say "Setup hardware parameters that stay constant 
> for the whole stream (pipeline, formats)", or possible "that don't vary 
> between frames" instead.
> 
>> + * @configure_frame:	Configure the runtime parameters for each partition
>> + *			(rectangles, buffer addresses, ...)
> 
> Maybe "for each frame and each partition thereof" ?
> 
> I think we mentioned, when discussing naming, the option of also having a 
> configure_partition() operation. Do you think that would make sense ? The 
> fact that the partition parameter to the .configure_frame() operation is used 
> for the sole purpose of checking whether to configure frame-related parameters 
> when partition == 0 makes me think that having two separate operations could 
> make sense.

OK, I'll give this a go now ...


Right ... it's looking good. A good clear separation
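
As a rough sketch of what that three-way split looks like (simplified
stand-in types only; the struct and function names below are hypothetical
and are not the driver's real definitions), the stream operation runs once
at stream start, the frame operation once per frame, and the partition
operation once per partition of each frame:

```c
/*
 * Hypothetical model of the stream / frame / partition split.
 * None of these types or names are the real vsp1 ones.
 */
struct entity;

struct entity_ops {
	/* Parameters that stay constant for the whole stream. */
	void (*configure_stream)(struct entity *e);
	/* Parameters that can change on every frame. */
	void (*configure_frame)(struct entity *e);
	/* Parameters specific to one partition of a frame. */
	void (*configure_partition)(struct entity *e, unsigned int partition);
};

struct entity {
	const struct entity_ops *ops;
	/* Call counters, only here to make the model observable. */
	unsigned int stream_calls;
	unsigned int frame_calls;
	unsigned int partition_calls;
};

static void demo_stream(struct entity *e)
{
	e->stream_calls++;
}

static void demo_frame(struct entity *e)
{
	e->frame_calls++;
}

static void demo_partition(struct entity *e, unsigned int partition)
{
	(void)partition;
	e->partition_calls++;
}

static const struct entity_ops demo_ops = {
	.configure_stream = demo_stream,
	.configure_frame = demo_frame,
	.configure_partition = demo_partition,
};

/* Called once when streaming starts. */
static void stream_start(struct entity *e)
{
	if (e->ops->configure_stream)
		e->ops->configure_stream(e);
}

/* Called for every frame: frame setup once, then each partition. */
static void run_frame(struct entity *e, unsigned int partitions)
{
	unsigned int i;

	if (e->ops->configure_frame)
		e->ops->configure_frame(e);

	for (i = 0; i < partitions; ++i) {
		if (e->ops->configure_partition)
			e->ops->configure_partition(e, i);
	}
}
```

With separate operations, no entity needs a "partition == 0" check to
decide whether frame-level parameters should be written.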


> 
>>   * @max_width:	Return the max supported width of data that the entity can
>>   *		process in a single operation.
>>   * @partition:	Process the partition construction based on this entity's
> 
> [snip]
> 

* Re: [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb
  2018-04-07  0:23   ` Laurent Pinchart
@ 2018-04-30 17:48     ` Kieran Bingham
  2018-05-01  8:28       ` Kieran Bingham
  2018-05-17 14:35       ` Laurent Pinchart
  0 siblings, 2 replies; 26+ messages in thread
From: Kieran Bingham @ 2018-04-30 17:48 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-media, linux-renesas-soc, Kieran Bingham

Hi Laurent,

On 07/04/18 01:23, Laurent Pinchart wrote:
> Hi Kieran,
> 
> Thank you for the patch.
> 
> On Thursday, 8 March 2018 02:05:31 EEST Kieran Bingham wrote:
>> We are now able to configure a pipeline directly into a local display
>> list body. Take advantage of this fact, and create a cacheable body to
>> store the configuration of the pipeline in the video object.
>>
>> vsp1_video_pipeline_run() is now the last user of the pipe->dl object.
>> Convert this function to use the cached video->config body and obtain a
>> local display list reference.
>>
>> Attach the video->config body to the display list when needed before
>> committing to hardware.
>>
>> The pipe object is marked as un-configured when resuming from a suspend.
>> This ensures that when the hardware is reset - our cached configuration
>> will be re-attached to the next committed DL.
>>
>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>> ---
>>
>> v3:
>>  - 's/fragment/body/', 's/fragments/bodies/'
>>  - video dlb cache allocation increased from 2 to 3 dlbs
>>
>> Our video DL usage now looks like the below output:
>>
>> dl->body0 contains our disposable runtime configuration. Max 41.
>> dl_child->body0 is our partition specific configuration. Max 12.
>> dl->bodies shows our constant configuration and LUTs.
>>
>>   These two are LUT/CLU:
>>      * dl->bodies[x]->num_entries 256 / max 256
>>      * dl->bodies[x]->num_entries 4914 / max 4914
>>
>> Which shows that our 'constant' configuration cache is currently
>> utilised to a maximum of 64 entries.
>>
>> trace-cmd report | \
>>     grep max | sed 's/.*vsp1_dl_list_commit://g' | sort | uniq;
>>
>>   dl->body0->num_entries 13 / max 128
>>   dl->body0->num_entries 14 / max 128
>>   dl->body0->num_entries 16 / max 128
>>   dl->body0->num_entries 20 / max 128
>>   dl->body0->num_entries 27 / max 128
>>   dl->body0->num_entries 34 / max 128
>>   dl->body0->num_entries 41 / max 128
>>   dl_child->body0->num_entries 10 / max 128
>>   dl_child->body0->num_entries 12 / max 128
>>   dl->bodies[x]->num_entries 15 / max 128
>>   dl->bodies[x]->num_entries 16 / max 128
>>   dl->bodies[x]->num_entries 17 / max 128
>>   dl->bodies[x]->num_entries 18 / max 128
>>   dl->bodies[x]->num_entries 20 / max 128
>>   dl->bodies[x]->num_entries 21 / max 128
>>   dl->bodies[x]->num_entries 256 / max 256
>>   dl->bodies[x]->num_entries 31 / max 128
>>   dl->bodies[x]->num_entries 32 / max 128
>>   dl->bodies[x]->num_entries 39 / max 128
>>   dl->bodies[x]->num_entries 40 / max 128
>>   dl->bodies[x]->num_entries 47 / max 128
>>   dl->bodies[x]->num_entries 48 / max 128
>>   dl->bodies[x]->num_entries 4914 / max 4914
>>   dl->bodies[x]->num_entries 55 / max 128
>>   dl->bodies[x]->num_entries 56 / max 128
>>   dl->bodies[x]->num_entries 63 / max 128
>>   dl->bodies[x]->num_entries 64 / max 128
> 
> This might be useful to capture in the main part of the commit message.
> 
>> v4:
>>  - Adjust pipe configured flag to be reset on resume rather than suspend
>>  - rename dl_child, dl_next
>>
>>  drivers/media/platform/vsp1/vsp1_pipe.c  |  7 +++-
>>  drivers/media/platform/vsp1/vsp1_pipe.h  |  4 +-
>>  drivers/media/platform/vsp1/vsp1_video.c | 67 ++++++++++++++++---------
>>  drivers/media/platform/vsp1/vsp1_video.h |  2 +-
>>  4 files changed, 54 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c b/drivers/media/platform/vsp1/vsp1_pipe.c
>> index 5012643583b6..fa445b1a2e38 100644
>> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
>> @@ -249,6 +249,7 @@ void vsp1_pipeline_run(struct vsp1_pipeline *pipe)
>>  		vsp1_write(vsp1, VI6_CMD(pipe->output->entity.index),
>>  			   VI6_CMD_STRCMD);
>>  		pipe->state = VSP1_PIPELINE_RUNNING;
>> +		pipe->configured = true;
>>  	}
>>
>>  	pipe->buffers_ready = 0;
>> @@ -470,6 +471,12 @@ void vsp1_pipelines_resume(struct vsp1_device *vsp1)
>>  			continue;
>>
>>  		spin_lock_irqsave(&pipe->irqlock, flags);
>> +		/*
>> +		 * The hardware may have been reset during a suspend and will
>> +		 * need a full reconfiguration
>> +		 */
> 
> s/reconfiguration/reconfiguration./
> 
>> +		pipe->configured = false;
>> +
> 
> Where does that full reconfiguration occur, given that the vsp1_pipeline_run() 
> right below sets pipe->configured to true without performing reconfiguration ?

It's magic isn't it :D

If the pipe->configured flag gets set to false, the next execution of
vsp1_pipeline_run() attaches the video->pipe_config (the cached configuration,
containing the route_setup() and the configure_stream() entries) to the display
list before configuring for the next frame.

This means that the hardware gets a full configuration written to it after a
suspend/resume action.

Perhaps the comment should say "The video object will write out its cached pipe
configuration on the next display list commit."
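
To illustrate the mechanism, here is a minimal model of the flag handling
(an assumption-laden sketch, not the driver code: the types are invented,
and the flag and the cached body are folded into one struct even though
the patch keeps them in the pipe and video objects respectively):

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a display list body and a display list. */
struct body {
	unsigned int num_entries;
};

struct dlist {
	const struct body *bodies[4];
	unsigned int num_bodies;
};

/* Simplified pipe: configured flag plus the cached stream configuration. */
struct pipe_model {
	bool configured;
	const struct body *pipe_config;
};

static void dlist_add_body(struct dlist *dl, const struct body *b)
{
	if (dl->num_bodies < 4)
		dl->bodies[dl->num_bodies++] = b;
}

/*
 * Build the display list for the next frame. The cached configuration
 * is attached only when the hardware has not been configured since the
 * last reset. Returns true when the cache was attached.
 */
static bool pipeline_run(struct pipe_model *pipe, struct dlist *dl)
{
	bool attached = false;

	dl->num_bodies = 0;

	if (!pipe->configured) {
		dlist_add_body(dl, pipe->pipe_config);
		pipe->configured = true;
		attached = true;
	}

	/* Per-frame and per-partition configuration would follow here. */
	return attached;
}

/* Resume: the hardware may have been reset, so force a re-attach. */
static void pipeline_resume(struct pipe_model *pipe)
{
	pipe->configured = false;
}
```

In this model the first run after setup and the first run after a resume
both carry the cached body; every other frame carries only its runtime
configuration.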


> 
>>  		if (vsp1_pipeline_ready(pipe))
>>  			vsp1_pipeline_run(pipe);
>>  		spin_unlock_irqrestore(&pipe->irqlock, flags);
>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h b/drivers/media/platform/vsp1/vsp1_pipe.h
>> index 90d29492b9b9..e7ad6211b4d0 100644
>> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
>> @@ -90,6 +90,7 @@ struct vsp1_partition {
>>   * @irqlock: protects the pipeline state
>>   * @state: current state
>>   * @wq: wait queue to wait for state change completion
>> + * @configured: flag determining if the hardware has run since reset
>>   * @frame_end: frame end interrupt handler
>>   * @lock: protects the pipeline use count and stream count
>>   * @kref: pipeline reference count
>> @@ -117,6 +118,7 @@ struct vsp1_pipeline {
>>  	spinlock_t irqlock;
>>  	enum vsp1_pipeline_state state;
>>  	wait_queue_head_t wq;
>> +	bool configured;
>>
>>  	void (*frame_end)(struct vsp1_pipeline *pipe, bool completed);
>>
>> @@ -143,8 +145,6 @@ struct vsp1_pipeline {
>>  	 */
>>  	struct list_head entities;
>>
>> -	struct vsp1_dl_list *dl;
>> -
> 
> You should remove the corresponding line from the structure documentation.

Done.

> 
>>  	unsigned int partitions;
>>  	struct vsp1_partition *partition;
>>  	struct vsp1_partition *part_table;
>> diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
>> index b47708660e53..96d9872667d9 100644
>> --- a/drivers/media/platform/vsp1/vsp1_video.c
>> +++ b/drivers/media/platform/vsp1/vsp1_video.c
>> @@ -394,37 +394,43 @@ static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
>>  static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
>>  {
>>  	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
>> +	struct vsp1_video *video = pipe->output->video;
>>  	unsigned int partition;
>> +	struct vsp1_dl_list *dl;
>> +
>> +	dl = vsp1_dl_list_get(pipe->output->dlm);
>>
>> -	if (!pipe->dl)
>> -		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>> +	/* Attach our pipe configuration to fully initialise the hardware */
> 
> s/hardware/hardware./
> 
> There are other similar comments in this patch.
> 
>> +	if (!pipe->configured) {
>> +		vsp1_dl_list_add_body(dl, video->pipe_config);
>> +		pipe->configured = true;
>> +	}
>>
>>  	/* Run the first partition */
>> -	vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0);
>> +	vsp1_video_pipeline_run_partition(pipe, dl, 0);
>>
>>  	/* Process consecutive partitions as necessary */
>>  	for (partition = 1; partition < pipe->partitions; ++partition) {
>> -		struct vsp1_dl_list *dl;
>> +		struct vsp1_dl_list *dl_next;
>>
>> -		dl = vsp1_dl_list_get(pipe->output->dlm);
>> +		dl_next = vsp1_dl_list_get(pipe->output->dlm);
>>
>>  		/*
>>  		 * An incomplete chain will still function, but output only
>>  		 * the partitions that had a dl available. The frame end
>>  		 * interrupt will be marked on the last dl in the chain.
>>  		 */
>> -		if (!dl) {
>> +		if (!dl_next) {
>>  			dev_err(vsp1->dev, "Failed to obtain a dl list. Frame will be incomplete\n");
>>  			break;
>>  		}
>>
>> -		vsp1_video_pipeline_run_partition(pipe, dl, partition);
>> -		vsp1_dl_list_add_chain(pipe->dl, dl);
>> +		vsp1_video_pipeline_run_partition(pipe, dl_next, partition);
>> +		vsp1_dl_list_add_chain(dl, dl_next);
>>  	}
>>
>>  	/* Complete, and commit the head display list. */
>> -	vsp1_dl_list_commit(pipe->dl);
>> -	pipe->dl = NULL;
>> +	vsp1_dl_list_commit(dl);
>>
>>  	vsp1_pipeline_run(pipe);
>>  }
>> @@ -790,8 +796,8 @@ static void vsp1_video_buffer_queue(struct vb2_buffer *vb)
>>
>>  static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>>  {
>> +	struct vsp1_video *video = pipe->output->video;
>>  	struct vsp1_entity *entity;
>> -	struct vsp1_dl_body *dlb;
>>  	int ret;
>>
>>  	/* Determine this pipelines sizes for image partitioning support. */
>> @@ -799,14 +805,6 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>>  	if (ret < 0)
>>  		return ret;
>>
>> -	/* Prepare the display list. */
>> -	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>> -	if (!pipe->dl)
>> -		return -ENOMEM;
>> -
>> -	/* Retrieve the default DLB from the list */
>> -	dlb = vsp1_dl_list_get_body0(pipe->dl);
>> -
>>  	if (pipe->uds) {
>>  		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
>>
>> @@ -828,11 +826,20 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>>  		}
>>  	}
>>
>> +	/* Obtain a clean body from our pool */
>> +	video->pipe_config = vsp1_dl_body_get(video->dlbs);
>> +	if (!video->pipe_config)
>> +		return -ENOMEM;
>> +
>> +	/* Configure the entities into our cached pipe configuration */
>>  	list_for_each_entry(entity, &pipe->entities, list_pipe) {
>> -		vsp1_entity_route_setup(entity, pipe, dlb);
>> -		vsp1_entity_configure_stream(entity, pipe, dlb);
>> +		vsp1_entity_route_setup(entity, pipe, video->pipe_config);
>> +		vsp1_entity_configure_stream(entity, pipe, video->pipe_config);
>>  	}
>>
>> +	/* Ensure that our cached configuration is updated in the next DL */
>> +	pipe->configured = false;
> 
> Quoting my comment to a previous version, and your reply to it which I have 
> failed to answer,
> 
>>> I'm tempted to move this at pipeline stop time (either to
>>> vsp1_video_stop_streaming() right after the vsp1_pipeline_stop() call, or
>>> in vsp1_pipeline_stop() itself), possibly with a WARN_ON() here to catch
>>> bugs in the driver.
>>
>> Do you mean just setting the flag? or the pipe_configuration? This is a
>> setup task - not a stop task ... ? We are doing this as part of
>> vsp1_video_start_streaming().
> 
> I meant just setting the configured flag back to false.

The point at this line in the code is to ensure that the flag is set false,
because all of that stream configuration isn't included in the display list -
unless the flag is false.

If the flag is initialised false in object creation, and stream stop - then
that's fine. I felt like setting it false here was appropriate because as soon
as the video->pipe_config cache is populated - that's the time it also needs to
be 'flushed' to the hardware through the next dl_commit()

> 
>> IMO, The flag should only be updated after the configuration has been
>> updated to signal that the new configuration should be written out to the
>> hardware.
>>
>> Unless you mean to mark the pipe->configured = false; at
>> vsp1_pipeline_stop() time because we reset the pipe to halt it ?
> 
> That's the idea, yes. And now that I think about it again, we could also set 
> pipe->configured to false in vsp1_video_cleanup_pipeline() right after the 
> vsp1_dl_body_put() call.
> 
> What bothers me here is that the pipe->configured flag is handled both in 
> vsp1_pipe.c and vsp1_video.c. Coupled with my above comment about the full 
> reconfiguration at resume time, 

Which comment - the one saying it doesn't happen? (It does... it uses the cached
configuration)

> I think we might not be abstracting this as we 
> should. I wonder whether it would be possible to either make the flag local to 
> vsp1_pipe.c, or local to vsp1_video.c and move it from the pipeline object to 
> the video object. My gut feeling right now (and it might be too late to trust 
> it) is that, as the pipe_config object is stored in vsp1_video, so should the 
> configured flag.
> 
> Please feel free to challenge this.

The flag is in the pipe because that's accessible at resume time. I could
provide accessors so that it's not modified directly from the vsp1_video object?

But the configuration cache is specific to the video object - which is why it's
in there...

I'm not sure that the pipeline vsp1_pipelines_resume() can modify flags in the
video object at resume time though ... which would be the other direction of
approaching this ...


> 
>> +
>>  	return 0;
>>  }
>>
>> @@ -842,6 +849,9 @@ static void vsp1_video_cleanup_pipeline(struct vsp1_pipeline *pipe)
>>  	struct vsp1_vb2_buffer *buffer;
>>  	unsigned long flags;
>>
>> +	/* Release any cached configuration */
>> +	vsp1_dl_body_put(video->pipe_config);
>> +
>>  	/* Remove all buffers from the IRQ queue. */
>>  	spin_lock_irqsave(&video->irqlock, flags);
>>  	list_for_each_entry(buffer, &video->irqqueue, queue)
>> @@ -918,9 +928,6 @@ static void vsp1_video_stop_streaming(struct vb2_queue *vq)
>>  		ret = vsp1_pipeline_stop(pipe);
>>  		if (ret == -ETIMEDOUT)
>>  			dev_err(video->vsp1->dev, "pipeline stop timeout\n");
>> -
>> -		vsp1_dl_list_put(pipe->dl);
>> -		pipe->dl = NULL;
>>  	}
>>  	mutex_unlock(&pipe->lock);
>>
>> @@ -1240,6 +1247,16 @@ struct vsp1_video *vsp1_video_create(struct vsp1_device *vsp1,
>>  		goto error;
>>  	}
>>
>> +	/*
>> +	 * Utilise a body pool to cache the constant configuration of the
>> +	 * pipeline object.
>> +	 */
>> +	video->dlbs = vsp1_dl_body_pool_create(vsp1, 3, 128, 0);
>> +	if (!video->dlbs) {
>> +		ret = -ENOMEM;
>> +		goto error;
>> +	}
>> +
>>  	return video;
>>
>>  error:
>> @@ -1249,6 +1266,8 @@ struct vsp1_video *vsp1_video_create(struct vsp1_device *vsp1,
>>
>>  void vsp1_video_cleanup(struct vsp1_video *video)
>>  {
>> +	vsp1_dl_body_pool_destroy(video->dlbs);
>> +
>>  	if (video_is_registered(&video->video))
>>  		video_unregister_device(&video->video);
>>
>> diff --git a/drivers/media/platform/vsp1/vsp1_video.h b/drivers/media/platform/vsp1/vsp1_video.h
>> index 50ea7f02205f..e84f8ee902c1 100644
>> --- a/drivers/media/platform/vsp1/vsp1_video.h
>> +++ b/drivers/media/platform/vsp1/vsp1_video.h
>> @@ -43,6 +43,8 @@ struct vsp1_video {
>>
>>  	struct mutex lock;
>>
>> +	struct vsp1_dl_body_pool *dlbs;
>> +	struct vsp1_dl_body *pipe_config;
>>  	unsigned int pipe_index;
>>
>>  	struct vb2_queue queue;
> 

* Re: [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb
  2018-04-30 17:48     ` Kieran Bingham
@ 2018-05-01  8:28       ` Kieran Bingham
  2018-05-01  9:07         ` Kieran Bingham
  2018-05-17 14:35       ` Laurent Pinchart
  1 sibling, 1 reply; 26+ messages in thread
From: Kieran Bingham @ 2018-05-01  8:28 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-media, linux-renesas-soc


Hi Laurent,

New plan ... (from the .. why didn't I think of this earlier department)



On 30/04/18 18:48, Kieran Bingham wrote:
> Hi Laurent,
> 
> On 07/04/18 01:23, Laurent Pinchart wrote:
>> Hi Kieran,
>>
>> Thank you for the patch.
>>
>> On Thursday, 8 March 2018 02:05:31 EEST Kieran Bingham wrote:
>>> We are now able to configure a pipeline directly into a local display
>>> list body. Take advantage of this fact, and create a cacheable body to
>>> store the configuration of the pipeline in the video object.
>>>
>>> vsp1_video_pipeline_run() is now the last user of the pipe->dl object.
>>> Convert this function to use the cached video->config body and obtain a
>>> local display list reference.
>>>
>>> Attach the video->config body to the display list when needed before
>>> committing to hardware.
>>>
>>> The pipe object is marked as un-configured when resuming from a suspend.
>>> This ensures that when the hardware is reset - our cached configuration
>>> will be re-attached to the next committed DL.
>>>
>>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>> ---
>>>
>>> v3:
>>>  - 's/fragment/body/', 's/fragments/bodies/'
>>>  - video dlb cache allocation increased from 2 to 3 dlbs
>>>
>>> Our video DL usage now looks like the below output:
>>>
>>> dl->body0 contains our disposable runtime configuration. Max 41.
>>> dl_child->body0 is our partition specific configuration. Max 12.
>>> dl->bodies shows our constant configuration and LUTs.
>>>
>>>   These two are LUT/CLU:
>>>      * dl->bodies[x]->num_entries 256 / max 256
>>>      * dl->bodies[x]->num_entries 4914 / max 4914
>>>
>>> Which shows that our 'constant' configuration cache is currently
>>> utilised to a maximum of 64 entries.
>>>
>>> trace-cmd report | \
>>>     grep max | sed 's/.*vsp1_dl_list_commit://g' | sort | uniq;
>>>
>>>   dl->body0->num_entries 13 / max 128
>>>   dl->body0->num_entries 14 / max 128
>>>   dl->body0->num_entries 16 / max 128
>>>   dl->body0->num_entries 20 / max 128
>>>   dl->body0->num_entries 27 / max 128
>>>   dl->body0->num_entries 34 / max 128
>>>   dl->body0->num_entries 41 / max 128
>>>   dl_child->body0->num_entries 10 / max 128
>>>   dl_child->body0->num_entries 12 / max 128
>>>   dl->bodies[x]->num_entries 15 / max 128
>>>   dl->bodies[x]->num_entries 16 / max 128
>>>   dl->bodies[x]->num_entries 17 / max 128
>>>   dl->bodies[x]->num_entries 18 / max 128
>>>   dl->bodies[x]->num_entries 20 / max 128
>>>   dl->bodies[x]->num_entries 21 / max 128
>>>   dl->bodies[x]->num_entries 256 / max 256
>>>   dl->bodies[x]->num_entries 31 / max 128
>>>   dl->bodies[x]->num_entries 32 / max 128
>>>   dl->bodies[x]->num_entries 39 / max 128
>>>   dl->bodies[x]->num_entries 40 / max 128
>>>   dl->bodies[x]->num_entries 47 / max 128
>>>   dl->bodies[x]->num_entries 48 / max 128
>>>   dl->bodies[x]->num_entries 4914 / max 4914
>>>   dl->bodies[x]->num_entries 55 / max 128
>>>   dl->bodies[x]->num_entries 56 / max 128
>>>   dl->bodies[x]->num_entries 63 / max 128
>>>   dl->bodies[x]->num_entries 64 / max 128
>>
>> This might be useful to capture in the main part of the commit message.
>>
>>> v4:
>>>  - Adjust pipe configured flag to be reset on resume rather than suspend
>>>  - rename dl_child, dl_next
>>>
>>>  drivers/media/platform/vsp1/vsp1_pipe.c  |  7 +++-
>>>  drivers/media/platform/vsp1/vsp1_pipe.h  |  4 +-
>>>  drivers/media/platform/vsp1/vsp1_video.c | 67 ++++++++++++++++---------
>>>  drivers/media/platform/vsp1/vsp1_video.h |  2 +-
>>>  4 files changed, 54 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c b/drivers/media/platform/vsp1/vsp1_pipe.c
>>> index 5012643583b6..fa445b1a2e38 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
>>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
>>> @@ -249,6 +249,7 @@ void vsp1_pipeline_run(struct vsp1_pipeline *pipe)
>>>  		vsp1_write(vsp1, VI6_CMD(pipe->output->entity.index),
>>>  			   VI6_CMD_STRCMD);
>>>  		pipe->state = VSP1_PIPELINE_RUNNING;
>>> +		pipe->configured = true;

Look at that lovely pipe->state flag update right above the pipe->configured
update...

>>>  	}
>>>
>>>  	pipe->buffers_ready = 0;
>>> @@ -470,6 +471,12 @@ void vsp1_pipelines_resume(struct vsp1_device *vsp1)
>>>  			continue;
>>>
>>>  		spin_lock_irqsave(&pipe->irqlock, flags);
>>> +		/*
>>> +		 * The hardware may have been reset during a suspend and will
>>> +		 * need a full reconfiguration
>>> +		 */
>>
>> s/reconfiguration/reconfiguration./
>>
>>> +		pipe->configured = false;

If we have 'suspended' then pipe->state == STOPPED


>>> +
>>
>> Where does that full reconfiguration occur, given that the vsp1_pipeline_run() 
>> right below sets pipe->configured to true without performing reconfiguration ?
> 
> It's magic isn't it :D
> 
> If the pipe->configured flag gets set to false, the next execution of
> vsp1_pipeline_run() attaches the video->pipe_config (the cached configuration,
> containing the route_setup() and the configure_stream() entries) to the display
> list before configuring for the next frame.
> 
> This means that the hardware gets a full configuration written to it after a
> suspend/resume action.
> 
> Perhaps the comment should say "The video object will write out its cached pipe
> configuration on the next display list commit."
> 
> 
>>
>>>  		if (vsp1_pipeline_ready(pipe))
>>>  			vsp1_pipeline_run(pipe);

>>>  		spin_unlock_irqrestore(&pipe->irqlock, flags);
>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h b/drivers/media/platform/vsp1/vsp1_pipe.h
>>> index 90d29492b9b9..e7ad6211b4d0 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
>>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
>>> @@ -90,6 +90,7 @@ struct vsp1_partition {
>>>   * @irqlock: protects the pipeline state
>>>   * @state: current state
>>>   * @wq: wait queue to wait for state change completion
>>> + * @configured: flag determining if the hardware has run since reset

I think this flag can now be removed...

>>>   * @frame_end: frame end interrupt handler
>>>   * @lock: protects the pipeline use count and stream count
>>>   * @kref: pipeline reference count
>>> @@ -117,6 +118,7 @@ struct vsp1_pipeline {
>>>  	spinlock_t irqlock;
>>>  	enum vsp1_pipeline_state state;
>>>  	wait_queue_head_t wq;
>>> +	bool configured;

and here of course...

>>>
>>>  	void (*frame_end)(struct vsp1_pipeline *pipe, bool completed);
>>>
>>> @@ -143,8 +145,6 @@ struct vsp1_pipeline {
>>>  	 */
>>>  	struct list_head entities;
>>>
>>> -	struct vsp1_dl_list *dl;
>>> -
>>
>> You should remove the corresponding line from the structure documentation.
> 
> Done.
> 
>>
>>>  	unsigned int partitions;
>>>  	struct vsp1_partition *partition;
>>>  	struct vsp1_partition *part_table;
>>> diff --git a/drivers/media/platform/vsp1/vsp1_video.c b/drivers/media/platform/vsp1/vsp1_video.c
>>> index b47708660e53..96d9872667d9 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_video.c
>>> +++ b/drivers/media/platform/vsp1/vsp1_video.c
>>> @@ -394,37 +394,43 @@ static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline *pipe,
>>>  static void vsp1_video_pipeline_run(struct vsp1_pipeline *pipe)
>>>  {
>>>  	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
>>> +	struct vsp1_video *video = pipe->output->video;
>>>  	unsigned int partition;
>>> +	struct vsp1_dl_list *dl;
>>> +
>>> +	dl = vsp1_dl_list_get(pipe->output->dlm);
>>>
>>> -	if (!pipe->dl)
>>> -		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>>> +	/* Attach our pipe configuration to fully initialise the hardware */
>>
>> s/hardware/hardware./
>>
>> There are other similar comments in this patch.

I think I've fixed these up.

>>
>>> +	if (!pipe->configured) {

So - if this line, instead of reading !pipe->configured was:

+	if (vsp1_pipeline_stopped(pipe)) {

>>> +		vsp1_dl_list_add_body(dl, video->pipe_config);
>>> +		pipe->configured = true;

Then we don't need to update the flag, or access the pipe internals.
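
A minimal sketch of that idea, using an invented two-state model rather
than the real pipeline object (the enum, struct and function names here
are hypothetical, and the locking that a real state check would need is
left out):

```c
#include <stdbool.h>

/* Hypothetical two-state model of the pipeline. */
enum pipe_state {
	PIPE_STOPPED,
	PIPE_RUNNING,
};

struct pipe_state_model {
	enum pipe_state state;
};

static bool pipeline_stopped(const struct pipe_state_model *pipe)
{
	return pipe->state == PIPE_STOPPED;
}

/*
 * Returns true when the cached configuration has to be attached to the
 * display list. The decision is derived from the existing state, so no
 * separate 'configured' flag has to be kept in sync.
 */
static bool video_pipeline_run(struct pipe_state_model *pipe)
{
	bool attach = pipeline_stopped(pipe);

	/* Stands in for the state update done when the pipeline starts. */
	pipe->state = PIPE_RUNNING;

	return attach;
}
```

This only holds if a suspended pipeline really is left in the stopped
state, which is the assumption made above.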

>>> +	}




>>>
>>>  	/* Run the first partition */
>>> -	vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0);
>>> +	vsp1_video_pipeline_run_partition(pipe, dl, 0);
>>>
>>>  	/* Process consecutive partitions as necessary */
>>>  	for (partition = 1; partition < pipe->partitions; ++partition) {
>>> -		struct vsp1_dl_list *dl;
>>> +		struct vsp1_dl_list *dl_next;
>>>
>>> -		dl = vsp1_dl_list_get(pipe->output->dlm);
>>> +		dl_next = vsp1_dl_list_get(pipe->output->dlm);
>>>
>>>  		/*
>>>  		 * An incomplete chain will still function, but output only
>>>  		 * the partitions that had a dl available. The frame end
>>>  		 * interrupt will be marked on the last dl in the chain.
>>>  		 */
>>> -		if (!dl) {
>>> +		if (!dl_next) {
>>>  			dev_err(vsp1->dev, "Failed to obtain a dl list. Frame will be incomplete\n");
>>>  			break;
>>>  		}
>>>
>>> -		vsp1_video_pipeline_run_partition(pipe, dl, partition);
>>> -		vsp1_dl_list_add_chain(pipe->dl, dl);
>>> +		vsp1_video_pipeline_run_partition(pipe, dl_next, partition);
>>> +		vsp1_dl_list_add_chain(dl, dl_next);
>>>  	}
>>>
>>>  	/* Complete, and commit the head display list. */
>>> -	vsp1_dl_list_commit(pipe->dl);
>>> -	pipe->dl = NULL;
>>> +	vsp1_dl_list_commit(dl);
>>>
>>>  	vsp1_pipeline_run(pipe);
>>>  }
>>> @@ -790,8 +796,8 @@ static void vsp1_video_buffer_queue(struct vb2_buffer *vb)
>>>
>>>  static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>>>  {
>>> +	struct vsp1_video *video = pipe->output->video;
>>>  	struct vsp1_entity *entity;
>>> -	struct vsp1_dl_body *dlb;
>>>  	int ret;
>>>
>>>  	/* Determine this pipelines sizes for image partitioning support. */
>>> @@ -799,14 +805,6 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>>>  	if (ret < 0)
>>>  		return ret;
>>>
>>> -	/* Prepare the display list. */
>>> -	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>>> -	if (!pipe->dl)
>>> -		return -ENOMEM;
>>> -
>>> -	/* Retrieve the default DLB from the list */
>>> -	dlb = vsp1_dl_list_get_body0(pipe->dl);
>>> -
>>>  	if (pipe->uds) {
>>>  		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
>>>
>>> @@ -828,11 +826,20 @@ static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>>>  		}
>>>  	}
>>>
>>> +	/* Obtain a clean body from our pool */
>>> +	video->pipe_config = vsp1_dl_body_get(video->dlbs);
>>> +	if (!video->pipe_config)
>>> +		return -ENOMEM;
>>> +
>>> +	/* Configure the entities into our cached pipe configuration */
>>>  	list_for_each_entry(entity, &pipe->entities, list_pipe) {
>>> -		vsp1_entity_route_setup(entity, pipe, dlb);
>>> -		vsp1_entity_configure_stream(entity, pipe, dlb);
>>> +		vsp1_entity_route_setup(entity, pipe, video->pipe_config);
>>> +		vsp1_entity_configure_stream(entity, pipe, video->pipe_config);
>>>  	}
>>>
>>> +	/* Ensure that our cached configuration is updated in the next DL */
>>> +	pipe->configured = false;
>>
>> Quoting my comment to a previous version, and your reply to it which I have 
>> failed to answer,
>>
>>>> I'm tempted to move this at pipeline stop time (either to
>>>> vsp1_video_stop_streaming() right after the vsp1_pipeline_stop() call, or
>>>> in vsp1_pipeline_stop() itself), possibly with a WARN_ON() here to catch
>>>> bugs in the driver.
>>>
>>> Do you mean just setting the flag? or the pipe_configuration? This is a
>>> setup task - not a stop task ... ? We are doing this as part of
>>> vsp1_video_start_streaming().
>>
>> I meant just setting the configured flag back to false.
> 
> The point at this line in the code is to ensure that the flag is set false,
> because all of that stream configuration isn't included in the display list -
> unless the flag is false.
> 
> If the flag is initialised false in object creation, and stream stop - then
> that's fine. I felt like setting it false here was appropriate because as soon
> as the video->pipe_config cache is populated - that's the time it also needs to
> be 'flushed' to the hardware through the next dl_commit()
> 
>>
>>> IMO, The flag should only be updated after the configuration has been
>>> updated to signal that the new configuration should be written out to the
>>> hardware.
>>>
>>> Unless you mean to mark the pipe->configured = false; at
>>> vsp1_pipeline_stop() time because we reset the pipe to halt it ?
>>
>> That's the idea, yes. And now that I think about it again, we could also set 
>> pipe->configured to false in vsp1_video_cleanup_pipeline() right after the 
>> vsp1_dl_body_put() call.
>>
>> What bothers me here is that the pipe->configured flag is handled both in 
>> vsp1_pipe.c and vsp1_video.c. Coupled with my above comment about the full 
>> reconfiguration at resume time, 
> 
> Which comment - the one saying it doesn't happen? (It does... it uses the cached
> configuration)
> 
>> I think we might not be abstracting this as we 
>> should. I wonder whether it would be possible to either make the flag local to 
>> vsp1_pipe.c, or local to vsp1_video.c and move it from the pipeline object to 
>> the video object. My gut feeling right now (and it might be too late to trust 
>> it) is that, as the pipe_config object is stored in vsp1_video, so should the 
>> configured flag.
>>
>> Please feel free to challenge this.
> 
> The flag is in the pipe because that's accessible at resume time. I could
> provide accessors so that it's not modified directly from the vsp_video object?
> 
> But the configuration cache is specific to the video object - which is why it's
> in there...
> 
> I'm not sure that the pipeline vsp1_pipelines_resume() can modify flags in the
> video object at resume time though ... which would be the other direction of
> approaching this ...


So summarising my inlines above...

Yes, currently this patch is modifying a flag based on the hardware state to
know when to apply the current cached configuration to the next outgoing display
list.

We don't need a flag (or the cross-object pollution), because I believe
pipe->state tells us exactly what we need to know: whether the pipeline has
stopped, and thus needs to have the routing and stream information sent to
the hardware.
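
As a rough sketch of that idea (a toy model, not driver code: every toy_
name below is hypothetical, and only the rule "attach the cached
configuration when the pipeline is stopped" comes from the discussion
above):

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's pipeline state and object. */
enum toy_pipeline_state {
	TOY_PIPELINE_STOPPED,
	TOY_PIPELINE_RUNNING,
};

struct toy_pipeline {
	enum toy_pipeline_state state;
	unsigned int config_writes;	/* times the cached body was attached */
};

/* Assumed helper: a stopped pipeline holds no valid configuration. */
static bool toy_pipeline_stopped(const struct toy_pipeline *pipe)
{
	return pipe->state == TOY_PIPELINE_STOPPED;
}

/* Build one frame; attach the cached config only when starting up. */
static void toy_video_pipeline_run(struct toy_pipeline *pipe)
{
	if (toy_pipeline_stopped(pipe))
		pipe->config_writes++;

	pipe->state = TOY_PIPELINE_RUNNING;
}
```

In this model the first run after a stop attaches the configuration once,
and later runs skip it until the state returns to stopped. It only works if
the state is still "stopped" at the moment the check is made.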

I'll try it out - and hopefully send a v8... that I'm happy with :D



--
Kieran



> 
>>
>>> +
>>>  	return 0;
>>>  }
>>>
>>> @@ -842,6 +849,9 @@ static void vsp1_video_cleanup_pipeline(struct
>>> vsp1_pipeline *pipe) struct vsp1_vb2_buffer *buffer;
>>>  	unsigned long flags;
>>>
>>> +	/* Release any cached configuration */
>>> +	vsp1_dl_body_put(video->pipe_config);
>>> +
>>>  	/* Remove all buffers from the IRQ queue. */
>>>  	spin_lock_irqsave(&video->irqlock, flags);
>>>  	list_for_each_entry(buffer, &video->irqqueue, queue)
>>> @@ -918,9 +928,6 @@ static void vsp1_video_stop_streaming(struct vb2_queue
>>> *vq) ret = vsp1_pipeline_stop(pipe);
>>>  		if (ret == -ETIMEDOUT)
>>>  			dev_err(video->vsp1->dev, "pipeline stop timeout\n");
>>> -
>>> -		vsp1_dl_list_put(pipe->dl);
>>> -		pipe->dl = NULL;
>>>  	}
>>>  	mutex_unlock(&pipe->lock);
>>>
>>> @@ -1240,6 +1247,16 @@ struct vsp1_video *vsp1_video_create(struct
>>> vsp1_device *vsp1, goto error;
>>>  	}
>>>
>>> +	/*
>>> +	 * Utilise a body pool to cache the constant configuration of the
>>> +	 * pipeline object.
>>> +	 */
>>> +	video->dlbs = vsp1_dl_body_pool_create(vsp1, 3, 128, 0);
>>> +	if (!video->dlbs) {
>>> +		ret = -ENOMEM;
>>> +		goto error;
>>> +	}
>>> +
>>>  	return video;
>>>
>>>  error:
>>> @@ -1249,6 +1266,8 @@ struct vsp1_video *vsp1_video_create(struct
>>> vsp1_device *vsp1,
>>>
>>>  void vsp1_video_cleanup(struct vsp1_video *video)
>>>  {
>>> +	vsp1_dl_body_pool_destroy(video->dlbs);
>>> +
>>>  	if (video_is_registered(&video->video))
>>>  		video_unregister_device(&video->video);
>>>
>>> diff --git a/drivers/media/platform/vsp1/vsp1_video.h
>>> b/drivers/media/platform/vsp1/vsp1_video.h index 50ea7f02205f..e84f8ee902c1
>>> 100644
>>> --- a/drivers/media/platform/vsp1/vsp1_video.h
>>> +++ b/drivers/media/platform/vsp1/vsp1_video.h
>>> @@ -43,6 +43,8 @@ struct vsp1_video {
>>>
>>>  	struct mutex lock;
>>>
>>> +	struct vsp1_dl_body_pool *dlbs;
>>> +	struct vsp1_dl_body *pipe_config;
>>>  	unsigned int pipe_index;
>>>
>>>  	struct vb2_queue queue;
>>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb
  2018-05-01  8:28       ` Kieran Bingham
@ 2018-05-01  9:07         ` Kieran Bingham
  0 siblings, 0 replies; 26+ messages in thread
From: Kieran Bingham @ 2018-05-01  9:07 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-media, linux-renesas-soc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 01/05/18 09:28, Kieran Bingham wrote:
> Hi Laurent,
> 
> New plan ... (from the .. why didn't I think of this earlier department)

Nope ... My suggestion in the previous mail (dropping the configured flag in
favour of using pipe->state) doesn't work, because the pipeline is already
set to running before vsp1_video_pipeline_run() gets the opportunity to
check it.

We could extend the states to include a 'VSP1_PIPELINE_STARTING' in between
_STOPPED and _RUNNING, but I think that's more complicated, as the state
machine across the whole code would be affected.

(which is why I chose to add a flag in the first place)
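
For illustration only, the extra-state alternative could be modelled like
this. It is a toy sketch: the demo_ names are hypothetical, and nothing here
is taken from the driver beyond the idea of a STARTING state between STOPPED
and RUNNING.

```c
#include <stdbool.h>

enum demo_pipeline_state {
	DEMO_PIPELINE_STOPPED,
	DEMO_PIPELINE_STARTING,	/* hypothetical extra state */
	DEMO_PIPELINE_RUNNING,
};

struct demo_pipeline {
	enum demo_pipeline_state state;
	unsigned int config_writes;	/* times the cached body was attached */
};

/* Stream start (or resume) marks the pipeline as needing configuration. */
static void demo_pipeline_start(struct demo_pipeline *pipe)
{
	pipe->state = DEMO_PIPELINE_STARTING;
}

static bool demo_pipeline_needs_config(const struct demo_pipeline *pipe)
{
	return pipe->state == DEMO_PIPELINE_STARTING;
}

/* Attach the cached config while STARTING, then move on to RUNNING. */
static void demo_video_pipeline_run(struct demo_pipeline *pipe)
{
	if (demo_pipeline_needs_config(pipe))
		pipe->config_writes++;

	pipe->state = DEMO_PIPELINE_RUNNING;
}
```

Even in this small model, every place that compares against the state has
to learn about the new value, which is the cost mentioned above.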


> On 30/04/18 18:48, Kieran Bingham wrote:
>> Hi Laurent,
>> 
>> On 07/04/18 01:23, Laurent Pinchart wrote:
>>> Hi Kieran,
>>> 
>>> Thank you for the patch.
>>> 
>>> On Thursday, 8 March 2018 02:05:31 EEST Kieran Bingham wrote:
>>>> We are now able to configure a pipeline directly into a local
>>>> display list body. Take advantage of this fact, and create a
>>>> cacheable body to store the configuration of the pipeline in the
>>>> video object.
>>>> 
>>>> vsp1_video_pipeline_run() is now the last user of the pipe->dl
>>>> object. Convert this function to use the cached video->config body
>>>> and obtain a local display list reference.
>>>> 
>>>> Attach the video->config body to the display list when needed before 
>>>> committing to hardware.
>>>> 
>>>> The pipe object is marked as un-configured when resuming from a
>>>> suspend. This ensures that when the hardware is reset - our cached
>>>> configuration will be re-attached to the next committed DL.
>>>> 
>>>> Signed-off-by: Kieran Bingham
>>>> <kieran.bingham+renesas@ideasonboard.com> ---
>>>> 
>>>> v3: - 's/fragment/body/', 's/fragments/bodies/' - video dlb cache
>>>> allocation increased from 2 to 3 dlbs
>>>> 
>>>> Our video DL usage now looks like the below output:
>>>> 
>>>> dl->body0 contains our disposable runtime configuration. Max 41. 
>>>> dl_child->body0 is our partition specific configuration. Max 12. 
>>>> dl->bodies shows our constant configuration and LUTs.
>>>> 
>>>> These two are LUT/CLU: * dl->bodies[x]->num_entries 256 / max 256 *
>>>> dl->bodies[x]->num_entries 4914 / max 4914
>>>> 
>>>> Which shows that our 'constant' configuration cache is currently 
>>>> utilised to a maximum of 64 entries.
>>>> 
>>>> trace-cmd report | \ grep max | sed 's/.*vsp1_dl_list_commit://g' |
>>>> sort | uniq;
>>>> 
>>>> dl->body0->num_entries 13 / max 128 dl->body0->num_entries 14 / max
>>>> 128 dl->body0->num_entries 16 / max 128 dl->body0->num_entries 20 /
>>>> max 128 dl->body0->num_entries 27 / max 128 dl->body0->num_entries 34
>>>> / max 128 dl->body0->num_entries 41 / max 128 
>>>> dl_child->body0->num_entries 10 / max 128 
>>>> dl_child->body0->num_entries 12 / max 128 dl->bodies[x]->num_entries
>>>> 15 / max 128 dl->bodies[x]->num_entries 16 / max 128 
>>>> dl->bodies[x]->num_entries 17 / max 128 dl->bodies[x]->num_entries 18
>>>> / max 128 dl->bodies[x]->num_entries 20 / max 128 
>>>> dl->bodies[x]->num_entries 21 / max 128 dl->bodies[x]->num_entries
>>>> 256 / max 256 dl->bodies[x]->num_entries 31 / max 128 
>>>> dl->bodies[x]->num_entries 32 / max 128 dl->bodies[x]->num_entries 39
>>>> / max 128 dl->bodies[x]->num_entries 40 / max 128 
>>>> dl->bodies[x]->num_entries 47 / max 128 dl->bodies[x]->num_entries 48
>>>> / max 128 dl->bodies[x]->num_entries 4914 / max 4914 
>>>> dl->bodies[x]->num_entries 55 / max 128 dl->bodies[x]->num_entries 56
>>>> / max 128 dl->bodies[x]->num_entries 63 / max 128 
>>>> dl->bodies[x]->num_entries 64 / max 128
>>> 
>>> This might be useful to capture in the main part of the commit
>>> message.
>>> 
>>>> v4: - Adjust pipe configured flag to be reset on resume rather than
>>>> suspend - rename dl_child, dl_next
>>>> 
>>>> drivers/media/platform/vsp1/vsp1_pipe.c  |  7 +++- 
>>>> drivers/media/platform/vsp1/vsp1_pipe.h  |  4 +- 
>>>> drivers/media/platform/vsp1/vsp1_video.c | 67
>>>> ++++++++++++++++--------- drivers/media/platform/vsp1/vsp1_video.h |
>>>> 2 +- 4 files changed, 54 insertions(+), 26 deletions(-)
>>>> 
>>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c 
>>>> b/drivers/media/platform/vsp1/vsp1_pipe.c index
>>>> 5012643583b6..fa445b1a2e38 100644 ---
>>>> a/drivers/media/platform/vsp1/vsp1_pipe.c +++
>>>> b/drivers/media/platform/vsp1/vsp1_pipe.c @@ -249,6 +249,7 @@ void
>>>> vsp1_pipeline_run(struct vsp1_pipeline *pipe) vsp1_write(vsp1,
>>>> VI6_CMD(pipe->output->entity.index), VI6_CMD_STRCMD); pipe->state =
>>>> VSP1_PIPELINE_RUNNING; +		pipe->configured = true;
> 
> Look at that lovely pipe->state flag update right above the
> pipe->configured update...
> 
>>>> }
>>>> 
>>>> pipe->buffers_ready = 0; @@ -470,6 +471,12 @@ void
>>>> vsp1_pipelines_resume(struct vsp1_device *vsp1) continue;
>>>> 
>>>> spin_lock_irqsave(&pipe->irqlock, flags); +		/* +		 * The hardware
>>>> may have been reset during a suspend and will +		 * need a full
>>>> reconfiguration +		 */
>>> 
>>> s/reconfiguration/reconfiguration./
>>> 
>>>> +		pipe->configured = false;
> 
> If we have 'suspended' then pipe->state == STOPPED
> 
> 
>>>> +
>>> 
>>> Where does that full reconfiguration occur, given that the
>>> vsp1_pipeline_run() right below sets pipe->configured to true without
>>> performing reconfiguration ?
>> 
>> It's magic isn't it :D
>> 
>> If the pipe->configured flag gets set to false, the next execution of 
>> vsp1_pipeline_run() attaches the video->pipe_config (the cached
>> configuration, containing the route_setup() and the configure_stream()
>> entries) to the display list before configuring for the next frame.
>> 
>> This means that the hardware gets a full configuration written to it
>> after a suspend/resume action.
>> 
>> Perhaps the comment should say "The video object will write out its
>> cached pipe configuration on the next display list commit"
>> 
>> 
>>> 
>>>> if (vsp1_pipeline_ready(pipe)) vsp1_pipeline_run(pipe);
> 
>>>> spin_unlock_irqrestore(&pipe->irqlock, flags); diff --git
>>>> a/drivers/media/platform/vsp1/vsp1_pipe.h 
>>>> b/drivers/media/platform/vsp1/vsp1_pipe.h index
>>>> 90d29492b9b9..e7ad6211b4d0 100644 ---
>>>> a/drivers/media/platform/vsp1/vsp1_pipe.h +++
>>>> b/drivers/media/platform/vsp1/vsp1_pipe.h @@ -90,6 +90,7 @@ struct
>>>> vsp1_partition { * @irqlock: protects the pipeline state * @state:
>>>> current state * @wq: wait queue to wait for state change completion +
>>>> * @configured: flag determining if the hardware has run since reset
> 
> I think this flag can now be removed...
> 
>>>> * @frame_end: frame end interrupt handler * @lock: protects the
>>>> pipeline use count and stream count * @kref: pipeline reference
>>>> count @@ -117,6 +118,7 @@ struct vsp1_pipeline { spinlock_t irqlock; 
>>>> enum vsp1_pipeline_state state; wait_queue_head_t wq; +	bool
>>>> configured;
> 
> and here of course...
> 
>>>> 
>>>> void (*frame_end)(struct vsp1_pipeline *pipe, bool completed);
>>>> 
>>>> @@ -143,8 +145,6 @@ struct vsp1_pipeline { */ struct list_head
>>>> entities;
>>>> 
>>>> -	struct vsp1_dl_list *dl; -
>>> 
>>> You should remove the corresponding line from the structure
>>> documentation.
>> 
>> Done.
>> 
>>> 
>>>> unsigned int partitions; struct vsp1_partition *partition; struct
>>>> vsp1_partition *part_table; diff --git
>>>> a/drivers/media/platform/vsp1/vsp1_video.c 
>>>> b/drivers/media/platform/vsp1/vsp1_video.c index
>>>> b47708660e53..96d9872667d9 100644 ---
>>>> a/drivers/media/platform/vsp1/vsp1_video.c +++
>>>> b/drivers/media/platform/vsp1/vsp1_video.c @@ -394,37 +394,43 @@
>>>> static void vsp1_video_pipeline_run_partition(struct vsp1_pipeline
>>>> *pipe, static void vsp1_video_pipeline_run(struct vsp1_pipeline
>>>> *pipe) { struct vsp1_device *vsp1 = pipe->output->entity.vsp1; +
>>>> struct vsp1_video *video = pipe->output->video; unsigned int
>>>> partition; +	struct vsp1_dl_list *dl; + +	dl =
>>>> vsp1_dl_list_get(pipe->output->dlm);
>>>> 
>>>> -	if (!pipe->dl) -		pipe->dl = vsp1_dl_list_get(pipe->output->dlm); +
>>>> /* Attach our pipe configuration to fully initialise the hardware */
>>> 
>>> s/hardware/hardware./
>>> 
>>> There are other similar comments in this patch.
> 
> I think I've fixed these up.
> 
>>> 
>>>> +	if (!pipe->configured) {
> 
> So - if this line, instead of reading !pipe->configured was:
> 
> +	if (vsp1_pipeline_stopped(pipe)) {
> 
>>>> +		vsp1_dl_list_add_body(dl, video->pipe_config); +		pipe->configured
>>>> = true;
> 
> Then we don't need to update the flag, or access the pipe internals.
> 
>>>> +	}
> 
> 
> 
> 
>>>> 
>>>> /* Run the first partition */ -
>>>> vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0); +
>>>> vsp1_video_pipeline_run_partition(pipe, dl, 0);
>>>> 
>>>> /* Process consecutive partitions as necessary */ for (partition = 1;
>>>> partition < pipe->partitions; ++partition) { -		struct vsp1_dl_list
>>>> *dl; +		struct vsp1_dl_list *dl_next;
>>>> 
>>>> -		dl = vsp1_dl_list_get(pipe->output->dlm); +		dl_next =
>>>> vsp1_dl_list_get(pipe->output->dlm);
>>>> 
>>>> /* * An incomplete chain will still function, but output only * the
>>>> partitions that had a dl available. The frame end * interrupt will be
>>>> marked on the last dl in the chain. */ -		if (!dl) { +		if (!dl_next)
>>>> { dev_err(vsp1->dev, "Failed to obtain a dl list. Frame will be 
>>>> incomplete\n"); break; }
>>>> 
>>>> -		vsp1_video_pipeline_run_partition(pipe, dl, partition); -
>>>> vsp1_dl_list_add_chain(pipe->dl, dl); +
>>>> vsp1_video_pipeline_run_partition(pipe, dl_next, partition); +
>>>> vsp1_dl_list_add_chain(dl, dl_next); }
>>>> 
>>>> /* Complete, and commit the head display list. */ -
>>>> vsp1_dl_list_commit(pipe->dl); -	pipe->dl = NULL; +
>>>> vsp1_dl_list_commit(dl);
>>>> 
>>>> vsp1_pipeline_run(pipe); } @@ -790,8 +796,8 @@ static void
>>>> vsp1_video_buffer_queue(struct vb2_buffer *vb)
>>>> 
>>>> static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe) { +
>>>> struct vsp1_video *video = pipe->output->video; struct vsp1_entity
>>>> *entity; -	struct vsp1_dl_body *dlb; int ret;
>>>> 
>>>> /* Determine this pipelines sizes for image partitioning support. */ 
>>>> @@ -799,14 +805,6 @@ static int vsp1_video_setup_pipeline(struct 
>>>> vsp1_pipeline *pipe) if (ret < 0) return ret;
>>>> 
>>>> -	/* Prepare the display list. */ -	pipe->dl =
>>>> vsp1_dl_list_get(pipe->output->dlm); -	if (!pipe->dl) -		return
>>>> -ENOMEM; - -	/* Retrieve the default DLB from the list */ -	dlb =
>>>> vsp1_dl_list_get_body0(pipe->dl); - if (pipe->uds) { struct vsp1_uds
>>>> *uds = to_uds(&pipe->uds->subdev);
>>>> 
>>>> @@ -828,11 +826,20 @@ static int vsp1_video_setup_pipeline(struct 
>>>> vsp1_pipeline *pipe) } }
>>>> 
>>>> +	/* Obtain a clean body from our pool */ +	video->pipe_config =
>>>> vsp1_dl_body_get(video->dlbs); +	if (!video->pipe_config) +		return
>>>> -ENOMEM; + +	/* Configure the entities into our cached pipe
>>>> configuration */ list_for_each_entry(entity, &pipe->entities,
>>>> list_pipe) { -		vsp1_entity_route_setup(entity, pipe, dlb); -
>>>> vsp1_entity_configure_stream(entity, pipe, dlb); +
>>>> vsp1_entity_route_setup(entity, pipe, video->pipe_config); +
>>>> vsp1_entity_configure_stream(entity, pipe, video->pipe_config); }
>>>> 
>>>> +	/* Ensure that our cached configuration is updated in the next DL
>>>> */ +	pipe->configured = false;
>>> 
>>> Quoting my comment to a previous version, and your reply to it which I
>>> have failed to answer,
>>> 
>>>>> I'm tempted to move this at pipeline stop time (either to 
>>>>> vsp1_video_stop_streaming() right after the vsp1_pipeline_stop()
>>>>> call, or in vsp1_pipeline_stop() itself), possibly with a WARN_ON()
>>>>> here to catch bugs in the driver.
>>>> 
>>>> Do you mean just setting the flag? or the pipe_configuration? This is
>>>> a setup task - not a stop task ... ? We are doing this as part of 
>>>> vsp1_video_start_streaming().
>>> 
>>> I meant just setting the configured flag back to false.
>> 
>> The point at this line in the code is to ensure that the flag is set
>> false, because all of that stream configuration isn't included in the
>> display list - unless the flag is false.
>> 
>> If the flag is initialised false in object creation, and stream stop -
>> then that's fine. I felt like setting it false here was appropriate
>> because as soon as the video->pipe_config cache is populated - that's the
>> time it also needs to be 'flushed' to the hardware through the next
>> dl_commit()
>> 
>>> 
>>>> IMO, The flag should only be updated after the configuration has
>>>> been updated to signal that the new configuration should be written
>>>> out to the hardware.
>>>> 
>>>> Unless you mean to mark the pipe->configured = false; at 
>>>> vsp1_pipeline_stop() time because we reset the pipe to halt it ?
>>> 
>>> That's the idea, yes. And now that I think about it again, we could
>>> also set pipe->configured to false in vsp1_video_cleanup_pipeline()
>>> right after the vsp1_dl_body_put() call.
>>> 
>>> What bothers me here is that the pipe->configured flag is handled both
>>> in vsp1_pipe.c and vsp1_video.c. Coupled with my above comment about
>>> the full reconfiguration at resume time,
>> 
>> Which comment - the one saying it doesn't happen? (It does... it uses the
>> cached configuration)
>> 
>>> I think we might not be abstracting this as we should. I wonder whether
>>> it would be possible to either make the flag local to vsp1_pipe.c, or
>>> local to vsp1_video.c and move it from the pipeline object to the video
>>> object. My gut feeling right now (and it might be too late to trust it)
>>> is that, as the pipe_config object is stored in vsp1_video, so should
>>> the configured flag.
>>> 
>>> Please feel free to challenge this.
>> 
>> The flag is in the pipe because that's accessible at resume time. I
>> could provide accessors so that it's not modified directly from the
>> vsp_video object?
>> 
>> But the configuration cache is specific to the video object - which is
>> why it's in there...
>> 
>> I'm not sure that the pipeline vsp1_pipelines_resume() can modify flags
>> in the video object at resume time though ... which would be the other
>> direction of approaching this ...
> 
> 
> So summarising my inlines above...
> 
> Yes, currently this patch is modifying a flag based on the hardware state
> to know when to apply the current cached configuration to the next outgoing
> display list.
> 
> We don't need to have a flag (or the cross-object pollution) because I
> believe we can use the pipe->state status to tell us the exact information
> we need. (when the pipeline has stopped, and thus needs to have the routing
> and stream information sent to hardware)
> 
> I'll try it out - and hopefully send a v8... that I'm happy with :D
> 
> 
> 
> -- Kieran
> 
> 
> 
>> 
>>> 
>>>> + return 0; }
>>>> 
>>>> @@ -842,6 +849,9 @@ static void vsp1_video_cleanup_pipeline(struct 
>>>> vsp1_pipeline *pipe) struct vsp1_vb2_buffer *buffer; unsigned long
>>>> flags;
>>>> 
>>>> +	/* Release any cached configuration */ +
>>>> vsp1_dl_body_put(video->pipe_config); + /* Remove all buffers from
>>>> the IRQ queue. */ spin_lock_irqsave(&video->irqlock, flags); 
>>>> list_for_each_entry(buffer, &video->irqqueue, queue) @@ -918,9 +928,6
>>>> @@ static void vsp1_video_stop_streaming(struct vb2_queue *vq) ret =
>>>> vsp1_pipeline_stop(pipe); if (ret == -ETIMEDOUT) 
>>>> dev_err(video->vsp1->dev, "pipeline stop timeout\n"); - -
>>>> vsp1_dl_list_put(pipe->dl); -		pipe->dl = NULL; } 
>>>> mutex_unlock(&pipe->lock);
>>>> 
>>>> @@ -1240,6 +1247,16 @@ struct vsp1_video *vsp1_video_create(struct 
>>>> vsp1_device *vsp1, goto error; }
>>>> 
>>>> +	/* +	 * Utilise a body pool to cache the constant configuration of
>>>> the +	 * pipeline object. +	 */ +	video->dlbs =
>>>> vsp1_dl_body_pool_create(vsp1, 3, 128, 0); +	if (!video->dlbs) { +
>>>> ret = -ENOMEM; +		goto error; +	} + return video;
>>>> 
>>>> error: @@ -1249,6 +1266,8 @@ struct vsp1_video
>>>> *vsp1_video_create(struct vsp1_device *vsp1,
>>>> 
>>>> void vsp1_video_cleanup(struct vsp1_video *video) { +
>>>> vsp1_dl_body_pool_destroy(video->dlbs); + if
>>>> (video_is_registered(&video->video)) 
>>>> video_unregister_device(&video->video);
>>>> 
>>>> diff --git a/drivers/media/platform/vsp1/vsp1_video.h 
>>>> b/drivers/media/platform/vsp1/vsp1_video.h index
>>>> 50ea7f02205f..e84f8ee902c1 100644 ---
>>>> a/drivers/media/platform/vsp1/vsp1_video.h +++
>>>> b/drivers/media/platform/vsp1/vsp1_video.h @@ -43,6 +43,8 @@ struct
>>>> vsp1_video {
>>>> 
>>>> struct mutex lock;
>>>> 
>>>> +	struct vsp1_dl_body_pool *dlbs; +	struct vsp1_dl_body
>>>> *pipe_config;

Meanwhile, renaming pipe_config to stream_config seems to make sense as it
stores the stream configuration (non-runtime) parameters.


>>>> unsigned int pipe_index;
>>>> 
>>>> struct vb2_queue queue;
>>> 
> 
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEkC3XmD+9KP3jctR6oR5GchCkYf0FAlroLmEACgkQoR5GchCk
Yf2kthAAqdOsIsFfjjRdZrxoZJwanQE8KRpXcdkfBMJvdAKJUqIMKYWX9YCdPlQN
lmppii6U8Ru8MG9OBMlYVlG2SYFIEbGJSDJA5OIvQN41VLVkvSOLbdf6hp03kSNs
QxqORR3ZKYqApxzeX7ayXv8c2JXLk7MrdPyvYdh8PNB+sj/B6aFc997IxKccwmuN
6ZZOt6AAEEUtDR0kMgY5JamkT6UqRoNxebBbxgM4yNCoOAVp3AYyA23GTwvviKuE
dAMGWE6hdCwz3k057RzpwePfeHKXnSxusabRr/vVRnLrzUw/DB81SHs9E87qkyWV
mu9SLkbK0TadWx94j5iZJPyN7VoUXned7JWSOkS4GZ/GpiVtgioKdEDbztJSSBUY
8WzUhuYfyYWppGGW1MwPByaeKhiKgCO2zsMLM3qgLPywAvkxIt5hsJUJ1IZGWpW/
eOhDtwjA4XTTKH0bPRhcxEJrojwwOKT38TbyKjejDgFMAe9jPSWKxgrTJZls/JkE
/kydepbUJmd0dzliUs69py0T1jq8kQbHm6Kn1lxkiKJCEBajRSOjtK+fe0msBJzh
fHjZAFMUpZX28C6NpQqnRz/9FQhmOlYAW90MEXB7yQTGB/oeWgQW/WViuxHj+iYZ
TuoUzpK+O2sE9eGYOhlb+VIxAlxWrh2Xexv7pLMSMKCCEzdEFMo=
=zey1
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb
  2018-04-30 17:48     ` Kieran Bingham
  2018-05-01  8:28       ` Kieran Bingham
@ 2018-05-17 14:35       ` Laurent Pinchart
  2018-05-17 17:06         ` Kieran Bingham
  1 sibling, 1 reply; 26+ messages in thread
From: Laurent Pinchart @ 2018-05-17 14:35 UTC (permalink / raw)
  To: kieran.bingham; +Cc: linux-media, linux-renesas-soc

Hi Kieran,

On Monday, 30 April 2018 20:48:03 EEST Kieran Bingham wrote:
> On 07/04/18 01:23, Laurent Pinchart wrote:
> > On Thursday, 8 March 2018 02:05:31 EEST Kieran Bingham wrote:
> >> We are now able to configure a pipeline directly into a local display
> >> list body. Take advantage of this fact, and create a cacheable body to
> >> store the configuration of the pipeline in the video object.
> >> 
> >> vsp1_video_pipeline_run() is now the last user of the pipe->dl object.
> >> Convert this function to use the cached video->config body and obtain a
> >> local display list reference.
> >> 
> >> Attach the video->config body to the display list when needed before
> >> committing to hardware.
> >> 
> >> The pipe object is marked as un-configured when resuming from a suspend.
> >> This ensures that when the hardware is reset - our cached configuration
> >> will be re-attached to the next committed DL.
> >> 
> >> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> >> ---
> >> 
> >> v3:
> >>  - 's/fragment/body/', 's/fragments/bodies/'
> >>  - video dlb cache allocation increased from 2 to 3 dlbs
> >> 
> >> Our video DL usage now looks like the below output:
> >> 
> >> dl->body0 contains our disposable runtime configuration. Max 41.
> >> dl_child->body0 is our partition specific configuration. Max 12.
> >> dl->bodies shows our constant configuration and LUTs.
> >> 
> >>   These two are LUT/CLU:
> >>      * dl->bodies[x]->num_entries 256 / max 256
> >>      * dl->bodies[x]->num_entries 4914 / max 4914
> >> 
> >> Which shows that our 'constant' configuration cache is currently
> >> utilised to a maximum of 64 entries.
> >> 
> >> trace-cmd report | \
> >> 
> >>     grep max | sed 's/.*vsp1_dl_list_commit://g' | sort | uniq;
> >>   
> >>   dl->body0->num_entries 13 / max 128
> >>   dl->body0->num_entries 14 / max 128
> >>   dl->body0->num_entries 16 / max 128
> >>   dl->body0->num_entries 20 / max 128
> >>   dl->body0->num_entries 27 / max 128
> >>   dl->body0->num_entries 34 / max 128
> >>   dl->body0->num_entries 41 / max 128
> >>   dl_child->body0->num_entries 10 / max 128
> >>   dl_child->body0->num_entries 12 / max 128
> >>   dl->bodies[x]->num_entries 15 / max 128
> >>   dl->bodies[x]->num_entries 16 / max 128
> >>   dl->bodies[x]->num_entries 17 / max 128
> >>   dl->bodies[x]->num_entries 18 / max 128
> >>   dl->bodies[x]->num_entries 20 / max 128
> >>   dl->bodies[x]->num_entries 21 / max 128
> >>   dl->bodies[x]->num_entries 256 / max 256
> >>   dl->bodies[x]->num_entries 31 / max 128
> >>   dl->bodies[x]->num_entries 32 / max 128
> >>   dl->bodies[x]->num_entries 39 / max 128
> >>   dl->bodies[x]->num_entries 40 / max 128
> >>   dl->bodies[x]->num_entries 47 / max 128
> >>   dl->bodies[x]->num_entries 48 / max 128
> >>   dl->bodies[x]->num_entries 4914 / max 4914
> >>   dl->bodies[x]->num_entries 55 / max 128
> >>   dl->bodies[x]->num_entries 56 / max 128
> >>   dl->bodies[x]->num_entries 63 / max 128
> >>   dl->bodies[x]->num_entries 64 / max 128
> > 
> > This might be useful to capture in the main part of the commit message.
> > 
> >> v4:
> >>  - Adjust pipe configured flag to be reset on resume rather than suspend
> >>  - rename dl_child, dl_next
> >>  
> >>  drivers/media/platform/vsp1/vsp1_pipe.c  |  7 +++-
> >>  drivers/media/platform/vsp1/vsp1_pipe.h  |  4 +-
> >>  drivers/media/platform/vsp1/vsp1_video.c | 67 ++++++++++++++++---------
> >>  drivers/media/platform/vsp1/vsp1_video.h |  2 +-
> >>  4 files changed, 54 insertions(+), 26 deletions(-)
> >> 
> >> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c
> >> b/drivers/media/platform/vsp1/vsp1_pipe.c index
> >> 5012643583b6..fa445b1a2e38
> >> 100644
> >> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
> >> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
> >> @@ -249,6 +249,7 @@ void vsp1_pipeline_run(struct vsp1_pipeline *pipe)
> >>  		vsp1_write(vsp1, VI6_CMD(pipe->output->entity.index),
> >>  			   VI6_CMD_STRCMD);
> >>  		pipe->state = VSP1_PIPELINE_RUNNING;
> >> +		pipe->configured = true;
> >>  	}
> >>  	
> >>  	pipe->buffers_ready = 0;
> >> @@ -470,6 +471,12 @@ void vsp1_pipelines_resume(struct vsp1_device *vsp1)
> >>  			continue;
> >>  		
> >>  		spin_lock_irqsave(&pipe->irqlock, flags);
> >> +		/*
> >> +		 * The hardware may have been reset during a suspend and will
> >> +		 * need a full reconfiguration
> >> +		 */
> > 
> > s/reconfiguration/reconfiguration./
> > 
> >> +		pipe->configured = false;
> >> +
> > 
> > Where does that full reconfiguration occur, given that the
> > vsp1_pipeline_run() right below sets pipe->configured to true without
> > performing reconfiguration ?
> >
> It's magic isn't it :D
> 
> If the pipe->configured flag gets set to false, the next execution of
> vsp1_pipeline_run() attaches the video->pipe_config (the cached
> configuration, containing the route_setup() and the configure_stream()
> entries) to the display list before configuring for the next frame.

Unless I'm mistaken, it's vsp1_video_pipeline_run() that does so, not 
vsp1_pipeline_run().
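
To make the distinction concrete, here is a toy model of the flag handling
as it appears in the quoted hunks. The model_ names are hypothetical, the
"ready" field is a stand-in for vsp1_pipeline_ready(), and the condition
around the flag update in model_pipeline_run() is assumed from the hunk
context; it is a sketch, not the driver.

```c
#include <stdbool.h>

enum model_state { MODEL_STOPPED, MODEL_RUNNING };

struct model_pipeline {
	enum model_state state;
	bool configured;
	bool ready;			/* stand-in for vsp1_pipeline_ready() */
	unsigned int config_writes;	/* times the cached body was attached */
};

/* Models the vsp1_pipeline_run() hunk: starting also sets the flag. */
static void model_pipeline_run(struct model_pipeline *pipe)
{
	if (pipe->state == MODEL_STOPPED) {
		pipe->state = MODEL_RUNNING;
		pipe->configured = true;
	}
}

/* Models the vsp1_video_pipeline_run() hunk: the only config writer. */
static void model_video_pipeline_run(struct model_pipeline *pipe)
{
	if (!pipe->configured) {
		pipe->config_writes++;
		pipe->configured = true;
	}

	model_pipeline_run(pipe);
}

/* Models the vsp1_pipelines_resume() hunk. */
static void model_pipelines_resume(struct model_pipeline *pipe)
{
	pipe->configured = false;

	if (pipe->ready)
		model_pipeline_run(pipe);
}
```

In this model, a resume with buffers ready sets the flag back to true before
the video layer ever checks it, so the cached body is not re-attached. That
is the situation my question about the full reconfiguration is probing.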

> This means that the hardware gets a full configuration written to it after a
> suspend/resume action.
> 
> Perhaps the comment should say "The video object will write out its cached
> pipe configuration on the next display list commit"
> 
> >>  		if (vsp1_pipeline_ready(pipe))
> >>  			vsp1_pipeline_run(pipe);
> >>  		spin_unlock_irqrestore(&pipe->irqlock, flags);
> >> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h
> >> b/drivers/media/platform/vsp1/vsp1_pipe.h index
> >> 90d29492b9b9..e7ad6211b4d0
> >> 100644
> >> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
> >> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
> >> @@ -90,6 +90,7 @@ struct vsp1_partition {
> >>   * @irqlock: protects the pipeline state
> >>   * @state: current state
> >>   * @wq: wait queue to wait for state change completion
> >> + * @configured: flag determining if the hardware has run since reset
> >>   * @frame_end: frame end interrupt handler
> >>   * @lock: protects the pipeline use count and stream count
> >>   * @kref: pipeline reference count
> >> @@ -117,6 +118,7 @@ struct vsp1_pipeline {
> >>  	spinlock_t irqlock;
> >>  	enum vsp1_pipeline_state state;
> >>  	wait_queue_head_t wq;
> >> +	bool configured;
> >> 
> >>  	void (*frame_end)(struct vsp1_pipeline *pipe, bool completed);
> >> 
> >> @@ -143,8 +145,6 @@ struct vsp1_pipeline {
> >>  	 */
> >>  	struct list_head entities;
> >> 
> >> -	struct vsp1_dl_list *dl;
> >> -
> > 
> > You should remove the corresponding line from the structure documentation.
> 
> Done.
> 
> >>  	unsigned int partitions;
> >>  	struct vsp1_partition *partition;
> >>  	struct vsp1_partition *part_table;
> >> diff --git a/drivers/media/platform/vsp1/vsp1_video.c
> >> b/drivers/media/platform/vsp1/vsp1_video.c index
> >> b47708660e53..96d9872667d9
> >> 100644
> >> --- a/drivers/media/platform/vsp1/vsp1_video.c
> >> +++ b/drivers/media/platform/vsp1/vsp1_video.c
> >> @@ -394,37 +394,43 @@ static void
> >> vsp1_video_pipeline_run_partition(struct
> >> vsp1_pipeline *pipe, static void vsp1_video_pipeline_run(struct
> >> vsp1_pipeline *pipe)
> >>  {
> >>  	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
> >> +	struct vsp1_video *video = pipe->output->video;
> >>  	unsigned int partition;
> >> +	struct vsp1_dl_list *dl;
> >> +
> >> +	dl = vsp1_dl_list_get(pipe->output->dlm);
> >> 
> >> -	if (!pipe->dl)
> >> -		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
> >> +	/* Attach our pipe configuration to fully initialise the hardware */
> > 
> > s/hardware/hardware./
> > 
> > There are other similar comments in this patch.
> > 
> >> +	if (!pipe->configured) {
> >> +		vsp1_dl_list_add_body(dl, video->pipe_config);
> >> +		pipe->configured = true;
> >> +	}
> >> 
> >>  	/* Run the first partition */
> >> -	vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0);
> >> +	vsp1_video_pipeline_run_partition(pipe, dl, 0);
> >> 
> >>  	/* Process consecutive partitions as necessary */
> >>  	for (partition = 1; partition < pipe->partitions; ++partition) {
> >> -		struct vsp1_dl_list *dl;
> >> +		struct vsp1_dl_list *dl_next;
> >> 
> >> -		dl = vsp1_dl_list_get(pipe->output->dlm);
> >> +		dl_next = vsp1_dl_list_get(pipe->output->dlm);
> >> 
> >>  		/*
> >>  		 * An incomplete chain will still function, but output only
> >>  		 * the partitions that had a dl available. The frame end
> >>  		 * interrupt will be marked on the last dl in the chain.
> >>  		 */
> >> -		if (!dl) {
> >> +		if (!dl_next) {
> >>  			dev_err(vsp1->dev, "Failed to obtain a dl list. Frame will be
> >> incomplete\n");
> >>  			break;
> >>  		}
> >> 
> >> -		vsp1_video_pipeline_run_partition(pipe, dl, partition);
> >> -		vsp1_dl_list_add_chain(pipe->dl, dl);
> >> +		vsp1_video_pipeline_run_partition(pipe, dl_next, partition);
> >> +		vsp1_dl_list_add_chain(dl, dl_next);
> >>  	}
> >>  	
> >>  	/* Complete, and commit the head display list. */
> >> -	vsp1_dl_list_commit(pipe->dl);
> >> -	pipe->dl = NULL;
> >> +	vsp1_dl_list_commit(dl);
> >> 
> >>  	vsp1_pipeline_run(pipe);
> >>  }
> >> @@ -790,8 +796,8 @@ static void vsp1_video_buffer_queue(struct vb2_buffer
> >> *vb)
> >> 
> >>  static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
> >>  {
> >> +	struct vsp1_video *video = pipe->output->video;
> >>  	struct vsp1_entity *entity;
> >> -	struct vsp1_dl_body *dlb;
> >>  	int ret;
> >>  	
> >>  	/* Determine this pipelines sizes for image partitioning support. */
> >> @@ -799,14 +805,6 @@ static int vsp1_video_setup_pipeline(struct
> >> vsp1_pipeline *pipe)
> >>  	if (ret < 0)
> >>  		return ret;
> >> 
> >> -	/* Prepare the display list. */
> >> -	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
> >> -	if (!pipe->dl)
> >> -		return -ENOMEM;
> >> -
> >> -	/* Retrieve the default DLB from the list */
> >> -	dlb = vsp1_dl_list_get_body0(pipe->dl);
> >> -
> >>  	if (pipe->uds) {
> >>  		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
> >> 
> >> @@ -828,11 +826,20 @@ static int vsp1_video_setup_pipeline(struct
> >> vsp1_pipeline *pipe)
> >>  		}
> >>  	}
> >> 
> >> +	/* Obtain a clean body from our pool */
> >> +	video->pipe_config = vsp1_dl_body_get(video->dlbs);
> >> +	if (!video->pipe_config)
> >> +		return -ENOMEM;
> >> +
> >> +	/* Configure the entities into our cached pipe configuration */
> >>  	list_for_each_entry(entity, &pipe->entities, list_pipe) {
> >> -		vsp1_entity_route_setup(entity, pipe, dlb);
> >> -		vsp1_entity_configure_stream(entity, pipe, dlb);
> >> +		vsp1_entity_route_setup(entity, pipe, video->pipe_config);
> >> +		vsp1_entity_configure_stream(entity, pipe, video->pipe_config);
> >>  	}
> >> 
> >> +	/* Ensure that our cached configuration is updated in the next DL */
> >> +	pipe->configured = false;
> > 
> > Quoting my comment to a previous version, and your reply to it which I
> > have failed to answer,
> > 
> >>> I'm tempted to move this at pipeline stop time (either to
> >>> vsp1_video_stop_streaming() right after the vsp1_pipeline_stop() call,
> >>> or in vsp1_pipeline_stop() itself), possibly with a WARN_ON() here to
> >>> catch bugs in the driver.
> >> 
> >> Do you mean just setting the flag? or the pipe_configuration? This is a
> >> setup task - not a stop task ... ? We are doing this as part of
> >> vsp1_video_start_streaming().
> > 
> > I meant just setting the configured flag back to false.
> 
> The point at this line in the code is to ensure that the flag is set false,
> because all of that stream configuration isn't included in the display list
> - unless the flag is false.
> 
> If the flag is initialised false in object creation, and stream stop - then
> that's fine. I felt like setting it false here was appropriate because as
> soon as the video->pipe_config cache is populated - that's the time it also
> needs to be 'flushed' to the hardware through the next dl_commit()
> 
> >> IMO, The flag should only be updated after the configuration has been
> >> updated to signal that the new configuration should be written out to the
> >> hardware.
> >> 
> >> Unless you mean to mark the pipe->configured = false; at
> >> vsp1_pipeline_stop() time because we reset the pipe to halt it ?
> > 
> > That's the idea, yes. And now that I think about it again, we could also
> > set pipe->configured to false in vsp1_video_cleanup_pipeline() right
> > after the vsp1_dl_body_put() call.
> > 
> > What bothers me here is that the pipe->configured flag is handled both in
> > vsp1_pipe.c and vsp1_video.c. Coupled with my above comment about the full
> > reconfiguration at resume time,
> 
> Which comment - the one saying it doesn't happen? (It does... it uses the
> cached configuration)

As far as I understand it still doesn't :-)

> > I think we might not be abstracting this as we should. I wonder whether it
> > would be possible to either make the flag local to vsp1_pipe.c, or local
> > to vsp1_video.c and move it from the pipeline object to the video object.
> > My gut feeling right now (and it might be too late to trust it) is that,
> > as the pipe_config object is stored in vsp1_video, so should the
> > configured flag.
> > 
> > Please feel free to challenge this.
> 
> The flag is in the pipe because that's accessible at resume time. I could
> provide accessors so that it's not modified directly from the vsp_video
> object?
> 
> But the configuration cache is specific to the video object - which is why
> it's in there...
> 
> I'm not sure that the pipeline vsp1_pipelines_resume() can modify flags in
> the video object at resume time though ... which would be the other
> direction of approaching this ...
> 
> >> +
> >>  	return 0;
> >>  }
> >> 
> >> @@ -842,6 +849,9 @@ static void vsp1_video_cleanup_pipeline(struct
> >> vsp1_pipeline *pipe)
> >>  	struct vsp1_vb2_buffer *buffer;
> >>  	unsigned long flags;
> >> 
> >> +	/* Release any cached configuration */
> >> +	vsp1_dl_body_put(video->pipe_config);
> >> +
> >>  	/* Remove all buffers from the IRQ queue. */
> >>  	spin_lock_irqsave(&video->irqlock, flags);
> >>  	list_for_each_entry(buffer, &video->irqqueue, queue)
> >> @@ -918,9 +928,6 @@ static void vsp1_video_stop_streaming(struct
> >> vb2_queue *vq)
> >>  		ret = vsp1_pipeline_stop(pipe);
> >>  		if (ret == -ETIMEDOUT)
> >>  			dev_err(video->vsp1->dev, "pipeline stop timeout\n");
> >> -
> >> -		vsp1_dl_list_put(pipe->dl);
> >> -		pipe->dl = NULL;
> >>  	}
> >>  	mutex_unlock(&pipe->lock);
> >> 
> >> @@ -1240,6 +1247,16 @@ struct vsp1_video *vsp1_video_create(struct
> >> vsp1_device *vsp1,
> >>  		goto error;
> >>  	}
> >> 
> >> +	/*
> >> +	 * Utilise a body pool to cache the constant configuration of the
> >> +	 * pipeline object.
> >> +	 */
> >> +	video->dlbs = vsp1_dl_body_pool_create(vsp1, 3, 128, 0);
> >> +	if (!video->dlbs) {
> >> +		ret = -ENOMEM;
> >> +		goto error;
> >> +	}
> >> +
> >>  	return video;
> >>  
> >>  error:
> >> @@ -1249,6 +1266,8 @@ struct vsp1_video *vsp1_video_create(struct
> >> vsp1_device *vsp1,
> >> 
> >>  void vsp1_video_cleanup(struct vsp1_video *video)
> >>  {
> >> +	vsp1_dl_body_pool_destroy(video->dlbs);
> >> +
> >>  	if (video_is_registered(&video->video))
> >>  		video_unregister_device(&video->video);
> >> 
> >> diff --git a/drivers/media/platform/vsp1/vsp1_video.h
> >> b/drivers/media/platform/vsp1/vsp1_video.h index
> >> 50ea7f02205f..e84f8ee902c1
> >> 100644
> >> --- a/drivers/media/platform/vsp1/vsp1_video.h
> >> +++ b/drivers/media/platform/vsp1/vsp1_video.h
> >> @@ -43,6 +43,8 @@ struct vsp1_video {
> >> 
> >>  	struct mutex lock;
> >> 
> >> +	struct vsp1_dl_body_pool *dlbs;
> >> +	struct vsp1_dl_body *pipe_config;
> >>  	unsigned int pipe_index;
> >>  	
> >>  	struct vb2_queue queue;

-- 
Regards,

Laurent Pinchart


* Re: [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb
  2018-05-17 14:35       ` Laurent Pinchart
@ 2018-05-17 17:06         ` Kieran Bingham
  2018-05-17 20:11           ` Laurent Pinchart
  0 siblings, 1 reply; 26+ messages in thread
From: Kieran Bingham @ 2018-05-17 17:06 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-media, linux-renesas-soc



Hi Laurent,

On 17/05/18 15:35, Laurent Pinchart wrote:
> Hi Kieran,
> 
> On Monday, 30 April 2018 20:48:03 EEST Kieran Bingham wrote:
>> On 07/04/18 01:23, Laurent Pinchart wrote:
>>> On Thursday, 8 March 2018 02:05:31 EEST Kieran Bingham wrote:
>>>> We are now able to configure a pipeline directly into a local display
>>>> list body. Take advantage of this fact, and create a cacheable body to
>>>> store the configuration of the pipeline in the video object.
>>>>
>>>> vsp1_video_pipeline_run() is now the last user of the pipe->dl object.
>>>> Convert this function to use the cached video->config body and obtain a
>>>> local display list reference.
>>>>
>>>> Attach the video->config body to the display list when needed before
>>>> committing to hardware.
>>>>
>>>> The pipe object is marked as un-configured when resuming from a suspend.
>>>> This ensures that when the hardware is reset - our cached configuration
>>>> will be re-attached to the next committed DL.
>>>>
>>>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
>>>> ---
>>>>
>>>> v3:
>>>>  - 's/fragment/body/', 's/fragments/bodies/'
>>>>  - video dlb cache allocation increased from 2 to 3 dlbs
>>>>
>>>> Our video DL usage now looks like the below output:
>>>>
>>>> dl->body0 contains our disposable runtime configuration. Max 41.
>>>> dl_child->body0 is our partition specific configuration. Max 12.
>>>> dl->bodies shows our constant configuration and LUTs.
>>>>
>>>>   These two are LUT/CLU:
>>>>      * dl->bodies[x]->num_entries 256 / max 256
>>>>      * dl->bodies[x]->num_entries 4914 / max 4914
>>>>
>>>> Which shows that our 'constant' configuration cache is currently
>>>> utilised to a maximum of 64 entries.
>>>>
>>>> trace-cmd report | \
>>>>
>>>>     grep max | sed 's/.*vsp1_dl_list_commit://g' | sort | uniq;
>>>>   
>>>>   dl->body0->num_entries 13 / max 128
>>>>   dl->body0->num_entries 14 / max 128
>>>>   dl->body0->num_entries 16 / max 128
>>>>   dl->body0->num_entries 20 / max 128
>>>>   dl->body0->num_entries 27 / max 128
>>>>   dl->body0->num_entries 34 / max 128
>>>>   dl->body0->num_entries 41 / max 128
>>>>   dl_child->body0->num_entries 10 / max 128
>>>>   dl_child->body0->num_entries 12 / max 128
>>>>   dl->bodies[x]->num_entries 15 / max 128
>>>>   dl->bodies[x]->num_entries 16 / max 128
>>>>   dl->bodies[x]->num_entries 17 / max 128
>>>>   dl->bodies[x]->num_entries 18 / max 128
>>>>   dl->bodies[x]->num_entries 20 / max 128
>>>>   dl->bodies[x]->num_entries 21 / max 128
>>>>   dl->bodies[x]->num_entries 256 / max 256
>>>>   dl->bodies[x]->num_entries 31 / max 128
>>>>   dl->bodies[x]->num_entries 32 / max 128
>>>>   dl->bodies[x]->num_entries 39 / max 128
>>>>   dl->bodies[x]->num_entries 40 / max 128
>>>>   dl->bodies[x]->num_entries 47 / max 128
>>>>   dl->bodies[x]->num_entries 48 / max 128
>>>>   dl->bodies[x]->num_entries 4914 / max 4914
>>>>   dl->bodies[x]->num_entries 55 / max 128
>>>>   dl->bodies[x]->num_entries 56 / max 128
>>>>   dl->bodies[x]->num_entries 63 / max 128
>>>>   dl->bodies[x]->num_entries 64 / max 128
>>>
>>> This might be useful to capture in the main part of the commit message.
>>>
>>>> v4:
>>>>  - Adjust pipe configured flag to be reset on resume rather than suspend
>>>>  - rename dl_child, dl_next
>>>>  
>>>>  drivers/media/platform/vsp1/vsp1_pipe.c  |  7 +++-
>>>>  drivers/media/platform/vsp1/vsp1_pipe.h  |  4 +-
>>>>  drivers/media/platform/vsp1/vsp1_video.c | 67 ++++++++++++++++---------
>>>>  drivers/media/platform/vsp1/vsp1_video.h |  2 +-
>>>>  4 files changed, 54 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c
>>>> b/drivers/media/platform/vsp1/vsp1_pipe.c index
>>>> 5012643583b6..fa445b1a2e38
>>>> 100644
>>>> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
>>>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
>>>> @@ -249,6 +249,7 @@ void vsp1_pipeline_run(struct vsp1_pipeline *pipe)
>>>>  		vsp1_write(vsp1, VI6_CMD(pipe->output->entity.index),
>>>>  			   VI6_CMD_STRCMD);
>>>>  		pipe->state = VSP1_PIPELINE_RUNNING;
>>>> +		pipe->configured = true;
>>>>  	}
>>>>  	
>>>>  	pipe->buffers_ready = 0;
>>>> @@ -470,6 +471,12 @@ void vsp1_pipelines_resume(struct vsp1_device *vsp1)
>>>>  			continue;
>>>>  		
>>>>  		spin_lock_irqsave(&pipe->irqlock, flags);
>>>> +		/*
>>>> +		 * The hardware may have been reset during a suspend and will
>>>> +		 * need a full reconfiguration
>>>> +		 */
>>>
>>> s/reconfiguration/reconfiguration./
>>>
>>>> +		pipe->configured = false;
>>>> +
>>>
>>> Where does that full reconfiguration occur, given that the
>>> vsp1_pipeline_run() right below sets pipe->configured to true without
>>> performing reconfiguration ?
>
>> It's magic isn't it :D
>>
>> If the pipe->configured flag gets set to false, the next execution of
>> vsp1_pipeline_run() attaches the video->pipe_config (the cached
>> configuration, containing the route_setup() and the configure_stream()
>> entries) to the display list before configuring for the next frame.
> 
> Unless I'm mistaken, it's vsp1_video_pipeline_run() that does so, not 
> vsp1_pipeline_run().


Aha - ok - I think I see the issue.

1) Yes - you are correct - vsp1_video_pipeline_run() adds the full cached
configuration to the display list, to ensure that the routes are re-configured
after a resume.
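
As a rough userspace model of point 1, the flag makes vsp1_video_pipeline_run()
attach the cached body once and then skip it until the flag is cleared again.
This is only a sketch: struct flag_pipe and both helpers are hypothetical
stand-ins, not the vsp1 types, and the display list itself is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vsp1_pipeline; only the flag is modelled. */
struct flag_pipe {
	bool configured;
};

/*
 * Models the check in vsp1_video_pipeline_run(): returns true when the
 * cached body would be attached to the display list for this frame.
 */
static bool flag_video_pipeline_run(struct flag_pipe *pipe)
{
	assert(pipe != NULL);

	if (pipe->configured)
		return false;

	/* This is where vsp1_dl_list_add_body(dl, video->pipe_config) sits. */
	pipe->configured = true;
	return true;
}

/* Models the flag handling this patch adds to vsp1_pipelines_resume(). */
static void flag_resume(struct flag_pipe *pipe)
{
	assert(pipe != NULL);
	pipe->configured = false;
}
```

Calling flag_video_pipeline_run() twice on a fresh pipe attaches on the first
call only, and a flag_resume() in between re-arms the attach.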


> 
>> This means that the hardware gets a full configuration written to it after a
>> suspend/resume action.
>>
>> Perhaps the comment should say "The video object will write out its cached
>> pipe configuration on the next display list commit"
>>
>>>>  		if (vsp1_pipeline_ready(pipe))

2) ... Although the next line is a call to vsp1_pipeline_run(), upon a resume
for a video pipeline - I believe vsp1_pipeline_ready() is false (we will have
gone through STOPPING, STOPPED), thus it won't run until the *next* iteration. DU
pipelines will not be affected by the use of the pipe->configured flag...

>>>>  			vsp1_pipeline_run(pipe);


However - now I see it - yes this feels a bit ugly in that regard, and now
feels like it has only worked by chance rather than by design! :-(

Hrm ... in fact as there will be no active DL committed for the pipeline - is it
ever possible for the above vsp1_pipeline_run() to start the pipeline ? So it's
not chance at least :D


Perhaps instead of adding the configured flag, I could add the cached dlb
configuration if the pipe->state is STOPPED ...

 ... I feel like I've already tried to go down the route of using the
pipe->state though ...


Ok - a quick attempt, and I've removed the pipe->configured flag - and changed
the attach as follows:

	/* Attach our pipe configuration to fully initialise the hardware. */
	if (pipe->state == VSP1_PIPELINE_STOPPED)
		vsp1_dl_list_add_body(dl, video->pipe_config);

This is 'cleaner' I think as it doesn't add an extra flag - but relies upon the
exact same circumstance that the pipe->state is not set to running at resume
time (which I believe to be OK).

It also passes the relevant tests on vsp-tests:

root@Ubuntu-ARM64:~/vsp-tests# ./vsp-unit-test-0000.sh
Test Conditions:
  Platform          Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+
  Kernel release    4.17.0-rc4-arm64-renesas-00397-g3d2f6f2901b0
  convert           /usr/bin/convert
  compare           /usr/bin/compare
  killall           /usr/bin/killall
  raw2rgbpnm        /usr/bin/raw2rgbpnm
  stress            /usr/bin/stress
  yavta             /usr/bin/yavta

root@Ubuntu-ARM64:~/vsp-tests# ./vsp-unit-test-0019.sh
Testing non-active pipeline suspend/resume in suspend:freezer: passed
Testing non-active pipeline suspend/resume in suspend:devices: passed
Testing non-active pipeline suspend/resume in suspend:platform: passed
Testing non-active pipeline suspend/resume in suspend:processors: passed
Testing non-active pipeline suspend/resume in suspend:core: passed

root@Ubuntu-ARM64:~/vsp-tests# ./vsp-unit-test-0020.sh
Testing Testing active pipeline suspend/resume in suspend:freezer: pass
Testing Testing active pipeline suspend/resume in suspend:devices: pass
Testing Testing active pipeline suspend/resume in suspend:platform: pass
Testing Testing active pipeline suspend/resume in suspend:processors: pass
Testing Testing active pipeline suspend/resume in suspend:core: pass
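
For illustration, the state-based attach described above can be modelled in
userspace as follows. This is a minimal sketch under assumptions, not driver
code: the enum, struct state_pipe and the helpers are hypothetical names, and
a suspend cycle is assumed to leave the pipeline in the stopped state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the pipeline states named in this thread. */
enum sketch_state {
	SKETCH_STOPPED,
	SKETCH_RUNNING,
	SKETCH_STOPPING,
};

struct state_pipe {
	enum sketch_state state;
};

/* True when the cached configuration would be attached for this frame. */
static bool state_attach_config(const struct state_pipe *pipe)
{
	assert(pipe != NULL);

	/*
	 * Compare the state directly: in C, "!state == X" parses as
	 * "(!state) == X", which is a different test.
	 */
	return pipe->state == SKETCH_STOPPED;
}

/* Models one vsp1_video_pipeline_run() + vsp1_pipeline_run() sequence. */
static bool state_run_frame(struct state_pipe *pipe)
{
	bool attached = state_attach_config(pipe);

	if (pipe->state == SKETCH_STOPPED)
		pipe->state = SKETCH_RUNNING;

	return attached;
}

/* Models a suspend cycle that leaves the pipeline stopped (assumption). */
static void state_suspend(struct state_pipe *pipe)
{
	assert(pipe != NULL);
	pipe->state = SKETCH_STOPPED;
}
```

In this model the first frame after a stop attaches the cached body, later
frames do not, and no separate flag has to be kept in sync with the state.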


--
Kieran


>>>>  		spin_unlock_irqrestore(&pipe->irqlock, flags);
>>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.h
>>>> b/drivers/media/platform/vsp1/vsp1_pipe.h index
>>>> 90d29492b9b9..e7ad6211b4d0
>>>> 100644
>>>> --- a/drivers/media/platform/vsp1/vsp1_pipe.h
>>>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.h
>>>> @@ -90,6 +90,7 @@ struct vsp1_partition {
>>>>   * @irqlock: protects the pipeline state
>>>>   * @state: current state
>>>>   * @wq: wait queue to wait for state change completion
>>>> + * @configured: flag determining if the hardware has run since reset
>>>>   * @frame_end: frame end interrupt handler
>>>>   * @lock: protects the pipeline use count and stream count
>>>>   * @kref: pipeline reference count
>>>> @@ -117,6 +118,7 @@ struct vsp1_pipeline {
>>>>  	spinlock_t irqlock;
>>>>  	enum vsp1_pipeline_state state;
>>>>  	wait_queue_head_t wq;
>>>> +	bool configured;
>>>>
>>>>  	void (*frame_end)(struct vsp1_pipeline *pipe, bool completed);
>>>>
>>>> @@ -143,8 +145,6 @@ struct vsp1_pipeline {
>>>>  	 */
>>>>  	struct list_head entities;
>>>>
>>>> -	struct vsp1_dl_list *dl;
>>>> -
>>>
>>> You should remove the corresponding line from the structure documentation.
>>
>> Done.
>>
>>>>  	unsigned int partitions;
>>>>  	struct vsp1_partition *partition;
>>>>  	struct vsp1_partition *part_table;
>>>> diff --git a/drivers/media/platform/vsp1/vsp1_video.c
>>>> b/drivers/media/platform/vsp1/vsp1_video.c index
>>>> b47708660e53..96d9872667d9
>>>> 100644
>>>> --- a/drivers/media/platform/vsp1/vsp1_video.c
>>>> +++ b/drivers/media/platform/vsp1/vsp1_video.c
>>>> @@ -394,37 +394,43 @@ static void
>>>> vsp1_video_pipeline_run_partition(struct
>>>> vsp1_pipeline *pipe, static void vsp1_video_pipeline_run(struct
>>>> vsp1_pipeline *pipe)
>>>>  {
>>>>  	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
>>>> +	struct vsp1_video *video = pipe->output->video;
>>>>  	unsigned int partition;
>>>> +	struct vsp1_dl_list *dl;
>>>> +
>>>> +	dl = vsp1_dl_list_get(pipe->output->dlm);
>>>>
>>>> -	if (!pipe->dl)
>>>> -		pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>>>> +	/* Attach our pipe configuration to fully initialise the hardware */
>>>
>>> s/hardware/hardware./
>>>
>>> There are other similar comments in this patch.
>>>
>>>> +	if (!pipe->configured) {
>>>> +		vsp1_dl_list_add_body(dl, video->pipe_config);
>>>> +		pipe->configured = true;
>>>> +	}
>>>>
>>>>  	/* Run the first partition */
>>>> -	vsp1_video_pipeline_run_partition(pipe, pipe->dl, 0);
>>>> +	vsp1_video_pipeline_run_partition(pipe, dl, 0);
>>>>
>>>>  	/* Process consecutive partitions as necessary */
>>>>  	for (partition = 1; partition < pipe->partitions; ++partition) {
>>>> -		struct vsp1_dl_list *dl;
>>>> +		struct vsp1_dl_list *dl_next;
>>>>
>>>> -		dl = vsp1_dl_list_get(pipe->output->dlm);
>>>> +		dl_next = vsp1_dl_list_get(pipe->output->dlm);
>>>>
>>>>  		/*
>>>>  		 * An incomplete chain will still function, but output only
>>>>  		 * the partitions that had a dl available. The frame end
>>>>  		 * interrupt will be marked on the last dl in the chain.
>>>>  		 */
>>>> -		if (!dl) {
>>>> +		if (!dl_next) {
>>>>  			dev_err(vsp1->dev, "Failed to obtain a dl list. Frame will be
>>>> incomplete\n");
>>>>  			break;
>>>>  		}
>>>>
>>>> -		vsp1_video_pipeline_run_partition(pipe, dl, partition);
>>>> -		vsp1_dl_list_add_chain(pipe->dl, dl);
>>>> +		vsp1_video_pipeline_run_partition(pipe, dl_next, partition);
>>>> +		vsp1_dl_list_add_chain(dl, dl_next);
>>>>  	}
>>>>  	
>>>>  	/* Complete, and commit the head display list. */
>>>> -	vsp1_dl_list_commit(pipe->dl);
>>>> -	pipe->dl = NULL;
>>>> +	vsp1_dl_list_commit(dl);
>>>>
>>>>  	vsp1_pipeline_run(pipe);
>>>>  }
>>>> @@ -790,8 +796,8 @@ static void vsp1_video_buffer_queue(struct vb2_buffer
>>>> *vb)
>>>>
>>>>  static int vsp1_video_setup_pipeline(struct vsp1_pipeline *pipe)
>>>>  {
>>>> +	struct vsp1_video *video = pipe->output->video;
>>>>  	struct vsp1_entity *entity;
>>>> -	struct vsp1_dl_body *dlb;
>>>>  	int ret;
>>>>  	
>>>>  	/* Determine this pipelines sizes for image partitioning support. */
>>>> @@ -799,14 +805,6 @@ static int vsp1_video_setup_pipeline(struct
>>>> vsp1_pipeline *pipe)
>>>>  	if (ret < 0)
>>>>  		return ret;
>>>>
>>>> -	/* Prepare the display list. */
>>>> -	pipe->dl = vsp1_dl_list_get(pipe->output->dlm);
>>>> -	if (!pipe->dl)
>>>> -		return -ENOMEM;
>>>> -
>>>> -	/* Retrieve the default DLB from the list */
>>>> -	dlb = vsp1_dl_list_get_body0(pipe->dl);
>>>> -
>>>>  	if (pipe->uds) {
>>>>  		struct vsp1_uds *uds = to_uds(&pipe->uds->subdev);
>>>>
>>>> @@ -828,11 +826,20 @@ static int vsp1_video_setup_pipeline(struct
>>>> vsp1_pipeline *pipe)
>>>>  		}
>>>>  	}
>>>>
>>>> +	/* Obtain a clean body from our pool */
>>>> +	video->pipe_config = vsp1_dl_body_get(video->dlbs);
>>>> +	if (!video->pipe_config)
>>>> +		return -ENOMEM;
>>>> +
>>>> +	/* Configure the entities into our cached pipe configuration */
>>>>  	list_for_each_entry(entity, &pipe->entities, list_pipe) {
>>>> -		vsp1_entity_route_setup(entity, pipe, dlb);
>>>> -		vsp1_entity_configure_stream(entity, pipe, dlb);
>>>> +		vsp1_entity_route_setup(entity, pipe, video->pipe_config);
>>>> +		vsp1_entity_configure_stream(entity, pipe, video->pipe_config);
>>>>  	}
>>>>
>>>> +	/* Ensure that our cached configuration is updated in the next DL */
>>>> +	pipe->configured = false;


>>>
>>> Quoting my comment to a previous version, and your reply to it which I
>>> have failed to answer,
>>>
>>>>> I'm tempted to move this at pipeline stop time (either to
>>>>> vsp1_video_stop_streaming() right after the vsp1_pipeline_stop() call,
>>>>> or in vsp1_pipeline_stop() itself), possibly with a WARN_ON() here to
>>>>> catch bugs in the driver.
>>>>
>>>> Do you mean just setting the flag? or the pipe_configuration? This is a
>>>> setup task - not a stop task ... ? We are doing this as part of
>>>> vsp1_video_start_streaming().
>>>
>>> I meant just setting the configured flag back to false.
>>
>> The point at this line in the code is to ensure that the flag is set false,
>> because all of that stream configuration isn't included in the display list
>> - unless the flag is false.
>>
>> If the flag is initialised false in object creation, and stream stop - then
>> that's fine. I felt like setting it false here was appropriate because as
>> soon as the video->pipe_config cache is populated - that's the time it also
>> needs to be 'flushed' to the hardware through the next dl_commit()
>>
>>>> IMO, The flag should only be updated after the configuration has been
>>>> updated to signal that the new configuration should be written out to the
>>>> hardware.
>>>>
>>>> Unless you mean to mark the pipe->configured = false; at
>>>> vsp1_pipeline_stop() time because we reset the pipe to halt it ?
>>>
>>> That's the idea, yes. And now that I think about it again, we could also
>>> set pipe->configured to false in vsp1_video_cleanup_pipeline() right
>>> after the vsp1_dl_body_put() call.
>>>
>>> What bothers me here is that the pipe->configured flag is handled both in
>>> vsp1_pipe.c and vsp1_video.c. Coupled with my above comment about the full
>>> reconfiguration at resume time,
>>
>> Which comment - the one saying it doesn't happen? (It does... it uses the
>> cached configuration)
> 
> As far as I understand it still doesn't :-)
> 
>>> I think we might not be abstracting this as we should. I wonder whether it
>>> would be possible to either make the flag local to vsp1_pipe.c, or local
>>> to vsp1_video.c and move it from the pipeline object to the video object.
>>> My gut feeling right now (and it might be too late to trust it) is that,
>>> as the pipe_config object is stored in vsp1_video, so should the
>>> configured flag.
>>>
>>> Please feel free to challenge this.
>>
>> The flag is in the pipe because that's accessible at resume time. I could
>> provide accessors so that it's not modified directly from the vsp_video
>> object?
>>
>> But the configuration cache is specific to the video object - which is why
>> it's in there...
>>
>> I'm not sure that the pipeline vsp1_pipelines_resume() can modify flags in
>> the video object at resume time though ... which would be the other
>> direction of approaching this ...
>>
>>>> +
>>>>  	return 0;
>>>>  }
>>>>
>>>> @@ -842,6 +849,9 @@ static void vsp1_video_cleanup_pipeline(struct
>>>> vsp1_pipeline *pipe)
>>>>  	struct vsp1_vb2_buffer *buffer;
>>>>  	unsigned long flags;
>>>>
>>>> +	/* Release any cached configuration */
>>>> +	vsp1_dl_body_put(video->pipe_config);
>>>> +
>>>>  	/* Remove all buffers from the IRQ queue. */
>>>>  	spin_lock_irqsave(&video->irqlock, flags);
>>>>  	list_for_each_entry(buffer, &video->irqqueue, queue)
>>>> @@ -918,9 +928,6 @@ static void vsp1_video_stop_streaming(struct
>>>> vb2_queue *vq)
>>>>  		ret = vsp1_pipeline_stop(pipe);
>>>>  		if (ret == -ETIMEDOUT)
>>>>  			dev_err(video->vsp1->dev, "pipeline stop timeout\n");
>>>> -
>>>> -		vsp1_dl_list_put(pipe->dl);
>>>> -		pipe->dl = NULL;
>>>>  	}
>>>>  	mutex_unlock(&pipe->lock);
>>>>
>>>> @@ -1240,6 +1247,16 @@ struct vsp1_video *vsp1_video_create(struct
>>>> vsp1_device *vsp1,
>>>>  		goto error;
>>>>  	}
>>>>
>>>> +	/*
>>>> +	 * Utilise a body pool to cache the constant configuration of the
>>>> +	 * pipeline object.
>>>> +	 */
>>>> +	video->dlbs = vsp1_dl_body_pool_create(vsp1, 3, 128, 0);
>>>> +	if (!video->dlbs) {
>>>> +		ret = -ENOMEM;
>>>> +		goto error;
>>>> +	}
>>>> +
>>>>  	return video;
>>>>  
>>>>  error:
>>>> @@ -1249,6 +1266,8 @@ struct vsp1_video *vsp1_video_create(struct
>>>> vsp1_device *vsp1,
>>>>
>>>>  void vsp1_video_cleanup(struct vsp1_video *video)
>>>>  {
>>>> +	vsp1_dl_body_pool_destroy(video->dlbs);
>>>> +
>>>>  	if (video_is_registered(&video->video))
>>>>  		video_unregister_device(&video->video);
>>>>
>>>> diff --git a/drivers/media/platform/vsp1/vsp1_video.h
>>>> b/drivers/media/platform/vsp1/vsp1_video.h index
>>>> 50ea7f02205f..e84f8ee902c1
>>>> 100644
>>>> --- a/drivers/media/platform/vsp1/vsp1_video.h
>>>> +++ b/drivers/media/platform/vsp1/vsp1_video.h
>>>> @@ -43,6 +43,8 @@ struct vsp1_video {
>>>>
>>>>  	struct mutex lock;
>>>>
>>>> +	struct vsp1_dl_body_pool *dlbs;
>>>> +	struct vsp1_dl_body *pipe_config;
>>>>  	unsigned int pipe_index;
>>>>  	
>>>>  	struct vb2_queue queue;
> 




* Re: [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb
  2018-05-17 17:06         ` Kieran Bingham
@ 2018-05-17 20:11           ` Laurent Pinchart
  0 siblings, 0 replies; 26+ messages in thread
From: Laurent Pinchart @ 2018-05-17 20:11 UTC (permalink / raw)
  To: Kieran Bingham; +Cc: linux-media, linux-renesas-soc

Hi Kieran,

On Thursday, 17 May 2018 20:06:46 EEST Kieran Bingham wrote:
> On 17/05/18 15:35, Laurent Pinchart wrote:
> > On Monday, 30 April 2018 20:48:03 EEST Kieran Bingham wrote:
> >> On 07/04/18 01:23, Laurent Pinchart wrote:
> >>> On Thursday, 8 March 2018 02:05:31 EEST Kieran Bingham wrote:
> >>>> We are now able to configure a pipeline directly into a local display
> >>>> list body. Take advantage of this fact, and create a cacheable body to
> >>>> store the configuration of the pipeline in the video object.
> >>>> 
> >>>> vsp1_video_pipeline_run() is now the last user of the pipe->dl object.
> >>>> Convert this function to use the cached video->config body and obtain a
> >>>> local display list reference.
> >>>> 
> >>>> Attach the video->config body to the display list when needed before
> >>>> committing to hardware.
> >>>> 
> >>>> The pipe object is marked as un-configured when resuming from a
> >>>> suspend. This ensures that when the hardware is reset - our cached
> >>>> configuration will be re-attached to the next committed DL.
> >>>> 
> >>>> Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
> >>>> ---
> >>>> 
> >>>> v3:
> >>>>  - 's/fragment/body/', 's/fragments/bodies/'
> >>>>  - video dlb cache allocation increased from 2 to 3 dlbs
> >>>> 
> >>>> Our video DL usage now looks like the below output:
> >>>> 
> >>>> dl->body0 contains our disposable runtime configuration. Max 41.
> >>>> dl_child->body0 is our partition specific configuration. Max 12.
> >>>> dl->bodies shows our constant configuration and LUTs.
> >>>> 
> >>>>   These two are LUT/CLU:
> >>>>      * dl->bodies[x]->num_entries 256 / max 256
> >>>>      * dl->bodies[x]->num_entries 4914 / max 4914
> >>>> 
> >>>> Which shows that our 'constant' configuration cache is currently
> >>>> utilised to a maximum of 64 entries.
> >>>> 
> >>>> trace-cmd report | \
> >>>> 
> >>>>     grep max | sed 's/.*vsp1_dl_list_commit://g' | sort | uniq;
> >>>>   
> >>>>   dl->body0->num_entries 13 / max 128
> >>>>   dl->body0->num_entries 14 / max 128
> >>>>   dl->body0->num_entries 16 / max 128
> >>>>   dl->body0->num_entries 20 / max 128
> >>>>   dl->body0->num_entries 27 / max 128
> >>>>   dl->body0->num_entries 34 / max 128
> >>>>   dl->body0->num_entries 41 / max 128
> >>>>   dl_child->body0->num_entries 10 / max 128
> >>>>   dl_child->body0->num_entries 12 / max 128
> >>>>   dl->bodies[x]->num_entries 15 / max 128
> >>>>   dl->bodies[x]->num_entries 16 / max 128
> >>>>   dl->bodies[x]->num_entries 17 / max 128
> >>>>   dl->bodies[x]->num_entries 18 / max 128
> >>>>   dl->bodies[x]->num_entries 20 / max 128
> >>>>   dl->bodies[x]->num_entries 21 / max 128
> >>>>   dl->bodies[x]->num_entries 256 / max 256
> >>>>   dl->bodies[x]->num_entries 31 / max 128
> >>>>   dl->bodies[x]->num_entries 32 / max 128
> >>>>   dl->bodies[x]->num_entries 39 / max 128
> >>>>   dl->bodies[x]->num_entries 40 / max 128
> >>>>   dl->bodies[x]->num_entries 47 / max 128
> >>>>   dl->bodies[x]->num_entries 48 / max 128
> >>>>   dl->bodies[x]->num_entries 4914 / max 4914
> >>>>   dl->bodies[x]->num_entries 55 / max 128
> >>>>   dl->bodies[x]->num_entries 56 / max 128
> >>>>   dl->bodies[x]->num_entries 63 / max 128
> >>>>   dl->bodies[x]->num_entries 64 / max 128
> >>> 
> >>> This might be useful to capture in the main part of the commit message.
> >>> 
> >>>> v4:
> >>>>  - Adjust pipe configured flag to be reset on resume rather than
> >>>>  suspend
> >>>>  - rename dl_child, dl_next
> >>>>  
> >>>>  drivers/media/platform/vsp1/vsp1_pipe.c  |  7 +++-
> >>>>  drivers/media/platform/vsp1/vsp1_pipe.h  |  4 +-
> >>>>  drivers/media/platform/vsp1/vsp1_video.c | 67 ++++++++++++++---------
> >>>>  drivers/media/platform/vsp1/vsp1_video.h |  2 +-
> >>>>  4 files changed, 54 insertions(+), 26 deletions(-)
> >>>> 
> >>>> diff --git a/drivers/media/platform/vsp1/vsp1_pipe.c
> >>>> b/drivers/media/platform/vsp1/vsp1_pipe.c index
> >>>> 5012643583b6..fa445b1a2e38
> >>>> 100644
> >>>> --- a/drivers/media/platform/vsp1/vsp1_pipe.c
> >>>> +++ b/drivers/media/platform/vsp1/vsp1_pipe.c
> >>>> @@ -249,6 +249,7 @@ void vsp1_pipeline_run(struct vsp1_pipeline *pipe)
> >>>>  		vsp1_write(vsp1, VI6_CMD(pipe->output->entity.index),
> >>>>  			   VI6_CMD_STRCMD);
> >>>>  		pipe->state = VSP1_PIPELINE_RUNNING;
> >>>> +		pipe->configured = true;
> >>>>  	}
> >>>>  	
> >>>>  	pipe->buffers_ready = 0;
> >>>> @@ -470,6 +471,12 @@ void vsp1_pipelines_resume(struct vsp1_device
> >>>> *vsp1)
> >>>>  			continue;
> >>>>  		
> >>>>  		spin_lock_irqsave(&pipe->irqlock, flags);
> >>>> +		/*
> >>>> +		 * The hardware may have been reset during a suspend and will
> >>>> +		 * need a full reconfiguration
> >>>> +		 */
> >>> 
> >>> s/reconfiguration/reconfiguration./
> >>> 
> >>>> +		pipe->configured = false;
> >>>> +
> >>> 
> >>> Where does that full reconfiguration occur, given that the
> >>> vsp1_pipeline_run() right below sets pipe->configured to true without
> >>> performing reconfiguration ?
> > 
> >> It's magic isn't it :D
> >> 
> >> If the pipe->configured flag gets set to false, the next execution of
> >> vsp1_pipeline_run() attaches the video->pipe_config (the cached
> >> configuration, containing the route_setup() and the configure_stream()
> >> entries) to the display list before configuring for the next frame.
> > 
> > Unless I'm mistaken, it's vsp1_video_pipeline_run() that does so, not
> > vsp1_pipeline_run().
> 
> Aha - ok - I think I see the issue.
> 
> 1) Yes - you are correct - vsp1_video_pipeline_run() adds the full cached
> configuration to the display list, to ensure that the routes are
> re-configured after a resume.
> 
> >> This means that the hardware gets a full configuration written to it
> >> after a suspend/resume action.
> >> 
> >> Perhaps the comment should say "The video object will write out its
> >> cached pipe configuration on the next display list commit"
> >> 
> >>>>  		if (vsp1_pipeline_ready(pipe))
> 
> 2) ... Although the next line is a call to vsp1_pipeline_run(), upon a
> resume for a video pipeline - I believe vsp1_pipeline_ready() is false, (we
> will have gone through STOPPING, STOPPED) thus it won't run until the
> *next* iteration.

I don't think that's correct. vsp1_pipeline_ready() returns true if there is 
at least one buffer queued for every RPF and WPF in the pipeline. This can be 
the case at resume time, as suspending the VSP doesn't affect buffer queues.
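
To illustrate the readiness rule, here is a minimal standalone sketch. It 
assumes a bitmask with one bit per RPF plus one bit for the WPF; the struct and 
function names are hypothetical and are not the driver's real definitions.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct vsp1_pipeline. */
struct pipe_sketch {
	unsigned int num_inputs;	/* number of RPFs in the pipeline */
	unsigned int buffers_ready;	/* bit 0: WPF, bits 1..n: RPFs */
};

/* Ready when the WPF and every RPF have at least one buffer queued. */
static bool pipe_sketch_ready(const struct pipe_sketch *pipe)
{
	unsigned int mask = (((1u << pipe->num_inputs) - 1) << 1) | 1u;

	return (pipe->buffers_ready & mask) == mask;
}
```

Nothing in this check depends on the pipeline state, so if buffers_ready is 
left untouched across suspend, the check can return true as soon as resume 
runs.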

> DU pipelines will not be affected by the usage of the pipe->configured flag...
> 
> >>>>  			vsp1_pipeline_run(pipe);
> 
> However - now I see it - yes this feels a bit ugly in that regard, and now
> feels like it's only worked by chance rather than design! :-(
> 
> Hrm ... in fact as there will be no active DL committed for the pipeline -
> is it ever possible for the above vsp1_pipeline_run() to start the pipeline
> ? So it's not chance at least :D

If power to the VSP is cut during suspend then I don't see how 
vsp1_pipelines_resume() could work. Seems like something is seriously 
broken... Should we move vsp1_pipelines_resume() to vsp1_video.c, rename it to 
vsp1_video_pipelines_resume(), and use vsp1_video_pipeline_run() instead of 
vsp1_pipeline_run() ? That would keep all the logic in vsp1_video.c and allow 
for the configured flag to be stored in struct vsp1_video along with 
stream_config.
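
The concern can be modelled with a small standalone sketch. It is not driver 
code: every name below is hypothetical, and it assumes the worst case where 
the hardware loses its configuration during suspend while buffers stay queued.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; not the driver's real structures. */
struct pipe_model {
	bool configured;	/* cleared at resume time */
	bool buffers_ready;	/* buffer queues survive suspend */
	bool hw_has_routes;	/* false once the hardware has been reset */
};

/* Models vsp1_pipeline_run(): starts the hardware and sets the flag. */
static void model_pipeline_run(struct pipe_model *pipe)
{
	pipe->configured = true;
}

/* Models vsp1_video_pipeline_run(): attaches the cached body if needed. */
static void model_video_pipeline_run(struct pipe_model *pipe)
{
	if (!pipe->configured)
		pipe->hw_has_routes = true;	/* cached config added to the DL */
	model_pipeline_run(pipe);
}

/* Resume as in the patch: calls the low-level run function directly. */
static void model_resume_patch(struct pipe_model *pipe)
{
	pipe->hw_has_routes = false;	/* assumption: power was cut */
	pipe->configured = false;
	if (pipe->buffers_ready)
		model_pipeline_run(pipe);
}

/* Resume as proposed: goes through the video-level run function. */
static void model_resume_proposed(struct pipe_model *pipe)
{
	pipe->hw_has_routes = false;	/* assumption: power was cut */
	pipe->configured = false;
	if (pipe->buffers_ready)
		model_video_pipeline_run(pipe);
}
```

In this model, model_resume_patch() leaves configured set to true with 
hw_has_routes still false, so a later model_video_pipeline_run() never 
attaches the cached configuration. model_resume_proposed() restores it at 
resume time.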

> Perhaps instead of adding the configured flag, I could add the cached dlb
> configuration if the pipe->state is STOPPED ...

As explained in my review of v10, I think you'll then end up reconfiguring 
everything for every frame. The VSP should remain functional, but with a bit 
of a performance degradation.

>  ... I feel like I've already tried to go down the route of using the
> pipe->state though ...
> 
> 
> Ok - a quick attempt, and I've removed the pipe->configured flag - and
> changed the attach as follows:
> 
> 	/* Attach our pipe configuration to fully initialise the hardware */
> 	if (pipe->state == VSP1_PIPELINE_STOPPED)
> 		vsp1_dl_list_add_body(dl, video->pipe_config);
> 
> This is 'cleaner' I think as it doesn't add an extra flag - but relies upon
> the exact same circumstance that the pipe->state is not set to running at
> resume time (which I believe to be OK).
> 
> It also passes the relevant tests on vsp-tests:

Seems like we need a suspend/resume test that cuts power from the VSP.

> root@Ubuntu-ARM64:~/vsp-tests# ./vsp-unit-test-0000.sh
> Test Conditions:
>   Platform          Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+
>   Kernel release    4.17.0-rc4-arm64-renesas-00397-g3d2f6f2901b0
>   convert           /usr/bin/convert
>   compare           /usr/bin/compare
>   killall           /usr/bin/killall
>   raw2rgbpnm        /usr/bin/raw2rgbpnm
>   stress            /usr/bin/stress
>   yavta             /usr/bin/yavta
> 
> root@Ubuntu-ARM64:~/vsp-tests# ./vsp-unit-test-0019.sh
> Testing non-active pipeline suspend/resume in suspend:freezer: passed
> Testing non-active pipeline suspend/resume in suspend:devices: passed
> Testing non-active pipeline suspend/resume in suspend:platform: passed
> Testing non-active pipeline suspend/resume in suspend:processors: passed
> Testing non-active pipeline suspend/resume in suspend:core: passed
> 
> root@Ubuntu-ARM64:~/vsp-tests# ./vsp-unit-test-0020.sh
> Testing Testing active pipeline suspend/resume in suspend:freezer: pass
> Testing Testing active pipeline suspend/resume in suspend:devices: pass
> Testing Testing active pipeline suspend/resume in suspend:platform: pass
> Testing Testing active pipeline suspend/resume in suspend:processors: pass
> Testing Testing active pipeline suspend/resume in suspend:core: pass
> 
> >>>>  		spin_unlock_irqrestore(&pipe->irqlock, flags);

[snip]

-- 
Regards,

Laurent Pinchart

end of thread, other threads:[~2018-05-17 20:11 UTC | newest]

Thread overview: 26+ messages
2018-03-08  0:05 [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Kieran Bingham
2018-03-08  0:05 ` [PATCH v7 1/8] media: vsp1: Reword uses of 'fragment' as 'body' Kieran Bingham
2018-04-06 21:38   ` Laurent Pinchart
2018-03-08  0:05 ` [PATCH v7 2/8] media: vsp1: Protect bodies against overflow Kieran Bingham
2018-03-08  0:05 ` [PATCH v7 3/8] media: vsp1: Provide a body pool Kieran Bingham
2018-04-06 22:33   ` Laurent Pinchart
2018-04-30 14:12     ` Kieran Bingham
2018-03-08  0:05 ` [PATCH v7 4/8] media: vsp1: Convert display lists to use new " Kieran Bingham
2018-04-06 22:55   ` Laurent Pinchart
2018-04-30 14:39     ` Kieran Bingham
2018-03-08  0:05 ` [PATCH v7 5/8] media: vsp1: Use reference counting for bodies Kieran Bingham
2018-04-06 23:06   ` Laurent Pinchart
2018-03-08  0:05 ` [PATCH v7 6/8] media: vsp1: Refactor display list configure operations Kieran Bingham
2018-04-06 23:38   ` Laurent Pinchart
2018-04-30 16:22     ` Kieran Bingham
2018-03-08  0:05 ` [PATCH v7 7/8] media: vsp1: Adapt entities to configure into a body Kieran Bingham
2018-04-06 23:55   ` Laurent Pinchart
2018-03-08  0:05 ` [PATCH v7 8/8] media: vsp1: Move video configuration to a cached dlb Kieran Bingham
2018-04-07  0:23   ` Laurent Pinchart
2018-04-30 17:48     ` Kieran Bingham
2018-05-01  8:28       ` Kieran Bingham
2018-05-01  9:07         ` Kieran Bingham
2018-05-17 14:35       ` Laurent Pinchart
2018-05-17 17:06         ` Kieran Bingham
2018-05-17 20:11           ` Laurent Pinchart
2018-04-07  0:30 ` [PATCH v7 0/8] vsp1: TLB optimisation and DL caching Laurent Pinchart