All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] drm/vc4: Add a load tracker
@ 2018-10-25 12:45 Boris Brezillon
  2018-10-25 12:45 ` [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors Boris Brezillon
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Boris Brezillon @ 2018-10-25 12:45 UTC (permalink / raw)
  To: Eric Anholt, Daniel Vetter; +Cc: Boris Brezillon, dri-devel

Hello,

This is the 2nd version of the VC4 load tracker patch.

Daniel, as you suggested, I've implemented a generic infrastructure to
track and report underrun errors (patch 1). Not sure this is what you
had in mind, but it seems to do the job for my use case, and should
allow me to easily track regressions in the load tracking logic with a
bunch of IGT tests. Let me know if you want it done differently.

Patch 2 is implementing the generic underrun interface in the VC4
driver, and patch 3 is just adding the load tracking logic (hasn't
changed since the RFC except for the unused 'ret' var removal).

Regards,

Boris

Boris Brezillon (3):
  drm/atomic: Add a generic infrastructure to track underrun errors
  drm/vc4: Report underrun errors
  drm/vc4: Add a load tracker to prevent HVS underflow errors

 drivers/gpu/drm/drm_atomic.c        |  12 +++
 drivers/gpu/drm/drm_atomic_helper.c |   4 +
 drivers/gpu/drm/vc4/vc4_drv.c       |   1 +
 drivers/gpu/drm/vc4/vc4_drv.h       |  14 ++++
 drivers/gpu/drm/vc4/vc4_hvs.c       |  71 ++++++++++++++++++
 drivers/gpu/drm/vc4/vc4_kms.c       | 110 +++++++++++++++++++++++++++-
 drivers/gpu/drm/vc4/vc4_plane.c     |  60 +++++++++++++++
 drivers/gpu/drm/vc4/vc4_regs.h      |  46 +++---------
 include/drm/drm_atomic_helper.h     |  53 ++++++++++++++
 include/drm/drm_device.h            |  15 ++++
 include/drm/drm_mode_config.h       |  26 +++++++
 11 files changed, 376 insertions(+), 36 deletions(-)

-- 
2.17.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors
  2018-10-25 12:45 [PATCH v2 0/3] drm/vc4: Add a load tracker Boris Brezillon
@ 2018-10-25 12:45 ` Boris Brezillon
  2018-10-26 10:33   ` Daniel Vetter
  2018-10-25 12:45 ` [PATCH v2 2/3] drm/vc4: Report " Boris Brezillon
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Boris Brezillon @ 2018-10-25 12:45 UTC (permalink / raw)
  To: Eric Anholt, Daniel Vetter; +Cc: Boris Brezillon, dri-devel

Display controllers usually provide a lot features like overlay planes,
hardware scalers, hardware rotations, ...
Most of the time those features work just fine when enabled
independently, but activating all of them at the same time tend to
show other limitations like the limited memory bus or scaler
bandwidth.

This sort of bandwidth estimation is hard to get right, and we
sometimes fail to predict that a specific workload will not pass,
which will most likely result in underrun errors somewhere in the
display pipeline (most of the time at the CRTC level).

This path aims at making underrun error tracking generic and exposing
underrun errors to userspace through a debugfs file.

This way, CI tools like IGT, can try to enable a bunch of features and
if such underrun errors are reported after the settings have been
accepted and applied by the KMS infrastructure. When that happens it
means the load tracking algorithm needs some tweaking/fixing.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
---
Changes in v2:
- New patch
---
 drivers/gpu/drm/drm_atomic.c        | 12 +++++++
 drivers/gpu/drm/drm_atomic_helper.c |  4 +++
 include/drm/drm_atomic_helper.h     | 53 +++++++++++++++++++++++++++++
 include/drm/drm_device.h            | 15 ++++++++
 include/drm/drm_mode_config.h       | 26 ++++++++++++++
 5 files changed, 110 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 0e27d45feba9..350348961ab3 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -1179,9 +1179,21 @@ static int drm_state_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+static int drm_underrun_info(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_printer p = drm_seq_file_printer(m);
+
+	drm_printf(&p, "%c\n", atomic_read(&dev->underrun) ? 'Y' : 'N');
+
+	return 0;
+}
+
 /* any use in debugfs files to dump individual planes/crtc/etc? */
 static const struct drm_info_list drm_atomic_debugfs_list[] = {
 	{"state", drm_state_info, 0},
+	{"underrun", drm_underrun_info, 0 },
 };
 
 int drm_atomic_debugfs_init(struct drm_minor *minor)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 6f66777dca4b..9956c2928c5b 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1456,6 +1456,8 @@ void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
 {
 	struct drm_device *dev = old_state->dev;
 
+	drm_atomic_helper_underrun_stop(dev);
+
 	drm_atomic_helper_commit_modeset_disables(dev, old_state);
 
 	drm_atomic_helper_commit_planes(dev, old_state, 0);
@@ -1466,6 +1468,8 @@ void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
 
 	drm_atomic_helper_commit_hw_done(old_state);
 
+	drm_atomic_helper_underrun_start(dev);
+
 	drm_atomic_helper_wait_for_vblanks(dev, old_state);
 
 	drm_atomic_helper_cleanup_planes(dev, old_state);
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
index 25ca0097563e..f712144dd26d 100644
--- a/include/drm/drm_atomic_helper.h
+++ b/include/drm/drm_atomic_helper.h
@@ -221,4 +221,57 @@ drm_atomic_plane_disabling(struct drm_plane_state *old_plane_state,
 	return old_plane_state->crtc && !new_plane_state->crtc;
 }
 
+/**
+ * drm_atomic_helper_underrun_set() - Set the underrun bit
+ * @dev: device the underrun happened on
+ *
+ * This function should be called when an underrun happens after an update
+ * has been committed to the device.
+ * Underrun interrupts are masked to avoid flooding the CPU with useless
+ * interrupts.
+ */
+static inline void drm_atomic_helper_underrun_set(struct drm_device *dev)
+{
+	atomic_set(&dev->underrun, 1);
+
+	if (dev->mode_config.funcs->atomic_mask_underrun)
+		dev->mode_config.funcs->atomic_mask_underrun(dev);
+}
+
+/**
+ * drm_atomic_helper_underrun_stop() - Stop generating underrun events
+ * @dev: device we want to stop underrun on
+ *
+ * This function should be called in the atomic commit path, just before
+ * committing changes to the hardware.
+ * The underrun interrupt is simply masked prior to getting underrun events
+ * while we are applying the new config.
+ */
+static inline void drm_atomic_helper_underrun_stop(struct drm_device *dev)
+{
+	if (dev->mode_config.funcs->atomic_mask_underrun)
+		dev->mode_config.funcs->atomic_mask_underrun(dev);
+}
+
+/**
+ * drm_atomic_helper_underrun_start() - Start listening to underrun events
+ * @dev: device we want to start underrun on
+ *
+ * This function should be called in the atomic commit path, just after
+ * the changes have been committed to the hardware.
+ * Any pending underrun events are cleared before unmasking the interrupt to
+ * guarantee that further underrun events are really caused by the new
+ * configuration.
+ */
+static inline void drm_atomic_helper_underrun_start(struct drm_device *dev)
+{
+	atomic_set(&dev->underrun, 0);
+
+	if (dev->mode_config.funcs->atomic_clear_underrun)
+		dev->mode_config.funcs->atomic_clear_underrun(dev);
+
+	if (dev->mode_config.funcs->atomic_unmask_underrun)
+		dev->mode_config.funcs->atomic_unmask_underrun(dev);
+}
+
 #endif /* DRM_ATOMIC_HELPER_H_ */
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 42411b3ea0c8..59c63e03a776 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -231,6 +231,21 @@ struct drm_device {
 	 * Set by drm_fb_helper_init() and cleared by drm_fb_helper_fini().
 	 */
 	struct drm_fb_helper *fb_helper;
+
+	/**
+	 * @underrun:
+	 *
+	 * Set to 1 when an underrun error happened since the last atomic
+	 * commit, 0 otherwise.
+	 * This is particularly useful to detect when a specific modeset is too
+	 * demanding in term of memory bandwidth or other kind limitations that
+	 * are hard to guess at atomic check time.
+	 * Drivers should only set underrun to 1 (using
+	 * drm_atomic_helper_underrun_set()) when an underrun interrupt occurs
+	 * and clear if from their atomic commit hook (using
+	 * drm_atomic_helper_underrun_{stop,start}()).
+	 */
+	atomic_t underrun;
 };
 
 #endif
diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
index 8d540cdae57d..28b83e295fd2 100644
--- a/include/drm/drm_mode_config.h
+++ b/include/drm/drm_mode_config.h
@@ -325,6 +325,32 @@ struct drm_mode_config_funcs {
 	 * &drm_private_state and &drm_private_obj.
 	 */
 	void (*atomic_state_free)(struct drm_atomic_state *state);
+
+	/**
+	 * @atomic_mask_underrun:
+	 *
+	 * This is an optional hook called when the core needs to disable/mask
+	 * the underrun interrupt. It's used by drm_atomic_helper_underrun_set()
+	 * and drm_atomic_helper_underrun_stop().
+	 */
+	void (*atomic_mask_underrun)(struct drm_device *dev);
+
+	/**
+	 * @atomic_clear_underrun:
+	 *
+	 * This is an optional hook called when the core needs to clear pending
+	 * underrun events. It's used by drm_atomic_helper_underrun_start() to
+	 * before unmask the underrun interrupt.
+	 */
+	void (*atomic_clear_underrun)(struct drm_device *dev);
+
+	/**
+	 * @atomic_unmask_underrun:
+	 *
+	 * This is an optional hook called when the core needs to unmask
+	 * underrun interrupts. It's used by drm_atomic_helper_underrun_start().
+	 */
+	void (*atomic_unmask_underrun)(struct drm_device *dev);
 };
 
 /**
-- 
2.17.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/3] drm/vc4: Report underrun errors
  2018-10-25 12:45 [PATCH v2 0/3] drm/vc4: Add a load tracker Boris Brezillon
  2018-10-25 12:45 ` [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors Boris Brezillon
@ 2018-10-25 12:45 ` Boris Brezillon
  2018-10-25 12:45 ` [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors Boris Brezillon
  2018-11-28  9:16 ` [PATCH v2 0/3] drm/vc4: Add a load tracker Paul Kocialkowski
  3 siblings, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2018-10-25 12:45 UTC (permalink / raw)
  To: Eric Anholt, Daniel Vetter; +Cc: Boris Brezillon, dri-devel

The DRM framework provides a generic way to report underrun errors.
Let's implement the necessary hooks to support it in the VC4 driver.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
---
Changes in v2:
- New patch
---
 drivers/gpu/drm/vc4/vc4_drv.h  |  3 ++
 drivers/gpu/drm/vc4/vc4_hvs.c  | 71 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/vc4/vc4_kms.c  |  7 ++++
 drivers/gpu/drm/vc4/vc4_regs.h | 46 ++++++----------------
 4 files changed, 92 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
index bd6ef1f31822..a6c67c65ffbc 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.h
+++ b/drivers/gpu/drm/vc4/vc4_drv.h
@@ -763,6 +763,9 @@ void vc4_irq_reset(struct drm_device *dev);
 extern struct platform_driver vc4_hvs_driver;
 void vc4_hvs_dump_state(struct drm_device *dev);
 int vc4_hvs_debugfs_regs(struct seq_file *m, void *unused);
+void vc4_hvs_unmask_underrun(struct drm_device *dev);
+void vc4_hvs_mask_underrun(struct drm_device *dev);
+void vc4_hvs_clear_underrun(struct drm_device *dev);
 
 /* vc4_kms.c */
 int vc4_kms_load(struct drm_device *dev);
diff --git a/drivers/gpu/drm/vc4/vc4_hvs.c b/drivers/gpu/drm/vc4/vc4_hvs.c
index 5d8c749c9749..12ff95ff1d29 100644
--- a/drivers/gpu/drm/vc4/vc4_hvs.c
+++ b/drivers/gpu/drm/vc4/vc4_hvs.c
@@ -22,6 +22,7 @@
  * each CRTC.
  */
 
+#include <drm/drm_atomic_helper.h>
 #include <linux/component.h>
 #include "vc4_drv.h"
 #include "vc4_regs.h"
@@ -166,6 +167,58 @@ static int vc4_hvs_upload_linear_kernel(struct vc4_hvs *hvs,
 	return 0;
 }
 
+static irqreturn_t vc4_hvs_irq_handler(int irq, void *data)
+{
+	struct drm_device *dev = data;
+	struct vc4_dev *vc4 = to_vc4_dev(dev);
+	u32 status;
+
+	status = HVS_READ(SCALER_DISPSTAT);
+
+	if (status &
+	    (SCALER_DISPSTAT_EUFLOW(0) | SCALER_DISPSTAT_EUFLOW(1) |
+	     SCALER_DISPSTAT_EUFLOW(2)))
+		drm_atomic_helper_underrun_set(dev);
+
+	HVS_WRITE(SCALER_DISPSTAT, status);
+
+	return status ? IRQ_HANDLED : IRQ_NONE;
+}
+
+void vc4_hvs_mask_underrun(struct drm_device *dev)
+{
+	struct vc4_dev *vc4 = to_vc4_dev(dev);
+	u32 dispctrl = HVS_READ(SCALER_DISPCTRL);
+
+	dispctrl &= ~(SCALER_DISPCTRL_DSPEISLUR(0) |
+		      SCALER_DISPCTRL_DSPEISLUR(1) |
+		      SCALER_DISPCTRL_DSPEISLUR(2));
+
+	HVS_WRITE(SCALER_DISPCTRL, dispctrl);
+}
+
+void vc4_hvs_clear_underrun(struct drm_device *dev)
+{
+	struct vc4_dev *vc4 = to_vc4_dev(dev);
+
+	HVS_WRITE(SCALER_DISPSTAT,
+		  SCALER_DISPSTAT_EUFLOW(0) |
+		  SCALER_DISPSTAT_EUFLOW(1) |
+		  SCALER_DISPSTAT_EUFLOW(2));
+}
+
+void vc4_hvs_unmask_underrun(struct drm_device *dev)
+{
+	struct vc4_dev *vc4 = to_vc4_dev(dev);
+	u32 dispctrl = HVS_READ(SCALER_DISPCTRL);
+
+	dispctrl |= SCALER_DISPCTRL_DSPEISLUR(0) |
+		    SCALER_DISPCTRL_DSPEISLUR(1) |
+		    SCALER_DISPCTRL_DSPEISLUR(2);
+
+	HVS_WRITE(SCALER_DISPCTRL, dispctrl);
+}
+
 static int vc4_hvs_bind(struct device *dev, struct device *master, void *data)
 {
 	struct platform_device *pdev = to_platform_device(dev);
@@ -219,15 +272,33 @@ static int vc4_hvs_bind(struct device *dev, struct device *master, void *data)
 	dispctrl = HVS_READ(SCALER_DISPCTRL);
 
 	dispctrl |= SCALER_DISPCTRL_ENABLE;
+	dispctrl |= SCALER_DISPCTRL_DSPEISLUR(0) | SCALER_DISPCTRL_DISPEIRQ(0) |
+		    SCALER_DISPCTRL_DSPEISLUR(1) | SCALER_DISPCTRL_DISPEIRQ(1) |
+		    SCALER_DISPCTRL_DSPEISLUR(2) | SCALER_DISPCTRL_DISPEIRQ(2);
 
 	/* Set DSP3 (PV1) to use HVS channel 2, which would otherwise
 	 * be unused.
 	 */
 	dispctrl &= ~SCALER_DISPCTRL_DSP3_MUX_MASK;
+	dispctrl &= ~(SCALER_DISPCTRL_DMAEIRQ |
+		      SCALER_DISPCTRL_SLVWREIRQ |
+		      SCALER_DISPCTRL_SLVRDEIRQ |
+		      SCALER_DISPCTRL_DSPEIEOF(0) |
+		      SCALER_DISPCTRL_DSPEIEOF(1) |
+		      SCALER_DISPCTRL_DSPEIEOF(2) |
+		      SCALER_DISPCTRL_DSPEIEOLN(0) |
+		      SCALER_DISPCTRL_DSPEIEOLN(1) |
+		      SCALER_DISPCTRL_DSPEIEOLN(2) |
+		      SCALER_DISPCTRL_SCLEIRQ);
 	dispctrl |= VC4_SET_FIELD(2, SCALER_DISPCTRL_DSP3_MUX);
 
 	HVS_WRITE(SCALER_DISPCTRL, dispctrl);
 
+	ret = devm_request_irq(dev, platform_get_irq(pdev, 0),
+			       vc4_hvs_irq_handler, 0, "vc4 hvs", drm);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
index 5c9823ccb3f8..8e0183b1e8bb 100644
--- a/drivers/gpu/drm/vc4/vc4_kms.c
+++ b/drivers/gpu/drm/vc4/vc4_kms.c
@@ -139,6 +139,8 @@ vc4_atomic_complete_commit(struct drm_atomic_state *state)
 	struct drm_device *dev = state->dev;
 	struct vc4_dev *vc4 = to_vc4_dev(dev);
 
+	drm_atomic_helper_underrun_stop(dev);
+
 	drm_atomic_helper_wait_for_fences(dev, state, false);
 
 	drm_atomic_helper_wait_for_dependencies(state);
@@ -155,6 +157,8 @@ vc4_atomic_complete_commit(struct drm_atomic_state *state)
 
 	drm_atomic_helper_commit_hw_done(state);
 
+	drm_atomic_helper_underrun_start(dev);
+
 	drm_atomic_helper_wait_for_flip_done(dev, state);
 
 	drm_atomic_helper_cleanup_planes(dev, state);
@@ -395,6 +399,9 @@ static const struct drm_mode_config_funcs vc4_mode_funcs = {
 	.atomic_check = vc4_atomic_check,
 	.atomic_commit = vc4_atomic_commit,
 	.fb_create = vc4_fb_create,
+	.atomic_mask_underrun = vc4_hvs_mask_underrun,
+	.atomic_unmask_underrun = vc4_hvs_unmask_underrun,
+	.atomic_clear_underrun = vc4_hvs_clear_underrun,
 };
 
 int vc4_kms_load(struct drm_device *dev)
diff --git a/drivers/gpu/drm/vc4/vc4_regs.h b/drivers/gpu/drm/vc4/vc4_regs.h
index 931088014272..7067c62b09e8 100644
--- a/drivers/gpu/drm/vc4/vc4_regs.h
+++ b/drivers/gpu/drm/vc4/vc4_regs.h
@@ -215,8 +215,6 @@
 #define SCALER_DISPCTRL                         0x00000000
 /* Global register for clock gating the HVS */
 # define SCALER_DISPCTRL_ENABLE			BIT(31)
-# define SCALER_DISPCTRL_DSP2EISLUR		BIT(15)
-# define SCALER_DISPCTRL_DSP1EISLUR		BIT(14)
 # define SCALER_DISPCTRL_DSP3_MUX_MASK		VC4_MASK(19, 18)
 # define SCALER_DISPCTRL_DSP3_MUX_SHIFT		18
 
@@ -224,45 +222,25 @@
  * SCALER_DISPSTAT_IRQDISP0.  Note that short frame contributions are
  * always enabled.
  */
-# define SCALER_DISPCTRL_DSP0EISLUR		BIT(13)
-# define SCALER_DISPCTRL_DSP2EIEOLN		BIT(12)
-# define SCALER_DISPCTRL_DSP2EIEOF		BIT(11)
-# define SCALER_DISPCTRL_DSP1EIEOLN		BIT(10)
-# define SCALER_DISPCTRL_DSP1EIEOF		BIT(9)
+# define SCALER_DISPCTRL_DSPEISLUR(x)		BIT(13 + (x))
 /* Enables Display 0 end-of-line-N contribution to
  * SCALER_DISPSTAT_IRQDISP0
  */
-# define SCALER_DISPCTRL_DSP0EIEOLN		BIT(8)
+# define SCALER_DISPCTRL_DSPEIEOLN(x)		BIT(8 + ((x) * 2))
 /* Enables Display 0 EOF contribution to SCALER_DISPSTAT_IRQDISP0 */
-# define SCALER_DISPCTRL_DSP0EIEOF		BIT(7)
+# define SCALER_DISPCTRL_DSPEIEOF(x)		BIT(7 + ((x) * 2))
 
 # define SCALER_DISPCTRL_SLVRDEIRQ		BIT(6)
 # define SCALER_DISPCTRL_SLVWREIRQ		BIT(5)
 # define SCALER_DISPCTRL_DMAEIRQ		BIT(4)
-# define SCALER_DISPCTRL_DISP2EIRQ		BIT(3)
-# define SCALER_DISPCTRL_DISP1EIRQ		BIT(2)
 /* Enables interrupt generation on the enabled EOF/EOLN/EISLUR
  * bits and short frames..
  */
-# define SCALER_DISPCTRL_DISP0EIRQ		BIT(1)
+# define SCALER_DISPCTRL_DISPEIRQ(x)		BIT(1 + (x))
 /* Enables interrupt generation on scaler profiler interrupt. */
 # define SCALER_DISPCTRL_SCLEIRQ		BIT(0)
 
 #define SCALER_DISPSTAT                         0x00000004
-# define SCALER_DISPSTAT_COBLOW2		BIT(29)
-# define SCALER_DISPSTAT_EOLN2			BIT(28)
-# define SCALER_DISPSTAT_ESFRAME2		BIT(27)
-# define SCALER_DISPSTAT_ESLINE2		BIT(26)
-# define SCALER_DISPSTAT_EUFLOW2		BIT(25)
-# define SCALER_DISPSTAT_EOF2			BIT(24)
-
-# define SCALER_DISPSTAT_COBLOW1		BIT(21)
-# define SCALER_DISPSTAT_EOLN1			BIT(20)
-# define SCALER_DISPSTAT_ESFRAME1		BIT(19)
-# define SCALER_DISPSTAT_ESLINE1		BIT(18)
-# define SCALER_DISPSTAT_EUFLOW1		BIT(17)
-# define SCALER_DISPSTAT_EOF1			BIT(16)
-
 # define SCALER_DISPSTAT_RESP_MASK		VC4_MASK(15, 14)
 # define SCALER_DISPSTAT_RESP_SHIFT		14
 # define SCALER_DISPSTAT_RESP_OKAY		0
@@ -270,23 +248,23 @@
 # define SCALER_DISPSTAT_RESP_SLVERR		2
 # define SCALER_DISPSTAT_RESP_DECERR		3
 
-# define SCALER_DISPSTAT_COBLOW0		BIT(13)
+# define SCALER_DISPSTAT_COBLOW(x)		BIT(5 + (((x) + 1) * 8))
 /* Set when the DISPEOLN line is done compositing. */
-# define SCALER_DISPSTAT_EOLN0			BIT(12)
+# define SCALER_DISPSTAT_EOLN(x)		BIT(4 + (((x) + 1) * 8))
 /* Set when VSTART is seen but there are still pixels in the current
  * output line.
  */
-# define SCALER_DISPSTAT_ESFRAME0		BIT(11)
+# define SCALER_DISPSTAT_ESFRAME(x)		BIT(3 + (((x) + 1) * 8))
 /* Set when HSTART is seen but there are still pixels in the current
  * output line.
  */
-# define SCALER_DISPSTAT_ESLINE0		BIT(10)
+# define SCALER_DISPSTAT_ESLINE(x)		BIT(2 + (((x) + 1) * 8))
 /* Set when the the downstream tries to read from the display FIFO
  * while it's empty.
  */
-# define SCALER_DISPSTAT_EUFLOW0		BIT(9)
+# define SCALER_DISPSTAT_EUFLOW(x)		BIT(1 + (((x) + 1) * 8))
 /* Set when the display mode changes from RUN to EOF */
-# define SCALER_DISPSTAT_EOF0			BIT(8)
+# define SCALER_DISPSTAT_EOF(x)			BIT(((x) + 1) * 8)
 
 /* Set on AXI invalid DMA ID error. */
 # define SCALER_DISPSTAT_DMA_ERROR		BIT(7)
@@ -298,12 +276,10 @@
  * SCALER_DISPSTAT_RESP_ERROR is not SCALER_DISPSTAT_RESP_OKAY.
  */
 # define SCALER_DISPSTAT_IRQDMA			BIT(4)
-# define SCALER_DISPSTAT_IRQDISP2		BIT(3)
-# define SCALER_DISPSTAT_IRQDISP1		BIT(2)
 /* Set when any of the EOF/EOLN/ESFRAME/ESLINE bits are set and their
  * corresponding interrupt bit is enabled in DISPCTRL.
  */
-# define SCALER_DISPSTAT_IRQDISP0		BIT(1)
+# define SCALER_DISPSTAT_IRQDISP(x)		BIT(1 + (x))
 /* On read, the profiler interrupt.  On write, clear *all* interrupt bits. */
 # define SCALER_DISPSTAT_IRQSCL			BIT(0)
 
-- 
2.17.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors
  2018-10-25 12:45 [PATCH v2 0/3] drm/vc4: Add a load tracker Boris Brezillon
  2018-10-25 12:45 ` [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors Boris Brezillon
  2018-10-25 12:45 ` [PATCH v2 2/3] drm/vc4: Report " Boris Brezillon
@ 2018-10-25 12:45 ` Boris Brezillon
  2018-10-30 23:12   ` Eric Anholt
  2018-11-28  9:16 ` [PATCH v2 0/3] drm/vc4: Add a load tracker Paul Kocialkowski
  3 siblings, 1 reply; 18+ messages in thread
From: Boris Brezillon @ 2018-10-25 12:45 UTC (permalink / raw)
  To: Eric Anholt, Daniel Vetter; +Cc: Boris Brezillon, dri-devel

The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
meet the requested framerate. The problem is, the HVS and memory bus
bandwidths are limited, and if we don't take these limitations into
account we might end up with HVS underflow errors.

This patch is trying to model the per-plane HVS and memory bus bandwidth
consumption and take a decision at atomic_check() time whether the
estimated load will fit in the HVS and membus budget.

Note that we take an extra margin on the memory bus consumption to let
the system run smoothly when other blocks are doing heavy use of the
memory bus. Same goes for the HVS limit, except the margin is smaller in
this case, since the HVS is not used by external components.

Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
---
Changes in v2:
- Remove an unused var in vc4_load_tracker_atomic_check()
---
 drivers/gpu/drm/vc4/vc4_drv.c   |   1 +
 drivers/gpu/drm/vc4/vc4_drv.h   |  11 ++++
 drivers/gpu/drm/vc4/vc4_kms.c   | 103 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/vc4/vc4_plane.c |  60 +++++++++++++++++++
 4 files changed, 174 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c
index f6f5cd80c04d..7195a0bcceb3 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.c
+++ b/drivers/gpu/drm/vc4/vc4_drv.c
@@ -313,6 +313,7 @@ static void vc4_drm_unbind(struct device *dev)
 
 	drm_mode_config_cleanup(drm);
 
+	drm_atomic_private_obj_fini(&vc4->load_tracker);
 	drm_atomic_private_obj_fini(&vc4->ctm_manager);
 
 	drm_dev_put(drm);
diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
index a6c67c65ffbc..11369da944b6 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.h
+++ b/drivers/gpu/drm/vc4/vc4_drv.h
@@ -200,6 +200,7 @@ struct vc4_dev {
 
 	struct drm_modeset_lock ctm_state_lock;
 	struct drm_private_obj ctm_manager;
+	struct drm_private_obj load_tracker;
 };
 
 static inline struct vc4_dev *
@@ -369,6 +370,16 @@ struct vc4_plane_state {
 	 * to enable background color fill.
 	 */
 	bool needs_bg_fill;
+
+	/* Load of this plane on the HVS block. The load is expressed in HVS
+	 * cycles/sec.
+	 */
+	u64 hvs_load;
+
+	/* Memory bandwidth needed for this plane. This is expressed in
+	 * bytes/sec.
+	 */
+	u64 membus_load;
 };
 
 static inline struct vc4_plane_state *
diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
index 8e0183b1e8bb..b905a19c1470 100644
--- a/drivers/gpu/drm/vc4/vc4_kms.c
+++ b/drivers/gpu/drm/vc4/vc4_kms.c
@@ -34,6 +34,18 @@ static struct vc4_ctm_state *to_vc4_ctm_state(struct drm_private_state *priv)
 	return container_of(priv, struct vc4_ctm_state, base);
 }
 
+struct vc4_load_tracker_state {
+	struct drm_private_state base;
+	u64 hvs_load;
+	u64 membus_load;
+};
+
+static struct vc4_load_tracker_state *
+to_vc4_load_tracker_state(struct drm_private_state *priv)
+{
+	return container_of(priv, struct vc4_load_tracker_state, base);
+}
+
 static struct vc4_ctm_state *vc4_get_ctm_state(struct drm_atomic_state *state,
 					       struct drm_private_obj *manager)
 {
@@ -383,6 +395,81 @@ vc4_ctm_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
 	return 0;
 }
 
+static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
+{
+	struct drm_plane_state *old_plane_state, *new_plane_state;
+	struct vc4_dev *vc4 = to_vc4_dev(state->dev);
+	struct vc4_load_tracker_state *load_state;
+	struct drm_private_state *priv_state;
+	struct drm_plane *plane;
+	int i;
+
+	priv_state = drm_atomic_get_private_obj_state(state,
+						      &vc4->load_tracker);
+	if (IS_ERR(priv_state))
+		return PTR_ERR(priv_state);
+
+	load_state = to_vc4_load_tracker_state(priv_state);
+	for_each_oldnew_plane_in_state(state, plane, old_plane_state,
+				       new_plane_state, i) {
+		struct vc4_plane_state *vc4_plane_state;
+
+		if (old_plane_state->fb && old_plane_state->crtc) {
+			vc4_plane_state = to_vc4_plane_state(old_plane_state);
+			load_state->membus_load -= vc4_plane_state->membus_load;
+			load_state->hvs_load -= vc4_plane_state->hvs_load;
+		}
+
+		if (new_plane_state->fb && new_plane_state->crtc) {
+			vc4_plane_state = to_vc4_plane_state(new_plane_state);
+			load_state->membus_load += vc4_plane_state->membus_load;
+			load_state->hvs_load += vc4_plane_state->hvs_load;
+		}
+	}
+
+	/* The abolsute limit is 2Gbyte/sec, but let's take a margin to let
+	 * the system work when other blocks are accessing the memory.
+	 */
+	if (load_state->membus_load > SZ_1G + SZ_512M)
+		return -ENOSPC;
+
+	/* HVS clock is supposed to run @ 250Mhz, let's take a margin and
+	 * consider the maximum number of cycles is 240M.
+	 */
+	if (load_state->hvs_load > 240000000ULL)
+		return -ENOSPC;
+
+	return 0;
+}
+
+static struct drm_private_state *
+vc4_load_tracker_duplicate_state(struct drm_private_obj *obj)
+{
+	struct vc4_load_tracker_state *state;
+
+	state = kmemdup(obj->state, sizeof(*state), GFP_KERNEL);
+	if (!state)
+		return NULL;
+
+	__drm_atomic_helper_private_obj_duplicate_state(obj, &state->base);
+
+	return &state->base;
+}
+
+static void vc4_load_tracker_destroy_state(struct drm_private_obj *obj,
+					   struct drm_private_state *state)
+{
+	struct vc4_load_tracker_state *load_state;
+
+	load_state = to_vc4_load_tracker_state(state);
+	kfree(load_state);
+}
+
+static const struct drm_private_state_funcs vc4_load_tracker_state_funcs = {
+	.atomic_duplicate_state = vc4_load_tracker_duplicate_state,
+	.atomic_destroy_state = vc4_load_tracker_destroy_state,
+};
+
 static int
 vc4_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
 {
@@ -392,7 +479,11 @@ vc4_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
 	if (ret < 0)
 		return ret;
 
-	return drm_atomic_helper_check(dev, state);
+	ret = drm_atomic_helper_check(dev, state);
+	if (ret)
+		return ret;
+
+	return vc4_load_tracker_atomic_check(state);
 }
 
 static const struct drm_mode_config_funcs vc4_mode_funcs = {
@@ -408,6 +499,7 @@ int vc4_kms_load(struct drm_device *dev)
 {
 	struct vc4_dev *vc4 = to_vc4_dev(dev);
 	struct vc4_ctm_state *ctm_state;
+	struct vc4_load_tracker_state *load_state;
 	int ret;
 
 	sema_init(&vc4->async_modeset, 1);
@@ -437,6 +529,15 @@ int vc4_kms_load(struct drm_device *dev)
 	drm_atomic_private_obj_init(dev, &vc4->ctm_manager, &ctm_state->base,
 				    &vc4_ctm_state_funcs);
 
+	load_state = kzalloc(sizeof(*load_state), GFP_KERNEL);
+	if (!load_state) {
+		drm_atomic_private_obj_fini(&vc4->ctm_manager);
+		return -ENOMEM;
+	}
+
+	drm_atomic_private_obj_init(dev, &vc4->load_tracker, &load_state->base,
+				    &vc4_load_tracker_state_funcs);
+
 	drm_mode_config_reset(dev);
 
 	drm_kms_helper_poll_init(dev);
diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c
index 0f294d612769..5e9687406517 100644
--- a/drivers/gpu/drm/vc4/vc4_plane.c
+++ b/drivers/gpu/drm/vc4/vc4_plane.c
@@ -450,6 +450,64 @@ static void vc4_write_scaling_parameters(struct drm_plane_state *state,
 	}
 }
 
+static void vc4_plane_calc_load(struct drm_plane_state *state)
+{
+	unsigned int hvs_load_shift, vrefresh, i;
+	struct drm_framebuffer *fb = state->fb;
+	struct vc4_plane_state *vc4_state;
+	struct drm_crtc_state *crtc_state;
+	unsigned int vscale_factor;
+
+	vc4_state = to_vc4_plane_state(state);
+	crtc_state = drm_atomic_get_existing_crtc_state(state->state,
+							state->crtc);
+	vrefresh = drm_mode_vrefresh(&crtc_state->adjusted_mode);
+
+	/* The HVS is able to process 2 pixels/cycle when scaling the source,
+	 * 4 pixels/cycle otherwise.
+	 * Alpha blending step seems to be pipelined and it's always operating
+	 * at 4 pixels/cycle, so the limiting aspect here seems to be the
+	 * scaler block.
+	 * HVS load is expressed in clk-cycles/sec (AKA Hz).
+	 */
+	if (vc4_state->x_scaling[0] != VC4_SCALING_NONE ||
+	    vc4_state->x_scaling[1] != VC4_SCALING_NONE ||
+	    vc4_state->y_scaling[0] != VC4_SCALING_NONE ||
+	    vc4_state->y_scaling[1] != VC4_SCALING_NONE)
+		hvs_load_shift = 1;
+	else
+		hvs_load_shift = 2;
+
+	vc4_state->membus_load = 0;
+	vc4_state->hvs_load = 0;
+	for (i = 0; i < fb->format->num_planes; i++) {
+		unsigned long pixels_load;
+
+		/* Even if the bandwidth/plane required for a single frame is
+		 *
+		 * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh
+		 *
+		 * when downscaling, we have to read more pixels per line in
+		 * the time frame reserved for a single line, so the bandwidth
+		 * demand can be punctually higher. To account for that, we
+		 * calculate the down-scaling factor and multiply the plane
+		 * load by this number. We're likely over-estimating the read
+		 * demand, but that's better than under-estimating it.
+		 */
+		vscale_factor = DIV_ROUND_UP(vc4_state->src_h[i],
+					     vc4_state->crtc_h);
+		pixels_load = vc4_state->src_w[i] * vc4_state->src_h[i] *
+			      vscale_factor;
+
+		vc4_state->membus_load += fb->format->cpp[i] * pixels_load;
+		vc4_state->hvs_load += pixels_load;
+	}
+
+	vc4_state->hvs_load *= vrefresh;
+	vc4_state->hvs_load >>= hvs_load_shift;
+	vc4_state->membus_load *= vrefresh;
+}
+
 /* Writes out a full display list for an active plane to the plane's
  * private dlist state.
  */
@@ -769,6 +827,8 @@ static int vc4_plane_mode_set(struct drm_plane *plane,
 	vc4_state->needs_bg_fill = fb->format->has_alpha || !covers_screen ||
 				   state->alpha != DRM_BLEND_ALPHA_OPAQUE;
 
+	vc4_plane_calc_load(state);
+
 	return 0;
 }
 
-- 
2.17.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors
  2018-10-25 12:45 ` [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors Boris Brezillon
@ 2018-10-26 10:33   ` Daniel Vetter
  2018-10-26 12:36     ` Boris Brezillon
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Vetter @ 2018-10-26 10:33 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: dri-devel

On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:
> Display controllers usually provide a lot features like overlay planes,
> hardware scalers, hardware rotations, ...
> Most of the time those features work just fine when enabled
> independently, but activating all of them at the same time tend to
> show other limitations like the limited memory bus or scaler
> bandwidth.
> 
> This sort of bandwidth estimation is hard to get right, and we
> sometimes fail to predict that a specific workload will not pass,
> which will most likely result in underrun errors somewhere in the
> display pipeline (most of the time at the CRTC level).
> 
> This path aims at making underrun error tracking generic and exposing
> underrun errors to userspace through a debugfs file.
> 
> This way, CI tools like IGT, can try to enable a bunch of features and
> if such underrun errors are reported after the settings have been
> accepted and applied by the KMS infrastructure. When that happens it
> means the load tracking algorithm needs some tweaking/fixing.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> ---
> Changes in v2:
> - New patch

Hm, not what I thought off. This is definitely not going to be reusable
for i915, the only other driver with underrun reporting, since we have
tons of underrun sources (not just one). And we need to mask/unmask them
individually.

It's also midlayer mistake, since you're stuff your callbacks into the
drm_mode_config_funcs core structure (which should only be used for uapi
interfaces, not helper stuff).

What I had in mind is just a simple function which spews into dmesg (so
that i915 could use it) and increments the underrun counter in debugfs.
Everything else needs to be handled in drivers I think. E.g.

drm_mode_config_report_underrun(struct drm_device *dev,
				const char *source);

and then rolling it out for i915 too (otherwise there's really no point in
the common infrastructure).

Cheers, Daniel

> ---
>  drivers/gpu/drm/drm_atomic.c        | 12 +++++++
>  drivers/gpu/drm/drm_atomic_helper.c |  4 +++
>  include/drm/drm_atomic_helper.h     | 53 +++++++++++++++++++++++++++++
>  include/drm/drm_device.h            | 15 ++++++++
>  include/drm/drm_mode_config.h       | 26 ++++++++++++++
>  5 files changed, 110 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
> index 0e27d45feba9..350348961ab3 100644
> --- a/drivers/gpu/drm/drm_atomic.c
> +++ b/drivers/gpu/drm/drm_atomic.c
> @@ -1179,9 +1179,21 @@ static int drm_state_info(struct seq_file *m, void *data)
>  	return 0;
>  }
>  
> +static int drm_underrun_info(struct seq_file *m, void *data)
> +{
> +	struct drm_info_node *node = m->private;
> +	struct drm_device *dev = node->minor->dev;
> +	struct drm_printer p = drm_seq_file_printer(m);
> +
> +	drm_printf(&p, "%c\n", atomic_read(&dev->underrun) ? 'Y' : 'N');
> +
> +	return 0;
> +}
> +
>  /* any use in debugfs files to dump individual planes/crtc/etc? */
>  static const struct drm_info_list drm_atomic_debugfs_list[] = {
>  	{"state", drm_state_info, 0},
> +	{"underrun", drm_underrun_info, 0 },
>  };
>  
>  int drm_atomic_debugfs_init(struct drm_minor *minor)
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 6f66777dca4b..9956c2928c5b 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1456,6 +1456,8 @@ void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
>  {
>  	struct drm_device *dev = old_state->dev;
>  
> +	drm_atomic_helper_underrun_stop(dev);
> +
>  	drm_atomic_helper_commit_modeset_disables(dev, old_state);
>  
>  	drm_atomic_helper_commit_planes(dev, old_state, 0);
> @@ -1466,6 +1468,8 @@ void drm_atomic_helper_commit_tail(struct drm_atomic_state *old_state)
>  
>  	drm_atomic_helper_commit_hw_done(old_state);
>  
> +	drm_atomic_helper_underrun_start(dev);
> +
>  	drm_atomic_helper_wait_for_vblanks(dev, old_state);
>  
>  	drm_atomic_helper_cleanup_planes(dev, old_state);
> diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
> index 25ca0097563e..f712144dd26d 100644
> --- a/include/drm/drm_atomic_helper.h
> +++ b/include/drm/drm_atomic_helper.h
> @@ -221,4 +221,57 @@ drm_atomic_plane_disabling(struct drm_plane_state *old_plane_state,
>  	return old_plane_state->crtc && !new_plane_state->crtc;
>  }
>  
> +/**
> + * drm_atomic_helper_underrun_set() - Set the underrun bit
> + * @dev: device the underrun happened on
> + *
> + * This function should be called when an underrun happens after an update
> + * has been committed to the device.
> + * Underrun interrupts are masked to avoid flooding the CPU with useless
> + * interrupts.
> + */
> +static inline void drm_atomic_helper_underrun_set(struct drm_device *dev)
> +{
> +	atomic_set(&dev->underrun, 1);
> +
> +	if (dev->mode_config.funcs->atomic_mask_underrun)
> +		dev->mode_config.funcs->atomic_mask_underrun(dev);
> +}
> +
> +/**
> + * drm_atomic_helper_underrun_stop() - Stop generating underrun events
> + * @dev: device we want to stop underrun on
> + *
> + * This function should be called in the atomic commit path, just before
> + * committing changes to the hardware.
> + * The underrun interrupt is simply masked prior to getting underrun events
> + * while we are applying the new config.
> + */
> +static inline void drm_atomic_helper_underrun_stop(struct drm_device *dev)
> +{
> +	if (dev->mode_config.funcs->atomic_mask_underrun)
> +		dev->mode_config.funcs->atomic_mask_underrun(dev);
> +}
> +
> +/**
> + * drm_atomic_helper_underrun_start() - Start listening to underrun events
> + * @dev: device we want to start underrun on
> + *
> + * This function should be called in the atomic commit path, just after
> + * the changes have been committed to the hardware.
> + * Any pending underrun events are cleared before unmasking the interrupt to
> + * guarantee that further underrun events are really caused by the new
> + * configuration.
> + */
> +static inline void drm_atomic_helper_underrun_start(struct drm_device *dev)
> +{
> +	atomic_set(&dev->underrun, 0);
> +
> +	if (dev->mode_config.funcs->atomic_clear_underrun)
> +		dev->mode_config.funcs->atomic_clear_underrun(dev);
> +
> +	if (dev->mode_config.funcs->atomic_unmask_underrun)
> +		dev->mode_config.funcs->atomic_unmask_underrun(dev);
> +}
> +
>  #endif /* DRM_ATOMIC_HELPER_H_ */
> diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
> index 42411b3ea0c8..59c63e03a776 100644
> --- a/include/drm/drm_device.h
> +++ b/include/drm/drm_device.h
> @@ -231,6 +231,21 @@ struct drm_device {
>  	 * Set by drm_fb_helper_init() and cleared by drm_fb_helper_fini().
>  	 */
>  	struct drm_fb_helper *fb_helper;
> +
> +	/**
> +	 * @underrun:
> +	 *
> +	 * Set to 1 when an underrun error happened since the last atomic
> +	 * commit, 0 otherwise.
> +	 * This is particularly useful to detect when a specific modeset is too
> +	 * demanding in term of memory bandwidth or other kind limitations that
> +	 * are hard to guess at atomic check time.
> +	 * Drivers should only set underrun to 1 (using
> +	 * drm_atomic_helper_underrun_set()) when an underrun interrupt occurs
> +	 * and clear if from their atomic commit hook (using
> +	 * drm_atomic_helper_underrun_{stop,start}()).
> +	 */
> +	atomic_t underrun;
>  };
>  
>  #endif
> diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
> index 8d540cdae57d..28b83e295fd2 100644
> --- a/include/drm/drm_mode_config.h
> +++ b/include/drm/drm_mode_config.h
> @@ -325,6 +325,32 @@ struct drm_mode_config_funcs {
>  	 * &drm_private_state and &drm_private_obj.
>  	 */
>  	void (*atomic_state_free)(struct drm_atomic_state *state);
> +
> +	/**
> +	 * @atomic_mask_underrun:
> +	 *
> +	 * This is an optional hook called when the core needs to disable/mask
> +	 * the underrun interrupt. It's used by drm_atomic_helper_underrun_set()
> +	 * and drm_atomic_helper_underrun_stop().
> +	 */
> +	void (*atomic_mask_underrun)(struct drm_device *dev);
> +
> +	/**
> +	 * @atomic_clear_underrun:
> +	 *
> +	 * This is an optional hook called when the core needs to clear pending
> +	 * underrun events. It's used by drm_atomic_helper_underrun_start() to
> +	 * before unmask the underrun interrupt.
> +	 */
> +	void (*atomic_clear_underrun)(struct drm_device *dev);
> +
> +	/**
> +	 * @atomic_unmask_underrun:
> +	 *
> +	 * This is an optional hook called when the core needs to unmask
> +	 * underrun interrupts. It's used by drm_atomic_helper_underrun_start().
> +	 */
> +	void (*atomic_unmask_underrun)(struct drm_device *dev);
>  };
>  
>  /**
> -- 
> 2.17.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors
  2018-10-26 10:33   ` Daniel Vetter
@ 2018-10-26 12:36     ` Boris Brezillon
  2018-10-26 13:36       ` Daniel Vetter
  0 siblings, 1 reply; 18+ messages in thread
From: Boris Brezillon @ 2018-10-26 12:36 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel

Hi Daniel,

On Fri, 26 Oct 2018 12:33:37 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:
> > Display controllers usually provide a lot features like overlay planes,
> > hardware scalers, hardware rotations, ...
> > Most of the time those features work just fine when enabled
> > independently, but activating all of them at the same time tend to
> > show other limitations like the limited memory bus or scaler
> > bandwidth.
> > 
> > This sort of bandwidth estimation is hard to get right, and we
> > sometimes fail to predict that a specific workload will not pass,
> > which will most likely result in underrun errors somewhere in the
> > display pipeline (most of the time at the CRTC level).
> > 
> > This path aims at making underrun error tracking generic and exposing
> > underrun errors to userspace through a debugfs file.
> > 
> > This way, CI tools like IGT, can try to enable a bunch of features and
> > if such underrun errors are reported after the settings have been
> > accepted and applied by the KMS infrastructure. When that happens it
> > means the load tracking algorithm needs some tweaking/fixing.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > ---
> > Changes in v2:
> > - New patch  
> 
> Hm, not what I thought off. This is definitely not going to be reusable
> for i915, the only other driver with underrun reporting, since we have
> tons of underrun sources (not just one). And we need to mask/unmask them
> individually.

I agree it's not perfect, but I did it this way so that hopefully we'll
be able to use drm_atomic_helper_commit_tail() at some point instead of
having our own implementation.

If there's a way to add pre/post_commit() hooks (maybe they already
exist?) at various levels of the display pipeline that get called even
when the associated state has *not* changed, then I might be able to
place my mask/unmask/clear operations there. I really need that to
properly unmask the underrun interrupt, otherwise underruns might not be
detected after the first modeset if nothing changed in sub-parts of the
pipeline.

> 
> It's also midlayer mistake, since you're stuff your callbacks into the
> drm_mode_config_funcs core structure (which should only be used for uapi
> interfaces, not helper stuff).

Okay.

> 
> What I had in mind is just a simple function which spews into dmesg (so
> that i915 could use it) and increments the underrun counter in debugfs.
> Everything else needs to be handled in drivers I think. E.g.
> 
> drm_mode_config_report_underrun(struct drm_device *dev,
> 				const char *source);
> 
> and then rolling it out for i915 too (otherwise there's really no point in
> the common infrastructure).

Modifying the i915 driver was a bit premature since I didn't have your
feedback on the general approach. And given your review, I'm glad I
didn't start this conversion :-P.

Anyway, I was not convinced having a generic infrastructure to report
underrun errors was needed in the first place. The fact that these
errors are very HW-specific, and that data attached to such events
might be useful to understand what actually happened makes me think we
don't really need this generic underrun reporting helper.

Also note that IGT tests are likely to be HW specific too, because the
workload needed to trigger underrun errors is likely to depend on the
platform you're running on. And if you already have to write your own
tests, I don't see clear benefit in exposing underrun errors
generically.

Anyway, this is just my opinion, and if you think we actually need the
drm_mode_config_report_underrun() function, I'll implement it.

Regards,

Boris
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors
  2018-10-26 12:36     ` Boris Brezillon
@ 2018-10-26 13:36       ` Daniel Vetter
  2018-10-26 14:13         ` Boris Brezillon
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Vetter @ 2018-10-26 13:36 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: dri-devel

On Fri, Oct 26, 2018 at 02:36:56PM +0200, Boris Brezillon wrote:
> Hi Daniel,
> 
> On Fri, 26 Oct 2018 12:33:37 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:
> > > Display controllers usually provide a lot features like overlay planes,
> > > hardware scalers, hardware rotations, ...
> > > Most of the time those features work just fine when enabled
> > > independently, but activating all of them at the same time tend to
> > > show other limitations like the limited memory bus or scaler
> > > bandwidth.
> > > 
> > > This sort of bandwidth estimation is hard to get right, and we
> > > sometimes fail to predict that a specific workload will not pass,
> > > which will most likely result in underrun errors somewhere in the
> > > display pipeline (most of the time at the CRTC level).
> > > 
> > > This path aims at making underrun error tracking generic and exposing
> > > underrun errors to userspace through a debugfs file.
> > > 
> > > This way, CI tools like IGT, can try to enable a bunch of features and
> > > if such underrun errors are reported after the settings have been
> > > accepted and applied by the KMS infrastructure. When that happens it
> > > means the load tracking algorithm needs some tweaking/fixing.
> > > 
> > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > ---
> > > Changes in v2:
> > > - New patch  
> > 
> > Hm, not what I thought off. This is definitely not going to be reusable
> > for i915, the only other driver with underrun reporting, since we have
> > tons of underrun sources (not just one). And we need to mask/unmask them
> > individually.
> 
> I agree it's not perfect, but I did it this way so that hopefully we'll
> be able to use drm_atomic_helper_commit_tail() at some point instead of
> having our own implementation.

I expect every non-trivial driver to need to overwrite commit_tail. That's
the entire point for having created it. Exceptions are just the very, very
simple ones without anything fancy.

> If there's a way to add pre/post_commit() hooks (maybe they already
> exist?) at various levels of the display pipeline that get called even
> when the associated state has *not* changed, then I might be able to
> place my mask/unmask/clear operations there. I really need that to
> properly unmask the underrun interrupt, otherwise underruns might not be
> detected after the first modeset if nothing changed in sub-parts of the
> pipeline.

Just overwrite commit_tail. That's why it exists.

> > It's also midlayer mistake, since you're stuff your callbacks into the
> > drm_mode_config_funcs core structure (which should only be used for uapi
> > interfaces, not helper stuff).
> 
> Okay.
> 
> > 
> > What I had in mind is just a simple function which spews into dmesg (so
> > that i915 could use it) and increments the underrun counter in debugfs.
> > Everything else needs to be handled in drivers I think. E.g.
> > 
> > drm_mode_config_report_underrun(struct drm_device *dev,
> > 				const char *source);
> > 
> > and then rolling it out for i915 too (otherwise there's really no point in
> > the common infrastructure).
> 
> Modifying the i915 driver was a bit premature since I didn't have your
> feedback on the general approach. And given your review, I'm glad I
> didn't start this conversion :-P.
> 
> Anyway, I was not convinced having a generic infrastructure to report
> underrun errors was needed in the first place. The fact that these
> errors are very HW-specific, and that data attached to such events
> might be useful to understand what actually happened makes me think we
> don't really need this generic underrun reporting helper.
> 
> Also note that IGT tests are likely to be HW specific too, because the
> workload needed to trigger underrun errors is likely to depend on the
> platform you're running on. And if you already have to write your own
> tests, I don't see clear benefit in exposing underrun errors
> generically.

See my other reply in the earlier thread: That's why we're just dumping
errors into dmesg, and catching any such thing in our runners. We do not
have any specific underrun tests, we treat the entire igt test suite of
kms tests as our underrun test suite.

Extending that with more nasty corner cases is obviously great, and will
benefit everyone. But I don't think you want an igt testcase that looks
specifically for underruns. We do use crc to hunt for specific issues, but
again that's portable between drivers.

> Anyway, this is just my opinion, and if you think we actually need the
> drm_mode_config_report_underrun() function, I'll implement it.

I do think having some standard in how these errors are reported would be
useful, even if it's just a special DRM_UNDERRUN_ERROR() macro.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors
  2018-10-26 13:36       ` Daniel Vetter
@ 2018-10-26 14:13         ` Boris Brezillon
  2018-10-26 14:23           ` Daniel Vetter
  0 siblings, 1 reply; 18+ messages in thread
From: Boris Brezillon @ 2018-10-26 14:13 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: dri-devel

On Fri, 26 Oct 2018 15:36:15 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Fri, Oct 26, 2018 at 02:36:56PM +0200, Boris Brezillon wrote:
> > Hi Daniel,
> > 
> > On Fri, 26 Oct 2018 12:33:37 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >   
> > > On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:  
> > > > Display controllers usually provide a lot features like overlay planes,
> > > > hardware scalers, hardware rotations, ...
> > > > Most of the time those features work just fine when enabled
> > > > independently, but activating all of them at the same time tend to
> > > > show other limitations like the limited memory bus or scaler
> > > > bandwidth.
> > > > 
> > > > This sort of bandwidth estimation is hard to get right, and we
> > > > sometimes fail to predict that a specific workload will not pass,
> > > > which will most likely result in underrun errors somewhere in the
> > > > display pipeline (most of the time at the CRTC level).
> > > > 
> > > > This path aims at making underrun error tracking generic and exposing
> > > > underrun errors to userspace through a debugfs file.
> > > > 
> > > > This way, CI tools like IGT, can try to enable a bunch of features and
> > > > if such underrun errors are reported after the settings have been
> > > > accepted and applied by the KMS infrastructure. When that happens it
> > > > means the load tracking algorithm needs some tweaking/fixing.
> > > > 
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > ---
> > > > Changes in v2:
> > > > - New patch    
> > > 
> > > Hm, not what I thought off. This is definitely not going to be reusable
> > > for i915, the only other driver with underrun reporting, since we have
> > > tons of underrun sources (not just one). And we need to mask/unmask them
> > > individually.  
> > 
> > I agree it's not perfect, but I did it this way so that hopefully we'll
> > be able to use drm_atomic_helper_commit_tail() at some point instead of
> > having our own implementation.  
> 
> I expect every non-trivial driver to need to overwrite commit_tail. That's
> the entire point for having created it. Exceptions are just the very, very
> simple ones without anything fancy.

Again, just my opinion, but doing that means some drivers are left
behind when new stuff are added to the generic helpers, and sometimes
those new features might be of interest to the drivers that are not
directly using those generic helpers, except they usually don't even
know about it until one comes at looking at the generic helper again
and notice the additions.
Maybe I'm not following the dri-devel ML close enough, and I'm probably
the one to blame here, but I see a real benefit in having the maximum
number of drivers using the generic helpers. To be honest, with a few
modifications, I think the VC4 driver could entirely rely on those
generic helpers.

> 
> > If there's a way to add pre/post_commit() hooks (maybe they already
> > exist?) at various levels of the display pipeline that get called even
> > when the associated state has *not* changed, then I might be able to
> > place my mask/unmask/clear operations there. I really need that to
> > properly unmask the underrun interrupt, otherwise underruns might not be
> > detected after the first modeset if nothing changed in sub-parts of the
> > pipeline.  
> 
> Just overwrite commit_tail. That's why it exists.

And that's what we do already, but I was hoping that someday we would
be able to switch to the generic helper...

> 
> > > It's also midlayer mistake, since you're stuff your callbacks into the
> > > drm_mode_config_funcs core structure (which should only be used for uapi
> > > interfaces, not helper stuff).  
> > 
> > Okay.
> >   
> > > 
> > > What I had in mind is just a simple function which spews into dmesg (so
> > > that i915 could use it) and increments the underrun counter in debugfs.
> > > Everything else needs to be handled in drivers I think. E.g.
> > > 
> > > drm_mode_config_report_underrun(struct drm_device *dev,
> > > 				const char *source);
> > > 
> > > and then rolling it out for i915 too (otherwise there's really no point in
> > > the common infrastructure).  
> > 
> > Modifying the i915 driver was a bit premature since I didn't have your
> > feedback on the general approach. And given your review, I'm glad I
> > didn't start this conversion :-P.
> > 
> > Anyway, I was not convinced having a generic infrastructure to report
> > underrun errors was needed in the first place. The fact that these
> > errors are very HW-specific, and that data attached to such events
> > might be useful to understand what actually happened makes me think we
> > don't really need this generic underrun reporting helper.
> > 
> > Also note that IGT tests are likely to be HW specific too, because the
> > workload needed to trigger underrun errors is likely to depend on the
> > platform you're running on. And if you already have to write your own
> > tests, I don't see clear benefit in exposing underrun errors
> > generically.  
> 
> See my other reply in the earlier thread: That's why we're just dumping
> errors into dmesg, and catching any such thing in our runners. We do not
> have any specific underrun tests, we treat the entire igt test suite of
> kms tests as our underrun test suite.
> 
> Extending that with more nasty corner cases is obviously great, and will
> benefit everyone. But I don't think you want an igt testcase that looks
> specifically for underruns.

Actually, that's exactly what I wanted to do: use config that were
known to generate underrun errors and make sure they don't after adding
the HVS load tracker. That's also particularly useful to track
regressions in this portion of the code.

> We do use crc to hunt for specific issues, but
> again that's portable between drivers.

We don't use CRC yet, but there was plans to add tests using writeback
to compare the output image to what we expect, and Maxime also worked
on tests relying on the Chamelium board.

Still, knowing why the image is incorrect is useful, and I guess you
could have a wrong output that is not necessarily caused by an
underrun error. So I think having tests that explicly check for
underrun errors is a good thing.

> 
> > Anyway, this is just my opinion, and if you think we actually need the
> > drm_mode_config_report_underrun() function, I'll implement it.  
> 
> I do think having some standard in how these errors are reported would be
> useful, even if it's just a special DRM_UNDERRUN_ERROR() macro.

I'll define such a macro.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors
  2018-10-26 14:13         ` Boris Brezillon
@ 2018-10-26 14:23           ` Daniel Vetter
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Vetter @ 2018-10-26 14:23 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: dri-devel

On Fri, Oct 26, 2018 at 4:13 PM Boris Brezillon
<boris.brezillon@bootlin.com> wrote:
>
> On Fri, 26 Oct 2018 15:36:15 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > On Fri, Oct 26, 2018 at 02:36:56PM +0200, Boris Brezillon wrote:
> > > Hi Daniel,
> > >
> > > On Fri, 26 Oct 2018 12:33:37 +0200
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > > On Thu, Oct 25, 2018 at 02:45:44PM +0200, Boris Brezillon wrote:
> > > > > Display controllers usually provide a lot features like overlay planes,
> > > > > hardware scalers, hardware rotations, ...
> > > > > Most of the time those features work just fine when enabled
> > > > > independently, but activating all of them at the same time tend to
> > > > > show other limitations like the limited memory bus or scaler
> > > > > bandwidth.
> > > > >
> > > > > This sort of bandwidth estimation is hard to get right, and we
> > > > > sometimes fail to predict that a specific workload will not pass,
> > > > > which will most likely result in underrun errors somewhere in the
> > > > > display pipeline (most of the time at the CRTC level).
> > > > >
> > > > > This path aims at making underrun error tracking generic and exposing
> > > > > underrun errors to userspace through a debugfs file.
> > > > >
> > > > > This way, CI tools like IGT, can try to enable a bunch of features and
> > > > > if such underrun errors are reported after the settings have been
> > > > > accepted and applied by the KMS infrastructure. When that happens it
> > > > > means the load tracking algorithm needs some tweaking/fixing.
> > > > >
> > > > > Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> > > > > ---
> > > > > Changes in v2:
> > > > > - New patch
> > > >
> > > > Hm, not what I thought off. This is definitely not going to be reusable
> > > > for i915, the only other driver with underrun reporting, since we have
> > > > tons of underrun sources (not just one). And we need to mask/unmask them
> > > > individually.
> > >
> > > I agree it's not perfect, but I did it this way so that hopefully we'll
> > > be able to use drm_atomic_helper_commit_tail() at some point instead of
> > > having our own implementation.
> >
> > I expect every non-trivial driver to need to overwrite commit_tail. That's
> > the entire point for having created it. Exceptions are just the very, very
> > simple ones without anything fancy.
>
> Again, just my opinion, but doing that means some drivers are left
> behind when new stuff are added to the generic helpers, and sometimes
> those new features might be of interest to the drivers that are not
> directly using those generic helpers, except they usually don't even
> know about it until one comes at looking at the generic helper again
> and notice the additions.
> Maybe I'm not following the dri-devel ML close enough, and I'm probably
> the one to blame here, but I see a real benefit in having the maximum
> number of drivers using the generic helpers. To be honest, with a few
> modifications, I think the VC4 driver could entirely rely on those
> generic helpers.

commit_tail commits the sw state to hw. Your hw will not magically
gain any new features, so there's really not much point in
automatically adding stuff to the overall hw commit flow. Some
exceptoins apply, and those get reviewed when the core change goes in.

Yes I spend ridiculous amounts of my time reading everyone else's
drivers. And I really don't want to make commit_tail 100% applicapable
to everyone, that just makes reviewing drivers harder (since the flow
isn't obvious by simply looking at commit_tail, instead you have to
hunt down feature flags and other stuff that control things).

This holds in general btw, and it's part of the no-midlayer design
paradigm. No huge blobs that work for everyone, instead make things
modular.

But if you have a concrete example where we added something, and a
driver should have used it but didn't, we'll need to look into that.
-Daniel

> > > If there's a way to add pre/post_commit() hooks (maybe they already
> > > exist?) at various levels of the display pipeline that get called even
> > > when the associated state has *not* changed, then I might be able to
> > > place my mask/unmask/clear operations there. I really need that to
> > > properly unmask the underrun interrupt, otherwise underruns might not be
> > > detected after the first modeset if nothing changed in sub-parts of the
> > > pipeline.
> >
> > Just overwrite commit_tail. That's why it exists.
>
> And that's what we do already, but I was hoping that someday we would
> be able to switch to the generic helper...
>
> >
> > > > It's also midlayer mistake, since you're stuff your callbacks into the
> > > > drm_mode_config_funcs core structure (which should only be used for uapi
> > > > interfaces, not helper stuff).
> > >
> > > Okay.
> > >
> > > >
> > > > What I had in mind is just a simple function which spews into dmesg (so
> > > > that i915 could use it) and increments the underrun counter in debugfs.
> > > > Everything else needs to be handled in drivers I think. E.g.
> > > >
> > > > drm_mode_config_report_underrun(struct drm_device *dev,
> > > >                           const char *source);
> > > >
> > > > and then rolling it out for i915 too (otherwise there's really no point in
> > > > the common infrastructure).
> > >
> > > Modifying the i915 driver was a bit premature since I didn't have your
> > > feedback on the general approach. And given your review, I'm glad I
> > > didn't start this conversion :-P.
> > >
> > > Anyway, I was not convinced having a generic infrastructure to report
> > > underrun errors was needed in the first place. The fact that these
> > > errors are very HW-specific, and that data attached to such events
> > > might be useful to understand what actually happened makes me think we
> > > don't really need this generic underrun reporting helper.
> > >
> > > Also note that IGT tests are likely to be HW specific too, because the
> > > workload needed to trigger underrun errors is likely to depend on the
> > > platform you're running on. And if you already have to write your own
> > > tests, I don't see clear benefit in exposing underrun errors
> > > generically.
> >
> > See my other reply in the earlier thread: That's why we're just dumping
> > errors into dmesg, and catching any such thing in our runners. We do not
> > have any specific underrun tests, we treat the entire igt test suite of
> > kms tests as our underrun test suite.
> >
> > Extending that with more nasty corner cases is obviously great, and will
> > benefit everyone. But I don't think you want an igt testcase that looks
> > specifically for underruns.
>
> Actually, that's exactly what I wanted to do: use config that were
> known to generate underrun errors and make sure they don't after adding
> the HVS load tracker. That's also particularly useful to track
> regressions in this portion of the code.
>
> > We do use crc to hunt for specific issues, but
> > again that's portable between drivers.
>
> We don't use CRC yet, but there was plans to add tests using writeback
> to compare the output image to what we expect, and Maxime also worked
> on tests relying on the Chamelium board.
>
> Still, knowing why the image is incorrect is useful, and I guess you
> could have a wrong output that is not necessarily caused by an
> underrun error. So I think having tests that explicly check for
> underrun errors is a good thing.
>
> >
> > > Anyway, this is just my opinion, and if you think we actually need the
> > > drm_mode_config_report_underrun() function, I'll implement it.
> >
> > I do think having some standard in how these errors are reported would be
> > useful, even if it's just a special DRM_UNDERRUN_ERROR() macro.
>
> I'll define such a macro.

Function, if it does anything more than call DRM_ERROR. Just to avoid confusion.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors
  2018-10-25 12:45 ` [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors Boris Brezillon
@ 2018-10-30 23:12   ` Eric Anholt
  2018-11-05 11:36     ` Boris Brezillon
  2018-11-06 13:07     ` Boris Brezillon
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Anholt @ 2018-10-30 23:12 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Boris Brezillon, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 8264 bytes --]

Boris Brezillon <boris.brezillon@bootlin.com> writes:

> The HVS block is supposed to fill the pixelvalve FIFOs fast enough to
> meet the requested framerate. The problem is, the HVS and memory bus
> bandwidths are limited, and if we don't take these limitations into
> account we might end up with HVS underflow errors.
>
> This patch is trying to model the per-plane HVS and memory bus bandwidth
> consumption and take a decision at atomic_check() time whether the
> estimated load will fit in the HVS and membus budget.
>
> Note that we take an extra margin on the memory bus consumption to let
> the system run smoothly when other blocks are doing heavy use of the
> memory bus. Same goes for the HVS limit, except the margin is smaller in
> this case, since the HVS is not used by external components.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
> ---
> Changes in v2:
> - Remove an unused var in vc4_load_tracker_atomic_check()
> ---
>  drivers/gpu/drm/vc4/vc4_drv.c   |   1 +
>  drivers/gpu/drm/vc4/vc4_drv.h   |  11 ++++
>  drivers/gpu/drm/vc4/vc4_kms.c   | 103 +++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/vc4/vc4_plane.c |  60 +++++++++++++++++++
>  4 files changed, 174 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c
> index f6f5cd80c04d..7195a0bcceb3 100644
> --- a/drivers/gpu/drm/vc4/vc4_drv.c
> +++ b/drivers/gpu/drm/vc4/vc4_drv.c
> @@ -313,6 +313,7 @@ static void vc4_drm_unbind(struct device *dev)
>  
>  	drm_mode_config_cleanup(drm);
>  
> +	drm_atomic_private_obj_fini(&vc4->load_tracker);
>  	drm_atomic_private_obj_fini(&vc4->ctm_manager);
>  
>  	drm_dev_put(drm);
> diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
> index a6c67c65ffbc..11369da944b6 100644
> --- a/drivers/gpu/drm/vc4/vc4_drv.h
> +++ b/drivers/gpu/drm/vc4/vc4_drv.h
> @@ -200,6 +200,7 @@ struct vc4_dev {
>  
>  	struct drm_modeset_lock ctm_state_lock;
>  	struct drm_private_obj ctm_manager;
> +	struct drm_private_obj load_tracker;
>  };
>  
>  static inline struct vc4_dev *
> @@ -369,6 +370,16 @@ struct vc4_plane_state {
>  	 * to enable background color fill.
>  	 */
>  	bool needs_bg_fill;
> +
> +	/* Load of this plane on the HVS block. The load is expressed in HVS
> +	 * cycles/sec.
> +	 */
> +	u64 hvs_load;
> +
> +	/* Memory bandwidth needed for this plane. This is expressed in
> +	 * bytes/sec.
> +	 */
> +	u64 membus_load;
>  };
>  
>  static inline struct vc4_plane_state *
> diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c
> index 8e0183b1e8bb..b905a19c1470 100644
> --- a/drivers/gpu/drm/vc4/vc4_kms.c
> +++ b/drivers/gpu/drm/vc4/vc4_kms.c
> @@ -34,6 +34,18 @@ static struct vc4_ctm_state *to_vc4_ctm_state(struct drm_private_state *priv)
>  	return container_of(priv, struct vc4_ctm_state, base);
>  }
>  
> +struct vc4_load_tracker_state {
> +	struct drm_private_state base;
> +	u64 hvs_load;
> +	u64 membus_load;
> +};
> +
> +static struct vc4_load_tracker_state *
> +to_vc4_load_tracker_state(struct drm_private_state *priv)
> +{
> +	return container_of(priv, struct vc4_load_tracker_state, base);
> +}
> +
>  static struct vc4_ctm_state *vc4_get_ctm_state(struct drm_atomic_state *state,
>  					       struct drm_private_obj *manager)
>  {
> @@ -383,6 +395,81 @@ vc4_ctm_atomic_check(struct drm_device *dev, struct drm_atomic_state *state)
>  	return 0;
>  }
>  
> +static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state)
> +{
> +	struct drm_plane_state *old_plane_state, *new_plane_state;
> +	struct vc4_dev *vc4 = to_vc4_dev(state->dev);
> +	struct vc4_load_tracker_state *load_state;
> +	struct drm_private_state *priv_state;
> +	struct drm_plane *plane;
> +	int i;
> +
> +	priv_state = drm_atomic_get_private_obj_state(state,
> +						      &vc4->load_tracker);
> +	if (IS_ERR(priv_state))
> +		return PTR_ERR(priv_state);
> +
> +	load_state = to_vc4_load_tracker_state(priv_state);
> +	for_each_oldnew_plane_in_state(state, plane, old_plane_state,
> +				       new_plane_state, i) {
> +		struct vc4_plane_state *vc4_plane_state;
> +
> +		if (old_plane_state->fb && old_plane_state->crtc) {
> +			vc4_plane_state = to_vc4_plane_state(old_plane_state);
> +			load_state->membus_load -= vc4_plane_state->membus_load;
> +			load_state->hvs_load -= vc4_plane_state->hvs_load;
> +		}
> +
> +		if (new_plane_state->fb && new_plane_state->crtc) {
> +			vc4_plane_state = to_vc4_plane_state(new_plane_state);
> +			load_state->membus_load += vc4_plane_state->membus_load;
> +			load_state->hvs_load += vc4_plane_state->hvs_load;
> +		}
> +	}
> +
> +	/* The abolsute limit is 2Gbyte/sec, but let's take a margin to let

"absolute"

> +	 * the system work when other blocks are accessing the memory.
> +	 */
> +	if (load_state->membus_load > SZ_1G + SZ_512M)
> +		return -ENOSPC;


> +	/* HVS clock is supposed to run @ 250Mhz, let's take a margin and
> +	 * consider the maximum number of cycles is 240M.
> +	 */
> +	if (load_state->hvs_load > 240000000ULL)
> +		return -ENOSPC;

Some day we should probably be using the HVS's actual clock from DT if
available instead of a hardcoded number here.  Good enough for now.

> +static void vc4_load_tracker_destroy_state(struct drm_private_obj *obj,
> +					   struct drm_private_state *state)
> +{
> +	struct vc4_load_tracker_state *load_state;
> +
> +	load_state = to_vc4_load_tracker_state(state);
> +	kfree(load_state);
> +}

Optional: just kfree(state) for simplicity.

> +static void vc4_plane_calc_load(struct drm_plane_state *state)
> +{
> +	unsigned int hvs_load_shift, vrefresh, i;
> +	struct drm_framebuffer *fb = state->fb;
> +	struct vc4_plane_state *vc4_state;
> +	struct drm_crtc_state *crtc_state;
> +	unsigned int vscale_factor;
> +
> +	vc4_state = to_vc4_plane_state(state);
> +	crtc_state = drm_atomic_get_existing_crtc_state(state->state,
> +							state->crtc);
> +	vrefresh = drm_mode_vrefresh(&crtc_state->adjusted_mode);
> +
> +	/* The HVS is able to process 2 pixels/cycle when scaling the source,
> +	 * 4 pixels/cycle otherwise.
> +	 * Alpha blending step seems to be pipelined and it's always operating
> +	 * at 4 pixels/cycle, so the limiting aspect here seems to be the
> +	 * scaler block.
> +	 * HVS load is expressed in clk-cycles/sec (AKA Hz).
> +	 */
> +	if (vc4_state->x_scaling[0] != VC4_SCALING_NONE ||
> +	    vc4_state->x_scaling[1] != VC4_SCALING_NONE ||
> +	    vc4_state->y_scaling[0] != VC4_SCALING_NONE ||
> +	    vc4_state->y_scaling[1] != VC4_SCALING_NONE)
> +		hvs_load_shift = 1;
> +	else
> +		hvs_load_shift = 2;
> +
> +	vc4_state->membus_load = 0;
> +	vc4_state->hvs_load = 0;
> +	for (i = 0; i < fb->format->num_planes; i++) {
> +		unsigned long pixels_load;

I'm scared any time I see longs.  Do you want 32 or 64 bits here?

> +		/* Even if the bandwidth/plane required for a single frame is
> +		 *
> +		 * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh
> +		 *
> +		 * when downscaling, we have to read more pixels per line in
> +		 * the time frame reserved for a single line, so the bandwidth
> +		 * demand can be punctually higher. To account for that, we
> +		 * calculate the down-scaling factor and multiply the plane
> +		 * load by this number. We're likely over-estimating the read
> +		 * demand, but that's better than under-estimating it.
> +		 */
> +		vscale_factor = DIV_ROUND_UP(vc4_state->src_h[i],
> +					     vc4_state->crtc_h);
> +		pixels_load = vc4_state->src_w[i] * vc4_state->src_h[i] *
> +			      vscale_factor;

If we're upscaling (common for video, right?), aren't we under-counting
the cost?  You need to scale/colorspace-convert crtc_w * crtc_h at 2
pixels per cycle.

Overall, this is simpler than I expected it to be, and looks really
promising.  Thanks for working on it!

> +		vc4_state->membus_load += fb->format->cpp[i] * pixels_load;
> +		vc4_state->hvs_load += pixels_load;
> +	}
> +
> +	vc4_state->hvs_load *= vrefresh;
> +	vc4_state->hvs_load >>= hvs_load_shift;
> +	vc4_state->membus_load *= vrefresh;
> +}
> +


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors
  2018-10-30 23:12   ` Eric Anholt
@ 2018-11-05 11:36     ` Boris Brezillon
  2018-11-08 16:26       ` Eric Anholt
  2018-11-06 13:07     ` Boris Brezillon
  1 sibling, 1 reply; 18+ messages in thread
From: Boris Brezillon @ 2018-11-05 11:36 UTC (permalink / raw)
  To: Eric Anholt; +Cc: dri-devel

Hi Eric,

On Tue, 30 Oct 2018 16:12:55 -0700
Eric Anholt <eric@anholt.net> wrote:

> > +
> > +	/* The abolsute limit is 2Gbyte/sec, but let's take a margin to let  
> 
> "absolute"

Will fix the typo.

> 
> > +	 * the system work when other blocks are accessing the memory.
> > +	 */
> > +	if (load_state->membus_load > SZ_1G + SZ_512M)
> > +		return -ENOSPC;  
> 
> 
> > +	/* HVS clock is supposed to run @ 250Mhz, let's take a margin and
> > +	 * consider the maximum number of cycles is 240M.
> > +	 */
> > +	if (load_state->hvs_load > 240000000ULL)
> > +		return -ENOSPC;  
> 
> Some day we should probably be using the HVS's actual clock from DT if
> available instead of a hardcoded number here.

I agree.

> Good enough for now.
> 
> > +static void vc4_load_tracker_destroy_state(struct drm_private_obj *obj,
> > +					   struct drm_private_state *state)
> > +{
> > +	struct vc4_load_tracker_state *load_state;
> > +
> > +	load_state = to_vc4_load_tracker_state(state);
> > +	kfree(load_state);
> > +}  
> 
> Optional: just kfree(state) for simplicity.

Hm, not sure that's a good idea. kfree(state) works as long as
drm_private_state is the first field in vc4_load_tracker_state, but it
sounds a bit fragile.

I can do

	kfree(to_vc4_load_tracker_state(state));

if you prefer.

> 
> > +static void vc4_plane_calc_load(struct drm_plane_state *state)
> > +{
> > +	unsigned int hvs_load_shift, vrefresh, i;
> > +	struct drm_framebuffer *fb = state->fb;
> > +	struct vc4_plane_state *vc4_state;
> > +	struct drm_crtc_state *crtc_state;
> > +	unsigned int vscale_factor;
> > +
> > +	vc4_state = to_vc4_plane_state(state);
> > +	crtc_state = drm_atomic_get_existing_crtc_state(state->state,
> > +							state->crtc);
> > +	vrefresh = drm_mode_vrefresh(&crtc_state->adjusted_mode);
> > +
> > +	/* The HVS is able to process 2 pixels/cycle when scaling the source,
> > +	 * 4 pixels/cycle otherwise.
> > +	 * Alpha blending step seems to be pipelined and it's always operating
> > +	 * at 4 pixels/cycle, so the limiting aspect here seems to be the
> > +	 * scaler block.
> > +	 * HVS load is expressed in clk-cycles/sec (AKA Hz).
> > +	 */
> > +	if (vc4_state->x_scaling[0] != VC4_SCALING_NONE ||
> > +	    vc4_state->x_scaling[1] != VC4_SCALING_NONE ||
> > +	    vc4_state->y_scaling[0] != VC4_SCALING_NONE ||
> > +	    vc4_state->y_scaling[1] != VC4_SCALING_NONE)
> > +		hvs_load_shift = 1;
> > +	else
> > +		hvs_load_shift = 2;
> > +
> > +	vc4_state->membus_load = 0;
> > +	vc4_state->hvs_load = 0;
> > +	for (i = 0; i < fb->format->num_planes; i++) {
> > +		unsigned long pixels_load;  
> 
> I'm scared any time I see longs.  Do you want 32 or 64 bits here?

I just assumed a 32bit unsigned var would be enough, so unsigned long
seemed just fine. I can use u32 or u64 if you prefer.

> 
> > +		/* Even if the bandwidth/plane required for a single frame is
> > +		 *
> > +		 * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh
> > +		 *
> > +		 * when downscaling, we have to read more pixels per line in
> > +		 * the time frame reserved for a single line, so the bandwidth
> > +		 * demand can be punctually higher. To account for that, we
> > +		 * calculate the down-scaling factor and multiply the plane
> > +		 * load by this number. We're likely over-estimating the read
> > +		 * demand, but that's better than under-estimating it.
> > +		 */
> > +		vscale_factor = DIV_ROUND_UP(vc4_state->src_h[i],
> > +					     vc4_state->crtc_h);
> > +		pixels_load = vc4_state->src_w[i] * vc4_state->src_h[i] *
> > +			      vscale_factor;  
> 
> If we're upscaling (common for video, right?), aren't we under-counting
> the cost?  You need to scale/colorspace-convert crtc_w * crtc_h at 2
> pixels per cycle.

That's not entirely clear to me. I'm not sure what the scaler does when
upscaling. Are the same pixels read several times from the memory? If
that's the case, then the membus load should indeed be based on the
crtc_w,h.

Also, when the spec says the HVS can process 4pixels/cycles, is it 4
input pixels or 4 output pixels per cycle?

Regards,

Boris
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors
  2018-10-30 23:12   ` Eric Anholt
  2018-11-05 11:36     ` Boris Brezillon
@ 2018-11-06 13:07     ` Boris Brezillon
  1 sibling, 0 replies; 18+ messages in thread
From: Boris Brezillon @ 2018-11-06 13:07 UTC (permalink / raw)
  To: Eric Anholt; +Cc: dri-devel

On Tue, 30 Oct 2018 16:12:55 -0700
Eric Anholt <eric@anholt.net> wrote:

> > +		/* Even if the bandwidth/plane required for a single frame is
> > +		 *
> > +		 * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh
> > +		 *
> > +		 * when downscaling, we have to read more pixels per line in
> > +		 * the time frame reserved for a single line, so the bandwidth
> > +		 * demand can be punctually higher. To account for that, we
> > +		 * calculate the down-scaling factor and multiply the plane
> > +		 * load by this number. We're likely over-estimating the read
> > +		 * demand, but that's better than under-estimating it.
> > +		 */

One more thing: because we sometimes over estimate the load, we might
reject configuration that were previously accepted, so I'm wondering if
we shouldn't make the 'reject on HVS/membus load too high' optional,
and disable it by default.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors
  2018-11-05 11:36     ` Boris Brezillon
@ 2018-11-08 16:26       ` Eric Anholt
  2018-11-08 16:50         ` Boris Brezillon
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Anholt @ 2018-11-08 16:26 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 4109 bytes --]

Boris Brezillon <boris.brezillon@bootlin.com> writes:

> Hi Eric,
>
> On Tue, 30 Oct 2018 16:12:55 -0700
> Eric Anholt <eric@anholt.net> wrote:
>> > +static void vc4_load_tracker_destroy_state(struct drm_private_obj *obj,
>> > +					   struct drm_private_state *state)
>> > +{
>> > +	struct vc4_load_tracker_state *load_state;
>> > +
>> > +	load_state = to_vc4_load_tracker_state(state);
>> > +	kfree(load_state);
>> > +}  
>> 
>> Optional: just kfree(state) for simplicity.
>
> Hm, not sure that's a good idea. kfree(state) works as long as
> drm_private_state is the first field in vc4_load_tracker_state, but it
> sounds a bit fragile.
>
> I can do
>
> 	kfree(to_vc4_load_tracker_state(state));
>
> if you prefer.

I said optional for that reason :)  Just keep it as is.

>> > +static void vc4_plane_calc_load(struct drm_plane_state *state)
>> > +{
>> > +	unsigned int hvs_load_shift, vrefresh, i;
>> > +	struct drm_framebuffer *fb = state->fb;
>> > +	struct vc4_plane_state *vc4_state;
>> > +	struct drm_crtc_state *crtc_state;
>> > +	unsigned int vscale_factor;
>> > +
>> > +	vc4_state = to_vc4_plane_state(state);
>> > +	crtc_state = drm_atomic_get_existing_crtc_state(state->state,
>> > +							state->crtc);
>> > +	vrefresh = drm_mode_vrefresh(&crtc_state->adjusted_mode);
>> > +
>> > +	/* The HVS is able to process 2 pixels/cycle when scaling the source,
>> > +	 * 4 pixels/cycle otherwise.
>> > +	 * Alpha blending step seems to be pipelined and it's always operating
>> > +	 * at 4 pixels/cycle, so the limiting aspect here seems to be the
>> > +	 * scaler block.
>> > +	 * HVS load is expressed in clk-cycles/sec (AKA Hz).
>> > +	 */
>> > +	if (vc4_state->x_scaling[0] != VC4_SCALING_NONE ||
>> > +	    vc4_state->x_scaling[1] != VC4_SCALING_NONE ||
>> > +	    vc4_state->y_scaling[0] != VC4_SCALING_NONE ||
>> > +	    vc4_state->y_scaling[1] != VC4_SCALING_NONE)
>> > +		hvs_load_shift = 1;
>> > +	else
>> > +		hvs_load_shift = 2;
>> > +
>> > +	vc4_state->membus_load = 0;
>> > +	vc4_state->hvs_load = 0;
>> > +	for (i = 0; i < fb->format->num_planes; i++) {
>> > +		unsigned long pixels_load;  
>> 
>> I'm scared any time I see longs.  Do you want 32 or 64 bits here?
>
> I just assumed a 32bit unsigned var would be enough, so unsigned long
> seemed just fine. I can use u32 or u64 if you prefer.

Yes, please.  See also Maxime's recent trouble with a 64-bit kernel.

>> > +		/* Even if the bandwidth/plane required for a single frame is
>> > +		 *
>> > +		 * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh
>> > +		 *
>> > +		 * when downscaling, we have to read more pixels per line in
>> > +		 * the time frame reserved for a single line, so the bandwidth
>> > +		 * demand can be punctually higher. To account for that, we
>> > +		 * calculate the down-scaling factor and multiply the plane
>> > +		 * load by this number. We're likely over-estimating the read
>> > +		 * demand, but that's better than under-estimating it.
>> > +		 */
>> > +		vscale_factor = DIV_ROUND_UP(vc4_state->src_h[i],
>> > +					     vc4_state->crtc_h);
>> > +		pixels_load = vc4_state->src_w[i] * vc4_state->src_h[i] *
>> > +			      vscale_factor;  
>> 
>> If we're upscaling (common for video, right?), aren't we under-counting
>> the cost?  You need to scale/colorspace-convert crtc_w * crtc_h at 2
>> pixels per cycle.
>
> That's not entirely clear to me. I'm not sure what the scaler does when
> upscaling. Are the same pixels read several times from the memory? If
> that's the case, then the membus load should indeed be based on the
> crtc_w,h.

I'm going to punt on this question because that would be a *lot* of
verilog tracing to figure out for me (and I'm not sure I'd even trust
what I came up with).

> Also, when the spec says the HVS can process 4pixels/cycles, is it 4
> input pixels or 4 output pixels per cycle?

Well, it's 4 pixels per cycle when not scaling, so both :)

I think the scaling pipeline is doing two output pixels per cycle.
Nothing else would make sense to me.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors
  2018-11-08 16:26       ` Eric Anholt
@ 2018-11-08 16:50         ` Boris Brezillon
  2018-11-08 17:53           ` Eric Anholt
  0 siblings, 1 reply; 18+ messages in thread
From: Boris Brezillon @ 2018-11-08 16:50 UTC (permalink / raw)
  To: Eric Anholt; +Cc: dri-devel

On Thu, 08 Nov 2018 08:26:49 -0800
Eric Anholt <eric@anholt.net> wrote:

> >> > +static void vc4_plane_calc_load(struct drm_plane_state *state)
> >> > +{
> >> > +	unsigned int hvs_load_shift, vrefresh, i;
> >> > +	struct drm_framebuffer *fb = state->fb;
> >> > +	struct vc4_plane_state *vc4_state;
> >> > +	struct drm_crtc_state *crtc_state;
> >> > +	unsigned int vscale_factor;
> >> > +
> >> > +	vc4_state = to_vc4_plane_state(state);
> >> > +	crtc_state = drm_atomic_get_existing_crtc_state(state->state,
> >> > +							state->crtc);
> >> > +	vrefresh = drm_mode_vrefresh(&crtc_state->adjusted_mode);
> >> > +
> >> > +	/* The HVS is able to process 2 pixels/cycle when scaling the source,
> >> > +	 * 4 pixels/cycle otherwise.
> >> > +	 * Alpha blending step seems to be pipelined and it's always operating
> >> > +	 * at 4 pixels/cycle, so the limiting aspect here seems to be the
> >> > +	 * scaler block.
> >> > +	 * HVS load is expressed in clk-cycles/sec (AKA Hz).
> >> > +	 */
> >> > +	if (vc4_state->x_scaling[0] != VC4_SCALING_NONE ||
> >> > +	    vc4_state->x_scaling[1] != VC4_SCALING_NONE ||
> >> > +	    vc4_state->y_scaling[0] != VC4_SCALING_NONE ||
> >> > +	    vc4_state->y_scaling[1] != VC4_SCALING_NONE)
> >> > +		hvs_load_shift = 1;
> >> > +	else
> >> > +		hvs_load_shift = 2;
> >> > +
> >> > +	vc4_state->membus_load = 0;
> >> > +	vc4_state->hvs_load = 0;
> >> > +	for (i = 0; i < fb->format->num_planes; i++) {
> >> > +		unsigned long pixels_load;    
> >> 
> >> I'm scared any time I see longs.  Do you want 32 or 64 bits here?  
> >
> > I just assumed a 32bit unsigned var would be enough, so unsigned long
> > seemed just fine. I can use u32 or u64 if you prefer.  
> 
> Yes, please.  See also Maxime's recent trouble with a 64-bit kernel.

Will use u32 then.

> 
> >> > +		/* Even if the bandwidth/plane required for a single frame is
> >> > +		 *
> >> > +		 * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh
> >> > +		 *
> >> > +		 * when downscaling, we have to read more pixels per line in
> >> > +		 * the time frame reserved for a single line, so the bandwidth
> >> > +		 * demand can be punctually higher. To account for that, we
> >> > +		 * calculate the down-scaling factor and multiply the plane
> >> > +		 * load by this number. We're likely over-estimating the read
> >> > +		 * demand, but that's better than under-estimating it.
> >> > +		 */
> >> > +		vscale_factor = DIV_ROUND_UP(vc4_state->src_h[i],
> >> > +					     vc4_state->crtc_h);
> >> > +		pixels_load = vc4_state->src_w[i] * vc4_state->src_h[i] *
> >> > +			      vscale_factor;    
> >> 
> >> If we're upscaling (common for video, right?), aren't we under-counting
> >> the cost?  You need to scale/colorspace-convert crtc_w * crtc_h at 2
> >> pixels per cycle.  
> >
> > That's not entirely clear to me. I'm not sure what the scaler does when
> > upscaling. Are the same pixels read several times from the memory? If
> > that's the case, then the membus load should indeed be based on the
> > crtc_w,h.  
> 
> I'm going to punt on this question because that would be a *lot* of
> verilog tracing to figure out for me (and I'm not sure I'd even trust
> what I came up with).
> 
> > Also, when the spec says the HVS can process 4pixels/cycles, is it 4
> > input pixels or 4 output pixels per cycle?  
> 
> Well, it's 4 pixels per cycle when not scaling, so both :)

Sorry, I meant 2pixels/cycle :).

> 
> I think the scaling pipeline is doing two output pixels per cycle.
> Nothing else would make sense to me.

Okay, so the HVS load should be based on crtc_w/h and the membus load
should be based on src_w/h and the scaling ratio, right?
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors
  2018-11-08 16:50         ` Boris Brezillon
@ 2018-11-08 17:53           ` Eric Anholt
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Anholt @ 2018-11-08 17:53 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3925 bytes --]

Boris Brezillon <boris.brezillon@bootlin.com> writes:

> On Thu, 08 Nov 2018 08:26:49 -0800
> Eric Anholt <eric@anholt.net> wrote:
>
>> >> > +static void vc4_plane_calc_load(struct drm_plane_state *state)
>> >> > +{
>> >> > +	unsigned int hvs_load_shift, vrefresh, i;
>> >> > +	struct drm_framebuffer *fb = state->fb;
>> >> > +	struct vc4_plane_state *vc4_state;
>> >> > +	struct drm_crtc_state *crtc_state;
>> >> > +	unsigned int vscale_factor;
>> >> > +
>> >> > +	vc4_state = to_vc4_plane_state(state);
>> >> > +	crtc_state = drm_atomic_get_existing_crtc_state(state->state,
>> >> > +							state->crtc);
>> >> > +	vrefresh = drm_mode_vrefresh(&crtc_state->adjusted_mode);
>> >> > +
>> >> > +	/* The HVS is able to process 2 pixels/cycle when scaling the source,
>> >> > +	 * 4 pixels/cycle otherwise.
>> >> > +	 * Alpha blending step seems to be pipelined and it's always operating
>> >> > +	 * at 4 pixels/cycle, so the limiting aspect here seems to be the
>> >> > +	 * scaler block.
>> >> > +	 * HVS load is expressed in clk-cycles/sec (AKA Hz).
>> >> > +	 */
>> >> > +	if (vc4_state->x_scaling[0] != VC4_SCALING_NONE ||
>> >> > +	    vc4_state->x_scaling[1] != VC4_SCALING_NONE ||
>> >> > +	    vc4_state->y_scaling[0] != VC4_SCALING_NONE ||
>> >> > +	    vc4_state->y_scaling[1] != VC4_SCALING_NONE)
>> >> > +		hvs_load_shift = 1;
>> >> > +	else
>> >> > +		hvs_load_shift = 2;
>> >> > +
>> >> > +	vc4_state->membus_load = 0;
>> >> > +	vc4_state->hvs_load = 0;
>> >> > +	for (i = 0; i < fb->format->num_planes; i++) {
>> >> > +		unsigned long pixels_load;    
>> >> 
>> >> I'm scared any time I see longs.  Do you want 32 or 64 bits here?  
>> >
>> > I just assumed a 32bit unsigned var would be enough, so unsigned long
>> > seemed just fine. I can use u32 or u64 if you prefer.  
>> 
>> Yes, please.  See also Maxime's recent trouble with a 64-bit kernel.
>
> Will use u32 then.
>
>> 
>> >> > +		/* Even if the bandwidth/plane required for a single frame is
>> >> > +		 *
>> >> > +		 * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh
>> >> > +		 *
>> >> > +		 * when downscaling, we have to read more pixels per line in
>> >> > +		 * the time frame reserved for a single line, so the bandwidth
>> >> > +		 * demand can be punctually higher. To account for that, we
>> >> > +		 * calculate the down-scaling factor and multiply the plane
>> >> > +		 * load by this number. We're likely over-estimating the read
>> >> > +		 * demand, but that's better than under-estimating it.
>> >> > +		 */
>> >> > +		vscale_factor = DIV_ROUND_UP(vc4_state->src_h[i],
>> >> > +					     vc4_state->crtc_h);
>> >> > +		pixels_load = vc4_state->src_w[i] * vc4_state->src_h[i] *
>> >> > +			      vscale_factor;    
>> >> 
>> >> If we're upscaling (common for video, right?), aren't we under-counting
>> >> the cost?  You need to scale/colorspace-convert crtc_w * crtc_h at 2
>> >> pixels per cycle.  
>> >
>> > That's not entirely clear to me. I'm not sure what the scaler does when
>> > upscaling. Are the same pixels read several times from the memory? If
>> > that's the case, then the membus load should indeed be based on the
>> > crtc_w,h.  
>> 
>> I'm going to punt on this question because that would be a *lot* of
>> verilog tracing to figure out for me (and I'm not sure I'd even trust
>> what I came up with).
>> 
>> > Also, when the spec says the HVS can process 4pixels/cycles, is it 4
>> > input pixels or 4 output pixels per cycle?  
>> 
>> Well, it's 4 pixels per cycle when not scaling, so both :)
>
> Sorry, I meant 2pixels/cycle :).
>
>> 
>> I think the scaling pipeline is doing two output pixels per cycle.
>> Nothing else would make sense to me.
>
> Okay, so the HVS load should be based on crtc_w/h and the membus load
> should be based on src_w/h and the scaling ratio, right?

That sounds about right to me.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/3] drm/vc4: Add a load tracker
  2018-10-25 12:45 [PATCH v2 0/3] drm/vc4: Add a load tracker Boris Brezillon
                   ` (2 preceding siblings ...)
  2018-10-25 12:45 ` [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors Boris Brezillon
@ 2018-11-28  9:16 ` Paul Kocialkowski
  2018-11-28  9:29   ` Boris Brezillon
  3 siblings, 1 reply; 18+ messages in thread
From: Paul Kocialkowski @ 2018-11-28  9:16 UTC (permalink / raw)
  To: Boris Brezillon, Eric Anholt, Daniel Vetter; +Cc: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2660 bytes --]

Hi,

On Thu, 2018-10-25 at 14:45 +0200, Boris Brezillon wrote:
> Hello,
> 
> This is the 2nd version of the VC4 load tracker patch.
> 
> Daniel, as you suggested, I've implemented a generic infrastructure to
> track and report underrun errors (patch 1). Not sure this is what you
> had in mind, but it seems to do the job for my use case, and should
> allow me to easily track regressions in the load tracking logic with a
> bunch of IGT tests. Let me know if you want it done differently.
> 
> Patch 2 is implementing the generic underrun interface in the VC4
> driver, and patch 3 is just adding the load tracking logic (hasn't
> changed since the RFC except for the unused 'ret' var removal).

For the whole series:
Tested-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>

I am currently integrating this with IGT testing and have a few general
remarks:

- I think it would make sense to have a driver-specific debugfs entry
for enabling/disabling the rejection of commits by the load tracker.
This would be useful for testing that there is no mismatch between the
load tracker's behavior and hardware-detected underruns.

- Underrun reporting with a generic debugfs entry is a good fit for IGT
(and userspace reporting in general), but it would be useful to have an
intermediary state reported between applying a commit and getting the
underrun status.

Something like returning '?' between setting a commit and the next
vblank. This way, there is no chance that userspace reads the underrun
status related to the previous configuration.

What do you think?

Cheers,

Paul

> Regards,
> 
> Boris
> 
> Boris Brezillon (3):
>   drm/atomic: Add a generic infrastructure to track underrun errors
>   drm/vc4: Report underrun errors
>   drm/vc4: Add a load tracker to prevent HVS underflow errors
> 
>  drivers/gpu/drm/drm_atomic.c        |  12 +++
>  drivers/gpu/drm/drm_atomic_helper.c |   4 +
>  drivers/gpu/drm/vc4/vc4_drv.c       |   1 +
>  drivers/gpu/drm/vc4/vc4_drv.h       |  14 ++++
>  drivers/gpu/drm/vc4/vc4_hvs.c       |  71 ++++++++++++++++++
>  drivers/gpu/drm/vc4/vc4_kms.c       | 110 +++++++++++++++++++++++++++-
>  drivers/gpu/drm/vc4/vc4_plane.c     |  60 +++++++++++++++
>  drivers/gpu/drm/vc4/vc4_regs.h      |  46 +++---------
>  include/drm/drm_atomic_helper.h     |  53 ++++++++++++++
>  include/drm/drm_device.h            |  15 ++++
>  include/drm/drm_mode_config.h       |  26 +++++++
>  11 files changed, 376 insertions(+), 36 deletions(-)
> 
-- 
Paul Kocialkowski, Bootlin (formerly Free Electrons)
Embedded Linux and kernel engineering
https://bootlin.com

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/3] drm/vc4: Add a load tracker
  2018-11-28  9:16 ` [PATCH v2 0/3] drm/vc4: Add a load tracker Paul Kocialkowski
@ 2018-11-28  9:29   ` Boris Brezillon
  2018-11-28 13:32     ` Paul Kocialkowski
  0 siblings, 1 reply; 18+ messages in thread
From: Boris Brezillon @ 2018-11-28  9:29 UTC (permalink / raw)
  To: Paul Kocialkowski; +Cc: dri-devel

On Wed, 28 Nov 2018 10:16:17 +0100
Paul Kocialkowski <paul.kocialkowski@bootlin.com> wrote:

> Hi,
> 
> On Thu, 2018-10-25 at 14:45 +0200, Boris Brezillon wrote:
> > Hello,
> > 
> > This is the 2nd version of the VC4 load tracker patch.
> > 
> > Daniel, as you suggested, I've implemented a generic infrastructure to
> > track and report underrun errors (patch 1). Not sure this is what you
> > had in mind, but it seems to do the job for my use case, and should
> > allow me to easily track regressions in the load tracking logic with a
> > bunch of IGT tests. Let me know if you want it done differently.
> > 
> > Patch 2 is implementing the generic underrun interface in the VC4
> > driver, and patch 3 is just adding the load tracking logic (hasn't
> > changed since the RFC except for the unused 'ret' var removal).  
> 
> For the whole series:
> Tested-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
> 
> I am currently integrating this with IGT testing and have a few general
> remarks:
> 
> - I think it would make sense to have a driver-specific debugfs entry
> for enabling/disabling the rejection of commits by the load tracker.
> This would be useful for testing that there is no mismatch between the
> load tracker's behavior and hardware-detected underruns.

Yep, makes sense.

> 
> - Underrun reporting with a generic debugfs entry is a good fit for IGT
> (and userspace reporting in general), but it would be useful to have an
> intermediary state reported between applying a commit and getting the
> underrun status.
> 
> Something like returning '?' between setting a commit and the next
> vblank. This way, there is no chance that userspace reads the underrun
> status related to the previous configuration.

You will never get the result of the previous atomic-set since I reset
the underrun state to 0 before committing the changes, but you might
read the underrun file before the underrun event happened. So yes,
waiting for at least one VBLANK sounds reasonable. Not sure we want to
automate that in the driver though, as this would imply activating
vblank interrupts to update the underrun state even if the user doesn't
care. Maybe you can use the DRM_IOCTL_WAIT_VBLANK ioctl instead.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/3] drm/vc4: Add a load tracker
  2018-11-28  9:29   ` Boris Brezillon
@ 2018-11-28 13:32     ` Paul Kocialkowski
  0 siblings, 0 replies; 18+ messages in thread
From: Paul Kocialkowski @ 2018-11-28 13:32 UTC (permalink / raw)
  To: Boris Brezillon; +Cc: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2745 bytes --]

Hi,

On Wed, 2018-11-28 at 10:29 +0100, Boris Brezillon wrote:
> On Wed, 28 Nov 2018 10:16:17 +0100
> Paul Kocialkowski <paul.kocialkowski@bootlin.com> wrote:
> 
> > Hi,
> > 
> > On Thu, 2018-10-25 at 14:45 +0200, Boris Brezillon wrote:
> > > Hello,
> > > 
> > > This is the 2nd version of the VC4 load tracker patch.
> > > 
> > > Daniel, as you suggested, I've implemented a generic infrastructure to
> > > track and report underrun errors (patch 1). Not sure this is what you
> > > had in mind, but it seems to do the job for my use case, and should
> > > allow me to easily track regressions in the load tracking logic with a
> > > bunch of IGT tests. Let me know if you want it done differently.
> > > 
> > > Patch 2 is implementing the generic underrun interface in the VC4
> > > driver, and patch 3 is just adding the load tracking logic (hasn't
> > > changed since the RFC except for the unused 'ret' var removal).  
> > 
> > For the whole series:
> > Tested-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
> > 
> > I am currently integrating this with IGT testing and have a few general
> > remarks:
> > 
> > - I think it would make sense to have a driver-specific debugfs entry
> > for enabling/disabling the rejection of commits by the load tracker.
> > This would be useful for testing that there is no mismatch between the
> > load tracker's behavior and hardware-detected underruns.
> 
> Yep, makes sense.
> 
> > - Underrun reporting with a generic debugfs entry is a good fit for IGT
> > (and userspace reporting in general), but it would be useful to have an
> > intermediary state reported between applying a commit and getting the
> > underrun status.
> > 
> > Something like returning '?' between setting a commit and the next
> > vblank. This way, there is no chance that userspace reads the underrun
> > status related to the previous configuration.
> 
> You will never get the result of the previous atomic-set since I reset
> the underrun state to 0 before committing the changes, but you might
> read the underrun file before the underrun event happened. So yes,
> waiting for at least one VBLANK sounds reasonable. Not sure we want to
> automate that in the driver though, as this would imply activating
> vblank interrupts to update the underrun state even if the user doesn't
> care. Maybe you can use the DRM_IOCTL_WAIT_VBLANK ioctl instead.

Right, that works just fine on VC4 and I guess it's fair to expect that
future hardware that uses an underrun indication will have received the
underrun indication by the time vblank happens.

-- 
Paul Kocialkowski, Bootlin (formerly Free Electrons)
Embedded Linux and kernel engineering
https://bootlin.com

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-11-28 13:32 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-25 12:45 [PATCH v2 0/3] drm/vc4: Add a load tracker Boris Brezillon
2018-10-25 12:45 ` [PATCH v2 1/3] drm/atomic: Add a generic infrastructure to track underrun errors Boris Brezillon
2018-10-26 10:33   ` Daniel Vetter
2018-10-26 12:36     ` Boris Brezillon
2018-10-26 13:36       ` Daniel Vetter
2018-10-26 14:13         ` Boris Brezillon
2018-10-26 14:23           ` Daniel Vetter
2018-10-25 12:45 ` [PATCH v2 2/3] drm/vc4: Report " Boris Brezillon
2018-10-25 12:45 ` [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors Boris Brezillon
2018-10-30 23:12   ` Eric Anholt
2018-11-05 11:36     ` Boris Brezillon
2018-11-08 16:26       ` Eric Anholt
2018-11-08 16:50         ` Boris Brezillon
2018-11-08 17:53           ` Eric Anholt
2018-11-06 13:07     ` Boris Brezillon
2018-11-28  9:16 ` [PATCH v2 0/3] drm/vc4: Add a load tracker Paul Kocialkowski
2018-11-28  9:29   ` Boris Brezillon
2018-11-28 13:32     ` Paul Kocialkowski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.