All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Interactive RPS mode
@ 2018-07-11 21:01 Chris Wilson
  2018-07-11 22:06 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Chris Wilson @ 2018-07-11 21:01 UTC (permalink / raw)
  To: intel-gfx

RPS provides a feedback loop where we use the load during the previous
evaluation interval to decide whether to up or down clock the GPU
frequency. Our responsiveness is split into 3 regimes, a high and low
plateau with the intent to keep the gpu clocked high to cover occasional
stalls under high load, and low despite occasional glitches under steady
low load, and inbetween. However, we run into situations like kodi where
we want to stay at low power (video decoding is done efficiently
inside the fixed function HW and doesn't need high clocks even for high
bitrate streams), but just occasionally the pipeline is more complex
than a video decode and we need a smidgen of extra GPU power to present
on time. In the high power regime, we sample at sub frame intervals with
a bias to upclocking, and conversely at low power we sample over a few
frames worth to provide what we consider to be the right levels of
responsiveness respectively. At low power, we more or less expect to be
kicked out to high power at the start of a busy sequence by waitboosting.

Prior to commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active
request") whenever we missed the frame or stalled, we would immediate go
full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we
relaxed the waitboosting to only apply if the pipeline was deep to avoid
over-committing resources for a near miss. Sadly though, a near miss is
still a miss, and perceptible as jitter in the frame delivery.

To try and prevent the near miss before having to resort to boosting
after the fact, we use the pageflip queue as an indication that we are
in an "interactive" regime and so should sample the load more frequently
to provide power before the frame misses it vblank. This will make us
more favorable to providing a small power increase (one or two bins) as
required rather than going all the way to maximum and then having to
work back down again. (We still keep the waitboosting mechanism around
just in case a dramatic change in system load requires urgent uplocking,
faster than we can provide in a few evaluation intervals.)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
Fixes: e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  1 +
 drivers/gpu/drm/i915/i915_drv.h      |  6 +-
 drivers/gpu/drm/i915/intel_display.c | 20 ++++++
 drivers/gpu/drm/i915/intel_drv.h     |  2 +
 drivers/gpu/drm/i915/intel_pm.c      | 91 +++++++++++++++++++---------
 5 files changed, 90 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 099f97ef2303..4d3daed1e6d6 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2218,6 +2218,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Boosts outstanding? %d\n",
 		   atomic_read(&rps->num_waiters));
+	seq_printf(m, "Power overrides? %d\n", READ_ONCE(rps->power_override));
 	seq_printf(m, "Frequency requested %d\n",
 		   intel_gpu_freq(dev_priv, rps->cur_freq));
 	seq_printf(m, "  min hard:%d, soft:%d; max soft:%d, hard:%d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 01dd29837233..4312094417a2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -783,7 +783,9 @@ struct intel_rps {
 	u8 down_threshold; /* Current %busy required to downclock */
 
 	int last_adj;
-	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
+	enum { LOW_POWER, BETWEEN, HIGH_POWER, AUTO_POWER } power;
+	unsigned int power_override;
+	struct mutex power_lock;
 
 	bool enabled;
 	atomic_t num_waiters;
@@ -3429,6 +3431,8 @@ extern void i915_redisable_vga_power_on(struct drm_i915_private *dev_priv);
 extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val);
 extern void intel_init_pch_refclk(struct drm_i915_private *dev_priv);
 extern int intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
+extern void intel_rps_set_power(struct drm_i915_private *dev_priv,
+				int new_power);
 extern bool intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
 				  bool enable);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 7998e70a3174..1d1643285cee 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
 	}
 
+	/*
+	 * We declare pageflips to be interactive and so merit a small bias
+	 * towards upclocking to deliver the frame on time. By only changing
+	 * the RPS thresholds to sample more regularly and aim for higher
+	 * clocks we can hopefully deliver low power workloads (like kodi)
+	 * that are not quite steady state without resorting to forcing
+	 * maximum clocks following a vblank miss (see do_rps_boost()).
+	 */
+	if (!intel_state->rps_override) {
+		intel_rps_set_power(dev_priv, HIGH_POWER);
+		intel_state->rps_override = true;
+	}
+
 	return 0;
 }
 
@@ -13120,8 +13133,15 @@ void
 intel_cleanup_plane_fb(struct drm_plane *plane,
 		       struct drm_plane_state *old_state)
 {
+	struct intel_atomic_state *intel_state =
+		to_intel_atomic_state(old_state->state);
 	struct drm_i915_private *dev_priv = to_i915(plane->dev);
 
+	if (intel_state->rps_override) {
+		intel_rps_set_power(dev_priv, AUTO_POWER);
+		intel_state->rps_override = false;
+	}
+
 	/* Should only be called after a successful intel_prepare_plane_fb()! */
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	intel_plane_unpin_fb(to_intel_plane_state(old_state));
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 61e715ddd0d5..86685c43e1d2 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -482,6 +482,8 @@ struct intel_atomic_state {
 	 */
 	bool skip_intermediate_wm;
 
+	bool rps_override;
+
 	/* Gen9+ only */
 	struct skl_ddb_values wm_results;
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 53aaaa3e6886..4a4a9ce6d536 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6264,41 +6264,14 @@ static u32 intel_rps_limits(struct drm_i915_private *dev_priv, u8 val)
 	return limits;
 }
 
-static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+static void rps_set_power(struct drm_i915_private *dev_priv, int new_power)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	int new_power;
 	u32 threshold_up = 0, threshold_down = 0; /* in % */
 	u32 ei_up = 0, ei_down = 0;
 
-	new_power = rps->power;
-	switch (rps->power) {
-	case LOW_POWER:
-		if (val > rps->efficient_freq + 1 &&
-		    val > rps->cur_freq)
-			new_power = BETWEEN;
-		break;
+	lockdep_assert_held(&rps->power_lock);
 
-	case BETWEEN:
-		if (val <= rps->efficient_freq &&
-		    val < rps->cur_freq)
-			new_power = LOW_POWER;
-		else if (val >= rps->rp0_freq &&
-			 val > rps->cur_freq)
-			new_power = HIGH_POWER;
-		break;
-
-	case HIGH_POWER:
-		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
-		    val < rps->cur_freq)
-			new_power = BETWEEN;
-		break;
-	}
-	/* Max/min bins are special */
-	if (val <= rps->min_freq_softlimit)
-		new_power = LOW_POWER;
-	if (val >= rps->max_freq_softlimit)
-		new_power = HIGH_POWER;
 	if (new_power == rps->power)
 		return;
 
@@ -6365,9 +6338,68 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 	rps->power = new_power;
 	rps->up_threshold = threshold_up;
 	rps->down_threshold = threshold_down;
+}
+
+static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+	int new_power;
+
+	new_power = rps->power;
+	switch (rps->power) {
+	case AUTO_POWER:
+	case LOW_POWER:
+		if (val > rps->efficient_freq + 1 &&
+		    val > rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+
+	case BETWEEN:
+		if (val <= rps->efficient_freq &&
+		    val < rps->cur_freq)
+			new_power = LOW_POWER;
+		else if (val >= rps->rp0_freq &&
+			 val > rps->cur_freq)
+			new_power = HIGH_POWER;
+		break;
+
+	case HIGH_POWER:
+		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
+		    val < rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+	}
+	/* Max/min bins are special */
+	if (val <= rps->min_freq_softlimit)
+		new_power = LOW_POWER;
+	if (val >= rps->max_freq_softlimit)
+		new_power = HIGH_POWER;
+
+	mutex_lock(&rps->power_lock);
+	if (!rps->power_override)
+		rps_set_power(dev_priv, new_power);
+	mutex_unlock(&rps->power_lock);
 	rps->last_adj = 0;
 }
 
+void intel_rps_set_power(struct drm_i915_private *dev_priv, int power)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	if (INTEL_GEN(dev_priv) < 6)
+		return;
+
+	mutex_lock(&rps->power_lock);
+	if (power != AUTO_POWER) {
+		if (!rps->power_override++)
+			rps_set_power(dev_priv, power);
+		GEM_BUG_ON(rps->power != power);
+	} else {
+		rps->power_override--;
+	}
+	mutex_unlock(&rps->power_lock);
+}
+
 static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
@@ -9604,6 +9636,7 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
 void intel_pm_setup(struct drm_i915_private *dev_priv)
 {
 	mutex_init(&dev_priv->pcu_lock);
+	mutex_init(&dev_priv->gt_pm.rps.power_lock);
 
 	atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0);
 
-- 
2.18.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Interactive RPS mode
  2018-07-11 21:01 [PATCH] drm/i915: Interactive RPS mode Chris Wilson
@ 2018-07-11 22:06 ` Patchwork
  2018-07-11 22:07 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2018-07-11 22:06 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Interactive RPS mode
URL   : https://patchwork.freedesktop.org/series/46334/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
f46e1cca8c23 drm/i915: Interactive RPS mode
-:24: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")'
#24: 
full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we

-:69: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#69: FILE: drivers/gpu/drm/i915/i915_drv.h:788:
+	struct mutex power_lock;

-:77: CHECK:AVOID_EXTERNS: extern prototypes should be avoided in .h files
#77: FILE: drivers/gpu/drm/i915/i915_drv.h:3434:
+extern void intel_rps_set_power(struct drm_i915_private *dev_priv,

total: 1 errors, 0 warnings, 2 checks, 185 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* ✗ Fi.CI.SPARSE: warning for drm/i915: Interactive RPS mode
  2018-07-11 21:01 [PATCH] drm/i915: Interactive RPS mode Chris Wilson
  2018-07-11 22:06 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
@ 2018-07-11 22:07 ` Patchwork
  2018-07-11 22:23 ` ✗ Fi.CI.BAT: failure " Patchwork
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2018-07-11 22:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Interactive RPS mode
URL   : https://patchwork.freedesktop.org/series/46334/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Interactive RPS mode
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3652:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3656:16: warning: expression using sizeof(void)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* ✗ Fi.CI.BAT: failure for drm/i915: Interactive RPS mode
  2018-07-11 21:01 [PATCH] drm/i915: Interactive RPS mode Chris Wilson
  2018-07-11 22:06 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
  2018-07-11 22:07 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-07-11 22:23 ` Patchwork
  2018-07-12  7:50 ` [PATCH v3] " Chris Wilson
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2018-07-11 22:23 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Interactive RPS mode
URL   : https://patchwork.freedesktop.org/series/46334/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4473 -> Patchwork_9617 =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9617 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9617, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/46334/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9617:

  === IGT changes ===

    ==== Possible regressions ====

    igt@drv_module_reload@basic-no-display:
      fi-snb-2520m:       PASS -> DMESG-WARN +3
      fi-ivb-3520m:       PASS -> DMESG-WARN +3
      fi-glk-j4005:       PASS -> DMESG-WARN +3
      fi-bdw-5557u:       PASS -> DMESG-WARN +2
      fi-hsw-4770r:       PASS -> DMESG-WARN +2

    igt@drv_module_reload@basic-reload:
      fi-kbl-7500u:       PASS -> DMESG-WARN +3

    igt@drv_module_reload@basic-reload-inject:
      fi-snb-2600:        PASS -> DMESG-WARN +3
      {fi-cfl-8109u}:     PASS -> DMESG-WARN +3

    igt@gem_mmap_gtt@basic-copy:
      {fi-kbl-8809g}:     NOTRUN -> DMESG-WARN +9

    igt@kms_busy@basic-flip-a:
      fi-kbl-7567u:       PASS -> DMESG-WARN +3
      fi-whl-u:           PASS -> DMESG-WARN
      fi-skl-6700k2:      PASS -> DMESG-WARN
      fi-skl-6770hq:      PASS -> DMESG-WARN
      fi-bxt-j4205:       PASS -> DMESG-WARN
      fi-skl-6260u:       PASS -> DMESG-WARN
      fi-skl-gvtdvm:      PASS -> DMESG-WARN
      fi-bdw-gvtdvm:      PASS -> DMESG-WARN
      fi-bxt-dsi:         PASS -> DMESG-WARN
      fi-cfl-8700k:       PASS -> DMESG-WARN
      fi-ivb-3770:        PASS -> DMESG-WARN
      fi-skl-6700hq:      PASS -> DMESG-WARN
      fi-glk-dsi:         PASS -> DMESG-WARN

    igt@kms_busy@basic-flip-c:
      fi-bsw-n3050:       PASS -> DMESG-WARN +3

    
== Known issues ==

  Here are the changes found in Patchwork_9617 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_busy@basic-flip-a:
      fi-skl-guc:         NOTRUN -> DMESG-WARN (fdo#105710, fdo#106679)
      fi-cfl-guc:         PASS -> DMESG-WARN (fdo#106679)

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-bxt-dsi:         PASS -> INCOMPLETE (fdo#103927)

    
    ==== Possible fixes ====

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-skl-6700k2:      FAIL (fdo#103191) -> PASS

    
    ==== Warnings ====

    igt@drv_module_reload@basic-no-display:
      {fi-skl-iommu}:     FAIL (fdo#106066) -> DMESG-FAIL (fdo#107189) +2

    igt@gem_exec_suspend@basic-s4-devices:
      {fi-kbl-8809g}:     INCOMPLETE (fdo#107139) -> DMESG-WARN (fdo#107139)

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#105710 https://bugs.freedesktop.org/show_bug.cgi?id=105710
  fdo#106066 https://bugs.freedesktop.org/show_bug.cgi?id=106066
  fdo#106679 https://bugs.freedesktop.org/show_bug.cgi?id=106679
  fdo#107139 https://bugs.freedesktop.org/show_bug.cgi?id=107139
  fdo#107189 https://bugs.freedesktop.org/show_bug.cgi?id=107189


== Participating hosts (46 -> 41) ==

  Additional (1): fi-skl-guc 
  Missing    (6): fi-ilk-m540 fi-hsw-4200u fi-byt-j1900 fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 


== Build changes ==

    * Linux: CI_DRM_4473 -> Patchwork_9617

  CI_DRM_4473: 17c95b123220c6896ac9cf99c78be4709da704d4 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4549: e19fd5549e9cf603251704117fc64f4068be5016 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9617: f46e1cca8c23270f08968290b627b74e7f65545c @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

f46e1cca8c23 drm/i915: Interactive RPS mode

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9617/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3] drm/i915: Interactive RPS mode
  2018-07-11 21:01 [PATCH] drm/i915: Interactive RPS mode Chris Wilson
                   ` (2 preceding siblings ...)
  2018-07-11 22:23 ` ✗ Fi.CI.BAT: failure " Patchwork
@ 2018-07-12  7:50 ` Chris Wilson
  2018-07-12  8:02 ` [PATCH v4] " Chris Wilson
  2018-07-23 10:01 ` [PATCH] drm/i915: Interactive RPS mode Chris Wilson
  5 siblings, 0 replies; 26+ messages in thread
From: Chris Wilson @ 2018-07-12  7:50 UTC (permalink / raw)
  To: intel-gfx

RPS provides a feedback loop where we use the load during the previous
evaluation interval to decide whether to up or down clock the GPU
frequency. Our responsiveness is split into 3 regimes, a high and low
plateau with the intent to keep the gpu clocked high to cover occasional
stalls under high load, and low despite occasional glitches under steady
low load, and inbetween. However, we run into situations like kodi where
we want to stay at low power (video decoding is done efficiently
inside the fixed function HW and doesn't need high clocks even for high
bitrate streams), but just occasionally the pipeline is more complex
than a video decode and we need a smidgen of extra GPU power to present
on time. In the high power regime, we sample at sub frame intervals with
a bias to upclocking, and conversely at low power we sample over a few
frames worth to provide what we consider to be the right levels of
responsiveness respectively. At low power, we more or less expect to be
kicked out to high power at the start of a busy sequence by waitboosting.

Prior to commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active
request") whenever we missed the frame or stalled, we would immediate go
full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we
relaxed the waitboosting to only apply if the pipeline was deep to avoid
over-committing resources for a near miss. Sadly though, a near miss is
still a miss, and perceptible as jitter in the frame delivery.

To try and prevent the near miss before having to resort to boosting
after the fact, we use the pageflip queue as an indication that we are
in an "interactive" regime and so should sample the load more frequently
to provide power before the frame misses it vblank. This will make us
more favorable to providing a small power increase (one or two bins) as
required rather than going all the way to maximum and then having to
work back down again. (We still keep the waitboosting mechanism around
just in case a dramatic change in system load requires urgent uplocking,
faster than we can provide in a few evaluation intervals.)

v2: Reduce rps_set_interactive to a boolean parameter to avoid the
confusion of what if they wanted a new power mode after pinning to a
different mode (which to choose?)
v3: Only reprogram RPS while the GT is awake, it will be set when we
wake the GT, and while off warns about being used outside of rpm.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
Fixes: e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  1 +
 drivers/gpu/drm/i915/i915_drv.h      |  4 ++
 drivers/gpu/drm/i915/intel_display.c | 20 +++++++
 drivers/gpu/drm/i915/intel_drv.h     |  2 +
 drivers/gpu/drm/i915/intel_pm.c      | 89 +++++++++++++++++++---------
 5 files changed, 87 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 099f97ef2303..4d3daed1e6d6 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2218,6 +2218,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Boosts outstanding? %d\n",
 		   atomic_read(&rps->num_waiters));
+	seq_printf(m, "Power overrides? %d\n", READ_ONCE(rps->power_override));
 	seq_printf(m, "Frequency requested %d\n",
 		   intel_gpu_freq(dev_priv, rps->cur_freq));
 	seq_printf(m, "  min hard:%d, soft:%d; max soft:%d, hard:%d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 01dd29837233..afbc9c337852 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -784,6 +784,8 @@ struct intel_rps {
 
 	int last_adj;
 	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
+	unsigned int power_override;
+	struct mutex power_lock;
 
 	bool enabled;
 	atomic_t num_waiters;
@@ -3429,6 +3431,8 @@ extern void i915_redisable_vga_power_on(struct drm_i915_private *dev_priv);
 extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val);
 extern void intel_init_pch_refclk(struct drm_i915_private *dev_priv);
 extern int intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
+extern void intel_rps_set_interactive(struct drm_i915_private *dev_priv,
+				      bool state);
 extern bool intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
 				  bool enable);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 7998e70a3174..5809366ff9f0 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
 	}
 
+	/*
+	 * We declare pageflips to be interactive and so merit a small bias
+	 * towards upclocking to deliver the frame on time. By only changing
+	 * the RPS thresholds to sample more regularly and aim for higher
+	 * clocks we can hopefully deliver low power workloads (like kodi)
+	 * that are not quite steady state without resorting to forcing
+	 * maximum clocks following a vblank miss (see do_rps_boost()).
+	 */
+	if (!intel_state->rps_interactive) {
+		intel_rps_set_interactive(dev_priv, true);
+		intel_state->rps_interactive = true;
+	}
+
 	return 0;
 }
 
@@ -13120,8 +13133,15 @@ void
 intel_cleanup_plane_fb(struct drm_plane *plane,
 		       struct drm_plane_state *old_state)
 {
+	struct intel_atomic_state *intel_state =
+		to_intel_atomic_state(old_state->state);
 	struct drm_i915_private *dev_priv = to_i915(plane->dev);
 
+	if (intel_state->rps_interactive) {
+		intel_rps_set_interactive(dev_priv, false);
+		intel_state->rps_interactive = false;
+	}
+
 	/* Should only be called after a successful intel_prepare_plane_fb()! */
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	intel_plane_unpin_fb(to_intel_plane_state(old_state));
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 61e715ddd0d5..544812488821 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -482,6 +482,8 @@ struct intel_atomic_state {
 	 */
 	bool skip_intermediate_wm;
 
+	bool rps_interactive;
+
 	/* Gen9+ only */
 	struct skl_ddb_values wm_results;
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 53aaaa3e6886..edc309fec5e1 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6264,41 +6264,14 @@ static u32 intel_rps_limits(struct drm_i915_private *dev_priv, u8 val)
 	return limits;
 }
 
-static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+static void rps_set_power(struct drm_i915_private *dev_priv, int new_power)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	int new_power;
 	u32 threshold_up = 0, threshold_down = 0; /* in % */
 	u32 ei_up = 0, ei_down = 0;
 
-	new_power = rps->power;
-	switch (rps->power) {
-	case LOW_POWER:
-		if (val > rps->efficient_freq + 1 &&
-		    val > rps->cur_freq)
-			new_power = BETWEEN;
-		break;
+	lockdep_assert_held(&rps->power_lock);
 
-	case BETWEEN:
-		if (val <= rps->efficient_freq &&
-		    val < rps->cur_freq)
-			new_power = LOW_POWER;
-		else if (val >= rps->rp0_freq &&
-			 val > rps->cur_freq)
-			new_power = HIGH_POWER;
-		break;
-
-	case HIGH_POWER:
-		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
-		    val < rps->cur_freq)
-			new_power = BETWEEN;
-		break;
-	}
-	/* Max/min bins are special */
-	if (val <= rps->min_freq_softlimit)
-		new_power = LOW_POWER;
-	if (val >= rps->max_freq_softlimit)
-		new_power = HIGH_POWER;
 	if (new_power == rps->power)
 		return;
 
@@ -6365,9 +6338,66 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 	rps->power = new_power;
 	rps->up_threshold = threshold_up;
 	rps->down_threshold = threshold_down;
+}
+
+static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+	int new_power;
+
+	new_power = rps->power;
+	switch (rps->power) {
+	case LOW_POWER:
+		if (val > rps->efficient_freq + 1 &&
+		    val > rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+
+	case BETWEEN:
+		if (val <= rps->efficient_freq &&
+		    val < rps->cur_freq)
+			new_power = LOW_POWER;
+		else if (val >= rps->rp0_freq &&
+			 val > rps->cur_freq)
+			new_power = HIGH_POWER;
+		break;
+
+	case HIGH_POWER:
+		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
+		    val < rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+	}
+	/* Max/min bins are special */
+	if (val <= rps->min_freq_softlimit)
+		new_power = LOW_POWER;
+	if (val >= rps->max_freq_softlimit)
+		new_power = HIGH_POWER;
+
+	mutex_lock(&rps->power_lock);
+	if (!rps->power_override)
+		rps_set_power(dev_priv, new_power);
+	mutex_unlock(&rps->power_lock);
 	rps->last_adj = 0;
 }
 
+void intel_rps_set_interactive(struct drm_i915_private *dev_priv, bool state)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	if (INTEL_GEN(dev_priv) < 6)
+		return;
+
+	mutex_lock(&rps->power_lock);
+	if (state) {
+		if (!rps->power_override++ && READ_ONCE(dev_priv->gt.awake))
+			rps_set_power(dev_priv, HIGH_POWER);
+	} else {
+		rps->power_override--;
+	}
+	mutex_unlock(&rps->power_lock);
+}
+
 static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
@@ -9604,6 +9634,7 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
 void intel_pm_setup(struct drm_i915_private *dev_priv)
 {
 	mutex_init(&dev_priv->pcu_lock);
+	mutex_init(&dev_priv->gt_pm.rps.power_lock);
 
 	atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0);
 
-- 
2.18.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4] drm/i915: Interactive RPS mode
  2018-07-11 21:01 [PATCH] drm/i915: Interactive RPS mode Chris Wilson
                   ` (3 preceding siblings ...)
  2018-07-12  7:50 ` [PATCH v3] " Chris Wilson
@ 2018-07-12  8:02 ` Chris Wilson
  2018-07-12 17:38   ` [RESEND] " Chris Wilson
                     ` (3 more replies)
  2018-07-23 10:01 ` [PATCH] drm/i915: Interactive RPS mode Chris Wilson
  5 siblings, 4 replies; 26+ messages in thread
From: Chris Wilson @ 2018-07-12  8:02 UTC (permalink / raw)
  To: intel-gfx

RPS provides a feedback loop where we use the load during the previous
evaluation interval to decide whether to up or down clock the GPU
frequency. Our responsiveness is split into 3 regimes, a high and low
plateau with the intent to keep the gpu clocked high to cover occasional
stalls under high load, and low despite occasional glitches under steady
low load, and inbetween. However, we run into situations like kodi where
we want to stay at low power (video decoding is done efficiently
inside the fixed function HW and doesn't need high clocks even for high
bitrate streams), but just occasionally the pipeline is more complex
than a video decode and we need a smidgen of extra GPU power to present
on time. In the high power regime, we sample at sub frame intervals with
a bias to upclocking, and conversely at low power we sample over a few
frames worth to provide what we consider to be the right levels of
responsiveness respectively. At low power, we more or less expect to be
kicked out to high power at the start of a busy sequence by waitboosting.

Prior to commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active
request") whenever we missed the frame or stalled, we would immediate go
full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we
relaxed the waitboosting to only apply if the pipeline was deep to avoid
over-committing resources for a near miss. Sadly though, a near miss is
still a miss, and perceptible as jitter in the frame delivery.

To try and prevent the near miss before having to resort to boosting
after the fact, we use the pageflip queue as an indication that we are
in an "interactive" regime and so should sample the load more frequently
to provide power before the frame misses it vblank. This will make us
more favorable to providing a small power increase (one or two bins) as
required rather than going all the way to maximum and then having to
work back down again. (We still keep the waitboosting mechanism around
just in case a dramatic change in system load requires urgent uplocking,
faster than we can provide in a few evaluation intervals.)

v2: Reduce rps_set_interactive to a boolean parameter to avoid the
confusion of what if they wanted a new power mode after pinning to a
different mode (which to choose?)
v3: Only reprogram RPS while the GT is awake, it will be set when we
wake the GT, and while off warns about being used outside of rpm.
v4: Fix deferred application of interactive mode

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
Fixes: e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  1 +
 drivers/gpu/drm/i915/i915_drv.h      |  4 ++
 drivers/gpu/drm/i915/intel_display.c | 20 ++++++
 drivers/gpu/drm/i915/intel_drv.h     |  2 +
 drivers/gpu/drm/i915/intel_pm.c      | 91 +++++++++++++++++++---------
 5 files changed, 89 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 099f97ef2303..ac019bb927d0 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2218,6 +2218,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Boosts outstanding? %d\n",
 		   atomic_read(&rps->num_waiters));
+	seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->interactive));
 	seq_printf(m, "Frequency requested %d\n",
 		   intel_gpu_freq(dev_priv, rps->cur_freq));
 	seq_printf(m, "  min hard:%d, soft:%d; max soft:%d, hard:%d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 01dd29837233..f02fbeee553f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -784,6 +784,8 @@ struct intel_rps {
 
 	int last_adj;
 	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
+	unsigned int interactive;
+	struct mutex power_lock;
 
 	bool enabled;
 	atomic_t num_waiters;
@@ -3429,6 +3431,8 @@ extern void i915_redisable_vga_power_on(struct drm_i915_private *dev_priv);
 extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val);
 extern void intel_init_pch_refclk(struct drm_i915_private *dev_priv);
 extern int intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
+extern void intel_rps_set_interactive(struct drm_i915_private *dev_priv,
+				      bool state);
 extern bool intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
 				  bool enable);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 7998e70a3174..5809366ff9f0 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
 	}
 
+	/*
+	 * We declare pageflips to be interactive and so merit a small bias
+	 * towards upclocking to deliver the frame on time. By only changing
+	 * the RPS thresholds to sample more regularly and aim for higher
+	 * clocks we can hopefully deliver low power workloads (like kodi)
+	 * that are not quite steady state without resorting to forcing
+	 * maximum clocks following a vblank miss (see do_rps_boost()).
+	 */
+	if (!intel_state->rps_interactive) {
+		intel_rps_set_interactive(dev_priv, true);
+		intel_state->rps_interactive = true;
+	}
+
 	return 0;
 }
 
@@ -13120,8 +13133,15 @@ void
 intel_cleanup_plane_fb(struct drm_plane *plane,
 		       struct drm_plane_state *old_state)
 {
+	struct intel_atomic_state *intel_state =
+		to_intel_atomic_state(old_state->state);
 	struct drm_i915_private *dev_priv = to_i915(plane->dev);
 
+	if (intel_state->rps_interactive) {
+		intel_rps_set_interactive(dev_priv, false);
+		intel_state->rps_interactive = false;
+	}
+
 	/* Should only be called after a successful intel_prepare_plane_fb()! */
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	intel_plane_unpin_fb(to_intel_plane_state(old_state));
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 61e715ddd0d5..544812488821 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -482,6 +482,8 @@ struct intel_atomic_state {
 	 */
 	bool skip_intermediate_wm;
 
+	bool rps_interactive;
+
 	/* Gen9+ only */
 	struct skl_ddb_values wm_results;
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 53aaaa3e6886..01475bf3196a 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6264,41 +6264,14 @@ static u32 intel_rps_limits(struct drm_i915_private *dev_priv, u8 val)
 	return limits;
 }
 
-static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+static void rps_set_power(struct drm_i915_private *dev_priv, int new_power)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	int new_power;
 	u32 threshold_up = 0, threshold_down = 0; /* in % */
 	u32 ei_up = 0, ei_down = 0;
 
-	new_power = rps->power;
-	switch (rps->power) {
-	case LOW_POWER:
-		if (val > rps->efficient_freq + 1 &&
-		    val > rps->cur_freq)
-			new_power = BETWEEN;
-		break;
+	lockdep_assert_held(&rps->power_lock);
 
-	case BETWEEN:
-		if (val <= rps->efficient_freq &&
-		    val < rps->cur_freq)
-			new_power = LOW_POWER;
-		else if (val >= rps->rp0_freq &&
-			 val > rps->cur_freq)
-			new_power = HIGH_POWER;
-		break;
-
-	case HIGH_POWER:
-		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
-		    val < rps->cur_freq)
-			new_power = BETWEEN;
-		break;
-	}
-	/* Max/min bins are special */
-	if (val <= rps->min_freq_softlimit)
-		new_power = LOW_POWER;
-	if (val >= rps->max_freq_softlimit)
-		new_power = HIGH_POWER;
 	if (new_power == rps->power)
 		return;
 
@@ -6365,9 +6338,68 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 	rps->power = new_power;
 	rps->up_threshold = threshold_up;
 	rps->down_threshold = threshold_down;
+}
+
+static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+	int new_power;
+
+	new_power = rps->power;
+	switch (rps->power) {
+	case LOW_POWER:
+		if (val > rps->efficient_freq + 1 &&
+		    val > rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+
+	case BETWEEN:
+		if (val <= rps->efficient_freq &&
+		    val < rps->cur_freq)
+			new_power = LOW_POWER;
+		else if (val >= rps->rp0_freq &&
+			 val > rps->cur_freq)
+			new_power = HIGH_POWER;
+		break;
+
+	case HIGH_POWER:
+		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
+		    val < rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+	}
+	/* Max/min bins are special */
+	if (val <= rps->min_freq_softlimit)
+		new_power = LOW_POWER;
+	if (val >= rps->max_freq_softlimit)
+		new_power = HIGH_POWER;
+
+	mutex_lock(&rps->power_lock);
+	if (rps->interactive)
+		new_power = HIGH_POWER;
+	rps_set_power(dev_priv, new_power);
+	mutex_unlock(&rps->power_lock);
 	rps->last_adj = 0;
 }
 
+void intel_rps_set_interactive(struct drm_i915_private *dev_priv, bool state)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	if (INTEL_GEN(dev_priv) < 6)
+		return;
+
+	mutex_lock(&rps->power_lock);
+	if (state) {
+		if (!rps->interactive++ && READ_ONCE(dev_priv->gt.awake))
+			rps_set_power(dev_priv, HIGH_POWER);
+	} else {
+		GEM_BUG_ON(!rps->interactive);
+		rps->interactive--;
+	}
+	mutex_unlock(&rps->power_lock);
+}
+
 static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
@@ -9604,6 +9636,7 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
 void intel_pm_setup(struct drm_i915_private *dev_priv)
 {
 	mutex_init(&dev_priv->pcu_lock);
+	mutex_init(&dev_priv->gt_pm.rps.power_lock);
 
 	atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0);
 
-- 
2.18.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RESEND] drm/i915: Interactive RPS mode
  2018-07-12  8:02 ` [PATCH v4] " Chris Wilson
@ 2018-07-12 17:38   ` Chris Wilson
  2018-07-20 12:49     ` Tvrtko Ursulin
  2018-07-12 17:58   ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Interactive RPS mode (rev4) Patchwork
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2018-07-12 17:38 UTC (permalink / raw)
  To: intel-gfx

RPS provides a feedback loop where we use the load during the previous
evaluation interval to decide whether to up or down clock the GPU
frequency. Our responsiveness is split into 3 regimes, a high and low
plateau with the intent to keep the gpu clocked high to cover occasional
stalls under high load, and low despite occasional glitches under steady
low load, and inbetween. However, we run into situations like kodi where
we want to stay at low power (video decoding is done efficiently
inside the fixed function HW and doesn't need high clocks even for high
bitrate streams), but just occasionally the pipeline is more complex
than a video decode and we need a smidgen of extra GPU power to present
on time. In the high power regime, we sample at sub frame intervals with
a bias to upclocking, and conversely at low power we sample over a few
frames worth to provide what we consider to be the right levels of
responsiveness respectively. At low power, we more or less expect to be
kicked out to high power at the start of a busy sequence by waitboosting.

Prior to commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active
request") whenever we missed the frame or stalled, we would immediate go
full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we
relaxed the waitboosting to only apply if the pipeline was deep to avoid
over-committing resources for a near miss. Sadly though, a near miss is
still a miss, and perceptible as jitter in the frame delivery.

To try and prevent the near miss before having to resort to boosting
after the fact, we use the pageflip queue as an indication that we are
in an "interactive" regime and so should sample the load more frequently
to provide power before the frame misses it vblank. This will make us
more favorable to providing a small power increase (one or two bins) as
required rather than going all the way to maximum and then having to
work back down again. (We still keep the waitboosting mechanism around
just in case a dramatic change in system load requires urgent uplocking,
faster than we can provide in a few evaluation intervals.)

v2: Reduce rps_set_interactive to a boolean parameter to avoid the
confusion of what if they wanted a new power mode after pinning to a
different mode (which to choose?)
v3: Only reprogram RPS while the GT is awake, it will be set when we
wake the GT, and while off warns about being used outside of rpm.
v4: Fix deferred application of interactive mode

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
Fixes: e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  1 +
 drivers/gpu/drm/i915/i915_drv.h      |  4 ++
 drivers/gpu/drm/i915/intel_display.c | 20 ++++++
 drivers/gpu/drm/i915/intel_drv.h     |  2 +
 drivers/gpu/drm/i915/intel_pm.c      | 91 +++++++++++++++++++---------
 5 files changed, 89 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 099f97ef2303..ac019bb927d0 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2218,6 +2218,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Boosts outstanding? %d\n",
 		   atomic_read(&rps->num_waiters));
+	seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->interactive));
 	seq_printf(m, "Frequency requested %d\n",
 		   intel_gpu_freq(dev_priv, rps->cur_freq));
 	seq_printf(m, "  min hard:%d, soft:%d; max soft:%d, hard:%d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 01dd29837233..f02fbeee553f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -784,6 +784,8 @@ struct intel_rps {
 
 	int last_adj;
 	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
+	unsigned int interactive;
+	struct mutex power_lock;
 
 	bool enabled;
 	atomic_t num_waiters;
@@ -3429,6 +3431,8 @@ extern void i915_redisable_vga_power_on(struct drm_i915_private *dev_priv);
 extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val);
 extern void intel_init_pch_refclk(struct drm_i915_private *dev_priv);
 extern int intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
+extern void intel_rps_set_interactive(struct drm_i915_private *dev_priv,
+				      bool state);
 extern bool intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
 				  bool enable);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 7998e70a3174..5809366ff9f0 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
 	}
 
+	/*
+	 * We declare pageflips to be interactive and so merit a small bias
+	 * towards upclocking to deliver the frame on time. By only changing
+	 * the RPS thresholds to sample more regularly and aim for higher
+	 * clocks we can hopefully deliver low power workloads (like kodi)
+	 * that are not quite steady state without resorting to forcing
+	 * maximum clocks following a vblank miss (see do_rps_boost()).
+	 */
+	if (!intel_state->rps_interactive) {
+		intel_rps_set_interactive(dev_priv, true);
+		intel_state->rps_interactive = true;
+	}
+
 	return 0;
 }
 
@@ -13120,8 +13133,15 @@ void
 intel_cleanup_plane_fb(struct drm_plane *plane,
 		       struct drm_plane_state *old_state)
 {
+	struct intel_atomic_state *intel_state =
+		to_intel_atomic_state(old_state->state);
 	struct drm_i915_private *dev_priv = to_i915(plane->dev);
 
+	if (intel_state->rps_interactive) {
+		intel_rps_set_interactive(dev_priv, false);
+		intel_state->rps_interactive = false;
+	}
+
 	/* Should only be called after a successful intel_prepare_plane_fb()! */
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	intel_plane_unpin_fb(to_intel_plane_state(old_state));
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 61e715ddd0d5..544812488821 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -482,6 +482,8 @@ struct intel_atomic_state {
 	 */
 	bool skip_intermediate_wm;
 
+	bool rps_interactive;
+
 	/* Gen9+ only */
 	struct skl_ddb_values wm_results;
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 53aaaa3e6886..01475bf3196a 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6264,41 +6264,14 @@ static u32 intel_rps_limits(struct drm_i915_private *dev_priv, u8 val)
 	return limits;
 }
 
-static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+static void rps_set_power(struct drm_i915_private *dev_priv, int new_power)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	int new_power;
 	u32 threshold_up = 0, threshold_down = 0; /* in % */
 	u32 ei_up = 0, ei_down = 0;
 
-	new_power = rps->power;
-	switch (rps->power) {
-	case LOW_POWER:
-		if (val > rps->efficient_freq + 1 &&
-		    val > rps->cur_freq)
-			new_power = BETWEEN;
-		break;
+	lockdep_assert_held(&rps->power_lock);
 
-	case BETWEEN:
-		if (val <= rps->efficient_freq &&
-		    val < rps->cur_freq)
-			new_power = LOW_POWER;
-		else if (val >= rps->rp0_freq &&
-			 val > rps->cur_freq)
-			new_power = HIGH_POWER;
-		break;
-
-	case HIGH_POWER:
-		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
-		    val < rps->cur_freq)
-			new_power = BETWEEN;
-		break;
-	}
-	/* Max/min bins are special */
-	if (val <= rps->min_freq_softlimit)
-		new_power = LOW_POWER;
-	if (val >= rps->max_freq_softlimit)
-		new_power = HIGH_POWER;
 	if (new_power == rps->power)
 		return;
 
@@ -6365,9 +6338,68 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 	rps->power = new_power;
 	rps->up_threshold = threshold_up;
 	rps->down_threshold = threshold_down;
+}
+
+static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+	int new_power;
+
+	new_power = rps->power;
+	switch (rps->power) {
+	case LOW_POWER:
+		if (val > rps->efficient_freq + 1 &&
+		    val > rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+
+	case BETWEEN:
+		if (val <= rps->efficient_freq &&
+		    val < rps->cur_freq)
+			new_power = LOW_POWER;
+		else if (val >= rps->rp0_freq &&
+			 val > rps->cur_freq)
+			new_power = HIGH_POWER;
+		break;
+
+	case HIGH_POWER:
+		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
+		    val < rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+	}
+	/* Max/min bins are special */
+	if (val <= rps->min_freq_softlimit)
+		new_power = LOW_POWER;
+	if (val >= rps->max_freq_softlimit)
+		new_power = HIGH_POWER;
+
+	mutex_lock(&rps->power_lock);
+	if (rps->interactive)
+		new_power = HIGH_POWER;
+	rps_set_power(dev_priv, new_power);
+	mutex_unlock(&rps->power_lock);
 	rps->last_adj = 0;
 }
 
+void intel_rps_set_interactive(struct drm_i915_private *dev_priv, bool state)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	if (INTEL_GEN(dev_priv) < 6)
+		return;
+
+	mutex_lock(&rps->power_lock);
+	if (state) {
+		if (!rps->interactive++ && READ_ONCE(dev_priv->gt.awake))
+			rps_set_power(dev_priv, HIGH_POWER);
+	} else {
+		GEM_BUG_ON(!rps->interactive);
+		rps->interactive--;
+	}
+	mutex_unlock(&rps->power_lock);
+}
+
 static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
@@ -9604,6 +9636,7 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
 void intel_pm_setup(struct drm_i915_private *dev_priv)
 {
 	mutex_init(&dev_priv->pcu_lock);
+	mutex_init(&dev_priv->gt_pm.rps.power_lock);
 
 	atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0);
 
-- 
2.18.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Interactive RPS mode (rev4)
  2018-07-12  8:02 ` [PATCH v4] " Chris Wilson
  2018-07-12 17:38   ` [RESEND] " Chris Wilson
@ 2018-07-12 17:58   ` Patchwork
  2018-07-12 17:59   ` ✗ Fi.CI.SPARSE: " Patchwork
  2018-07-12 18:15   ` ✓ Fi.CI.BAT: success " Patchwork
  3 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2018-07-12 17:58 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Interactive RPS mode (rev4)
URL   : https://patchwork.freedesktop.org/series/46334/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
bfc0fb4a0c59 drm/i915: Interactive RPS mode
-:24: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")'
#24: 
full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we

-:74: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#74: FILE: drivers/gpu/drm/i915/i915_drv.h:788:
+	struct mutex power_lock;

-:82: CHECK:AVOID_EXTERNS: extern prototypes should be avoided in .h files
#82: FILE: drivers/gpu/drm/i915/i915_drv.h:3437:
+extern void intel_rps_set_interactive(struct drm_i915_private *dev_priv,

total: 1 errors, 0 warnings, 2 checks, 183 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* ✗ Fi.CI.SPARSE: warning for drm/i915: Interactive RPS mode (rev4)
  2018-07-12  8:02 ` [PATCH v4] " Chris Wilson
  2018-07-12 17:38   ` [RESEND] " Chris Wilson
  2018-07-12 17:58   ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Interactive RPS mode (rev4) Patchwork
@ 2018-07-12 17:59   ` Patchwork
  2018-07-12 18:15   ` ✓ Fi.CI.BAT: success " Patchwork
  3 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2018-07-12 17:59 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Interactive RPS mode (rev4)
URL   : https://patchwork.freedesktop.org/series/46334/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Interactive RPS mode
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3655:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3659:16: warning: expression using sizeof(void)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Interactive RPS mode (rev4)
  2018-07-12  8:02 ` [PATCH v4] " Chris Wilson
                     ` (2 preceding siblings ...)
  2018-07-12 17:59   ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-07-12 18:15   ` Patchwork
  3 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2018-07-12 18:15 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Interactive RPS mode (rev4)
URL   : https://patchwork.freedesktop.org/series/46334/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4479 -> Patchwork_9636 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/46334/revisions/4/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9636:

  === IGT changes ===

    ==== Possible regressions ====

    igt@gem_mmap_gtt@basic-copy:
      {fi-kbl-8809g}:     NOTRUN -> DMESG-WARN +9

    
== Known issues ==

  Here are the changes found in Patchwork_9636 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_flip@basic-flip-vs-dpms:
      fi-skl-6700hq:      PASS -> DMESG-WARN (fdo#105998)

    
    ==== Possible fixes ====

    igt@drv_module_reload@basic-reload:
      fi-ilk-650:         DMESG-WARN (fdo#106387) -> PASS +2

    igt@kms_chamelium@hdmi-hpd-fast:
      fi-kbl-7500u:       FAIL (fdo#103841, fdo#102672) -> SKIP

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      fi-snb-2520m:       INCOMPLETE (fdo#103713) -> PASS

    
    ==== Warnings ====

    igt@gem_exec_suspend@basic-s4-devices:
      {fi-kbl-8809g}:     INCOMPLETE (fdo#107139) -> DMESG-WARN (fdo#107139)

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#102672 https://bugs.freedesktop.org/show_bug.cgi?id=102672
  fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#105998 https://bugs.freedesktop.org/show_bug.cgi?id=105998
  fdo#106387 https://bugs.freedesktop.org/show_bug.cgi?id=106387
  fdo#107139 https://bugs.freedesktop.org/show_bug.cgi?id=107139


== Participating hosts (46 -> 42) ==

  Additional (1): fi-byt-j1900 
  Missing    (5): fi-ctg-p8600 fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-hsw-4200u 


== Build changes ==

    * Linux: CI_DRM_4479 -> Patchwork_9636

  CI_DRM_4479: c9c54a1c37adefd414d51ed919aab3f94794ee35 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4553: 9d25d62bb6ff62694e5226b02d79075eab304402 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9636: bfc0fb4a0c59ca1c807cb0f0cd3a872f37c6372d @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

bfc0fb4a0c59 drm/i915: Interactive RPS mode

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9636/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-12 17:38   ` [RESEND] " Chris Wilson
@ 2018-07-20 12:49     ` Tvrtko Ursulin
  2018-07-20 13:02       ` Chris Wilson
  0 siblings, 1 reply; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-07-20 12:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 12/07/2018 18:38, Chris Wilson wrote:
> RPS provides a feedback loop where we use the load during the previous
> evaluation interval to decide whether to up or down clock the GPU
> frequency. Our responsiveness is split into 3 regimes, a high and low
> plateau with the intent to keep the gpu clocked high to cover occasional
> stalls under high load, and low despite occasional glitches under steady
> low load, and inbetween. However, we run into situations like kodi where
> we want to stay at low power (video decoding is done efficiently
> inside the fixed function HW and doesn't need high clocks even for high
> bitrate streams), but just occasionally the pipeline is more complex
> than a video decode and we need a smidgen of extra GPU power to present
> on time. In the high power regime, we sample at sub frame intervals with
> a bias to upclocking, and conversely at low power we sample over a few
> frames worth to provide what we consider to be the right levels of
> responsiveness respectively. At low power, we more or less expect to be
> kicked out to high power at the start of a busy sequence by waitboosting.
> 
> Prior to commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active
> request") whenever we missed the frame or stalled, we would immediate go
> full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we
> relaxed the waitboosting to only apply if the pipeline was deep to avoid
> over-committing resources for a near miss. Sadly though, a near miss is
> still a miss, and perceptible as jitter in the frame delivery.
> 
> To try and prevent the near miss before having to resort to boosting
> after the fact, we use the pageflip queue as an indication that we are
> in an "interactive" regime and so should sample the load more frequently
> to provide power before the frame misses it vblank. This will make us
> more favorable to providing a small power increase (one or two bins) as
> required rather than going all the way to maximum and then having to
> work back down again. (We still keep the waitboosting mechanism around
> just in case a dramatic change in system load requires urgent uplocking,
> faster than we can provide in a few evaluation intervals.)
> 
> v2: Reduce rps_set_interactive to a boolean parameter to avoid the
> confusion of what if they wanted a new power mode after pinning to a
> different mode (which to choose?)
> v3: Only reprogram RPS while the GT is awake, it will be set when we
> wake the GT, and while off warns about being used outside of rpm.
> v4: Fix deferred application of interactive mode
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
> Fixes: e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c  |  1 +
>   drivers/gpu/drm/i915/i915_drv.h      |  4 ++
>   drivers/gpu/drm/i915/intel_display.c | 20 ++++++
>   drivers/gpu/drm/i915/intel_drv.h     |  2 +
>   drivers/gpu/drm/i915/intel_pm.c      | 91 +++++++++++++++++++---------
>   5 files changed, 89 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 099f97ef2303..ac019bb927d0 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2218,6 +2218,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
>   	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
>   	seq_printf(m, "Boosts outstanding? %d\n",
>   		   atomic_read(&rps->num_waiters));
> +	seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->interactive));
>   	seq_printf(m, "Frequency requested %d\n",
>   		   intel_gpu_freq(dev_priv, rps->cur_freq));
>   	seq_printf(m, "  min hard:%d, soft:%d; max soft:%d, hard:%d\n",
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 01dd29837233..f02fbeee553f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -784,6 +784,8 @@ struct intel_rps {
>   
>   	int last_adj;
>   	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
> +	unsigned int interactive;
> +	struct mutex power_lock;
>   
>   	bool enabled;
>   	atomic_t num_waiters;
> @@ -3429,6 +3431,8 @@ extern void i915_redisable_vga_power_on(struct drm_i915_private *dev_priv);
>   extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val);
>   extern void intel_init_pch_refclk(struct drm_i915_private *dev_priv);
>   extern int intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
> +extern void intel_rps_set_interactive(struct drm_i915_private *dev_priv,
> +				      bool state);
>   extern bool intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
>   				  bool enable);
>   
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 7998e70a3174..5809366ff9f0 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
>   		add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
>   	}
>   
> +	/*
> +	 * We declare pageflips to be interactive and so merit a small bias
> +	 * towards upclocking to deliver the frame on time. By only changing
> +	 * the RPS thresholds to sample more regularly and aim for higher
> +	 * clocks we can hopefully deliver low power workloads (like kodi)
> +	 * that are not quite steady state without resorting to forcing
> +	 * maximum clocks following a vblank miss (see do_rps_boost()).
> +	 */
> +	if (!intel_state->rps_interactive) {
> +		intel_rps_set_interactive(dev_priv, true);
> +		intel_state->rps_interactive = true;
> +	}
> +
>   	return 0;
>   }
>   
> @@ -13120,8 +13133,15 @@ void
>   intel_cleanup_plane_fb(struct drm_plane *plane,
>   		       struct drm_plane_state *old_state)
>   {
> +	struct intel_atomic_state *intel_state =
> +		to_intel_atomic_state(old_state->state);
>   	struct drm_i915_private *dev_priv = to_i915(plane->dev);
>   
> +	if (intel_state->rps_interactive) {
> +		intel_rps_set_interactive(dev_priv, false);
> +		intel_state->rps_interactive = false;
> +	}

Why is the rps_intearctive flag needed in plane state? I wanted to 
assume prepare & cleanup hooks are fully symmetric, but this flags makes 
me unsure. A reviewer with more display knowledge will be needed here I 
think.

> +
>   	/* Should only be called after a successful intel_prepare_plane_fb()! */
>   	mutex_lock(&dev_priv->drm.struct_mutex);
>   	intel_plane_unpin_fb(to_intel_plane_state(old_state));
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 61e715ddd0d5..544812488821 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -482,6 +482,8 @@ struct intel_atomic_state {
>   	 */
>   	bool skip_intermediate_wm;
>   
> +	bool rps_interactive;

Should the name at this level be something more tied to the subsystem 
and not implying wider relationships? Like page flip pending? fb_prepared?

> +
>   	/* Gen9+ only */
>   	struct skl_ddb_values wm_results;
>   
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 53aaaa3e6886..01475bf3196a 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6264,41 +6264,14 @@ static u32 intel_rps_limits(struct drm_i915_private *dev_priv, u8 val)
>   	return limits;
>   }
>   
> -static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
> +static void rps_set_power(struct drm_i915_private *dev_priv, int new_power)
>   {
>   	struct intel_rps *rps = &dev_priv->gt_pm.rps;
> -	int new_power;
>   	u32 threshold_up = 0, threshold_down = 0; /* in % */
>   	u32 ei_up = 0, ei_down = 0;
>   
> -	new_power = rps->power;
> -	switch (rps->power) {
> -	case LOW_POWER:
> -		if (val > rps->efficient_freq + 1 &&
> -		    val > rps->cur_freq)
> -			new_power = BETWEEN;
> -		break;
> +	lockdep_assert_held(&rps->power_lock);
>   
> -	case BETWEEN:
> -		if (val <= rps->efficient_freq &&
> -		    val < rps->cur_freq)
> -			new_power = LOW_POWER;
> -		else if (val >= rps->rp0_freq &&
> -			 val > rps->cur_freq)
> -			new_power = HIGH_POWER;
> -		break;
> -
> -	case HIGH_POWER:
> -		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
> -		    val < rps->cur_freq)
> -			new_power = BETWEEN;
> -		break;
> -	}
> -	/* Max/min bins are special */
> -	if (val <= rps->min_freq_softlimit)
> -		new_power = LOW_POWER;
> -	if (val >= rps->max_freq_softlimit)
> -		new_power = HIGH_POWER;
>   	if (new_power == rps->power)
>   		return;
>   
> @@ -6365,9 +6338,68 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
>   	rps->power = new_power;
>   	rps->up_threshold = threshold_up;
>   	rps->down_threshold = threshold_down;
> +}
> +
> +static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
> +{
> +	struct intel_rps *rps = &dev_priv->gt_pm.rps;
> +	int new_power;
> +
> +	new_power = rps->power;
> +	switch (rps->power) {
> +	case LOW_POWER:
> +		if (val > rps->efficient_freq + 1 &&
> +		    val > rps->cur_freq)
> +			new_power = BETWEEN;
> +		break;
> +
> +	case BETWEEN:
> +		if (val <= rps->efficient_freq &&
> +		    val < rps->cur_freq)
> +			new_power = LOW_POWER;
> +		else if (val >= rps->rp0_freq &&
> +			 val > rps->cur_freq)
> +			new_power = HIGH_POWER;
> +		break;
> +
> +	case HIGH_POWER:
> +		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
> +		    val < rps->cur_freq)
> +			new_power = BETWEEN;
> +		break;
> +	}
> +	/* Max/min bins are special */
> +	if (val <= rps->min_freq_softlimit)
> +		new_power = LOW_POWER;
> +	if (val >= rps->max_freq_softlimit)
> +		new_power = HIGH_POWER;
> +
> +	mutex_lock(&rps->power_lock);

Is it worth avoiding the atomic here when GEN < 6?

> +	if (rps->interactive)
> +		new_power = HIGH_POWER;
> +	rps_set_power(dev_priv, new_power);
> +	mutex_unlock(&rps->power_lock);
 >
 >   	rps->last_adj = 0;

This block should go up to the beginning of the function since when 
rps->interactive all preceding calculations are for nothing. And can 
immediately return then.

>   }
>   
> +void intel_rps_set_interactive(struct drm_i915_private *dev_priv, bool state)

s/state/interactive/ for more self-documenting function body?

And not s/dev_priv/i915/ ?!? :)

> +{
> +	struct intel_rps *rps = &dev_priv->gt_pm.rps;
> +
> +	if (INTEL_GEN(dev_priv) < 6)
> +		return;
> +
> +	mutex_lock(&rps->power_lock);
> +	if (state) {
> +		if (!rps->interactive++ && READ_ONCE(dev_priv->gt.awake))
> +			rps_set_power(dev_priv, HIGH_POWER);
> +	} else {
> +		GEM_BUG_ON(!rps->interactive);
> +		rps->interactive--;

Hm maybe protect this in production code so some missed code flows in 
the atomic paths do not cause underflow and so permanent interactive mode.

> +	}
> +	mutex_unlock(&rps->power_lock);
> +}
> +
>   static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val)
>   {
>   	struct intel_rps *rps = &dev_priv->gt_pm.rps;
> @@ -9604,6 +9636,7 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
>   void intel_pm_setup(struct drm_i915_private *dev_priv)
>   {
>   	mutex_init(&dev_priv->pcu_lock);
> +	mutex_init(&dev_priv->gt_pm.rps.power_lock);
>   
>   	atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0);
>   
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 12:49     ` Tvrtko Ursulin
@ 2018-07-20 13:02       ` Chris Wilson
  2018-07-20 13:07         ` Ville Syrjälä
  2018-07-20 13:29         ` Tvrtko Ursulin
  0 siblings, 2 replies; 26+ messages in thread
From: Chris Wilson @ 2018-07-20 13:02 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-07-20 13:49:09)
> 
> On 12/07/2018 18:38, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index 7998e70a3174..5809366ff9f0 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
> >               add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
> >       }
> >   
> > +     /*
> > +      * We declare pageflips to be interactive and so merit a small bias
> > +      * towards upclocking to deliver the frame on time. By only changing
> > +      * the RPS thresholds to sample more regularly and aim for higher
> > +      * clocks we can hopefully deliver low power workloads (like kodi)
> > +      * that are not quite steady state without resorting to forcing
> > +      * maximum clocks following a vblank miss (see do_rps_boost()).
> > +      */
> > +     if (!intel_state->rps_interactive) {
> > +             intel_rps_set_interactive(dev_priv, true);
> > +             intel_state->rps_interactive = true;
> > +     }
> > +
> >       return 0;
> >   }
> >   
> > @@ -13120,8 +13133,15 @@ void
> >   intel_cleanup_plane_fb(struct drm_plane *plane,
> >                      struct drm_plane_state *old_state)
> >   {
> > +     struct intel_atomic_state *intel_state =
> > +             to_intel_atomic_state(old_state->state);
> >       struct drm_i915_private *dev_priv = to_i915(plane->dev);
> >   
> > +     if (intel_state->rps_interactive) {
> > +             intel_rps_set_interactive(dev_priv, false);
> > +             intel_state->rps_interactive = false;
> > +     }
> 
> Why is the rps_intearctive flag needed in plane state? I wanted to 
> assume prepare & cleanup hooks are fully symmetric, but this flags makes 
> me unsure. A reviewer with more display knowledge will be needed here I 
> think.

It's so that when we call intel_cleanup_plane_fb() on something that
wasn't first prepared, we don't decrement the counter. There's a little
bit of asymmetry at the start where we inherit the plane state.

> > +
> >       /* Should only be called after a successful intel_prepare_plane_fb()! */
> >       mutex_lock(&dev_priv->drm.struct_mutex);
> >       intel_plane_unpin_fb(to_intel_plane_state(old_state));
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > index 61e715ddd0d5..544812488821 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -482,6 +482,8 @@ struct intel_atomic_state {
> >        */
> >       bool skip_intermediate_wm;
> >   
> > +     bool rps_interactive;
> 
> Should the name at this level be something more tied to the subsystem 
> and not implying wider relationships? Like page flip pending? fb_prepared?

But we are in the plane state so anything plane/flip is implied. I think
saying whether or not we've called out to rps is a reasonable name for
the state.

> > +     /* Max/min bins are special */
> > +     if (val <= rps->min_freq_softlimit)
> > +             new_power = LOW_POWER;
> > +     if (val >= rps->max_freq_softlimit)
> > +             new_power = HIGH_POWER;
> > +
> > +     mutex_lock(&rps->power_lock);
> 
> Is it worth avoiding the atomic here when GEN < 6?

We don't get here when not !RPS. You mean GEN < 5 ;)

> > +     if (rps->interactive)
> > +             new_power = HIGH_POWER;
> > +     rps_set_power(dev_priv, new_power);
> > +     mutex_unlock(&rps->power_lock);
>  >
>  >      rps->last_adj = 0;
> 
> This block should go up to the beginning of the function since when 
> rps->interactive all preceding calculations are for nothing. And can 
> immediately return then.

We have to lock around rps_set_power, so you're not going to avoid
taking the lock here, so I don't think it makes any difference.
Certainly neater than locking everything.

> > +void intel_rps_set_interactive(struct drm_i915_private *dev_priv, bool state)
> 
> s/state/interactive/ for more self-documenting function body?
> 
> And not s/dev_priv/i915/ ?!? :)
> 
> > +{
> > +     struct intel_rps *rps = &dev_priv->gt_pm.rps;
> > +
> > +     if (INTEL_GEN(dev_priv) < 6)
> > +             return;
> > +
> > +     mutex_lock(&rps->power_lock);
> > +     if (state) {
> > +             if (!rps->interactive++ && READ_ONCE(dev_priv->gt.awake))
> > +                     rps_set_power(dev_priv, HIGH_POWER);
> > +     } else {
> > +             GEM_BUG_ON(!rps->interactive);
> > +             rps->interactive--;
> 
> Hm maybe protect this in production code so some missed code flows in 
> the atomic paths do not cause underflow and so permanent interactive mode.

Are you suggesting testing is inadequate? ;) Underflow here isn't a big
deal, it just locks in the HIGH_POWER (faster sampling, bias towards
upclocking). Or worse not allow HIGH_POWER, status quo returned.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 13:02       ` Chris Wilson
@ 2018-07-20 13:07         ` Ville Syrjälä
  2018-07-20 13:14           ` Chris Wilson
  2018-07-20 13:29         ` Tvrtko Ursulin
  1 sibling, 1 reply; 26+ messages in thread
From: Ville Syrjälä @ 2018-07-20 13:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Jul 20, 2018 at 02:02:34PM +0100, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-07-20 13:49:09)
> > 
> > On 12/07/2018 18:38, Chris Wilson wrote:
> > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > > index 7998e70a3174..5809366ff9f0 100644
> > > --- a/drivers/gpu/drm/i915/intel_display.c
> > > +++ b/drivers/gpu/drm/i915/intel_display.c
> > > @@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
> > >               add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
> > >       }
> > >   
> > > +     /*
> > > +      * We declare pageflips to be interactive and so merit a small bias
> > > +      * towards upclocking to deliver the frame on time. By only changing
> > > +      * the RPS thresholds to sample more regularly and aim for higher
> > > +      * clocks we can hopefully deliver low power workloads (like kodi)
> > > +      * that are not quite steady state without resorting to forcing
> > > +      * maximum clocks following a vblank miss (see do_rps_boost()).
> > > +      */
> > > +     if (!intel_state->rps_interactive) {
> > > +             intel_rps_set_interactive(dev_priv, true);
> > > +             intel_state->rps_interactive = true;
> > > +     }
> > > +
> > >       return 0;
> > >   }
> > >   
> > > @@ -13120,8 +13133,15 @@ void
> > >   intel_cleanup_plane_fb(struct drm_plane *plane,
> > >                      struct drm_plane_state *old_state)
> > >   {
> > > +     struct intel_atomic_state *intel_state =
> > > +             to_intel_atomic_state(old_state->state);
> > >       struct drm_i915_private *dev_priv = to_i915(plane->dev);
> > >   
> > > +     if (intel_state->rps_interactive) {
> > > +             intel_rps_set_interactive(dev_priv, false);
> > > +             intel_state->rps_interactive = false;
> > > +     }
> > 
> > Why is the rps_intearctive flag needed in plane state? I wanted to 
> > assume prepare & cleanup hooks are fully symmetric, but this flags makes 
> > me unsure. A reviewer with more display knowledge will be needed here I 
> > think.
> 
> It's so that when we call intel_cleanup_plane_fb() on something that
> wasn't first prepared, we don't decrement the counter. There's a little
> bit of asymmetry at the start where we inherit the plane state.

Doing this kind of global thing from the plane hooks seems a bit
strange. How about just doing this directly from commit_tail()
etc.?

-- 
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 13:07         ` Ville Syrjälä
@ 2018-07-20 13:14           ` Chris Wilson
  2018-07-20 13:32             ` Ville Syrjälä
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2018-07-20 13:14 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

Quoting Ville Syrjälä (2018-07-20 14:07:31)
> On Fri, Jul 20, 2018 at 02:02:34PM +0100, Chris Wilson wrote:
> Doing this kind of global thing from the plane hooks seems a bit
> strange. How about just doing this directly from commit_tail()
> etc.?

We want it upfront in prepare (so that it's set before any wait) or
somewhere around there (atomic_state setup?). cleanup was chosen for the
symmetry with prepare.

Pick a spot for the increment for that tells us where to decrement.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 13:02       ` Chris Wilson
  2018-07-20 13:07         ` Ville Syrjälä
@ 2018-07-20 13:29         ` Tvrtko Ursulin
  2018-07-20 13:59           ` Chris Wilson
  1 sibling, 1 reply; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-07-20 13:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 20/07/2018 14:02, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-07-20 13:49:09)
>>
>> On 12/07/2018 18:38, Chris Wilson wrote:
>>> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>>> index 7998e70a3174..5809366ff9f0 100644
>>> --- a/drivers/gpu/drm/i915/intel_display.c
>>> +++ b/drivers/gpu/drm/i915/intel_display.c
>>> @@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
>>>                add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
>>>        }
>>>    
>>> +     /*
>>> +      * We declare pageflips to be interactive and so merit a small bias
>>> +      * towards upclocking to deliver the frame on time. By only changing
>>> +      * the RPS thresholds to sample more regularly and aim for higher
>>> +      * clocks we can hopefully deliver low power workloads (like kodi)
>>> +      * that are not quite steady state without resorting to forcing
>>> +      * maximum clocks following a vblank miss (see do_rps_boost()).
>>> +      */
>>> +     if (!intel_state->rps_interactive) {
>>> +             intel_rps_set_interactive(dev_priv, true);
>>> +             intel_state->rps_interactive = true;
>>> +     }
>>> +
>>>        return 0;
>>>    }
>>>    
>>> @@ -13120,8 +13133,15 @@ void
>>>    intel_cleanup_plane_fb(struct drm_plane *plane,
>>>                       struct drm_plane_state *old_state)
>>>    {
>>> +     struct intel_atomic_state *intel_state =
>>> +             to_intel_atomic_state(old_state->state);
>>>        struct drm_i915_private *dev_priv = to_i915(plane->dev);
>>>    
>>> +     if (intel_state->rps_interactive) {
>>> +             intel_rps_set_interactive(dev_priv, false);
>>> +             intel_state->rps_interactive = false;
>>> +     }
>>
>> Why is the rps_intearctive flag needed in plane state? I wanted to
>> assume prepare & cleanup hooks are fully symmetric, but this flags makes
>> me unsure. A reviewer with more display knowledge will be needed here I
>> think.
> 
> It's so that when we call intel_cleanup_plane_fb() on something that
> wasn't first prepared, we don't decrement the counter. There's a little
> bit of asymmetry at the start where we inherit the plane state.
> 
>>> +
>>>        /* Should only be called after a successful intel_prepare_plane_fb()! */
>>>        mutex_lock(&dev_priv->drm.struct_mutex);
>>>        intel_plane_unpin_fb(to_intel_plane_state(old_state));
>>> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
>>> index 61e715ddd0d5..544812488821 100644
>>> --- a/drivers/gpu/drm/i915/intel_drv.h
>>> +++ b/drivers/gpu/drm/i915/intel_drv.h
>>> @@ -482,6 +482,8 @@ struct intel_atomic_state {
>>>         */
>>>        bool skip_intermediate_wm;
>>>    
>>> +     bool rps_interactive;
>>
>> Should the name at this level be something more tied to the subsystem
>> and not implying wider relationships? Like page flip pending? fb_prepared?
> 
> But we are in the plane state so anything plane/flip is implied. I think
> saying whether or not we've called out to rps is a reasonable name for
> the state.

Okay fair enough. Maybe just call it rps_interactive_requested?

>>> +     /* Max/min bins are special */
>>> +     if (val <= rps->min_freq_softlimit)
>>> +             new_power = LOW_POWER;
>>> +     if (val >= rps->max_freq_softlimit)
>>> +             new_power = HIGH_POWER;
>>> +
>>> +     mutex_lock(&rps->power_lock);
>>
>> Is it worth avoiding the atomic here when GEN < 6?
> 
> We don't get here when not !RPS. You mean GEN < 5 ;)

No, okay, just incomplete picture based on locking in the other helper. 
Ignore.
> 
>>> +     if (rps->interactive)
>>> +             new_power = HIGH_POWER;
>>> +     rps_set_power(dev_priv, new_power);
>>> +     mutex_unlock(&rps->power_lock);
>>   >
>>   >      rps->last_adj = 0;
>>
>> This block should go up to the beginning of the function since when
>> rps->interactive all preceding calculations are for nothing. And can
>> immediately return then.
> 
> We have to lock around rps_set_power, so you're not going to avoid
> taking the lock here, so I don't think it makes any difference.
> Certainly neater than locking everything.

Not to avoid the lock but to avoid all the trouble of calculating 
new_power to override it all if rps->interactive is set. Just looks a 
bit weird. I suspect read of rps->interactive doesn't even need to be 
under the lock so maybe:

if (rps->interactive)
	new_power = HIGH_POWER;
} else {
	switch (...)
}

mutex_lock
...
mutex_unlock


> 
>>> +void intel_rps_set_interactive(struct drm_i915_private *dev_priv, bool state)
>>
>> s/state/interactive/ for more self-documenting function body?
>>
>> And not s/dev_priv/i915/ ?!? :)
>>
>>> +{
>>> +     struct intel_rps *rps = &dev_priv->gt_pm.rps;
>>> +
>>> +     if (INTEL_GEN(dev_priv) < 6)
>>> +             return;
>>> +
>>> +     mutex_lock(&rps->power_lock);
>>> +     if (state) {
>>> +             if (!rps->interactive++ && READ_ONCE(dev_priv->gt.awake))
>>> +                     rps_set_power(dev_priv, HIGH_POWER);
>>> +     } else {
>>> +             GEM_BUG_ON(!rps->interactive);
>>> +             rps->interactive--;
>>
>> Hm maybe protect this in production code so some missed code flows in
>> the atomic paths do not cause underflow and so permanent interactive mode.
> 
> Are you suggesting testing is inadequate? ;) Underflow here isn't a big
> deal, it just locks in the HIGH_POWER (faster sampling, bias towards
> upclocking). Or worse not allow HIGH_POWER, status quo returned.

Testing for this probably is inadequate, no? Would need to be looking at 
the new debugfs flag from many test cases. And in real world probably 
quite difficult to debug too.

Whether or not it would be too much to add a DRM_WARN for this.. I am 
leaning towards saying to have it, since it is about two systems 
communicating together so it would be easier to notice a broken 
contract. But I don't insist on it.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 13:14           ` Chris Wilson
@ 2018-07-20 13:32             ` Ville Syrjälä
  2018-07-20 13:38               ` Chris Wilson
  0 siblings, 1 reply; 26+ messages in thread
From: Ville Syrjälä @ 2018-07-20 13:32 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Jul 20, 2018 at 02:14:11PM +0100, Chris Wilson wrote:
> Quoting Ville Syrjälä (2018-07-20 14:07:31)
> > On Fri, Jul 20, 2018 at 02:02:34PM +0100, Chris Wilson wrote:
> > Doing this kind of global thing from the plane hooks seems a bit
> > strange. How about just doing this directly from commit_tail()
> > etc.?
> 
> We want it upfront in prepare (so that it's set before any wait) or
> somewhere around there (atomic_state setup?). cleanup was chosen for the
> symmetry with prepare.

Looks like we have intel_atomic_prepare_commit() which I guess would be
a decent spot then. And introduce intel_atomic_cleanup_commit() to do
the reverse?

Another question is what happens where there are parallel flips
happening? One could undo the boost from the other AFAICS. But maybe
we don't care enough to protect against that?

-- 
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 13:32             ` Ville Syrjälä
@ 2018-07-20 13:38               ` Chris Wilson
  2018-07-20 13:50                 ` Ville Syrjälä
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2018-07-20 13:38 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

Quoting Ville Syrjälä (2018-07-20 14:32:40)
> On Fri, Jul 20, 2018 at 02:14:11PM +0100, Chris Wilson wrote:
> > Quoting Ville Syrjälä (2018-07-20 14:07:31)
> > > On Fri, Jul 20, 2018 at 02:02:34PM +0100, Chris Wilson wrote:
> > > Doing this kind of global thing from the plane hooks seems a bit
> > > strange. How about just doing this directly from commit_tail()
> > > etc.?
> > 
> > We want it upfront in prepare (so that it's set before any wait) or
> > somewhere around there (atomic_state setup?). cleanup was chosen for the
> > symmetry with prepare.
> 
> Looks like we have intel_atomic_prepare_commit() which I guess would be
> a decent spot then. And introduce intel_atomic_cleanup_commit() to do
> the reverse?

The only other point is I started from prepare_plane for being next to
both the reprioritisation and the add_rps_boost_after_vblank. So that's
quite nice.
 
> Another question is what happens where there are parallel flips
> happening? One could undo the boost from the other AFAICS. But maybe
> we don't care enough to protect against that?

It's a counter, so the "interactive" mode remains high until all
concurrent flips are completed.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 13:38               ` Chris Wilson
@ 2018-07-20 13:50                 ` Ville Syrjälä
  2018-07-20 14:03                   ` Chris Wilson
  0 siblings, 1 reply; 26+ messages in thread
From: Ville Syrjälä @ 2018-07-20 13:50 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Jul 20, 2018 at 02:38:32PM +0100, Chris Wilson wrote:
> Quoting Ville Syrjälä (2018-07-20 14:32:40)
> > On Fri, Jul 20, 2018 at 02:14:11PM +0100, Chris Wilson wrote:
> > > Quoting Ville Syrjälä (2018-07-20 14:07:31)
> > > > On Fri, Jul 20, 2018 at 02:02:34PM +0100, Chris Wilson wrote:
> > > > Doing this kind of global thing from the plane hooks seems a bit
> > > > strange. How about just doing this directly from commit_tail()
> > > > etc.?
> > > 
> > > We want it upfront in prepare (so that it's set before any wait) or
> > > somewhere around there (atomic_state setup?). cleanup was chosen for the
> > > symmetry with prepare.
> > 
> > Looks like we have intel_atomic_prepare_commit() which I guess would be
> > a decent spot then. And introduce intel_atomic_cleanup_commit() to do
> > the reverse?
> 
> The only other point is I started from prepare_plane for being next to
> both the reprioritisation and the add_rps_boost_after_vblank. So that's
> quite nice.

Ok, I guess that's quite reasonable.

>  
> > Another question is what happens where there are parallel flips
> > happening? One could undo the boost from the other AFAICS. But maybe
> > we don't care enough to protect against that?
> 
> It's a counter, so the "interactive" mode remains high until all
> concurrent flips are completed.

Ah. I guess the bool in the atomic state threw me off. I suppose that
one is just an optimization to avoid calling the function more than
once?

-- 
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 13:29         ` Tvrtko Ursulin
@ 2018-07-20 13:59           ` Chris Wilson
  2018-07-20 14:58             ` Tvrtko Ursulin
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2018-07-20 13:59 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-07-20 14:29:52)
> 
> On 20/07/2018 14:02, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-07-20 13:49:09)
> >>
> >> On 12/07/2018 18:38, Chris Wilson wrote:
> >>> +     if (rps->interactive)
> >>> +             new_power = HIGH_POWER;
> >>> +     rps_set_power(dev_priv, new_power);
> >>> +     mutex_unlock(&rps->power_lock);
> >>   >
> >>   >      rps->last_adj = 0;
> >>
> >> This block should go up to the beginning of the function since when
> >> rps->interactive all preceding calculations are for nothing. And can
> >> immediately return then.
> > 
> > We have to lock around rps_set_power, so you're not going to avoid
> > taking the lock here, so I don't think it makes any difference.
> > Certainly neater than locking everything.
> 
> Not to avoid the lock but to avoid all the trouble of calculating 
> new_power to override it all if rps->interactive is set. Just looks a 
> bit weird. I suspect read of rps->interactive doesn't even need to be 
> under the lock so maybe:
> 
> if (rps->interactive)
>         new_power = HIGH_POWER;
> } else {
>         switch (...)
> }

There is a race there, so you do need to explain why we wouldn't care.
(There is a reasonable argument about it being periodic and we don't care
about the first vs interactive.) I thought it came out quite neat as the
atomic override.

> >>> +{
> >>> +     struct intel_rps *rps = &dev_priv->gt_pm.rps;
> >>> +
> >>> +     if (INTEL_GEN(dev_priv) < 6)
> >>> +             return;
> >>> +
> >>> +     mutex_lock(&rps->power_lock);
> >>> +     if (state) {
> >>> +             if (!rps->interactive++ && READ_ONCE(dev_priv->gt.awake))
> >>> +                     rps_set_power(dev_priv, HIGH_POWER);
> >>> +     } else {
> >>> +             GEM_BUG_ON(!rps->interactive);
> >>> +             rps->interactive--;
> >>
> >> Hm maybe protect this in production code so some missed code flows in
> >> the atomic paths do not cause underflow and so permanent interactive mode.
> > 
> > Are you suggesting testing is inadequate? ;) Underflow here isn't a big
> > deal, it just locks in the HIGH_POWER (faster sampling, bias towards
> > upclocking). Or worse not allow HIGH_POWER, status quo returned.
> 
> Testing for this probably is inadequate, no? Would need to be looking at 
> the new debugfs flag from many test cases. And in real world probably 
> quite difficult to debug too.
> 
> Whether or not it would be too much to add a DRM_WARN for this.. I am 
> leaning towards saying to have it, since it is about two systems 
> communicating together so it would be easier to notice a broken 
> contract. But I don't insist on it.

Just checking underflow is not going to catch many problems. If we can
identify some points where we know what the value should be... I wonder
if we can assert it is zero at runtime/system suspend.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 13:50                 ` Ville Syrjälä
@ 2018-07-20 14:03                   ` Chris Wilson
  2018-07-20 14:19                     ` Ville Syrjälä
  0 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2018-07-20 14:03 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

Quoting Ville Syrjälä (2018-07-20 14:50:25)
> On Fri, Jul 20, 2018 at 02:38:32PM +0100, Chris Wilson wrote:
> > Quoting Ville Syrjälä (2018-07-20 14:32:40)
> > > Another question is what happens where there are parallel flips
> > > happening? One could undo the boost from the other AFAICS. But maybe
> > > we don't care enough to protect against that?
> > 
> > It's a counter, so the "interactive" mode remains high until all
> > concurrent flips are completed.
> 
> Ah. I guess the bool in the atomic state threw me off. I suppose that
> one is just an optimization to avoid calling the function more than
> once?

Yes, it's that I caught the RPS counter going negative. We have more
cleanup_plane than we prepared.... I believe that's from
find_initial_plane.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 14:03                   ` Chris Wilson
@ 2018-07-20 14:19                     ` Ville Syrjälä
  2018-07-21  8:44                       ` Chris Wilson
  0 siblings, 1 reply; 26+ messages in thread
From: Ville Syrjälä @ 2018-07-20 14:19 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Jul 20, 2018 at 03:03:09PM +0100, Chris Wilson wrote:
> Quoting Ville Syrjälä (2018-07-20 14:50:25)
> > On Fri, Jul 20, 2018 at 02:38:32PM +0100, Chris Wilson wrote:
> > > Quoting Ville Syrjälä (2018-07-20 14:32:40)
> > > > Another question is what happens where there are parallel flips
> > > > happening? One could undo the boost from the other AFAICS. But maybe
> > > > we don't care enough to protect against that?
> > > 
> > > It's a counter, so the "interactive" mode remains high until all
> > > concurrent flips are completed.
> > 
> > Ah. I guess the bool in the atomic state threw me off. I suppose that
> > one is just an optimization to avoid calling the function more than
> > once?
> 
> Yes, it's that I caught the RPS counter going negative. We have more
> cleanup_plane than we prepared.... I believe that's from
> find_initial_plane.

Hmm. I don't quite see how that could happen. I believe
prepare_fb/cleanup_fb should be fully paired up within a single commit.

-- 
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 13:59           ` Chris Wilson
@ 2018-07-20 14:58             ` Tvrtko Ursulin
  2018-07-20 15:01               ` Chris Wilson
  0 siblings, 1 reply; 26+ messages in thread
From: Tvrtko Ursulin @ 2018-07-20 14:58 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 20/07/2018 14:59, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-07-20 14:29:52)
>>
>> On 20/07/2018 14:02, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-07-20 13:49:09)
>>>>
>>>> On 12/07/2018 18:38, Chris Wilson wrote:
>>>>> +     if (rps->interactive)
>>>>> +             new_power = HIGH_POWER;
>>>>> +     rps_set_power(dev_priv, new_power);
>>>>> +     mutex_unlock(&rps->power_lock);
>>>>    >
>>>>    >      rps->last_adj = 0;
>>>>
>>>> This block should go up to the beginning of the function since when
>>>> rps->interactive all preceding calculations are for nothing. And can
>>>> immediately return then.
>>>
>>> We have to lock around rps_set_power, so you're not going to avoid
>>> taking the lock here, so I don't think it makes any difference.
>>> Certainly neater than locking everything.
>>
>> Not to avoid the lock but to avoid all the trouble of calculating
>> new_power to override it all if rps->interactive is set. Just looks a
>> bit weird. I suspect read of rps->interactive doesn't even need to be
>> under the lock so maybe:
>>
>> if (rps->interactive)
>>          new_power = HIGH_POWER;
>> } else {
>>          switch (...)
>> }
> 
> There is a race there, so you do need to explain why we wouldn't care.
> (There is a reasonable argument about it being periodic and we don't care
> about the first vs interactive.) I thought it came out quite neat as the
> atomic override.

Putting it all under the lock works for me then. But okay, if you so 
like the override idea, which I wouldn't call neat, but unusual, it's 
not the end of the world.

>>>>> +{
>>>>> +     struct intel_rps *rps = &dev_priv->gt_pm.rps;
>>>>> +
>>>>> +     if (INTEL_GEN(dev_priv) < 6)
>>>>> +             return;
>>>>> +
>>>>> +     mutex_lock(&rps->power_lock);
>>>>> +     if (state) {
>>>>> +             if (!rps->interactive++ && READ_ONCE(dev_priv->gt.awake))
>>>>> +                     rps_set_power(dev_priv, HIGH_POWER);
>>>>> +     } else {
>>>>> +             GEM_BUG_ON(!rps->interactive);
>>>>> +             rps->interactive--;
>>>>
>>>> Hm maybe protect this in production code so some missed code flows in
>>>> the atomic paths do not cause underflow and so permanent interactive mode.
>>>
>>> Are you suggesting testing is inadequate? ;) Underflow here isn't a big
>>> deal, it just locks in the HIGH_POWER (faster sampling, bias towards
>>> upclocking). Or worse not allow HIGH_POWER, status quo returned.
>>
>> Testing for this probably is inadequate, no? Would need to be looking at
>> the new debugfs flag from many test cases. And in real world probably
>> quite difficult to debug too.
>>
>> Whether or not it would be too much to add a DRM_WARN for this.. I am
>> leaning towards saying to have it, since it is about two systems
>> communicating together so it would be easier to notice a broken
>> contract. But I don't insist on it.
> 
> Just checking underflow is not going to catch many problems. If we can
> identify some points where we know what the value should be... I wonder
> if we can assert it is zero at runtime/system suspend.

True, it only catches the imbalance in one direction quickly. If suspend 
idea works go with that, but what's so bad about some log messages? 
Assuming leak towards the overflow direction on each flip it could be 
reached in ~18 hours which is realistic to get bug reports of.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 14:58             ` Tvrtko Ursulin
@ 2018-07-20 15:01               ` Chris Wilson
  0 siblings, 0 replies; 26+ messages in thread
From: Chris Wilson @ 2018-07-20 15:01 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-07-20 15:58:51)
> True, it only catches the imbalance in one direction quickly. If suspend 
> idea works go with that, but what's so bad about some log messages? 
> Assuming leak towards the overflow direction on each flip it could be 
> reached in ~18 hours which is realistic to get bug reports of.

A log message means handling bugs reports for years after you fix the bug :)

Silence is bliss.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RESEND] drm/i915: Interactive RPS mode
  2018-07-20 14:19                     ` Ville Syrjälä
@ 2018-07-21  8:44                       ` Chris Wilson
  0 siblings, 0 replies; 26+ messages in thread
From: Chris Wilson @ 2018-07-21  8:44 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

Quoting Ville Syrjälä (2018-07-20 15:19:20)
> On Fri, Jul 20, 2018 at 03:03:09PM +0100, Chris Wilson wrote:
> > Quoting Ville Syrjälä (2018-07-20 14:50:25)
> > > On Fri, Jul 20, 2018 at 02:38:32PM +0100, Chris Wilson wrote:
> > > > Quoting Ville Syrjälä (2018-07-20 14:32:40)
> > > > > Another question is what happens where there are parallel flips
> > > > > happening? One could undo the boost from the other AFAICS. But maybe
> > > > > we don't care enough to protect against that?
> > > > 
> > > > It's a counter, so the "interactive" mode remains high until all
> > > > concurrent flips are completed.
> > > 
> > > Ah. I guess the bool in the atomic state threw me off. I suppose that
> > > one is just an optimization to avoid calling the function more than
> > > once?
> > 
> > Yes, it's that I caught the RPS counter going negative. We have more
> > cleanup_plane than we prepared.... I believe that's from
> > find_initial_plane.
> 
> Hmm. I don't quite see how that could happen. I believe
> prepare_fb/cleanup_fb should be fully paired up within a single commit.

The way I read intel_find_initial_plane_obj() is that it writes the
current configuration into the current intel_crtc->base.primary->state.
I think that is where the initial mismatch comes from. Also we seem to
be quite adapt at not holding rps_interactive high after modeset
completion, which suggests to me that something sneaky is happening with
the plane_state duplication.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] drm/i915: Interactive RPS mode
  2018-07-11 21:01 [PATCH] drm/i915: Interactive RPS mode Chris Wilson
                   ` (4 preceding siblings ...)
  2018-07-12  8:02 ` [PATCH v4] " Chris Wilson
@ 2018-07-23 10:01 ` Chris Wilson
  2018-07-31 13:01   ` Joonas Lahtinen
  5 siblings, 1 reply; 26+ messages in thread
From: Chris Wilson @ 2018-07-23 10:01 UTC (permalink / raw)
  To: intel-gfx

RPS provides a feedback loop where we use the load during the previous
evaluation interval to decide whether to up or down clock the GPU
frequency. Our responsiveness is split into 3 regimes, a high and low
plateau with the intent to keep the gpu clocked high to cover occasional
stalls under high load, and low despite occasional glitches under steady
low load, and inbetween. However, we run into situations like kodi where
we want to stay at low power (video decoding is done efficiently
inside the fixed function HW and doesn't need high clocks even for high
bitrate streams), but just occasionally the pipeline is more complex
than a video decode and we need a smidgen of extra GPU power to present
on time. In the high power regime, we sample at sub frame intervals with
a bias to upclocking, and conversely at low power we sample over a few
frames worth to provide what we consider to be the right levels of
responsiveness respectively. At low power, we more or less expect to be
kicked out to high power at the start of a busy sequence by waitboosting.

Prior to commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active
request") whenever we missed the frame or stalled, we would immediate go
full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we
relaxed the waitboosting to only apply if the pipeline was deep to avoid
over-committing resources for a near miss. Sadly though, a near miss is
still a miss, and perceptible as jitter in the frame delivery.

To try and prevent the near miss before having to resort to boosting
after the fact, we use the pageflip queue as an indication that we are
in an "interactive" regime and so should sample the load more frequently
to provide power before the frame misses it vblank. This will make us
more favorable to providing a small power increase (one or two bins) as
required rather than going all the way to maximum and then having to
work back down again. (We still keep the waitboosting mechanism around
just in case a dramatic change in system load requires urgent uplocking,
faster than we can provide in a few evaluation intervals.)

v2: Reduce rps_set_interactive to a boolean parameter to avoid the
confusion of what if they wanted a new power mode after pinning to a
different mode (which to choose?)
v3: Only reprogram RPS while the GT is awake, it will be set when we
wake the GT, and while off warns about being used outside of rpm.
v4: Fix deferred application of interactive mode
v5: s/state/interactive/

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
Fixes: e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  1 +
 drivers/gpu/drm/i915/i915_drv.h      |  4 ++
 drivers/gpu/drm/i915/intel_display.c | 20 ++++++
 drivers/gpu/drm/i915/intel_drv.h     |  2 +
 drivers/gpu/drm/i915/intel_pm.c      | 91 +++++++++++++++++++---------
 5 files changed, 89 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 59dc0610ea44..08d9b0748914 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2218,6 +2218,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Boosts outstanding? %d\n",
 		   atomic_read(&rps->num_waiters));
+	seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->interactive));
 	seq_printf(m, "Frequency requested %d\n",
 		   intel_gpu_freq(dev_priv, rps->cur_freq));
 	seq_printf(m, "  min hard:%d, soft:%d; max soft:%d, hard:%d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0f49f9988dfa..4ee3f8870da6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -784,6 +784,8 @@ struct intel_rps {
 
 	int last_adj;
 	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
+	unsigned int interactive;
+	struct mutex power_lock;
 
 	bool enabled;
 	atomic_t num_waiters;
@@ -3422,6 +3424,8 @@ extern void i915_redisable_vga_power_on(struct drm_i915_private *dev_priv);
 extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val);
 extern void intel_init_pch_refclk(struct drm_i915_private *dev_priv);
 extern int intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
+extern void intel_rps_mark_interactive(struct drm_i915_private *i915,
+				       bool interactive);
 extern bool intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
 				  bool enable);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 577b30dde45b..73c6d56ba3ec 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
 	}
 
+	/*
+	 * We declare pageflips to be interactive and so merit a small bias
+	 * towards upclocking to deliver the frame on time. By only changing
+	 * the RPS thresholds to sample more regularly and aim for higher
+	 * clocks we can hopefully deliver low power workloads (like kodi)
+	 * that are not quite steady state without resorting to forcing
+	 * maximum clocks following a vblank miss (see do_rps_boost()).
+	 */
+	if (!intel_state->rps_interactive) {
+		intel_rps_mark_interactive(dev_priv, true);
+		intel_state->rps_interactive = true;
+	}
+
 	return 0;
 }
 
@@ -13120,8 +13133,15 @@ void
 intel_cleanup_plane_fb(struct drm_plane *plane,
 		       struct drm_plane_state *old_state)
 {
+	struct intel_atomic_state *intel_state =
+		to_intel_atomic_state(old_state->state);
 	struct drm_i915_private *dev_priv = to_i915(plane->dev);
 
+	if (intel_state->rps_interactive) {
+		intel_rps_mark_interactive(dev_priv, false);
+		intel_state->rps_interactive = false;
+	}
+
 	/* Should only be called after a successful intel_prepare_plane_fb()! */
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	intel_plane_unpin_fb(to_intel_plane_state(old_state));
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index c275f91244a6..17af06d8a43e 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -484,6 +484,8 @@ struct intel_atomic_state {
 	 */
 	bool skip_intermediate_wm;
 
+	bool rps_interactive;
+
 	/* Gen9+ only */
 	struct skl_ddb_values wm_results;
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 7312ecb73415..2d415acf4a65 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6264,41 +6264,14 @@ static u32 intel_rps_limits(struct drm_i915_private *dev_priv, u8 val)
 	return limits;
 }
 
-static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+static void rps_set_power(struct drm_i915_private *dev_priv, int new_power)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	int new_power;
 	u32 threshold_up = 0, threshold_down = 0; /* in % */
 	u32 ei_up = 0, ei_down = 0;
 
-	new_power = rps->power;
-	switch (rps->power) {
-	case LOW_POWER:
-		if (val > rps->efficient_freq + 1 &&
-		    val > rps->cur_freq)
-			new_power = BETWEEN;
-		break;
+	lockdep_assert_held(&rps->power_lock);
 
-	case BETWEEN:
-		if (val <= rps->efficient_freq &&
-		    val < rps->cur_freq)
-			new_power = LOW_POWER;
-		else if (val >= rps->rp0_freq &&
-			 val > rps->cur_freq)
-			new_power = HIGH_POWER;
-		break;
-
-	case HIGH_POWER:
-		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
-		    val < rps->cur_freq)
-			new_power = BETWEEN;
-		break;
-	}
-	/* Max/min bins are special */
-	if (val <= rps->min_freq_softlimit)
-		new_power = LOW_POWER;
-	if (val >= rps->max_freq_softlimit)
-		new_power = HIGH_POWER;
 	if (new_power == rps->power)
 		return;
 
@@ -6365,9 +6338,68 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 	rps->power = new_power;
 	rps->up_threshold = threshold_up;
 	rps->down_threshold = threshold_down;
+}
+
+static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+	int new_power;
+
+	new_power = rps->power;
+	switch (rps->power) {
+	case LOW_POWER:
+		if (val > rps->efficient_freq + 1 &&
+		    val > rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+
+	case BETWEEN:
+		if (val <= rps->efficient_freq &&
+		    val < rps->cur_freq)
+			new_power = LOW_POWER;
+		else if (val >= rps->rp0_freq &&
+			 val > rps->cur_freq)
+			new_power = HIGH_POWER;
+		break;
+
+	case HIGH_POWER:
+		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
+		    val < rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+	}
+	/* Max/min bins are special */
+	if (val <= rps->min_freq_softlimit)
+		new_power = LOW_POWER;
+	if (val >= rps->max_freq_softlimit)
+		new_power = HIGH_POWER;
+
+	mutex_lock(&rps->power_lock);
+	if (rps->interactive)
+		new_power = HIGH_POWER;
+	rps_set_power(dev_priv, new_power);
+	mutex_unlock(&rps->power_lock);
 	rps->last_adj = 0;
 }
 
+void intel_rps_mark_interactive(struct drm_i915_private *i915, bool interactive)
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+
+	if (INTEL_GEN(i915) < 6)
+		return;
+
+	mutex_lock(&rps->power_lock);
+	if (interactive) {
+		if (!rps->interactive++ && READ_ONCE(i915->gt.awake))
+			rps_set_power(i915, HIGH_POWER);
+	} else {
+		GEM_BUG_ON(!rps->interactive);
+		rps->interactive--;
+	}
+	mutex_unlock(&rps->power_lock);
+}
+
 static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
@@ -9604,6 +9636,7 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
 void intel_pm_setup(struct drm_i915_private *dev_priv)
 {
 	mutex_init(&dev_priv->pcu_lock);
+	mutex_init(&dev_priv->gt_pm.rps.power_lock);
 
 	atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0);
 
-- 
2.18.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] drm/i915: Interactive RPS mode
  2018-07-23 10:01 ` [PATCH] drm/i915: Interactive RPS mode Chris Wilson
@ 2018-07-31 13:01   ` Joonas Lahtinen
  0 siblings, 0 replies; 26+ messages in thread
From: Joonas Lahtinen @ 2018-07-31 13:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-07-23 13:01:00)
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -784,6 +784,8 @@ struct intel_rps {
>  
>         int last_adj;
>         enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
> +       unsigned int interactive;
> +       struct mutex power_lock;

Please describe the new mutex with a few words because it's not
immediately obvious which members fall under it.

This kind of a change could always use some Tested-by's.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-07-31 13:01 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-11 21:01 [PATCH] drm/i915: Interactive RPS mode Chris Wilson
2018-07-11 22:06 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2018-07-11 22:07 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-07-11 22:23 ` ✗ Fi.CI.BAT: failure " Patchwork
2018-07-12  7:50 ` [PATCH v3] " Chris Wilson
2018-07-12  8:02 ` [PATCH v4] " Chris Wilson
2018-07-12 17:38   ` [RESEND] " Chris Wilson
2018-07-20 12:49     ` Tvrtko Ursulin
2018-07-20 13:02       ` Chris Wilson
2018-07-20 13:07         ` Ville Syrjälä
2018-07-20 13:14           ` Chris Wilson
2018-07-20 13:32             ` Ville Syrjälä
2018-07-20 13:38               ` Chris Wilson
2018-07-20 13:50                 ` Ville Syrjälä
2018-07-20 14:03                   ` Chris Wilson
2018-07-20 14:19                     ` Ville Syrjälä
2018-07-21  8:44                       ` Chris Wilson
2018-07-20 13:29         ` Tvrtko Ursulin
2018-07-20 13:59           ` Chris Wilson
2018-07-20 14:58             ` Tvrtko Ursulin
2018-07-20 15:01               ` Chris Wilson
2018-07-12 17:58   ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Interactive RPS mode (rev4) Patchwork
2018-07-12 17:59   ` ✗ Fi.CI.SPARSE: " Patchwork
2018-07-12 18:15   ` ✓ Fi.CI.BAT: success " Patchwork
2018-07-23 10:01 ` [PATCH] drm/i915: Interactive RPS mode Chris Wilson
2018-07-31 13:01   ` Joonas Lahtinen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.