* Trio of latency sensitive patches
@ 2018-07-30 15:25 Chris Wilson
  2018-07-30 15:25 ` [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request Chris Wilson
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Chris Wilson @ 2018-07-30 15:25 UTC (permalink / raw)
  To: intel-gfx

Just a couple of bug reports suggesting we should do better with power
distribution...
-Chris



* [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request
  2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson
@ 2018-07-30 15:25 ` Chris Wilson
  2018-08-01  9:56   ` Chris Wilson
  2018-08-03 10:48   ` Tvrtko Ursulin
  2018-07-30 15:25 ` [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU Chris Wilson
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 15+ messages in thread
From: Chris Wilson @ 2018-07-30 15:25 UTC (permalink / raw)
  To: intel-gfx; +Cc: Eero Tamminen

If we are waiting for the currently executing request, we have a good
idea that it will be completed in the very near future and so want to
cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly.
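
For context, the PM QoS request added below is the in-kernel counterpart of
the /dev/cpu_dma_latency interface that userspace can already use to cap
C-state exit latency. A minimal userspace sketch of that interface, for
illustration only (it is not part of this series):

/*
 * Illustrative only: hold /dev/cpu_dma_latency open with a target value
 * written to it and cpuidle will avoid C-states whose exit latency exceeds
 * that value -- the same effect this patch achieves in-kernel with
 * pm_qos_add_request(PM_QOS_CPU_DMA_LATENCY, ...).
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int32_t target_us = 50;	/* the same 50us target used below */
	int fd = open("/dev/cpu_dma_latency", O_WRONLY);

	if (fd < 0 || write(fd, &target_us, sizeof(target_us)) != sizeof(target_us)) {
		perror("cpu_dma_latency");
		return 1;
	}

	pause();	/* the constraint lasts only while the fd is held open */
	close(fd);
	return 0;
}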

v2: Not allowed to block in kmalloc after setting TASK_INTERRUPTIBLE.
v3: Avoid the blocking notifier as well for TASK_INTERRUPTIBLE
v4: Beautification?
v5: And ignore the preemptibility of queue_work before schedule.

Testcase: igt/gem_sync/store-default
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 52 +++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5c2c93cbab12..f3ff8dbe363d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1258,6 +1258,51 @@ static bool __i915_wait_request_check_and_reset(struct i915_request *request)
 	return true;
 }
 
+struct wait_dma_qos {
+	struct pm_qos_request req;
+	struct work_struct add, del;
+};
+
+static void __wait_dma_qos_add(struct work_struct *work)
+{
+	struct wait_dma_qos *qos = container_of(work, typeof(*qos), add);
+
+	pm_qos_add_request(&qos->req, PM_QOS_CPU_DMA_LATENCY, 50);
+}
+
+static void __wait_dma_qos_del(struct work_struct *work)
+{
+	struct wait_dma_qos *qos = container_of(work, typeof(*qos), del);
+
+	if (!cancel_work_sync(&qos->add))
+		pm_qos_remove_request(&qos->req);
+
+	kfree(qos);
+}
+
+static struct wait_dma_qos *wait_dma_qos_add(void)
+{
+	struct wait_dma_qos *qos;
+
+	/* Called under TASK_INTERRUPTIBLE, so not allowed to sleep/block. */
+	qos = kzalloc(sizeof(*qos), GFP_NOWAIT | __GFP_NOWARN);
+	if (!qos)
+		return NULL;
+
+	INIT_WORK(&qos->add, __wait_dma_qos_add);
+	INIT_WORK(&qos->del, __wait_dma_qos_del);
+	schedule_work_on(raw_smp_processor_id(), &qos->add);
+
+	return qos;
+}
+
+static void wait_dma_qos_del(struct wait_dma_qos *qos)
+{
+	/* Defer to worker so not incur extra latency for our woken client. */
+	if (qos)
+		schedule_work(&qos->del);
+}
+
 /**
  * i915_request_wait - wait until execution of request has finished
  * @rq: the request to wait upon
@@ -1286,6 +1331,7 @@ long i915_request_wait(struct i915_request *rq,
 	wait_queue_head_t *errq = &rq->i915->gpu_error.wait_queue;
 	DEFINE_WAIT_FUNC(reset, default_wake_function);
 	DEFINE_WAIT_FUNC(exec, default_wake_function);
+	struct wait_dma_qos *qos = NULL;
 	struct intel_wait wait;
 
 	might_sleep();
@@ -1363,6 +1409,11 @@ long i915_request_wait(struct i915_request *rq,
 			break;
 		}
 
+		if (!qos &&
+		    i915_seqno_passed(intel_engine_get_seqno(rq->engine),
+				      wait.seqno - 1))
+			qos = wait_dma_qos_add();
+
 		timeout = io_schedule_timeout(timeout);
 
 		if (intel_wait_complete(&wait) &&
@@ -1412,6 +1463,7 @@ long i915_request_wait(struct i915_request *rq,
 	if (flags & I915_WAIT_LOCKED)
 		remove_wait_queue(errq, &reset);
 	remove_wait_queue(&rq->execute, &exec);
+	wait_dma_qos_del(qos);
 	trace_i915_request_wait_end(rq);
 
 	return timeout;
-- 
2.18.0


* [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU
  2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson
  2018-07-30 15:25 ` [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request Chris Wilson
@ 2018-07-30 15:25 ` Chris Wilson
  2018-07-31 13:03   ` Mika Kuoppala
  2018-07-30 15:25 ` [PATCH 3/3] drm/i915: Interactive RPS mode Chris Wilson
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2018-07-30 15:25 UTC (permalink / raw)
  To: intel-gfx; +Cc: Eero Tamminen

A recent trend for cpufreq is to boost the CPU frequencies for
iowaiters, in particular to benefit high frequency I/O. We do the same
and boost the GPU clocks to try and minimise time spent waiting for the
GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
frequency will result in the GPU being throttled and its frequency being
reduced. Thus declaring iowait negatively impacts GPU throughput.
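
For reference, the only difference between the two calls is the iowait
accounting around the sleep; a simplified sketch (paraphrased, not the
verbatim scheduler source) of what io_schedule_timeout() adds on top of
schedule_timeout():

#include <linux/sched.h>

static long sketch_io_schedule_timeout(long timeout)
{
	/*
	 * Mark the task as an iowaiter; cpufreq governors (schedutil,
	 * intel_pstate HWP boost) use this accounting on wakeup to boost
	 * the CPU frequency -- exactly the behaviour this patch avoids.
	 */
	int token = io_schedule_prepare();
	long ret = schedule_timeout(timeout);	/* the plain sleep kept by this patch */

	io_schedule_finish(token);	/* restore the previous in_iowait state */
	return ret;
}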

v2: Both sleeps!

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
---
 drivers/gpu/drm/i915/i915_request.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f3ff8dbe363d..3e48ea87b324 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1376,7 +1376,7 @@ long i915_request_wait(struct i915_request *rq,
 			goto complete;
 		}
 
-		timeout = io_schedule_timeout(timeout);
+		timeout = schedule_timeout(timeout);
 	} while (1);
 
 	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
@@ -1414,7 +1414,7 @@ long i915_request_wait(struct i915_request *rq,
 				      wait.seqno - 1))
 			qos = wait_dma_qos_add();
 
-		timeout = io_schedule_timeout(timeout);
+		timeout = schedule_timeout(timeout);
 
 		if (intel_wait_complete(&wait) &&
 		    intel_wait_check_request(&wait, rq))
-- 
2.18.0


* [PATCH 3/3] drm/i915: Interactive RPS mode
  2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson
  2018-07-30 15:25 ` [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request Chris Wilson
  2018-07-30 15:25 ` [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU Chris Wilson
@ 2018-07-30 15:25 ` Chris Wilson
  2018-07-30 15:41 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request Patchwork
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2018-07-30 15:25 UTC (permalink / raw)
  To: intel-gfx

RPS provides a feedback loop where we use the load during the previous
evaluation interval to decide whether to up or down clock the GPU
frequency. Our responsiveness is split into three regimes: a high and a low
plateau, with the intent to keep the GPU clocked high to cover occasional
stalls under high load, and low despite occasional glitches under steady
low load, and a region in between. However, we run into situations like
kodi where we want to stay at low power (video decoding is done efficiently
inside the fixed-function HW and doesn't need high clocks even for high
bitrate streams), but just occasionally the pipeline is more complex
than a video decode and we need a smidgen of extra GPU power to present
on time. In the high power regime, we sample at sub-frame intervals with
a bias towards upclocking, and conversely at low power we sample over a
few frames' worth to provide what we consider to be the right level of
responsiveness for each. At low power, we more or less expect to be
kicked out to high power at the start of a busy sequence by waitboosting.

Prior to commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active
request") whenever we missed the frame or stalled, we would immediate go
full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we
relaxed the waitboosting to only apply if the pipeline was deep to avoid
over-committing resources for a near miss. Sadly though, a near miss is
still a miss, and perceptible as jitter in the frame delivery.

To try and prevent the near miss before having to resort to boosting
after the fact, we use the pageflip queue as an indication that we are
in an "interactive" regime and so should sample the load more frequently
to provide power before the frame misses its vblank. This will make us
more amenable to providing a small power increase (one or two bins) as
required rather than going all the way to maximum and then having to
work back down again. (We still keep the waitboosting mechanism around
just in case a dramatic change in system load requires urgent upclocking,
faster than we can provide in a few evaluation intervals.)
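
The mechanics are a simple bracket around a pending flip. A condensed sketch
of the pattern implemented in the display hunks below (the helper names here
are illustrative; the real hooks are intel_prepare_plane_fb() and
intel_cleanup_plane_fb()):

/* Condensed sketch of the hunks below; helper names are illustrative. */
static void sketch_prepare_flip(struct drm_i915_private *i915,
				struct intel_atomic_state *state)
{
	if (!state->rps_interactive) {
		/* Bias RPS sampling towards HIGH_POWER while a flip is pending. */
		intel_rps_mark_interactive(i915, true);
		state->rps_interactive = true;
	}
}

static void sketch_cleanup_flip(struct drm_i915_private *i915,
				struct intel_atomic_state *state)
{
	if (state->rps_interactive) {
		/* Drop the bias once the flip has been cleaned up. */
		intel_rps_mark_interactive(i915, false);
		state->rps_interactive = false;
	}
}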

v2: Reduce rps_set_interactive to a boolean parameter to avoid the
confusion of what to do if a new power mode were wanted after pinning to a
different mode (which to choose?)
v3: Only reprogram RPS while the GT is awake; it will be set when we
wake the GT, and while off it warns about being used outside of rpm.
v4: Fix deferred application of interactive mode
v5: s/state/interactive/

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
Fixes: e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  1 +
 drivers/gpu/drm/i915/i915_drv.h      |  4 ++
 drivers/gpu/drm/i915/intel_display.c | 20 ++++++
 drivers/gpu/drm/i915/intel_drv.h     |  2 +
 drivers/gpu/drm/i915/intel_pm.c      | 91 +++++++++++++++++++---------
 5 files changed, 89 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 59dc0610ea44..08d9b0748914 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2218,6 +2218,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Boosts outstanding? %d\n",
 		   atomic_read(&rps->num_waiters));
+	seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->interactive));
 	seq_printf(m, "Frequency requested %d\n",
 		   intel_gpu_freq(dev_priv, rps->cur_freq));
 	seq_printf(m, "  min hard:%d, soft:%d; max soft:%d, hard:%d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0f49f9988dfa..4ee3f8870da6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -784,6 +784,8 @@ struct intel_rps {
 
 	int last_adj;
 	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
+	unsigned int interactive;
+	struct mutex power_lock;
 
 	bool enabled;
 	atomic_t num_waiters;
@@ -3422,6 +3424,8 @@ extern void i915_redisable_vga_power_on(struct drm_i915_private *dev_priv);
 extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val);
 extern void intel_init_pch_refclk(struct drm_i915_private *dev_priv);
 extern int intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
+extern void intel_rps_mark_interactive(struct drm_i915_private *i915,
+				       bool interactive);
 extern bool intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
 				  bool enable);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 577b30dde45b..73c6d56ba3ec 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13104,6 +13104,19 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		add_rps_boost_after_vblank(new_state->crtc, new_state->fence);
 	}
 
+	/*
+	 * We declare pageflips to be interactive and so merit a small bias
+	 * towards upclocking to deliver the frame on time. By only changing
+	 * the RPS thresholds to sample more regularly and aim for higher
+	 * clocks we can hopefully deliver low power workloads (like kodi)
+	 * that are not quite steady state without resorting to forcing
+	 * maximum clocks following a vblank miss (see do_rps_boost()).
+	 */
+	if (!intel_state->rps_interactive) {
+		intel_rps_mark_interactive(dev_priv, true);
+		intel_state->rps_interactive = true;
+	}
+
 	return 0;
 }
 
@@ -13120,8 +13133,15 @@ void
 intel_cleanup_plane_fb(struct drm_plane *plane,
 		       struct drm_plane_state *old_state)
 {
+	struct intel_atomic_state *intel_state =
+		to_intel_atomic_state(old_state->state);
 	struct drm_i915_private *dev_priv = to_i915(plane->dev);
 
+	if (intel_state->rps_interactive) {
+		intel_rps_mark_interactive(dev_priv, false);
+		intel_state->rps_interactive = false;
+	}
+
 	/* Should only be called after a successful intel_prepare_plane_fb()! */
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	intel_plane_unpin_fb(to_intel_plane_state(old_state));
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 99a5f5be5b82..1ad7c1124bef 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -484,6 +484,8 @@ struct intel_atomic_state {
 	 */
 	bool skip_intermediate_wm;
 
+	bool rps_interactive;
+
 	/* Gen9+ only */
 	struct skl_ddb_values wm_results;
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 8a4152244571..f1b45a7f6550 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6256,41 +6256,14 @@ static u32 intel_rps_limits(struct drm_i915_private *dev_priv, u8 val)
 	return limits;
 }
 
-static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+static void rps_set_power(struct drm_i915_private *dev_priv, int new_power)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	int new_power;
 	u32 threshold_up = 0, threshold_down = 0; /* in % */
 	u32 ei_up = 0, ei_down = 0;
 
-	new_power = rps->power;
-	switch (rps->power) {
-	case LOW_POWER:
-		if (val > rps->efficient_freq + 1 &&
-		    val > rps->cur_freq)
-			new_power = BETWEEN;
-		break;
+	lockdep_assert_held(&rps->power_lock);
 
-	case BETWEEN:
-		if (val <= rps->efficient_freq &&
-		    val < rps->cur_freq)
-			new_power = LOW_POWER;
-		else if (val >= rps->rp0_freq &&
-			 val > rps->cur_freq)
-			new_power = HIGH_POWER;
-		break;
-
-	case HIGH_POWER:
-		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
-		    val < rps->cur_freq)
-			new_power = BETWEEN;
-		break;
-	}
-	/* Max/min bins are special */
-	if (val <= rps->min_freq_softlimit)
-		new_power = LOW_POWER;
-	if (val >= rps->max_freq_softlimit)
-		new_power = HIGH_POWER;
 	if (new_power == rps->power)
 		return;
 
@@ -6357,9 +6330,68 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 	rps->power = new_power;
 	rps->up_threshold = threshold_up;
 	rps->down_threshold = threshold_down;
+}
+
+static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+	int new_power;
+
+	new_power = rps->power;
+	switch (rps->power) {
+	case LOW_POWER:
+		if (val > rps->efficient_freq + 1 &&
+		    val > rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+
+	case BETWEEN:
+		if (val <= rps->efficient_freq &&
+		    val < rps->cur_freq)
+			new_power = LOW_POWER;
+		else if (val >= rps->rp0_freq &&
+			 val > rps->cur_freq)
+			new_power = HIGH_POWER;
+		break;
+
+	case HIGH_POWER:
+		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
+		    val < rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+	}
+	/* Max/min bins are special */
+	if (val <= rps->min_freq_softlimit)
+		new_power = LOW_POWER;
+	if (val >= rps->max_freq_softlimit)
+		new_power = HIGH_POWER;
+
+	mutex_lock(&rps->power_lock);
+	if (rps->interactive)
+		new_power = HIGH_POWER;
+	rps_set_power(dev_priv, new_power);
+	mutex_unlock(&rps->power_lock);
 	rps->last_adj = 0;
 }
 
+void intel_rps_mark_interactive(struct drm_i915_private *i915, bool interactive)
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+
+	if (INTEL_GEN(i915) < 6)
+		return;
+
+	mutex_lock(&rps->power_lock);
+	if (interactive) {
+		if (!rps->interactive++ && READ_ONCE(i915->gt.awake))
+			rps_set_power(i915, HIGH_POWER);
+	} else {
+		GEM_BUG_ON(!rps->interactive);
+		rps->interactive--;
+	}
+	mutex_unlock(&rps->power_lock);
+}
+
 static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
@@ -9596,6 +9628,7 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
 void intel_pm_setup(struct drm_i915_private *dev_priv)
 {
 	mutex_init(&dev_priv->pcu_lock);
+	mutex_init(&dev_priv->gt_pm.rps.power_lock);
 
 	atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0);
 
-- 
2.18.0


* ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request
  2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson
                   ` (2 preceding siblings ...)
  2018-07-30 15:25 ` [PATCH 3/3] drm/i915: Interactive RPS mode Chris Wilson
@ 2018-07-30 15:41 ` Patchwork
  2018-07-30 15:43 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2018-07-30 15:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Limit C-states when waiting for the active request
URL   : https://patchwork.freedesktop.org/series/47435/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
79a7b4257bd9 drm/i915: Limit C-states when waiting for the active request
c9a918d455b0 drm/i915: Do not use iowait while waiting for the GPU
-:16: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#16: 
References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")

-:16: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")'
#16: 
References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")

total: 1 errors, 1 warnings, 0 checks, 16 lines checked
8e9cd9e9f5bd drm/i915: Interactive RPS mode
-:27: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit e9af4ea2b9e7 ("drm/i915: Avoid waitboosting on the active request")'
#27: 
full throttle and upclock the GPU to max. But in commit e9af4ea2b9e7, we

-:79: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#79: FILE: drivers/gpu/drm/i915/i915_drv.h:788:
+	struct mutex power_lock;

-:87: CHECK:AVOID_EXTERNS: extern prototypes should be avoided in .h files
#87: FILE: drivers/gpu/drm/i915/i915_drv.h:3427:
+extern void intel_rps_mark_interactive(struct drm_i915_private *i915,

total: 1 errors, 0 warnings, 2 checks, 183 lines checked


* ✗ Fi.CI.SPARSE: warning for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request
  2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson
                   ` (3 preceding siblings ...)
  2018-07-30 15:41 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request Patchwork
@ 2018-07-30 15:43 ` Patchwork
  2018-07-30 16:03 ` ✓ Fi.CI.BAT: success " Patchwork
  2018-07-30 17:47 ` ✗ Fi.CI.IGT: failure " Patchwork
  6 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2018-07-30 15:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Limit C-states when waiting for the active request
URL   : https://patchwork.freedesktop.org/series/47435/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Limit C-states when waiting for the active request
Okay!

Commit: drm/i915: Do not use iowait while waiting for the GPU
Okay!

Commit: drm/i915: Interactive RPS mode
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3645:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3649:16: warning: expression using sizeof(void)


* ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request
  2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson
                   ` (4 preceding siblings ...)
  2018-07-30 15:43 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-07-30 16:03 ` Patchwork
  2018-07-30 17:47 ` ✗ Fi.CI.IGT: failure " Patchwork
  6 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2018-07-30 16:03 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Limit C-states when waiting for the active request
URL   : https://patchwork.freedesktop.org/series/47435/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4595 -> Patchwork_9812 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9812 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9812, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/47435/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9812:

  === IGT changes ===

    ==== Warnings ====

    igt@drv_selftest@live_evict:
      fi-cnl-psr:         SKIP -> PASS +9

    
== Known issues ==

  Here are the changes found in Patchwork_9812 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_hangcheck:
      fi-skl-guc:         PASS -> DMESG-FAIL (fdo#107174)
      fi-kbl-guc:         PASS -> DMESG-FAIL (fdo#106947)

    igt@drv_selftest@live_objects:
      fi-cnl-psr:         SKIP -> DMESG-FAIL (fdo#107398)

    igt@drv_selftest@live_workarounds:
      fi-kbl-7560u:       PASS -> DMESG-FAIL (fdo#107292)

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-kbl-7567u:       PASS -> FAIL (fdo#103191)

    {igt@kms_psr@primary_mmap_gtt}:
      fi-cnl-psr:         PASS -> DMESG-WARN (fdo#107372)

    
    ==== Possible fixes ====

    igt@debugfs_test@read_all_entries:
      fi-snb-2520m:       INCOMPLETE (fdo#103713) -> PASS

    igt@drv_selftest@live_coherency:
      {fi-icl-u}:         DMESG-FAIL -> PASS

    igt@drv_selftest@live_requests:
      {fi-bsw-kefka}:     INCOMPLETE (fdo#105876) -> PASS

    igt@drv_selftest@live_workarounds:
      {fi-cfl-8109u}:     DMESG-FAIL (fdo#107292) -> PASS
      {fi-bsw-kefka}:     DMESG-FAIL (fdo#107292) -> PASS
      fi-cnl-psr:         DMESG-FAIL (fdo#107292) -> PASS

    igt@kms_chamelium@dp-edid-read:
      fi-kbl-7500u:       FAIL (fdo#103841) -> PASS

    
    ==== Warnings ====

    {igt@kms_psr@primary_page_flip}:
      fi-cnl-psr:         DMESG-FAIL (fdo#107372) -> DMESG-WARN (fdo#107372)

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#105876 https://bugs.freedesktop.org/show_bug.cgi?id=105876
  fdo#106947 https://bugs.freedesktop.org/show_bug.cgi?id=106947
  fdo#107174 https://bugs.freedesktop.org/show_bug.cgi?id=107174
  fdo#107292 https://bugs.freedesktop.org/show_bug.cgi?id=107292
  fdo#107372 https://bugs.freedesktop.org/show_bug.cgi?id=107372
  fdo#107398 https://bugs.freedesktop.org/show_bug.cgi?id=107398


== Participating hosts (52 -> 47) ==

  Additional (1): fi-byt-j1900 
  Missing    (6): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper 


== Build changes ==

    * Linux: CI_DRM_4595 -> Patchwork_9812

  CI_DRM_4595: f133adaa57cf7118381b5ffc081bba3bbb1dbc83 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4581: f1c868dae24056ebc27e4f3c197724ce9b956a8a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9812: 8e9cd9e9f5bd27007a72b01037a18ee944a8e14c @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

8e9cd9e9f5bd drm/i915: Interactive RPS mode
c9a918d455b0 drm/i915: Do not use iowait while waiting for the GPU
79a7b4257bd9 drm/i915: Limit C-states when waiting for the active request

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9812/issues.html

* ✗ Fi.CI.IGT: failure for series starting with [1/3] drm/i915: Limit C-states when waiting for the active request
  2018-07-30 15:25 Trio of latency sensitive patches Chris Wilson
                   ` (5 preceding siblings ...)
  2018-07-30 16:03 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-07-30 17:47 ` Patchwork
  6 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2018-07-30 17:47 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Limit C-states when waiting for the active request
URL   : https://patchwork.freedesktop.org/series/47435/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4595_full -> Patchwork_9812_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9812_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9812_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9812_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@pm_rps@min-max-config-loaded:
      shard-glk:          PASS -> FAIL

    
== Known issues ==

  Here are the changes found in Patchwork_9812_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_flip@plain-flip-fb-recreate-interruptible:
      shard-glk:          PASS -> FAIL (fdo#100368) +1

    igt@kms_rotation_crc@sprite-rotation-180:
      shard-hsw:          PASS -> FAIL (fdo#103925)

    igt@pm_rps@min-max-config-loaded:
      shard-apl:          PASS -> FAIL (fdo#102250)

    
    ==== Possible fixes ====

    igt@gem_ctx_isolation@vcs0-s3:
      shard-kbl:          INCOMPLETE (fdo#103665) -> PASS

    igt@kms_flip@2x-flip-vs-expired-vblank:
      shard-hsw:          FAIL (fdo#102887) -> PASS

    igt@kms_flip@flip-vs-expired-vblank-interruptible:
      shard-kbl:          FAIL (fdo#102887, fdo#105363) -> PASS

    igt@kms_flip@plain-flip-ts-check:
      shard-glk:          FAIL (fdo#100368) -> PASS

    igt@kms_setmode@basic:
      shard-apl:          FAIL (fdo#99912) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102250 https://bugs.freedesktop.org/show_bug.cgi?id=102250
  fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#103925 https://bugs.freedesktop.org/show_bug.cgi?id=103925
  fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4595 -> Patchwork_9812

  CI_DRM_4595: f133adaa57cf7118381b5ffc081bba3bbb1dbc83 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4581: f1c868dae24056ebc27e4f3c197724ce9b956a8a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9812: 8e9cd9e9f5bd27007a72b01037a18ee944a8e14c @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9812/shards.html

* Re: [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU
  2018-07-30 15:25 ` [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU Chris Wilson
@ 2018-07-31 13:03   ` Mika Kuoppala
  2018-07-31 19:25     ` Francisco Jerez
  0 siblings, 1 reply; 15+ messages in thread
From: Mika Kuoppala @ 2018-07-31 13:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Eero Tamminen

Chris Wilson <chris@chris-wilson.co.uk> writes:

> A recent trend for cpufreq is to boost the CPU frequencies for
> iowaiters, in particular to benefit high frequency I/O. We do the same
> and boost the GPU clocks to try and minimise time spent waiting for the
> GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
> frequency will result in the GPU being throttled and its frequency being
> reduced. Thus declaring iowait negatively impacts GPU throughput.
>
> v2: Both sleeps!
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
> References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")

The commit above has its own heuristics on when to actually ramp up,
inspecting the interval of io waits.

Regardless of that, with a shared TDP, the waiter should not stand in the
way. And it fixes a regression:

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Going the other way around, the atomic commit code that updates
planes could potentially benefit from changing to
io_schedule_timeout (and/or adopting C-state limits).

-Mika

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: Francisco Jerez <currojerez@riseup.net>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index f3ff8dbe363d..3e48ea87b324 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1376,7 +1376,7 @@ long i915_request_wait(struct i915_request *rq,
>  			goto complete;
>  		}
>  
> -		timeout = io_schedule_timeout(timeout);
> +		timeout = schedule_timeout(timeout);
>  	} while (1);
>  
>  	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> @@ -1414,7 +1414,7 @@ long i915_request_wait(struct i915_request *rq,
>  				      wait.seqno - 1))
>  			qos = wait_dma_qos_add();
>  
> -		timeout = io_schedule_timeout(timeout);
> +		timeout = schedule_timeout(timeout);
>  
>  		if (intel_wait_complete(&wait) &&
>  		    intel_wait_check_request(&wait, rq))
> -- 
> 2.18.0

* Re: [PATCH 2/3] drm/i915: Do not use iowait while waiting for the GPU
  2018-07-31 13:03   ` Mika Kuoppala
@ 2018-07-31 19:25     ` Francisco Jerez
  0 siblings, 0 replies; 15+ messages in thread
From: Francisco Jerez @ 2018-07-31 19:25 UTC (permalink / raw)
  To: Mika Kuoppala, Chris Wilson, intel-gfx; +Cc: Eero Tamminen



Mika Kuoppala <mika.kuoppala@linux.intel.com> writes:

> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
>> A recent trend for cpufreq is to boost the CPU frequencies for
>> iowaiters, in particular to benefit high frequency I/O. We do the same
>> and boost the GPU clocks to try and minimise time spent waiting for the
>> GPU. However, as the igfx and CPU share the same TDP, boosting the CPU
>> frequency will result in the GPU being throttled and its frequency being
>> reduced. Thus declaring iowait negatively impacts GPU throughput.
>>
>> v2: Both sleeps!
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
>> References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
>
> The commit above has its own heuristics on when to actually ramp up,
> inspecting the interval of io waits.
>
> Regardless of that, with a shared TDP, the waiter should not stand in the
> way.

I've been running some tests with this series (and your previous ones).
I still see statistically significant regressions in latency-sensitive
benchmarks with this series applied:

 qgears2/render-backend=XRender Extension/test-mode=Text: XXX ±0.26% x12 -> XXX ±0.36% x15 d=-0.97% ±0.32% p=0.00%
 lightsmark:                                              XXX ±0.51% x22 -> XXX ±0.49% x20 d=-1.58% ±0.50% p=0.00%
 gputest/triangle:                                        XXX ±0.67% x10 -> XXX ±1.76% x20 d=-1.73% ±1.47% p=0.52%
 synmark/OglMultithread:                                  XXX ±0.47% x10 -> XXX ±1.06% x20 d=-3.59% ±0.88% p=0.00%

Numbers above are from a partial benchmark run on BXT J3455 -- I'm still
waiting to get the results of a full run though.

Worse, in combination with my intel_pstate branch the effect of this
patch is strictly negative.  There are no improvements because the
cpufreq governor is able to figure out by itself that boosting the
frequency of the CPU under GPU-bound conditions cannot possibly help
(The HWP boost logic could be fixed to do the same thing easily which
would allow us to obtain the best of both worlds on big core).  The
reason for the regressions is that IOWAIT is a useful signal for the
cpufreq governor to provide reduced latency in applications that are
unable to parallelize enough work between CPU and the IO device -- The
upstream governor is just using it rather ineffectively.

> And it fixes a regression:
>

This patch isn't necessary anymore to fix the regression; there is
another change going in that mitigates the problem [1].  Can we please
keep the IO schedule calls here? (and elsewhere in the atomic commit
code)

[1] https://lkml.org/lkml/2018/7/30/880

> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>
> Going the other way around, the atomic commit code that updates
> planes could potentially benefit from changing to
> io_schedule_timeout (and/or adopting C-state limits).
>
> -Mika
>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
>> Cc: Francisco Jerez <currojerez@riseup.net>
>> ---
>>  drivers/gpu/drm/i915/i915_request.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> index f3ff8dbe363d..3e48ea87b324 100644
>> --- a/drivers/gpu/drm/i915/i915_request.c
>> +++ b/drivers/gpu/drm/i915/i915_request.c
>> @@ -1376,7 +1376,7 @@ long i915_request_wait(struct i915_request *rq,
>>  			goto complete;
>>  		}
>>  
>> -		timeout = io_schedule_timeout(timeout);
>> +		timeout = schedule_timeout(timeout);
>>  	} while (1);
>>  
>>  	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
>> @@ -1414,7 +1414,7 @@ long i915_request_wait(struct i915_request *rq,
>>  				      wait.seqno - 1))
>>  			qos = wait_dma_qos_add();
>>  
>> -		timeout = io_schedule_timeout(timeout);
>> +		timeout = schedule_timeout(timeout);
>>  
>>  		if (intel_wait_complete(&wait) &&
>>  		    intel_wait_check_request(&wait, rq))
>> -- 
>> 2.18.0


* Re: [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request
  2018-07-30 15:25 ` [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request Chris Wilson
@ 2018-08-01  9:56   ` Chris Wilson
  2018-08-03 10:48   ` Tvrtko Ursulin
  1 sibling, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2018-08-01  9:56 UTC (permalink / raw)
  To: intel-gfx; +Cc: Eero Tamminen

Quoting Chris Wilson (2018-07-30 16:25:20)
> If we are waiting for the currently executing request, we have a good
> idea that it will be completed in the very near future and so want to
> cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly.
> 
> v2: Not allowed to block in kmalloc after setting TASK_INTERRUPTIBLE.
> v3: Avoid the blocking notifier as well for TASK_INTERRUPTIBLE
> v4: Beautification?
> v5: And ignore the preemptibility of queue_work before schedule.
> 
> Testcase: igt/gem_sync/store-default
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: Francisco Jerez <currojerez@riseup.net>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

media_bench disagrees with dropping iowait, but agrees with setting the
DMA_LATENCY pm_qos.
-Chris

* Re: [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request
  2018-07-30 15:25 ` [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request Chris Wilson
  2018-08-01  9:56   ` Chris Wilson
@ 2018-08-03 10:48   ` Tvrtko Ursulin
  2018-08-03 11:07     ` Chris Wilson
  1 sibling, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2018-08-03 10:48 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Eero Tamminen


On 30/07/2018 16:25, Chris Wilson wrote:
> If we are waiting for the currently executing request, we have a good
> idea that it will be completed in the very near future and so want to
> cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly.

Maybe, but I have a feeling we shouldn't assume what the userspace 
wants. On the other hand "seqno - 1" guard alleviates some of my 
concerns, just not sure if all.

> v2: Not allowed to block in kmalloc after setting TASK_INTERRUPTIBLE.
> v3: Avoid the blocking notifier as well for TASK_INTERRUPTIBLE
> v4: Beautification?
> v5: And ignore the preemptibility of queue_work before schedule.
> 
> Testcase: igt/gem_sync/store-default
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> Cc: Francisco Jerez <currojerez@riseup.net>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 52 +++++++++++++++++++++++++++++
>   1 file changed, 52 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 5c2c93cbab12..f3ff8dbe363d 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1258,6 +1258,51 @@ static bool __i915_wait_request_check_and_reset(struct i915_request *request)
>   	return true;
>   }
>   
> +struct wait_dma_qos {
> +	struct pm_qos_request req;
> +	struct work_struct add, del;
> +};
> +
> +static void __wait_dma_qos_add(struct work_struct *work)
> +{
> +	struct wait_dma_qos *qos = container_of(work, typeof(*qos), add);
> +
> +	pm_qos_add_request(&qos->req, PM_QOS_CPU_DMA_LATENCY, 50);
> +}
> +
> +static void __wait_dma_qos_del(struct work_struct *work)
> +{
> +	struct wait_dma_qos *qos = container_of(work, typeof(*qos), del);
> +
> +	if (!cancel_work_sync(&qos->add))
> +		pm_qos_remove_request(&qos->req);
> +
> +	kfree(qos);
> +}
> +
> +static struct wait_dma_qos *wait_dma_qos_add(void)
> +{
> +	struct wait_dma_qos *qos;
> +
> +	/* Called under TASK_INTERRUPTIBLE, so not allowed to sleep/block. */
> +	qos = kzalloc(sizeof(*qos), GFP_NOWAIT | __GFP_NOWARN);

Is it too big to put on the stack in i915_request_wait? Looks like that 
would be simpler.

> +	if (!qos)
> +		return NULL;
> +
> +	INIT_WORK(&qos->add, __wait_dma_qos_add);
> +	INIT_WORK(&qos->del, __wait_dma_qos_del);
> +	schedule_work_on(raw_smp_processor_id(), &qos->add);

Do we want to use the highpri wq? But in any case we do have a worker 
latency here, which may completely defeat the 50us QoS request. :(

Also, do you need to specify the CPU manually or is that in fact 
detrimental to the worker running ASAP? AFAIU this makes the worker only 
be able to start once we go to sleep, with potentially other stuff in 
there preceding our work item.

> +
> +	return qos;
> +}
> +
> +static void wait_dma_qos_del(struct wait_dma_qos *qos)
> +{
> +	/* Defer to worker so not incur extra latency for our woken client. */
> +	if (qos)
> +		schedule_work(&qos->del);
> +}
> +
>   /**
>    * i915_request_wait - wait until execution of request has finished
>    * @rq: the request to wait upon
> @@ -1286,6 +1331,7 @@ long i915_request_wait(struct i915_request *rq,
>   	wait_queue_head_t *errq = &rq->i915->gpu_error.wait_queue;
>   	DEFINE_WAIT_FUNC(reset, default_wake_function);
>   	DEFINE_WAIT_FUNC(exec, default_wake_function);
> +	struct wait_dma_qos *qos = NULL;
>   	struct intel_wait wait;
>   
>   	might_sleep();
> @@ -1363,6 +1409,11 @@ long i915_request_wait(struct i915_request *rq,
>   			break;
>   		}
>   
> +		if (!qos &&
> +		    i915_seqno_passed(intel_engine_get_seqno(rq->engine),
> +				      wait.seqno - 1))
> +			qos = wait_dma_qos_add();
> +
>   		timeout = io_schedule_timeout(timeout);
>   
>   		if (intel_wait_complete(&wait) &&
> @@ -1412,6 +1463,7 @@ long i915_request_wait(struct i915_request *rq,
>   	if (flags & I915_WAIT_LOCKED)
>   		remove_wait_queue(errq, &reset);
>   	remove_wait_queue(&rq->execute, &exec);
> +	wait_dma_qos_del(qos);
>   	trace_i915_request_wait_end(rq);
>   
>   	return timeout;
> 

Another thing we talked about on IRC is a potential to introduce an 
explicit low-latency flag to gem_wait ioctl. That would punt the 
responsibility to userspace to know if it cares, but on the other hand 
if some benchmarks benefit from the implicit setting that could be tempting.
You said media-bench likes it so I'll try it.

Explicit request would also simplify the code by removing the need for 
workers. And remove the worker latency from the request which may be the 
most attractive benefit.
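
For illustration, such an opt-in could look roughly like the sketch below
from the userspace side. The flag name is hypothetical and no such bit
exists in the uAPI today; it is only meant to show the shape of the idea:

/*
 * Hypothetical sketch only: I915_GEM_WAIT_LOW_LATENCY is not an existing
 * uAPI flag (the kernel currently rejects any non-zero flags); it merely
 * shows how userspace could opt in to the CPU_DMA_LATENCY cap instead of
 * the kernel applying it implicitly.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

#define I915_GEM_WAIT_LOW_LATENCY	(1u << 0)	/* hypothetical */

static int gem_wait_low_latency(int drm_fd, uint32_t handle, int64_t timeout_ns)
{
	struct drm_i915_gem_wait wait = {
		.bo_handle = handle,
		.flags = I915_GEM_WAIT_LOW_LATENCY,
		.timeout_ns = timeout_ns,
	};

	return ioctl(drm_fd, DRM_IOCTL_I915_GEM_WAIT, &wait);
}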

We could also apply the QoS request before the seqno assignment.
Currently I think there is a small window where the wait can race with it,
and fall into high-latency sleep, even if later it would choose to
request low-latency.

Regards,

Tvrtko


* Re: [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request
  2018-08-03 10:48   ` Tvrtko Ursulin
@ 2018-08-03 11:07     ` Chris Wilson
  2018-08-03 13:00       ` Tvrtko Ursulin
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Wilson @ 2018-08-03 11:07 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: Eero Tamminen

Quoting Tvrtko Ursulin (2018-08-03 11:48:44)
> 
> On 30/07/2018 16:25, Chris Wilson wrote:
> > If we are waiting for the currently executing request, we have a good
> > idea that it will be completed in the very near future and so want to
> > cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly.
> 
> Maybe, but I have a feeling we shouldn't assume what the userspace 
> wants. On the other hand "seqno - 1" guard alleviates some of my 
> concerns, just not sure if all.

Yup, it at least pretends to not be wholly evil.
 
> > v2: Not allowed to block in kmalloc after setting TASK_INTERRUPTIBLE.
> > v3: Avoid the blocking notifier as well for TASK_INTERRUPTIBLE
> > v4: Beautification?
> > v5: And ignore the preemptibility of queue_work before schedule.
> > 
> > Testcase: igt/gem_sync/store-default
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> > Cc: Francisco Jerez <currojerez@riseup.net>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_request.c | 52 +++++++++++++++++++++++++++++
> >   1 file changed, 52 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 5c2c93cbab12..f3ff8dbe363d 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1258,6 +1258,51 @@ static bool __i915_wait_request_check_and_reset(struct i915_request *request)
> >       return true;
> >   }
> >   
> > +struct wait_dma_qos {
> > +     struct pm_qos_request req;
> > +     struct work_struct add, del;
> > +};
> > +
> > +static void __wait_dma_qos_add(struct work_struct *work)
> > +{
> > +     struct wait_dma_qos *qos = container_of(work, typeof(*qos), add);
> > +
> > +     pm_qos_add_request(&qos->req, PM_QOS_CPU_DMA_LATENCY, 50);
> > +}
> > +
> > +static void __wait_dma_qos_del(struct work_struct *work)
> > +{
> > +     struct wait_dma_qos *qos = container_of(work, typeof(*qos), del);
> > +
> > +     if (!cancel_work_sync(&qos->add))
> > +             pm_qos_remove_request(&qos->req);
> > +
> > +     kfree(qos);
> > +}
> > +
> > +static struct wait_dma_qos *wait_dma_qos_add(void)
> > +{
> > +     struct wait_dma_qos *qos;
> > +
> > +     /* Called under TASK_INTERRUPTIBLE, so not allowed to sleep/block. */
> > +     qos = kzalloc(sizeof(*qos), GFP_NOWAIT | __GFP_NOWARN);
> 
> Is it too big to put on the stack in i915_request_wait? Looks like that 
> would be simpler.

We don't want to be synchronous in our out path, as that directly adds
to the client latency.
 
> > +     if (!qos)
> > +             return NULL;
> > +
> > +     INIT_WORK(&qos->add, __wait_dma_qos_add);
> > +     INIT_WORK(&qos->del, __wait_dma_qos_del);
> > +     schedule_work_on(raw_smp_processor_id(), &qos->add);
> 
> Do we want to use the highpri wq? But in any case we do have a worker 
> latency here, which may completely defeat the 50us QoS request. :(
> 
> Also, do you need to specify the CPU manually or is that in fact 
> detrimental to the worker running ASAP? AFAIU this makes the worker only 
> be able to start once we go to sleep, with potentially other stuff in 
> there preceding our work item.

That was the idea with pinning it to the current cpu; that the worker is
only run when we schedule ourselves. If we skipped the schedule, we
wouldn't need to adjust the pm_qos. There is still some wiggle with
preemption, but I don't see that being a huge concern, if we are
switching cpus we may as well make sure we don't enter a high C-state.

> > +
> > +     return qos;
> > +}
> > +
> > +static void wait_dma_qos_del(struct wait_dma_qos *qos)
> > +{
> > +     /* Defer to worker so not incur extra latency for our woken client. */
> > +     if (qos)
> > +             schedule_work(&qos->del);
> > +}
> > +
> >   /**
> >    * i915_request_wait - wait until execution of request has finished
> >    * @rq: the request to wait upon
> > @@ -1286,6 +1331,7 @@ long i915_request_wait(struct i915_request *rq,
> >       wait_queue_head_t *errq = &rq->i915->gpu_error.wait_queue;
> >       DEFINE_WAIT_FUNC(reset, default_wake_function);
> >       DEFINE_WAIT_FUNC(exec, default_wake_function);
> > +     struct wait_dma_qos *qos = NULL;
> >       struct intel_wait wait;
> >   
> >       might_sleep();
> > @@ -1363,6 +1409,11 @@ long i915_request_wait(struct i915_request *rq,
> >                       break;
> >               }
> >   
> > +             if (!qos &&
> > +                 i915_seqno_passed(intel_engine_get_seqno(rq->engine),
> > +                                   wait.seqno - 1))
> > +                     qos = wait_dma_qos_add();
> > +
> >               timeout = io_schedule_timeout(timeout);
> >   
> >               if (intel_wait_complete(&wait) &&
> > @@ -1412,6 +1463,7 @@ long i915_request_wait(struct i915_request *rq,
> >       if (flags & I915_WAIT_LOCKED)
> >               remove_wait_queue(errq, &reset);
> >       remove_wait_queue(&rq->execute, &exec);
> > +     wait_dma_qos_del(qos);
> >       trace_i915_request_wait_end(rq);
> >   
> >       return timeout;
> > 
> 
> Another thing we talked about on IRC is a potential to introduce an 
> explicit low-latency flag to gem_wait ioctl. That would punt the 
> responsibility to userspace to know if it cares, but on the other hand 
> if some benchmarks benefit from the implicit setting that could be tempting.
> You said media-bench likes it so I'll try it.
> 
> Explicit request would also simplify the code by removing the need for 
> workers. And remove the worker latency from the request which may be the 
> most attractive benefit.

No, we still need the workers as the pm_qos_update can take a while if
it decides to change C-state there and then, so definitely don't want
that in the tail, and we might as well keep the trick to only do the
pm_qos_update if we sleep.
 
> We could also apply the QoS request before the seqno assignment.
> Currently I think there is a small window where the wait can race with it,
> and fall into high-latency sleep, even if later it would choose to
> request low-latency.

That window should be followed by an interrupt, I think, and we should
have applied the irq_seqno barrier for gen5-gen7, so the seqno should be
valid wrt the previous interrupt.

A hard problem to debug for sure. Maybe we should preserve the
last-interrupt timestamp in the HWSP and then we can inspect that.
Ideally though we want a journal so we can go back in case the next
interrupt arrives before we wake up, making the latency seem less. Still
some information (with detectable error) is better than none. Let's see
what that looks like.
-Chris

* Re: [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request
  2018-08-03 11:07     ` Chris Wilson
@ 2018-08-03 13:00       ` Tvrtko Ursulin
  2018-08-03 13:57         ` Chris Wilson
  0 siblings, 1 reply; 15+ messages in thread
From: Tvrtko Ursulin @ 2018-08-03 13:00 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Eero Tamminen


On 03/08/2018 12:07, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-08-03 11:48:44)
>>
>> On 30/07/2018 16:25, Chris Wilson wrote:
>>> If we are waiting for the currently executing request, we have a good
>>> idea that it will be completed in the very near future and so want to
>>> cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly.
>>
>> Maybe, but I have a feeling we shouldn't assume what the userspace
>> wants. On the other hand "seqno - 1" guard alleviates some of my
>> concerns, just not sure if all.
> 
> Yup, it at least pretends to not be wholly evil.
>   
>>> v2: Not allowed to block in kmalloc after setting TASK_INTERRUPTIBLE.
>>> v3: Avoid the blocking notifier as well for TASK_INTERRUPTIBLE
>>> v4: Beautification?
>>> v5: And ignore the preemptibility of queue_work before schedule.
>>>
>>> Testcase: igt/gem_sync/store-default
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
>>> Cc: Francisco Jerez <currojerez@riseup.net>
>>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_request.c | 52 +++++++++++++++++++++++++++++
>>>    1 file changed, 52 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index 5c2c93cbab12..f3ff8dbe363d 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -1258,6 +1258,51 @@ static bool __i915_wait_request_check_and_reset(struct i915_request *request)
>>>        return true;
>>>    }
>>>    
>>> +struct wait_dma_qos {
>>> +     struct pm_qos_request req;
>>> +     struct work_struct add, del;
>>> +};
>>> +
>>> +static void __wait_dma_qos_add(struct work_struct *work)
>>> +{
>>> +     struct wait_dma_qos *qos = container_of(work, typeof(*qos), add);
>>> +
>>> +     pm_qos_add_request(&qos->req, PM_QOS_CPU_DMA_LATENCY, 50);
>>> +}
>>> +
>>> +static void __wait_dma_qos_del(struct work_struct *work)
>>> +{
>>> +     struct wait_dma_qos *qos = container_of(work, typeof(*qos), del);
>>> +
>>> +     if (!cancel_work_sync(&qos->add))
>>> +             pm_qos_remove_request(&qos->req);
>>> +
>>> +     kfree(qos);
>>> +}
>>> +
>>> +static struct wait_dma_qos *wait_dma_qos_add(void)
>>> +{
>>> +     struct wait_dma_qos *qos;
>>> +
>>> +     /* Called under TASK_INTERRUPTIBLE, so not allowed to sleep/block. */
>>> +     qos = kzalloc(sizeof(*qos), GFP_NOWAIT | __GFP_NOWARN);
>>
>> Is it too big to put on the stack in i915_request_wait? Looks like that
>> would be simpler.
> 
> We don't want to be synchronous in our out path, as that directly adds
> to the client latency.

True, I missed the fact we wouldn't be able to return until syncing with 
the worker.

>>> +     if (!qos)
>>> +             return NULL;
>>> +
>>> +     INIT_WORK(&qos->add, __wait_dma_qos_add);
>>> +     INIT_WORK(&qos->del, __wait_dma_qos_del);
>>> +     schedule_work_on(raw_smp_processor_id(), &qos->add);
>>
>> Do we want to use the highpri wq? But in any case we do have a worker
>> latency here, which may completely defeat the 50us QoS request. :(
>>
>> Also, do you need to specify the CPU manually or is that in fact
>> detrimental to the worker running ASAP? AFAIU this makes the worker only
>> be able to start once we go to sleep, with potentially other stuff in
>> there preceding our work item.
> 
> That was the idea behind pinning it to the current cpu: the worker is
> only run when we schedule ourselves out. If we skipped the schedule, we
> wouldn't need to adjust the pm_qos at all. There is still some wiggle
> room with preemption, but I don't see that being a huge concern; if we
> are switching cpus we may as well make sure we don't enter a deep
> C-state.

I guess it is OK: if there are preceding wq items which delay our 
PM QoS setup, they just add latency which we would get anyway between 
going to sleep and being woken up. If I understand how these things work.

But the highpri wq would still be a good move, I think.
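
Roughly, assuming the wait_dma_qos setup from the patch otherwise stays
as it is (a sketch, not a tested change):

	INIT_WORK(&qos->add, __wait_dma_qos_add);
	INIT_WORK(&qos->del, __wait_dma_qos_del);
	/*
	 * Queue on the high-priority system workqueue instead of the
	 * default one, still pinned to the current CPU so the worker
	 * only runs once the waiter schedules out.
	 */
	queue_work_on(raw_smp_processor_id(), system_highpri_wq, &qos->add);

schedule_work_on() is just queue_work_on() on the default system_wq, so
this would be a one-line change in wait_dma_qos_add().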

> 
>>> +
>>> +     return qos;
>>> +}
>>> +
>>> +static void wait_dma_qos_del(struct wait_dma_qos *qos)
>>> +{
>>> +     /* Defer to worker so not incur extra latency for our woken client. */
>>> +     if (qos)
>>> +             schedule_work(&qos->del);
>>> +}
>>> +
>>>    /**
>>>     * i915_request_wait - wait until execution of request has finished
>>>     * @rq: the request to wait upon
>>> @@ -1286,6 +1331,7 @@ long i915_request_wait(struct i915_request *rq,
>>>        wait_queue_head_t *errq = &rq->i915->gpu_error.wait_queue;
>>>        DEFINE_WAIT_FUNC(reset, default_wake_function);
>>>        DEFINE_WAIT_FUNC(exec, default_wake_function);
>>> +     struct wait_dma_qos *qos = NULL;
>>>        struct intel_wait wait;
>>>    
>>>        might_sleep();
>>> @@ -1363,6 +1409,11 @@ long i915_request_wait(struct i915_request *rq,
>>>                        break;
>>>                }
>>>    
>>> +             if (!qos &&
>>> +                 i915_seqno_passed(intel_engine_get_seqno(rq->engine),
>>> +                                   wait.seqno - 1))
>>> +                     qos = wait_dma_qos_add();
>>> +
>>>                timeout = io_schedule_timeout(timeout);
>>>    
>>>                if (intel_wait_complete(&wait) &&
>>> @@ -1412,6 +1463,7 @@ long i915_request_wait(struct i915_request *rq,
>>>        if (flags & I915_WAIT_LOCKED)
>>>                remove_wait_queue(errq, &reset);
>>>        remove_wait_queue(&rq->execute, &exec);
>>> +     wait_dma_qos_del(qos);
>>>        trace_i915_request_wait_end(rq);
>>>    
>>>        return timeout;
>>>
>>
>> Another thing we talked about on IRC is the potential to introduce an
>> explicit low-latency flag for the gem_wait ioctl. That would punt the
>> responsibility to userspace to know whether it cares, but on the other
>> hand, if some benchmarks benefit from the implicit setting, that could
>> be tempting. You said media-bench likes it, so I'll try it.
>>
>> An explicit request would also simplify the code by removing the need
>> for the workers, and remove the worker latency from the request, which
>> may be the most attractive benefit.
> 
> No, we still need the workers, as the pm_qos update can take a while
> if it decides to change the C-state there and then, so we definitely
> don't want that in the tail; and we might as well keep the trick of
> only doing the pm_qos update if we actually sleep.

Okay, true, but we would be able to set up the PM QoS earlier, before 
we do our own busy spin, and if the worker ran on a different CPU that 
would mean partially avoiding the worker latency. Which would be nice, 
since if we are attempting to give some wake-up guarantees it would be 
good to get as close to them as possible.
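
On the explicit low-latency flag mentioned above, the uAPI side would
presumably be no more than a new bit in drm_i915_gem_wait.flags. A
purely hypothetical sketch; I915_GEM_WAIT_LOW_LATENCY and its value are
invented here for illustration and are not proposed uAPI:

	#include <stdint.h>
	#include <xf86drm.h>
	#include <i915_drm.h>

	/* Hypothetical flag, invented for this sketch; not existing uAPI. */
	#define I915_GEM_WAIT_LOW_LATENCY	(1u << 0)

	static int gem_wait_low_latency(int fd, uint32_t handle)
	{
		struct drm_i915_gem_wait wait = {
			.bo_handle = handle,
			.flags = I915_GEM_WAIT_LOW_LATENCY,
			.timeout_ns = -1,	/* negative == wait indefinitely */
		};

		/*
		 * The kernel side would translate the flag into the same
		 * CPU_DMA_LATENCY request this patch sets up implicitly.
		 */
		return drmIoctl(fd, DRM_IOCTL_I915_GEM_WAIT, &wait);
	}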

In the meantime I have exercised this via media-bench and it does seem 
there are gains for a few workloads (and almost no reliable 
regressions). But when run under tracing the gains disappeared, which 
makes me think either the 2s workload duration I selected is too short 
to give reliable numbers, or tracing brings a similar benefit without 
the PM QoS patch. The only conclusion is that more thorough testing is 
needed.

Regards,

Tvrtko

>> And we could also apply the QoS request before the seqno assignment.
>> Currently I think there is a small window where the wait can race with
>> it and fall into a high-latency sleep, even if later it would choose to
>> request low latency.
> 
> That window should be followed by an interrupt, I think, and we should
> have applied the irq_seqno barrier for gen5-gen7, so the seqno should be
> valid wrt the previous interrupt.
> 
> A hard problem to debug, for sure. Maybe we should preserve the
> last-interrupt timestamp in the HWSP and then we can inspect that.
> Ideally, though, we want a journal so we can go back in case the next
> interrupt arrives before we wake up, making the latency seem smaller.
> Still, some information (with a detectable error) is better than none.
> Let's see what that looks like.
> -Chris
> 
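On the HWSP timestamp idea, something along these lines; the slot index
and helper names are invented purely for illustration, and the real
patch would use whatever spare dword the status page layout allows:

	/* Made-up slot index for this sketch. */
	#define HWSP_IRQ_TIMESTAMP	0x40

	/*
	 * Stamp a coarse CPU timestamp into a spare status-page dword
	 * from the user-interrupt handler...
	 */
	static void note_user_interrupt(u32 *hwsp)
	{
		WRITE_ONCE(hwsp[HWSP_IRQ_TIMESTAMP], (u32)local_clock());
	}

	/* ...so the waiter can estimate its own irq-to-wakeup delay. */
	static u32 irq_to_wakeup_delay_ns(const u32 *hwsp)
	{
		return (u32)local_clock() - READ_ONCE(hwsp[HWSP_IRQ_TIMESTAMP]);
	}

A journal would replace the single slot with a small ring of
(seqno, timestamp) pairs, so a wake-up can still be matched to its
interrupt if another one lands first.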

* Re: [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request
  2018-08-03 13:00       ` Tvrtko Ursulin
@ 2018-08-03 13:57         ` Chris Wilson
  0 siblings, 0 replies; 15+ messages in thread
From: Chris Wilson @ 2018-08-03 13:57 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: Eero Tamminen

Quoting Tvrtko Ursulin (2018-08-03 14:00:33)
> In the meantime I have exercised this via media-bench and it does seem 
> there are gains for a few workloads (and almost no reliable 
> regressions). But when run under tracing the gains disappeared, which 
> makes me think either the 2s workload duration I selected is too short 
> to give reliable numbers, or tracing brings a similar benefit without 
> the PM QoS patch. The only conclusion is that more thorough testing is 
> needed.

Hmm, right, tracing really does a number on cpufreq.

Switched to 

+       if (IS_ENABLED(CONFIG_DRM_I915_TRACE_IRQ_LATENCY)) {
+               struct drm_i915_private *dev_priv = rq->i915;
+               u32 now = I915_READ(RING_TIMESTAMP(rq->engine->mmio_base));
+
+               trace_i915_wait_irq_latency(rq,
+                                           now - intel_engine_get_timestamp(rq->engine),
+                                           rq->global_seqno == intel_engine_get_seqno(rq->engine));
+       }
+

from a plain pr_err(), and the latency effect just vanished. That makes
it tricky.

        gem_sync   324 [002]   100.695373: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6882, prio=6882, delay=231
        gem_sync   324 [002]   100.695536: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6883, prio=6883, delay=219
        gem_sync   324 [002]   100.695697: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6884, prio=6884, delay=215
        gem_sync   324 [002]   100.695860: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6885, prio=6885, delay=231
        gem_sync   324 [002]   100.696040: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6886, prio=6886, delay=313
        gem_sync   324 [002]   100.696203: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6887, prio=6887, delay=218
        gem_sync   324 [002]   100.696366: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6888, prio=6888, delay=216
        gem_sync   324 [002]   100.696527: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6889, prio=6889, delay=214
        gem_sync   324 [002]   100.696689: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6890, prio=6890, delay=218
        gem_sync   324 [002]   100.696851: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6891, prio=6891, delay=215
        gem_sync   324 [002]   100.697013: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6892, prio=6892, delay=214
        gem_sync   324 [002]   100.697175: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6893, prio=6893, delay=216
        gem_sync   324 [002]   100.697336: i915:i915_wait_irq_latency: dev=0, engine=0:0, hw_id=2, ctx=20, seqno=6894, prio=6894, delay=217

from

[   14.232750] Wake up latency: 4004 cycles
[   14.233779] Wake up latency: 3236 cycles
[   14.234831] Wake up latency: 3580 cycles
[   14.235864] Wake up latency: 3220 cycles
[   14.236865] Wake up latency: 2653 cycles
[   14.237893] Wake up latency: 3209 cycles
[   14.238941] Wake up latency: 3492 cycles
[   14.239975] Wake up latency: 3227 cycles
[   14.240943] Wake up latency: 2019 cycles
[   14.241971] Wake up latency: 3199 cycles
[   14.243019] Wake up latency: 3500 cycles
[   14.244051] Wake up latency: 3198 cycles
[   14.245073] Wake up latency: 3014 cycles
[   14.246108] Wake up latency: 3238 cycles
[   14.247158] Wake up latency: 3539 cycles
[   14.248188] Wake up latency: 3200 cycles

-Chris
