* Breadcrumbs, again
@ 2016-06-03 16:08 Chris Wilson
  2016-06-03 16:08 ` [PATCH 01/21] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
                   ` (21 more replies)
  0 siblings, 22 replies; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

We have a major bottleneck in waiting with many clients that is
impacting customer workloads. This is because we wake up every waiter
after the GPU advances, and each of them then tries to identify whether
it was the lucky one. The classic thundering herd; the response is to
wake only the next waiter in the queue, who then wakes up all completed
clients.
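
For reference, a minimal sketch of the change in wake-up policy (purely
illustrative, using names introduced later in the series by the
breadcrumbs patch, not the exact driver code):

	/* Before: the interrupt handler wakes the whole herd and every
	 * client re-reads the seqno to see if it was the lucky one.
	 */
	wake_up_all(&engine->irq_queue);

	/* After: the interrupt wakes only the oldest waiter (the
	 * "bottom-half"); that task does one coherent seqno read and then
	 * wakes any other clients whose requests have also completed.
	 */
	struct task_struct *bh = READ_ONCE(engine->breadcrumbs.tasklet);
	if (bh)
		wake_up_process(bh);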

This also provides a low overhead signaling framework that *works*.

(This series follows on the BAT fix, since regressions get priority.)
-Chris


* [PATCH 01/21] drm/i915/shrinker: Flush active on objects before counting
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-03 16:08 ` [PATCH 02/21] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

As we inspect obj->active to decide how many objects we can shrink (we
only shrink idle objects), it helps to flush the active lists first
in order to have a more accurate count of available objects.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 886a8797566d..1bf14544d8ad 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -265,6 +265,8 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 	if (!i915_gem_shrinker_lock(dev, &unlock))
 		return 0;
 
+	i915_gem_retire_requests(dev_priv);
+
 	count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list)
 		if (can_release_pages(obj))
-- 
2.8.1


* [PATCH 02/21] drm/i915: Delay queuing hangcheck to wait-request
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
  2016-06-03 16:08 ` [PATCH 01/21] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-08  8:42   ` Daniel Vetter
  2016-06-03 16:08 ` [PATCH 03/21] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

We can forgo queuing the hangcheck from the start of every request
until we wait upon a request. This reduces the overhead of every
request, but may increase the latency of detecting a hang. However, if
nothing ever waits upon a hang, did it ever hang? It also improves the
robustness of the wait-request by ensuring that the hangchecker is
indeed running before we sleep indefinitely (and thereby ensuring that
we never actually sleep forever waiting for a dead GPU).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |  9 +++++----
 drivers/gpu/drm/i915/i915_irq.c | 10 ++++------
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c7a67a7412cd..03256f096ab6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1310,6 +1310,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		/* Ensure that even if the GPU hangs, we get woken up. */
+		i915_queue_hangcheck(dev_priv);
+
 		timer.function = NULL;
 		if (timeout || missed_irq(dev_priv, engine)) {
 			unsigned long expire;
@@ -2674,8 +2677,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	/* Not allowed to fail! */
 	WARN(ret, "emit|add_request failed: %d!\n", ret);
 
-	i915_queue_hangcheck(engine->i915);
-
 	queue_delayed_work(dev_priv->wq,
 			   &dev_priv->mm.retire_work,
 			   round_jiffies_up_relative(HZ));
@@ -3019,8 +3020,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv)
 
 	if (idle)
 		mod_delayed_work(dev_priv->wq,
-				   &dev_priv->mm.idle_work,
-				   msecs_to_jiffies(100));
+				 &dev_priv->mm.idle_work,
+				 msecs_to_jiffies(100));
 
 	return idle;
 }
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 5c7378374ae6..1303d7c034d3 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3134,10 +3134,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
 
 	for_each_engine_id(engine, dev_priv, id) {
+		bool busy = waitqueue_active(&engine->irq_queue);
 		u64 acthd;
 		u32 seqno;
 		unsigned user_interrupts;
-		bool busy = true;
 
 		semaphore_clear_deadlocks(dev_priv);
 
@@ -3160,12 +3160,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		if (engine->hangcheck.seqno == seqno) {
 			if (ring_idle(engine, seqno)) {
 				engine->hangcheck.action = HANGCHECK_IDLE;
-				if (waitqueue_active(&engine->irq_queue)) {
+				if (busy) {
 					/* Safeguard against driver failure */
 					user_interrupts = kick_waiters(engine);
 					engine->hangcheck.score += BUSY;
-				} else
-					busy = false;
+				}
 			} else {
 				/* We always increment the hangcheck score
 				 * if the ring is busy and still processing
@@ -3239,9 +3238,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		goto out;
 	}
 
+	/* Reset timer in case GPU hangs without another request being added */
 	if (busy_count)
-		/* Reset timer case chip hangs without another request
-		 * being added */
 		i915_queue_hangcheck(dev_priv);
 
 out:
-- 
2.8.1


* [PATCH 03/21] drm/i915: Remove the dedicated hangcheck workqueue
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
  2016-06-03 16:08 ` [PATCH 01/21] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
  2016-06-03 16:08 ` [PATCH 02/21] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-06 12:52   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 04/21] drm/i915: Make queueing the hangcheck work inline Chris Wilson
                   ` (18 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

The queue only ever contains at most one item and has no special flags.
It is just a very simple wrapper around the system-wq - a complication
with no benefits.

v2: Use the system_long_wq as we may wish to capture the error state
after detecting the hang - which may take a bit of time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c | 8 --------
 drivers/gpu/drm/i915/i915_drv.h | 1 -
 drivers/gpu/drm/i915/i915_irq.c | 7 ++++---
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 2f8b2545e3de..3c8c75c77574 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1143,15 +1143,8 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
 	if (dev_priv->hotplug.dp_wq == NULL)
 		goto out_free_wq;
 
-	dev_priv->gpu_error.hangcheck_wq =
-		alloc_ordered_workqueue("i915-hangcheck", 0);
-	if (dev_priv->gpu_error.hangcheck_wq == NULL)
-		goto out_free_dp_wq;
-
 	return 0;
 
-out_free_dp_wq:
-	destroy_workqueue(dev_priv->hotplug.dp_wq);
 out_free_wq:
 	destroy_workqueue(dev_priv->wq);
 out_err:
@@ -1162,7 +1155,6 @@ out_err:
 
 static void i915_workqueues_cleanup(struct drm_i915_private *dev_priv)
 {
-	destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
 	destroy_workqueue(dev_priv->hotplug.dp_wq);
 	destroy_workqueue(dev_priv->wq);
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index be533de7383b..9471ebc99624 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1365,7 +1365,6 @@ struct i915_gpu_error {
 	/* Hang gpu twice in this window and your context gets banned */
 #define DRM_I915_CTX_BAN_PERIOD DIV_ROUND_UP(8*DRM_I915_HANGCHECK_PERIOD, 1000)
 
-	struct workqueue_struct *hangcheck_wq;
 	struct delayed_work hangcheck_work;
 
 	/* For reset and error_state handling. */
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 1303d7c034d3..a09310701999 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3248,7 +3248,7 @@ out:
 
 void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 {
-	struct i915_gpu_error *e = &dev_priv->gpu_error;
+	unsigned long delay;
 
 	if (!i915.enable_hangcheck)
 		return;
@@ -3258,8 +3258,9 @@ void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 	 * we will ignore a hung ring if a second ring is kept busy.
 	 */
 
-	queue_delayed_work(e->hangcheck_wq, &e->hangcheck_work,
-			   round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES));
+	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
+	queue_delayed_work(system_long_wq,
+			   &dev_priv->gpu_error.hangcheck_work, delay);
 }
 
 static void ibx_irq_reset(struct drm_device *dev)
-- 
2.8.1


* [PATCH 04/21] drm/i915: Make queueing the hangcheck work inline
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (2 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 03/21] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-03 16:08 ` [PATCH 05/21] drm/i915: Separate GPU hang waitqueue from advance Chris Wilson
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

Since the function is a small wrapper around schedule_delayed_work(),
move it inline to remove the function call overhead for the principal
caller.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 18 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_irq.c | 17 -----------------
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9471ebc99624..ceccc6d6b119 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2898,7 +2898,23 @@ void intel_hpd_cancel_work(struct drm_i915_private *dev_priv);
 bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port);
 
 /* i915_irq.c */
-void i915_queue_hangcheck(struct drm_i915_private *dev_priv);
+static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
+{
+	unsigned long delay;
+
+	if (unlikely(!i915.enable_hangcheck))
+		return;
+
+	/* Don't continually defer the hangcheck so that it is always run at
+	 * least once after work has been scheduled on any ring. Otherwise,
+	 * we will ignore a hung ring if a second ring is kept busy.
+	 */
+
+	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
+	queue_delayed_work(system_long_wq,
+			   &dev_priv->gpu_error.hangcheck_work, delay);
+}
+
 __printf(3, 4)
 void i915_handle_error(struct drm_i915_private *dev_priv,
 		       u32 engine_mask,
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index a09310701999..83cab14639b2 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3246,23 +3246,6 @@ out:
 	ENABLE_RPM_WAKEREF_ASSERTS(dev_priv);
 }
 
-void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
-{
-	unsigned long delay;
-
-	if (!i915.enable_hangcheck)
-		return;
-
-	/* Don't continually defer the hangcheck so that it is always run at
-	 * least once after work has been scheduled on any ring. Otherwise,
-	 * we will ignore a hung ring if a second ring is kept busy.
-	 */
-
-	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
-	queue_delayed_work(system_long_wq,
-			   &dev_priv->gpu_error.hangcheck_work, delay);
-}
-
 static void ibx_irq_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-- 
2.8.1


* [PATCH 05/21] drm/i915: Separate GPU hang waitqueue from advance
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (3 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 04/21] drm/i915: Make queueing the hangcheck work inline Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-06 13:00   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 06/21] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

Currently __i915_wait_request uses a per-engine wait_queue_t for the dual
purpose of waking after the GPU advances or for waking after an error.
In the future, we may add even more wake sources and require greater
separation, but for now we can conceptually simplify wakeups by separating
the two sources. In particular, this allows us to use different wait-queues
(e.g. one on the engine advancement, a global one for errors and one on
each request) without any hassle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h |  6 ++++++
 drivers/gpu/drm/i915/i915_gem.c |  5 +++++
 drivers/gpu/drm/i915/i915_irq.c | 19 ++++---------------
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ceccc6d6b119..e399e97965e0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1401,6 +1401,12 @@ struct i915_gpu_error {
 #define I915_WEDGED			(1 << 31)
 
 	/**
+	 * Waitqueue to signal when a hang is detected. Used for waiters
+	 * to release the struct_mutex for the reset to proceed.
+	 */
+	wait_queue_head_t wait_queue;
+
+	/**
 	 * Waitqueue to signal when the reset has completed. Used by clients
 	 * that wait for dev_priv->mm.wedged to settle.
 	 */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 03256f096ab6..de4fb39312a4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1234,6 +1234,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	const bool irq_test_in_progress =
 		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	DEFINE_WAIT(reset);
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
 	s64 before = 0; /* Only to silence a compiler warning. */
@@ -1278,6 +1279,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		goto out;
 	}
 
+	add_wait_queue(&dev_priv->gpu_error.wait_queue, &reset);
 	for (;;) {
 		struct timer_list timer;
 
@@ -1329,6 +1331,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			destroy_timer_on_stack(&timer);
 		}
 	}
+	remove_wait_queue(&dev_priv->gpu_error.wait_queue, &reset);
+
 	if (!irq_test_in_progress)
 		engine->irq_put(engine);
 
@@ -5026,6 +5030,7 @@ i915_gem_load_init(struct drm_device *dev)
 			  i915_gem_retire_work_handler);
 	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
 			  i915_gem_idle_work_handler);
+	init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
 	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
 
 	dev_priv->relative_constants_mode = I915_EXEC_CONSTANTS_REL_GENERAL;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 83cab14639b2..30127b94f26e 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2488,11 +2488,8 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 	return ret;
 }
 
-static void i915_error_wake_up(struct drm_i915_private *dev_priv,
-			       bool reset_completed)
+static void i915_error_wake_up(struct drm_i915_private *dev_priv)
 {
-	struct intel_engine_cs *engine;
-
 	/*
 	 * Notify all waiters for GPU completion events that reset state has
 	 * been changed, and that they need to restart their wait after
@@ -2501,18 +2498,10 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 	 */
 
 	/* Wake up __wait_seqno, potentially holding dev->struct_mutex. */
-	for_each_engine(engine, dev_priv)
-		wake_up_all(&engine->irq_queue);
+	wake_up_all(&dev_priv->gpu_error.wait_queue);
 
 	/* Wake up intel_crtc_wait_for_pending_flips, holding crtc->mutex. */
 	wake_up_all(&dev_priv->pending_flip_queue);
-
-	/*
-	 * Signal tasks blocked in i915_gem_wait_for_error that the pending
-	 * reset state is cleared.
-	 */
-	if (reset_completed)
-		wake_up_all(&dev_priv->gpu_error.reset_queue);
 }
 
 /**
@@ -2577,7 +2566,7 @@ static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
 		 * Note: The wake_up also serves as a memory barrier so that
 		 * waiters see the update value of the reset counter atomic_t.
 		 */
-		i915_error_wake_up(dev_priv, true);
+		wake_up_all(&dev_priv->gpu_error.reset_queue);
 	}
 }
 
@@ -2713,7 +2702,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 		 * ensure that the waiters see the updated value of the reset
 		 * counter atomic_t.
 		 */
-		i915_error_wake_up(dev_priv, false);
+		i915_error_wake_up(dev_priv);
 	}
 
 	i915_reset_and_wakeup(dev_priv);
-- 
2.8.1


* [PATCH 06/21] drm/i915: Slaughter the thundering i915_wait_request herd
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (4 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 05/21] drm/i915: Separate GPU hang waitqueue from advance Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-06 13:58   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 07/21] drm/i915: Spin after waking up for an interrupt Chris Wilson
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx; +Cc: Goel, Akash

One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as every client must then
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed its wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added to or removed from the list, we scan the tree for completed
seqnos and wake up all the completed waiters in parallel.
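
For reference, a condensed sketch of the resulting wait loop (purely
illustrative; signal handling, timeouts and GPU-reset handling are
omitted, see the __i915_wait_request changes below for the real thing):

	struct intel_wait wait;

	intel_wait_init(&wait, req->seqno);
	set_current_state(TASK_UNINTERRUPTIBLE);
	if (intel_engine_add_wait(req->engine, &wait))
		goto wakeup; /* oldest waiter: do the coherent check now */

	for (;;) {
		io_schedule();
		if (intel_wait_complete(&wait))
			break; /* a previous bottom-half removed and woke us */
wakeup:
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (__i915_request_irq_complete(req))
			break; /* coherent seqno read says we are done */
	}

	intel_engine_remove_wait(req->engine, &wait); /* wake the next waiter */
	__set_current_state(TASK_RUNNING);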

Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having an rbtree walk/erase in the wakeup path); each additional
wake_up_process() costs approximately 1us on a big core.
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.

To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoiding long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/Makefile            |   1 +
 drivers/gpu/drm/i915/i915_debugfs.c      |  15 +-
 drivers/gpu/drm/i915/i915_drv.h          |  39 +++-
 drivers/gpu/drm/i915/i915_gem.c          | 141 +++++-------
 drivers/gpu/drm/i915/i915_gpu_error.c    |  59 +++++-
 drivers/gpu/drm/i915/i915_irq.c          |  20 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 354 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c         |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c  |   3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  68 +++++-
 10 files changed, 595 insertions(+), 109 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_breadcrumbs.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 7aecd309604c..f20007440821 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -38,6 +38,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gem_userptr.o \
 	  i915_gpu_error.o \
 	  i915_trace_points.o \
+	  intel_breadcrumbs.o \
 	  intel_lrc.o \
 	  intel_mocs.o \
 	  intel_ringbuffer.o \
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3a0babe32621..48683538b4e2 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -788,10 +788,21 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 static void i915_ring_seqno_info(struct seq_file *m,
 				 struct intel_engine_cs *engine)
 {
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct rb_node *rb;
+
 	seq_printf(m, "Current sequence (%s): %x\n",
 		   engine->name, engine->get_seqno(engine));
 	seq_printf(m, "Current user interrupts (%s): %x\n",
 		   engine->name, READ_ONCE(engine->user_interrupts));
+
+	spin_lock(&b->lock);
+	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb)) {
+		struct intel_wait *w = container_of(rb, typeof(*w), node);
+		seq_printf(m, "Waiting (%s): %s [%d] on %x\n",
+			   engine->name, w->task->comm, w->task->pid, w->seqno);
+	}
+	spin_unlock(&b->lock);
 }
 
 static int i915_gem_seqno_info(struct seq_file *m, void *data)
@@ -1426,6 +1437,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 			   engine->hangcheck.seqno,
 			   seqno[id],
 			   engine->last_submitted_seqno);
+		seq_printf(m, "\twaiters? %d\n",
+			   intel_engine_has_waiter(engine));
 		seq_printf(m, "\tuser interrupts = %x [current %x]\n",
 			   engine->hangcheck.user_interrupts,
 			   READ_ONCE(engine->user_interrupts));
@@ -2411,7 +2424,7 @@ static int count_irq_waiters(struct drm_i915_private *i915)
 	int count = 0;
 
 	for_each_engine(engine, i915)
-		count += engine->irq_refcount;
+		count += intel_engine_has_waiter(engine);
 
 	return count;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e399e97965e0..68b383d98457 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -503,6 +503,7 @@ struct drm_i915_error_state {
 		bool valid;
 		/* Software tracked state */
 		bool waiting;
+		int num_waiters;
 		int hangcheck_score;
 		enum intel_ring_hangcheck_action hangcheck_action;
 		int num_requests;
@@ -548,6 +549,12 @@ struct drm_i915_error_state {
 			u32 tail;
 		} *requests;
 
+		struct drm_i915_error_waiter {
+			char comm[TASK_COMM_LEN];
+			pid_t pid;
+			u32 seqno;
+		} *waiters;
+
 		struct {
 			u32 gfx_mode;
 			union {
@@ -1420,7 +1427,7 @@ struct i915_gpu_error {
 #define I915_STOP_RING_ALLOW_WARN      (1 << 30)
 
 	/* For missed irq/seqno simulation. */
-	unsigned int test_irq_rings;
+	unsigned long test_irq_rings;
 };
 
 enum modeset_restore {
@@ -3013,7 +3020,6 @@ ibx_disable_display_interrupt(struct drm_i915_private *dev_priv, uint32_t bits)
 	ibx_display_interrupt_update(dev_priv, bits, 0);
 }
 
-
 /* i915_gem.c */
 int i915_gem_create_ioctl(struct drm_device *dev, void *data,
 			  struct drm_file *file_priv);
@@ -3905,4 +3911,33 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *engine,
 		i915_gem_request_assign(&engine->trace_irq_req, req);
 }
 
+static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
+{
+	/* Ensure our read of the seqno is coherent so that we
+	 * do not "miss an interrupt" (i.e. if this is the last
+	 * request and the seqno write from the GPU is not visible
+	 * by the time the interrupt fires, we will see that the
+	 * request is incomplete and go back to sleep awaiting
+	 * another interrupt that will never come.)
+	 *
+	 * Strictly, we only need to do this once after an interrupt,
+	 * but it is easier and safer to do it every time the waiter
+	 * is woken.
+	 */
+	if (i915_gem_request_completed(req, false))
+		return true;
+
+	/* We need to check whether any gpu reset happened in between
+	 * the request being submitted and now. If a reset has occurred,
+	 * the request is effectively complete (we either are in the
+	 * process of or have discarded the rendering and completely
+	 * reset the GPU). The results of the request are lost and we
+	 * are free to continue on with the original operation.
+	 */
+	if (req->reset_counter != i915_reset_counter(&req->i915->gpu_error))
+		return true;
+
+	return false;
+}
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index de4fb39312a4..d08edb3d16f1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1123,17 +1123,6 @@ i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
 	return 0;
 }
 
-static void fake_irq(unsigned long data)
-{
-	wake_up_process((struct task_struct *)data);
-}
-
-static bool missed_irq(struct drm_i915_private *dev_priv,
-		       struct intel_engine_cs *engine)
-{
-	return test_bit(engine->id, &dev_priv->gpu_error.missed_irq_rings);
-}
-
 static unsigned long local_clock_us(unsigned *cpu)
 {
 	unsigned long t;
@@ -1166,7 +1155,7 @@ static bool busywait_stop(unsigned long timeout, unsigned cpu)
 	return this_cpu != cpu;
 }
 
-static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
+static bool __i915_spin_request(struct drm_i915_gem_request *req, int state)
 {
 	unsigned long timeout;
 	unsigned cpu;
@@ -1181,17 +1170,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 	 * takes to sleep on a request, on the order of a microsecond.
 	 */
 
-	if (req->engine->irq_refcount)
-		return -EBUSY;
-
 	/* Only spin if we know the GPU is processing this request */
 	if (!i915_gem_request_started(req, true))
-		return -EAGAIN;
+		return false;
 
 	timeout = local_clock_us(&cpu) + 5;
-	while (!need_resched()) {
+	do {
 		if (i915_gem_request_completed(req, true))
-			return 0;
+			return true;
 
 		if (signal_pending_state(state, current))
 			break;
@@ -1200,12 +1186,9 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 			break;
 
 		cpu_relax_lowlatency();
-	}
-
-	if (i915_gem_request_completed(req, false))
-		return 0;
+	} while (!need_resched());
 
-	return -EAGAIN;
+	return false;
 }
 
 /**
@@ -1229,18 +1212,14 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			s64 *timeout,
 			struct intel_rps_client *rps)
 {
-	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
-	struct drm_i915_private *dev_priv = req->i915;
-	const bool irq_test_in_progress =
-		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
 	DEFINE_WAIT(reset);
-	DEFINE_WAIT(wait);
-	unsigned long timeout_expire;
+	struct intel_wait wait;
+	unsigned long timeout_remain;
 	s64 before = 0; /* Only to silence a compiler warning. */
-	int ret;
+	int ret = 0;
 
-	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
+	might_sleep();
 
 	if (list_empty(&req->list))
 		return 0;
@@ -1248,7 +1227,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (i915_gem_request_completed(req, true))
 		return 0;
 
-	timeout_expire = 0;
+	timeout_remain = MAX_SCHEDULE_TIMEOUT;
 	if (timeout) {
 		if (WARN_ON(*timeout < 0))
 			return -EINVAL;
@@ -1256,7 +1235,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		if (*timeout == 0)
 			return -ETIME;
 
-		timeout_expire = jiffies + nsecs_to_jiffies_timeout(*timeout);
+		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
 
 		/*
 		 * Record current time in case interrupted by signal, or wedged.
@@ -1264,81 +1243,59 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		before = ktime_get_raw_ns();
 	}
 
-	if (INTEL_INFO(dev_priv)->gen >= 6)
-		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
-
 	trace_i915_gem_request_wait_begin(req);
 
-	/* Optimistic spin for the next jiffie before touching IRQs */
-	ret = __i915_spin_request(req, state);
-	if (ret == 0)
-		goto out;
-
-	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
-		ret = -ENODEV;
-		goto out;
-	}
-
-	add_wait_queue(&dev_priv->gpu_error.wait_queue, &reset);
-	for (;;) {
-		struct timer_list timer;
+	if (INTEL_INFO(req->i915)->gen >= 6)
+		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
-		prepare_to_wait(&engine->irq_queue, &wait, state);
+	/* Optimistic spin for the next ~jiffie before touching IRQs */
+	if (__i915_spin_request(req, state))
+		goto complete;
 
-		/* We need to check whether any gpu reset happened in between
-		 * the request being submitted and now. If a reset has occurred,
-		 * the request is effectively complete (we either are in the
-		 * process of or have discarded the rendering and completely
-		 * reset the GPU. The results of the request are lost and we
-		 * are free to continue on with the original operation.
+	intel_wait_init(&wait, req->seqno);
+	set_current_state(state);
+	if (intel_engine_add_wait(req->engine, &wait))
+		/* In order to check that we haven't missed the interrupt
+		 * as we enabled it, we need to kick ourselves to do a
+		 * coherent check on the seqno before we sleep.
 		 */
-		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
-			ret = 0;
-			break;
-		}
-
-		if (i915_gem_request_completed(req, false)) {
-			ret = 0;
-			break;
-		}
+		goto wakeup;
 
+	add_wait_queue(&req->i915->gpu_error.wait_queue, &reset);
+	for (;;) {
 		if (signal_pending_state(state, current)) {
 			ret = -ERESTARTSYS;
 			break;
 		}
 
-		if (timeout && time_after_eq(jiffies, timeout_expire)) {
+		/* Ensure that even if the GPU hangs, we get woken up. */
+		i915_queue_hangcheck(req->i915);
+
+		timeout_remain = io_schedule_timeout(timeout_remain);
+		if (timeout_remain == 0) {
 			ret = -ETIME;
 			break;
 		}
 
-		/* Ensure that even if the GPU hangs, we get woken up. */
-		i915_queue_hangcheck(dev_priv);
-
-		timer.function = NULL;
-		if (timeout || missed_irq(dev_priv, engine)) {
-			unsigned long expire;
-
-			setup_timer_on_stack(&timer, fake_irq, (unsigned long)current);
-			expire = missed_irq(dev_priv, engine) ? jiffies + 1 : timeout_expire;
-			mod_timer(&timer, expire);
-		}
+		if (intel_wait_complete(&wait))
+			break;
 
-		io_schedule();
+wakeup:
+		set_current_state(state);
 
-		if (timer.function) {
-			del_singleshot_timer_sync(&timer);
-			destroy_timer_on_stack(&timer);
-		}
+		/* Carefully check if the request is complete, giving time
+		 * for the seqno to be visible following the interrupt.
+		 * We also have to check in case we are kicked by the GPU
+		 * reset in order to drop the struct_mutex.
+		 */
+		if (__i915_request_irq_complete(req))
+			break;
 	}
-	remove_wait_queue(&dev_priv->gpu_error.wait_queue, &reset);
+	remove_wait_queue(&req->i915->gpu_error.wait_queue, &reset);
 
-	if (!irq_test_in_progress)
-		engine->irq_put(engine);
-
-	finish_wait(&engine->irq_queue, &wait);
-
-out:
+	intel_engine_remove_wait(req->engine, &wait);
+	__set_current_state(TASK_RUNNING);
+complete:
 	trace_i915_gem_request_wait_end(req);
 
 	if (timeout) {
@@ -2545,6 +2502,12 @@ i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
 	}
 	i915_gem_retire_requests(dev_priv);
 
+	/* If the seqno wraps around, we need to clear the breadcrumb rbtree */
+	if (!i915_seqno_passed(seqno, dev_priv->next_seqno)) {
+		while (intel_kick_waiters(dev_priv))
+			yield();
+	}
+
 	/* Finally reset hw state */
 	for_each_engine(engine, dev_priv)
 		intel_ring_init_seqno(engine, seqno);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 34ff2459ceea..89241ffcc676 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -463,6 +463,18 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 			}
 		}
 
+		if (error->ring[i].num_waiters) {
+			err_printf(m, "%s --- %d waiters\n",
+				   dev_priv->engine[i].name,
+				   error->ring[i].num_waiters);
+			for (j = 0; j < error->ring[i].num_waiters; j++) {
+				err_printf(m, " seqno 0x%08x for %s [%d]\n",
+					   error->ring[i].waiters[j].seqno,
+					   error->ring[i].waiters[j].comm,
+					   error->ring[i].waiters[j].pid);
+			}
+		}
+
 		if ((obj = error->ring[i].ringbuffer)) {
 			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
 				   dev_priv->engine[i].name,
@@ -605,8 +617,9 @@ static void i915_error_state_free(struct kref *error_ref)
 		i915_error_object_free(error->ring[i].ringbuffer);
 		i915_error_object_free(error->ring[i].hws_page);
 		i915_error_object_free(error->ring[i].ctx);
-		kfree(error->ring[i].requests);
 		i915_error_object_free(error->ring[i].wa_ctx);
+		kfree(error->ring[i].requests);
+		kfree(error->ring[i].waiters);
 	}
 
 	i915_error_object_free(error->semaphore_obj);
@@ -892,6 +905,47 @@ static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
 	}
 }
 
+static void engine_record_waiters(struct intel_engine_cs *engine,
+				  struct drm_i915_error_ring *ering)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct drm_i915_error_waiter *waiter;
+	struct rb_node *rb;
+	int count;
+
+	ering->num_waiters = 0;
+	ering->waiters = NULL;
+
+	spin_lock(&b->lock);
+	count = 0;
+	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
+		count++;
+	spin_unlock(&b->lock);
+
+	waiter = NULL;
+	if (count)
+		waiter = kmalloc(count*sizeof(struct drm_i915_error_waiter),
+				 GFP_ATOMIC);
+	if (!waiter)
+		return;
+
+	ering->waiters = waiter;
+
+	spin_lock(&b->lock);
+	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb)) {
+		struct intel_wait *w = container_of(rb, typeof(*w), node);
+
+		strcpy(waiter->comm, w->task->comm);
+		waiter->pid = w->task->pid;
+		waiter->seqno = w->seqno;
+		waiter++;
+
+		if (++ering->num_waiters == count)
+			break;
+	}
+	spin_unlock(&b->lock);
+}
+
 static void i915_record_ring_state(struct drm_i915_private *dev_priv,
 				   struct drm_i915_error_state *error,
 				   struct intel_engine_cs *engine,
@@ -926,7 +980,7 @@ static void i915_record_ring_state(struct drm_i915_private *dev_priv,
 		ering->instdone = I915_READ(GEN2_INSTDONE);
 	}
 
-	ering->waiting = waitqueue_active(&engine->irq_queue);
+	ering->waiting = intel_engine_has_waiter(engine);
 	ering->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
 	ering->acthd = intel_ring_get_active_head(engine);
 	ering->seqno = engine->get_seqno(engine);
@@ -1032,6 +1086,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 		error->ring[i].valid = true;
 
 		i915_record_ring_state(dev_priv, error, engine, &error->ring[i]);
+		engine_record_waiters(engine, &error->ring[i]);
 
 		request = i915_gem_find_active_request(engine);
 		if (request) {
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 30127b94f26e..2a736f4a0fe5 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -976,13 +976,10 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 
 static void notify_ring(struct intel_engine_cs *engine)
 {
-	if (!intel_engine_initialized(engine))
-		return;
-
-	trace_i915_gem_request_notify(engine);
-	engine->user_interrupts++;
-
-	wake_up_all(&engine->irq_queue);
+	if (intel_engine_wakeup(engine)) {
+		trace_i915_gem_request_notify(engine);
+		engine->user_interrupts++;
+	}
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
@@ -1063,7 +1060,7 @@ static bool any_waiters(struct drm_i915_private *dev_priv)
 	struct intel_engine_cs *engine;
 
 	for_each_engine(engine, dev_priv)
-		if (engine->irq_refcount)
+		if (intel_engine_has_waiter(engine))
 			return true;
 
 	return false;
@@ -3073,13 +3070,14 @@ static unsigned kick_waiters(struct intel_engine_cs *engine)
 
 	if (engine->hangcheck.user_interrupts == user_interrupts &&
 	    !test_and_set_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
-		if (!(i915->gpu_error.test_irq_rings & intel_engine_flag(engine)))
+		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
 			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
 				  engine->name);
 		else
 			DRM_INFO("Fake missed irq on %s\n",
 				 engine->name);
-		wake_up_all(&engine->irq_queue);
+
+		intel_engine_enable_fake_irq(engine);
 	}
 
 	return user_interrupts;
@@ -3123,7 +3121,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
 
 	for_each_engine_id(engine, dev_priv, id) {
-		bool busy = waitqueue_active(&engine->irq_queue);
+		bool busy = intel_engine_has_waiter(engine);
 		u64 acthd;
 		u32 seqno;
 		unsigned user_interrupts;
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
new file mode 100644
index 000000000000..e0121f727938
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -0,0 +1,354 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+
+static void intel_breadcrumbs_fake_irq(unsigned long data)
+{
+	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
+
+	/*
+	 * The timer persists in case we cannot enable interrupts,
+	 * or if we have previously seen seqno/interrupt incoherency
+	 * ("missed interrupt" syndrome). Here the worker will wake up
+	 * every jiffie in order to kick the oldest waiter to do the
+	 * coherent seqno check.
+	 */
+	rcu_read_lock();
+	if (intel_engine_wakeup(engine))
+		mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+	rcu_read_unlock();
+}
+
+static void irq_enable(struct intel_engine_cs *engine)
+{
+	WARN_ON(!engine->irq_get(engine));
+}
+
+static void irq_disable(struct intel_engine_cs *engine)
+{
+	engine->irq_put(engine);
+}
+
+static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *engine =
+		container_of(b, struct intel_engine_cs, breadcrumbs);
+	struct drm_i915_private *i915 = engine->i915;
+	bool irq_posted = false;
+
+	assert_spin_locked(&b->lock);
+	if (b->rpm_wakelock)
+		return false;
+
+	/* Since we are waiting on a request, the GPU should be busy
+	 * and should have its own rpm reference. For completeness,
+	 * record an rpm reference for ourselves to cover the
+	 * interrupt we unmask.
+	 */
+	intel_runtime_pm_get_noresume(i915);
+	b->rpm_wakelock = true;
+
+	/* No interrupts? Kick the waiter every jiffie! */
+	if (intel_irqs_enabled(i915)) {
+		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings)) {
+			irq_enable(engine);
+			irq_posted = true;
+		}
+		b->irq_enabled = true;
+	}
+
+	if (!b->irq_enabled ||
+	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings))
+		mod_timer(&b->fake_irq, jiffies + 1);
+
+	return irq_posted;
+}
+
+static void __intel_breadcrumbs_disable_irq(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *engine =
+		container_of(b, struct intel_engine_cs, breadcrumbs);
+
+	assert_spin_locked(&b->lock);
+	if (!b->rpm_wakelock)
+		return;
+
+	if (b->irq_enabled) {
+		irq_disable(engine);
+		b->irq_enabled = false;
+	}
+
+	intel_runtime_pm_put(engine->i915);
+	b->rpm_wakelock = false;
+}
+
+static inline struct intel_wait *to_wait(struct rb_node *node)
+{
+	return container_of(node, struct intel_wait, node);
+}
+
+static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
+					      struct intel_wait *wait)
+{
+	assert_spin_locked(&b->lock);
+
+	/* This request is completed, so remove it from the tree, mark it as
+	 * complete, and *then* wake up the associated task.
+	 */
+	rb_erase(&wait->node, &b->waiters);
+	RB_CLEAR_NODE(&wait->node);
+
+	wake_up_process(wait->task); /* implicit smp_wmb() */
+}
+
+bool intel_engine_add_wait(struct intel_engine_cs *engine,
+			   struct intel_wait *wait)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct rb_node **p, *parent, *completed;
+	bool first;
+	u32 seqno;
+
+	spin_lock(&b->lock);
+
+	/* Insert the request into the retirement ordered list
+	 * of waiters by walking the rbtree. If we are the oldest
+	 * seqno in the tree (the first to be retired), then
+	 * set ourselves as the bottom-half.
+	 *
+	 * As we descend the tree, prune completed branches. Since we hold the
+	 * spinlock, we know that the first_waiter must be delayed and we can
+	 * reduce some of the sequential wake up latency if we take action
+	 * ourselves and wake up the completed tasks in parallel. Also, by
+	 * removing stale elements in the tree, we may be able to reduce the
+	 * ping-pong between the old bottom-half and ourselves as first-waiter.
+	 */
+	first = true;
+	parent = NULL;
+	completed = NULL;
+	seqno = engine->get_seqno(engine);
+
+	p = &b->waiters.rb_node;
+	while (*p) {
+		parent = *p;
+		if (wait->seqno == to_wait(parent)->seqno) {
+			/* We have multiple waiters on the same seqno, select
+			 * the highest priority task (that with the smallest
+			 * task->prio) to serve as the bottom-half for this
+			 * group.
+			 */
+			if (wait->task->prio > to_wait(parent)->task->prio) {
+				p = &parent->rb_right;
+				first = false;
+			} else
+				p = &parent->rb_left;
+		} else if (i915_seqno_passed(wait->seqno,
+					     to_wait(parent)->seqno)) {
+			p = &parent->rb_right;
+			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
+				completed = parent;
+			else
+				first = false;
+		} else
+			p = &parent->rb_left;
+	}
+	rb_link_node(&wait->node, parent, p);
+	rb_insert_color(&wait->node, &b->waiters);
+	GEM_BUG_ON(!first && !b->tasklet);
+
+	if (completed) {
+		struct rb_node *next = rb_next(completed);
+
+		GEM_BUG_ON(!next && !first);
+		if (next && next != &wait->node) {
+			GEM_BUG_ON(first);
+			b->first_wait = to_wait(next);
+			smp_store_mb(b->tasklet, b->first_wait->task);
+			/* As there is a delay between reading the current
+			 * seqno, processing the completed tasks and selecting
+			 * the next waiter, we may have missed the interrupt
+			 * and so need for the next bottom-half to wakeup.
+			 *
+			 * Also as we enable the IRQ, we may miss the
+			 * interrupt for that seqno, so we have to wake up
+			 * the next bottom-half in order to do a coherent check
+			 * in case the seqno passed.
+			 */
+			__intel_breadcrumbs_enable_irq(b);
+			wake_up_process(to_wait(next)->task);
+		}
+
+		do {
+			struct intel_wait *crumb = to_wait(completed);
+			completed = rb_prev(completed);
+			__intel_breadcrumbs_finish(b, crumb);
+		} while (completed);
+	}
+
+	if (first) {
+		GEM_BUG_ON(rb_first(&b->waiters) != &wait->node);
+		b->first_wait = wait;
+		smp_store_mb(b->tasklet, wait->task);
+		first = __intel_breadcrumbs_enable_irq(b);
+	}
+	GEM_BUG_ON(!b->tasklet);
+	GEM_BUG_ON(!b->first_wait);
+	GEM_BUG_ON(rb_first(&b->waiters) != &b->first_wait->node);
+
+	spin_unlock(&b->lock);
+
+	return first;
+}
+
+void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
+{
+	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+}
+
+static inline bool chain_wakeup(struct rb_node *rb, int priority)
+{
+	return rb && to_wait(rb)->task->prio <= priority;
+}
+
+void intel_engine_remove_wait(struct intel_engine_cs *engine,
+			      struct intel_wait *wait)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	/* Quick check to see if this waiter was already decoupled from
+	 * the tree by the bottom-half to avoid contention on the spinlock
+	 * by the herd.
+	 */
+	if (RB_EMPTY_NODE(&wait->node))
+		return;
+
+	spin_lock(&b->lock);
+
+	if (RB_EMPTY_NODE(&wait->node))
+		goto out_unlock;
+
+	if (b->first_wait == wait) {
+		struct rb_node *next;
+		const int priority = wait->task->prio;
+
+		GEM_BUG_ON(b->tasklet != wait->task);
+
+		/* We are the current bottom-half. Find the next candidate,
+		 * the first waiter in the queue on the remaining oldest
+		 * request. As multiple seqnos may complete in the time it
+		 * takes us to wake up and find the next waiter, we have to
+		 * wake up that waiter for it to perform its own coherent
+		 * completion check.
+		 */
+		next = rb_next(&wait->node);
+		if (chain_wakeup(next, priority)) {
+			/* If the next waiter is already complete,
+			 * wake it up and continue onto the next waiter. So
+			 * if we have a small herd, they will wake up in parallel
+			 * rather than sequentially, which should reduce
+			 * the overall latency in waking all the completed
+			 * clients.
+			 *
+			 * However, waking up a chain adds extra latency to
+			 * the first_waiter. This is undesirable if that
+			 * waiter is a high priority task.
+			 */
+			u32 seqno = engine->get_seqno(engine);
+			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
+				struct rb_node *n = rb_next(next);
+				__intel_breadcrumbs_finish(b, to_wait(next));
+				next = n;
+				if (!chain_wakeup(next, priority))
+					break;
+			}
+		}
+
+		if (next) {
+			/* In our haste, we may have completed the first waiter
+			 * before we enabled the interrupt. Do so now as we
+			 * have a second waiter for a future seqno. Afterwards,
+			 * we have to wake up that waiter in case we missed
+			 * the interrupt, or if we have to handle an
+			 * exception rather than a seqno completion.
+			 */
+			b->first_wait = to_wait(next);
+			smp_store_mb(b->tasklet, b->first_wait->task);
+			if (b->first_wait->seqno != wait->seqno)
+				__intel_breadcrumbs_enable_irq(b);
+			wake_up_process(b->tasklet);
+		} else {
+			b->first_wait = NULL;
+			WRITE_ONCE(b->tasklet, NULL);
+			__intel_breadcrumbs_disable_irq(b);
+		}
+	} else {
+		GEM_BUG_ON(rb_first(&b->waiters) == &wait->node);
+	}
+
+	GEM_BUG_ON(RB_EMPTY_NODE(&wait->node));
+	rb_erase(&wait->node, &b->waiters);
+
+out_unlock:
+	GEM_BUG_ON(b->first_wait == wait);
+	GEM_BUG_ON(rb_first(&b->waiters) != (b->first_wait ? &b->first_wait->node : NULL));
+	GEM_BUG_ON(!b->tasklet ^ RB_EMPTY_ROOT(&b->waiters));
+	spin_unlock(&b->lock);
+}
+
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	spin_lock_init(&b->lock);
+	setup_timer(&b->fake_irq,
+		    intel_breadcrumbs_fake_irq,
+		    (unsigned long)engine);
+}
+
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	del_timer_sync(&b->fake_irq);
+}
+
+unsigned intel_kick_waiters(struct drm_i915_private *i915)
+{
+	struct intel_engine_cs *engine;
+	unsigned mask = 0;
+
+	/* To avoid the task_struct disappearing beneath us as we wake up
+	 * the process, we must first inspect the task_struct->state under the
+	 * RCU lock, i.e. as we call wake_up_process() we must be holding the
+	 * rcu_read_lock().
+	 */
+	rcu_read_lock();
+	for_each_engine(engine, i915)
+		if (unlikely(intel_engine_wakeup(engine)))
+			mask |= intel_engine_flag(engine);
+	rcu_read_unlock();
+
+	return mask;
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5c191a1afaaf..270409e9ac7a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1890,6 +1890,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
 	i915_cmd_parser_fini_ring(engine);
 	i915_gem_batch_pool_fini(&engine->batch_pool);
 
+	intel_engine_fini_breadcrumbs(engine);
+
 	if (engine->status_page.obj) {
 		i915_gem_object_unpin_map(engine->status_page.obj);
 		engine->status_page.obj = NULL;
@@ -1927,7 +1929,7 @@ logical_ring_default_irqs(struct intel_engine_cs *engine, unsigned shift)
 {
 	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << shift;
 	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
-	init_waitqueue_head(&engine->irq_queue);
+	intel_engine_init_breadcrumbs(engine);
 }
 
 static int
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1a389d0dcdd2..95f04345d3ec 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2309,7 +2309,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	memset(engine->semaphore.sync_seqno, 0,
 	       sizeof(engine->semaphore.sync_seqno));
 
-	init_waitqueue_head(&engine->irq_queue);
+	intel_engine_init_breadcrumbs(engine);
 
 	/* We may need to do things with the shrinker which
 	 * require us to immediately switch back to the default
@@ -2389,6 +2389,7 @@ void intel_cleanup_engine(struct intel_engine_cs *engine)
 
 	i915_cmd_parser_fini_ring(engine);
 	i915_gem_batch_pool_fini(&engine->batch_pool);
+	intel_engine_fini_breadcrumbs(engine);
 
 	intel_ring_context_unpin(dev_priv->kernel_context, engine);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b33c876fed20..061088360b80 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -160,6 +160,32 @@ struct intel_engine_cs {
 	struct intel_ringbuffer *buffer;
 	struct list_head buffers;
 
+	/* Rather than have every client wait upon all user interrupts,
+	 * with the herd waking after every interrupt and each doing the
+	 * heavyweight seqno dance, we delegate the task (of being the
+	 * bottom-half of the user interrupt) to the first client. After
+	 * every interrupt, we wake up one client, who does the heavyweight
+	 * coherent seqno read and either goes back to sleep (if incomplete),
+	 * or wakes up all the completed clients in parallel, before then
+	 * transferring the bottom-half status to the next client in the queue.
+	 *
+	 * Compared to walking the entire list of waiters in a single dedicated
+	 * bottom-half, we reduce the latency of the first waiter by avoiding
+	 * a context switch, but incur additional coherent seqno reads when
+	 * following the chain of request breadcrumbs. Since it is most likely
+	 * that we have a single client waiting on each seqno, then reducing
+	 * the overhead of waking that client is much preferred.
+	 */
+	struct intel_breadcrumbs {
+		spinlock_t lock; /* protects the lists of requests */
+		struct rb_root waiters; /* sorted by retirement, priority */
+		struct intel_wait *first_wait; /* oldest waiter by retirement */
+		struct task_struct *tasklet; /* bh for user interrupts */
+		struct timer_list fake_irq; /* used after a missed interrupt */
+		bool irq_enabled;
+		bool rpm_wakelock;
+	} breadcrumbs;
+
 	/*
 	 * A pool of objects to use as shadow copies of client batch buffers
 	 * when the command parser is enabled. Prevents the client from
@@ -308,8 +334,6 @@ struct intel_engine_cs {
 
 	bool gpu_caches_dirty;
 
-	wait_queue_head_t irq_queue;
-
 	struct i915_gem_context *last_context;
 
 	struct intel_ring_hangcheck hangcheck;
@@ -495,4 +519,44 @@ static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
 	return engine->status_page.gfx_addr + I915_GEM_HWS_INDEX_ADDR;
 }
 
+/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
+struct intel_wait {
+	struct rb_node node;
+	struct task_struct *task;
+	u32 seqno;
+};
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
+static inline void intel_wait_init(struct intel_wait *wait, u32 seqno)
+{
+	wait->task = current;
+	wait->seqno = seqno;
+}
+static inline bool intel_wait_complete(const struct intel_wait *wait)
+{
+	return RB_EMPTY_NODE(&wait->node);
+}
+bool intel_engine_add_wait(struct intel_engine_cs *engine,
+			   struct intel_wait *wait);
+void intel_engine_remove_wait(struct intel_engine_cs *engine,
+			      struct intel_wait *wait);
+static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
+{
+	return READ_ONCE(engine->breadcrumbs.tasklet);
+}
+static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
+{
+	bool wakeup = false;
+	struct task_struct *task = READ_ONCE(engine->breadcrumbs.tasklet);
+	/* Note that for this not to dangerously chase a dangling pointer,
+	 * the caller is responsible for ensuring that the task remains valid for
+	 * wake_up_process(), i.e. that the RCU grace period cannot expire.
+	 */
+	if (task)
+		wakeup = wake_up_process(task);
+	return wakeup;
+}
+void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
+unsigned intel_kick_waiters(struct drm_i915_private *i915);
+
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread
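
[Editorial note: the header API added above is enough to sketch the calling convention a waiter follows. This is only an illustration under the assumptions of this series -- example_wait_for_seqno() and example_seqno_complete() are hypothetical stand-ins, not the actual __i915_wait_request() code.]

	static void example_wait_for_seqno(struct intel_engine_cs *engine, u32 seqno)
	{
		struct intel_wait wait;

		intel_wait_init(&wait, seqno);		/* records current as wait->task */
		intel_engine_add_wait(engine, &wait);	/* joins the breadcrumb tree */

		for (;;) {
			set_current_state(TASK_UNINTERRUPTIBLE);
			/* hypothetical coherent completion check */
			if (example_seqno_complete(engine, seqno))
				break;
			schedule();	/* woken by the interrupt, or handed the
					 * bottom-half role by the previous waiter */
		}
		__set_current_state(TASK_RUNNING);

		intel_engine_remove_wait(engine, &wait);
	}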

* [PATCH 07/21] drm/i915: Spin after waking up for an interrupt
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (5 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 06/21] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-06 14:39   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 08/21] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

When waiting for an interrupt (waiting for the GPU to complete some
work), we know we are the single waiter for the GPU. We also know when
the GPU has nearly completed our request (or at least started processing
it), so after being woken, if we detect that the GPU is almost finished,
allow the bottom-half to spin for a very short while to reduce client
latencies.

The impact is minimal: there was an improvement to the realtime-vs-many
clients case, but exporting the function proves useful later.
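
In outline, the resulting wait flow looks roughly like the fragment below (a
simplified sketch of __i915_wait_request() after this patch, with the
breadcrumb bookkeeping, signal handling and reset checks elided; "state" is the
task state chosen by the caller, TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE):

	/* Optimistic spin for the next ~jiffie before touching IRQs */
	if (i915_spin_request(req, state, 5))
		return 0;

	/* ...queue ourselves as a breadcrumb waiter, then... */
	for (;;) {
		set_current_state(state);
		if (__i915_request_irq_complete(req))
			break;

		schedule();	/* sleep until the interrupt (or bottom-half) wakes us */

		/* Upon being woken, only spin if the GPU is processing this request */
		if (i915_spin_request(req, state, 2))
			break;
	}
	__set_current_state(TASK_RUNNING);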

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h      | 26 +++++++++++++++--------
 drivers/gpu/drm/i915/i915_gem.c      | 40 +++++++++++++++++++++---------------
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
 5 files changed, 45 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 48683538b4e2..0c287bf0d230 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -663,7 +663,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
 					   engine->get_seqno(engine),
-					   i915_gem_request_completed(work->flip_queued_req, true));
+					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
 			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 68b383d98457..b0460eda2113 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3219,24 +3219,27 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 	return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
-					   bool lazy_coherency)
+static inline bool i915_gem_request_started(const struct drm_i915_gem_request *req)
 {
-	if (!lazy_coherency && req->engine->irq_seqno_barrier)
-		req->engine->irq_seqno_barrier(req->engine);
 	return i915_seqno_passed(req->engine->get_seqno(req->engine),
 				 req->previous_seqno);
 }
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
+static inline bool i915_gem_request_completed(const struct drm_i915_gem_request *req)
 {
-	if (!lazy_coherency && req->engine->irq_seqno_barrier)
-		req->engine->irq_seqno_barrier(req->engine);
 	return i915_seqno_passed(req->engine->get_seqno(req->engine),
 				 req->seqno);
 }
 
+bool __i915_spin_request(const struct drm_i915_gem_request *request,
+			 int state, unsigned long timeout_us);
+static inline bool i915_spin_request(const struct drm_i915_gem_request *request,
+				     int state, unsigned long timeout_us)
+{
+	return (i915_gem_request_started(request) &&
+		__i915_spin_request(request, state, timeout_us));
+}
+
 int __must_check i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
 
@@ -3913,6 +3916,8 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *engine,
 
 static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 {
+	struct intel_engine_cs *engine = req->engine;
+
 	/* Ensure our read of the seqno is coherent so that we
 	 * do not "miss an interrupt" (i.e. if this is the last
 	 * request and the seqno write from the GPU is not visible
@@ -3924,7 +3929,10 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	 * but it is easier and safer to do it every time the waiter
 	 * is woken.
 	 */
-	if (i915_gem_request_completed(req, false))
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
+
+	if (i915_gem_request_completed(req))
 		return true;
 
 	/* We need to check whether any gpu reset happened in between
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d08edb3d16f1..bf5c93f2bd81 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1155,9 +1155,9 @@ static bool busywait_stop(unsigned long timeout, unsigned cpu)
 	return this_cpu != cpu;
 }
 
-static bool __i915_spin_request(struct drm_i915_gem_request *req, int state)
+bool __i915_spin_request(const struct drm_i915_gem_request *req,
+			 int state, unsigned long timeout_us)
 {
-	unsigned long timeout;
 	unsigned cpu;
 
 	/* When waiting for high frequency requests, e.g. during synchronous
@@ -1170,19 +1170,15 @@ static bool __i915_spin_request(struct drm_i915_gem_request *req, int state)
 	 * takes to sleep on a request, on the order of a microsecond.
 	 */
 
-	/* Only spin if we know the GPU is processing this request */
-	if (!i915_gem_request_started(req, true))
-		return false;
-
-	timeout = local_clock_us(&cpu) + 5;
+	timeout_us += local_clock_us(&cpu);
 	do {
-		if (i915_gem_request_completed(req, true))
+		if (i915_gem_request_completed(req))
 			return true;
 
 		if (signal_pending_state(state, current))
 			break;
 
-		if (busywait_stop(timeout, cpu))
+		if (busywait_stop(timeout_us, cpu))
 			break;
 
 		cpu_relax_lowlatency();
@@ -1224,7 +1220,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	timeout_remain = MAX_SCHEDULE_TIMEOUT;
@@ -1249,7 +1245,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
 	/* Optimistic spin for the next ~jiffie before touching IRQs */
-	if (__i915_spin_request(req, state))
+	if (i915_spin_request(req, state, 5))
 		goto complete;
 
 	intel_wait_init(&wait, req->seqno);
@@ -1290,6 +1286,10 @@ wakeup:
 		 */
 		if (__i915_request_irq_complete(req))
 			break;
+
+		/* Only spin if we know the GPU is processing this request */
+		if (i915_spin_request(req, state, 2))
+			break;
 	}
 	remove_wait_queue(&req->i915->gpu_error.wait_queue, &reset);
 
@@ -2805,8 +2805,16 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *request;
 
+	/* We are called by the error capture and reset at a random
+	 * point in time. In particular, note that neither is crucially
+	 * ordered with an interrupt. After a hang, the GPU is dead and we
+	 * assume that no more writes can happen (we waited long enough for
+	 * all writes that were in transaction to be flushed) - adding an
+	 * extra delay for a recent interrupt is pointless. Hence, we do
+	 * not need an engine->irq_seqno_barrier() before the seqno reads.
+	 */
 	list_for_each_entry(request, &engine->request_list, list) {
-		if (i915_gem_request_completed(request, false))
+		if (i915_gem_request_completed(request))
 			continue;
 
 		return request;
@@ -2937,7 +2945,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_completed(request))
 			break;
 
 		i915_gem_request_retire(request);
@@ -2961,7 +2969,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 	}
 
 	if (unlikely(engine->trace_irq_req &&
-		     i915_gem_request_completed(engine->trace_irq_req, true))) {
+		     i915_gem_request_completed(engine->trace_irq_req))) {
 		engine->irq_put(engine);
 		i915_gem_request_assign(&engine->trace_irq_req, NULL);
 	}
@@ -3058,7 +3066,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (req == NULL)
 			continue;
 
-		if (i915_gem_request_completed(req, true))
+		if (i915_gem_request_completed(req))
 			i915_gem_object_retire__read(obj, i);
 	}
 
@@ -3164,7 +3172,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (to == from)
 		return 0;
 
-	if (i915_gem_request_completed(from_req, true))
+	if (i915_gem_request_completed(from_req))
 		return 0;
 
 	if (!i915_semaphore_is_enabled(to_i915(obj->base.dev))) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 2bc291ac7243..bb09ee6d1a3f 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11590,7 +11590,7 @@ static bool __pageflip_stall_check_cs(struct drm_i915_private *dev_priv,
 	vblank = intel_crtc_get_vblank_counter(intel_crtc);
 	if (work->flip_ready_vblank == 0) {
 		if (work->flip_queued_req &&
-		    !i915_gem_request_completed(work->flip_queued_req, true))
+		    !i915_gem_request_completed(work->flip_queued_req))
 			return false;
 
 		work->flip_ready_vblank = vblank;
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 657a64fc2780..712bd0debb91 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7687,7 +7687,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 	struct drm_i915_gem_request *req = boost->req;
 
-	if (!i915_gem_request_completed(req, true))
+	if (!i915_gem_request_completed(req))
 		gen6_rps_boost(req->i915, NULL, req->emitted_jiffies);
 
 	i915_gem_request_unreference(req);
@@ -7701,7 +7701,7 @@ void intel_queue_rps_boost_for_request(struct drm_i915_gem_request *req)
 	if (req == NULL || INTEL_GEN(req->i915) < 6)
 		return;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return;
 
 	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 08/21] drm/i915: Use HWS for seqno tracking everywhere
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (6 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 07/21] drm/i915: Spin after waking up for an interrupt Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-06 14:55   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 09/21] drm/i915: Stop mapping the scratch page into CPU space Chris Wilson
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

By using the same address for storing the HWS on every platform, we can
remove the platform specific vfuncs and reduce the get-seqno routine to
a single read of a cached memory location.
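
With the read reduced to the cached status page, completion testing is just the
existing wraparound-safe comparison in i915_drv.h. For illustration only (a
worked example, not new driver code):

	static inline bool seqno_passed_example(void)
	{
		u32 hws_seqno = 0x00000002;	/* value read from the status page */
		u32 req_seqno = 0xfffffffe;	/* request emitted just before the wrap */

		/* (int32_t)(0x00000002 - 0xfffffffe) == +4 >= 0, so the request is
		 * correctly reported as complete despite the u32 wraparound, whereas
		 * a naive unsigned compare would claim it had not yet completed.
		 */
		return (int32_t)(hws_seqno - req_seqno) >= 0;
	}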

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c      |  6 +--
 drivers/gpu/drm/i915/i915_drv.h          |  4 +-
 drivers/gpu/drm/i915/i915_gpu_error.c    |  2 +-
 drivers/gpu/drm/i915/i915_irq.c          |  4 +-
 drivers/gpu/drm/i915/i915_trace.h        |  2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  4 +-
 drivers/gpu/drm/i915/intel_lrc.c         | 26 +---------
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 83 ++++++++------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  7 +--
 9 files changed, 36 insertions(+), 102 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0c287bf0d230..72dae6fb0aa2 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -662,7 +662,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   engine->name,
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
-					   engine->get_seqno(engine),
+					   intel_engine_get_seqno(engine),
 					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
@@ -792,7 +792,7 @@ static void i915_ring_seqno_info(struct seq_file *m,
 	struct rb_node *rb;
 
 	seq_printf(m, "Current sequence (%s): %x\n",
-		   engine->name, engine->get_seqno(engine));
+		   engine->name, intel_engine_get_seqno(engine));
 	seq_printf(m, "Current user interrupts (%s): %x\n",
 		   engine->name, READ_ONCE(engine->user_interrupts));
 
@@ -1417,7 +1417,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 
 	for_each_engine_id(engine, dev_priv, id) {
 		acthd[id] = intel_ring_get_active_head(engine);
-		seqno[id] = engine->get_seqno(engine);
+		seqno[id] = intel_engine_get_seqno(engine);
 	}
 
 	i915_get_extra_instdone(dev_priv, instdone);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b0460eda2113..4a71f4e9a97a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3221,13 +3221,13 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 
 static inline bool i915_gem_request_started(const struct drm_i915_gem_request *req)
 {
-	return i915_seqno_passed(req->engine->get_seqno(req->engine),
+	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
 				 req->previous_seqno);
 }
 
 static inline bool i915_gem_request_completed(const struct drm_i915_gem_request *req)
 {
-	return i915_seqno_passed(req->engine->get_seqno(req->engine),
+	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
 				 req->seqno);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 89241ffcc676..81341fc4e61a 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -983,7 +983,7 @@ static void i915_record_ring_state(struct drm_i915_private *dev_priv,
 	ering->waiting = intel_engine_has_waiter(engine);
 	ering->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
 	ering->acthd = intel_ring_get_active_head(engine);
-	ering->seqno = engine->get_seqno(engine);
+	ering->seqno = intel_engine_get_seqno(engine);
 	ering->last_seqno = engine->last_submitted_seqno;
 	ering->start = I915_READ_START(engine);
 	ering->head = I915_READ_HEAD(engine);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2a736f4a0fe5..4013ad92cdc6 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2951,7 +2951,7 @@ static int semaphore_passed(struct intel_engine_cs *engine)
 	if (signaller->hangcheck.deadlock >= I915_NUM_ENGINES)
 		return -1;
 
-	if (i915_seqno_passed(signaller->get_seqno(signaller), seqno))
+	if (i915_seqno_passed(intel_engine_get_seqno(signaller), seqno))
 		return 1;
 
 	/* cursory check for an unkickable deadlock */
@@ -3139,7 +3139,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 			engine->irq_seqno_barrier(engine);
 
 		acthd = intel_ring_get_active_head(engine);
-		seqno = engine->get_seqno(engine);
+		seqno = intel_engine_get_seqno(engine);
 
 		/* Reset stuck interrupts between batch advances */
 		user_interrupts = 0;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 6768db032f84..3d13fde95fdf 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -558,7 +558,7 @@ TRACE_EVENT(i915_gem_request_notify,
 	    TP_fast_assign(
 			   __entry->dev = engine->i915->dev->primary->index;
 			   __entry->ring = engine->id;
-			   __entry->seqno = engine->get_seqno(engine);
+			   __entry->seqno = intel_engine_get_seqno(engine);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u",
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index e0121f727938..44346de39794 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -148,7 +148,7 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 	first = true;
 	parent = NULL;
 	completed = NULL;
-	seqno = engine->get_seqno(engine);
+	seqno = intel_engine_get_seqno(engine);
 
 	p = &b->waiters.rb_node;
 	while (*p) {
@@ -275,7 +275,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			 * the first_waiter. This is undesirable if that
 			 * waiter is a high priority task.
 			 */
-			u32 seqno = engine->get_seqno(engine);
+			u32 seqno = intel_engine_get_seqno(engine);
 			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
 				struct rb_node *n = rb_next(next);
 				__intel_breadcrumbs_finish(b, to_wait(next));
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 270409e9ac7a..e48687837a95 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1712,16 +1712,6 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 	return 0;
 }
 
-static u32 gen8_get_seqno(struct intel_engine_cs *engine)
-{
-	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
-}
-
-static void gen8_set_seqno(struct intel_engine_cs *engine, u32 seqno)
-{
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-}
-
 static void bxt_a_seqno_barrier(struct intel_engine_cs *engine)
 {
 	/*
@@ -1737,14 +1727,6 @@ static void bxt_a_seqno_barrier(struct intel_engine_cs *engine)
 	intel_flush_status_page(engine, I915_GEM_HWS_INDEX);
 }
 
-static void bxt_a_set_seqno(struct intel_engine_cs *engine, u32 seqno)
-{
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-
-	/* See bxt_a_get_seqno() explaining the reason for the clflush. */
-	intel_flush_status_page(engine, I915_GEM_HWS_INDEX);
-}
-
 /*
  * Reserve space for 2 NOOPs at the end of each request to be
  * used as a workaround for not being allowed to do lite
@@ -1770,7 +1752,7 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
 				intel_hws_seqno_address(request->engine) |
 				MI_FLUSH_DW_USE_GTT);
 	intel_logical_ring_emit(ringbuf, 0);
-	intel_logical_ring_emit(ringbuf, i915_gem_request_get_seqno(request));
+	intel_logical_ring_emit(ringbuf, request->seqno);
 	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
 	intel_logical_ring_emit(ringbuf, MI_NOOP);
 	return intel_logical_ring_advance_and_submit(request);
@@ -1916,12 +1898,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->irq_get = gen8_logical_ring_get_irq;
 	engine->irq_put = gen8_logical_ring_put_irq;
 	engine->emit_bb_start = gen8_emit_bb_start;
-	engine->get_seqno = gen8_get_seqno;
-	engine->set_seqno = gen8_set_seqno;
-	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1)) {
+	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1))
 		engine->irq_seqno_barrier = bxt_a_seqno_barrier;
-		engine->set_seqno = bxt_a_set_seqno;
-	}
 }
 
 static inline void
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 95f04345d3ec..bac496902c6d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1281,19 +1281,17 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_engine_id(waiter, dev_priv, id) {
-		u32 seqno;
 		u64 gtt_offset = signaller->semaphore.signal_ggtt[id];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
-		seqno = i915_gem_request_get_seqno(signaller_req);
 		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
 		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
 					   PIPE_CONTROL_QW_WRITE |
 					   PIPE_CONTROL_CS_STALL);
 		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
 		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
-		intel_ring_emit(signaller, seqno);
+		intel_ring_emit(signaller, signaller_req->seqno);
 		intel_ring_emit(signaller, 0);
 		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
 					   MI_SEMAPHORE_TARGET(waiter->hw_id));
@@ -1322,18 +1320,16 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_engine_id(waiter, dev_priv, id) {
-		u32 seqno;
 		u64 gtt_offset = signaller->semaphore.signal_ggtt[id];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
-		seqno = i915_gem_request_get_seqno(signaller_req);
 		intel_ring_emit(signaller, (MI_FLUSH_DW + 1) |
 					   MI_FLUSH_DW_OP_STOREDW);
 		intel_ring_emit(signaller, lower_32_bits(gtt_offset) |
 					   MI_FLUSH_DW_USE_GTT);
 		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
-		intel_ring_emit(signaller, seqno);
+		intel_ring_emit(signaller, signaller_req->seqno);
 		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
 					   MI_SEMAPHORE_TARGET(waiter->hw_id));
 		intel_ring_emit(signaller, 0);
@@ -1364,11 +1360,9 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 		i915_reg_t mbox_reg = signaller->semaphore.mbox.signal[id];
 
 		if (i915_mmio_reg_valid(mbox_reg)) {
-			u32 seqno = i915_gem_request_get_seqno(signaller_req);
-
 			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
 			intel_ring_emit_reg(signaller, mbox_reg);
-			intel_ring_emit(signaller, seqno);
+			intel_ring_emit(signaller, signaller_req->seqno);
 		}
 	}
 
@@ -1404,7 +1398,7 @@ gen6_add_request(struct drm_i915_gem_request *req)
 	intel_ring_emit(engine, MI_STORE_DWORD_INDEX);
 	intel_ring_emit(engine,
 			I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
+	intel_ring_emit(engine, req->seqno);
 	intel_ring_emit(engine, MI_USER_INTERRUPT);
 	__intel_ring_advance(engine);
 
@@ -1543,7 +1537,9 @@ static int
 pc_render_add_request(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *engine = req->engine;
-	u32 scratch_addr = engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 addr = engine->status_page.gfx_addr +
+		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
+	u32 scratch_addr = addr;
 	int ret;
 
 	/* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently
@@ -1559,12 +1555,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 		return ret;
 
 	intel_ring_emit(engine,
-			GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
+			GFX_OP_PIPE_CONTROL(4) |
+			PIPE_CONTROL_QW_WRITE |
 			PIPE_CONTROL_WRITE_FLUSH |
 			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
-	intel_ring_emit(engine,
-			engine->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
+	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
+	intel_ring_emit(engine, req->seqno);
 	intel_ring_emit(engine, 0);
 	PIPE_CONTROL_FLUSH(engine, scratch_addr);
 	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
@@ -1579,13 +1575,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 	PIPE_CONTROL_FLUSH(engine, scratch_addr);
 
 	intel_ring_emit(engine,
-			GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
+		       	GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
 			PIPE_CONTROL_WRITE_FLUSH |
 			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
 			PIPE_CONTROL_NOTIFY);
-	intel_ring_emit(engine,
-			engine->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
+	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
+	intel_ring_emit(engine, req->seqno);
 	intel_ring_emit(engine, 0);
 	__intel_ring_advance(engine);
 
@@ -1617,30 +1612,6 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
 	spin_unlock_irq(&dev_priv->uncore.lock);
 }
 
-static u32
-ring_get_seqno(struct intel_engine_cs *engine)
-{
-	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
-}
-
-static void
-ring_set_seqno(struct intel_engine_cs *engine, u32 seqno)
-{
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-}
-
-static u32
-pc_render_get_seqno(struct intel_engine_cs *engine)
-{
-	return engine->scratch.cpu_page[0];
-}
-
-static void
-pc_render_set_seqno(struct intel_engine_cs *engine, u32 seqno)
-{
-	engine->scratch.cpu_page[0] = seqno;
-}
-
 static bool
 gen5_ring_get_irq(struct intel_engine_cs *engine)
 {
@@ -1770,8 +1741,8 @@ i9xx_add_request(struct drm_i915_gem_request *req)
 
 	intel_ring_emit(engine, MI_STORE_DWORD_INDEX);
 	intel_ring_emit(engine,
-			I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
+		       	I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
+	intel_ring_emit(engine, req->seqno);
 	intel_ring_emit(engine, MI_USER_INTERRUPT);
 	__intel_ring_advance(engine);
 
@@ -2588,7 +2559,9 @@ void intel_ring_init_seqno(struct intel_engine_cs *engine, u32 seqno)
 	memset(engine->semaphore.sync_seqno, 0,
 	       sizeof(engine->semaphore.sync_seqno));
 
-	engine->set_seqno(engine, seqno);
+	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
 	engine->last_submitted_seqno = seqno;
 
 	engine->hangcheck.seqno = seqno;
@@ -2830,8 +2803,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		engine->irq_get = gen8_ring_get_irq;
 		engine->irq_put = gen8_ring_put_irq;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
-		engine->get_seqno = ring_get_seqno;
-		engine->set_seqno = ring_set_seqno;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			WARN_ON(!dev_priv->semaphore_obj);
 			engine->semaphore.sync_to = gen8_ring_sync;
@@ -2848,8 +2819,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		engine->irq_put = gen6_ring_put_irq;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		engine->irq_seqno_barrier = gen6_seqno_barrier;
-		engine->get_seqno = ring_get_seqno;
-		engine->set_seqno = ring_set_seqno;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			engine->semaphore.sync_to = gen6_ring_sync;
 			engine->semaphore.signal = gen6_signal;
@@ -2874,8 +2843,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	} else if (IS_GEN5(dev_priv)) {
 		engine->add_request = pc_render_add_request;
 		engine->flush = gen4_render_ring_flush;
-		engine->get_seqno = pc_render_get_seqno;
-		engine->set_seqno = pc_render_set_seqno;
 		engine->irq_get = gen5_ring_get_irq;
 		engine->irq_put = gen5_ring_put_irq;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT |
@@ -2886,8 +2853,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 			engine->flush = gen2_render_ring_flush;
 		else
 			engine->flush = gen4_render_ring_flush;
-		engine->get_seqno = ring_get_seqno;
-		engine->set_seqno = ring_set_seqno;
 		if (IS_GEN2(dev_priv)) {
 			engine->irq_get = i8xx_ring_get_irq;
 			engine->irq_put = i8xx_ring_put_irq;
@@ -2965,8 +2930,6 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		engine->flush = gen6_bsd_ring_flush;
 		engine->add_request = gen6_add_request;
 		engine->irq_seqno_barrier = gen6_seqno_barrier;
-		engine->get_seqno = ring_get_seqno;
-		engine->set_seqno = ring_set_seqno;
 		if (INTEL_GEN(dev_priv) >= 8) {
 			engine->irq_enable_mask =
 				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
@@ -3004,8 +2967,6 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		engine->mmio_base = BSD_RING_BASE;
 		engine->flush = bsd_ring_flush;
 		engine->add_request = i9xx_add_request;
-		engine->get_seqno = ring_get_seqno;
-		engine->set_seqno = ring_set_seqno;
 		if (IS_GEN5(dev_priv)) {
 			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
 			engine->irq_get = gen5_ring_get_irq;
@@ -3040,8 +3001,6 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	engine->flush = gen6_bsd_ring_flush;
 	engine->add_request = gen6_add_request;
 	engine->irq_seqno_barrier = gen6_seqno_barrier;
-	engine->get_seqno = ring_get_seqno;
-	engine->set_seqno = ring_set_seqno;
 	engine->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 	engine->irq_get = gen8_ring_get_irq;
@@ -3073,8 +3032,6 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 	engine->flush = gen6_ring_flush;
 	engine->add_request = gen6_add_request;
 	engine->irq_seqno_barrier = gen6_seqno_barrier;
-	engine->get_seqno = ring_get_seqno;
-	engine->set_seqno = ring_set_seqno;
 	if (INTEL_GEN(dev_priv) >= 8) {
 		engine->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
@@ -3133,8 +3090,6 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 	engine->flush = gen6_ring_flush;
 	engine->add_request = gen6_add_request;
 	engine->irq_seqno_barrier = gen6_seqno_barrier;
-	engine->get_seqno = ring_get_seqno;
-	engine->set_seqno = ring_set_seqno;
 
 	if (INTEL_GEN(dev_priv) >= 8) {
 		engine->irq_enable_mask =
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 061088360b80..785c9e5312ff 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -219,9 +219,6 @@ struct intel_engine_cs {
 	 * monotonic, even if not coherent.
 	 */
 	void		(*irq_seqno_barrier)(struct intel_engine_cs *ring);
-	u32		(*get_seqno)(struct intel_engine_cs *ring);
-	void		(*set_seqno)(struct intel_engine_cs *ring,
-				     u32 seqno);
 	int		(*dispatch_execbuffer)(struct drm_i915_gem_request *req,
 					       u64 offset, u32 length,
 					       unsigned dispatch_flags);
@@ -497,6 +494,10 @@ int intel_init_blt_ring_buffer(struct drm_device *dev);
 int intel_init_vebox_ring_buffer(struct drm_device *dev);
 
 u64 intel_ring_get_active_head(struct intel_engine_cs *engine);
+static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine)
+{
+	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
+}
 
 int init_workarounds_ring(struct intel_engine_cs *engine);
 
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 09/21] drm/i915: Stop mapping the scratch page into CPU space
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (7 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 08/21] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-06 15:03   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 10/21] drm/i915: Allocate scratch page from stolen Chris Wilson
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

Now that the scratch page is no longer used for Ironlake's breadcrumb,
we no longer need to kmap the object. We can therefore move it into the
high unmappable space and do not need to force the object to
be coherent (i.e. snooped on !llc platforms).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 40 +++++++++------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 2 files changed, 11 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index bac496902c6d..106f40c52bb5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -643,58 +643,40 @@ out:
 	return ret;
 }
 
-void
-intel_fini_pipe_control(struct intel_engine_cs *engine)
+void intel_fini_pipe_control(struct intel_engine_cs *engine)
 {
 	if (engine->scratch.obj == NULL)
 		return;
 
-	if (INTEL_GEN(engine->i915) >= 5) {
-		kunmap(sg_page(engine->scratch.obj->pages->sgl));
-		i915_gem_object_ggtt_unpin(engine->scratch.obj);
-	}
-
+	i915_gem_object_ggtt_unpin(engine->scratch.obj);
 	drm_gem_object_unreference(&engine->scratch.obj->base);
 	engine->scratch.obj = NULL;
 }
 
-int
-intel_init_pipe_control(struct intel_engine_cs *engine)
+int intel_init_pipe_control(struct intel_engine_cs *engine)
 {
+	struct drm_i915_gem_object *obj;
 	int ret;
 
 	WARN_ON(engine->scratch.obj);
 
-	engine->scratch.obj = i915_gem_object_create(engine->i915->dev, 4096);
-	if (IS_ERR(engine->scratch.obj)) {
-		DRM_ERROR("Failed to allocate seqno page\n");
-		ret = PTR_ERR(engine->scratch.obj);
-		engine->scratch.obj = NULL;
+	obj = i915_gem_object_create(engine->i915->dev, 4096);
+	if (IS_ERR(obj)) {
+		DRM_ERROR("Failed to allocate scratch page\n");
+		ret = PTR_ERR(obj);
 		goto err;
 	}
 
-	ret = i915_gem_object_set_cache_level(engine->scratch.obj,
-					      I915_CACHE_LLC);
+	ret = i915_gem_obj_ggtt_pin(obj, 4096, PIN_HIGH);
 	if (ret)
 		goto err_unref;
 
-	ret = i915_gem_obj_ggtt_pin(engine->scratch.obj, 4096, 0);
-	if (ret)
-		goto err_unref;
-
-	engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(engine->scratch.obj);
-	engine->scratch.cpu_page = kmap(sg_page(engine->scratch.obj->pages->sgl));
-	if (engine->scratch.cpu_page == NULL) {
-		ret = -ENOMEM;
-		goto err_unpin;
-	}
-
+	engine->scratch.obj = obj;
+	engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
 	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08x\n",
 			 engine->name, engine->scratch.gtt_offset);
 	return 0;
 
-err_unpin:
-	i915_gem_object_ggtt_unpin(engine->scratch.obj);
 err_unref:
 	drm_gem_object_unreference(&engine->scratch.obj->base);
 err:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 785c9e5312ff..4b2f19decb30 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -338,7 +338,6 @@ struct intel_engine_cs {
 	struct {
 		struct drm_i915_gem_object *obj;
 		u32 gtt_offset;
-		volatile u32 *cpu_page;
 	} scratch;
 
 	bool needs_cmd_parser;
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 10/21] drm/i915: Allocate scratch page from stolen
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (8 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 09/21] drm/i915: Stop mapping the scratch page into CPU space Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-06 15:05   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 11/21] drm/i915: Refactor scratch object allocation for gen2 w/a buffer Chris Wilson
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

With the last direct CPU access to the scratch page removed, we can now
allocate it from our small amount of reserved system pages (stolen
memory).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 106f40c52bb5..b7eebbed945d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -660,7 +660,9 @@ int intel_init_pipe_control(struct intel_engine_cs *engine)
 
 	WARN_ON(engine->scratch.obj);
 
-	obj = i915_gem_object_create(engine->i915->dev, 4096);
+	obj = i915_gem_object_create_stolen(engine->i915->dev, 4096);
+	if (obj == NULL)
+		obj = i915_gem_object_create(engine->i915->dev, 4096);
 	if (IS_ERR(obj)) {
 		DRM_ERROR("Failed to allocate scratch page\n");
 		ret = PTR_ERR(obj);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 11/21] drm/i915: Refactor scratch object allocation for gen2 w/a buffer
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (9 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 10/21] drm/i915: Allocate scratch page from stolen Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-06 15:09   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 12/21] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) Chris Wilson
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

The gen2 w/a buffer is stuffed into the same slot as the gen5+ scratch
buffer. If we pass in the size we want to allocate for the scratch
buffer, both callers can use the same routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 31 ++++++++-----------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 3 files changed, 10 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e48687837a95..32b5eae7dd11 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2078,7 +2078,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	engine->emit_flush = gen8_emit_flush_render;
 	engine->emit_request = gen8_emit_request_render;
 
-	ret = intel_init_pipe_control(engine);
+	ret = intel_init_pipe_control(engine, 4096);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b7eebbed945d..ca2e59405998 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -653,16 +653,16 @@ void intel_fini_pipe_control(struct intel_engine_cs *engine)
 	engine->scratch.obj = NULL;
 }
 
-int intel_init_pipe_control(struct intel_engine_cs *engine)
+int intel_init_pipe_control(struct intel_engine_cs *engine, int size)
 {
 	struct drm_i915_gem_object *obj;
 	int ret;
 
 	WARN_ON(engine->scratch.obj);
 
-	obj = i915_gem_object_create_stolen(engine->i915->dev, 4096);
+	obj = i915_gem_object_create_stolen(engine->i915->dev, size);
 	if (obj == NULL)
-		obj = i915_gem_object_create(engine->i915->dev, 4096);
+		obj = i915_gem_object_create(engine->i915->dev, size);
 	if (IS_ERR(obj)) {
 		DRM_ERROR("Failed to allocate scratch page\n");
 		ret = PTR_ERR(obj);
@@ -2863,31 +2863,16 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	engine->init_hw = init_render_ring;
 	engine->cleanup = render_ring_cleanup;
 
-	/* Workaround batchbuffer to combat CS tlb bug. */
-	if (HAS_BROKEN_CS_TLB(dev_priv)) {
-		obj = i915_gem_object_create(dev, I830_WA_SIZE);
-		if (IS_ERR(obj)) {
-			DRM_ERROR("Failed to allocate batch bo\n");
-			return PTR_ERR(obj);
-		}
-
-		ret = i915_gem_obj_ggtt_pin(obj, 0, 0);
-		if (ret != 0) {
-			drm_gem_object_unreference(&obj->base);
-			DRM_ERROR("Failed to ping batch bo\n");
-			return ret;
-		}
-
-		engine->scratch.obj = obj;
-		engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
-	}
-
 	ret = intel_init_ring_buffer(dev, engine);
 	if (ret)
 		return ret;
 
 	if (INTEL_GEN(dev_priv) >= 5) {
-		ret = intel_init_pipe_control(engine);
+		ret = intel_init_pipe_control(engine, 4096);
+		if (ret)
+			return ret;
+	} else if (HAS_BROKEN_CS_TLB(dev_priv)) {
+		ret = intel_init_pipe_control(engine, I830_WA_SIZE);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 4b2f19decb30..cb599a54931a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -483,8 +483,8 @@ void intel_ring_init_seqno(struct intel_engine_cs *engine, u32 seqno);
 int intel_ring_flush_all_caches(struct drm_i915_gem_request *req);
 int intel_ring_invalidate_all_caches(struct drm_i915_gem_request *req);
 
+int intel_init_pipe_control(struct intel_engine_cs *engine, int size);
 void intel_fini_pipe_control(struct intel_engine_cs *engine);
-int intel_init_pipe_control(struct intel_engine_cs *engine);
 
 int intel_init_render_ring_buffer(struct drm_device *dev);
 int intel_init_bsd_ring_buffer(struct drm_device *dev);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 12/21] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk)
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (10 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 11/21] drm/i915: Refactor scratch object allocation for gen2 w/a buffer Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-03 16:08 ` [PATCH 13/21] drm/i915: Check the CPU cached value of seqno after waking the waiter Chris Wilson
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

On Ironlake, there is no command or register to ensure that the write
from an MI_STORE command is completed (and coherent on the CPU) before the
command parser continues. This means that the ordering between the seqno
write and the subsequent user interrupt is undefined (like gen6+). So to
ensure that the seqno write is completed after the final user interrupt
we need to delay the read sufficiently to allow the write to complete.
This delay is undefined by the bspec, and empirically requires 75us even
though a register read combined with a clflush is less than 500ns. Hence,
the delay is due to an on-chip buffer rather than the latency of the write
to memory.

Note that the render ring controls this by filling the PIPE_CONTROL fifo
with stalling commands that force the earliest pipe-control with the
seqno to be completed before the command parser continues. Given that we
need a barrier operation for BSD, we may as well forgo the extra
per-batch latency by using a common per-interrupt barrier.

Studying the impact of adding the usleep shows that it is negligible for the
media engine, for both sequences of and individual synchronous no-op batches
(where the write is now unordered with the interrupt). Converting the render
engine from the current glut of pipe-controls to the per-interrupt delay
speeds up both the sequential and individual synchronous no-ops by 20% and
60%, respectively. This speed up holds
even when looking at the throughput of small copies (4KiB->4MiB), both
serial and synchronous, by about 20%. This is because despite adding a
significant delay to the interrupt, in all likelihood we will see the
seqno write without having to apply the barrier (only in the rare corner
cases where the write for the last request is delayed is the delay
necessary).
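
To make the trade concrete: the delay now lives in the engine->irq_seqno_barrier()
hook and is only paid on the waiter's side, once per wakeup, following the
pattern already used by __i915_request_irq_complete() and hangcheck. An
illustrative helper is shown below, assuming the names in this series; the
driver open-codes the pattern rather than providing such a function:

	static u32 example_coherent_seqno(struct intel_engine_cs *engine)
	{
		/* On Ironlake this is the ~75us usleep_range() added below;
		 * on gen6+ it is the existing posting-read dance.
		 */
		if (engine->irq_seqno_barrier)
			engine->irq_seqno_barrier(engine);

		return intel_engine_get_seqno(engine);
	}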

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94307
Testcase: igt/gem_sync #ilk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c         | 10 ++---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 80 ++++++++-------------------------
 2 files changed, 21 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 4013ad92cdc6..c14eb57b5807 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1264,8 +1264,7 @@ static void ivybridge_parity_error_irq_handler(struct drm_i915_private *dev_priv
 static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
-	if (gt_iir &
-	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
+	if (gt_iir & GT_RENDER_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[RCS]);
 	if (gt_iir & ILK_BSD_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[VCS]);
@@ -1274,9 +1273,7 @@ static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv,
 static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
-
-	if (gt_iir &
-	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
+	if (gt_iir & GT_RENDER_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[RCS]);
 	if (gt_iir & GT_BSD_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[VCS]);
@@ -3600,8 +3597,7 @@ static void gen5_gt_irq_postinstall(struct drm_device *dev)
 
 	gt_irqs |= GT_RENDER_USER_INTERRUPT;
 	if (IS_GEN5(dev)) {
-		gt_irqs |= GT_RENDER_PIPECTL_NOTIFY_INTERRUPT |
-			   ILK_BSD_USER_INTERRUPT;
+		gt_irqs |= ILK_BSD_USER_INTERRUPT;
 	} else {
 		gt_irqs |= GT_BLT_USER_INTERRUPT | GT_BSD_USER_INTERRUPT;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ca2e59405998..30e400d77d23 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1508,67 +1508,22 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
 	return 0;
 }
 
-#define PIPE_CONTROL_FLUSH(ring__, addr__)					\
-do {									\
-	intel_ring_emit(ring__, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |		\
-		 PIPE_CONTROL_DEPTH_STALL);				\
-	intel_ring_emit(ring__, (addr__) | PIPE_CONTROL_GLOBAL_GTT);			\
-	intel_ring_emit(ring__, 0);							\
-	intel_ring_emit(ring__, 0);							\
-} while (0)
-
-static int
-pc_render_add_request(struct drm_i915_gem_request *req)
+static void
+gen5_seqno_barrier(struct intel_engine_cs *ring)
 {
-	struct intel_engine_cs *engine = req->engine;
-	u32 addr = engine->status_page.gfx_addr +
-		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	u32 scratch_addr = addr;
-	int ret;
-
-	/* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently
-	 * incoherent with writes to memory, i.e. completely fubar,
-	 * so we need to use PIPE_NOTIFY instead.
+	/* MI_STORE writes are internally buffered by the GPU and not flushed
+	 * either by MI_FLUSH or SyncFlush or any other combination of
+	 * MI commands.
 	 *
-	 * However, we also need to workaround the qword write
-	 * incoherence by flushing the 6 PIPE_NOTIFY buffers out to
-	 * memory before requesting an interrupt.
+	 * "Only the submission of the store operation is guaranteed.
+	 * The write result will be complete (coherent) some time later
+	 * (this is practically a finite period but there is no guaranteed
+	 * latency)."
+	 *
+	 * Empirically, we observe that we need a delay of at least 75us to
+	 * be sure that the seqno write is visible by the CPU.
 	 */
-	ret = intel_ring_begin(req, 32);
-	if (ret)
-		return ret;
-
-	intel_ring_emit(engine,
-			GFX_OP_PIPE_CONTROL(4) |
-			PIPE_CONTROL_QW_WRITE |
-			PIPE_CONTROL_WRITE_FLUSH |
-			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
-	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(engine, req->seqno);
-	intel_ring_emit(engine, 0);
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-
-	intel_ring_emit(engine,
-		       	GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
-			PIPE_CONTROL_WRITE_FLUSH |
-			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
-			PIPE_CONTROL_NOTIFY);
-	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(engine, req->seqno);
-	intel_ring_emit(engine, 0);
-	__intel_ring_advance(engine);
-
-	return 0;
+	usleep_range(75, 250);
 }
 
 static void
@@ -2825,12 +2780,12 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 			engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
 		}
 	} else if (IS_GEN5(dev_priv)) {
-		engine->add_request = pc_render_add_request;
+		engine->add_request = i9xx_add_request;
 		engine->flush = gen4_render_ring_flush;
 		engine->irq_get = gen5_ring_get_irq;
 		engine->irq_put = gen5_ring_put_irq;
-		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT |
-					GT_RENDER_PIPECTL_NOTIFY_INTERRUPT;
+		engine->irq_seqno_barrier = gen5_seqno_barrier;
+		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 	} else {
 		engine->add_request = i9xx_add_request;
 		if (INTEL_GEN(dev_priv) < 4)
@@ -2867,7 +2822,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	if (ret)
 		return ret;
 
-	if (INTEL_GEN(dev_priv) >= 5) {
+	if (INTEL_GEN(dev_priv) >= 6) {
 		ret = intel_init_pipe_control(engine, 4096);
 		if (ret)
 			return ret;
@@ -2940,6 +2895,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
 			engine->irq_get = gen5_ring_get_irq;
 			engine->irq_put = gen5_ring_put_irq;
+			engine->irq_seqno_barrier = gen5_seqno_barrier;
 		} else {
 			engine->irq_enable_mask = I915_BSD_USER_INTERRUPT;
 			engine->irq_get = i9xx_ring_get_irq;
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 13/21] drm/i915: Check the CPU cached value of seqno after waking the waiter
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (11 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 12/21] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-06 15:10   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 14/21] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

If we have multiple waiters, we may find that many complete on the same
wake up. If we first inspect the seqno from the CPU cache, we may reduce
the number of heavyweight coherent seqno reads we require.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4a71f4e9a97a..4ddb9ff319cb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3918,6 +3918,12 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *engine = req->engine;
 
+	/* Before we do the heavier coherent read of the seqno,
+	 * check the value (hopefully) in the CPU cacheline.
+	 */
+	if (i915_gem_request_completed(req))
+		return true;
+
 	/* Ensure our read of the seqno is coherent so that we
 	 * do not "miss an interrupt" (i.e. if this is the last
 	 * request and the seqno write from the GPU is not visible
@@ -3929,11 +3935,11 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	 * but it is easier and safer to do it every time the waiter
 	 * is woken.
 	 */
-	if (engine->irq_seqno_barrier)
+	if (engine->irq_seqno_barrier) {
 		engine->irq_seqno_barrier(engine);
-
-	if (i915_gem_request_completed(req))
-		return true;
+		if (i915_gem_request_completed(req))
+			return true;
+	}
 
 	/* We need to check whether any gpu reset happened in between
 	 * the request being submitted and now. If a reset has occurred,
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 14/21] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (12 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 13/21] drm/i915: Check the CPU cached value of seqno after waking the waiter Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-06 15:34   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 15/21] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

If we flag that the cached seqno may be stale upon receiving an interrupt,
we can use that information to reduce the frequency with which we apply the
heavyweight coherent seqno read (i.e. when we wake up a chain of waiters).
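
Pulling the two halves of the diff below together, the handshake between the
interrupt handler and the waiter looks roughly like this (a condensed sketch of
the hunks that follow, not additional code):

	/* Interrupt handler (i915_irq.c): publish that a new seqno may have
	 * landed, with a full barrier before waking the bottom-half.
	 */
	smp_store_mb(engine->irq_posted, true);
	intel_engine_wakeup(engine);

	/* Waiter (__i915_request_irq_complete()): only pay for the coherent
	 * seqno barrier if an interrupt has been posted since the last check,
	 * clearing the flag before the barrier so a racing interrupt forces
	 * another pass.
	 */
	if (engine->irq_seqno_barrier && READ_ONCE(engine->irq_posted)) {
		WRITE_ONCE(engine->irq_posted, false);
		engine->irq_seqno_barrier(engine);
		if (i915_gem_request_completed(req))
			return true;
	}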

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          | 15 ++++++++++++++-
 drivers/gpu/drm/i915/i915_irq.c          |  1 +
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 16 ++++++++++------
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  1 +
 4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4ddb9ff319cb..a71d08199d57 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3935,7 +3935,20 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	 * but it is easier and safer to do it every time the waiter
 	 * is woken.
 	 */
-	if (engine->irq_seqno_barrier) {
+	if (engine->irq_seqno_barrier && READ_ONCE(engine->irq_posted)) {
+		/* The ordering of irq_posted versus applying the barrier
+		 * is crucial. The clearing of the current irq_posted must
+		 * be visible before we perform the barrier operation,
+		 * such that if a subsequent interrupt arrives, irq_posted
+		 * is reasserted and our task rewoken (which causes us to
+		 * do another __i915_request_irq_complete() immediately
+		 * and reapply the barrier). Conversely, if the clear
+		 * occurs after the barrier, then an interrupt that arrived
+		 * whilst we waited on the barrier would not trigger a
+		 * barrier on the next pass, and the read may not see the
+		 * seqno update.
+		 */
+		WRITE_ONCE(engine->irq_posted, false);
 		engine->irq_seqno_barrier(engine);
 		if (i915_gem_request_completed(req))
 			return true;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index c14eb57b5807..14b3d65bb604 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -976,6 +976,7 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 
 static void notify_ring(struct intel_engine_cs *engine)
 {
+	smp_store_mb(engine->irq_posted, true);
 	if (intel_engine_wakeup(engine)) {
 		trace_i915_gem_request_notify(engine);
 		engine->user_interrupts++;
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 44346de39794..0f5fe114c204 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -43,12 +43,18 @@ static void intel_breadcrumbs_fake_irq(unsigned long data)
 
 static void irq_enable(struct intel_engine_cs *engine)
 {
+	/* Enabling the IRQ may miss the generation of the interrupt, but
+	 * we still need to force the barrier before reading the seqno,
+	 * just in case.
+	 */
+	engine->irq_posted = true;
 	WARN_ON(!engine->irq_get(engine));
 }
 
 static void irq_disable(struct intel_engine_cs *engine)
 {
 	engine->irq_put(engine);
+	engine->irq_posted = false;
 }
 
 static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
@@ -56,7 +62,6 @@ static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
 	struct intel_engine_cs *engine =
 		container_of(b, struct intel_engine_cs, breadcrumbs);
 	struct drm_i915_private *i915 = engine->i915;
-	bool irq_posted = false;
 
 	assert_spin_locked(&b->lock);
 	if (b->rpm_wakelock)
@@ -72,10 +77,8 @@ static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
 
 	/* No interrupts? Kick the waiter every jiffie! */
 	if (intel_irqs_enabled(i915)) {
-		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings)) {
+		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
 			irq_enable(engine);
-			irq_posted = true;
-		}
 		b->irq_enabled = true;
 	}
 
@@ -83,7 +86,7 @@ static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
 	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings))
 		mod_timer(&b->fake_irq, jiffies + 1);
 
-	return irq_posted;
+	return READ_ONCE(engine->irq_posted);
 }
 
 static void __intel_breadcrumbs_disable_irq(struct intel_breadcrumbs *b)
@@ -197,7 +200,8 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 			 * in case the seqno passed.
 			 */
 			__intel_breadcrumbs_enable_irq(b);
-			wake_up_process(to_wait(next)->task);
+			if (READ_ONCE(engine->irq_posted))
+				wake_up_process(to_wait(next)->task);
 		}
 
 		do {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index cb599a54931a..324f85e8d540 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -197,6 +197,7 @@ struct intel_engine_cs {
 	struct i915_ctx_workarounds wa_ctx;
 
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
+	bool		irq_posted;
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
 	struct drm_i915_gem_request *trace_irq_req;
 	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 15/21] drm/i915: Stop setting wraparound seqno on initialisation
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (13 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 14/21] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-08  8:54   ` Daniel Vetter
  2016-06-03 16:08 ` [PATCH 16/21] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

We have testcases to ensure that seqno wraparound works fine, so we can
forgo forcing everyone to encounter seqno wraparound during early
uptime. Seqno wraparound incurs a full GPU stall, so not forcing it
eliminates one source of jitter from the early system. The testcases
give us very deterministic coverage, and given how difficult it would be
to debug an issue (GPU hang) stemming from a wraparound using pure
postmortem analysis, I see no value in forcing a wrap during boot.

Advancing the global next_seqno after a GPU reset is equally pointless.
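
For reference, the wraparound handling that those testcases exercise
reduces to the usual unsigned-difference comparison; a tiny standalone
sketch along the lines of i915_seqno_passed():

#include <stdint.h>
#include <stdio.h>

static int seqno_passed(uint32_t seq1, uint32_t seq2)
{
        return (int32_t)(seq1 - seq2) >= 0;
}

int main(void)
{
        /* across the 32-bit wrap, 3 still counts as "after" 0xfffffffe */
        printf("%d\n", seqno_passed(3, 0xfffffffeu));  /* 1 */
        printf("%d\n", seqno_passed(0xfffffffeu, 3));  /* 0 */
        return 0;
}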

References: https://bugs.freedesktop.org/show_bug.cgi?id=95023
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index bf5c93f2bd81..269d00a40483 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4858,12 +4858,6 @@ i915_gem_init_hw(struct drm_device *dev)
 			goto out;
 	}
 
-	/*
-	 * Increment the next seqno by 0x100 so we have a visible break
-	 * on re-initialisation
-	 */
-	ret = i915_gem_set_seqno(dev, dev_priv->next_seqno+0x100);
-
 out:
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 	return ret;
@@ -5006,14 +5000,6 @@ i915_gem_load_init(struct drm_device *dev)
 
 	dev_priv->relative_constants_mode = I915_EXEC_CONSTANTS_REL_GENERAL;
 
-	/*
-	 * Set initial sequence number for requests.
-	 * Using this number allows the wraparound to happen early,
-	 * catching any obvious problems.
-	 */
-	dev_priv->next_seqno = ((u32)~0 - 0x1100);
-	dev_priv->last_seqno = ((u32)~0 - 0x1101);
-
 	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
 
 	init_waitqueue_head(&dev_priv->pending_flip_queue);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 16/21] drm/i915: Only query timestamp when measuring elapsed time
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (14 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 15/21] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-06 13:50   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

Avoid the two calls to ktime_get_raw_ns() (at best it reads the TSC) as
we only need to compute the elapsed time for a timed wait.

v2: Eliminate the unused local variable, reducing the function size by 64
bytes (using the storage space on the caller's stack rather than adding
to our stack frame). Writing the code this way emits smaller and faster
code for the normal case.
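
A quick standalone sketch (made-up numbers) of why folding the start time
into *timeout gives the same remaining value as the old before/after
bookkeeping:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
        int64_t timeout = 1000, before = 5000, now = 5300;

        /* old form: keep a 'before' local, subtract the elapsed time */
        int64_t old_remaining = timeout - (now - before);

        /* new form: *timeout becomes an absolute deadline up front */
        int64_t deadline = timeout + before;
        int64_t new_remaining = deadline - now;

        printf("%" PRId64 " == %" PRId64 "\n", old_remaining, new_remaining);
        return 0;       /* both print 700 */
}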

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 269d00a40483..fdbad07b5f42 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1212,7 +1212,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	DEFINE_WAIT(reset);
 	struct intel_wait wait;
 	unsigned long timeout_remain;
-	s64 before = 0; /* Only to silence a compiler warning. */
 	int ret = 0;
 
 	might_sleep();
@@ -1231,12 +1230,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		if (*timeout == 0)
 			return -ETIME;
 
+		/* Record current time in case interrupted, or wedged */
 		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
-
-		/*
-		 * Record current time in case interrupted by signal, or wedged.
-		 */
-		before = ktime_get_raw_ns();
+		*timeout += ktime_get_raw_ns();
 	}
 
 	trace_i915_gem_request_wait_begin(req);
@@ -1299,9 +1295,9 @@ complete:
 	trace_i915_gem_request_wait_end(req);
 
 	if (timeout) {
-		s64 tres = *timeout - (ktime_get_raw_ns() - before);
-
-		*timeout = tres < 0 ? 0 : tres;
+		*timeout -= ktime_get_raw_ns();
+		if (*timeout < 0)
+			*timeout = 0;
 
 		/*
 		 * Apparently ktime isn't accurate enough and occasionally has a
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (15 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 16/21] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-07 12:04   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 18/21] drm/i915: Embed signaling node into the GEM request Chris Wilson
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

If we convert the tracing from direct use of ring->irq_get() over to the
breadcrumb infrastructure, we only have a single user of ring->irq_get()
left and so will be able to simplify the driver routines (eliminating the
redundant validation and irq refcounting).

v2: Move to a signaling framework based upon the waiter.
v3: Track the first-signal to avoid having to walk the rbtree every time.
v4: Mark the signaler thread as RT priority to reduce latency in the
indirect wakeups.
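
The signaler below follows the stock kthread sleep/wake pattern; roughly,
as a skeleton with hypothetical helpers rather than the real breadcrumb
code:

#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/types.h>

struct toy_engine;
bool toy_oldest_signal_complete(struct toy_engine *engine);     /* hypothetical */
void toy_signal_and_remove_oldest(struct toy_engine *engine);   /* hypothetical */

static int toy_signaler(void *arg)
{
        struct toy_engine *engine = arg;

        for (;;) {
                set_current_state(TASK_INTERRUPTIBLE);

                if (toy_oldest_signal_complete(engine)) {
                        __set_current_state(TASK_RUNNING);
                        toy_signal_and_remove_oldest(engine);
                        continue;
                }

                if (kthread_should_stop())
                        break;

                schedule();     /* sleep until the irq or a new client wakes us */
        }

        __set_current_state(TASK_RUNNING);
        return 0;
}

/* spawned once per engine, along the lines of:
 *      kthread_run(toy_signaler, engine, "irq/toy:%d", engine_id);
 */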

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          |   8 --
 drivers/gpu/drm/i915/i915_gem.c          |   9 +-
 drivers/gpu/drm/i915/i915_trace.h        |   2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 178 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h  |   8 +-
 5 files changed, 188 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a71d08199d57..b0235372cfdf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3906,14 +3906,6 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
 			    schedule_timeout_uninterruptible(remaining_jiffies);
 	}
 }
-
-static inline void i915_trace_irq_get(struct intel_engine_cs *engine,
-				      struct drm_i915_gem_request *req)
-{
-	if (engine->trace_irq_req == NULL && engine->irq_get(engine))
-		i915_gem_request_assign(&engine->trace_irq_req, req);
-}
-
 static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *engine = req->engine;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fdbad07b5f42..f4e550ddaa5d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2500,7 +2500,8 @@ i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
 
 	/* If the seqno wraps around, we need to clear the breadcrumb rbtree */
 	if (!i915_seqno_passed(seqno, dev_priv->next_seqno)) {
-		while (intel_kick_waiters(dev_priv))
+		while (intel_kick_waiters(dev_priv) ||
+		       intel_kick_signalers(dev_priv))
 			yield();
 	}
 
@@ -2964,12 +2965,6 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 		i915_gem_object_retire__read(obj, engine->id);
 	}
 
-	if (unlikely(engine->trace_irq_req &&
-		     i915_gem_request_completed(engine->trace_irq_req))) {
-		engine->irq_put(engine);
-		i915_gem_request_assign(&engine->trace_irq_req, NULL);
-	}
-
 	WARN_ON(i915_verify_lists(engine->dev));
 }
 
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 3d13fde95fdf..f59cf07184ae 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -490,7 +490,7 @@ TRACE_EVENT(i915_gem_ring_dispatch,
 			   __entry->ring = req->engine->id;
 			   __entry->seqno = req->seqno;
 			   __entry->flags = flags;
-			   i915_trace_irq_get(req->engine, req);
+			   intel_engine_enable_signaling(req);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 0f5fe114c204..143891a2b68a 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -22,6 +22,8 @@
  *
  */
 
+#include <linux/kthread.h>
+
 #include "i915_drv.h"
 
 static void intel_breadcrumbs_fake_irq(unsigned long data)
@@ -321,6 +323,155 @@ out_unlock:
 	spin_unlock(&b->lock);
 }
 
+struct signal {
+	struct rb_node node;
+	struct intel_wait wait;
+	struct drm_i915_gem_request *request;
+};
+
+static bool signal_complete(struct signal *signal)
+{
+	if (signal == NULL)
+		return false;
+
+	/* If another process served as the bottom-half it may have already
+	 * signalled that this wait is already completed.
+	 */
+	if (intel_wait_complete(&signal->wait))
+		return true;
+
+	/* Carefully check if the request is complete, giving time for the
+	 * seqno to be visible or if the GPU hung.
+	 */
+	if (__i915_request_irq_complete(signal->request))
+		return true;
+
+	return false;
+}
+
+static struct signal *to_signal(struct rb_node *rb)
+{
+	return container_of(rb, struct signal, node);
+}
+
+static void signaler_set_rtpriority(void)
+{
+	 struct sched_param param = { .sched_priority = 1 };
+	 sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
+}
+
+static int intel_breadcrumbs_signaler(void *arg)
+{
+	struct intel_engine_cs *engine = arg;
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct signal *signal;
+
+	/* Install ourselves with high priority to reduce signalling latency */
+	signaler_set_rtpriority();
+
+	do {
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		/* We are either woken up by the interrupt bottom-half,
+		 * or by a client adding a new signaller. In both cases,
+		 * the GPU seqno may have advanced beyond our oldest signal.
+		 * If it has, propagate the signal, remove the waiter and
+		 * check again with the next oldest signal. Otherwise we
+		 * need to wait for a new interrupt from the GPU or for
+		 * a new client.
+		 */
+		signal = READ_ONCE(b->first_signal);
+		if (signal_complete(signal)) {
+			/* Wake up all other completed waiters and select the
+			 * next bottom-half for the next user interrupt.
+			 */
+			intel_engine_remove_wait(engine, &signal->wait);
+
+			i915_gem_request_unreference(signal->request);
+
+			/* Find the next oldest signal. Note that as we have
+			 * not been holding the lock, another client may
+			 * have installed an even older signal than the one
+			 * we just completed - so double check we are still
+			 * the oldest before picking the next one.
+			 */
+			spin_lock(&b->lock);
+			if (signal == b->first_signal)
+				b->first_signal = rb_next(&signal->node);
+			rb_erase(&signal->node, &b->signals);
+			spin_unlock(&b->lock);
+
+			kfree(signal);
+		} else {
+			if (kthread_should_stop())
+				break;
+
+			schedule();
+		}
+	} while (1);
+
+	return 0;
+}
+
+int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
+{
+	struct intel_engine_cs *engine = request->engine;
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct rb_node *parent, **p;
+	struct signal *signal;
+	bool first, wakeup;
+
+	if (unlikely(IS_ERR(b->signaler)))
+		return PTR_ERR(b->signaler);
+
+	signal = kmalloc(sizeof(*signal), GFP_ATOMIC);
+	if (unlikely(!signal))
+		return -ENOMEM;
+
+	signal->wait.task = b->signaler;
+	signal->wait.seqno = request->seqno;
+
+	signal->request = i915_gem_request_reference(request);
+
+	/* First add ourselves into the list of waiters, but register our
+	 * bottom-half as the signaller thread. As per usual, only the oldest
+	 * waiter (not just signaller) is tasked as the bottom-half waking
+	 * up all completed waiters after the user interrupt.
+	 *
+	 * If we are the oldest waiter, enable the irq (after which we
+	 * must double check that the seqno did not complete).
+	 */
+	wakeup = intel_engine_add_wait(engine, &signal->wait);
+
+	/* Now insert ourselves into the retirement ordered list of signals
+	 * on this engine. We track the oldest seqno as that will be the
+	 * first signal to complete.
+	 */
+	spin_lock(&b->lock);
+	parent = NULL;
+	first = true;
+	p = &b->signals.rb_node;
+	while (*p) {
+		parent = *p;
+		if (i915_seqno_passed(signal->wait.seqno,
+				      to_signal(parent)->wait.seqno)) {
+			p = &parent->rb_right;
+			first = false;
+		} else
+			p = &parent->rb_left;
+	}
+	rb_link_node(&signal->node, parent, p);
+	rb_insert_color(&signal->node, &b->signals);
+	if (first)
+		smp_store_mb(b->first_signal, signal);
+	spin_unlock(&b->lock);
+
+	if (wakeup)
+		wake_up_process(b->signaler);
+
+	return 0;
+}
+
 void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
@@ -329,12 +480,24 @@ void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 	setup_timer(&b->fake_irq,
 		    intel_breadcrumbs_fake_irq,
 		    (unsigned long)engine);
+
+	/* Spawn a thread to provide a common bottom-half for all signals.
+	 * As this is an asynchronous interface we cannot steal the current
+	 * task for handling the bottom-half to the user interrupt, therefore
+	 * we create a thread to do the coherent seqno dance after the
+	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
+	 */
+	b->signaler = kthread_run(intel_breadcrumbs_signaler,
+				  engine, "irq/i915:%d", engine->id);
 }
 
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 
+	if (!IS_ERR_OR_NULL(b->signaler))
+		kthread_stop(b->signaler);
+
 	del_timer_sync(&b->fake_irq);
 }
 
@@ -356,3 +519,18 @@ unsigned intel_kick_waiters(struct drm_i915_private *i915)
 
 	return mask;
 }
+
+unsigned intel_kick_signalers(struct drm_i915_private *i915)
+{
+	struct intel_engine_cs *engine;
+	unsigned mask = 0;
+
+	for_each_engine(engine, i915) {
+		if (unlikely(READ_ONCE(engine->breadcrumbs.first_signal))) {
+			wake_up_process(engine->breadcrumbs.signaler);
+			mask |= intel_engine_flag(engine);
+		}
+	}
+
+	return mask;
+}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 324f85e8d540..f4bca38caef0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -141,6 +141,8 @@ struct  i915_ctx_workarounds {
 	struct drm_i915_gem_object *obj;
 };
 
+struct drm_i915_gem_request;
+
 struct intel_engine_cs {
 	struct drm_i915_private *i915;
 	const char	*name;
@@ -179,8 +181,11 @@ struct intel_engine_cs {
 	struct intel_breadcrumbs {
 		spinlock_t lock; /* protects the lists of requests */
 		struct rb_root waiters; /* sorted by retirement, priority */
+		struct rb_root signals; /* sorted by retirement */
 		struct intel_wait *first_wait; /* oldest waiter by retirement */
 		struct task_struct *tasklet; /* bh for user interrupts */
+		struct task_struct *signaler; /* used for fence signalling */
+		void *first_signal;
 		struct timer_list fake_irq; /* used after a missed interrupt */
 		bool irq_enabled;
 		bool rpm_wakelock;
@@ -199,7 +204,6 @@ struct intel_engine_cs {
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	bool		irq_posted;
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
-	struct drm_i915_gem_request *trace_irq_req;
 	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
 	void		(*irq_put)(struct intel_engine_cs *ring);
 
@@ -540,6 +544,7 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 			   struct intel_wait *wait);
 void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			      struct intel_wait *wait);
+int intel_engine_enable_signaling(struct drm_i915_gem_request *request);
 static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
 {
 	return READ_ONCE(engine->breadcrumbs.tasklet);
@@ -559,5 +564,6 @@ static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
 void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 unsigned intel_kick_waiters(struct drm_i915_private *i915);
+unsigned intel_kick_signalers(struct drm_i915_private *i915);
 
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 18/21] drm/i915: Embed signaling node into the GEM request
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (16 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-07 12:31   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller Chris Wilson
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

Under the assumption that enabling signaling will be a frequent
operation, let's preallocate our attachments for signaling inside the
request struct (and so benefit from the slab cache).
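
Roughly the trade-off, sketched with hypothetical simplified types rather
than the real structs:

#include <stdbool.h>

struct toy_signal_node {
        struct toy_signal_node *next;   /* stand-in for the rbtree node */
        unsigned int seqno;
};

struct toy_request {
        unsigned int seqno;
        struct toy_signal_node signaling;       /* embedded, preallocated */
        bool signaling_enabled;
};

/* Enabling signaling becomes linking the embedded node: no GFP_ATOMIC
 * allocation at enable time, no -ENOMEM failure path, and the node is
 * recycled together with the request's slab object. */
int toy_enable_signaling(struct toy_request *rq)
{
        if (rq->signaling_enabled)
                return 0;
        rq->signaling.seqno = rq->seqno;
        rq->signaling_enabled = true;
        return 0;
}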

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          |  1 +
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 89 ++++++++++++++++++--------------
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  6 +++
 3 files changed, 56 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b0235372cfdf..88d9242398ce 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2363,6 +2363,7 @@ struct drm_i915_gem_request {
 	struct drm_i915_private *i915;
 	struct intel_engine_cs *engine;
 	unsigned reset_counter;
+	struct intel_signal_node signaling;
 
 	 /** GEM sequence number associated with the previous request,
 	  * when the HWS breadcrumb is equal to this the GPU is processing
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 143891a2b68a..8ab508ed4248 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -128,16 +128,14 @@ static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
 	wake_up_process(wait->task); /* implicit smp_wmb() */
 }
 
-bool intel_engine_add_wait(struct intel_engine_cs *engine,
-			   struct intel_wait *wait)
+static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
+				    struct intel_wait *wait)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 	struct rb_node **p, *parent, *completed;
 	bool first;
 	u32 seqno;
 
-	spin_lock(&b->lock);
-
 	/* Insert the request into the retirement ordered list
 	 * of waiters by walking the rbtree. If we are the oldest
 	 * seqno in the tree (the first to be retired), then
@@ -223,6 +221,17 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 	GEM_BUG_ON(!b->first_wait);
 	GEM_BUG_ON(rb_first(&b->waiters) != &b->first_wait->node);
 
+	return first;
+}
+
+bool intel_engine_add_wait(struct intel_engine_cs *engine,
+			   struct intel_wait *wait)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	bool first;
+
+	spin_lock(&b->lock);
+	first = __intel_engine_add_wait(engine, wait);
 	spin_unlock(&b->lock);
 
 	return first;
@@ -323,35 +332,29 @@ out_unlock:
 	spin_unlock(&b->lock);
 }
 
-struct signal {
-	struct rb_node node;
-	struct intel_wait wait;
-	struct drm_i915_gem_request *request;
-};
-
-static bool signal_complete(struct signal *signal)
+static bool signal_complete(struct drm_i915_gem_request *request)
 {
-	if (signal == NULL)
+	if (request == NULL)
 		return false;
 
 	/* If another process served as the bottom-half it may have already
 	 * signalled that this wait is already completed.
 	 */
-	if (intel_wait_complete(&signal->wait))
+	if (intel_wait_complete(&request->signaling.wait))
 		return true;
 
 	/* Carefully check if the request is complete, giving time for the
 	 * seqno to be visible or if the GPU hung.
 	 */
-	if (__i915_request_irq_complete(signal->request))
+	if (__i915_request_irq_complete(request))
 		return true;
 
 	return false;
 }
 
-static struct signal *to_signal(struct rb_node *rb)
+static struct drm_i915_gem_request *to_signal(struct rb_node *rb)
 {
-	return container_of(rb, struct signal, node);
+	return container_of(rb, struct drm_i915_gem_request, signaling.node);
 }
 
 static void signaler_set_rtpriority(void)
@@ -364,7 +367,7 @@ static int intel_breadcrumbs_signaler(void *arg)
 {
 	struct intel_engine_cs *engine = arg;
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct signal *signal;
+	struct drm_i915_gem_request *request;
 
 	/* Install ourselves with high priority to reduce signalling latency */
 	signaler_set_rtpriority();
@@ -380,14 +383,13 @@ static int intel_breadcrumbs_signaler(void *arg)
 		 * need to wait for a new interrupt from the GPU or for
 		 * a new client.
 		 */
-		signal = READ_ONCE(b->first_signal);
-		if (signal_complete(signal)) {
+		request = READ_ONCE(b->first_signal);
+		if (signal_complete(request)) {
 			/* Wake up all other completed waiters and select the
 			 * next bottom-half for the next user interrupt.
 			 */
-			intel_engine_remove_wait(engine, &signal->wait);
-
-			i915_gem_request_unreference(signal->request);
+			intel_engine_remove_wait(engine,
+						 &request->signaling.wait);
 
 			/* Find the next oldest signal. Note that as we have
 			 * not been holding the lock, another client may
@@ -396,12 +398,15 @@ static int intel_breadcrumbs_signaler(void *arg)
 			 * the oldest before picking the next one.
 			 */
 			spin_lock(&b->lock);
-			if (signal == b->first_signal)
-				b->first_signal = rb_next(&signal->node);
-			rb_erase(&signal->node, &b->signals);
+			if (request == b->first_signal) {
+				struct rb_node *rb =
+					rb_next(&request->signaling.node);
+				b->first_signal = rb ? to_signal(rb) : NULL;
+			}
+			rb_erase(&request->signaling.node, &b->signals);
 			spin_unlock(&b->lock);
 
-			kfree(signal);
+			i915_gem_request_unreference(request);
 		} else {
 			if (kthread_should_stop())
 				break;
@@ -418,20 +423,23 @@ int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
 	struct intel_engine_cs *engine = request->engine;
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 	struct rb_node *parent, **p;
-	struct signal *signal;
 	bool first, wakeup;
 
 	if (unlikely(IS_ERR(b->signaler)))
 		return PTR_ERR(b->signaler);
 
-	signal = kmalloc(sizeof(*signal), GFP_ATOMIC);
-	if (unlikely(!signal))
-		return -ENOMEM;
+	if (unlikely(READ_ONCE(request->signaling.wait.task)))
+		return 0;
 
-	signal->wait.task = b->signaler;
-	signal->wait.seqno = request->seqno;
+	spin_lock(&b->lock);
+	if (unlikely(request->signaling.wait.task)) {
+		wakeup = false;
+		goto unlock;
+	}
 
-	signal->request = i915_gem_request_reference(request);
+	request->signaling.wait.task = b->signaler;
+	request->signaling.wait.seqno = request->seqno;
+	i915_gem_request_reference(request);
 
 	/* First add ourselves into the list of waiters, but register our
 	 * bottom-half as the signaller thread. As per usual, only the oldest
@@ -441,29 +449,30 @@ int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
 	 * If we are the oldest waiter, enable the irq (after which we
 	 * must double check that the seqno did not complete).
 	 */
-	wakeup = intel_engine_add_wait(engine, &signal->wait);
+	wakeup = __intel_engine_add_wait(engine, &request->signaling.wait);
 
 	/* Now insert ourselves into the retirement ordered list of signals
 	 * on this engine. We track the oldest seqno as that will be the
 	 * first signal to complete.
 	 */
-	spin_lock(&b->lock);
 	parent = NULL;
 	first = true;
 	p = &b->signals.rb_node;
 	while (*p) {
 		parent = *p;
-		if (i915_seqno_passed(signal->wait.seqno,
-				      to_signal(parent)->wait.seqno)) {
+		if (i915_seqno_passed(request->seqno,
+				      to_signal(parent)->seqno)) {
 			p = &parent->rb_right;
 			first = false;
 		} else
 			p = &parent->rb_left;
 	}
-	rb_link_node(&signal->node, parent, p);
-	rb_insert_color(&signal->node, &b->signals);
+	rb_link_node(&request->signaling.node, parent, p);
+	rb_insert_color(&request->signaling.node, &b->signals);
 	if (first)
-		smp_store_mb(b->first_signal, signal);
+		smp_store_mb(b->first_signal, request);
+
+unlock:
 	spin_unlock(&b->lock);
 
 	if (wakeup)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index f4bca38caef0..5f7cb3d0ea1c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -530,6 +530,12 @@ struct intel_wait {
 	struct task_struct *task;
 	u32 seqno;
 };
+
+struct intel_signal_node {
+	struct rb_node node;
+	struct intel_wait wait;
+};
+
 void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
 static inline void intel_wait_init(struct intel_wait *wait, u32 seqno)
 {
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (17 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 18/21] drm/i915: Embed signaling node into the GEM request Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-07 12:46   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 20/21] drm/i915: Simplify enabling user-interrupts with L3-remapping Chris Wilson
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

With only a single callsite for intel_engine_cs->irq_get and ->irq_put,
we can reduce the code size by moving the common preamble into the
caller, and we can also eliminate the reference counting.

For completeness, as we are no longer doing reference counting on irq,
rename the get/put vfunctions to enable/disable respectively.
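
The resulting shape, sketched with a simplified toy structure (the real
code takes the device-wide dev_priv->irq_lock rather than a per-engine
lock):

#include <linux/spinlock.h>

struct toy_engine {
        spinlock_t irq_lock;
        void (*irq_enable)(struct toy_engine *engine);
        void (*irq_disable)(struct toy_engine *engine);
};

/* the single call site owns the locking; per-gen vfuncs just poke IMR */
void toy_irq_enable(struct toy_engine *engine)
{
        spin_lock_irq(&engine->irq_lock);
        engine->irq_enable(engine);
        spin_unlock_irq(&engine->irq_lock);
}

void toy_irq_disable(struct toy_engine *engine)
{
        spin_lock_irq(&engine->irq_lock);
        engine->irq_disable(engine);
        spin_unlock_irq(&engine->irq_lock);
}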

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c          |   8 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  10 +-
 drivers/gpu/drm/i915/intel_lrc.c         |  34 +---
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 269 ++++++++++---------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h  |   5 +-
 5 files changed, 108 insertions(+), 218 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 14b3d65bb604..5bdb433dde8c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -259,12 +259,12 @@ static void ilk_update_gt_irq(struct drm_i915_private *dev_priv,
 	dev_priv->gt_irq_mask &= ~interrupt_mask;
 	dev_priv->gt_irq_mask |= (~enabled_irq_mask & interrupt_mask);
 	I915_WRITE(GTIMR, dev_priv->gt_irq_mask);
-	POSTING_READ(GTIMR);
 }
 
 void gen5_enable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
 {
 	ilk_update_gt_irq(dev_priv, mask, mask);
+	POSTING_READ_FW(GTIMR);
 }
 
 void gen5_disable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
@@ -2818,9 +2818,9 @@ ring_idle(struct intel_engine_cs *engine, u32 seqno)
 }
 
 static bool
-ipehr_is_semaphore_wait(struct drm_i915_private *dev_priv, u32 ipehr)
+ipehr_is_semaphore_wait(struct intel_engine_cs *engine, u32 ipehr)
 {
-	if (INTEL_GEN(dev_priv) >= 8) {
+	if (INTEL_GEN(engine->i915) >= 8) {
 		return (ipehr >> 23) == 0x1c;
 	} else {
 		ipehr &= ~MI_SEMAPHORE_SYNC_MASK;
@@ -2891,7 +2891,7 @@ semaphore_waits_for(struct intel_engine_cs *engine, u32 *seqno)
 		return NULL;
 
 	ipehr = I915_READ(RING_IPEHR(engine->mmio_base));
-	if (!ipehr_is_semaphore_wait(engine->i915, ipehr))
+	if (!ipehr_is_semaphore_wait(engine, ipehr))
 		return NULL;
 
 	/*
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 8ab508ed4248..dc65a007fa20 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -50,12 +50,18 @@ static void irq_enable(struct intel_engine_cs *engine)
 	 * just in case.
 	 */
 	engine->irq_posted = true;
-	WARN_ON(!engine->irq_get(engine));
+
+	spin_lock_irq(&engine->i915->irq_lock);
+	engine->irq_enable(engine);
+	spin_unlock_irq(&engine->i915->irq_lock);
 }
 
 static void irq_disable(struct intel_engine_cs *engine)
 {
-	engine->irq_put(engine);
+	spin_lock_irq(&engine->i915->irq_lock);
+	engine->irq_disable(engine);
+	spin_unlock_irq(&engine->i915->irq_lock);
+
 	engine->irq_posted = false;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 32b5eae7dd11..9e19b2c5b3ae 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1578,36 +1578,18 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 	return 0;
 }
 
-static bool gen8_logical_ring_get_irq(struct intel_engine_cs *engine)
+static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		I915_WRITE_IMR(engine,
-			       ~(engine->irq_enable_mask | engine->irq_keep_mask));
-		POSTING_READ(RING_IMR(engine->mmio_base));
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	I915_WRITE_IMR(engine,
+		       ~(engine->irq_enable_mask | engine->irq_keep_mask));
+	POSTING_READ_FW(RING_IMR(engine->mmio_base));
 }
 
-static void gen8_logical_ring_put_irq(struct intel_engine_cs *engine)
+static void gen8_logical_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
-		POSTING_READ(RING_IMR(engine->mmio_base));
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
 }
 
 static int gen8_emit_flush(struct drm_i915_gem_request *request,
@@ -1895,8 +1877,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->init_hw = gen8_init_common_ring;
 	engine->emit_request = gen8_emit_request;
 	engine->emit_flush = gen8_emit_flush;
-	engine->irq_get = gen8_logical_ring_get_irq;
-	engine->irq_put = gen8_logical_ring_put_irq;
+	engine->irq_enable = gen8_logical_ring_enable_irq;
+	engine->irq_disable = gen8_logical_ring_disable_irq;
 	engine->emit_bb_start = gen8_emit_bb_start;
 	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1))
 		engine->irq_seqno_barrier = bxt_a_seqno_barrier;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 30e400d77d23..ba84b469f13f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1551,103 +1551,54 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
 	spin_unlock_irq(&dev_priv->uncore.lock);
 }
 
-static bool
-gen5_ring_get_irq(struct intel_engine_cs *engine)
+static void
+gen5_ring_enable_irq(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0)
-		gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	gen5_enable_gt_irq(engine->i915, engine->irq_enable_mask);
 }
 
 static void
-gen5_ring_put_irq(struct intel_engine_cs *engine)
+gen5_ring_disable_irq(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0)
-		gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	gen5_disable_gt_irq(engine->i915, engine->irq_enable_mask);
 }
 
-static bool
-i9xx_ring_get_irq(struct intel_engine_cs *engine)
+static void
+i9xx_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	if (!intel_irqs_enabled(dev_priv))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		dev_priv->irq_mask &= ~engine->irq_enable_mask;
-		I915_WRITE(IMR, dev_priv->irq_mask);
-		POSTING_READ(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
 
-	return true;
+	dev_priv->irq_mask &= ~engine->irq_enable_mask;
+	I915_WRITE(IMR, dev_priv->irq_mask);
+	POSTING_READ_FW(RING_IMR(engine->mmio_base));
 }
 
 static void
-i9xx_ring_put_irq(struct intel_engine_cs *engine)
+i9xx_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		dev_priv->irq_mask |= engine->irq_enable_mask;
-		I915_WRITE(IMR, dev_priv->irq_mask);
-		POSTING_READ(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	dev_priv->irq_mask |= engine->irq_enable_mask;
+	I915_WRITE(IMR, dev_priv->irq_mask);
 }
 
-static bool
-i8xx_ring_get_irq(struct intel_engine_cs *engine)
+static void
+i8xx_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	if (!intel_irqs_enabled(dev_priv))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		dev_priv->irq_mask &= ~engine->irq_enable_mask;
-		I915_WRITE16(IMR, dev_priv->irq_mask);
-		POSTING_READ16(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	dev_priv->irq_mask &= ~engine->irq_enable_mask;
+	I915_WRITE16(IMR, dev_priv->irq_mask);
+	POSTING_READ16(RING_IMR(engine->mmio_base));
 }
 
 static void
-i8xx_ring_put_irq(struct intel_engine_cs *engine)
+i8xx_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		dev_priv->irq_mask |= engine->irq_enable_mask;
-		I915_WRITE16(IMR, dev_priv->irq_mask);
-		POSTING_READ16(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	dev_priv->irq_mask |= engine->irq_enable_mask;
+	I915_WRITE16(IMR, dev_priv->irq_mask);
 }
 
 static int
@@ -1688,122 +1639,74 @@ i9xx_add_request(struct drm_i915_gem_request *req)
 	return 0;
 }
 
-static bool
-gen6_ring_get_irq(struct intel_engine_cs *engine)
+static void
+gen6_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-			I915_WRITE_IMR(engine,
-				       ~(engine->irq_enable_mask |
-					 GT_PARITY_ERROR(dev_priv)));
-		else
-			I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
-		gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
+		I915_WRITE_IMR(engine,
+			       ~(engine->irq_enable_mask |
+				 GT_PARITY_ERROR(dev_priv)));
+	else
+		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
+	gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
 }
 
 static void
-gen6_ring_put_irq(struct intel_engine_cs *engine)
+gen6_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-			I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
-		else
-			I915_WRITE_IMR(engine, ~0);
-		gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
+		I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
+	else
+		I915_WRITE_IMR(engine, ~0);
+	gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
 }
 
-static bool
-hsw_vebox_get_irq(struct intel_engine_cs *engine)
+static void
+hsw_vebox_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
-		gen6_enable_pm_irq(dev_priv, engine->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
+	gen6_enable_pm_irq(dev_priv, engine->irq_enable_mask);
 }
 
 static void
-hsw_vebox_put_irq(struct intel_engine_cs *engine)
+hsw_vebox_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		I915_WRITE_IMR(engine, ~0);
-		gen6_disable_pm_irq(dev_priv, engine->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	I915_WRITE_IMR(engine, ~0);
+	gen6_disable_pm_irq(dev_priv, engine->irq_enable_mask);
 }
 
-static bool
-gen8_ring_get_irq(struct intel_engine_cs *engine)
+static void
+gen8_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		if (HAS_L3_DPF(dev_priv) && engine->id == RCS) {
-			I915_WRITE_IMR(engine,
-				       ~(engine->irq_enable_mask |
-					 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
-		} else {
-			I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
-		}
-		POSTING_READ(RING_IMR(engine->mmio_base));
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
+		I915_WRITE_IMR(engine,
+			       ~(engine->irq_enable_mask |
+				 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
+	else
+		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
+	POSTING_READ_FW(RING_IMR(engine->mmio_base));
 }
 
 static void
-gen8_ring_put_irq(struct intel_engine_cs *engine)
+gen8_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		if (HAS_L3_DPF(dev_priv) && engine->id == RCS) {
-			I915_WRITE_IMR(engine,
-				       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
-		} else {
-			I915_WRITE_IMR(engine, ~0);
-		}
-		POSTING_READ(RING_IMR(engine->mmio_base));
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
+		I915_WRITE_IMR(engine,
+			       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
+	else
+		I915_WRITE_IMR(engine, ~0);
 }
 
 static int
@@ -2739,8 +2642,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		engine->init_context = intel_rcs_ctx_init;
 		engine->add_request = gen8_render_add_request;
 		engine->flush = gen8_render_ring_flush;
-		engine->irq_get = gen8_ring_get_irq;
-		engine->irq_put = gen8_ring_put_irq;
+		engine->irq_enable = gen8_ring_enable_irq;
+		engine->irq_disable = gen8_ring_disable_irq;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			WARN_ON(!dev_priv->semaphore_obj);
@@ -2754,8 +2657,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		engine->flush = gen7_render_ring_flush;
 		if (IS_GEN6(dev_priv))
 			engine->flush = gen6_render_ring_flush;
-		engine->irq_get = gen6_ring_get_irq;
-		engine->irq_put = gen6_ring_put_irq;
+		engine->irq_enable = gen6_ring_enable_irq;
+		engine->irq_disable = gen6_ring_disable_irq;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		engine->irq_seqno_barrier = gen6_seqno_barrier;
 		if (i915_semaphore_is_enabled(dev_priv)) {
@@ -2782,8 +2685,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	} else if (IS_GEN5(dev_priv)) {
 		engine->add_request = i9xx_add_request;
 		engine->flush = gen4_render_ring_flush;
-		engine->irq_get = gen5_ring_get_irq;
-		engine->irq_put = gen5_ring_put_irq;
+		engine->irq_enable = gen5_ring_enable_irq;
+		engine->irq_disable = gen5_ring_disable_irq;
 		engine->irq_seqno_barrier = gen5_seqno_barrier;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 	} else {
@@ -2793,11 +2696,11 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		else
 			engine->flush = gen4_render_ring_flush;
 		if (IS_GEN2(dev_priv)) {
-			engine->irq_get = i8xx_ring_get_irq;
-			engine->irq_put = i8xx_ring_put_irq;
+			engine->irq_enable = i8xx_ring_enable_irq;
+			engine->irq_disable = i8xx_ring_disable_irq;
 		} else {
-			engine->irq_get = i9xx_ring_get_irq;
-			engine->irq_put = i9xx_ring_put_irq;
+			engine->irq_enable = i9xx_ring_enable_irq;
+			engine->irq_disable = i9xx_ring_disable_irq;
 		}
 		engine->irq_enable_mask = I915_USER_INTERRUPT;
 	}
@@ -2857,8 +2760,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		if (INTEL_GEN(dev_priv) >= 8) {
 			engine->irq_enable_mask =
 				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
-			engine->irq_get = gen8_ring_get_irq;
-			engine->irq_put = gen8_ring_put_irq;
+			engine->irq_enable = gen8_ring_enable_irq;
+			engine->irq_disable = gen8_ring_disable_irq;
 			engine->dispatch_execbuffer =
 				gen8_ring_dispatch_execbuffer;
 			if (i915_semaphore_is_enabled(dev_priv)) {
@@ -2868,8 +2771,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			}
 		} else {
 			engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
-			engine->irq_get = gen6_ring_get_irq;
-			engine->irq_put = gen6_ring_put_irq;
+			engine->irq_enable = gen6_ring_enable_irq;
+			engine->irq_disable = gen6_ring_disable_irq;
 			engine->dispatch_execbuffer =
 				gen6_ring_dispatch_execbuffer;
 			if (i915_semaphore_is_enabled(dev_priv)) {
@@ -2893,13 +2796,13 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		engine->add_request = i9xx_add_request;
 		if (IS_GEN5(dev_priv)) {
 			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
-			engine->irq_get = gen5_ring_get_irq;
-			engine->irq_put = gen5_ring_put_irq;
+			engine->irq_enable = gen5_ring_enable_irq;
+			engine->irq_disable = gen5_ring_disable_irq;
 			engine->irq_seqno_barrier = gen5_seqno_barrier;
 		} else {
 			engine->irq_enable_mask = I915_BSD_USER_INTERRUPT;
-			engine->irq_get = i9xx_ring_get_irq;
-			engine->irq_put = i9xx_ring_put_irq;
+			engine->irq_enable = i9xx_ring_enable_irq;
+			engine->irq_disable = i9xx_ring_disable_irq;
 		}
 		engine->dispatch_execbuffer = i965_dispatch_execbuffer;
 	}
@@ -2928,8 +2831,8 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	engine->irq_seqno_barrier = gen6_seqno_barrier;
 	engine->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
-	engine->irq_get = gen8_ring_get_irq;
-	engine->irq_put = gen8_ring_put_irq;
+	engine->irq_enable = gen8_ring_enable_irq;
+	engine->irq_disable = gen8_ring_disable_irq;
 	engine->dispatch_execbuffer =
 			gen8_ring_dispatch_execbuffer;
 	if (i915_semaphore_is_enabled(dev_priv)) {
@@ -2960,8 +2863,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 	if (INTEL_GEN(dev_priv) >= 8) {
 		engine->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
-		engine->irq_get = gen8_ring_get_irq;
-		engine->irq_put = gen8_ring_put_irq;
+		engine->irq_enable = gen8_ring_enable_irq;
+		engine->irq_disable = gen8_ring_disable_irq;
 		engine->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			engine->semaphore.sync_to = gen8_ring_sync;
@@ -2970,8 +2873,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		}
 	} else {
 		engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
-		engine->irq_get = gen6_ring_get_irq;
-		engine->irq_put = gen6_ring_put_irq;
+		engine->irq_enable = gen6_ring_enable_irq;
+		engine->irq_disable = gen6_ring_disable_irq;
 		engine->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			engine->semaphore.signal = gen6_signal;
@@ -3019,8 +2922,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 	if (INTEL_GEN(dev_priv) >= 8) {
 		engine->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
-		engine->irq_get = gen8_ring_get_irq;
-		engine->irq_put = gen8_ring_put_irq;
+		engine->irq_enable = gen8_ring_enable_irq;
+		engine->irq_disable = gen8_ring_disable_irq;
 		engine->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			engine->semaphore.sync_to = gen8_ring_sync;
@@ -3029,8 +2932,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		}
 	} else {
 		engine->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
-		engine->irq_get = hsw_vebox_get_irq;
-		engine->irq_put = hsw_vebox_put_irq;
+		engine->irq_enable = hsw_vebox_enable_irq;
+		engine->irq_disable = hsw_vebox_disable_irq;
 		engine->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			engine->semaphore.sync_to = gen6_ring_sync;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 5f7cb3d0ea1c..182cae767bf1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -201,11 +201,10 @@ struct intel_engine_cs {
 	struct intel_hw_status_page status_page;
 	struct i915_ctx_workarounds wa_ctx;
 
-	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	bool		irq_posted;
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
-	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
-	void		(*irq_put)(struct intel_engine_cs *ring);
+	void		(*irq_enable)(struct intel_engine_cs *ring);
+	void		(*irq_disable)(struct intel_engine_cs *ring);
 
 	int		(*init_hw)(struct intel_engine_cs *ring);
 
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 20/21] drm/i915: Simplify enabling user-interrupts with L3-remapping
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (18 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-07 12:50   ` Tvrtko Ursulin
  2016-06-03 16:08 ` [PATCH 21/21] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts Chris Wilson
  2016-06-03 16:35 ` ✗ Ro.CI.BAT: failure for series starting with [01/21] drm/i915/shrinker: Flush active on objects before counting Patchwork
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

Borrow the idea from intel_lrc.c of precomputing the mask of interrupts
we wish to keep enabled at all times, which avoids having lots of
conditionals inside the interrupt enable/disable paths.
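
The shape of the change, as a standalone sketch with made-up interrupt
bits rather than the real register layout:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_USER_INTERRUPT      (1u << 0)
#define TOY_L3_PARITY_ERROR     (1u << 5)

struct toy_engine {
        uint32_t irq_enable_mask;       /* bits toggled by enable/disable */
        uint32_t irq_keep_mask;         /* bits that must never be masked */
};

/* IMR is a mask register: 0 means unmasked, so invert the wanted bits */
static uint32_t imr_on_enable(const struct toy_engine *e)
{
        return ~(e->irq_enable_mask | e->irq_keep_mask);
}

static uint32_t imr_on_disable(const struct toy_engine *e)
{
        return ~e->irq_keep_mask;
}

int main(void)
{
        /* decided once at init instead of re-checking HAS_L3_DPF in every
         * enable/disable call */
        struct toy_engine rcs = {
                .irq_enable_mask = TOY_USER_INTERRUPT,
                .irq_keep_mask = TOY_L3_PARITY_ERROR,
        };

        printf("enable:  %08" PRIx32 "\n", imr_on_enable(&rcs));
        printf("disable: %08" PRIx32 "\n", imr_on_disable(&rcs));
        return 0;
}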

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 35 +++++++++++----------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++--
 2 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ba84b469f13f..161c0792b1bf 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1227,8 +1227,7 @@ static int init_render_ring(struct intel_engine_cs *engine)
 	if (IS_GEN(dev_priv, 6, 7))
 		I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
 
-	if (HAS_L3_DPF(dev_priv))
-		I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
+	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
 
 	return init_workarounds_ring(engine);
 }
@@ -1644,12 +1643,9 @@ gen6_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 
-	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-		I915_WRITE_IMR(engine,
-			       ~(engine->irq_enable_mask |
-				 GT_PARITY_ERROR(dev_priv)));
-	else
-		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
+	I915_WRITE_IMR(engine,
+		       ~(engine->irq_enable_mask |
+			 engine->irq_keep_mask));
 	gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
 }
 
@@ -1658,10 +1654,7 @@ gen6_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 
-	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-		I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
-	else
-		I915_WRITE_IMR(engine, ~0);
+	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
 	gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
 }
 
@@ -1688,12 +1681,9 @@ gen8_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 
-	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-		I915_WRITE_IMR(engine,
-			       ~(engine->irq_enable_mask |
-				 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
-	else
-		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
+	I915_WRITE_IMR(engine,
+		       ~(engine->irq_enable_mask |
+			 engine->irq_keep_mask));
 	POSTING_READ_FW(RING_IMR(engine->mmio_base));
 }
 
@@ -1702,11 +1692,7 @@ gen8_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 
-	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-		I915_WRITE_IMR(engine,
-			       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
-	else
-		I915_WRITE_IMR(engine, ~0);
+	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
 }
 
 static int
@@ -2621,6 +2607,9 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	engine->hw_id = 0;
 	engine->mmio_base = RENDER_RING_BASE;
 
+	if (HAS_L3_DPF(dev_priv))
+		engine->irq_keep_mask = GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
+
 	if (INTEL_GEN(dev_priv) >= 8) {
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			obj = i915_gem_object_create(dev, 4096);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 182cae767bf1..166f1a3829b0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -202,7 +202,8 @@ struct intel_engine_cs {
 	struct i915_ctx_workarounds wa_ctx;
 
 	bool		irq_posted;
-	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
+	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
+	u32		irq_enable_mask;/* bitmask to enable ring interrupt */
 	void		(*irq_enable)(struct intel_engine_cs *ring);
 	void		(*irq_disable)(struct intel_engine_cs *ring);
 
@@ -299,7 +300,6 @@ struct intel_engine_cs {
 	unsigned int idle_lite_restore_wa;
 	bool disable_lite_restore_wa;
 	u32 ctx_desc_template;
-	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct drm_i915_gem_request *request);
 	int		(*emit_flush)(struct drm_i915_gem_request *request,
 				      u32 invalidate_domains,
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 21/21] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (19 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 20/21] drm/i915: Simplify enabling user-interrupts with L3-remapping Chris Wilson
@ 2016-06-03 16:08 ` Chris Wilson
  2016-06-07 12:51   ` Tvrtko Ursulin
  2016-06-03 16:35 ` ✗ Ro.CI.BAT: failure for series starting with [01/21] drm/i915/shrinker: Flush active on objects before counting Patchwork
  21 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-03 16:08 UTC (permalink / raw)
  To: intel-gfx

Since the tests can and do explicitly check debugfs/i915_ring_missed_irqs
for the handling of a "missed interrupt", adding it to the dmesg at INFO
is just noise. When it happens for real, we still class it as an ERROR.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 5bdb433dde8c..f74f5727ea77 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3071,9 +3071,6 @@ static unsigned kick_waiters(struct intel_engine_cs *engine)
 		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
 			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
 				  engine->name);
-		else
-			DRM_INFO("Fake missed irq on %s\n",
-				 engine->name);
 
 		intel_engine_enable_fake_irq(engine);
 	}
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* ✗ Ro.CI.BAT: failure for series starting with [01/21] drm/i915/shrinker: Flush active on objects before counting
  2016-06-03 16:08 Breadcrumbs, again Chris Wilson
                   ` (20 preceding siblings ...)
  2016-06-03 16:08 ` [PATCH 21/21] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts Chris Wilson
@ 2016-06-03 16:35 ` Patchwork
  21 siblings, 0 replies; 60+ messages in thread
From: Patchwork @ 2016-06-03 16:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/21] drm/i915/shrinker: Flush active on objects before counting
URL   : https://patchwork.freedesktop.org/series/8246/
State : failure

== Summary ==

Applying: drm/i915/shrinker: Flush active on objects before counting
Applying: drm/i915: Delay queuing hangcheck to wait-request
Applying: drm/i915: Remove the dedicated hangcheck workqueue
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_drv.c
M	drivers/gpu/drm/i915/i915_drv.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/i915_drv.h
Auto-merging drivers/gpu/drm/i915/i915_drv.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/i915_drv.c
error: Failed to merge in the changes.
Patch failed at 0003 drm/i915: Remove the dedicated hangcheck workqueue
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 03/21] drm/i915: Remove the dedicated hangcheck workqueue
  2016-06-03 16:08 ` [PATCH 03/21] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
@ 2016-06-06 12:52   ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 12:52 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> The queue only ever contains at most one item and has no special flags.
> It is just a very simple wrapper around the system-wq - a complication
> with no benefits.
>
> v2: Use the system_long_wq as we may wish to capture the error state
> after detecting the hang - which may take a bit of time.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.c | 8 --------
>   drivers/gpu/drm/i915/i915_drv.h | 1 -
>   drivers/gpu/drm/i915/i915_irq.c | 7 ++++---
>   3 files changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 2f8b2545e3de..3c8c75c77574 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1143,15 +1143,8 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>   	if (dev_priv->hotplug.dp_wq == NULL)
>   		goto out_free_wq;
>
> -	dev_priv->gpu_error.hangcheck_wq =
> -		alloc_ordered_workqueue("i915-hangcheck", 0);
> -	if (dev_priv->gpu_error.hangcheck_wq == NULL)
> -		goto out_free_dp_wq;
> -
>   	return 0;
>
> -out_free_dp_wq:
> -	destroy_workqueue(dev_priv->hotplug.dp_wq);
>   out_free_wq:
>   	destroy_workqueue(dev_priv->wq);
>   out_err:
> @@ -1162,7 +1155,6 @@ out_err:
>
>   static void i915_workqueues_cleanup(struct drm_i915_private *dev_priv)
>   {
> -	destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
>   	destroy_workqueue(dev_priv->hotplug.dp_wq);
>   	destroy_workqueue(dev_priv->wq);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index be533de7383b..9471ebc99624 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1365,7 +1365,6 @@ struct i915_gpu_error {
>   	/* Hang gpu twice in this window and your context gets banned */
>   #define DRM_I915_CTX_BAN_PERIOD DIV_ROUND_UP(8*DRM_I915_HANGCHECK_PERIOD, 1000)
>
> -	struct workqueue_struct *hangcheck_wq;
>   	struct delayed_work hangcheck_work;
>
>   	/* For reset and error_state handling. */
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 1303d7c034d3..a09310701999 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3248,7 +3248,7 @@ out:
>
>   void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
>   {
> -	struct i915_gpu_error *e = &dev_priv->gpu_error;
> +	unsigned long delay;
>
>   	if (!i915.enable_hangcheck)
>   		return;
> @@ -3258,8 +3258,9 @@ void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
>   	 * we will ignore a hung ring if a second ring is kept busy.
>   	 */
>
> -	queue_delayed_work(e->hangcheck_wq, &e->hangcheck_work,
> -			   round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES));
> +	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
> +	queue_delayed_work(system_long_wq,
> +			   &dev_priv->gpu_error.hangcheck_work, delay);
>   }
>
>   static void ibx_irq_reset(struct drm_device *dev)
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 05/21] drm/i915: Separate GPU hang waitqueue from advance
  2016-06-03 16:08 ` [PATCH 05/21] drm/i915: Separate GPU hang waitqueue from advance Chris Wilson
@ 2016-06-06 13:00   ` Tvrtko Ursulin
  2016-06-07 12:11     ` Arun Siluvery
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 13:00 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> Currently __i915_wait_request uses a per-engine wait_queue_t for the dual
> purpose of waking after the GPU advances or for waking after an error.
> In the future, we may add even more wake sources and require greater
> separation, but for now we can conceptually simplify wakeups by separating
> the two sources. In particular, this allows us to use different wait-queues
> (e.g. one on the engine advancement, a global one for errors and one on
> each request) without any hassle.

+ Arun

I think this will conflict with the TDR work where one of the features 
is to make reset handling per engine. So I am not sure how beneficial in 
general, or painful for the TDR series, this patch might be.

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h |  6 ++++++
>   drivers/gpu/drm/i915/i915_gem.c |  5 +++++
>   drivers/gpu/drm/i915/i915_irq.c | 19 ++++---------------
>   3 files changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ceccc6d6b119..e399e97965e0 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1401,6 +1401,12 @@ struct i915_gpu_error {
>   #define I915_WEDGED			(1 << 31)
>
>   	/**
> +	 * Waitqueue to signal when a hang is detected. Used for waiters
> +	 * to release the struct_mutex for the reset to proceed.
> +	 */
> +	wait_queue_head_t wait_queue;
> +
> +	/**
>   	 * Waitqueue to signal when the reset has completed. Used by clients
>   	 * that wait for dev_priv->mm.wedged to settle.
>   	 */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 03256f096ab6..de4fb39312a4 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1234,6 +1234,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   	const bool irq_test_in_progress =
>   		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>   	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> +	DEFINE_WAIT(reset);
>   	DEFINE_WAIT(wait);
>   	unsigned long timeout_expire;
>   	s64 before = 0; /* Only to silence a compiler warning. */
> @@ -1278,6 +1279,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   		goto out;
>   	}
>
> +	add_wait_queue(&dev_priv->gpu_error.wait_queue, &reset);
>   	for (;;) {
>   		struct timer_list timer;
>
> @@ -1329,6 +1331,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   			destroy_timer_on_stack(&timer);
>   		}
>   	}
> +	remove_wait_queue(&dev_priv->gpu_error.wait_queue, &reset);
> +
>   	if (!irq_test_in_progress)
>   		engine->irq_put(engine);
>
> @@ -5026,6 +5030,7 @@ i915_gem_load_init(struct drm_device *dev)
>   			  i915_gem_retire_work_handler);
>   	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
>   			  i915_gem_idle_work_handler);
> +	init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
>   	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
>
>   	dev_priv->relative_constants_mode = I915_EXEC_CONSTANTS_REL_GENERAL;
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 83cab14639b2..30127b94f26e 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2488,11 +2488,8 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
>   	return ret;
>   }
>
> -static void i915_error_wake_up(struct drm_i915_private *dev_priv,
> -			       bool reset_completed)
> +static void i915_error_wake_up(struct drm_i915_private *dev_priv)
>   {
> -	struct intel_engine_cs *engine;
> -
>   	/*
>   	 * Notify all waiters for GPU completion events that reset state has
>   	 * been changed, and that they need to restart their wait after
> @@ -2501,18 +2498,10 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
>   	 */
>
>   	/* Wake up __wait_seqno, potentially holding dev->struct_mutex. */
> -	for_each_engine(engine, dev_priv)
> -		wake_up_all(&engine->irq_queue);
> +	wake_up_all(&dev_priv->gpu_error.wait_queue);
>
>   	/* Wake up intel_crtc_wait_for_pending_flips, holding crtc->mutex. */
>   	wake_up_all(&dev_priv->pending_flip_queue);
> -
> -	/*
> -	 * Signal tasks blocked in i915_gem_wait_for_error that the pending
> -	 * reset state is cleared.
> -	 */
> -	if (reset_completed)
> -		wake_up_all(&dev_priv->gpu_error.reset_queue);
>   }
>
>   /**
> @@ -2577,7 +2566,7 @@ static void i915_reset_and_wakeup(struct drm_i915_private *dev_priv)
>   		 * Note: The wake_up also serves as a memory barrier so that
>   		 * waiters see the update value of the reset counter atomic_t.
>   		 */
> -		i915_error_wake_up(dev_priv, true);
> +		wake_up_all(&dev_priv->gpu_error.reset_queue);
>   	}
>   }
>
> @@ -2713,7 +2702,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
>   		 * ensure that the waiters see the updated value of the reset
>   		 * counter atomic_t.
>   		 */
> -		i915_error_wake_up(dev_priv, false);
> +		i915_error_wake_up(dev_priv);
>   	}
>
>   	i915_reset_and_wakeup(dev_priv);
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 16/21] drm/i915: Only query timestamp when measuring elapsed time
  2016-06-03 16:08 ` [PATCH 16/21] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
@ 2016-06-06 13:50   ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 13:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> Avoid the two calls to ktime_get_raw_ns() (at best it reads the TSC) as
> we only need to compute the elapsed time for a timed wait.
>
> v2: Eliminate the unused local variable reducing the function size by 64
> bytes (using the storage space on the caller's stack rather than adding
> to our stack frame). Writing the code this way emits smaller and faster code
> for the normal case.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 14 +++++---------
>   1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 269d00a40483..fdbad07b5f42 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1212,7 +1212,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   	DEFINE_WAIT(reset);
>   	struct intel_wait wait;
>   	unsigned long timeout_remain;
> -	s64 before = 0; /* Only to silence a compiler warning. */
>   	int ret = 0;
>
>   	might_sleep();
> @@ -1231,12 +1230,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   		if (*timeout == 0)
>   			return -ETIME;
>
> +		/* Record current time in case interrupted, or wedged */
>   		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
> -
> -		/*
> -		 * Record current time in case interrupted by signal, or wedged.
> -		 */
> -		before = ktime_get_raw_ns();
> +		*timeout += ktime_get_raw_ns();
>   	}
>
>   	trace_i915_gem_request_wait_begin(req);
> @@ -1299,9 +1295,9 @@ complete:
>   	trace_i915_gem_request_wait_end(req);
>
>   	if (timeout) {
> -		s64 tres = *timeout - (ktime_get_raw_ns() - before);
> -
> -		*timeout = tres < 0 ? 0 : tres;
> +		*timeout -= ktime_get_raw_ns();
> +		if (*timeout < 0)
> +			*timeout = 0;
>
>   		/*
>   		 * Apparently ktime isn't accurate enough and occasionally has a
>

Still a no from me on this one.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 06/21] drm/i915: Slaughter the thundering i915_wait_request herd
  2016-06-03 16:08 ` [PATCH 06/21] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
@ 2016-06-06 13:58   ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 13:58 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Goel, Akash


On 03/06/16 17:08, Chris Wilson wrote:
> One particularly stressful scenario consists of many independent tasks
> all competing for GPU time and waiting upon the results (e.g. realtime
> transcoding of many, many streams). One bottleneck in particular is that
> each client waits on its own results, but every client is woken up after
> every batchbuffer - hence the thunder of hooves as then every client must
> do its heavyweight dance to read a coherent seqno to see if it is the
> lucky one.
>
> Ideally, we only want one client to wake up after the interrupt and
> check its request for completion. Since the requests must retire in
> order, we can select the first client on the oldest request to be woken.
> Once that client has completed his wait, we can then wake up the
> next client and so on. However, all clients then incur latency as every
> process in the chain may be delayed for scheduling - this may also then
> cause some priority inversion. To reduce the latency, when a client
> is added or removed from the list, we scan the tree for completed
> seqno and wake up all the completed waiters in parallel.
>
> Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
> benchmark measures the number of GPU cycles between completion of a
> batch and the client waking up from a call to wait-ioctl. With many
> concurrent waiters, with each on a different request, we observe that
> the wakeup latency before the patch scales nearly linearly with the
> number of waiters (before external factors kick in making the scaling much
> worse). After applying the patch, we can see that only the single waiter
> for the request is being woken up, providing a constant wakeup latency
> for every operation. However, the situation is not quite as rosy for
> many waiters on the same request, though to the best of my knowledge this
> is much less likely in practice. Here, we can observe that the
> concurrent waiters incur extra latency from being woken up by the
> solitary bottom-half, rather than directly by the interrupt. This
> appears to be scheduler induced (having discounted adverse effects from
> having a rbtree walk/erase in the wakeup path), each additional
> wake_up_process() costs approximately 1us on big core. Another effect of
> performing the secondary wakeups from the first bottom-half is the
> incurred delay this imposes on high priority threads - rather than
> immediately returning to userspace and leaving the interrupt handler to
> wake the others.
>
> To offset the delay incurred with additional waiters on a request, we
> could use a hybrid scheme that did a quick read in the interrupt handler
> and dequeued all the completed waiters (incurring the overhead in the
> interrupt handler, not the best plan either as we then incur GPU
> submission latency) but we would still have to wake up the bottom-half
> every time to do the heavyweight slow read. Or we could only kick the
> waiters on the seqno with the same priority as the current task (i.e. in
> the realtime waiter scenario, only it is woken up immediately by the
> interrupt and simply queues the next waiter before returning to userspace,
> minimising its delay at the expense of the chain, and also reducing
> contention on its scheduler runqueue). This is effective at avoiding long
> pauses in the interrupt handler and at avoiding the extra latency in
> realtime/high-priority waiters.
>
> v2: Convert from a kworker per engine into a dedicated kthread for the
> bottom-half.
> v3: Rename request members and tweak comments.
> v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
> v5: Fix race in locklessly checking waiter status and kicking the task on
> adding a new waiter.
> v6: Fix deciding when to force the timer to hide missing interrupts.
> v7: Move the bottom-half from the kthread to the first client process.
> v8: Reword a few comments
> v9: Break the busy loop when the interrupt is unmasked or has fired.
> v10: Comments, unnecessary churn, better debugging from Tvrtko
> v11: Wake all completed waiters on removing the current bottom-half to
> reduce the latency of waking up a herd of clients all waiting on the
> same request.
> v12: Rearrange missed-interrupt fault injection so that it works with
> igt/drv_missed_irq_hang
> v13: Rename intel_breadcrumb and friends to intel_wait in preparation
> for signal handling.
> v14: RCU commentary, assert_spin_locked
> v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
> v16: Sort seqno-groups by priority so that first-waiter has the highest
> task priority (and so avoid priority inversion).
> v17: Add waiters to post-mortem GPU hang state.
>
> Testcase: igt/gem_concurrent_blit
> Testcase: igt/benchmarks/gem_latency
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Dave Gordon <david.s.gordon@intel.com>
> Cc: "Goel, Akash" <akash.goel@intel.com>
> ---
>   drivers/gpu/drm/i915/Makefile            |   1 +
>   drivers/gpu/drm/i915/i915_debugfs.c      |  15 +-
>   drivers/gpu/drm/i915/i915_drv.h          |  39 +++-
>   drivers/gpu/drm/i915/i915_gem.c          | 141 +++++-------
>   drivers/gpu/drm/i915/i915_gpu_error.c    |  59 +++++-
>   drivers/gpu/drm/i915/i915_irq.c          |  20 +-
>   drivers/gpu/drm/i915/intel_breadcrumbs.c | 354 +++++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.c         |   4 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c  |   3 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.h  |  68 +++++-
>   10 files changed, 595 insertions(+), 109 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/intel_breadcrumbs.c
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 7aecd309604c..f20007440821 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -38,6 +38,7 @@ i915-y += i915_cmd_parser.o \
>   	  i915_gem_userptr.o \
>   	  i915_gpu_error.o \
>   	  i915_trace_points.o \
> +	  intel_breadcrumbs.o \
>   	  intel_lrc.o \
>   	  intel_mocs.o \
>   	  intel_ringbuffer.o \
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 3a0babe32621..48683538b4e2 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -788,10 +788,21 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>   static void i915_ring_seqno_info(struct seq_file *m,
>   				 struct intel_engine_cs *engine)
>   {
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct rb_node *rb;
> +
>   	seq_printf(m, "Current sequence (%s): %x\n",
>   		   engine->name, engine->get_seqno(engine));
>   	seq_printf(m, "Current user interrupts (%s): %x\n",
>   		   engine->name, READ_ONCE(engine->user_interrupts));
> +
> +	spin_lock(&b->lock);
> +	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb)) {
> +		struct intel_wait *w = container_of(rb, typeof(*w), node);
> +		seq_printf(m, "Waiting (%s): %s [%d] on %x\n",
> +			   engine->name, w->task->comm, w->task->pid, w->seqno);
> +	}
> +	spin_unlock(&b->lock);
>   }
>
>   static int i915_gem_seqno_info(struct seq_file *m, void *data)
> @@ -1426,6 +1437,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>   			   engine->hangcheck.seqno,
>   			   seqno[id],
>   			   engine->last_submitted_seqno);
> +		seq_printf(m, "\twaiters? %d\n",
> +			   intel_engine_has_waiter(engine));
>   		seq_printf(m, "\tuser interrupts = %x [current %x]\n",
>   			   engine->hangcheck.user_interrupts,
>   			   READ_ONCE(engine->user_interrupts));
> @@ -2411,7 +2424,7 @@ static int count_irq_waiters(struct drm_i915_private *i915)
>   	int count = 0;
>
>   	for_each_engine(engine, i915)
> -		count += engine->irq_refcount;
> +		count += intel_engine_has_waiter(engine);
>
>   	return count;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e399e97965e0..68b383d98457 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -503,6 +503,7 @@ struct drm_i915_error_state {
>   		bool valid;
>   		/* Software tracked state */
>   		bool waiting;
> +		int num_waiters;
>   		int hangcheck_score;
>   		enum intel_ring_hangcheck_action hangcheck_action;
>   		int num_requests;
> @@ -548,6 +549,12 @@ struct drm_i915_error_state {
>   			u32 tail;
>   		} *requests;
>
> +		struct drm_i915_error_waiter {
> +			char comm[TASK_COMM_LEN];
> +			pid_t pid;
> +			u32 seqno;
> +		} *waiters;
> +
>   		struct {
>   			u32 gfx_mode;
>   			union {
> @@ -1420,7 +1427,7 @@ struct i915_gpu_error {
>   #define I915_STOP_RING_ALLOW_WARN      (1 << 30)
>
>   	/* For missed irq/seqno simulation. */
> -	unsigned int test_irq_rings;
> +	unsigned long test_irq_rings;
>   };
>
>   enum modeset_restore {
> @@ -3013,7 +3020,6 @@ ibx_disable_display_interrupt(struct drm_i915_private *dev_priv, uint32_t bits)
>   	ibx_display_interrupt_update(dev_priv, bits, 0);
>   }
>
> -
>   /* i915_gem.c */
>   int i915_gem_create_ioctl(struct drm_device *dev, void *data,
>   			  struct drm_file *file_priv);
> @@ -3905,4 +3911,33 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *engine,
>   		i915_gem_request_assign(&engine->trace_irq_req, req);
>   }
>
> +static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
> +{
> +	/* Ensure our read of the seqno is coherent so that we
> +	 * do not "miss an interrupt" (i.e. if this is the last
> +	 * request and the seqno write from the GPU is not visible
> +	 * by the time the interrupt fires, we will see that the
> +	 * request is incomplete and go back to sleep awaiting
> +	 * another interrupt that will never come.)
> +	 *
> +	 * Strictly, we only need to do this once after an interrupt,
> +	 * but it is easier and safer to do it every time the waiter
> +	 * is woken.
> +	 */
> +	if (i915_gem_request_completed(req, false))
> +		return true;
> +
> +	/* We need to check whether any gpu reset happened in between
> +	 * the request being submitted and now. If a reset has occurred,
> +	 * the request is effectively complete (we either are in the
> +	 * process of or have discarded the rendering and completely
> +	 * reset the GPU. The results of the request are lost and we
> +	 * are free to continue on with the original operation.
> +	 */
> +	if (req->reset_counter != i915_reset_counter(&req->i915->gpu_error))
> +		return true;
> +
> +	return false;
> +}
> +
>   #endif
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index de4fb39312a4..d08edb3d16f1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1123,17 +1123,6 @@ i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
>   	return 0;
>   }
>
> -static void fake_irq(unsigned long data)
> -{
> -	wake_up_process((struct task_struct *)data);
> -}
> -
> -static bool missed_irq(struct drm_i915_private *dev_priv,
> -		       struct intel_engine_cs *engine)
> -{
> -	return test_bit(engine->id, &dev_priv->gpu_error.missed_irq_rings);
> -}
> -
>   static unsigned long local_clock_us(unsigned *cpu)
>   {
>   	unsigned long t;
> @@ -1166,7 +1155,7 @@ static bool busywait_stop(unsigned long timeout, unsigned cpu)
>   	return this_cpu != cpu;
>   }
>
> -static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> +static bool __i915_spin_request(struct drm_i915_gem_request *req, int state)
>   {
>   	unsigned long timeout;
>   	unsigned cpu;
> @@ -1181,17 +1170,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>   	 * takes to sleep on a request, on the order of a microsecond.
>   	 */
>
> -	if (req->engine->irq_refcount)
> -		return -EBUSY;
> -
>   	/* Only spin if we know the GPU is processing this request */
>   	if (!i915_gem_request_started(req, true))
> -		return -EAGAIN;
> +		return false;
>
>   	timeout = local_clock_us(&cpu) + 5;
> -	while (!need_resched()) {
> +	do {
>   		if (i915_gem_request_completed(req, true))
> -			return 0;
> +			return true;
>
>   		if (signal_pending_state(state, current))
>   			break;
> @@ -1200,12 +1186,9 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>   			break;
>
>   		cpu_relax_lowlatency();
> -	}
> -
> -	if (i915_gem_request_completed(req, false))
> -		return 0;
> +	} while (!need_resched());
>
> -	return -EAGAIN;
> +	return false;
>   }
>
>   /**
> @@ -1229,18 +1212,14 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   			s64 *timeout,
>   			struct intel_rps_client *rps)
>   {
> -	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
> -	struct drm_i915_private *dev_priv = req->i915;
> -	const bool irq_test_in_progress =
> -		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>   	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>   	DEFINE_WAIT(reset);
> -	DEFINE_WAIT(wait);
> -	unsigned long timeout_expire;
> +	struct intel_wait wait;
> +	unsigned long timeout_remain;
>   	s64 before = 0; /* Only to silence a compiler warning. */
> -	int ret;
> +	int ret = 0;
>
> -	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
> +	might_sleep();
>
>   	if (list_empty(&req->list))
>   		return 0;
> @@ -1248,7 +1227,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   	if (i915_gem_request_completed(req, true))
>   		return 0;
>
> -	timeout_expire = 0;
> +	timeout_remain = MAX_SCHEDULE_TIMEOUT;
>   	if (timeout) {
>   		if (WARN_ON(*timeout < 0))
>   			return -EINVAL;
> @@ -1256,7 +1235,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   		if (*timeout == 0)
>   			return -ETIME;
>
> -		timeout_expire = jiffies + nsecs_to_jiffies_timeout(*timeout);
> +		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
>
>   		/*
>   		 * Record current time in case interrupted by signal, or wedged.
> @@ -1264,81 +1243,59 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   		before = ktime_get_raw_ns();
>   	}
>
> -	if (INTEL_INFO(dev_priv)->gen >= 6)
> -		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
> -
>   	trace_i915_gem_request_wait_begin(req);
>
> -	/* Optimistic spin for the next jiffie before touching IRQs */
> -	ret = __i915_spin_request(req, state);
> -	if (ret == 0)
> -		goto out;
> -
> -	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> -		ret = -ENODEV;
> -		goto out;
> -	}
> -
> -	add_wait_queue(&dev_priv->gpu_error.wait_queue, &reset);
> -	for (;;) {
> -		struct timer_list timer;
> +	if (INTEL_INFO(req->i915)->gen >= 6)
> +		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
>
> -		prepare_to_wait(&engine->irq_queue, &wait, state);
> +	/* Optimistic spin for the next ~jiffie before touching IRQs */
> +	if (__i915_spin_request(req, state))
> +		goto complete;
>
> -		/* We need to check whether any gpu reset happened in between
> -		 * the request being submitted and now. If a reset has occurred,
> -		 * the request is effectively complete (we either are in the
> -		 * process of or have discarded the rendering and completely
> -		 * reset the GPU. The results of the request are lost and we
> -		 * are free to continue on with the original operation.
> +	intel_wait_init(&wait, req->seqno);
> +	set_current_state(state);
> +	if (intel_engine_add_wait(req->engine, &wait))
> +		/* In order to check that we haven't missed the interrupt
> +		 * as we enabled it, we need to kick ourselves to do a
> +		 * coherent check on the seqno before we sleep.
>   		 */
> -		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
> -			ret = 0;
> -			break;
> -		}
> -
> -		if (i915_gem_request_completed(req, false)) {
> -			ret = 0;
> -			break;
> -		}
> +		goto wakeup;
>
> +	add_wait_queue(&req->i915->gpu_error.wait_queue, &reset);
> +	for (;;) {
>   		if (signal_pending_state(state, current)) {
>   			ret = -ERESTARTSYS;
>   			break;
>   		}
>
> -		if (timeout && time_after_eq(jiffies, timeout_expire)) {
> +		/* Ensure that even if the GPU hangs, we get woken up. */
> +		i915_queue_hangcheck(req->i915);
> +
> +		timeout_remain = io_schedule_timeout(timeout_remain);
> +		if (timeout_remain == 0) {
>   			ret = -ETIME;
>   			break;
>   		}
>
> -		/* Ensure that even if the GPU hangs, we get woken up. */
> -		i915_queue_hangcheck(dev_priv);
> -
> -		timer.function = NULL;
> -		if (timeout || missed_irq(dev_priv, engine)) {
> -			unsigned long expire;
> -
> -			setup_timer_on_stack(&timer, fake_irq, (unsigned long)current);
> -			expire = missed_irq(dev_priv, engine) ? jiffies + 1 : timeout_expire;
> -			mod_timer(&timer, expire);
> -		}
> +		if (intel_wait_complete(&wait))
> +			break;
>
> -		io_schedule();
> +wakeup:
> +		set_current_state(state);
>
> -		if (timer.function) {
> -			del_singleshot_timer_sync(&timer);
> -			destroy_timer_on_stack(&timer);
> -		}
> +		/* Carefully check if the request is complete, giving time
> +		 * for the seqno to be visible following the interrupt.
> +		 * We also have to check in case we are kicked by the GPU
> +		 * reset in order to drop the struct_mutex.
> +		 */
> +		if (__i915_request_irq_complete(req))
> +			break;
>   	}
> -	remove_wait_queue(&dev_priv->gpu_error.wait_queue, &reset);
> +	remove_wait_queue(&req->i915->gpu_error.wait_queue, &reset);
>
> -	if (!irq_test_in_progress)
> -		engine->irq_put(engine);
> -
> -	finish_wait(&engine->irq_queue, &wait);
> -
> -out:
> +	intel_engine_remove_wait(req->engine, &wait);
> +	__set_current_state(TASK_RUNNING);
> +complete:
>   	trace_i915_gem_request_wait_end(req);
>
>   	if (timeout) {
> @@ -2545,6 +2502,12 @@ i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
>   	}
>   	i915_gem_retire_requests(dev_priv);
>
> +	/* If the seqno wraps around, we need to clear the breadcrumb rbtree */
> +	if (!i915_seqno_passed(seqno, dev_priv->next_seqno)) {
> +		while (intel_kick_waiters(dev_priv))
> +			yield();
> +	}
> +
>   	/* Finally reset hw state */
>   	for_each_engine(engine, dev_priv)
>   		intel_ring_init_seqno(engine, seqno);
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 34ff2459ceea..89241ffcc676 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -463,6 +463,18 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>   			}
>   		}
>
> +		if (error->ring[i].num_waiters) {
> +			err_printf(m, "%s --- %d waiters\n",
> +				   dev_priv->engine[i].name,
> +				   error->ring[i].num_waiters);
> +			for (j = 0; j < error->ring[i].num_waiters; j++) {
> +				err_printf(m, " seqno 0x%08x for %s [%d]\n",
> +					   error->ring[i].waiters[j].seqno,
> +					   error->ring[i].waiters[j].comm,
> +					   error->ring[i].waiters[j].pid);
> +			}
> +		}
> +
>   		if ((obj = error->ring[i].ringbuffer)) {
>   			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
>   				   dev_priv->engine[i].name,
> @@ -605,8 +617,9 @@ static void i915_error_state_free(struct kref *error_ref)
>   		i915_error_object_free(error->ring[i].ringbuffer);
>   		i915_error_object_free(error->ring[i].hws_page);
>   		i915_error_object_free(error->ring[i].ctx);
> -		kfree(error->ring[i].requests);
>   		i915_error_object_free(error->ring[i].wa_ctx);
> +		kfree(error->ring[i].requests);
> +		kfree(error->ring[i].waiters);
>   	}
>
>   	i915_error_object_free(error->semaphore_obj);
> @@ -892,6 +905,47 @@ static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
>   	}
>   }
>
> +static void engine_record_waiters(struct intel_engine_cs *engine,
> +				  struct drm_i915_error_ring *ering)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct drm_i915_error_waiter *waiter;
> +	struct rb_node *rb;
> +	int count;
> +
> +	ering->num_waiters = 0;
> +	ering->waiters = NULL;
> +
> +	spin_lock(&b->lock);
> +	count = 0;
> +	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
> +		count++;
> +	spin_unlock(&b->lock);
> +
> +	waiter = NULL;
> +	if (count)
> +		waiter = kmalloc(count*sizeof(struct drm_i915_error_waiter),
> +				 GFP_ATOMIC);
> +	if (!waiter)
> +		return;
> +
> +	ering->waiters = waiter;
> +
> +	spin_lock(&b->lock);
> +	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb)) {
> +		struct intel_wait *w = container_of(rb, typeof(*w), node);
> +
> +		strcpy(waiter->comm, w->task->comm);
> +		waiter->pid = w->task->pid;
> +		waiter->seqno = w->seqno;
> +		waiter++;
> +
> +		if (++ering->num_waiters == count)
> +			break;
> +	}
> +	spin_unlock(&b->lock);
> +}
> +
>   static void i915_record_ring_state(struct drm_i915_private *dev_priv,
>   				   struct drm_i915_error_state *error,
>   				   struct intel_engine_cs *engine,
> @@ -926,7 +980,7 @@ static void i915_record_ring_state(struct drm_i915_private *dev_priv,
>   		ering->instdone = I915_READ(GEN2_INSTDONE);
>   	}
>
> -	ering->waiting = waitqueue_active(&engine->irq_queue);
> +	ering->waiting = intel_engine_has_waiter(engine);
>   	ering->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
>   	ering->acthd = intel_ring_get_active_head(engine);
>   	ering->seqno = engine->get_seqno(engine);
> @@ -1032,6 +1086,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
>   		error->ring[i].valid = true;
>
>   		i915_record_ring_state(dev_priv, error, engine, &error->ring[i]);
> +		engine_record_waiters(engine, &error->ring[i]);
>
>   		request = i915_gem_find_active_request(engine);
>   		if (request) {
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 30127b94f26e..2a736f4a0fe5 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -976,13 +976,10 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
>
>   static void notify_ring(struct intel_engine_cs *engine)
>   {
> -	if (!intel_engine_initialized(engine))
> -		return;
> -
> -	trace_i915_gem_request_notify(engine);
> -	engine->user_interrupts++;
> -
> -	wake_up_all(&engine->irq_queue);
> +	if (intel_engine_wakeup(engine)) {
> +		trace_i915_gem_request_notify(engine);
> +		engine->user_interrupts++;
> +	}
>   }
>
>   static void vlv_c0_read(struct drm_i915_private *dev_priv,
> @@ -1063,7 +1060,7 @@ static bool any_waiters(struct drm_i915_private *dev_priv)
>   	struct intel_engine_cs *engine;
>
>   	for_each_engine(engine, dev_priv)
> -		if (engine->irq_refcount)
> +		if (intel_engine_has_waiter(engine))
>   			return true;
>
>   	return false;
> @@ -3073,13 +3070,14 @@ static unsigned kick_waiters(struct intel_engine_cs *engine)
>
>   	if (engine->hangcheck.user_interrupts == user_interrupts &&
>   	    !test_and_set_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
> -		if (!(i915->gpu_error.test_irq_rings & intel_engine_flag(engine)))
> +		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
>   			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
>   				  engine->name);
>   		else
>   			DRM_INFO("Fake missed irq on %s\n",
>   				 engine->name);
> -		wake_up_all(&engine->irq_queue);
> +
> +		intel_engine_enable_fake_irq(engine);
>   	}
>
>   	return user_interrupts;
> @@ -3123,7 +3121,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
>
>   	for_each_engine_id(engine, dev_priv, id) {
> -		bool busy = waitqueue_active(&engine->irq_queue);
> +		bool busy = intel_engine_has_waiter(engine);
>   		u64 acthd;
>   		u32 seqno;
>   		unsigned user_interrupts;
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> new file mode 100644
> index 000000000000..e0121f727938
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -0,0 +1,354 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "i915_drv.h"
> +
> +static void intel_breadcrumbs_fake_irq(unsigned long data)
> +{
> +	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> +
> +	/*
> +	 * The timer persists in case we cannot enable interrupts,
> +	 * or if we have previously seen seqno/interrupt incoherency
> +	 * ("missed interrupt" syndrome). Here the worker will wake up
> +	 * every jiffie in order to kick the oldest waiter to do the
> +	 * coherent seqno check.
> +	 */
> +	rcu_read_lock();
> +	if (intel_engine_wakeup(engine))
> +		mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
> +	rcu_read_unlock();
> +}
> +
> +static void irq_enable(struct intel_engine_cs *engine)
> +{
> +	WARN_ON(!engine->irq_get(engine));
> +}
> +
> +static void irq_disable(struct intel_engine_cs *engine)
> +{
> +	engine->irq_put(engine);
> +}
> +
> +static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
> +{
> +	struct intel_engine_cs *engine =
> +		container_of(b, struct intel_engine_cs, breadcrumbs);
> +	struct drm_i915_private *i915 = engine->i915;
> +	bool irq_posted = false;
> +
> +	assert_spin_locked(&b->lock);
> +	if (b->rpm_wakelock)
> +		return false;
> +
> +	/* Since we are waiting on a request, the GPU should be busy
> +	 * and should have its own rpm reference. For completeness,
> +	 * record an rpm reference for ourselves to cover the
> +	 * interrupt we unmask.
> +	 */
> +	intel_runtime_pm_get_noresume(i915);
> +	b->rpm_wakelock = true;
> +
> +	/* No interrupts? Kick the waiter every jiffie! */
> +	if (intel_irqs_enabled(i915)) {
> +		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings)) {
> +			irq_enable(engine);
> +			irq_posted = true;
> +		}
> +		b->irq_enabled = true;
> +	}
> +
> +	if (!b->irq_enabled ||
> +	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings))
> +		mod_timer(&b->fake_irq, jiffies + 1);
> +
> +	return irq_posted;
> +}
> +
> +static void __intel_breadcrumbs_disable_irq(struct intel_breadcrumbs *b)
> +{
> +	struct intel_engine_cs *engine =
> +		container_of(b, struct intel_engine_cs, breadcrumbs);
> +
> +	assert_spin_locked(&b->lock);
> +	if (!b->rpm_wakelock)
> +		return;
> +
> +	if (b->irq_enabled) {
> +		irq_disable(engine);
> +		b->irq_enabled = false;
> +	}
> +
> +	intel_runtime_pm_put(engine->i915);
> +	b->rpm_wakelock = false;
> +}
> +
> +static inline struct intel_wait *to_wait(struct rb_node *node)
> +{
> +	return container_of(node, struct intel_wait, node);
> +}
> +
> +static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
> +					      struct intel_wait *wait)
> +{
> +	assert_spin_locked(&b->lock);
> +
> +	/* This request is completed, so remove it from the tree, mark it as
> +	 * complete, and *then* wake up the associated task.
> +	 */
> +	rb_erase(&wait->node, &b->waiters);
> +	RB_CLEAR_NODE(&wait->node);
> +
> +	wake_up_process(wait->task); /* implicit smp_wmb() */
> +}
> +
> +bool intel_engine_add_wait(struct intel_engine_cs *engine,
> +			   struct intel_wait *wait)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct rb_node **p, *parent, *completed;
> +	bool first;
> +	u32 seqno;
> +
> +	spin_lock(&b->lock);
> +
> +	/* Insert the request into the retirement ordered list
> +	 * of waiters by walking the rbtree. If we are the oldest
> +	 * seqno in the tree (the first to be retired), then
> +	 * set ourselves as the bottom-half.
> +	 *
> +	 * As we descend the tree, prune completed branches: since we hold the
> +	 * spinlock, we know that the first_waiter must be delayed and we can
> +	 * reduce some of the sequential wake up latency if we take action
> +	 * ourselves and wake up the completed tasks in parallel. Also, by
> +	 * removing stale elements in the tree, we may be able to reduce the
> +	 * ping-pong between the old bottom-half and ourselves as first-waiter.
> +	 */
> +	first = true;
> +	parent = NULL;
> +	completed = NULL;
> +	seqno = engine->get_seqno(engine);
> +
> +	p = &b->waiters.rb_node;
> +	while (*p) {
> +		parent = *p;
> +		if (wait->seqno == to_wait(parent)->seqno) {
> +			/* We have multiple waiters on the same seqno, select
> +			 * the highest priority task (that with the smallest
> +			 * task->prio) to serve as the bottom-half for this
> +			 * group.
> +			 */
> +			if (wait->task->prio > to_wait(parent)->task->prio) {
> +				p = &parent->rb_right;
> +				first = false;
> +			} else
> +				p = &parent->rb_left;
> +		} else if (i915_seqno_passed(wait->seqno,
> +					     to_wait(parent)->seqno)) {
> +			p = &parent->rb_right;
> +			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
> +				completed = parent;
> +			else
> +				first = false;
> +		} else
> +			p = &parent->rb_left;
> +	}
> +	rb_link_node(&wait->node, parent, p);
> +	rb_insert_color(&wait->node, &b->waiters);
> +	GEM_BUG_ON(!first && !b->tasklet);
> +
> +	if (completed) {
> +		struct rb_node *next = rb_next(completed);
> +
> +		GEM_BUG_ON(!next && !first);
> +		if (next && next != &wait->node) {
> +			GEM_BUG_ON(first);
> +			b->first_wait = to_wait(next);
> +			smp_store_mb(b->tasklet, b->first_wait->task);
> +			/* As there is a delay between reading the current
> +			 * seqno, processing the completed tasks and selecting
> +			 * the next waiter, we may have missed the interrupt
> +			 * and so need for the next bottom-half to wakeup.
> +			 *
> +			 * Also as we enable the IRQ, we may miss the
> +			 * interrupt for that seqno, so we have to wake up
> +			 * the next bottom-half in order to do a coherent check
> +			 * in case the seqno passed.
> +			 */
> +			__intel_breadcrumbs_enable_irq(b);
> +			wake_up_process(to_wait(next)->task);
> +		}
> +
> +		do {
> +			struct intel_wait *crumb = to_wait(completed);
> +			completed = rb_prev(completed);
> +			__intel_breadcrumbs_finish(b, crumb);
> +		} while (completed);
> +	}
> +
> +	if (first) {
> +		GEM_BUG_ON(rb_first(&b->waiters) != &wait->node);
> +		b->first_wait = wait;
> +		smp_store_mb(b->tasklet, wait->task);
> +		first = __intel_breadcrumbs_enable_irq(b);
> +	}
> +	GEM_BUG_ON(!b->tasklet);
> +	GEM_BUG_ON(!b->first_wait);
> +	GEM_BUG_ON(rb_first(&b->waiters) != &b->first_wait->node);
> +
> +	spin_unlock(&b->lock);
> +
> +	return first;
> +}
> +
> +void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
> +{
> +	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
> +}
> +
> +static inline bool chain_wakeup(struct rb_node *rb, int priority)
> +{
> +	return rb && to_wait(rb)->task->prio <= priority;
> +}
> +
> +void intel_engine_remove_wait(struct intel_engine_cs *engine,
> +			      struct intel_wait *wait)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +
> +	/* Quick check to see if this waiter was already decoupled from
> +	 * the tree by the bottom-half to avoid contention on the spinlock
> +	 * by the herd.
> +	 */
> +	if (RB_EMPTY_NODE(&wait->node))
> +		return;
> +
> +	spin_lock(&b->lock);
> +
> +	if (RB_EMPTY_NODE(&wait->node))
> +		goto out_unlock;
> +
> +	if (b->first_wait == wait) {
> +		struct rb_node *next;
> +		const int priority = wait->task->prio;
> +
> +		GEM_BUG_ON(b->tasklet != wait->task);
> +
> +		/* We are the current bottom-half. Find the next candidate,
> +		 * the first waiter in the queue on the remaining oldest
> +		 * request. As multiple seqnos may complete in the time it
> +		 * takes us to wake up and find the next waiter, we have to
> +		 * wake up that waiter for it to perform its own coherent
> +		 * completion check.
> +		 */
> +		next = rb_next(&wait->node);
> +		if (chain_wakeup(next, priority)) {
> +			/* If the next waiter is already complete,
> +			 * wake it up and continue on to the next waiter. So
> +			 * if we have a small herd, they will wake up in parallel
> +			 * rather than sequentially, which should reduce
> +			 * the overall latency in waking all the completed
> +			 * clients.
> +			 *
> +			 * However, waking up a chain adds extra latency to
> +			 * the first_waiter. This is undesirable if that
> +			 * waiter is a high priority task.
> +			 */
> +			u32 seqno = engine->get_seqno(engine);
> +			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
> +				struct rb_node *n = rb_next(next);
> +				__intel_breadcrumbs_finish(b, to_wait(next));
> +				next = n;
> +				if (!chain_wakeup(next, priority))
> +					break;
> +			}
> +		}
> +
> +		if (next) {
> +			/* In our haste, we may have completed the first waiter
> +			 * before we enabled the interrupt. Do so now as we
> +			 * have a second waiter for a future seqno. Afterwards,
> +			 * we have to wake up that waiter in case we missed
> +			 * the interrupt, or if we have to handle an
> +			 * exception rather than a seqno completion.
> +			 */
> +			b->first_wait = to_wait(next);
> +			smp_store_mb(b->tasklet, b->first_wait->task);
> +			if (b->first_wait->seqno != wait->seqno)
> +				__intel_breadcrumbs_enable_irq(b);
> +			wake_up_process(b->tasklet);
> +		} else {
> +			b->first_wait = NULL;
> +			WRITE_ONCE(b->tasklet, NULL);
> +			__intel_breadcrumbs_disable_irq(b);
> +		}
> +	} else {
> +		GEM_BUG_ON(rb_first(&b->waiters) == &wait->node);
> +	}
> +
> +	GEM_BUG_ON(RB_EMPTY_NODE(&wait->node));
> +	rb_erase(&wait->node, &b->waiters);
> +
> +out_unlock:
> +	GEM_BUG_ON(b->first_wait == wait);
> +	GEM_BUG_ON(rb_first(&b->waiters) != (b->first_wait ? &b->first_wait->node : NULL));
> +	GEM_BUG_ON(!b->tasklet ^ RB_EMPTY_ROOT(&b->waiters));
> +	spin_unlock(&b->lock);
> +}
> +
> +void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +
> +	spin_lock_init(&b->lock);
> +	setup_timer(&b->fake_irq,
> +		    intel_breadcrumbs_fake_irq,
> +		    (unsigned long)engine);
> +}
> +
> +void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +
> +	del_timer_sync(&b->fake_irq);
> +}
> +
> +unsigned intel_kick_waiters(struct drm_i915_private *i915)
> +{
> +	struct intel_engine_cs *engine;
> +	unsigned mask = 0;
> +
> +	/* To avoid the task_struct disappearing beneath us as we wake up
> +	 * the process, we must first inspect the task_struct->state under the
> +	 * RCU lock, i.e. as we call wake_up_process() we must be holding the
> +	 * rcu_read_lock().
> +	 */
> +	rcu_read_lock();
> +	for_each_engine(engine, i915)
> +		if (unlikely(intel_engine_wakeup(engine)))
> +			mask |= intel_engine_flag(engine);
> +	rcu_read_unlock();
> +
> +	return mask;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5c191a1afaaf..270409e9ac7a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1890,6 +1890,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
>   	i915_cmd_parser_fini_ring(engine);
>   	i915_gem_batch_pool_fini(&engine->batch_pool);
>
> +	intel_engine_fini_breadcrumbs(engine);
> +
>   	if (engine->status_page.obj) {
>   		i915_gem_object_unpin_map(engine->status_page.obj);
>   		engine->status_page.obj = NULL;
> @@ -1927,7 +1929,7 @@ logical_ring_default_irqs(struct intel_engine_cs *engine, unsigned shift)
>   {
>   	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << shift;
>   	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
> -	init_waitqueue_head(&engine->irq_queue);
> +	intel_engine_init_breadcrumbs(engine);
>   }
>
>   static int
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 1a389d0dcdd2..95f04345d3ec 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2309,7 +2309,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>   	memset(engine->semaphore.sync_seqno, 0,
>   	       sizeof(engine->semaphore.sync_seqno));
>
> -	init_waitqueue_head(&engine->irq_queue);
> +	intel_engine_init_breadcrumbs(engine);
>
>   	/* We may need to do things with the shrinker which
>   	 * require us to immediately switch back to the default
> @@ -2389,6 +2389,7 @@ void intel_cleanup_engine(struct intel_engine_cs *engine)
>
>   	i915_cmd_parser_fini_ring(engine);
>   	i915_gem_batch_pool_fini(&engine->batch_pool);
> +	intel_engine_fini_breadcrumbs(engine);
>
>   	intel_ring_context_unpin(dev_priv->kernel_context, engine);
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index b33c876fed20..061088360b80 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -160,6 +160,32 @@ struct intel_engine_cs {
>   	struct intel_ringbuffer *buffer;
>   	struct list_head buffers;
>
> +	/* Rather than have every client wait upon all user interrupts,
> +	 * with the herd waking after every interrupt and each doing the
> +	 * heavyweight seqno dance, we delegate the task (of being the
> +	 * bottom-half of the user interrupt) to the first client. After
> +	 * every interrupt, we wake up one client, who does the heavyweight
> +	 * coherent seqno read and either goes back to sleep (if incomplete),
> +	 * or wakes up all the completed clients in parallel, before then
> +	 * transferring the bottom-half status to the next client in the queue.
> +	 *
> +	 * Compared to walking the entire list of waiters in a single dedicated
> +	 * bottom-half, we reduce the latency of the first waiter by avoiding
> +	 * a context switch, but incur additional coherent seqno reads when
> +	 * following the chain of request breadcrumbs. Since it is most likely
> +	 * that we have a single client waiting on each seqno, reducing
> +	 * the overhead of waking that client is much preferred.
> +	 */
> +	struct intel_breadcrumbs {
> +		spinlock_t lock; /* protects the lists of requests */
> +		struct rb_root waiters; /* sorted by retirement, priority */
> +		struct intel_wait *first_wait; /* oldest waiter by retirement */
> +		struct task_struct *tasklet; /* bh for user interrupts */
> +		struct timer_list fake_irq; /* used after a missed interrupt */
> +		bool irq_enabled;
> +		bool rpm_wakelock;
> +	} breadcrumbs;
> +
>   	/*
>   	 * A pool of objects to use as shadow copies of client batch buffers
>   	 * when the command parser is enabled. Prevents the client from
> @@ -308,8 +334,6 @@ struct intel_engine_cs {
>
>   	bool gpu_caches_dirty;
>
> -	wait_queue_head_t irq_queue;
> -
>   	struct i915_gem_context *last_context;
>
>   	struct intel_ring_hangcheck hangcheck;
> @@ -495,4 +519,44 @@ static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
>   	return engine->status_page.gfx_addr + I915_GEM_HWS_INDEX_ADDR;
>   }
>
> +/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
> +struct intel_wait {
> +	struct rb_node node;
> +	struct task_struct *task;
> +	u32 seqno;
> +};
> +void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
> +static inline void intel_wait_init(struct intel_wait *wait, u32 seqno)
> +{
> +	wait->task = current;
> +	wait->seqno = seqno;
> +}
> +static inline bool intel_wait_complete(const struct intel_wait *wait)
> +{
> +	return RB_EMPTY_NODE(&wait->node);
> +}
> +bool intel_engine_add_wait(struct intel_engine_cs *engine,
> +			   struct intel_wait *wait);
> +void intel_engine_remove_wait(struct intel_engine_cs *engine,
> +			      struct intel_wait *wait);
> +static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
> +{
> +	return READ_ONCE(engine->breadcrumbs.tasklet);
> +}
> +static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
> +{
> +	bool wakeup = false;
> +	struct task_struct *task = READ_ONCE(engine->breadcrumbs.tasklet);
> +	/* Note that for this not to dangerously chase a dangling pointer,
> +	 * the caller is responsible for ensuring that the task remains valid for
> +	 * wake_up_process(), i.e. that the RCU grace period cannot expire.
> +	 */
> +	if (task)
> +		wakeup = wake_up_process(task);
> +	return wakeup;
> +}
> +void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
> +void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
> +unsigned intel_kick_waiters(struct drm_i915_private *i915);
> +
>   #endif /* _INTEL_RINGBUFFER_H_ */
>

After the latest discussion revival, I cannot spot any more issues, so it
looks good to me. I even smoke-tested it a bit. But it is a complex beast,
so another pair of eyes on it would be good I think, especially on the GPU
error handling, which I am not that familiar with.
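
For my own notes while reviewing, I reduced the wake-up ordering to a toy,
single-threaded sketch (plain user-space C with made-up data; the sorted
array stands in for the rbtree of intel_wait, and there is no locking or
real scheduling in it):

#include <stdio.h>
#include <stdint.h>

/* wrap-safe comparison, same trick as i915_seqno_passed() */
static int seqno_passed(uint32_t hw, uint32_t wanted)
{
	return (int32_t)(hw - wanted) >= 0;
}

int main(void)
{
	/* waiters sorted by the seqno they wait for (retirement order) */
	uint32_t waiters[] = { 10, 11, 11, 14, 20 };
	const int count = sizeof(waiters) / sizeof(waiters[0]);
	int first = 0;			/* current bottom-half */
	uint32_t hw_seqno = 12;		/* seqno reached when the irq fires */

	/* the interrupt wakes only the bottom-half ... */
	printf("irq: wake waiter[%d] (seqno %u)\n", first, waiters[first]);

	/* ... which wakes every completed waiter in turn and then hands the
	 * bottom-half role over to the first waiter that is still pending */
	while (first < count && seqno_passed(hw_seqno, waiters[first])) {
		printf("waiter[%d] (seqno %u) completed, wake it\n",
		       first, waiters[first]);
		first++;
	}
	if (first < count)
		printf("waiter[%d] (seqno %u) becomes the new bottom-half\n",
		       first, waiters[first]);
	return 0;
}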

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 07/21] drm/i915: Spin after waking up for an interrupt
  2016-06-03 16:08 ` [PATCH 07/21] drm/i915: Spin after waking up for an interrupt Chris Wilson
@ 2016-06-06 14:39   ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 14:39 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> When waiting for an interrupt (waiting for the GPU to complete some
> work), we know we are the single waiter for the GPU. We also know when
> the GPU has nearly completed our request (or at least started processing
> it), so after being woken and we detect that the GPU is almost finished,

We cannot detect that the GPU is almost finished, only that it has 
started processing the waited-on batch. I suggest rewording the commit 
message to be accurate.

> allow the bottom-half to spin for a very short while to reduce client
> latencies.

Hm, in fact I think it should explain that it is not actually adding a 
busy spin for the bottom half, but extending the existing one by 2us for 
the first-waiter case.
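
To illustrate, the net effect is just the two call sites from this patch 
taken together (nothing new here, only the timeout arguments differ):

	/* optimistic spin for ~5us before we set up the interrupt */
	if (i915_spin_request(req, state, 5))
		goto complete;

	/* ...and after each wakeup of the bottom-half a further ~2us,
	 * but only once the GPU has started processing the request
	 */
	if (i915_spin_request(req, state, 2))
		break;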

>
> The impact is minimal, there was an improvement to the realtime-vs-many
> clients case, but exporting the function proves useful later.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
>   drivers/gpu/drm/i915/i915_drv.h      | 26 +++++++++++++++--------
>   drivers/gpu/drm/i915/i915_gem.c      | 40 +++++++++++++++++++++---------------
>   drivers/gpu/drm/i915/intel_display.c |  2 +-
>   drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
>   5 files changed, 45 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 48683538b4e2..0c287bf0d230 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -663,7 +663,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
>   					   i915_gem_request_get_seqno(work->flip_queued_req),
>   					   dev_priv->next_seqno,
>   					   engine->get_seqno(engine),
> -					   i915_gem_request_completed(work->flip_queued_req, true));
> +					   i915_gem_request_completed(work->flip_queued_req));
>   			} else
>   				seq_printf(m, "Flip not associated with any ring\n");
>   			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 68b383d98457..b0460eda2113 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3219,24 +3219,27 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>   	return (int32_t)(seq1 - seq2) >= 0;
>   }
>
> -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
> -					   bool lazy_coherency)
> +static inline bool i915_gem_request_started(const struct drm_i915_gem_request *req)
>   {
> -	if (!lazy_coherency && req->engine->irq_seqno_barrier)
> -		req->engine->irq_seqno_barrier(req->engine);
>   	return i915_seqno_passed(req->engine->get_seqno(req->engine),
>   				 req->previous_seqno);
>   }
>
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> -					      bool lazy_coherency)
> +static inline bool i915_gem_request_completed(const struct drm_i915_gem_request *req)
>   {
> -	if (!lazy_coherency && req->engine->irq_seqno_barrier)
> -		req->engine->irq_seqno_barrier(req->engine);
>   	return i915_seqno_passed(req->engine->get_seqno(req->engine),
>   				 req->seqno);
>   }
>
> +bool __i915_spin_request(const struct drm_i915_gem_request *request,
> +			 int state, unsigned long timeout_us);
> +static inline bool i915_spin_request(const struct drm_i915_gem_request *request,
> +				     int state, unsigned long timeout_us)
> +{
> +	return (i915_gem_request_started(request) &&
> +		__i915_spin_request(request, state, timeout_us));
> +}
> +
>   int __must_check i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno);
>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>
> @@ -3913,6 +3916,8 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *engine,
>
>   static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>   {
> +	struct intel_engine_cs *engine = req->engine;
> +
>   	/* Ensure our read of the seqno is coherent so that we
>   	 * do not "miss an interrupt" (i.e. if this is the last
>   	 * request and the seqno write from the GPU is not visible
> @@ -3924,7 +3929,10 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>   	 * but it is easier and safer to do it every time the waiter
>   	 * is woken.
>   	 */
> -	if (i915_gem_request_completed(req, false))
> +	if (engine->irq_seqno_barrier)
> +		engine->irq_seqno_barrier(engine);
> +
> +	if (i915_gem_request_completed(req))
>   		return true;
>
>   	/* We need to check whether any gpu reset happened in between
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d08edb3d16f1..bf5c93f2bd81 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1155,9 +1155,9 @@ static bool busywait_stop(unsigned long timeout, unsigned cpu)
>   	return this_cpu != cpu;
>   }
>
> -static bool __i915_spin_request(struct drm_i915_gem_request *req, int state)
> +bool __i915_spin_request(const struct drm_i915_gem_request *req,
> +			 int state, unsigned long timeout_us)
>   {
> -	unsigned long timeout;
>   	unsigned cpu;
>
>   	/* When waiting for high frequency requests, e.g. during synchronous
> @@ -1170,19 +1170,15 @@ static bool __i915_spin_request(struct drm_i915_gem_request *req, int state)
>   	 * takes to sleep on a request, on the order of a microsecond.
>   	 */
>
> -	/* Only spin if we know the GPU is processing this request */
> -	if (!i915_gem_request_started(req, true))
> -		return false;
> -
> -	timeout = local_clock_us(&cpu) + 5;
> +	timeout_us += local_clock_us(&cpu);
>   	do {
> -		if (i915_gem_request_completed(req, true))
> +		if (i915_gem_request_completed(req))
>   			return true;
>
>   		if (signal_pending_state(state, current))
>   			break;
>
> -		if (busywait_stop(timeout, cpu))
> +		if (busywait_stop(timeout_us, cpu))
>   			break;
>
>   		cpu_relax_lowlatency();
> @@ -1224,7 +1220,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   	if (list_empty(&req->list))
>   		return 0;
>
> -	if (i915_gem_request_completed(req, true))
> +	if (i915_gem_request_completed(req))
>   		return 0;
>
>   	timeout_remain = MAX_SCHEDULE_TIMEOUT;
> @@ -1249,7 +1245,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
>
>   	/* Optimistic spin for the next ~jiffie before touching IRQs */
> -	if (__i915_spin_request(req, state))
> +	if (i915_spin_request(req, state, 5))
>   		goto complete;
>
>   	intel_wait_init(&wait, req->seqno);
> @@ -1290,6 +1286,10 @@ wakeup:
>   		 */
>   		if (__i915_request_irq_complete(req))
>   			break;
> +
> +		/* Only spin if we know the GPU is processing this request */
> +		if (i915_spin_request(req, state, 2))
> +			break;

The behaviour described in the comment is embedded in the function 
being called. Or change it to "Only spin*s* if we know..."?

>   	}
>   	remove_wait_queue(&req->i915->gpu_error.wait_queue, &reset);
>
> @@ -2805,8 +2805,16 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_gem_request *request;
>
> +	/* We are called by the error capture and reset at a random
> +	 * point in time. In particular, note that neither is crucially
> +	 * ordered with an interrupt. After a hang, the GPU is dead and we
> +	 * assume that no more writes can happen (we waited long enough for
> +	 * all writes that were in transaction to be flushed) - adding an
> +	 * extra delay for a recent interrupt is pointless. Hence, we do
> +	 * not need an engine->irq_seqno_barrier() before the seqno reads.
> +	 */
>   	list_for_each_entry(request, &engine->request_list, list) {
> -		if (i915_gem_request_completed(request, false))
> +		if (i915_gem_request_completed(request))
>   			continue;
>
>   		return request;
> @@ -2937,7 +2945,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
>   					   struct drm_i915_gem_request,
>   					   list);
>
> -		if (!i915_gem_request_completed(request, true))
> +		if (!i915_gem_request_completed(request))
>   			break;
>
>   		i915_gem_request_retire(request);
> @@ -2961,7 +2969,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
>   	}
>
>   	if (unlikely(engine->trace_irq_req &&
> -		     i915_gem_request_completed(engine->trace_irq_req, true))) {
> +		     i915_gem_request_completed(engine->trace_irq_req))) {
>   		engine->irq_put(engine);
>   		i915_gem_request_assign(&engine->trace_irq_req, NULL);
>   	}
> @@ -3058,7 +3066,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>   		if (req == NULL)
>   			continue;
>
> -		if (i915_gem_request_completed(req, true))
> +		if (i915_gem_request_completed(req))
>   			i915_gem_object_retire__read(obj, i);
>   	}
>
> @@ -3164,7 +3172,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
>   	if (to == from)
>   		return 0;
>
> -	if (i915_gem_request_completed(from_req, true))
> +	if (i915_gem_request_completed(from_req))
>   		return 0;
>
>   	if (!i915_semaphore_is_enabled(to_i915(obj->base.dev))) {
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 2bc291ac7243..bb09ee6d1a3f 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11590,7 +11590,7 @@ static bool __pageflip_stall_check_cs(struct drm_i915_private *dev_priv,
>   	vblank = intel_crtc_get_vblank_counter(intel_crtc);
>   	if (work->flip_ready_vblank == 0) {
>   		if (work->flip_queued_req &&
> -		    !i915_gem_request_completed(work->flip_queued_req, true))
> +		    !i915_gem_request_completed(work->flip_queued_req))
>   			return false;
>
>   		work->flip_ready_vblank = vblank;
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 657a64fc2780..712bd0debb91 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -7687,7 +7687,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
>   	struct request_boost *boost = container_of(work, struct request_boost, work);
>   	struct drm_i915_gem_request *req = boost->req;
>
> -	if (!i915_gem_request_completed(req, true))
> +	if (!i915_gem_request_completed(req))
>   		gen6_rps_boost(req->i915, NULL, req->emitted_jiffies);
>
>   	i915_gem_request_unreference(req);
> @@ -7701,7 +7701,7 @@ void intel_queue_rps_boost_for_request(struct drm_i915_gem_request *req)
>   	if (req == NULL || INTEL_GEN(req->i915) < 6)
>   		return;
>
> -	if (i915_gem_request_completed(req, true))
> +	if (i915_gem_request_completed(req))
>   		return;
>
>   	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
>

I am not such a big fan of spinning, but the code looks correct. Just 
please improve the commit message and that comment.

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/21] drm/i915: Use HWS for seqno tracking everywhere
  2016-06-03 16:08 ` [PATCH 08/21] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
@ 2016-06-06 14:55   ` Tvrtko Ursulin
  2016-06-08  9:24     ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 14:55 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> By using the same address for storing the HWS on every platform, we can
> remove the platform specific vfuncs and reduce the get-seqno routine to
> a single read of a cached memory location.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c      |  6 +--
>   drivers/gpu/drm/i915/i915_drv.h          |  4 +-
>   drivers/gpu/drm/i915/i915_gpu_error.c    |  2 +-
>   drivers/gpu/drm/i915/i915_irq.c          |  4 +-
>   drivers/gpu/drm/i915/i915_trace.h        |  2 +-
>   drivers/gpu/drm/i915/intel_breadcrumbs.c |  4 +-
>   drivers/gpu/drm/i915/intel_lrc.c         | 26 +---------
>   drivers/gpu/drm/i915/intel_ringbuffer.c  | 83 ++++++++------------------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h  |  7 +--
>   9 files changed, 36 insertions(+), 102 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 0c287bf0d230..72dae6fb0aa2 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -662,7 +662,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
>   					   engine->name,
>   					   i915_gem_request_get_seqno(work->flip_queued_req),
>   					   dev_priv->next_seqno,
> -					   engine->get_seqno(engine),
> +					   intel_engine_get_seqno(engine),
>   					   i915_gem_request_completed(work->flip_queued_req));
>   			} else
>   				seq_printf(m, "Flip not associated with any ring\n");
> @@ -792,7 +792,7 @@ static void i915_ring_seqno_info(struct seq_file *m,
>   	struct rb_node *rb;
>
>   	seq_printf(m, "Current sequence (%s): %x\n",
> -		   engine->name, engine->get_seqno(engine));
> +		   engine->name, intel_engine_get_seqno(engine));
>   	seq_printf(m, "Current user interrupts (%s): %x\n",
>   		   engine->name, READ_ONCE(engine->user_interrupts));
>
> @@ -1417,7 +1417,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>
>   	for_each_engine_id(engine, dev_priv, id) {
>   		acthd[id] = intel_ring_get_active_head(engine);
> -		seqno[id] = engine->get_seqno(engine);
> +		seqno[id] = intel_engine_get_seqno(engine);
>   	}
>
>   	i915_get_extra_instdone(dev_priv, instdone);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index b0460eda2113..4a71f4e9a97a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3221,13 +3221,13 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>
>   static inline bool i915_gem_request_started(const struct drm_i915_gem_request *req)
>   {
> -	return i915_seqno_passed(req->engine->get_seqno(req->engine),
> +	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
>   				 req->previous_seqno);
>   }
>
>   static inline bool i915_gem_request_completed(const struct drm_i915_gem_request *req)
>   {
> -	return i915_seqno_passed(req->engine->get_seqno(req->engine),
> +	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
>   				 req->seqno);
>   }
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 89241ffcc676..81341fc4e61a 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -983,7 +983,7 @@ static void i915_record_ring_state(struct drm_i915_private *dev_priv,
>   	ering->waiting = intel_engine_has_waiter(engine);
>   	ering->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
>   	ering->acthd = intel_ring_get_active_head(engine);
> -	ering->seqno = engine->get_seqno(engine);
> +	ering->seqno = intel_engine_get_seqno(engine);
>   	ering->last_seqno = engine->last_submitted_seqno;
>   	ering->start = I915_READ_START(engine);
>   	ering->head = I915_READ_HEAD(engine);
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 2a736f4a0fe5..4013ad92cdc6 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2951,7 +2951,7 @@ static int semaphore_passed(struct intel_engine_cs *engine)
>   	if (signaller->hangcheck.deadlock >= I915_NUM_ENGINES)
>   		return -1;
>
> -	if (i915_seqno_passed(signaller->get_seqno(signaller), seqno))
> +	if (i915_seqno_passed(intel_engine_get_seqno(engine), seqno))

Should be signaller, not engine, by the look of it.
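
I.e. presumably it should read:

	if (i915_seqno_passed(intel_engine_get_seqno(signaller), seqno))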

>   		return 1;
>
>   	/* cursory check for an unkickable deadlock */
> @@ -3139,7 +3139,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   			engine->irq_seqno_barrier(engine);
>
>   		acthd = intel_ring_get_active_head(engine);
> -		seqno = engine->get_seqno(engine);
> +		seqno = intel_engine_get_seqno(engine);
>
>   		/* Reset stuck interrupts between batch advances */
>   		user_interrupts = 0;
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 6768db032f84..3d13fde95fdf 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -558,7 +558,7 @@ TRACE_EVENT(i915_gem_request_notify,
>   	    TP_fast_assign(
>   			   __entry->dev = engine->i915->dev->primary->index;
>   			   __entry->ring = engine->id;
> -			   __entry->seqno = engine->get_seqno(engine);
> +			   __entry->seqno = intel_engine_get_seqno(engine);
>   			   ),
>
>   	    TP_printk("dev=%u, ring=%u, seqno=%u",
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index e0121f727938..44346de39794 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -148,7 +148,7 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
>   	first = true;
>   	parent = NULL;
>   	completed = NULL;
> -	seqno = engine->get_seqno(engine);
> +	seqno = intel_engine_get_seqno(engine);
>
>   	p = &b->waiters.rb_node;
>   	while (*p) {
> @@ -275,7 +275,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
>   			 * the first_waiter. This is undesirable if that
>   			 * waiter is a high priority task.
>   			 */
> -			u32 seqno = engine->get_seqno(engine);
> +			u32 seqno = intel_engine_get_seqno(engine);
>   			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
>   				struct rb_node *n = rb_next(next);
>   				__intel_breadcrumbs_finish(b, to_wait(next));
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 270409e9ac7a..e48687837a95 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1712,16 +1712,6 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
>   	return 0;
>   }
>
> -static u32 gen8_get_seqno(struct intel_engine_cs *engine)
> -{
> -	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
> -}
> -
> -static void gen8_set_seqno(struct intel_engine_cs *engine, u32 seqno)
> -{
> -	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
> -}
> -
>   static void bxt_a_seqno_barrier(struct intel_engine_cs *engine)
>   {
>   	/*
> @@ -1737,14 +1727,6 @@ static void bxt_a_seqno_barrier(struct intel_engine_cs *engine)
>   	intel_flush_status_page(engine, I915_GEM_HWS_INDEX);
>   }
>
> -static void bxt_a_set_seqno(struct intel_engine_cs *engine, u32 seqno)
> -{
> -	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
> -
> -	/* See bxt_a_get_seqno() explaining the reason for the clflush. */
> -	intel_flush_status_page(engine, I915_GEM_HWS_INDEX);
> -}
> -
>   /*
>    * Reserve space for 2 NOOPs at the end of each request to be
>    * used as a workaround for not being allowed to do lite
> @@ -1770,7 +1752,7 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
>   				intel_hws_seqno_address(request->engine) |
>   				MI_FLUSH_DW_USE_GTT);
>   	intel_logical_ring_emit(ringbuf, 0);
> -	intel_logical_ring_emit(ringbuf, i915_gem_request_get_seqno(request));
> +	intel_logical_ring_emit(ringbuf, request->seqno);
>   	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
>   	intel_logical_ring_emit(ringbuf, MI_NOOP);
>   	return intel_logical_ring_advance_and_submit(request);
> @@ -1916,12 +1898,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->irq_get = gen8_logical_ring_get_irq;
>   	engine->irq_put = gen8_logical_ring_put_irq;
>   	engine->emit_bb_start = gen8_emit_bb_start;
> -	engine->get_seqno = gen8_get_seqno;
> -	engine->set_seqno = gen8_set_seqno;
> -	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1)) {
> +	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1))
>   		engine->irq_seqno_barrier = bxt_a_seqno_barrier;
> -		engine->set_seqno = bxt_a_set_seqno;
> -	}
>   }
>
>   static inline void
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 95f04345d3ec..bac496902c6d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1281,19 +1281,17 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
>   		return ret;
>
>   	for_each_engine_id(waiter, dev_priv, id) {
> -		u32 seqno;
>   		u64 gtt_offset = signaller->semaphore.signal_ggtt[id];
>   		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
>   			continue;
>
> -		seqno = i915_gem_request_get_seqno(signaller_req);
>   		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
>   		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
>   					   PIPE_CONTROL_QW_WRITE |
>   					   PIPE_CONTROL_CS_STALL);
>   		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
>   		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> -		intel_ring_emit(signaller, seqno);
> +		intel_ring_emit(signaller, signaller_req->seqno);
>   		intel_ring_emit(signaller, 0);
>   		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
>   					   MI_SEMAPHORE_TARGET(waiter->hw_id));
> @@ -1322,18 +1320,16 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
>   		return ret;
>
>   	for_each_engine_id(waiter, dev_priv, id) {
> -		u32 seqno;
>   		u64 gtt_offset = signaller->semaphore.signal_ggtt[id];
>   		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
>   			continue;
>
> -		seqno = i915_gem_request_get_seqno(signaller_req);
>   		intel_ring_emit(signaller, (MI_FLUSH_DW + 1) |
>   					   MI_FLUSH_DW_OP_STOREDW);
>   		intel_ring_emit(signaller, lower_32_bits(gtt_offset) |
>   					   MI_FLUSH_DW_USE_GTT);
>   		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
> -		intel_ring_emit(signaller, seqno);
> +		intel_ring_emit(signaller, signaller_req->seqno);
>   		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
>   					   MI_SEMAPHORE_TARGET(waiter->hw_id));
>   		intel_ring_emit(signaller, 0);
> @@ -1364,11 +1360,9 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
>   		i915_reg_t mbox_reg = signaller->semaphore.mbox.signal[id];
>
>   		if (i915_mmio_reg_valid(mbox_reg)) {
> -			u32 seqno = i915_gem_request_get_seqno(signaller_req);
> -
>   			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
>   			intel_ring_emit_reg(signaller, mbox_reg);
> -			intel_ring_emit(signaller, seqno);
> +			intel_ring_emit(signaller, signaller_req->seqno);
>   		}
>   	}
>
> @@ -1404,7 +1398,7 @@ gen6_add_request(struct drm_i915_gem_request *req)
>   	intel_ring_emit(engine, MI_STORE_DWORD_INDEX);
>   	intel_ring_emit(engine,
>   			I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> -	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
> +	intel_ring_emit(engine, req->seqno);
>   	intel_ring_emit(engine, MI_USER_INTERRUPT);
>   	__intel_ring_advance(engine);
>
> @@ -1543,7 +1537,9 @@ static int
>   pc_render_add_request(struct drm_i915_gem_request *req)
>   {
>   	struct intel_engine_cs *engine = req->engine;
> -	u32 scratch_addr = engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> +	u32 addr = engine->status_page.gfx_addr +
> +		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> +	u32 scratch_addr = addr;
>   	int ret;

Before my time. :)

Why was this code flushing all that scratch space, but not the location 
it was writing the seqno to just above?

With this change it is flushing the seqno area as well.

>
>   	/* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently
> @@ -1559,12 +1555,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
>   		return ret;
>
>   	intel_ring_emit(engine,
> -			GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
> +			GFX_OP_PIPE_CONTROL(4) |
> +			PIPE_CONTROL_QW_WRITE |
>   			PIPE_CONTROL_WRITE_FLUSH |
>   			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
> -	intel_ring_emit(engine,
> -			engine->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> -	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
> +	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
> +	intel_ring_emit(engine, req->seqno);
>   	intel_ring_emit(engine, 0);
>   	PIPE_CONTROL_FLUSH(engine, scratch_addr);
>   	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
> @@ -1579,13 +1575,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
>   	PIPE_CONTROL_FLUSH(engine, scratch_addr);
>
>   	intel_ring_emit(engine,
> -			GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
> +		       	GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
>   			PIPE_CONTROL_WRITE_FLUSH |
>   			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
>   			PIPE_CONTROL_NOTIFY);
> -	intel_ring_emit(engine,
> -			engine->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
> -	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
> +	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
> +	intel_ring_emit(engine, req->seqno);
>   	intel_ring_emit(engine, 0);
>   	__intel_ring_advance(engine);
>
> @@ -1617,30 +1612,6 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
>   	spin_unlock_irq(&dev_priv->uncore.lock);
>   }
>
> -static u32
> -ring_get_seqno(struct intel_engine_cs *engine)
> -{
> -	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
> -}
> -
> -static void
> -ring_set_seqno(struct intel_engine_cs *engine, u32 seqno)
> -{
> -	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
> -}
> -
> -static u32
> -pc_render_get_seqno(struct intel_engine_cs *engine)
> -{
> -	return engine->scratch.cpu_page[0];
> -}
> -
> -static void
> -pc_render_set_seqno(struct intel_engine_cs *engine, u32 seqno)
> -{
> -	engine->scratch.cpu_page[0] = seqno;
> -}
> -
>   static bool
>   gen5_ring_get_irq(struct intel_engine_cs *engine)
>   {
> @@ -1770,8 +1741,8 @@ i9xx_add_request(struct drm_i915_gem_request *req)
>
>   	intel_ring_emit(engine, MI_STORE_DWORD_INDEX);
>   	intel_ring_emit(engine,
> -			I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> -	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
> +		       	I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> +	intel_ring_emit(engine, req->seqno);
>   	intel_ring_emit(engine, MI_USER_INTERRUPT);
>   	__intel_ring_advance(engine);
>
> @@ -2588,7 +2559,9 @@ void intel_ring_init_seqno(struct intel_engine_cs *engine, u32 seqno)
>   	memset(engine->semaphore.sync_seqno, 0,
>   	       sizeof(engine->semaphore.sync_seqno));
>
> -	engine->set_seqno(engine, seqno);
> +	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
> +	if (engine->irq_seqno_barrier)
> +		engine->irq_seqno_barrier(engine);
>   	engine->last_submitted_seqno = seqno;
>
>   	engine->hangcheck.seqno = seqno;
> @@ -2830,8 +2803,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   		engine->irq_get = gen8_ring_get_irq;
>   		engine->irq_put = gen8_ring_put_irq;
>   		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
> -		engine->get_seqno = ring_get_seqno;
> -		engine->set_seqno = ring_set_seqno;
>   		if (i915_semaphore_is_enabled(dev_priv)) {
>   			WARN_ON(!dev_priv->semaphore_obj);
>   			engine->semaphore.sync_to = gen8_ring_sync;
> @@ -2848,8 +2819,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   		engine->irq_put = gen6_ring_put_irq;
>   		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
>   		engine->irq_seqno_barrier = gen6_seqno_barrier;
> -		engine->get_seqno = ring_get_seqno;
> -		engine->set_seqno = ring_set_seqno;
>   		if (i915_semaphore_is_enabled(dev_priv)) {
>   			engine->semaphore.sync_to = gen6_ring_sync;
>   			engine->semaphore.signal = gen6_signal;
> @@ -2874,8 +2843,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   	} else if (IS_GEN5(dev_priv)) {
>   		engine->add_request = pc_render_add_request;
>   		engine->flush = gen4_render_ring_flush;
> -		engine->get_seqno = pc_render_get_seqno;
> -		engine->set_seqno = pc_render_set_seqno;
>   		engine->irq_get = gen5_ring_get_irq;
>   		engine->irq_put = gen5_ring_put_irq;
>   		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT |
> @@ -2886,8 +2853,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   			engine->flush = gen2_render_ring_flush;
>   		else
>   			engine->flush = gen4_render_ring_flush;
> -		engine->get_seqno = ring_get_seqno;
> -		engine->set_seqno = ring_set_seqno;
>   		if (IS_GEN2(dev_priv)) {
>   			engine->irq_get = i8xx_ring_get_irq;
>   			engine->irq_put = i8xx_ring_put_irq;
> @@ -2965,8 +2930,6 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
>   		engine->flush = gen6_bsd_ring_flush;
>   		engine->add_request = gen6_add_request;
>   		engine->irq_seqno_barrier = gen6_seqno_barrier;
> -		engine->get_seqno = ring_get_seqno;
> -		engine->set_seqno = ring_set_seqno;
>   		if (INTEL_GEN(dev_priv) >= 8) {
>   			engine->irq_enable_mask =
>   				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
> @@ -3004,8 +2967,6 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
>   		engine->mmio_base = BSD_RING_BASE;
>   		engine->flush = bsd_ring_flush;
>   		engine->add_request = i9xx_add_request;
> -		engine->get_seqno = ring_get_seqno;
> -		engine->set_seqno = ring_set_seqno;
>   		if (IS_GEN5(dev_priv)) {
>   			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
>   			engine->irq_get = gen5_ring_get_irq;
> @@ -3040,8 +3001,6 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
>   	engine->flush = gen6_bsd_ring_flush;
>   	engine->add_request = gen6_add_request;
>   	engine->irq_seqno_barrier = gen6_seqno_barrier;
> -	engine->get_seqno = ring_get_seqno;
> -	engine->set_seqno = ring_set_seqno;
>   	engine->irq_enable_mask =
>   			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
>   	engine->irq_get = gen8_ring_get_irq;
> @@ -3073,8 +3032,6 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
>   	engine->flush = gen6_ring_flush;
>   	engine->add_request = gen6_add_request;
>   	engine->irq_seqno_barrier = gen6_seqno_barrier;
> -	engine->get_seqno = ring_get_seqno;
> -	engine->set_seqno = ring_set_seqno;
>   	if (INTEL_GEN(dev_priv) >= 8) {
>   		engine->irq_enable_mask =
>   			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
> @@ -3133,8 +3090,6 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
>   	engine->flush = gen6_ring_flush;
>   	engine->add_request = gen6_add_request;
>   	engine->irq_seqno_barrier = gen6_seqno_barrier;
> -	engine->get_seqno = ring_get_seqno;
> -	engine->set_seqno = ring_set_seqno;
>
>   	if (INTEL_GEN(dev_priv) >= 8) {
>   		engine->irq_enable_mask =
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 061088360b80..785c9e5312ff 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -219,9 +219,6 @@ struct intel_engine_cs {
>   	 * monotonic, even if not coherent.
>   	 */
>   	void		(*irq_seqno_barrier)(struct intel_engine_cs *ring);
> -	u32		(*get_seqno)(struct intel_engine_cs *ring);
> -	void		(*set_seqno)(struct intel_engine_cs *ring,
> -				     u32 seqno);
>   	int		(*dispatch_execbuffer)(struct drm_i915_gem_request *req,
>   					       u64 offset, u32 length,
>   					       unsigned dispatch_flags);
> @@ -497,6 +494,10 @@ int intel_init_blt_ring_buffer(struct drm_device *dev);
>   int intel_init_vebox_ring_buffer(struct drm_device *dev);
>
>   u64 intel_ring_get_active_head(struct intel_engine_cs *engine);
> +static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine)
> +{
> +	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
> +}
>
>   int init_workarounds_ring(struct intel_engine_cs *engine);
>
>

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 09/21] drm/i915: Stop mapping the scratch page into CPU space
  2016-06-03 16:08 ` [PATCH 09/21] drm/i915: Stop mapping the scratch page into CPU space Chris Wilson
@ 2016-06-06 15:03   ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 15:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> After the elimination of using the scratch page for Ironlake's
> breadcrumb, we no longer need to kmap the object. We therefore can move
> it into the high unmappable space and do not need to force the object to
> be coherent (i.e. snooped on !llc platforms).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 40 +++++++++------------------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
>   2 files changed, 11 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index bac496902c6d..106f40c52bb5 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -643,58 +643,40 @@ out:
>   	return ret;
>   }
>
> -void
> -intel_fini_pipe_control(struct intel_engine_cs *engine)
> +void intel_fini_pipe_control(struct intel_engine_cs *engine)
>   {
>   	if (engine->scratch.obj == NULL)
>   		return;
>
> -	if (INTEL_GEN(engine->i915) >= 5) {
> -		kunmap(sg_page(engine->scratch.obj->pages->sgl));
> -		i915_gem_object_ggtt_unpin(engine->scratch.obj);
> -	}
> -
> +	i915_gem_object_ggtt_unpin(engine->scratch.obj);
>   	drm_gem_object_unreference(&engine->scratch.obj->base);
>   	engine->scratch.obj = NULL;
>   }
>
> -int
> -intel_init_pipe_control(struct intel_engine_cs *engine)
> +int intel_init_pipe_control(struct intel_engine_cs *engine)
>   {
> +	struct drm_i915_gem_object *obj;
>   	int ret;
>
>   	WARN_ON(engine->scratch.obj);
>
> -	engine->scratch.obj = i915_gem_object_create(engine->i915->dev, 4096);
> -	if (IS_ERR(engine->scratch.obj)) {
> -		DRM_ERROR("Failed to allocate seqno page\n");
> -		ret = PTR_ERR(engine->scratch.obj);
> -		engine->scratch.obj = NULL;
> +	obj = i915_gem_object_create(engine->i915->dev, 4096);
> +	if (IS_ERR(obj)) {
> +		DRM_ERROR("Failed to allocate scratch page\n");
> +		ret = PTR_ERR(obj);
>   		goto err;
>   	}
>
> -	ret = i915_gem_object_set_cache_level(engine->scratch.obj,
> -					      I915_CACHE_LLC);
> +	ret = i915_gem_obj_ggtt_pin(obj, 4096, PIN_HIGH);
>   	if (ret)
>   		goto err_unref;
>
> -	ret = i915_gem_obj_ggtt_pin(engine->scratch.obj, 4096, 0);
> -	if (ret)
> -		goto err_unref;
> -
> -	engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(engine->scratch.obj);
> -	engine->scratch.cpu_page = kmap(sg_page(engine->scratch.obj->pages->sgl));
> -	if (engine->scratch.cpu_page == NULL) {
> -		ret = -ENOMEM;
> -		goto err_unpin;
> -	}
> -
> +	engine->scratch.obj = obj;
> +	engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
>   	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08x\n",
>   			 engine->name, engine->scratch.gtt_offset);
>   	return 0;
>
> -err_unpin:
> -	i915_gem_object_ggtt_unpin(engine->scratch.obj);
>   err_unref:
>   	drm_gem_object_unreference(&engine->scratch.obj->base);
>   err:
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 785c9e5312ff..4b2f19decb30 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -338,7 +338,6 @@ struct intel_engine_cs {
>   	struct {
>   		struct drm_i915_gem_object *obj;
>   		u32 gtt_offset;
> -		volatile u32 *cpu_page;
>   	} scratch;
>
>   	bool needs_cmd_parser;
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 10/21] drm/i915: Allocate scratch page from stolen
  2016-06-03 16:08 ` [PATCH 10/21] drm/i915: Allocate scratch page from stolen Chris Wilson
@ 2016-06-06 15:05   ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 15:05 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> With the last direct CPU access to the scratch page removed, we can now
> allocate it from our small amount of reserved system pages (stolen
> memory).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 106f40c52bb5..b7eebbed945d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -660,7 +660,9 @@ int intel_init_pipe_control(struct intel_engine_cs *engine)
>
>   	WARN_ON(engine->scratch.obj);
>
> -	obj = i915_gem_object_create(engine->i915->dev, 4096);
> +	obj = i915_gem_object_create_stolen(engine->i915->dev, 4096);
> +	if (obj == NULL)
> +		obj = i915_gem_object_create(engine->i915->dev, 4096);
>   	if (IS_ERR(obj)) {
>   		DRM_ERROR("Failed to allocate scratch page\n");
>   		ret = PTR_ERR(obj);
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/21] drm/i915: Refactor scratch object allocation for gen2 w/a buffer
  2016-06-03 16:08 ` [PATCH 11/21] drm/i915: Refactor scratch object allocation for gen2 w/a buffer Chris Wilson
@ 2016-06-06 15:09   ` Tvrtko Ursulin
  2016-06-08  9:27     ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 15:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> The gen2 w/a buffer is stuffed into the same slot as the gen5+ scratch
> buffer. If we pass in the size we want to allocate for the scratch
> buffer, both callers can use the same routine.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 31 ++++++++-----------------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
>   3 files changed, 10 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e48687837a95..32b5eae7dd11 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2078,7 +2078,7 @@ static int logical_render_ring_init(struct drm_device *dev)
>   	engine->emit_flush = gen8_emit_flush_render;
>   	engine->emit_request = gen8_emit_request_render;
>
> -	ret = intel_init_pipe_control(engine);
> +	ret = intel_init_pipe_control(engine, 4096);
>   	if (ret)
>   		return ret;
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index b7eebbed945d..ca2e59405998 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -653,16 +653,16 @@ void intel_fini_pipe_control(struct intel_engine_cs *engine)
>   	engine->scratch.obj = NULL;
>   }
>
> -int intel_init_pipe_control(struct intel_engine_cs *engine)
> +int intel_init_pipe_control(struct intel_engine_cs *engine, int size)
>   {
>   	struct drm_i915_gem_object *obj;
>   	int ret;
>
>   	WARN_ON(engine->scratch.obj);
>
> -	obj = i915_gem_object_create_stolen(engine->i915->dev, 4096);
> +	obj = i915_gem_object_create_stolen(engine->i915->dev, size);
>   	if (obj == NULL)
> -		obj = i915_gem_object_create(engine->i915->dev, 4096);
> +		obj = i915_gem_object_create(engine->i915->dev, size);
>   	if (IS_ERR(obj)) {
>   		DRM_ERROR("Failed to allocate scratch page\n");
>   		ret = PTR_ERR(obj);
> @@ -2863,31 +2863,16 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   	engine->init_hw = init_render_ring;
>   	engine->cleanup = render_ring_cleanup;
>
> -	/* Workaround batchbuffer to combat CS tlb bug. */
> -	if (HAS_BROKEN_CS_TLB(dev_priv)) {
> -		obj = i915_gem_object_create(dev, I830_WA_SIZE);
> -		if (IS_ERR(obj)) {
> -			DRM_ERROR("Failed to allocate batch bo\n");
> -			return PTR_ERR(obj);
> -		}
> -
> -		ret = i915_gem_obj_ggtt_pin(obj, 0, 0);
> -		if (ret != 0) {
> -			drm_gem_object_unreference(&obj->base);
> -			DRM_ERROR("Failed to ping batch bo\n");
> -			return ret;
> -		}
> -
> -		engine->scratch.obj = obj;
> -		engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
> -	}
> -
>   	ret = intel_init_ring_buffer(dev, engine);
>   	if (ret)
>   		return ret;
>
>   	if (INTEL_GEN(dev_priv) >= 5) {
> -		ret = intel_init_pipe_control(engine);
> +		ret = intel_init_pipe_control(engine, 4096);

Could be cool to define this size with a descriptive name at this point.
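
E.g. something like this (the name is just a placeholder, pick whatever 
fits):

	#define PIPE_CONTROL_SCRATCH_SIZE 4096

	ret = intel_init_pipe_control(engine, PIPE_CONTROL_SCRATCH_SIZE);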

> +		if (ret)
> +			return ret;
> +	} else if (HAS_BROKEN_CS_TLB(dev_priv)) {
> +		ret = intel_init_pipe_control(engine, I830_WA_SIZE);
>   		if (ret)
>   			return ret;
>   	}
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 4b2f19decb30..cb599a54931a 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -483,8 +483,8 @@ void intel_ring_init_seqno(struct intel_engine_cs *engine, u32 seqno);
>   int intel_ring_flush_all_caches(struct drm_i915_gem_request *req);
>   int intel_ring_invalidate_all_caches(struct drm_i915_gem_request *req);
>
> +int intel_init_pipe_control(struct intel_engine_cs *engine, int size);
>   void intel_fini_pipe_control(struct intel_engine_cs *engine);
> -int intel_init_pipe_control(struct intel_engine_cs *engine);
>
>   int intel_init_render_ring_buffer(struct drm_device *dev);
>   int intel_init_bsd_ring_buffer(struct drm_device *dev);
>

Anyway,

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 13/21] drm/i915: Check the CPU cached value of seqno after waking the waiter
  2016-06-03 16:08 ` [PATCH 13/21] drm/i915: Check the CPU cached value of seqno after waking the waiter Chris Wilson
@ 2016-06-06 15:10   ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 15:10 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> If we have multiple waiters, we may find that many complete on the same
> wake up. If we first inspect the seqno from the CPU cache, we may reduce
> the number of heavyweight coherent seqno reads we require.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h | 14 ++++++++++----
>   1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4a71f4e9a97a..4ddb9ff319cb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3918,6 +3918,12 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>   {
>   	struct intel_engine_cs *engine = req->engine;
>
> +	/* Before we do the heavier coherent read of the seqno,
> +	 * check the value (hopefully) in the CPU cacheline.
> +	 */
> +	if (i915_gem_request_completed(req))
> +		return true;
> +
>   	/* Ensure our read of the seqno is coherent so that we
>   	 * do not "miss an interrupt" (i.e. if this is the last
>   	 * request and the seqno write from the GPU is not visible
> @@ -3929,11 +3935,11 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>   	 * but it is easier and safer to do it every time the waiter
>   	 * is woken.
>   	 */
> -	if (engine->irq_seqno_barrier)
> +	if (engine->irq_seqno_barrier) {
>   		engine->irq_seqno_barrier(engine);
> -
> -	if (i915_gem_request_completed(req))
> -		return true;
> +		if (i915_gem_request_completed(req))
> +			return true;
> +	}
>
>   	/* We need to check whether any gpu reset happened in between
>   	 * the request being submitted and now. If a reset has occurred,
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 14/21] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted
  2016-06-03 16:08 ` [PATCH 14/21] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
@ 2016-06-06 15:34   ` Tvrtko Ursulin
  2016-06-08  9:35     ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 15:34 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> If we flag the seqno as potentially stale upon receiving an interrupt,
> we can use that information to reduce the frequency that we apply the
> heavyweight coherent seqno read (i.e. if we wake up a chain of waiters).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h          | 15 ++++++++++++++-
>   drivers/gpu/drm/i915/i915_irq.c          |  1 +
>   drivers/gpu/drm/i915/intel_breadcrumbs.c | 16 ++++++++++------
>   drivers/gpu/drm/i915/intel_ringbuffer.h  |  1 +
>   4 files changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4ddb9ff319cb..a71d08199d57 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3935,7 +3935,20 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>   	 * but it is easier and safer to do it every time the waiter
>   	 * is woken.
>   	 */
> -	if (engine->irq_seqno_barrier) {
> +	if (engine->irq_seqno_barrier && READ_ONCE(engine->irq_posted)) {
> +		/* The ordering of irq_posted versus applying the barrier
> +		 * is crucial. The clearing of the current irq_posted must
> +		 * be visible before we perform the barrier operation,
> +		 * such that if a subsequent interrupt arrives, irq_posted
> +		 * is reasserted and our task rewoken (which causes us to
> +		 * do another __i915_request_irq_complete() immediately
> +		 * and reapply the barrier). Conversely, if the clear
> +		 * occurs after the barrier, then an interrupt that arrived
> +		 * whilst we waited on the barrier would not trigger a
> +		 * barrier on the next pass, and the read may not see the
> +		 * seqno update.
> +		 */
> +		WRITE_ONCE(engine->irq_posted, false);

Why is this not smp_store_mb ?
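
I.e.:

	smp_store_mb(engine->irq_posted, false);

which does the WRITE_ONCE and the full barrier in one go.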

>   		engine->irq_seqno_barrier(engine);
>   		if (i915_gem_request_completed(req))
>   			return true;
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index c14eb57b5807..14b3d65bb604 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -976,6 +976,7 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
>
>   static void notify_ring(struct intel_engine_cs *engine)
>   {
> +	smp_store_mb(engine->irq_posted, true);
>   	if (intel_engine_wakeup(engine)) {
>   		trace_i915_gem_request_notify(engine);
>   		engine->user_interrupts++;
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 44346de39794..0f5fe114c204 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -43,12 +43,18 @@ static void intel_breadcrumbs_fake_irq(unsigned long data)
>
>   static void irq_enable(struct intel_engine_cs *engine)
>   {
> +	/* Enabling the IRQ may miss the generation of the interrupt, but
> +	 * we still need to force the barrier before reading the seqno,
> +	 * just in case.
> +	 */
> +	engine->irq_posted = true;

Should it be smp_store_mb here as well?

>   	WARN_ON(!engine->irq_get(engine));
>   }
>
>   static void irq_disable(struct intel_engine_cs *engine)
>   {
>   	engine->irq_put(engine);
> +	engine->irq_posted = false;
>   }
>
>   static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
> @@ -56,7 +62,6 @@ static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
>   	struct intel_engine_cs *engine =
>   		container_of(b, struct intel_engine_cs, breadcrumbs);
>   	struct drm_i915_private *i915 = engine->i915;
> -	bool irq_posted = false;
>
>   	assert_spin_locked(&b->lock);
>   	if (b->rpm_wakelock)
> @@ -72,10 +77,8 @@ static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
>
>   	/* No interrupts? Kick the waiter every jiffie! */
>   	if (intel_irqs_enabled(i915)) {
> -		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings)) {
> +		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
>   			irq_enable(engine);
> -			irq_posted = true;
> -		}
>   		b->irq_enabled = true;
>   	}
>
> @@ -83,7 +86,7 @@ static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
>   	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings))
>   		mod_timer(&b->fake_irq, jiffies + 1);
>
> -	return irq_posted;
> +	return READ_ONCE(engine->irq_posted);
>   }
>
>   static void __intel_breadcrumbs_disable_irq(struct intel_breadcrumbs *b)
> @@ -197,7 +200,8 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
>   			 * in case the seqno passed.
>   			 */
>   			__intel_breadcrumbs_enable_irq(b);
> -			wake_up_process(to_wait(next)->task);
> +			if (READ_ONCE(engine->irq_posted))

if (__intel_breadcrumbs_enable_irq(b)) ?
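
I.e. use the return value directly:

	if (__intel_breadcrumbs_enable_irq(b))
		wake_up_process(to_wait(next)->task);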

> +				wake_up_process(to_wait(next)->task);
>   		}
>
>   		do {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index cb599a54931a..324f85e8d540 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -197,6 +197,7 @@ struct intel_engine_cs {
>   	struct i915_ctx_workarounds wa_ctx;
>
>   	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
> +	bool		irq_posted;
>   	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
>   	struct drm_i915_gem_request *trace_irq_req;
>   	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
  2016-06-03 16:08 ` [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
@ 2016-06-07 12:04   ` Tvrtko Ursulin
  2016-06-08  9:48     ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-07 12:04 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> If we convert the tracing over from direct use of ring->irq_get() and
> over to the breadcrumb infrastructure, we only have a single user of the
> ring->irq_get and so we will be able to simplify the driver routines
> (eliminating the redundant validation and irq refcounting).

There is a bit more to this than just using the breadcrumbs 
infrastructure, so it needs more text - a little bit of design 
documentation in the commit message.

> v2: Move to a signaling framework based upon the waiter.
> v3: Track the first-signal to avoid having to walk the rbtree everytime.
> v4: Mark the signaler thread as RT priority to reduce latency in the
> indirect wakeups.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h          |   8 --
>   drivers/gpu/drm/i915/i915_gem.c          |   9 +-
>   drivers/gpu/drm/i915/i915_trace.h        |   2 +-
>   drivers/gpu/drm/i915/intel_breadcrumbs.c | 178 +++++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_ringbuffer.h  |   8 +-
>   5 files changed, 188 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a71d08199d57..b0235372cfdf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3906,14 +3906,6 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
>   			    schedule_timeout_uninterruptible(remaining_jiffies);
>   	}
>   }
> -
> -static inline void i915_trace_irq_get(struct intel_engine_cs *engine,
> -				      struct drm_i915_gem_request *req)
> -{
> -	if (engine->trace_irq_req == NULL && engine->irq_get(engine))
> -		i915_gem_request_assign(&engine->trace_irq_req, req);
> -}
> -
>   static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>   {
>   	struct intel_engine_cs *engine = req->engine;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index fdbad07b5f42..f4e550ddaa5d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2500,7 +2500,8 @@ i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
>
>   	/* If the seqno wraps around, we need to clear the breadcrumb rbtree */
>   	if (!i915_seqno_passed(seqno, dev_priv->next_seqno)) {
> -		while (intel_kick_waiters(dev_priv))
> +		while (intel_kick_waiters(dev_priv) ||
> +		       intel_kick_signalers(dev_priv))
>   			yield();
>   	}
>
> @@ -2964,12 +2965,6 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
>   		i915_gem_object_retire__read(obj, engine->id);
>   	}
>
> -	if (unlikely(engine->trace_irq_req &&
> -		     i915_gem_request_completed(engine->trace_irq_req))) {
> -		engine->irq_put(engine);
> -		i915_gem_request_assign(&engine->trace_irq_req, NULL);
> -	}
> -
>   	WARN_ON(i915_verify_lists(engine->dev));
>   }
>
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 3d13fde95fdf..f59cf07184ae 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -490,7 +490,7 @@ TRACE_EVENT(i915_gem_ring_dispatch,
>   			   __entry->ring = req->engine->id;
>   			   __entry->seqno = req->seqno;
>   			   __entry->flags = flags;
> -			   i915_trace_irq_get(req->engine, req);
> +			   intel_engine_enable_signaling(req);
>   			   ),
>
>   	    TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 0f5fe114c204..143891a2b68a 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -22,6 +22,8 @@
>    *
>    */
>
> +#include <linux/kthread.h>
> +
>   #include "i915_drv.h"
>
>   static void intel_breadcrumbs_fake_irq(unsigned long data)
> @@ -321,6 +323,155 @@ out_unlock:
>   	spin_unlock(&b->lock);
>   }
>
> +struct signal {
> +	struct rb_node node;
> +	struct intel_wait wait;
> +	struct drm_i915_gem_request *request;
> +};
> +
> +static bool signal_complete(struct signal *signal)
> +{
> +	if (signal == NULL)
> +		return false;
> +
> +	/* If another process served as the bottom-half it may have already
> +	 * signalled that this wait is already completed.
> +	 */
> +	if (intel_wait_complete(&signal->wait))
> +		return true;
> +
> +	/* Carefully check if the request is complete, giving time for the
> +	 * seqno to be visible or if the GPU hung.
> +	 */
> +	if (__i915_request_irq_complete(signal->request))
> +		return true;
> +
> +	return false;
> +}
> +
> +static struct signal *to_signal(struct rb_node *rb)
> +{
> +	return container_of(rb, struct signal, node);
> +}
> +
> +static void signaler_set_rtpriority(void)
> +{
> +	 struct sched_param param = { .sched_priority = 1 };
> +	 sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
> +}
> +
> +static int intel_breadcrumbs_signaler(void *arg)
> +{
> +	struct intel_engine_cs *engine = arg;
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct signal *signal;
> +
> +	/* Install ourselves with high priority to reduce signalling latency */
> +	signaler_set_rtpriority();
> +
> +	do {
> +		set_current_state(TASK_INTERRUPTIBLE);
> +
> +		/* We are either woken up by the interrupt bottom-half,
> +		 * or by a client adding a new signaller. In both cases,
> +		 * the GPU seqno may have advanced beyond our oldest signal.
> +		 * If it has, propagate the signal, remove the waiter and
> +		 * check again with the next oldest signal. Otherwise we
> +		 * need to wait for a new interrupt from the GPU or for
> +		 * a new client.
> +		 */
> +		signal = READ_ONCE(b->first_signal);
> +		if (signal_complete(signal)) {
> +			/* Wake up all other completed waiters and select the
> +			 * next bottom-half for the next user interrupt.
> +			 */
> +			intel_engine_remove_wait(engine, &signal->wait);
> +
> +			i915_gem_request_unreference(signal->request);
> +
> +			/* Find the next oldest signal. Note that as we have
> +			 * not been holding the lock, another client may
> +			 * have installed an even older signal than the one
> +			 * we just completed - so double check we are still
> +			 * the oldest before picking the next one.
> +			 */
> +			spin_lock(&b->lock);
> +			if (signal == b->first_signal)
> +				b->first_signal = rb_next(&signal->node);
> +			rb_erase(&signal->node, &b->signals);
> +			spin_unlock(&b->lock);
> +
> +			kfree(signal);
> +		} else {
> +			if (kthread_should_stop())
> +				break;
> +
> +			schedule();
> +		}
> +	} while (1);
> +
> +	return 0;
> +}

So the thread exists only because it is convenient to plug it into the 
breadcrumbs infrastructure. Otherwise the processing above could be done 
from a lighter-weight context as well, since nothing here seems to need 
process context.

One alternative could perhaps be to add a waiter->wake_up vfunc and 
signalers could then potentially use a tasklet?
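
Rough, untested sketch of what I mean below; the wake_up member on
intel_wait, the tasklet in struct signal and the helper names are all
made up for illustration, none of this exists in the patch:

/* Wake the bottom-half however it asked to be woken; a NULL hook
 * keeps the current behaviour of waking the waiter task directly.
 */
static inline void intel_wait_wake(struct intel_wait *wait)
{
	if (wait->wake_up)
		wait->wake_up(wait);
	else
		wake_up_process(wait->task);
}

/* A signaler could then run from softirq context instead of a kthread. */
static void signal_wake_tasklet(struct intel_wait *wait)
{
	struct signal *signal = container_of(wait, struct signal, wait);

	tasklet_schedule(&signal->tasklet);
}

The interrupt bottom-half would then call intel_wait_wake() wherever it
currently does wake_up_process().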

> +
> +int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
> +{
> +	struct intel_engine_cs *engine = request->engine;
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct rb_node *parent, **p;
> +	struct signal *signal;
> +	bool first, wakeup;
> +
> +	if (unlikely(IS_ERR(b->signaler)))
> +		return PTR_ERR(b->signaler);

I don't see that there is a fallback if kthread creation fails. It 
should just fail in intel_engine_init_breadcrumbs if that happens.
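
Untested sketch of the init path I have in mind; it assumes the callers
of intel_engine_init_breadcrumbs are updated to handle an error return,
and the existing lock/rbtree setup is abbreviated:

int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
{
	struct intel_breadcrumbs *b = &engine->breadcrumbs;
	struct task_struct *tsk;

	setup_timer(&b->fake_irq,
		    intel_breadcrumbs_fake_irq,
		    (unsigned long)engine);

	/* No fallback: if we cannot spawn the signaler, fail engine init. */
	tsk = kthread_run(intel_breadcrumbs_signaler,
			  engine, "irq/i915:%d", engine->id);
	if (IS_ERR(tsk))
		return PTR_ERR(tsk);

	b->signaler = tsk;
	return 0;
}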

> +
> +	signal = kmalloc(sizeof(*signal), GFP_ATOMIC);

Ugh GFP_ATOMIC - why?

And should you perhaps instead just embed it into requests?

> +	if (unlikely(!signal))
> +		return -ENOMEM;
> +
> +	signal->wait.task = b->signaler;
> +	signal->wait.seqno = request->seqno;
> +
> +	signal->request = i915_gem_request_reference(request);
> +
> +	/* First add ourselves into the list of waiters, but register our
> +	 * bottom-half as the signaller thread. As per usual, only the oldest
> +	 * waiter (not just signaller) is tasked as the bottom-half waking
> +	 * up all completed waiters after the user interrupt.
> +	 *
> +	 * If we are the oldest waiter, enable the irq (after which we
> +	 * must double check that the seqno did not complete).
> +	 */
> +	wakeup = intel_engine_add_wait(engine, &signal->wait);
> +
> +	/* Now insert ourselves into the retirement ordered list of signals
> +	 * on this engine. We track the oldest seqno as that will be the
> +	 * first signal to complete.
> +	 */
> +	spin_lock(&b->lock);
> +	parent = NULL;
> +	first = true;
> +	p = &b->signals.rb_node;
> +	while (*p) {
> +		parent = *p;
> +		if (i915_seqno_passed(signal->wait.seqno,
> +				      to_signal(parent)->wait.seqno)) {
> +			p = &parent->rb_right;
> +			first = false;
> +		} else
> +			p = &parent->rb_left;
> +	}
> +	rb_link_node(&signal->node, parent, p);
> +	rb_insert_color(&signal->node, &b->signals);
> +	if (first)
> +		smp_store_mb(b->first_signal, signal);
> +	spin_unlock(&b->lock);
> +
> +	if (wakeup)
> +		wake_up_process(b->signaler);
> +
> +	return 0;
> +}
> +
>   void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> @@ -329,12 +480,24 @@ void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
>   	setup_timer(&b->fake_irq,
>   		    intel_breadcrumbs_fake_irq,
>   		    (unsigned long)engine);
> +
> +	/* Spawn a thread to provide a common bottom-half for all signals.
> +	 * As this is an asynchronous interface we cannot steal the current
> +	 * task for handling the bottom-half to the user interrupt, therefore
> +	 * we create a thread to do the coherent seqno dance after the
> +	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
> +	 */
> +	b->signaler = kthread_run(intel_breadcrumbs_signaler,
> +				  engine, "irq/i915:%d", engine->id);

As commented above, init should fail here because it cannot run without 
the thread.

>   }
>
>   void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>
> +	if (!IS_ERR_OR_NULL(b->signaler))
> +		kthread_stop(b->signaler);
> +
>   	del_timer_sync(&b->fake_irq);
>   }
>
> @@ -356,3 +519,18 @@ unsigned intel_kick_waiters(struct drm_i915_private *i915)
>
>   	return mask;
>   }
> +
> +unsigned intel_kick_signalers(struct drm_i915_private *i915)
> +{
> +	struct intel_engine_cs *engine;
> +	unsigned mask = 0;
> +
> +	for_each_engine(engine, i915) {
> +		if (unlikely(READ_ONCE(engine->breadcrumbs.first_signal))) {
> +			wake_up_process(engine->breadcrumbs.signaler);
> +			mask |= intel_engine_flag(engine);
> +		}
> +	}
> +
> +	return mask;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 324f85e8d540..f4bca38caef0 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -141,6 +141,8 @@ struct  i915_ctx_workarounds {
>   	struct drm_i915_gem_object *obj;
>   };
>
> +struct drm_i915_gem_request;
> +
>   struct intel_engine_cs {
>   	struct drm_i915_private *i915;
>   	const char	*name;
> @@ -179,8 +181,11 @@ struct intel_engine_cs {
>   	struct intel_breadcrumbs {
>   		spinlock_t lock; /* protects the lists of requests */
>   		struct rb_root waiters; /* sorted by retirement, priority */
> +		struct rb_root signals; /* sorted by retirement */
>   		struct intel_wait *first_wait; /* oldest waiter by retirement */
>   		struct task_struct *tasklet; /* bh for user interrupts */
> +		struct task_struct *signaler; /* used for fence signalling */
> +		void *first_signal;
>   		struct timer_list fake_irq; /* used after a missed interrupt */
>   		bool irq_enabled;
>   		bool rpm_wakelock;
> @@ -199,7 +204,6 @@ struct intel_engine_cs {
>   	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
>   	bool		irq_posted;
>   	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
> -	struct drm_i915_gem_request *trace_irq_req;
>   	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
>   	void		(*irq_put)(struct intel_engine_cs *ring);
>
> @@ -540,6 +544,7 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
>   			   struct intel_wait *wait);
>   void intel_engine_remove_wait(struct intel_engine_cs *engine,
>   			      struct intel_wait *wait);
> +int intel_engine_enable_signaling(struct drm_i915_gem_request *request);
>   static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
>   {
>   	return READ_ONCE(engine->breadcrumbs.tasklet);
> @@ -559,5 +564,6 @@ static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
>   void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
>   void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
>   unsigned intel_kick_waiters(struct drm_i915_private *i915);
> +unsigned intel_kick_signalers(struct drm_i915_private *i915);
>
>   #endif /* _INTEL_RINGBUFFER_H_ */
>

Regards,

Tvrtko


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 05/21] drm/i915: Separate GPU hang waitqueue from advance
  2016-06-06 13:00   ` Tvrtko Ursulin
@ 2016-06-07 12:11     ` Arun Siluvery
  0 siblings, 0 replies; 60+ messages in thread
From: Arun Siluvery @ 2016-06-07 12:11 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson, intel-gfx

On 06/06/2016 18:30, Tvrtko Ursulin wrote:
>
> On 03/06/16 17:08, Chris Wilson wrote:
>> Currently __i915_wait_request uses a per-engine wait_queue_t for the dual
>> purpose of waking after the GPU advances or for waking after an error.
>> In the future, we may add even more wake sources and require greater
>> separation, but for now we can conceptually simplify wakeups by
>> separating
>> the two sources. In particular, this allows us to use different
>> wait-queues
>> (e.g. one on the engine advancement, a global one for errors and one on
>> each request) without any hassle.
>
> + Arun
>
> I think this will conflict with the TDR work where one of the features
> is to make reset handling per engine. So I am not sure how beneficial in
> general, or painful for the TDR series, this patch might be.

Thanks Tvrtko.
Chris has given some comments on a related TDR patch based on these 
changes. I am looking into how to update my changes based on this.

The TDR code needs access to struct_mutex, so if a waiter is holding it 
then we should be able to ask it to try again so that we can proceed 
with recovery; similarly when an engine reset is in progress.

regards
Arun

>
> Regards,
>
> Tvrtko
>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h |  6 ++++++
>>   drivers/gpu/drm/i915/i915_gem.c |  5 +++++
>>   drivers/gpu/drm/i915/i915_irq.c | 19 ++++---------------
>>   3 files changed, 15 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index ceccc6d6b119..e399e97965e0 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1401,6 +1401,12 @@ struct i915_gpu_error {
>>   #define I915_WEDGED            (1 << 31)
>>
>>       /**
>> +     * Waitqueue to signal when a hang is detected. Used for waiters
>> +     * to release the struct_mutex for the reset to proceed.
>> +     */
>> +    wait_queue_head_t wait_queue;
>> +
>> +    /**
>>        * Waitqueue to signal when the reset has completed. Used by
>> clients
>>        * that wait for dev_priv->mm.wedged to settle.
>>        */
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 03256f096ab6..de4fb39312a4 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1234,6 +1234,7 @@ int __i915_wait_request(struct
>> drm_i915_gem_request *req,
>>       const bool irq_test_in_progress =
>>           ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
>> intel_engine_flag(engine);
>>       int state = interruptible ? TASK_INTERRUPTIBLE :
>> TASK_UNINTERRUPTIBLE;
>> +    DEFINE_WAIT(reset);
>>       DEFINE_WAIT(wait);
>>       unsigned long timeout_expire;
>>       s64 before = 0; /* Only to silence a compiler warning. */
>> @@ -1278,6 +1279,7 @@ int __i915_wait_request(struct
>> drm_i915_gem_request *req,
>>           goto out;
>>       }
>>
>> +    add_wait_queue(&dev_priv->gpu_error.wait_queue, &reset);
>>       for (;;) {
>>           struct timer_list timer;
>>
>> @@ -1329,6 +1331,8 @@ int __i915_wait_request(struct
>> drm_i915_gem_request *req,
>>               destroy_timer_on_stack(&timer);
>>           }
>>       }
>> +    remove_wait_queue(&dev_priv->gpu_error.wait_queue, &reset);
>> +
>>       if (!irq_test_in_progress)
>>           engine->irq_put(engine);
>>
>> @@ -5026,6 +5030,7 @@ i915_gem_load_init(struct drm_device *dev)
>>                 i915_gem_retire_work_handler);
>>       INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
>>                 i915_gem_idle_work_handler);
>> +    init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
>>       init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
>>
>>       dev_priv->relative_constants_mode =
>> I915_EXEC_CONSTANTS_REL_GENERAL;
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c
>> b/drivers/gpu/drm/i915/i915_irq.c
>> index 83cab14639b2..30127b94f26e 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -2488,11 +2488,8 @@ static irqreturn_t gen8_irq_handler(int irq,
>> void *arg)
>>       return ret;
>>   }
>>
>> -static void i915_error_wake_up(struct drm_i915_private *dev_priv,
>> -                   bool reset_completed)
>> +static void i915_error_wake_up(struct drm_i915_private *dev_priv)
>>   {
>> -    struct intel_engine_cs *engine;
>> -
>>       /*
>>        * Notify all waiters for GPU completion events that reset state
>> has
>>        * been changed, and that they need to restart their wait after
>> @@ -2501,18 +2498,10 @@ static void i915_error_wake_up(struct
>> drm_i915_private *dev_priv,
>>        */
>>
>>       /* Wake up __wait_seqno, potentially holding dev->struct_mutex. */
>> -    for_each_engine(engine, dev_priv)
>> -        wake_up_all(&engine->irq_queue);
>> +    wake_up_all(&dev_priv->gpu_error.wait_queue);
>>
>>       /* Wake up intel_crtc_wait_for_pending_flips, holding
>> crtc->mutex. */
>>       wake_up_all(&dev_priv->pending_flip_queue);
>> -
>> -    /*
>> -     * Signal tasks blocked in i915_gem_wait_for_error that the pending
>> -     * reset state is cleared.
>> -     */
>> -    if (reset_completed)
>> -        wake_up_all(&dev_priv->gpu_error.reset_queue);
>>   }
>>
>>   /**
>> @@ -2577,7 +2566,7 @@ static void i915_reset_and_wakeup(struct
>> drm_i915_private *dev_priv)
>>            * Note: The wake_up also serves as a memory barrier so that
>>            * waiters see the update value of the reset counter atomic_t.
>>            */
>> -        i915_error_wake_up(dev_priv, true);
>> +        wake_up_all(&dev_priv->gpu_error.reset_queue);
>>       }
>>   }
>>
>> @@ -2713,7 +2702,7 @@ void i915_handle_error(struct drm_i915_private
>> *dev_priv,
>>            * ensure that the waiters see the updated value of the reset
>>            * counter atomic_t.
>>            */
>> -        i915_error_wake_up(dev_priv, false);
>> +        i915_error_wake_up(dev_priv);
>>       }
>>
>>       i915_reset_and_wakeup(dev_priv);
>>
>


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/21] drm/i915: Embed signaling node into the GEM request
  2016-06-03 16:08 ` [PATCH 18/21] drm/i915: Embed signaling node into the GEM request Chris Wilson
@ 2016-06-07 12:31   ` Tvrtko Ursulin
  2016-06-08  9:54     ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-07 12:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> Under the assumption that enabling signaling will be a frequent
> operation, let's preallocate our attachments for signaling inside the
> request struct (and so benefit from the slab cache).

Oh you did this part which I suggested in the previous patch. :)

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h          |  1 +
>   drivers/gpu/drm/i915/intel_breadcrumbs.c | 89 ++++++++++++++++++--------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h  |  6 +++
>   3 files changed, 56 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index b0235372cfdf..88d9242398ce 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2363,6 +2363,7 @@ struct drm_i915_gem_request {
>   	struct drm_i915_private *i915;
>   	struct intel_engine_cs *engine;
>   	unsigned reset_counter;
> +	struct intel_signal_node signaling;
>
>   	 /** GEM sequence number associated with the previous request,
>   	  * when the HWS breadcrumb is equal to this the GPU is processing
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 143891a2b68a..8ab508ed4248 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -128,16 +128,14 @@ static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
>   	wake_up_process(wait->task); /* implicit smp_wmb() */
>   }
>
> -bool intel_engine_add_wait(struct intel_engine_cs *engine,
> -			   struct intel_wait *wait)
> +static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
> +				    struct intel_wait *wait)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>   	struct rb_node **p, *parent, *completed;
>   	bool first;
>   	u32 seqno;
>
> -	spin_lock(&b->lock);
> -
>   	/* Insert the request into the retirement ordered list
>   	 * of waiters by walking the rbtree. If we are the oldest
>   	 * seqno in the tree (the first to be retired), then
> @@ -223,6 +221,17 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
>   	GEM_BUG_ON(!b->first_wait);
>   	GEM_BUG_ON(rb_first(&b->waiters) != &b->first_wait->node);
>
> +	return first;
> +}
> +
> +bool intel_engine_add_wait(struct intel_engine_cs *engine,
> +			   struct intel_wait *wait)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	bool first;
> +
> +	spin_lock(&b->lock);
> +	first = __intel_engine_add_wait(engine, wait);
>   	spin_unlock(&b->lock);
>
>   	return first;
> @@ -323,35 +332,29 @@ out_unlock:
>   	spin_unlock(&b->lock);
>   }
>
> -struct signal {
> -	struct rb_node node;
> -	struct intel_wait wait;
> -	struct drm_i915_gem_request *request;
> -};
> -
> -static bool signal_complete(struct signal *signal)
> +static bool signal_complete(struct drm_i915_gem_request *request)
>   {
> -	if (signal == NULL)
> +	if (request == NULL)
>   		return false;
>
>   	/* If another process served as the bottom-half it may have already
>   	 * signalled that this wait is already completed.
>   	 */
> -	if (intel_wait_complete(&signal->wait))
> +	if (intel_wait_complete(&request->signaling.wait))
>   		return true;
>
>   	/* Carefully check if the request is complete, giving time for the
>   	 * seqno to be visible or if the GPU hung.
>   	 */
> -	if (__i915_request_irq_complete(signal->request))
> +	if (__i915_request_irq_complete(request))
>   		return true;
>
>   	return false;
>   }
>
> -static struct signal *to_signal(struct rb_node *rb)
> +static struct drm_i915_gem_request *to_signal(struct rb_node *rb)

Why is it called to_signal then?

>   {
> -	return container_of(rb, struct signal, node);
> +	return container_of(rb, struct drm_i915_gem_request, signaling.node);
>   }
>
>   static void signaler_set_rtpriority(void)
> @@ -364,7 +367,7 @@ static int intel_breadcrumbs_signaler(void *arg)
>   {
>   	struct intel_engine_cs *engine = arg;
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct signal *signal;
> +	struct drm_i915_gem_request *request;
>
>   	/* Install ourselves with high priority to reduce signalling latency */
>   	signaler_set_rtpriority();
> @@ -380,14 +383,13 @@ static int intel_breadcrumbs_signaler(void *arg)
>   		 * need to wait for a new interrupt from the GPU or for
>   		 * a new client.
>   		 */
> -		signal = READ_ONCE(b->first_signal);
> -		if (signal_complete(signal)) {
> +		request = READ_ONCE(b->first_signal);
> +		if (signal_complete(request)) {
>   			/* Wake up all other completed waiters and select the
>   			 * next bottom-half for the next user interrupt.
>   			 */
> -			intel_engine_remove_wait(engine, &signal->wait);
> -
> -			i915_gem_request_unreference(signal->request);
> +			intel_engine_remove_wait(engine,
> +						 &request->signaling.wait);
>
>   			/* Find the next oldest signal. Note that as we have
>   			 * not been holding the lock, another client may
> @@ -396,12 +398,15 @@ static int intel_breadcrumbs_signaler(void *arg)
>   			 * the oldest before picking the next one.
>   			 */
>   			spin_lock(&b->lock);
> -			if (signal == b->first_signal)
> -				b->first_signal = rb_next(&signal->node);
> -			rb_erase(&signal->node, &b->signals);
> +			if (request == b->first_signal) {
> +				struct rb_node *rb =
> +					rb_next(&request->signaling.node);
> +				b->first_signal = rb ? to_signal(rb) : NULL;

Made me look at the previous patch to see why you didn't need to change 
the type of first_signal in this one. void*! :) Please fix it there. :)
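
e.g. with this patch the declaration in struct intel_breadcrumbs could
simply be the below (illustrative only; drm_i915_gem_request is already
forward declared in this header):

	struct task_struct *signaler; /* used for fence signalling */
	struct drm_i915_gem_request *first_signal; /* oldest pending signal */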

> +			}
> +			rb_erase(&request->signaling.node, &b->signals);
>   			spin_unlock(&b->lock);
>
> -			kfree(signal);
> +			i915_gem_request_unreference(request);
>   		} else {
>   			if (kthread_should_stop())
>   				break;
> @@ -418,20 +423,23 @@ int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
>   	struct intel_engine_cs *engine = request->engine;
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>   	struct rb_node *parent, **p;
> -	struct signal *signal;
>   	bool first, wakeup;
>
>   	if (unlikely(IS_ERR(b->signaler)))
>   		return PTR_ERR(b->signaler);
>
> -	signal = kmalloc(sizeof(*signal), GFP_ATOMIC);
> -	if (unlikely(!signal))
> -		return -ENOMEM;
> +	if (unlikely(READ_ONCE(request->signaling.wait.task)))
> +		return 0;

Hmm, whether this is safe will depend on the following patches. I don't 
like the explosion of READ_ONCE and smp_store_mb calls in these patches; 
something is bound to be broken.

You even check it below under the lock. So I am not sure this 
optimisation is worth it. Maybe leave it for later?
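
i.e. dropping the lockless check would just mean always taking the lock
for the already-enabled case, something along these lines (illustration
only, not the actual code):

	spin_lock(&b->lock);
	if (request->signaling.wait.task) {
		/* signaling already enabled by another path */
		spin_unlock(&b->lock);
		return 0;
	}
	/* ... set up the wait and rbtree under the lock as before ... */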

>
> -	signal->wait.task = b->signaler;
> -	signal->wait.seqno = request->seqno;
> +	spin_lock(&b->lock);
> +	if (unlikely(request->signaling.wait.task)) {
> +		wakeup = false;
> +		goto unlock;
> +	}
>
> -	signal->request = i915_gem_request_reference(request);
> +	request->signaling.wait.task = b->signaler;
> +	request->signaling.wait.seqno = request->seqno;
> +	i915_gem_request_reference(request);
>
>   	/* First add ourselves into the list of waiters, but register our
>   	 * bottom-half as the signaller thread. As per usual, only the oldest
> @@ -441,29 +449,30 @@ int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
>   	 * If we are the oldest waiter, enable the irq (after which we
>   	 * must double check that the seqno did not complete).
>   	 */
> -	wakeup = intel_engine_add_wait(engine, &signal->wait);
> +	wakeup = __intel_engine_add_wait(engine, &request->signaling.wait);
>
>   	/* Now insert ourselves into the retirement ordered list of signals
>   	 * on this engine. We track the oldest seqno as that will be the
>   	 * first signal to complete.
>   	 */
> -	spin_lock(&b->lock);
>   	parent = NULL;
>   	first = true;
>   	p = &b->signals.rb_node;
>   	while (*p) {
>   		parent = *p;
> -		if (i915_seqno_passed(signal->wait.seqno,
> -				      to_signal(parent)->wait.seqno)) {
> +		if (i915_seqno_passed(request->seqno,
> +				      to_signal(parent)->seqno)) {
>   			p = &parent->rb_right;
>   			first = false;
>   		} else
>   			p = &parent->rb_left;
>   	}
> -	rb_link_node(&signal->node, parent, p);
> -	rb_insert_color(&signal->node, &b->signals);
> +	rb_link_node(&request->signaling.node, parent, p);
> +	rb_insert_color(&request->signaling.node, &b->signals);
>   	if (first)
> -		smp_store_mb(b->first_signal, signal);
> +		smp_store_mb(b->first_signal, request);
> +
> +unlock:
>   	spin_unlock(&b->lock);
>
>   	if (wakeup)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index f4bca38caef0..5f7cb3d0ea1c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -530,6 +530,12 @@ struct intel_wait {
>   	struct task_struct *task;
>   	u32 seqno;
>   };
> +
> +struct intel_signal_node {
> +	struct rb_node node;
> +	struct intel_wait wait;
> +};
> +
>   void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
>   static inline void intel_wait_init(struct intel_wait *wait, u32 seqno)
>   {
>

Otherwise looks OK.

Regards,

Tvrtko


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller
  2016-06-03 16:08 ` [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller Chris Wilson
@ 2016-06-07 12:46   ` Tvrtko Ursulin
  2016-06-08 10:01     ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-07 12:46 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> With only a single callsite for intel_engine_cs->irq_get and ->irq_put,
> we can reduce the code size by moving the common preamble into the
> caller, and we can also eliminate the reference counting.
>
> For completeness, as we are no longer doing reference counting on irq,
> rename the get/put vfunctions to enable/disable respectively.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_irq.c          |   8 +-
>   drivers/gpu/drm/i915/intel_breadcrumbs.c |  10 +-
>   drivers/gpu/drm/i915/intel_lrc.c         |  34 +---
>   drivers/gpu/drm/i915/intel_ringbuffer.c  | 269 ++++++++++---------------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h  |   5 +-
>   5 files changed, 108 insertions(+), 218 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 14b3d65bb604..5bdb433dde8c 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -259,12 +259,12 @@ static void ilk_update_gt_irq(struct drm_i915_private *dev_priv,
>   	dev_priv->gt_irq_mask &= ~interrupt_mask;
>   	dev_priv->gt_irq_mask |= (~enabled_irq_mask & interrupt_mask);
>   	I915_WRITE(GTIMR, dev_priv->gt_irq_mask);
> -	POSTING_READ(GTIMR);
>   }
>
>   void gen5_enable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
>   {
>   	ilk_update_gt_irq(dev_priv, mask, mask);
> +	POSTING_READ_FW(GTIMR);
>   }

Unrelated hunks?

How is POSTING_READ_FW correct?

Also removes the posting read from disable, OK?

>
>   void gen5_disable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
> @@ -2818,9 +2818,9 @@ ring_idle(struct intel_engine_cs *engine, u32 seqno)
>   }
>
>   static bool
> -ipehr_is_semaphore_wait(struct drm_i915_private *dev_priv, u32 ipehr)
> +ipehr_is_semaphore_wait(struct intel_engine_cs *engine, u32 ipehr)
>   {
> -	if (INTEL_GEN(dev_priv) >= 8) {
> +	if (INTEL_GEN(engine->i915) >= 8) {
>   		return (ipehr >> 23) == 0x1c;
>   	} else {
>   		ipehr &= ~MI_SEMAPHORE_SYNC_MASK;
> @@ -2891,7 +2891,7 @@ semaphore_waits_for(struct intel_engine_cs *engine, u32 *seqno)
>   		return NULL;
>
>   	ipehr = I915_READ(RING_IPEHR(engine->mmio_base));
> -	if (!ipehr_is_semaphore_wait(engine->i915, ipehr))
> +	if (!ipehr_is_semaphore_wait(engine, ipehr))
>   		return NULL;

Two hunks of meh as some would say. :)

>
>   	/*
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 8ab508ed4248..dc65a007fa20 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -50,12 +50,18 @@ static void irq_enable(struct intel_engine_cs *engine)
>   	 * just in case.
>   	 */
>   	engine->irq_posted = true;
> -	WARN_ON(!engine->irq_get(engine));
> +
> +	spin_lock_irq(&engine->i915->irq_lock);
> +	engine->irq_enable(engine);
> +	spin_unlock_irq(&engine->i915->irq_lock);
>   }
>
>   static void irq_disable(struct intel_engine_cs *engine)
>   {
> -	engine->irq_put(engine);
> +	spin_lock_irq(&engine->i915->irq_lock);
> +	engine->irq_disable(engine);
> +	spin_unlock_irq(&engine->i915->irq_lock);
> +
>   	engine->irq_posted = false;
>   }
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 32b5eae7dd11..9e19b2c5b3ae 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1578,36 +1578,18 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
>   	return 0;
>   }
>
> -static bool gen8_logical_ring_get_irq(struct intel_engine_cs *engine)
> +static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
> -
> -	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
> -		return false;
> -
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (engine->irq_refcount++ == 0) {
> -		I915_WRITE_IMR(engine,
> -			       ~(engine->irq_enable_mask | engine->irq_keep_mask));
> -		POSTING_READ(RING_IMR(engine->mmio_base));
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> -
> -	return true;
> +	I915_WRITE_IMR(engine,
> +		       ~(engine->irq_enable_mask | engine->irq_keep_mask));
> +	POSTING_READ_FW(RING_IMR(engine->mmio_base));

Hm, more of _FW following normal access. What am I missing? You are not 
by any chance banking on the auto-release window?

>   }
>
> -static void gen8_logical_ring_put_irq(struct intel_engine_cs *engine)
> +static void gen8_logical_ring_disable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--engine->irq_refcount == 0) {
> -		I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
> -		POSTING_READ(RING_IMR(engine->mmio_base));
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);

Another posting read gone here?

>   }
>
>   static int gen8_emit_flush(struct drm_i915_gem_request *request,
> @@ -1895,8 +1877,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->init_hw = gen8_init_common_ring;
>   	engine->emit_request = gen8_emit_request;
>   	engine->emit_flush = gen8_emit_flush;
> -	engine->irq_get = gen8_logical_ring_get_irq;
> -	engine->irq_put = gen8_logical_ring_put_irq;
> +	engine->irq_enable = gen8_logical_ring_enable_irq;
> +	engine->irq_disable = gen8_logical_ring_disable_irq;
>   	engine->emit_bb_start = gen8_emit_bb_start;
>   	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1))
>   		engine->irq_seqno_barrier = bxt_a_seqno_barrier;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 30e400d77d23..ba84b469f13f 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1551,103 +1551,54 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
>   	spin_unlock_irq(&dev_priv->uncore.lock);
>   }
>
> -static bool
> -gen5_ring_get_irq(struct intel_engine_cs *engine)
> +static void
> +gen5_ring_enable_irq(struct intel_engine_cs *engine)
>   {
> -	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
> -
> -	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
> -		return false;
> -
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (engine->irq_refcount++ == 0)
> -		gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> -
> -	return true;
> +	gen5_enable_gt_irq(engine->i915, engine->irq_enable_mask);
>   }
>
>   static void
> -gen5_ring_put_irq(struct intel_engine_cs *engine)
> +gen5_ring_disable_irq(struct intel_engine_cs *engine)
>   {
> -	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--engine->irq_refcount == 0)
> -		gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +	gen5_disable_gt_irq(engine->i915, engine->irq_enable_mask);
>   }
>
> -static bool
> -i9xx_ring_get_irq(struct intel_engine_cs *engine)
> +static void
> +i9xx_ring_enable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
> -
> -	if (!intel_irqs_enabled(dev_priv))
> -		return false;
> -
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (engine->irq_refcount++ == 0) {
> -		dev_priv->irq_mask &= ~engine->irq_enable_mask;
> -		I915_WRITE(IMR, dev_priv->irq_mask);
> -		POSTING_READ(IMR);
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
>
> -	return true;
> +	dev_priv->irq_mask &= ~engine->irq_enable_mask;
> +	I915_WRITE(IMR, dev_priv->irq_mask);
> +	POSTING_READ_FW(RING_IMR(engine->mmio_base));
>   }
>
>   static void
> -i9xx_ring_put_irq(struct intel_engine_cs *engine)
> +i9xx_ring_disable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
>
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--engine->irq_refcount == 0) {
> -		dev_priv->irq_mask |= engine->irq_enable_mask;
> -		I915_WRITE(IMR, dev_priv->irq_mask);
> -		POSTING_READ(IMR);
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +	dev_priv->irq_mask |= engine->irq_enable_mask;
> +	I915_WRITE(IMR, dev_priv->irq_mask);
>   }
>
> -static bool
> -i8xx_ring_get_irq(struct intel_engine_cs *engine)
> +static void
> +i8xx_ring_enable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
>
> -	if (!intel_irqs_enabled(dev_priv))
> -		return false;
> -
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (engine->irq_refcount++ == 0) {
> -		dev_priv->irq_mask &= ~engine->irq_enable_mask;
> -		I915_WRITE16(IMR, dev_priv->irq_mask);
> -		POSTING_READ16(IMR);
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> -
> -	return true;
> +	dev_priv->irq_mask &= ~engine->irq_enable_mask;
> +	I915_WRITE16(IMR, dev_priv->irq_mask);
> +	POSTING_READ16(RING_IMR(engine->mmio_base));
>   }
>
>   static void
> -i8xx_ring_put_irq(struct intel_engine_cs *engine)
> +i8xx_ring_disable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
>
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--engine->irq_refcount == 0) {
> -		dev_priv->irq_mask |= engine->irq_enable_mask;
> -		I915_WRITE16(IMR, dev_priv->irq_mask);
> -		POSTING_READ16(IMR);
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +	dev_priv->irq_mask |= engine->irq_enable_mask;
> +	I915_WRITE16(IMR, dev_priv->irq_mask);
>   }
>
>   static int
> @@ -1688,122 +1639,74 @@ i9xx_add_request(struct drm_i915_gem_request *req)
>   	return 0;
>   }
>
> -static bool
> -gen6_ring_get_irq(struct intel_engine_cs *engine)
> +static void
> +gen6_ring_enable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
> -
> -	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
> -		return false;
>
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (engine->irq_refcount++ == 0) {
> -		if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
> -			I915_WRITE_IMR(engine,
> -				       ~(engine->irq_enable_mask |
> -					 GT_PARITY_ERROR(dev_priv)));
> -		else
> -			I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
> -		gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> -
> -	return true;
> +	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
> +		I915_WRITE_IMR(engine,
> +			       ~(engine->irq_enable_mask |
> +				 GT_PARITY_ERROR(dev_priv)));
> +	else
> +		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
> +	gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
>   }
>
>   static void
> -gen6_ring_put_irq(struct intel_engine_cs *engine)
> +gen6_ring_disable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
>
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--engine->irq_refcount == 0) {
> -		if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
> -			I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
> -		else
> -			I915_WRITE_IMR(engine, ~0);
> -		gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
> +		I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
> +	else
> +		I915_WRITE_IMR(engine, ~0);
> +	gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
>   }
>
> -static bool
> -hsw_vebox_get_irq(struct intel_engine_cs *engine)
> +static void
> +hsw_vebox_enable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
> -
> -	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
> -		return false;
>
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (engine->irq_refcount++ == 0) {
> -		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
> -		gen6_enable_pm_irq(dev_priv, engine->irq_enable_mask);
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> -
> -	return true;
> +	I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
> +	gen6_enable_pm_irq(dev_priv, engine->irq_enable_mask);
>   }
>
>   static void
> -hsw_vebox_put_irq(struct intel_engine_cs *engine)
> +hsw_vebox_disable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
>
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--engine->irq_refcount == 0) {
> -		I915_WRITE_IMR(engine, ~0);
> -		gen6_disable_pm_irq(dev_priv, engine->irq_enable_mask);
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +	I915_WRITE_IMR(engine, ~0);
> +	gen6_disable_pm_irq(dev_priv, engine->irq_enable_mask);
>   }
>
> -static bool
> -gen8_ring_get_irq(struct intel_engine_cs *engine)
> +static void
> +gen8_ring_enable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
>
> -	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
> -		return false;
> -
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (engine->irq_refcount++ == 0) {
> -		if (HAS_L3_DPF(dev_priv) && engine->id == RCS) {
> -			I915_WRITE_IMR(engine,
> -				       ~(engine->irq_enable_mask |
> -					 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
> -		} else {
> -			I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
> -		}
> -		POSTING_READ(RING_IMR(engine->mmio_base));
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> -
> -	return true;
> +	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
> +		I915_WRITE_IMR(engine,
> +			       ~(engine->irq_enable_mask |
> +				 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
> +	else
> +		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
> +	POSTING_READ_FW(RING_IMR(engine->mmio_base));
>   }
>
>   static void
> -gen8_ring_put_irq(struct intel_engine_cs *engine)
> +gen8_ring_disable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
> -	unsigned long flags;
>
> -	spin_lock_irqsave(&dev_priv->irq_lock, flags);
> -	if (--engine->irq_refcount == 0) {
> -		if (HAS_L3_DPF(dev_priv) && engine->id == RCS) {
> -			I915_WRITE_IMR(engine,
> -				       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> -		} else {
> -			I915_WRITE_IMR(engine, ~0);
> -		}
> -		POSTING_READ(RING_IMR(engine->mmio_base));
> -	}
> -	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
> +	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
> +		I915_WRITE_IMR(engine,
> +			       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> +	else
> +		I915_WRITE_IMR(engine, ~0);
>   }
>
>   static int
> @@ -2739,8 +2642,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   		engine->init_context = intel_rcs_ctx_init;
>   		engine->add_request = gen8_render_add_request;
>   		engine->flush = gen8_render_ring_flush;
> -		engine->irq_get = gen8_ring_get_irq;
> -		engine->irq_put = gen8_ring_put_irq;
> +		engine->irq_enable = gen8_ring_enable_irq;
> +		engine->irq_disable = gen8_ring_disable_irq;
>   		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
>   		if (i915_semaphore_is_enabled(dev_priv)) {
>   			WARN_ON(!dev_priv->semaphore_obj);
> @@ -2754,8 +2657,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   		engine->flush = gen7_render_ring_flush;
>   		if (IS_GEN6(dev_priv))
>   			engine->flush = gen6_render_ring_flush;
> -		engine->irq_get = gen6_ring_get_irq;
> -		engine->irq_put = gen6_ring_put_irq;
> +		engine->irq_enable = gen6_ring_enable_irq;
> +		engine->irq_disable = gen6_ring_disable_irq;
>   		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
>   		engine->irq_seqno_barrier = gen6_seqno_barrier;
>   		if (i915_semaphore_is_enabled(dev_priv)) {
> @@ -2782,8 +2685,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   	} else if (IS_GEN5(dev_priv)) {
>   		engine->add_request = i9xx_add_request;
>   		engine->flush = gen4_render_ring_flush;
> -		engine->irq_get = gen5_ring_get_irq;
> -		engine->irq_put = gen5_ring_put_irq;
> +		engine->irq_enable = gen5_ring_enable_irq;
> +		engine->irq_disable = gen5_ring_disable_irq;
>   		engine->irq_seqno_barrier = gen5_seqno_barrier;
>   		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
>   	} else {
> @@ -2793,11 +2696,11 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   		else
>   			engine->flush = gen4_render_ring_flush;
>   		if (IS_GEN2(dev_priv)) {
> -			engine->irq_get = i8xx_ring_get_irq;
> -			engine->irq_put = i8xx_ring_put_irq;
> +			engine->irq_enable = i8xx_ring_enable_irq;
> +			engine->irq_disable = i8xx_ring_disable_irq;
>   		} else {
> -			engine->irq_get = i9xx_ring_get_irq;
> -			engine->irq_put = i9xx_ring_put_irq;
> +			engine->irq_enable = i9xx_ring_enable_irq;
> +			engine->irq_disable = i9xx_ring_disable_irq;
>   		}
>   		engine->irq_enable_mask = I915_USER_INTERRUPT;
>   	}
> @@ -2857,8 +2760,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
>   		if (INTEL_GEN(dev_priv) >= 8) {
>   			engine->irq_enable_mask =
>   				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
> -			engine->irq_get = gen8_ring_get_irq;
> -			engine->irq_put = gen8_ring_put_irq;
> +			engine->irq_enable = gen8_ring_enable_irq;
> +			engine->irq_disable = gen8_ring_disable_irq;
>   			engine->dispatch_execbuffer =
>   				gen8_ring_dispatch_execbuffer;
>   			if (i915_semaphore_is_enabled(dev_priv)) {
> @@ -2868,8 +2771,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
>   			}
>   		} else {
>   			engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
> -			engine->irq_get = gen6_ring_get_irq;
> -			engine->irq_put = gen6_ring_put_irq;
> +			engine->irq_enable = gen6_ring_enable_irq;
> +			engine->irq_disable = gen6_ring_disable_irq;
>   			engine->dispatch_execbuffer =
>   				gen6_ring_dispatch_execbuffer;
>   			if (i915_semaphore_is_enabled(dev_priv)) {
> @@ -2893,13 +2796,13 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
>   		engine->add_request = i9xx_add_request;
>   		if (IS_GEN5(dev_priv)) {
>   			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
> -			engine->irq_get = gen5_ring_get_irq;
> -			engine->irq_put = gen5_ring_put_irq;
> +			engine->irq_enable = gen5_ring_enable_irq;
> +			engine->irq_disable = gen5_ring_disable_irq;
>   			engine->irq_seqno_barrier = gen5_seqno_barrier;
>   		} else {
>   			engine->irq_enable_mask = I915_BSD_USER_INTERRUPT;
> -			engine->irq_get = i9xx_ring_get_irq;
> -			engine->irq_put = i9xx_ring_put_irq;
> +			engine->irq_enable = i9xx_ring_enable_irq;
> +			engine->irq_disable = i9xx_ring_disable_irq;
>   		}
>   		engine->dispatch_execbuffer = i965_dispatch_execbuffer;
>   	}
> @@ -2928,8 +2831,8 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
>   	engine->irq_seqno_barrier = gen6_seqno_barrier;
>   	engine->irq_enable_mask =
>   			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
> -	engine->irq_get = gen8_ring_get_irq;
> -	engine->irq_put = gen8_ring_put_irq;
> +	engine->irq_enable = gen8_ring_enable_irq;
> +	engine->irq_disable = gen8_ring_disable_irq;
>   	engine->dispatch_execbuffer =
>   			gen8_ring_dispatch_execbuffer;
>   	if (i915_semaphore_is_enabled(dev_priv)) {
> @@ -2960,8 +2863,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
>   	if (INTEL_GEN(dev_priv) >= 8) {
>   		engine->irq_enable_mask =
>   			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
> -		engine->irq_get = gen8_ring_get_irq;
> -		engine->irq_put = gen8_ring_put_irq;
> +		engine->irq_enable = gen8_ring_enable_irq;
> +		engine->irq_disable = gen8_ring_disable_irq;
>   		engine->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
>   		if (i915_semaphore_is_enabled(dev_priv)) {
>   			engine->semaphore.sync_to = gen8_ring_sync;
> @@ -2970,8 +2873,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
>   		}
>   	} else {
>   		engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
> -		engine->irq_get = gen6_ring_get_irq;
> -		engine->irq_put = gen6_ring_put_irq;
> +		engine->irq_enable = gen6_ring_enable_irq;
> +		engine->irq_disable = gen6_ring_disable_irq;
>   		engine->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
>   		if (i915_semaphore_is_enabled(dev_priv)) {
>   			engine->semaphore.signal = gen6_signal;
> @@ -3019,8 +2922,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
>   	if (INTEL_GEN(dev_priv) >= 8) {
>   		engine->irq_enable_mask =
>   			GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
> -		engine->irq_get = gen8_ring_get_irq;
> -		engine->irq_put = gen8_ring_put_irq;
> +		engine->irq_enable = gen8_ring_enable_irq;
> +		engine->irq_disable = gen8_ring_disable_irq;
>   		engine->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
>   		if (i915_semaphore_is_enabled(dev_priv)) {
>   			engine->semaphore.sync_to = gen8_ring_sync;
> @@ -3029,8 +2932,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
>   		}
>   	} else {
>   		engine->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
> -		engine->irq_get = hsw_vebox_get_irq;
> -		engine->irq_put = hsw_vebox_put_irq;
> +		engine->irq_enable = hsw_vebox_enable_irq;
> +		engine->irq_disable = hsw_vebox_disable_irq;
>   		engine->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
>   		if (i915_semaphore_is_enabled(dev_priv)) {
>   			engine->semaphore.sync_to = gen6_ring_sync;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 5f7cb3d0ea1c..182cae767bf1 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -201,11 +201,10 @@ struct intel_engine_cs {
>   	struct intel_hw_status_page status_page;
>   	struct i915_ctx_workarounds wa_ctx;
>
> -	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
>   	bool		irq_posted;
>   	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
> -	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
> -	void		(*irq_put)(struct intel_engine_cs *ring);
> +	void		(*irq_enable)(struct intel_engine_cs *ring);
> +	void		(*irq_disable)(struct intel_engine_cs *ring);
>
>   	int		(*init_hw)(struct intel_engine_cs *ring);
>
>

Some more instances of things I've already asked about.

Apart from those opens, this looks OK.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 20/21] drm/i915: Simplify enabling user-interrupts with L3-remapping
  2016-06-03 16:08 ` [PATCH 20/21] drm/i915: Simplify enabling user-interrupts with L3-remapping Chris Wilson
@ 2016-06-07 12:50   ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-07 12:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> Borrow the idea from intel_lrc.c to precompute the mask of interrupts we
> wish to always enable to avoid having lots of conditionals inside the
> interrupt enabling.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 35 +++++++++++----------------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++--
>   2 files changed, 14 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index ba84b469f13f..161c0792b1bf 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1227,8 +1227,7 @@ static int init_render_ring(struct intel_engine_cs *engine)
>   	if (IS_GEN(dev_priv, 6, 7))
>   		I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
>
> -	if (HAS_L3_DPF(dev_priv))
> -		I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
> +	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
>
>   	return init_workarounds_ring(engine);
>   }
> @@ -1644,12 +1643,9 @@ gen6_ring_enable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
>
> -	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
> -		I915_WRITE_IMR(engine,
> -			       ~(engine->irq_enable_mask |
> -				 GT_PARITY_ERROR(dev_priv)));
> -	else
> -		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
> +	I915_WRITE_IMR(engine,
> +		       ~(engine->irq_enable_mask |
> +			 engine->irq_keep_mask));
>   	gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
>   }
>
> @@ -1658,10 +1654,7 @@ gen6_ring_disable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
>
> -	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
> -		I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
> -	else
> -		I915_WRITE_IMR(engine, ~0);
> +	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
>   	gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
>   }
>
> @@ -1688,12 +1681,9 @@ gen8_ring_enable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
>
> -	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
> -		I915_WRITE_IMR(engine,
> -			       ~(engine->irq_enable_mask |
> -				 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
> -	else
> -		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
> +	I915_WRITE_IMR(engine,
> +		       ~(engine->irq_enable_mask |
> +			 engine->irq_keep_mask));
>   	POSTING_READ_FW(RING_IMR(engine->mmio_base));
>   }
>
> @@ -1702,11 +1692,7 @@ gen8_ring_disable_irq(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *dev_priv = engine->i915;
>
> -	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
> -		I915_WRITE_IMR(engine,
> -			       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
> -	else
> -		I915_WRITE_IMR(engine, ~0);
> +	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
>   }
>
>   static int
> @@ -2621,6 +2607,9 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   	engine->hw_id = 0;
>   	engine->mmio_base = RENDER_RING_BASE;
>
> +	if (HAS_L3_DPF(dev_priv))
> +		engine->irq_keep_mask = GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
> +
>   	if (INTEL_GEN(dev_priv) >= 8) {
>   		if (i915_semaphore_is_enabled(dev_priv)) {
>   			obj = i915_gem_object_create(dev, 4096);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 182cae767bf1..166f1a3829b0 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -202,7 +202,8 @@ struct intel_engine_cs {
>   	struct i915_ctx_workarounds wa_ctx;
>
>   	bool		irq_posted;
> -	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
> +	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
> +	u32		irq_enable_mask;/* bitmask to enable ring interrupt */
>   	void		(*irq_enable)(struct intel_engine_cs *ring);
>   	void		(*irq_disable)(struct intel_engine_cs *ring);
>
> @@ -299,7 +300,6 @@ struct intel_engine_cs {
>   	unsigned int idle_lite_restore_wa;
>   	bool disable_lite_restore_wa;
>   	u32 ctx_desc_template;
> -	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
>   	int		(*emit_request)(struct drm_i915_gem_request *request);
>   	int		(*emit_flush)(struct drm_i915_gem_request *request,
>   				      u32 invalidate_domains,
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 21/21] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts
  2016-06-03 16:08 ` [PATCH 21/21] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts Chris Wilson
@ 2016-06-07 12:51   ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-07 12:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/06/16 17:08, Chris Wilson wrote:
> Since the tests can and do explicitly check debugfs/i915_ring_missed_irqs
> for the handling of a "missed interrupt", adding it to the dmesg at INFO
> is just noise. When it happens for real, we still class it as an ERROR.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_irq.c | 3 ---
>   1 file changed, 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 5bdb433dde8c..f74f5727ea77 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3071,9 +3071,6 @@ static unsigned kick_waiters(struct intel_engine_cs *engine)
>   		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
>   			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
>   				  engine->name);
> -		else
> -			DRM_INFO("Fake missed irq on %s\n",
> -				 engine->name);
>
>   		intel_engine_enable_fake_irq(engine);
>   	}
>

Makes sense, or could be at debug level. Either way:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/21] drm/i915: Delay queuing hangcheck to wait-request
  2016-06-03 16:08 ` [PATCH 02/21] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
@ 2016-06-08  8:42   ` Daniel Vetter
  2016-06-08  9:13     ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Daniel Vetter @ 2016-06-08  8:42 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Jun 03, 2016 at 05:08:34PM +0100, Chris Wilson wrote:
> We can forgo queuing the hangcheck from the start of every request to
> until we wait upon a request. This reduces the overhead of every
> request, but may increase the latency of detecting a hang. However, if
> nothing ever waits upon a hang, did it ever hang? It also improves the
> robustness of the wait-request by ensuring that the hangchecker is
> indeed running before we sleep indefinitely (and thereby ensuring that
> we never actually sleep forever waiting for a dead GPU).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I think this will run into TDR patches, where we want a super-low-latency
hangcheck in some cases. But then I think that's implemented by wrapping
the batch in some special cs commands to insta-kill the engine if the
timeout expired, so probably not a big problem. Still worth it to
double-check with Mika I'd say.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c |  9 +++++----
>  drivers/gpu/drm/i915/i915_irq.c | 10 ++++------
>  2 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c7a67a7412cd..03256f096ab6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1310,6 +1310,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			break;
>  		}
>  
> +		/* Ensure that even if the GPU hangs, we get woken up. */
> +		i915_queue_hangcheck(dev_priv);
> +
>  		timer.function = NULL;
>  		if (timeout || missed_irq(dev_priv, engine)) {
>  			unsigned long expire;
> @@ -2674,8 +2677,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>  	/* Not allowed to fail! */
>  	WARN(ret, "emit|add_request failed: %d!\n", ret);
>  
> -	i915_queue_hangcheck(engine->i915);
> -
>  	queue_delayed_work(dev_priv->wq,
>  			   &dev_priv->mm.retire_work,
>  			   round_jiffies_up_relative(HZ));
> @@ -3019,8 +3020,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv)
>  
>  	if (idle)
>  		mod_delayed_work(dev_priv->wq,
> -				   &dev_priv->mm.idle_work,
> -				   msecs_to_jiffies(100));
> +				 &dev_priv->mm.idle_work,
> +				 msecs_to_jiffies(100));
>  
>  	return idle;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 5c7378374ae6..1303d7c034d3 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3134,10 +3134,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
>  
>  	for_each_engine_id(engine, dev_priv, id) {
> +		bool busy = waitqueue_active(&engine->irq_queue);
>  		u64 acthd;
>  		u32 seqno;
>  		unsigned user_interrupts;
> -		bool busy = true;
>  
>  		semaphore_clear_deadlocks(dev_priv);
>  
> @@ -3160,12 +3160,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  		if (engine->hangcheck.seqno == seqno) {
>  			if (ring_idle(engine, seqno)) {
>  				engine->hangcheck.action = HANGCHECK_IDLE;
> -				if (waitqueue_active(&engine->irq_queue)) {
> +				if (busy) {
>  					/* Safeguard against driver failure */
>  					user_interrupts = kick_waiters(engine);
>  					engine->hangcheck.score += BUSY;
> -				} else
> -					busy = false;
> +				}
>  			} else {
>  				/* We always increment the hangcheck score
>  				 * if the ring is busy and still processing
> @@ -3239,9 +3238,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  		goto out;
>  	}
>  
> +	/* Reset timer in case GPU hangs without another request being added */
>  	if (busy_count)
> -		/* Reset timer case chip hangs without another request
> -		 * being added */
>  		i915_queue_hangcheck(dev_priv);
>  
>  out:
> -- 
> 2.8.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 15/21] drm/i915: Stop setting wraparound seqno on initialisation
  2016-06-03 16:08 ` [PATCH 15/21] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
@ 2016-06-08  8:54   ` Daniel Vetter
  0 siblings, 0 replies; 60+ messages in thread
From: Daniel Vetter @ 2016-06-08  8:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Jun 03, 2016 at 05:08:47PM +0100, Chris Wilson wrote:
> We have testcases to ensure that seqno wraparound works fine, so we can
> forgo forcing everyone to encounter seqno wraparound during early
> uptime. seqno wraparound incurs a full GPU stall so not forcing it
> will eliminate one jitter from the early system. Using the testcases, we
> have very deterministic testing which given how difficult it would be to
> debug an issue (GPU hang) stemming from a wraparound using pure
> postmortem analysis I see no value in forcing a wrap during boot.
> 
> Advancing the global next_seqno after a GPU reset is equally pointless.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=95023
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Makes sense.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 14 --------------
>  1 file changed, 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index bf5c93f2bd81..269d00a40483 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4858,12 +4858,6 @@ i915_gem_init_hw(struct drm_device *dev)
>  			goto out;
>  	}
>  
> -	/*
> -	 * Increment the next seqno by 0x100 so we have a visible break
> -	 * on re-initialisation
> -	 */
> -	ret = i915_gem_set_seqno(dev, dev_priv->next_seqno+0x100);
> -
>  out:
>  	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>  	return ret;
> @@ -5006,14 +5000,6 @@ i915_gem_load_init(struct drm_device *dev)
>  
>  	dev_priv->relative_constants_mode = I915_EXEC_CONSTANTS_REL_GENERAL;
>  
> -	/*
> -	 * Set initial sequence number for requests.
> -	 * Using this number allows the wraparound to happen early,
> -	 * catching any obvious problems.
> -	 */
> -	dev_priv->next_seqno = ((u32)~0 - 0x1100);
> -	dev_priv->last_seqno = ((u32)~0 - 0x1101);
> -
>  	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
>  
>  	init_waitqueue_head(&dev_priv->pending_flip_queue);
> -- 
> 2.8.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 02/21] drm/i915: Delay queuing hangcheck to wait-request
  2016-06-08  8:42   ` Daniel Vetter
@ 2016-06-08  9:13     ` Chris Wilson
  0 siblings, 0 replies; 60+ messages in thread
From: Chris Wilson @ 2016-06-08  9:13 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Jun 08, 2016 at 10:42:58AM +0200, Daniel Vetter wrote:
> On Fri, Jun 03, 2016 at 05:08:34PM +0100, Chris Wilson wrote:
> > We can forgo queuing the hangcheck from the start of every request to
> > until we wait upon a request. This reduces the overhead of every
> > request, but may increase the latency of detecting a hang. However, if
> > nothing ever waits upon a hang, did it ever hang? It also improves the
> > robustness of the wait-request by ensuring that the hangchecker is
> > indeed running before we sleep indefinitely (and thereby ensuring that
> > we never actually sleep forever waiting for a dead GPU).
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> I think this will run into TDR patches, where we want a super-low-latency
> hangcheck in some cases. But then I think that's implemented by wrapping
> the batch in some special cs commands to insta-kill the engine if the
> timeout expired, so probably not a big problem. Still worth it to
> double-check with Mika I'd say.

Exactly. With TDR, hangcheck is relegated to denial of service
protection. This does not conflict with TDR; the two are complementary.
With timelines, we probably want to go even further and completely
divorce checking GPU state for hangcheck from checking for timeline
advancement. Then simply asking whether the waiter has been stuck
dramatically simplifies everything. TDR is again complementary, but
hangcheck still functions in case TDR fails or is disabled.
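
A sketch of that direction (illustrative only; intel_engine_has_waiter()
stands in for whatever "is anyone waiting" query the breadcrumbs tracking
provides):

	static bool example_waiter_stuck(struct intel_engine_cs *engine)
	{
		u32 seqno = intel_engine_get_seqno(engine);
		bool stuck;

		/* Hang detection reduced to: is somebody still waiting and
		 * has the seqno not moved since the last hangcheck sample?
		 */
		stuck = intel_engine_has_waiter(engine) &&
			seqno == engine->hangcheck.seqno;

		engine->hangcheck.seqno = seqno;
		return stuck;
	}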
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 08/21] drm/i915: Use HWS for seqno tracking everywhere
  2016-06-06 14:55   ` Tvrtko Ursulin
@ 2016-06-08  9:24     ` Chris Wilson
  0 siblings, 0 replies; 60+ messages in thread
From: Chris Wilson @ 2016-06-08  9:24 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, Jun 06, 2016 at 03:55:18PM +0100, Tvrtko Ursulin wrote:
> 
> On 03/06/16 17:08, Chris Wilson wrote:
> >diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >index 2a736f4a0fe5..4013ad92cdc6 100644
> >--- a/drivers/gpu/drm/i915/i915_irq.c
> >+++ b/drivers/gpu/drm/i915/i915_irq.c
> >@@ -2951,7 +2951,7 @@ static int semaphore_passed(struct intel_engine_cs *engine)
> >  	if (signaller->hangcheck.deadlock >= I915_NUM_ENGINES)
> >  		return -1;
> >
> >-	if (i915_seqno_passed(signaller->get_seqno(signaller), seqno))
> >+	if (i915_seqno_passed(intel_engine_get_seqno(engine), seqno))
> 
> Should be signaller, not engine, by the look of it.

Ta!

> >@@ -1543,7 +1537,9 @@ static int
> >  pc_render_add_request(struct drm_i915_gem_request *req)
> >  {
> >  	struct intel_engine_cs *engine = req->engine;
> >-	u32 scratch_addr = engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
> >+	u32 addr = engine->status_page.gfx_addr +
> >+		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
> >+	u32 scratch_addr = addr;
> >  	int ret;
> 
> Before my time. :)
> 
> Why was this code flushing all that space but above where it was
> writing the seqno?

The idea is simply that we need to delay the command streamer by
performing N writes before asserting the interrupt. The choice of where
is purely paranoia to make sure the GPU can't pack multiple writes into
one transaction.
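
In sketch form (the emit_* helpers here are made up, purely to show the
shape):

	/* Pad the ring with a handful of dummy scratch writes so the CS is
	 * forced to perform N separate transactions before it writes the
	 * seqno and asserts the interrupt.
	 */
	u32 scratch_addr = addr;
	int i;

	for (i = 0; i < 6; i++) {
		scratch_addr += 2 * CACHELINE_BYTES;
		emit_scratch_write(req, scratch_addr);	/* hypothetical */
	}

	emit_seqno_write(req, addr);			/* hypothetical */
	emit_user_interrupt(req);			/* hypothetical */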
 
> With this change it is flushing the seqno area as well.

s/flushing/writing/
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 11/21] drm/i915: Refactor scratch object allocation for gen2 w/a buffer
  2016-06-06 15:09   ` Tvrtko Ursulin
@ 2016-06-08  9:27     ` Chris Wilson
  0 siblings, 0 replies; 60+ messages in thread
From: Chris Wilson @ 2016-06-08  9:27 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, Jun 06, 2016 at 04:09:12PM +0100, Tvrtko Ursulin wrote:
> 
> On 03/06/16 17:08, Chris Wilson wrote:
> >  	if (INTEL_GEN(dev_priv) >= 5) {
> >-		ret = intel_init_pipe_control(engine);
> >+		ret = intel_init_pipe_control(engine, 4096);
> 
> Could be cool to define this size with a descriptive name at this point.

GTT_PAGE_SIZE, or GTT_PAGE_4k for when we do large pages from yonks ago.

In this case it would be GEN5_SCRATCH_SIZE I guess. Is that worth it?
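
If we did want it, it is a one-liner (the name is illustrative, it does
not exist in the tree):

	#define GEN5_SCRATCH_SIZE 4096

	if (INTEL_GEN(dev_priv) >= 5) {
		ret = intel_init_pipe_control(engine, GEN5_SCRATCH_SIZE);
		if (ret)
			return ret;
	}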
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 14/21] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted
  2016-06-06 15:34   ` Tvrtko Ursulin
@ 2016-06-08  9:35     ` Chris Wilson
  2016-06-08  9:57       ` Tvrtko Ursulin
  0 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-08  9:35 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, Jun 06, 2016 at 04:34:27PM +0100, Tvrtko Ursulin wrote:
> 
> On 03/06/16 17:08, Chris Wilson wrote:
> >If we flag the seqno as potentially stale upon receiving an interrupt,
> >we can use that information to reduce the frequency that we apply the
> >heavyweight coherent seqno read (i.e. if we wake up a chain of waiters).
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_drv.h          | 15 ++++++++++++++-
> >  drivers/gpu/drm/i915/i915_irq.c          |  1 +
> >  drivers/gpu/drm/i915/intel_breadcrumbs.c | 16 ++++++++++------
> >  drivers/gpu/drm/i915/intel_ringbuffer.h  |  1 +
> >  4 files changed, 26 insertions(+), 7 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index 4ddb9ff319cb..a71d08199d57 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -3935,7 +3935,20 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
> >  	 * but it is easier and safer to do it every time the waiter
> >  	 * is woken.
> >  	 */
> >-	if (engine->irq_seqno_barrier) {
> >+	if (engine->irq_seqno_barrier && READ_ONCE(engine->irq_posted)) {
> >+		/* The ordering of irq_posted versus applying the barrier
> >+		 * is crucial. The clearing of the current irq_posted must
> >+		 * be visible before we perform the barrier operation,
> >+		 * such that if a subsequent interrupt arrives, irq_posted
> >+		 * is reasserted and our task rewoken (which causes us to
> >+		 * do another __i915_request_irq_complete() immediately
> >+		 * and reapply the barrier). Conversely, if the clear
> >+		 * occurs after the barrier, then an interrupt that arrived
> >+		 * whilst we waited on the barrier would not trigger a
> >+		 * barrier on the next pass, and the read may not see the
> >+		 * seqno update.
> >+		 */
> >+		WRITE_ONCE(engine->irq_posted, false);
> 
> Why is this not smp_store_mb ?

We only require the ordering wrt irq_seqno_barrier().

How about:

if (engine->irq_seqno_barrier &&
    cmpxchg_relaxed(&engine->irq_posted, 1, 0)) {

Less shouty?
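
i.e. roughly (sketch only; irq_posted would need to be an unsigned int
rather than a bool for the cmpxchg to be portable):

	static bool example_consume_irq_posted(struct intel_engine_cs *engine)
	{
		if (!engine->irq_seqno_barrier)
			return false;

		/* Atomically consume the posted flag; a later interrupt
		 * re-asserts it and rewakes us, so we never miss a barrier.
		 */
		if (!cmpxchg(&engine->irq_posted, 1, 0))
			return false;

		engine->irq_seqno_barrier(engine);
		return true;
	}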

> >diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >index 44346de39794..0f5fe114c204 100644
> >--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >@@ -43,12 +43,18 @@ static void intel_breadcrumbs_fake_irq(unsigned long data)
> >
> >  static void irq_enable(struct intel_engine_cs *engine)
> >  {
> >+	/* Enabling the IRQ may miss the generation of the interrupt, but
> >+	 * we still need to force the barrier before reading the seqno,
> >+	 * just in case.
> >+	 */
> >+	engine->irq_posted = true;
> 
> Should it be smp_store_mb here as well?

No, this is written/read on the same callchain.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
  2016-06-07 12:04   ` Tvrtko Ursulin
@ 2016-06-08  9:48     ` Chris Wilson
  2016-06-08 10:16       ` Tvrtko Ursulin
  0 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-08  9:48 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Jun 07, 2016 at 01:04:22PM +0100, Tvrtko Ursulin wrote:
> >+static int intel_breadcrumbs_signaler(void *arg)
> >+{
> >+	struct intel_engine_cs *engine = arg;
> >+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >+	struct signal *signal;
> >+
> >+	/* Install ourselves with high priority to reduce signalling latency */
> >+	signaler_set_rtpriority();
> >+
> >+	do {
> >+		set_current_state(TASK_INTERRUPTIBLE);
> >+
> >+		/* We are either woken up by the interrupt bottom-half,
> >+		 * or by a client adding a new signaller. In both cases,
> >+		 * the GPU seqno may have advanced beyond our oldest signal.
> >+		 * If it has, propagate the signal, remove the waiter and
> >+		 * check again with the next oldest signal. Otherwise we
> >+		 * need to wait for a new interrupt from the GPU or for
> >+		 * a new client.
> >+		 */
> >+		signal = READ_ONCE(b->first_signal);
> >+		if (signal_complete(signal)) {
> >+			/* Wake up all other completed waiters and select the
> >+			 * next bottom-half for the next user interrupt.
> >+			 */
> >+			intel_engine_remove_wait(engine, &signal->wait);
> >+
> >+			i915_gem_request_unreference(signal->request);
> >+
> >+			/* Find the next oldest signal. Note that as we have
> >+			 * not been holding the lock, another client may
> >+			 * have installed an even older signal than the one
> >+			 * we just completed - so double check we are still
> >+			 * the oldest before picking the next one.
> >+			 */
> >+			spin_lock(&b->lock);
> >+			if (signal == b->first_signal)
> >+				b->first_signal = rb_next(&signal->node);
> >+			rb_erase(&signal->node, &b->signals);
> >+			spin_unlock(&b->lock);
> >+
> >+			kfree(signal);
> >+		} else {
> >+			if (kthread_should_stop())
> >+				break;
> >+
> >+			schedule();
> >+		}
> >+	} while (1);
> >+
> >+	return 0;
> >+}
> 
> So the thread is only because it is convenient to plug it in the
> breadcrumbs infrastructure. Otherwise the processing above could be
> done from a lighter weight context as well since nothing seems to
> need the process context.

No, seqno processing requires process/sleepable context. The delays we
incur can be >100us and not suitable for irq/softirq context.

> One alternative could perhaps be to add a waiter->wake_up vfunc and
> signalers could then potentially use a tasklet?

Hmm, I did find that in order to reduce execlists latency, I had to
drive the tasklet processing from the signaler.

> >+int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
> >+{
> >+	struct intel_engine_cs *engine = request->engine;
> >+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >+	struct rb_node *parent, **p;
> >+	struct signal *signal;
> >+	bool first, wakeup;
> >+
> >+	if (unlikely(IS_ERR(b->signaler)))
> >+		return PTR_ERR(b->signaler);
> 
> I don't see that there is a fallback is kthread creation failed. It
> should just fail in intel_engine_init_breadcrumbs if that happens.

Because it is not fatal to using the GPU, just one optional function.

> >+	signal = kmalloc(sizeof(*signal), GFP_ATOMIC);
> 
> Ugh GFP_ATOMIC - why?

Because of dma-buf/fence.c.
 
> And even should you instead just embed in into requests?

I was resisting embedding even more into requests, so the first patch was
for a simpler integration, with a subsequent patch to embed the node
into the request.

> >@@ -329,12 +480,24 @@ void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
> >  	setup_timer(&b->fake_irq,
> >  		    intel_breadcrumbs_fake_irq,
> >  		    (unsigned long)engine);
> >+
> >+	/* Spawn a thread to provide a common bottom-half for all signals.
> >+	 * As this is an asynchronous interface we cannot steal the current
> >+	 * task for handling the bottom-half to the user interrupt, therefore
> >+	 * we create a thread to do the coherent seqno dance after the
> >+	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
> >+	 */
> >+	b->signaler = kthread_run(intel_breadcrumbs_signaler,
> >+				  engine, "irq/i915:%d", engine->id);
> 
> As commented above, init should fail here because it cannot run
> without the thread.

We can function without the signaler.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 18/21] drm/i915: Embed signaling node into the GEM request
  2016-06-07 12:31   ` Tvrtko Ursulin
@ 2016-06-08  9:54     ` Chris Wilson
  0 siblings, 0 replies; 60+ messages in thread
From: Chris Wilson @ 2016-06-08  9:54 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Jun 07, 2016 at 01:31:25PM +0100, Tvrtko Ursulin wrote:
> >@@ -418,20 +423,23 @@ int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
> >  	struct intel_engine_cs *engine = request->engine;
> >  	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >  	struct rb_node *parent, **p;
> >-	struct signal *signal;
> >  	bool first, wakeup;
> >
> >  	if (unlikely(IS_ERR(b->signaler)))
> >  		return PTR_ERR(b->signaler);
> >
> >-	signal = kmalloc(sizeof(*signal), GFP_ATOMIC);
> >-	if (unlikely(!signal))
> >-		return -ENOMEM;
> >+	if (unlikely(READ_ONCE(request->signaling.wait.task)))
> >+		return 0;
> 
> Hmm it will depend on following patches whether this is safe. I
> don't like the explosion of READ_ONCE and smp_store_mb's in these
> patches. Something is bound to be broken.

This one is trivial :)
 
> You even check it below under the lock. So I am not sure this
> optimisation is worth it. Maybe leave it for later?

I was just worrying about contention since the breadcrumbs lock is being
used to guard both trees and contention in the waiters is noticeable.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 14/21] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted
  2016-06-08  9:35     ` Chris Wilson
@ 2016-06-08  9:57       ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-08  9:57 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/06/16 10:35, Chris Wilson wrote:
> On Mon, Jun 06, 2016 at 04:34:27PM +0100, Tvrtko Ursulin wrote:
>>
>> On 03/06/16 17:08, Chris Wilson wrote:
>>> If we flag the seqno as potentially stale upon receiving an interrupt,
>>> we can use that information to reduce the frequency that we apply the
>>> heavyweight coherent seqno read (i.e. if we wake up a chain of waiters).
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h          | 15 ++++++++++++++-
>>>   drivers/gpu/drm/i915/i915_irq.c          |  1 +
>>>   drivers/gpu/drm/i915/intel_breadcrumbs.c | 16 ++++++++++------
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h  |  1 +
>>>   4 files changed, 26 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index 4ddb9ff319cb..a71d08199d57 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -3935,7 +3935,20 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>>>   	 * but it is easier and safer to do it every time the waiter
>>>   	 * is woken.
>>>   	 */
>>> -	if (engine->irq_seqno_barrier) {
>>> +	if (engine->irq_seqno_barrier && READ_ONCE(engine->irq_posted)) {
>>> +		/* The ordering of irq_posted versus applying the barrier
>>> +		 * is crucial. The clearing of the current irq_posted must
>>> +		 * be visible before we perform the barrier operation,
>>> +		 * such that if a subsequent interrupt arrives, irq_posted
>>> +		 * is reasserted and our task rewoken (which causes us to
>>> +		 * do another __i915_request_irq_complete() immediately
>>> +		 * and reapply the barrier). Conversely, if the clear
>>> +		 * occurs after the barrier, then an interrupt that arrived
>>> +		 * whilst we waited on the barrier would not trigger a
>>> +		 * barrier on the next pass, and the read may not see the
>>> +		 * seqno update.
>>> +		 */
>>> +		WRITE_ONCE(engine->irq_posted, false);
>>
>> Why is this not smp_store_mb ?
>
> We only require the ordering wrt irq_seqno_barrier().
>
> How about:
>
> if (engine->irq_seqno_barrier &&
>      cmpxchg_relaxed(&engine->irq_posted, 1, 0)) {
>
> Less shouty?

I think so.

>>> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> index 44346de39794..0f5fe114c204 100644
>>> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> @@ -43,12 +43,18 @@ static void intel_breadcrumbs_fake_irq(unsigned long data)
>>>
>>>   static void irq_enable(struct intel_engine_cs *engine)
>>>   {
>>> +	/* Enabling the IRQ may miss the generation of the interrupt, but
>>> +	 * we still need to force the barrier before reading the seqno,
>>> +	 * just in case.
>>> +	 */
>>> +	engine->irq_posted = true;
>>
>> Should it be smp_store_mb here as well?
>
> No, this is written/read on the same callchain.

Ah true.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller
  2016-06-07 12:46   ` Tvrtko Ursulin
@ 2016-06-08 10:01     ` Chris Wilson
  2016-06-08 10:18       ` Tvrtko Ursulin
  0 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-08 10:01 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Jun 07, 2016 at 01:46:53PM +0100, Tvrtko Ursulin wrote:
> 
> On 03/06/16 17:08, Chris Wilson wrote:
> >With only a single callsite for intel_engine_cs->irq_get and ->irq_put,
> >we can reduce the code size by moving the common preamble into the
> >caller, and we can also eliminate the reference counting.
> >
> >For completeness, as we are no longer doing reference counting on irq,
> >rename the get/put vfunctions to enable/disable respectively.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_irq.c          |   8 +-
> >  drivers/gpu/drm/i915/intel_breadcrumbs.c |  10 +-
> >  drivers/gpu/drm/i915/intel_lrc.c         |  34 +---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c  | 269 ++++++++++---------------------
> >  drivers/gpu/drm/i915/intel_ringbuffer.h  |   5 +-
> >  5 files changed, 108 insertions(+), 218 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >index 14b3d65bb604..5bdb433dde8c 100644
> >--- a/drivers/gpu/drm/i915/i915_irq.c
> >+++ b/drivers/gpu/drm/i915/i915_irq.c
> >@@ -259,12 +259,12 @@ static void ilk_update_gt_irq(struct drm_i915_private *dev_priv,
> >  	dev_priv->gt_irq_mask &= ~interrupt_mask;
> >  	dev_priv->gt_irq_mask |= (~enabled_irq_mask & interrupt_mask);
> >  	I915_WRITE(GTIMR, dev_priv->gt_irq_mask);
> >-	POSTING_READ(GTIMR);
> >  }
> >
> >  void gen5_enable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
> >  {
> >  	ilk_update_gt_irq(dev_priv, mask, mask);
> >+	POSTING_READ_FW(GTIMR);
> >  }
> 
> Unrelated hunks?
> 
> How is POSTING_READ_FW correct?

The requirement here is an uncached read of the mmio register in order
to flush the previous write to hw. A grander scheme would be to convert
all posting reads, but that requires double checking to see if anyone
has been cheating!

> Also removes the posting read from disable, OK?

Correct, we only depend upon the ordering with hw on the enable path.
This is one of those rare instances where the barrier is required (and
UC write is not enough!); if we don't, we may check the "post-enable" seqno
before the interrupt is ready.
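
To illustrate the asymmetry (sketch only, not the actual patch):

	static void example_irq_enable(struct intel_engine_cs *engine)
	{
		struct drm_i915_private *dev_priv = engine->i915;

		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
		/* Uncached read flushes the unmask to hw before we sample
		 * the seqno; without it we may read a stale seqno and then
		 * sleep indefinitely waiting for an interrupt that was
		 * never enabled.
		 */
		POSTING_READ_FW(RING_IMR(engine->mmio_base));
	}

	static void example_irq_disable(struct intel_engine_cs *engine)
	{
		struct drm_i915_private *dev_priv = engine->i915;

		/* Masking can be lazy; nothing depends on it completing. */
		I915_WRITE_IMR(engine, ~0);
	}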

> >+	I915_WRITE_IMR(engine,
> >+		       ~(engine->irq_enable_mask | engine->irq_keep_mask));
> >+	POSTING_READ_FW(RING_IMR(engine->mmio_base));
> 
> Hm, more of _FW following normal access. What am I missing? You are
> not by any chance banking on the auto-release window?

Nope.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
  2016-06-08  9:48     ` Chris Wilson
@ 2016-06-08 10:16       ` Tvrtko Ursulin
  2016-06-08 11:24         ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-08 10:16 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/06/16 10:48, Chris Wilson wrote:
> On Tue, Jun 07, 2016 at 01:04:22PM +0100, Tvrtko Ursulin wrote:
>>> +static int intel_breadcrumbs_signaler(void *arg)
>>> +{
>>> +	struct intel_engine_cs *engine = arg;
>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>> +	struct signal *signal;
>>> +
>>> +	/* Install ourselves with high priority to reduce signalling latency */
>>> +	signaler_set_rtpriority();
>>> +
>>> +	do {
>>> +		set_current_state(TASK_INTERRUPTIBLE);
>>> +
>>> +		/* We are either woken up by the interrupt bottom-half,
>>> +		 * or by a client adding a new signaller. In both cases,
>>> +		 * the GPU seqno may have advanced beyond our oldest signal.
>>> +		 * If it has, propagate the signal, remove the waiter and
>>> +		 * check again with the next oldest signal. Otherwise we
>>> +		 * need to wait for a new interrupt from the GPU or for
>>> +		 * a new client.
>>> +		 */
>>> +		signal = READ_ONCE(b->first_signal);
>>> +		if (signal_complete(signal)) {
>>> +			/* Wake up all other completed waiters and select the
>>> +			 * next bottom-half for the next user interrupt.
>>> +			 */
>>> +			intel_engine_remove_wait(engine, &signal->wait);
>>> +
>>> +			i915_gem_request_unreference(signal->request);
>>> +
>>> +			/* Find the next oldest signal. Note that as we have
>>> +			 * not been holding the lock, another client may
>>> +			 * have installed an even older signal than the one
>>> +			 * we just completed - so double check we are still
>>> +			 * the oldest before picking the next one.
>>> +			 */
>>> +			spin_lock(&b->lock);
>>> +			if (signal == b->first_signal)
>>> +				b->first_signal = rb_next(&signal->node);
>>> +			rb_erase(&signal->node, &b->signals);
>>> +			spin_unlock(&b->lock);
>>> +
>>> +			kfree(signal);
>>> +		} else {
>>> +			if (kthread_should_stop())
>>> +				break;
>>> +
>>> +			schedule();
>>> +		}
>>> +	} while (1);
>>> +
>>> +	return 0;
>>> +}
>>
>> So the thread is only because it is convenient to plug it in the
>> breadcrumbs infrastructure. Otherwise the processing above could be
>> done from a lighter weight context as well since nothing seems to
>> need the process context.
>
> No, seqno processing requires process/sleepable context. The delays we
> incur can be >100us and not suitable for irq/softirq context.

Nothing in this patch needs it - please say in the commit why it is 
choosing the process context then.

And why so long delays? It looks pretty lightweight to me.

>> One alternative could perhaps be to add a waiter->wake_up vfunc and
>> signalers could then potentially use a tasklet?
>
> Hmm, I did find that in order to reduce execlists latency, I had to
> drive the tasklet processing from the signaler.

What do you mean? The existing execlists tasklet? Now would that work?

>>> +int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
>>> +{
>>> +	struct intel_engine_cs *engine = request->engine;
>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>> +	struct rb_node *parent, **p;
>>> +	struct signal *signal;
>>> +	bool first, wakeup;
>>> +
>>> +	if (unlikely(IS_ERR(b->signaler)))
>>> +		return PTR_ERR(b->signaler);
>>
>> I don't see that there is a fallback is kthread creation failed. It
>> should just fail in intel_engine_init_breadcrumbs if that happens.
>
> Because it is not fatal to using the GPU, just one optional function.

But we never expect it to fail and it is not even dependent on anything 
user controllable. Just a random error which would cause user experience 
to degrade. If thread creation failed it means the system is in such a poor
shape that I would just fail the driver init.

>>> +	signal = kmalloc(sizeof(*signal), GFP_ATOMIC);
>>
>> Ugh GFP_ATOMIC - why?
>
> Because of dma-buf/fence.c.

Ah.. yes bad.. fortunately fixed in the following patch.

>> And even should you instead just embed in into requests?
>
> I was resisting embedding even more into requests, so first patch was
> for a simpler integration, with a subsequent patch to embed the node
> into the request.
>
>>> @@ -329,12 +480,24 @@ void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
>>>   	setup_timer(&b->fake_irq,
>>>   		    intel_breadcrumbs_fake_irq,
>>>   		    (unsigned long)engine);
>>> +
>>> +	/* Spawn a thread to provide a common bottom-half for all signals.
>>> +	 * As this is an asynchronous interface we cannot steal the current
>>> +	 * task for handling the bottom-half to the user interrupt, therefore
>>> +	 * we create a thread to do the coherent seqno dance after the
>>> +	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
>>> +	 */
>>> +	b->signaler = kthread_run(intel_breadcrumbs_signaler,
>>> +				  engine, "irq/i915:%d", engine->id);
>>
>> As commented above, init should fail here because it cannot run
>> without the thread.
>
> We can function without the signaler.

Commented above.

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller
  2016-06-08 10:01     ` Chris Wilson
@ 2016-06-08 10:18       ` Tvrtko Ursulin
  2016-06-08 11:10         ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-08 10:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/06/16 11:01, Chris Wilson wrote:
> On Tue, Jun 07, 2016 at 01:46:53PM +0100, Tvrtko Ursulin wrote:
>>
>> On 03/06/16 17:08, Chris Wilson wrote:
>>> With only a single callsite for intel_engine_cs->irq_get and ->irq_put,
>>> we can reduce the code size by moving the common preamble into the
>>> caller, and we can also eliminate the reference counting.
>>>
>>> For completeness, as we are no longer doing reference counting on irq,
>>> rename the get/put vfunctions to enable/disable respectively.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>   drivers/gpu/drm/i915/i915_irq.c          |   8 +-
>>>   drivers/gpu/drm/i915/intel_breadcrumbs.c |  10 +-
>>>   drivers/gpu/drm/i915/intel_lrc.c         |  34 +---
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c  | 269 ++++++++++---------------------
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h  |   5 +-
>>>   5 files changed, 108 insertions(+), 218 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>>> index 14b3d65bb604..5bdb433dde8c 100644
>>> --- a/drivers/gpu/drm/i915/i915_irq.c
>>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>>> @@ -259,12 +259,12 @@ static void ilk_update_gt_irq(struct drm_i915_private *dev_priv,
>>>   	dev_priv->gt_irq_mask &= ~interrupt_mask;
>>>   	dev_priv->gt_irq_mask |= (~enabled_irq_mask & interrupt_mask);
>>>   	I915_WRITE(GTIMR, dev_priv->gt_irq_mask);
>>> -	POSTING_READ(GTIMR);
>>>   }
>>>
>>>   void gen5_enable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
>>>   {
>>>   	ilk_update_gt_irq(dev_priv, mask, mask);
>>> +	POSTING_READ_FW(GTIMR);
>>>   }
>>
>> Unrelated hunks?
>>
>> How is POSTING_READ_FW correct?
>
> The requirement here is an uncached read of the mmio register in order
> to flush the previous write to hw. A grander scheme would be to convert
> all posting reads, but that requires double checking to see if anyone
> has been cheating!

So what prevents the force-wake from being released between the
I915_WRITE and the POSTING_READ_FW?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller
  2016-06-08 10:18       ` Tvrtko Ursulin
@ 2016-06-08 11:10         ` Chris Wilson
  2016-06-08 11:49           ` Tvrtko Ursulin
  0 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-08 11:10 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Wed, Jun 08, 2016 at 11:18:59AM +0100, Tvrtko Ursulin wrote:
> 
> On 08/06/16 11:01, Chris Wilson wrote:
> >On Tue, Jun 07, 2016 at 01:46:53PM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 03/06/16 17:08, Chris Wilson wrote:
> >>>With only a single callsite for intel_engine_cs->irq_get and ->irq_put,
> >>>we can reduce the code size by moving the common preamble into the
> >>>caller, and we can also eliminate the reference counting.
> >>>
> >>>For completeness, as we are no longer doing reference counting on irq,
> >>>rename the get/put vfunctions to enable/disable respectively.
> >>>
> >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>---
> >>>  drivers/gpu/drm/i915/i915_irq.c          |   8 +-
> >>>  drivers/gpu/drm/i915/intel_breadcrumbs.c |  10 +-
> >>>  drivers/gpu/drm/i915/intel_lrc.c         |  34 +---
> >>>  drivers/gpu/drm/i915/intel_ringbuffer.c  | 269 ++++++++++---------------------
> >>>  drivers/gpu/drm/i915/intel_ringbuffer.h  |   5 +-
> >>>  5 files changed, 108 insertions(+), 218 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >>>index 14b3d65bb604..5bdb433dde8c 100644
> >>>--- a/drivers/gpu/drm/i915/i915_irq.c
> >>>+++ b/drivers/gpu/drm/i915/i915_irq.c
> >>>@@ -259,12 +259,12 @@ static void ilk_update_gt_irq(struct drm_i915_private *dev_priv,
> >>>  	dev_priv->gt_irq_mask &= ~interrupt_mask;
> >>>  	dev_priv->gt_irq_mask |= (~enabled_irq_mask & interrupt_mask);
> >>>  	I915_WRITE(GTIMR, dev_priv->gt_irq_mask);
> >>>-	POSTING_READ(GTIMR);
> >>>  }
> >>>
> >>>  void gen5_enable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
> >>>  {
> >>>  	ilk_update_gt_irq(dev_priv, mask, mask);
> >>>+	POSTING_READ_FW(GTIMR);
> >>>  }
> >>
> >>Unrelated hunks?
> >>
> >>How is POSTING_READ_FW correct?
> >
> >The requirement here is an uncached read of the mmio register in order
> >to flush the previous write to hw. A grander scheme would be to convert
> >all posting reads, but that requires double checking to see if anyone
> >has been cheating!
> 
> So what prevents the force-wake from being released between the
> I915_WRITE and the POSTING_READ_FW?

Nothing. The point is that the FW is not required for the correctness or
operation of the POSTING_READ as a barrier to hardware enabling the
interrupt.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
  2016-06-08 10:16       ` Tvrtko Ursulin
@ 2016-06-08 11:24         ` Chris Wilson
  2016-06-08 11:47           ` Tvrtko Ursulin
  0 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-08 11:24 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Wed, Jun 08, 2016 at 11:16:13AM +0100, Tvrtko Ursulin wrote:
> 
> On 08/06/16 10:48, Chris Wilson wrote:
> >On Tue, Jun 07, 2016 at 01:04:22PM +0100, Tvrtko Ursulin wrote:
> >>>+static int intel_breadcrumbs_signaler(void *arg)
> >>>+{
> >>>+	struct intel_engine_cs *engine = arg;
> >>>+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >>>+	struct signal *signal;
> >>>+
> >>>+	/* Install ourselves with high priority to reduce signalling latency */
> >>>+	signaler_set_rtpriority();
> >>>+
> >>>+	do {
> >>>+		set_current_state(TASK_INTERRUPTIBLE);
> >>>+
> >>>+		/* We are either woken up by the interrupt bottom-half,
> >>>+		 * or by a client adding a new signaller. In both cases,
> >>>+		 * the GPU seqno may have advanced beyond our oldest signal.
> >>>+		 * If it has, propagate the signal, remove the waiter and
> >>>+		 * check again with the next oldest signal. Otherwise we
> >>>+		 * need to wait for a new interrupt from the GPU or for
> >>>+		 * a new client.
> >>>+		 */
> >>>+		signal = READ_ONCE(b->first_signal);
> >>>+		if (signal_complete(signal)) {
> >>>+			/* Wake up all other completed waiters and select the
> >>>+			 * next bottom-half for the next user interrupt.
> >>>+			 */
> >>>+			intel_engine_remove_wait(engine, &signal->wait);
> >>>+
> >>>+			i915_gem_request_unreference(signal->request);
> >>>+
> >>>+			/* Find the next oldest signal. Note that as we have
> >>>+			 * not been holding the lock, another client may
> >>>+			 * have installed an even older signal than the one
> >>>+			 * we just completed - so double check we are still
> >>>+			 * the oldest before picking the next one.
> >>>+			 */
> >>>+			spin_lock(&b->lock);
> >>>+			if (signal == b->first_signal)
> >>>+				b->first_signal = rb_next(&signal->node);
> >>>+			rb_erase(&signal->node, &b->signals);
> >>>+			spin_unlock(&b->lock);
> >>>+
> >>>+			kfree(signal);
> >>>+		} else {
> >>>+			if (kthread_should_stop())
> >>>+				break;
> >>>+
> >>>+			schedule();
> >>>+		}
> >>>+	} while (1);
> >>>+
> >>>+	return 0;
> >>>+}
> >>
> >>So the thread is only because it is convenient to plug it in the
> >>breadcrumbs infrastructure. Otherwise the processing above could be
> >>done from a lighter weight context as well since nothing seems to
> >>need the process context.
> >
> >No, seqno processing requires process/sleepable context. The delays we
> >incur can be >100us and not suitable for irq/softirq context.
> 
> Nothing in this patch needs it - please say in the commit why it is
> choosing the process context then.

Bottom half processing requires it. irq_seqno_barrier is not suitable
for irq/softirq context.

> And why so long delays? It looks pretty lightweight to me.
> 
> >>One alternative could perhaps be to add a waiter->wake_up vfunc and
> >>signalers could then potentially use a tasklet?
> >
> >Hmm, I did find that in order to reduce execlists latency, I had to
> >drive the tasklet processing from the signaler.
> 
> What do you mean? The existing execlists tasklet? Now would that work?

Due to how dma-fence signals, the softirq is never kicked
(spin_lock_irq doesn't handle local_bh_enable()) and so we would only
submit a new task via execlists on a reschedule. That latency added
about 30% (30s on bsw) to gem_exec_parallel.
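
One way to picture the workaround (illustrative sketch, not the exact
code):

	static void example_kick_softirqs(void)
	{
		/* local_bh_enable() runs any softirq raised while bottom
		 * halves were disabled, so this pair acts as an explicit
		 * "process pending tasklets now" point after signalling,
		 * rather than waiting for the next reschedule.
		 */
		local_bh_disable();
		local_bh_enable();
	}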

> >>>+int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
> >>>+{
> >>>+	struct intel_engine_cs *engine = request->engine;
> >>>+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >>>+	struct rb_node *parent, **p;
> >>>+	struct signal *signal;
> >>>+	bool first, wakeup;
> >>>+
> >>>+	if (unlikely(IS_ERR(b->signaler)))
> >>>+		return PTR_ERR(b->signaler);
> >>
> >>I don't see that there is a fallback is kthread creation failed. It
> >>should just fail in intel_engine_init_breadcrumbs if that happens.
> >
> >Because it is not fatal to using the GPU, just one optional function.
> 
> But we never expect it to fail and it is not even dependent on
> anything user controllable. Just a random error which would cause
> user experience to degrade. If thread creation failed it means
> system is in such a poor shape I would just fail the driver init.

A minimally functional system is better than nothing at all.
GEM is not required for driver loading, interrupt driven dma-fences less
so.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
  2016-06-08 11:24         ` Chris Wilson
@ 2016-06-08 11:47           ` Tvrtko Ursulin
  2016-06-08 12:34             ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-08 11:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/06/16 12:24, Chris Wilson wrote:
> On Wed, Jun 08, 2016 at 11:16:13AM +0100, Tvrtko Ursulin wrote:
>>
>> On 08/06/16 10:48, Chris Wilson wrote:
>>> On Tue, Jun 07, 2016 at 01:04:22PM +0100, Tvrtko Ursulin wrote:
>>>>> +static int intel_breadcrumbs_signaler(void *arg)
>>>>> +{
>>>>> +	struct intel_engine_cs *engine = arg;
>>>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>>>> +	struct signal *signal;
>>>>> +
>>>>> +	/* Install ourselves with high priority to reduce signalling latency */
>>>>> +	signaler_set_rtpriority();
>>>>> +
>>>>> +	do {
>>>>> +		set_current_state(TASK_INTERRUPTIBLE);
>>>>> +
>>>>> +		/* We are either woken up by the interrupt bottom-half,
>>>>> +		 * or by a client adding a new signaller. In both cases,
>>>>> +		 * the GPU seqno may have advanced beyond our oldest signal.
>>>>> +		 * If it has, propagate the signal, remove the waiter and
>>>>> +		 * check again with the next oldest signal. Otherwise we
>>>>> +		 * need to wait for a new interrupt from the GPU or for
>>>>> +		 * a new client.
>>>>> +		 */
>>>>> +		signal = READ_ONCE(b->first_signal);
>>>>> +		if (signal_complete(signal)) {
>>>>> +			/* Wake up all other completed waiters and select the
>>>>> +			 * next bottom-half for the next user interrupt.
>>>>> +			 */
>>>>> +			intel_engine_remove_wait(engine, &signal->wait);
>>>>> +
>>>>> +			i915_gem_request_unreference(signal->request);
>>>>> +
>>>>> +			/* Find the next oldest signal. Note that as we have
>>>>> +			 * not been holding the lock, another client may
>>>>> +			 * have installed an even older signal than the one
>>>>> +			 * we just completed - so double check we are still
>>>>> +			 * the oldest before picking the next one.
>>>>> +			 */
>>>>> +			spin_lock(&b->lock);
>>>>> +			if (signal == b->first_signal)
>>>>> +				b->first_signal = rb_next(&signal->node);
>>>>> +			rb_erase(&signal->node, &b->signals);
>>>>> +			spin_unlock(&b->lock);
>>>>> +
>>>>> +			kfree(signal);
>>>>> +		} else {
>>>>> +			if (kthread_should_stop())
>>>>> +				break;
>>>>> +
>>>>> +			schedule();
>>>>> +		}
>>>>> +	} while (1);
>>>>> +
>>>>> +	return 0;
>>>>> +}
>>>>
>>>> So the thread is only because it is convenient to plug it in the
>>>> breadcrumbs infrastructure. Otherwise the processing above could be
>>>> done from a lighter weight context as well since nothing seems to
>>>> need the process context.
>>>
>>> No, seqno processing requires process/sleepable context. The delays we
>>> incur can be >100us and not suitable for irq/softirq context.
>>
>> Nothing in this patch needs it - please say in the commit why it is
>> choosing the process context then.
>
> Bottom half processing requires it. irq_seqno_barrier is not suitable
> for irq/softirq context.

Why? Because of a single clflush? How long does that take?

>> And why so long delays? It looks pretty lightweight to me.
>>
>>>> One alternative could perhaps be to add a waiter->wake_up vfunc and
>>>> signalers could then potentially use a tasklet?
>>>
>>> Hmm, I did find that in order to reduce execlists latency, I had to
>>> drive the tasklet processing from the signaler.
>>
>> What do you mean? The existing execlists tasklet? Now would that work?
>
> Due to how dma-fence signals, the softirq is never kicked
> (spin_lock_irq doesn't handle local_bh_enable()) and so we would only
> submit a new task via execlists on a reschedule. That latency added
> about 30% (30s on bsw) to gem_exec_parallel.

I don't follow. User interrupts are separate from context complete which 
drives the submission. How do fences interfere with the latter?

>>>>> +int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
>>>>> +{
>>>>> +	struct intel_engine_cs *engine = request->engine;
>>>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>>>> +	struct rb_node *parent, **p;
>>>>> +	struct signal *signal;
>>>>> +	bool first, wakeup;
>>>>> +
>>>>> +	if (unlikely(IS_ERR(b->signaler)))
>>>>> +		return PTR_ERR(b->signaler);
>>>>
>>>> I don't see that there is a fallback is kthread creation failed. It
>>>> should just fail in intel_engine_init_breadcrumbs if that happens.
>>>
>>> Because it is not fatal to using the GPU, just one optional function.
>>
>> But we never expect it to fail and it is not even dependent on
>> anything user controllable. Just a random error which would cause
>> user experience to degrade. If thread creation failed it means
>> system is in such a poor shape I would just fail the driver init.
>
> A minimally functional system is better than nothing at all.
> GEM is not required for driver loading, interrupt driven dma-fences less
> so.

If you are so hot for that, how about vfuncing enable signaling in that 
case? Because I find the "have we created our kthread at driver init 
time successfully" question for every fence a bit too much.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller
  2016-06-08 11:10         ` Chris Wilson
@ 2016-06-08 11:49           ` Tvrtko Ursulin
  2016-06-08 12:54             ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-08 11:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/06/16 12:10, Chris Wilson wrote:
> On Wed, Jun 08, 2016 at 11:18:59AM +0100, Tvrtko Ursulin wrote:
>>
>> On 08/06/16 11:01, Chris Wilson wrote:
>>> On Tue, Jun 07, 2016 at 01:46:53PM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 03/06/16 17:08, Chris Wilson wrote:
>>>>> With only a single callsite for intel_engine_cs->irq_get and ->irq_put,
>>>>> we can reduce the code size by moving the common preamble into the
>>>>> caller, and we can also eliminate the reference counting.
>>>>>
>>>>> For completeness, as we are no longer doing reference counting on irq,
>>>>> rename the get/put vfunctions to enable/disable respectively.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_irq.c          |   8 +-
>>>>>   drivers/gpu/drm/i915/intel_breadcrumbs.c |  10 +-
>>>>>   drivers/gpu/drm/i915/intel_lrc.c         |  34 +---
>>>>>   drivers/gpu/drm/i915/intel_ringbuffer.c  | 269 ++++++++++---------------------
>>>>>   drivers/gpu/drm/i915/intel_ringbuffer.h  |   5 +-
>>>>>   5 files changed, 108 insertions(+), 218 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>>>>> index 14b3d65bb604..5bdb433dde8c 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_irq.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>>>>> @@ -259,12 +259,12 @@ static void ilk_update_gt_irq(struct drm_i915_private *dev_priv,
>>>>>   	dev_priv->gt_irq_mask &= ~interrupt_mask;
>>>>>   	dev_priv->gt_irq_mask |= (~enabled_irq_mask & interrupt_mask);
>>>>>   	I915_WRITE(GTIMR, dev_priv->gt_irq_mask);
>>>>> -	POSTING_READ(GTIMR);
>>>>>   }
>>>>>
>>>>>   void gen5_enable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
>>>>>   {
>>>>>   	ilk_update_gt_irq(dev_priv, mask, mask);
>>>>> +	POSTING_READ_FW(GTIMR);
>>>>>   }
>>>>
>>>> Unrelated hunks?
>>>>
>>>> How is POSTING_READ_FW correct?
>>>
>>> The requirement here is an uncached read of the mmio register in order
>>> to flush the previous write to hw. A grander scheme would be to convert
>>> all posting reads, but that requires double checking to see if anyone
>>> has been cheating!
>>
>> So what prevents the force-wake from being released between the
>> I915_WRITE and the POSTING_READ_FW?
>
> Nothing. The point is that the FW is not required for the correctness or
> operation of the POSTING_READ as a barrier to hardware enabling the
> interrupt.

So sleeping hardware is OK with being read from? It won't hang or 
anything, just provide bad data?

Why not change POSTING_READ to be I915_READ_FW always then?

Regards,

Tvrtko



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
  2016-06-08 11:47           ` Tvrtko Ursulin
@ 2016-06-08 12:34             ` Chris Wilson
  2016-06-08 12:44               ` Tvrtko Ursulin
  0 siblings, 1 reply; 60+ messages in thread
From: Chris Wilson @ 2016-06-08 12:34 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Wed, Jun 08, 2016 at 12:47:28PM +0100, Tvrtko Ursulin wrote:
> 
> On 08/06/16 12:24, Chris Wilson wrote:
> >On Wed, Jun 08, 2016 at 11:16:13AM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 08/06/16 10:48, Chris Wilson wrote:
> >>>On Tue, Jun 07, 2016 at 01:04:22PM +0100, Tvrtko Ursulin wrote:
> >>>>>+static int intel_breadcrumbs_signaler(void *arg)
> >>>>>+{
> >>>>>+	struct intel_engine_cs *engine = arg;
> >>>>>+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >>>>>+	struct signal *signal;
> >>>>>+
> >>>>>+	/* Install ourselves with high priority to reduce signalling latency */
> >>>>>+	signaler_set_rtpriority();
> >>>>>+
> >>>>>+	do {
> >>>>>+		set_current_state(TASK_INTERRUPTIBLE);
> >>>>>+
> >>>>>+		/* We are either woken up by the interrupt bottom-half,
> >>>>>+		 * or by a client adding a new signaller. In both cases,
> >>>>>+		 * the GPU seqno may have advanced beyond our oldest signal.
> >>>>>+		 * If it has, propagate the signal, remove the waiter and
> >>>>>+		 * check again with the next oldest signal. Otherwise we
> >>>>>+		 * need to wait for a new interrupt from the GPU or for
> >>>>>+		 * a new client.
> >>>>>+		 */
> >>>>>+		signal = READ_ONCE(b->first_signal);
> >>>>>+		if (signal_complete(signal)) {
> >>>>>+			/* Wake up all other completed waiters and select the
> >>>>>+			 * next bottom-half for the next user interrupt.
> >>>>>+			 */
> >>>>>+			intel_engine_remove_wait(engine, &signal->wait);
> >>>>>+
> >>>>>+			i915_gem_request_unreference(signal->request);
> >>>>>+
> >>>>>+			/* Find the next oldest signal. Note that as we have
> >>>>>+			 * not been holding the lock, another client may
> >>>>>+			 * have installed an even older signal than the one
> >>>>>+			 * we just completed - so double check we are still
> >>>>>+			 * the oldest before picking the next one.
> >>>>>+			 */
> >>>>>+			spin_lock(&b->lock);
> >>>>>+			if (signal == b->first_signal)
> >>>>>+				b->first_signal = rb_next(&signal->node);
> >>>>>+			rb_erase(&signal->node, &b->signals);
> >>>>>+			spin_unlock(&b->lock);
> >>>>>+
> >>>>>+			kfree(signal);
> >>>>>+		} else {
> >>>>>+			if (kthread_should_stop())
> >>>>>+				break;
> >>>>>+
> >>>>>+			schedule();
> >>>>>+		}
> >>>>>+	} while (1);
> >>>>>+
> >>>>>+	return 0;
> >>>>>+}
> >>>>
> >>>>So the thread is only because it is convenient to plug it in the
> >>>>breadcrumbs infrastructure. Otherwise the processing above could be
> >>>>done from a lighter weight context as well since nothing seems to
> >>>>need the process context.
> >>>
> >>>No, seqno processing requires process/sleepable context. The delays we
> >>>incur can be >100us and not suitable for irq/softirq context.
> >>
> >>Nothing in this patch needs it - please say in the commit why it is
> >>choosing the process context then.
> >
> >Bottom half processing requires it. irq_seqno_barrier is not suitable
> >for irq/softirq context.
> 
> Why? Because of a single clflush? How long does that take?

Because both Ironlake and Baytrail require definite delays on the order of
100us. Haswell, Broadwell, Skylake all need an extra delay that we don't
yet have.
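
For flavour, a sketch of why the bottom half must be allowed to sleep (the
delay value is made up):

	static void example_seqno_barrier(struct intel_engine_cs *engine)
	{
		/* Give the seqno write from the GPU time to land in the
		 * status page before we trust the coherent read; delays of
		 * this order are why the signaler is a kthread and not a
		 * tasklet.
		 */
		usleep_range(125, 250);
	}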
 
> >>And why so long delays? It looks pretty lightweight to me.
> >>
> >>>>One alternative could perhaps be to add a waiter->wake_up vfunc and
> >>>>signalers could then potentially use a tasklet?
> >>>
> >>>Hmm, I did find that in order to reduce execlists latency, I had to
> >>>drive the tasklet processing from the signaler.
> >>
> >>What do you mean? The existing execlists tasklet? Now would that work?
> >
> >Due to how dma-fence signals, the softirq is never kicked
> >(spin_lock_irq doesn't handle local_bh_enable()) and so we would only
> >submit a new task via execlists on a reschedule. That latency added
> >about 30% (30s on bsw) to gem_exec_parallel.
> 
> I don't follow. User interrupts are separate from context complete
> which drives the submission. How do fences interfere with the
> latter?

The biggest user benchmark (ala sysmark) regression we have for
execlists is the latency in submitting the first request to hardware via
elsp (or at least the hw responding to and executing that batch, 
the per-bb and per-ctx w/a are not free either). If we incur extra
latency in the driver in even adding the request to the queue for an
idle GPU, that is easily felt by userspace.

> >>>>>+int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
> >>>>>+{
> >>>>>+	struct intel_engine_cs *engine = request->engine;
> >>>>>+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >>>>>+	struct rb_node *parent, **p;
> >>>>>+	struct signal *signal;
> >>>>>+	bool first, wakeup;
> >>>>>+
> >>>>>+	if (unlikely(IS_ERR(b->signaler)))
> >>>>>+		return PTR_ERR(b->signaler);
> >>>>
> >>>>I don't see that there is a fallback is kthread creation failed. It
> >>>>should just fail in intel_engine_init_breadcrumbs if that happens.
> >>>
> >>>Because it is not fatal to using the GPU, just one optional function.
> >>
> >>But we never expect it to fail and it is not even dependent on
> >>anything user controllable. Just a random error which would cause
> >>user experience to degrade. If thread creation failed it means
> >>system is in such a poor shape I would just fail the driver init.
> >
> >A minimally functional system is better than nothing at all.
> >GEM is not required for driver loading, interrupt driven dma-fences less
> >so.
> 
> If you are so hot for that, how about vfuncing enable signaling in
> that case? Because I find the "have we created our kthread at driver
> init time successfully" question for every fence a bit too much.

read + conditional that pulls in the cacheline we want? You can place
the test after the spinlock if you want to avoid the cost I suppose.
Or we just mark the GPU as wedged.
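
Something like (sketch only):

	spin_lock(&b->lock);
	if (unlikely(IS_ERR(b->signaler))) {
		/* Pay for the check only on the path that already takes
		 * the breadcrumbs lock.
		 */
		spin_unlock(&b->lock);
		return PTR_ERR(b->signaler);
	}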
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
  2016-06-08 12:34             ` Chris Wilson
@ 2016-06-08 12:44               ` Tvrtko Ursulin
  2016-06-08 13:47                 ` Chris Wilson
  0 siblings, 1 reply; 60+ messages in thread
From: Tvrtko Ursulin @ 2016-06-08 12:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/06/16 13:34, Chris Wilson wrote:
> On Wed, Jun 08, 2016 at 12:47:28PM +0100, Tvrtko Ursulin wrote:
>>
>> On 08/06/16 12:24, Chris Wilson wrote:
>>> On Wed, Jun 08, 2016 at 11:16:13AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 08/06/16 10:48, Chris Wilson wrote:
>>>>> On Tue, Jun 07, 2016 at 01:04:22PM +0100, Tvrtko Ursulin wrote:
>>>>>>> +static int intel_breadcrumbs_signaler(void *arg)
>>>>>>> +{
>>>>>>> +	struct intel_engine_cs *engine = arg;
>>>>>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>>>>>> +	struct signal *signal;
>>>>>>> +
>>>>>>> +	/* Install ourselves with high priority to reduce signalling latency */
>>>>>>> +	signaler_set_rtpriority();
>>>>>>> +
>>>>>>> +	do {
>>>>>>> +		set_current_state(TASK_INTERRUPTIBLE);
>>>>>>> +
>>>>>>> +		/* We are either woken up by the interrupt bottom-half,
>>>>>>> +		 * or by a client adding a new signaller. In both cases,
>>>>>>> +		 * the GPU seqno may have advanced beyond our oldest signal.
>>>>>>> +		 * If it has, propagate the signal, remove the waiter and
>>>>>>> +		 * check again with the next oldest signal. Otherwise we
>>>>>>> +		 * need to wait for a new interrupt from the GPU or for
>>>>>>> +		 * a new client.
>>>>>>> +		 */
>>>>>>> +		signal = READ_ONCE(b->first_signal);
>>>>>>> +		if (signal_complete(signal)) {
>>>>>>> +			/* Wake up all other completed waiters and select the
>>>>>>> +			 * next bottom-half for the next user interrupt.
>>>>>>> +			 */
>>>>>>> +			intel_engine_remove_wait(engine, &signal->wait);
>>>>>>> +
>>>>>>> +			i915_gem_request_unreference(signal->request);
>>>>>>> +
>>>>>>> +			/* Find the next oldest signal. Note that as we have
>>>>>>> +			 * not been holding the lock, another client may
>>>>>>> +			 * have installed an even older signal than the one
>>>>>>> +			 * we just completed - so double check we are still
>>>>>>> +			 * the oldest before picking the next one.
>>>>>>> +			 */
>>>>>>> +			spin_lock(&b->lock);
>>>>>>> +			if (signal == b->first_signal)
>>>>>>> +				b->first_signal = rb_next(&signal->node);
>>>>>>> +			rb_erase(&signal->node, &b->signals);
>>>>>>> +			spin_unlock(&b->lock);
>>>>>>> +
>>>>>>> +			kfree(signal);
>>>>>>> +		} else {
>>>>>>> +			if (kthread_should_stop())
>>>>>>> +				break;
>>>>>>> +
>>>>>>> +			schedule();
>>>>>>> +		}
>>>>>>> +	} while (1);
>>>>>>> +
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>
>>>>>> So the thread is only because it is convenient to plug it in the
>>>>>> breadcrumbs infrastructure. Otherwise the processing above could be
>>>>>> done from a lighter weight context as well since nothing seems to
>>>>>> need the process context.
>>>>>
>>>>> No, seqno processing requires process/sleepable context. The delays we
>>>>> incur can be >100us and not suitable for irq/softirq context.
>>>>
>>>> Nothing in this patch needs it - please say in the commit why it is
>>>> choosing the process context then.
>>>
>>> Bottom half processing requires it. irq_seqno_barrier is not suitable
>>> for irq/softirq context.
>>
>> Why? Because of a single clflush? How long does that take?
>
> Because both Ironlake and Baytrail require definite delays on the order of
> 100us. Haswell, Broadwell, Skylake all need an extra delay that we don't
> yet have.

Okay, please mention in the commit so the choice is documented.

>>>> And why so long delays? It looks pretty lightweight to me.
>>>>
>>>>>> One alternative could perhaps be to add a waiter->wake_up vfunc and
>>>>>> signalers could then potentially use a tasklet?
>>>>>
>>>>> Hmm, I did find that in order to reduce execlists latency, I had to
>>>>> drive the tasklet processing from the signaler.
>>>>
>>>> What do you mean? The existing execlists tasklet? How would that work?
>>>
>>> Due to how dma-fence signals, the softirq is never kicked
>>> (spin_lock_irq doesn't handle local_bh_enable()) and so we would only
>>> submit a new task via execlists on a reschedule. That latency added
>>> about 30% (30s on bsw) to gem_exec_parallel.
>>
>> I don't follow. User interrupts are separate from context complete
>> which drives the submission. How do fences interfere with the
>> latter?
>
> The biggest user benchmark (ala sysmark) regression we have for
> execlists is the latency in submitting the first request to hardware via
> elsp (or at least the hw responding to and executing that batch,
> the per-bb and per-ctx w/a are not free either). If we incur extra
> latency in the driver in even adding the request to the queue for an
> idle GPU, that is easily felt by userspace.

I still don't see how fences tie into that. But it is not so important 
since it was all along the lines of "do we really need a thread".

>>>>>>> +int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
>>>>>>> +{
>>>>>>> +	struct intel_engine_cs *engine = request->engine;
>>>>>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>>>>>> +	struct rb_node *parent, **p;
>>>>>>> +	struct signal *signal;
>>>>>>> +	bool first, wakeup;
>>>>>>> +
>>>>>>> +	if (unlikely(IS_ERR(b->signaler)))
>>>>>>> +		return PTR_ERR(b->signaler);
>>>>>>
>>>>>> I don't see that there is a fallback if kthread creation failed. It
>>>>>> should just fail in intel_engine_init_breadcrumbs if that happens.
>>>>>
>>>>> Because it is not fatal to using the GPU, just one optional function.
>>>>
>>>> But we never expect it to fail and it is not even dependent on
>>>> anything user controllable. Just a random error which would cause
>>>> user experience to degrade. If thread creation failed it means
>>>> system is in such a poor shape I would just fail the driver init.
>>>
>>> A minimally functional system is better than nothing at all.
>>> GEM is not required for driver loading, interrupt driven dma-fences less
>>> so.
>>
>> If you are so hot for that, how about vfuncing enable signaling in
>> that case? Because I find the "have we created our kthread at driver
>> init time successfully" question for every fence a bit too much.
>
> read + conditional that pulls in the cacheline we want? You can place
> the test after the spinlock if you want to avoid the cost I suppose.
> Or we just mark the GPU as wedged.

What I meant was to pass in different fence_ops at fence_init time,
depending on whether or not the signaler thread was created. If the
driver is expected to stay functional in that case, and
fence->enable_signaling needs to keep returning errors, that sounds
like a much more elegant solution than repeating the check at every
fence->enable_signaling call.
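
Something along these lines (a sketch using the present-day dma-fence
naming, which was "struct fence" at the time of this thread; the i915
helper and field names are assumptions):

static const struct dma_fence_ops i915_fence_ops = {
	.get_driver_name	= i915_fence_get_driver_name,
	.get_timeline_name	= i915_fence_get_timeline_name,
	.enable_signaling	= i915_fence_enable_signaling,
	.wait			= dma_fence_default_wait,
};

/* Selected once at init if the signaler kthread could not be created,
 * so the failure is decided up front instead of being rechecked on
 * every enable_signaling call.
 */
static const struct dma_fence_ops i915_fence_ops_no_signaler = {
	.get_driver_name	= i915_fence_get_driver_name,
	.get_timeline_name	= i915_fence_get_timeline_name,
	.enable_signaling	= i915_fence_enable_signaling_nop,
	.wait			= dma_fence_default_wait,
};

static void i915_gem_request_init_fence(struct drm_i915_gem_request *request)
{
	struct intel_engine_cs *engine = request->engine;
	const struct dma_fence_ops *ops =
		IS_ERR(engine->breadcrumbs.signaler) ?
		&i915_fence_ops_no_signaler : &i915_fence_ops;

	dma_fence_init(&request->fence, ops, &request->lock,
		       engine->fence_context, request->seqno);
}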

Regards,

Tvrtko





_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller
  2016-06-08 11:49           ` Tvrtko Ursulin
@ 2016-06-08 12:54             ` Chris Wilson
  0 siblings, 0 replies; 60+ messages in thread
From: Chris Wilson @ 2016-06-08 12:54 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Wed, Jun 08, 2016 at 12:49:14PM +0100, Tvrtko Ursulin wrote:
> 
> On 08/06/16 12:10, Chris Wilson wrote:
> >On Wed, Jun 08, 2016 at 11:18:59AM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 08/06/16 11:01, Chris Wilson wrote:
> >>>On Tue, Jun 07, 2016 at 01:46:53PM +0100, Tvrtko Ursulin wrote:
> >>>>
> >>>>On 03/06/16 17:08, Chris Wilson wrote:
> >>>>>With only a single callsite for intel_engine_cs->irq_get and ->irq_put,
> >>>>>we can reduce the code size by moving the common preamble into the
> >>>>>caller, and we can also eliminate the reference counting.
> >>>>>
> >>>>>For completeness, as we are no longer doing reference counting on irq,
> >>>>>rename the get/put vfunctions to enable/disable respectively.
> >>>>>
> >>>>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>---
> >>>>>  drivers/gpu/drm/i915/i915_irq.c          |   8 +-
> >>>>>  drivers/gpu/drm/i915/intel_breadcrumbs.c |  10 +-
> >>>>>  drivers/gpu/drm/i915/intel_lrc.c         |  34 +---
> >>>>>  drivers/gpu/drm/i915/intel_ringbuffer.c  | 269 ++++++++++---------------------
> >>>>>  drivers/gpu/drm/i915/intel_ringbuffer.h  |   5 +-
> >>>>>  5 files changed, 108 insertions(+), 218 deletions(-)
> >>>>>
> >>>>>diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >>>>>index 14b3d65bb604..5bdb433dde8c 100644
> >>>>>--- a/drivers/gpu/drm/i915/i915_irq.c
> >>>>>+++ b/drivers/gpu/drm/i915/i915_irq.c
> >>>>>@@ -259,12 +259,12 @@ static void ilk_update_gt_irq(struct drm_i915_private *dev_priv,
> >>>>>  	dev_priv->gt_irq_mask &= ~interrupt_mask;
> >>>>>  	dev_priv->gt_irq_mask |= (~enabled_irq_mask & interrupt_mask);
> >>>>>  	I915_WRITE(GTIMR, dev_priv->gt_irq_mask);
> >>>>>-	POSTING_READ(GTIMR);
> >>>>>  }
> >>>>>
> >>>>>  void gen5_enable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
> >>>>>  {
> >>>>>  	ilk_update_gt_irq(dev_priv, mask, mask);
> >>>>>+	POSTING_READ_FW(GTIMR);
> >>>>>  }
> >>>>
> >>>>Unrelated hunks?
> >>>>
> >>>>How is POSTING_READ_FW correct?
> >>>
> >>>The requirement here is an uncached read of the mmio register in order
> >>>to flush the previous write to hw. A grander scheme would be to convert
> >>>all posting reads, but that requires double checking to see if anyone
> >>>has been cheating!
> >>
> >>So what prevents to force-wake for getting released between the
> >>I915_WRITE and POSTING_READ_FW ?
> >
> >Nothing. The point is that the FW is not required for the correctness or
> >operation of the POSTING_READ as a barrier to hardware enabling the
> >interrupt.
> 
> So sleeping hardware is OK with being read from? It won't hang or
> anything, just provide bad data?

Just garbage.
 
> Why not change POSTING_READ to be I915_READ_FW always then?

The first plan was to purge all the posting-reads; that proved some are
required. So now, if one shows up in a profile, it is asked to justify
its existence.
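
For reference, the distinction being debated in the hunk above
(illustrative comments only):

	I915_WRITE(GTIMR, dev_priv->gt_irq_mask);

	/* Any mmio read of the register flushes the preceding write.
	 * POSTING_READ() takes the full I915_READ() path (forcewake
	 * bookkeeping, value guaranteed valid); POSTING_READ_FW() is the
	 * raw read: cheaper, and if the power well happens to be asleep
	 * the value is garbage, which is fine for a barrier whose result
	 * is discarded.
	 */
	POSTING_READ_FW(GTIMR);
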
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter
  2016-06-08 12:44               ` Tvrtko Ursulin
@ 2016-06-08 13:47                 ` Chris Wilson
  0 siblings, 0 replies; 60+ messages in thread
From: Chris Wilson @ 2016-06-08 13:47 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Wed, Jun 08, 2016 at 01:44:27PM +0100, Tvrtko Ursulin wrote:
> 
> On 08/06/16 13:34, Chris Wilson wrote:
> >On Wed, Jun 08, 2016 at 12:47:28PM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 08/06/16 12:24, Chris Wilson wrote:
> >>>On Wed, Jun 08, 2016 at 11:16:13AM +0100, Tvrtko Ursulin wrote:
> >>>>
> >>>>And why so long delays? It looks pretty lightweight to me.
> >>>>
> >>>>>>One alternative could perhaps be to add a waiter->wake_up vfunc and
> >>>>>>signalers could then potentially use a tasklet?
> >>>>>
> >>>>>Hmm, I did find that in order to reduce execlists latency, I had to
> >>>>>drive the tasklet processing from the signaler.
> >>>>
> >>>>What do you mean? The existing execlists tasklet? How would that work?
> >>>
> >>>Due to how dma-fence signals, the softirq is never kicked
> >>>(spin_lock_irq doesn't handle local_bh_enable()) and so we would only
> >>>submit a new task via execlists on a reschedule. That latency added
> >>>about 30% (30s on bsw) to gem_exec_parallel.
> >>
> >>I don't follow. User interrupts are separate from context complete
> >>which drives the submission. How do fences interfere with the
> >>latter?
> >
> >The biggest user benchmark (ala sysmark) regression we have for
> >execlists is the latency in submitting the first request to hardware via
> >elsp (or at least the hw responding to and executing that batch,
> >the per-bb and per-ctx w/a are not free either). If we incur extra
> >latency in the driver in even adding the request to the queue for an
> >idle GPU, that is easily felt by userspace.
> 
> I still don't see how fences tie into that. But it is not so
> important since it was all along the lines of "do we really need a
> thread".

I was just mentioning in passing an issue I noticed when mixing fences
and tasklets! It boils down to spin_unlock_irq() not doing
local_bh_enable(), so trying to schedule a tasklet from inside a fence
callback incurs more latency than you would expect. Entirely unrelated
except for the signaling, fencing and their uses ;)
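
The shape of the problem, sketched (present-day dma-fence naming; the
callback and engine field names are assumptions):

struct submit_cb {
	struct dma_fence_cb base;
	struct intel_engine_cs *engine;
};

static void submit_on_fence_signal(struct dma_fence *fence,
				   struct dma_fence_cb *cb)
{
	struct submit_cb *s = container_of(cb, struct submit_cb, base);

	/* Called with the fence lock held via spin_lock_irqsave().
	 * This only marks the tasklet as pending: since the subsequent
	 * spin_unlock_irq()/irqrestore() does not run local_bh_enable(),
	 * the softirq is not processed on this unlock and the ELSP
	 * submission waits for the next softirq opportunity (interrupt
	 * exit or reschedule).
	 */
	tasklet_hi_schedule(&s->engine->irq_tasklet);
}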

> >>>>>>>+int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
> >>>>>>>+{
> >>>>>>>+	struct intel_engine_cs *engine = request->engine;
> >>>>>>>+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >>>>>>>+	struct rb_node *parent, **p;
> >>>>>>>+	struct signal *signal;
> >>>>>>>+	bool first, wakeup;
> >>>>>>>+
> >>>>>>>+	if (unlikely(IS_ERR(b->signaler)))
> >>>>>>>+		return PTR_ERR(b->signaler);
> >>>>>>
> >>>>>>I don't see that there is a fallback if kthread creation failed. It
> >>>>>>should just fail in intel_engine_init_breadcrumbs if that happens.
> >>>>>
> >>>>>Because it is not fatal to using the GPU, just one optional function.
> >>>>
> >>>>But we never expect it to fail and it is not even dependent on
> >>>>anything user controllable. Just a random error which would cause
> >>>>user experience to degrade. If thread creation failed it means
> >>>>system is in such a poor shape I would just fail the driver init.
> >>>
> >>>A minimally functional system is better than nothing at all.
> >>>GEM is not required for driver loading, interrupt driven dma-fences less
> >>>so.
> >>
> >>If you are so hot for that, how about vfuncing enable signaling in
> >>that case? Because I find the "have we created our kthread at driver
> >>init time successfully" question for every fence a bit too much.
> >
> >read + conditional that pulls in the cacheline we want? You can place
> >the test after the spinlock if you want to avoid the cost I supose.
> >Or we just mark the GPU as wedged.
> 
> What I meant was to pass in different fence_ops at fence_init time,
> depending on whether or not the signaler thread was created. If the
> driver is expected to stay functional in that case, and
> fence->enable_signaling needs to keep returning errors, that sounds
> like a much more elegant solution than repeating the check at every
> fence->enable_signaling call.

Actually, looking at it, the code was broken for !thread as there was
no automatic fallback to polling by dma-fence. The choice is between
doing that ourselves for an impossible failure case, or just marking
the GPU as wedged on init.
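
The simpler option, sketched (hypothetical, not the posted patch):

int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
{
	struct intel_breadcrumbs *b = &engine->breadcrumbs;
	struct task_struct *tsk;

	spin_lock_init(&b->lock);

	tsk = kthread_run(intel_breadcrumbs_signaler, engine,
			  "i915/signal:%d", engine->id);
	if (IS_ERR(tsk)) {
		/* Propagate the failure and let the caller mark the GPU
		 * as wedged (or fail engine init), rather than have every
		 * fence cope with a missing signaler.
		 */
		return PTR_ERR(tsk);
	}

	b->signaler = tsk;
	return 0;
}
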
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2016-06-08 13:47 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-03 16:08 Breadcrumbs, again Chris Wilson
2016-06-03 16:08 ` [PATCH 01/21] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
2016-06-03 16:08 ` [PATCH 02/21] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
2016-06-08  8:42   ` Daniel Vetter
2016-06-08  9:13     ` Chris Wilson
2016-06-03 16:08 ` [PATCH 03/21] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
2016-06-06 12:52   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 04/21] drm/i915: Make queueing the hangcheck work inline Chris Wilson
2016-06-03 16:08 ` [PATCH 05/21] drm/i915: Separate GPU hang waitqueue from advance Chris Wilson
2016-06-06 13:00   ` Tvrtko Ursulin
2016-06-07 12:11     ` Arun Siluvery
2016-06-03 16:08 ` [PATCH 06/21] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
2016-06-06 13:58   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 07/21] drm/i915: Spin after waking up for an interrupt Chris Wilson
2016-06-06 14:39   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 08/21] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
2016-06-06 14:55   ` Tvrtko Ursulin
2016-06-08  9:24     ` Chris Wilson
2016-06-03 16:08 ` [PATCH 09/21] drm/i915: Stop mapping the scratch page into CPU space Chris Wilson
2016-06-06 15:03   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 10/21] drm/i915: Allocate scratch page from stolen Chris Wilson
2016-06-06 15:05   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 11/21] drm/i915: Refactor scratch object allocation for gen2 w/a buffer Chris Wilson
2016-06-06 15:09   ` Tvrtko Ursulin
2016-06-08  9:27     ` Chris Wilson
2016-06-03 16:08 ` [PATCH 12/21] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) Chris Wilson
2016-06-03 16:08 ` [PATCH 13/21] drm/i915: Check the CPU cached value of seqno after waking the waiter Chris Wilson
2016-06-06 15:10   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 14/21] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
2016-06-06 15:34   ` Tvrtko Ursulin
2016-06-08  9:35     ` Chris Wilson
2016-06-08  9:57       ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 15/21] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
2016-06-08  8:54   ` Daniel Vetter
2016-06-03 16:08 ` [PATCH 16/21] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
2016-06-06 13:50   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 17/21] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
2016-06-07 12:04   ` Tvrtko Ursulin
2016-06-08  9:48     ` Chris Wilson
2016-06-08 10:16       ` Tvrtko Ursulin
2016-06-08 11:24         ` Chris Wilson
2016-06-08 11:47           ` Tvrtko Ursulin
2016-06-08 12:34             ` Chris Wilson
2016-06-08 12:44               ` Tvrtko Ursulin
2016-06-08 13:47                 ` Chris Wilson
2016-06-03 16:08 ` [PATCH 18/21] drm/i915: Embed signaling node into the GEM request Chris Wilson
2016-06-07 12:31   ` Tvrtko Ursulin
2016-06-08  9:54     ` Chris Wilson
2016-06-03 16:08 ` [PATCH 19/21] drm/i915: Move the get/put irq locking into the caller Chris Wilson
2016-06-07 12:46   ` Tvrtko Ursulin
2016-06-08 10:01     ` Chris Wilson
2016-06-08 10:18       ` Tvrtko Ursulin
2016-06-08 11:10         ` Chris Wilson
2016-06-08 11:49           ` Tvrtko Ursulin
2016-06-08 12:54             ` Chris Wilson
2016-06-03 16:08 ` [PATCH 20/21] drm/i915: Simplify enabling user-interrupts with L3-remapping Chris Wilson
2016-06-07 12:50   ` Tvrtko Ursulin
2016-06-03 16:08 ` [PATCH 21/21] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts Chris Wilson
2016-06-07 12:51   ` Tvrtko Ursulin
2016-06-03 16:35 ` ✗ Ro.CI.BAT: failure for series starting with [01/21] drm/i915/shrinker: Flush active on objects before counting Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.