* [CI 01/20] drm: Restore double clflush on the last partial cacheline
@ 2016-05-19 11:32 Chris Wilson
  2016-05-19 11:32 ` [CI 02/20] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
                   ` (19 more replies)
  0 siblings, 20 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

This effectively reverts

commit afcd950cafea6e27b739fe7772cbbeed37d05b8b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 10 15:58:01 2015 +0100

    drm: Avoid the double clflush on the last cache line in drm_clflush_virt_range()

as we have observed issues with serialisation of the clflush operations
on Baytrail+ Atoms with partial updates. Applying the double flush on the
last cacheline forces that clflush to be ordered with respect to the
previous clflush, and the mfence then protects against prefetches crossing
the clflush boundary.

The same issue can be demonstrated in userspace with igt/gem_exec_flush.
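
For illustration only, the x86 fast path of drm_clflush_virt_range() looks
roughly like this once the hunk below is applied (a sketch: the exact CPU
feature check and the non-x86/fallback paths are elided):

  if (cpu_has_clflush) {                       /* sketch of the real check */
          const int size = boot_cpu_data.x86_clflush_size;
          void *end = addr + length;

          addr = (void *)((unsigned long)addr & -size); /* align down to a cacheline */
          mb();                        /* order against preceding writes */
          for (; addr < end; addr += size)
                  clflushopt(addr);
          clflushopt(end - 1);         /* repeat the last line to force serialisation */
          mb();                        /* fence off prefetches crossing the boundary */
          return;
  }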

Fixes: afcd950cafea6 ("drm: Avoid the double clflush on the last cache...")
Testcase: igt/gem_concurrent_blit
Testcase: igt/gem_partial_pread_pwrite
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92845
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Cc: Akash Goel <akash.goel@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/drm_cache.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 6743ff7dccfa..7f4a6c550319 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -136,6 +136,7 @@ drm_clflush_virt_range(void *addr, unsigned long length)
 		mb();
 		for (; addr < end; addr += size)
 			clflushopt(addr);
+		clflushopt(end - 1); /* force serialisation */
 		mb();
 		return;
 	}
-- 
2.8.1

* [CI 02/20] drm/i915/shrinker: Flush active on objects before counting
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 12:14   ` Tvrtko Ursulin
  2016-05-19 11:32 ` [CI 03/20] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

As we inspect obj->active to decide how many objects we can shrink (we
only shrink idle objects), it helps to flush the active lists first
in order to have a more accurate count of available objects.
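
Condensed from the hunk below, the counting path then reads roughly as
follows (a sketch: locking, the bound-list walk and the exact page
accounting are abbreviated or assumed):

  if (!i915_gem_shrinker_lock(dev, &unlock))
          return 0;

  /* Retire completed requests first so that obj->active is up to date
   * and recently idled objects are counted as shrinkable.
   */
  i915_gem_retire_requests(dev_priv);

  count = 0;
  list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list)
          if (can_release_pages(obj))
                  count += obj->base.size >> PAGE_SHIFT; /* assumed accounting */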

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 538c30499848..d893014d28fe 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -265,6 +265,8 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 	if (!i915_gem_shrinker_lock(dev, &unlock))
 		return 0;
 
+	i915_gem_retire_requests(dev_priv);
+
 	count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list)
 		if (can_release_pages(obj))
-- 
2.8.1

* [CI 03/20] drm/i915: Delay queuing hangcheck to wait-request
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
  2016-05-19 11:32 ` [CI 02/20] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 12:34   ` Tvrtko Ursulin
  2016-05-19 11:32 ` [CI 04/20] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

We can defer queuing the hangcheck from the start of every request to
the point at which we wait upon a request. This reduces the overhead of
every request, but may increase the latency of detecting a hang. However,
if nothing ever waits upon a hang, did it ever hang? It also improves the
robustness of the wait-request path by ensuring that the hangchecker is
indeed running before we sleep indefinitely (and thereby ensuring that
we never actually sleep forever waiting for a dead GPU).
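
In outline, the wait loop then queues the hangcheck on every iteration just
before sleeping (a condensed sketch of __i915_wait_request() after this
patch; the reset check, the missed-irq fallback timer and the RPS boost are
elided):

  for (;;) {
          prepare_to_wait(&engine->irq_queue, &wait, state);

          if (i915_gem_request_completed(req, false))
                  break;

          if (signal_pending_state(state, current)) {
                  ret = -ERESTARTSYS;
                  break;
          }

          if (timeout && time_after_eq(jiffies, timeout_expire)) {
                  ret = -ETIME;
                  break;
          }

          /* Ensure that even if the GPU hangs, we get woken up. */
          i915_queue_hangcheck(dev_priv);

          io_schedule();
  }
  finish_wait(&engine->irq_queue, &wait);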

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c |  9 +++++----
 drivers/gpu/drm/i915/i915_irq.c | 10 ++++------
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 24cab8802c2e..06013b9fbc6a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1310,6 +1310,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		/* Ensure that even if the GPU hangs, we get woken up. */
+		i915_queue_hangcheck(dev_priv);
+
 		timer.function = NULL;
 		if (timeout || missed_irq(dev_priv, engine)) {
 			unsigned long expire;
@@ -2654,8 +2657,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	/* Not allowed to fail! */
 	WARN(ret, "emit|add_request failed: %d!\n", ret);
 
-	i915_queue_hangcheck(engine->i915);
-
 	queue_delayed_work(dev_priv->wq,
 			   &dev_priv->mm.retire_work,
 			   round_jiffies_up_relative(HZ));
@@ -2999,8 +3000,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv)
 
 	if (idle)
 		mod_delayed_work(dev_priv->wq,
-				   &dev_priv->mm.idle_work,
-				   msecs_to_jiffies(100));
+				 &dev_priv->mm.idle_work,
+				 msecs_to_jiffies(100));
 
 	return idle;
 }
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f0d941455bed..4818fcb5e960 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3148,10 +3148,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
 
 	for_each_engine_id(engine, dev_priv, id) {
+		bool busy = waitqueue_active(&engine->irq_queue);
 		u64 acthd;
 		u32 seqno;
 		unsigned user_interrupts;
-		bool busy = true;
 
 		semaphore_clear_deadlocks(dev_priv);
 
@@ -3174,12 +3174,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		if (engine->hangcheck.seqno == seqno) {
 			if (ring_idle(engine, seqno)) {
 				engine->hangcheck.action = HANGCHECK_IDLE;
-				if (waitqueue_active(&engine->irq_queue)) {
+				if (busy) {
 					/* Safeguard against driver failure */
 					user_interrupts = kick_waiters(engine);
 					engine->hangcheck.score += BUSY;
-				} else
-					busy = false;
+				}
 			} else {
 				/* We always increment the hangcheck score
 				 * if the ring is busy and still processing
@@ -3253,9 +3252,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		goto out;
 	}
 
+	/* Reset timer in case GPU hangs without another request being added */
 	if (busy_count)
-		/* Reset timer case chip hangs without another request
-		 * being added */
 		i915_queue_hangcheck(dev_priv);
 
 out:
-- 
2.8.1

* [CI 04/20] drm/i915: Remove the dedicated hangcheck workqueue
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
  2016-05-19 11:32 ` [CI 02/20] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
  2016-05-19 11:32 ` [CI 03/20] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 12:50   ` Tvrtko Ursulin
  2016-05-19 11:32 ` [CI 05/20] drm/i915: Make queueing the hangcheck work inline Chris Wilson
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

The queue only ever contains at most one item and needs no special flags;
it is just a very simple wrapper around the system workqueue - a
complication with no benefits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c | 8 --------
 drivers/gpu/drm/i915/i915_drv.h | 1 -
 drivers/gpu/drm/i915/i915_irq.c | 6 +++---
 3 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a8c79f6512a4..aa312fd2bc10 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1026,15 +1026,8 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
 	if (dev_priv->hotplug.dp_wq == NULL)
 		goto out_free_wq;
 
-	dev_priv->gpu_error.hangcheck_wq =
-		alloc_ordered_workqueue("i915-hangcheck", 0);
-	if (dev_priv->gpu_error.hangcheck_wq == NULL)
-		goto out_free_dp_wq;
-
 	return 0;
 
-out_free_dp_wq:
-	destroy_workqueue(dev_priv->hotplug.dp_wq);
 out_free_wq:
 	destroy_workqueue(dev_priv->wq);
 out_err:
@@ -1045,7 +1038,6 @@ out_err:
 
 static void i915_workqueues_cleanup(struct drm_i915_private *dev_priv)
 {
-	destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
 	destroy_workqueue(dev_priv->hotplug.dp_wq);
 	destroy_workqueue(dev_priv->wq);
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 72f0b02a8372..42b8038bf998 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1366,7 +1366,6 @@ struct i915_gpu_error {
 	/* Hang gpu twice in this window and your context gets banned */
 #define DRM_I915_CTX_BAN_PERIOD DIV_ROUND_UP(8*DRM_I915_HANGCHECK_PERIOD, 1000)
 
-	struct workqueue_struct *hangcheck_wq;
 	struct delayed_work hangcheck_work;
 
 	/* For reset and error_state handling. */
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 4818fcb5e960..984b575ba081 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3262,7 +3262,7 @@ out:
 
 void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 {
-	struct i915_gpu_error *e = &dev_priv->gpu_error;
+	unsigned long delay;
 
 	if (!i915.enable_hangcheck)
 		return;
@@ -3272,8 +3272,8 @@ void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
 	 * we will ignore a hung ring if a second ring is kept busy.
 	 */
 
-	queue_delayed_work(e->hangcheck_wq, &e->hangcheck_work,
-			   round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES));
+	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
+	schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
 }
 
 static void ibx_irq_reset(struct drm_device *dev)
-- 
2.8.1

* [CI 05/20] drm/i915: Make queueing the hangcheck work inline
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (2 preceding siblings ...)
  2016-05-19 11:32 ` [CI 04/20] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 12:53   ` Tvrtko Ursulin
  2016-05-19 11:32 ` [CI 06/20] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

Since the function is a small wrapper around schedule_delayed_work(),
move it inline to remove the function call overhead for the principal
caller.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h | 17 ++++++++++++++++-
 drivers/gpu/drm/i915/i915_irq.c | 16 ----------------
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 42b8038bf998..33553ea99a03 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2833,7 +2833,22 @@ void intel_hpd_cancel_work(struct drm_i915_private *dev_priv);
 bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port);
 
 /* i915_irq.c */
-void i915_queue_hangcheck(struct drm_i915_private *dev_priv);
+static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
+{
+	unsigned long delay;
+
+	if (unlikely(!i915.enable_hangcheck))
+		return;
+
+	/* Don't continually defer the hangcheck so that it is always run at
+	 * least once after work has been scheduled on any ring. Otherwise,
+	 * we will ignore a hung ring if a second ring is kept busy.
+	 */
+
+	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
+	schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
+}
+
 __printf(3, 4)
 void i915_handle_error(struct drm_i915_private *dev_priv,
 		       u32 engine_mask,
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 984b575ba081..27dd45a7291c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3260,22 +3260,6 @@ out:
 	ENABLE_RPM_WAKEREF_ASSERTS(dev_priv);
 }
 
-void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
-{
-	unsigned long delay;
-
-	if (!i915.enable_hangcheck)
-		return;
-
-	/* Don't continually defer the hangcheck so that it is always run at
-	 * least once after work has been scheduled on any ring. Otherwise,
-	 * we will ignore a hung ring if a second ring is kept busy.
-	 */
-
-	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
-	schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
-}
-
 static void ibx_irq_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-- 
2.8.1

* [CI 06/20] drm/i915: Slaughter the thundering i915_wait_request herd
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (3 preceding siblings ...)
  2016-05-19 11:32 ` [CI 05/20] drm/i915: Make queueing the hangcheck work inline Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-20 12:04   ` Tvrtko Ursulin
  2016-05-19 11:32 ` [CI 07/20] drm/i915: Remove the lazy_coherency parameter from request-completed? Chris Wilson
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as every client must then
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed its wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added to or removed from the list, we scan the tree for completed
seqnos and wake up all the completed waiters in parallel.
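
The essence of the scheme can be sketched with a small user-space analogue
(illustrative only - hw_seqno, the pthread plumbing and the millisecond
"GPU" are invented here and bear no relation to the i915 code): only the
head of a retirement-ordered queue is woken by the "interrupt"; on
completing, it wakes every waiter whose seqno has already passed and hands
the interrupt over to the next in line.

  #include <pthread.h>
  #include <stdio.h>
  #include <unistd.h>

  #define NUM_WAITERS 4

  struct waiter {
          unsigned int seqno;     /* request this thread waits upon */
          pthread_cond_t wake;    /* private wakeup channel */
          struct waiter *next;    /* retirement-ordered queue */
  };

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static struct waiter *first_waiter;     /* the current bottom-half */
  static unsigned int hw_seqno;           /* last seqno "completed by the GPU" */

  static void *wait_request(void *arg)
  {
          struct waiter *w = arg, **p;

          pthread_mutex_lock(&lock);

          /* Insert in seqno (retirement) order; the head is the bottom-half. */
          for (p = &first_waiter; *p && (*p)->seqno < w->seqno; p = &(*p)->next)
                  ;
          w->next = *p;
          *p = w;

          /* Only the head is woken by the "interrupt"; everyone else sleeps
           * until a completing head wakes them or hands over the role.
           */
          while (hw_seqno < w->seqno)
                  pthread_cond_wait(&w->wake, &lock);

          /* Our request has completed: unlink ourselves. */
          for (p = &first_waiter; *p != w; p = &(*p)->next)
                  ;
          *p = w->next;

          if (p == &first_waiter) {
                  /* We were the bottom-half: wake every already-completed
                   * waiter in parallel, then pass the interrupt to the next
                   * in line (it rechecks and goes back to sleep if pending).
                   */
                  struct waiter *n = first_waiter;

                  while (n && hw_seqno >= n->seqno) {
                          pthread_cond_signal(&n->wake);
                          n = n->next;
                  }
                  if (n)
                          pthread_cond_signal(&n->wake);
          }

          printf("waiter for seqno %u done (hw_seqno %u)\n", w->seqno, hw_seqno);
          pthread_mutex_unlock(&lock);
          return NULL;
  }

  int main(void)
  {
          pthread_t threads[NUM_WAITERS];
          struct waiter waiters[NUM_WAITERS];
          unsigned int s;
          int i;

          for (i = 0; i < NUM_WAITERS; i++) {
                  waiters[i].seqno = i + 1;
                  waiters[i].next = NULL;
                  pthread_cond_init(&waiters[i].wake, NULL);
                  pthread_create(&threads[i], NULL, wait_request, &waiters[i]);
          }

          /* The "GPU": retire one request per millisecond and raise a single
           * "user interrupt" aimed only at the current bottom-half.
           */
          for (s = 1; s <= NUM_WAITERS; s++) {
                  usleep(1000);
                  pthread_mutex_lock(&lock);
                  hw_seqno = s;
                  if (first_waiter)
                          pthread_cond_signal(&first_waiter->wake);
                  pthread_mutex_unlock(&lock);
          }

          for (i = 0; i < NUM_WAITERS; i++)
                  pthread_join(threads[i], NULL);
          return 0;
  }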

Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having an rbtree walk/erase in the wakeup path); each additional
wake_up_process() costs approximately 1us on a big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
delay this imposes on high priority threads, which would rather return
immediately to userspace and leave the interrupt handler to
wake the others.

To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoiding long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/Makefile            |   1 +
 drivers/gpu/drm/i915/i915_debugfs.c      |  15 +-
 drivers/gpu/drm/i915/i915_drv.h          |  39 +++-
 drivers/gpu/drm/i915/i915_gem.c          | 130 ++++--------
 drivers/gpu/drm/i915/i915_gpu_error.c    |  59 +++++-
 drivers/gpu/drm/i915/i915_irq.c          |  27 +--
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 333 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c         |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c  |   3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  67 ++++++-
 10 files changed, 567 insertions(+), 111 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_breadcrumbs.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 7e2944406b8f..4c4e19b3f4d7 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -37,6 +37,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gem_userptr.o \
 	  i915_gpu_error.o \
 	  i915_trace_points.o \
+	  intel_breadcrumbs.o \
 	  intel_lrc.o \
 	  intel_mocs.o \
 	  intel_ringbuffer.o \
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 24f4105b910f..02a923feeb7d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -761,10 +761,21 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 static void i915_ring_seqno_info(struct seq_file *m,
 				 struct intel_engine_cs *engine)
 {
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct rb_node *rb;
+
 	seq_printf(m, "Current sequence (%s): %x\n",
 		   engine->name, engine->get_seqno(engine));
 	seq_printf(m, "Current user interrupts (%s): %x\n",
 		   engine->name, READ_ONCE(engine->user_interrupts));
+
+	spin_lock(&b->lock);
+	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb)) {
+		struct intel_wait *w = container_of(rb, typeof(*w), node);
+		seq_printf(m, "Waiting (%s): %s [%d] on %x\n",
+			   engine->name, w->task->comm, w->task->pid, w->seqno);
+	}
+	spin_unlock(&b->lock);
 }
 
 static int i915_gem_seqno_info(struct seq_file *m, void *data)
@@ -1400,6 +1411,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 			   engine->hangcheck.seqno,
 			   seqno[id],
 			   engine->last_submitted_seqno);
+		seq_printf(m, "\twaiters? %d\n",
+			   intel_engine_has_waiter(engine));
 		seq_printf(m, "\tuser interrupts = %x [current %x]\n",
 			   engine->hangcheck.user_interrupts,
 			   READ_ONCE(engine->user_interrupts));
@@ -2384,7 +2397,7 @@ static int count_irq_waiters(struct drm_i915_private *i915)
 	int count = 0;
 
 	for_each_engine(engine, i915)
-		count += engine->irq_refcount;
+		count += intel_engine_has_waiter(engine);
 
 	return count;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 33553ea99a03..7b329464e8eb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -503,6 +503,7 @@ struct drm_i915_error_state {
 		bool valid;
 		/* Software tracked state */
 		bool waiting;
+		int num_waiters;
 		int hangcheck_score;
 		enum intel_ring_hangcheck_action hangcheck_action;
 		int num_requests;
@@ -548,6 +549,12 @@ struct drm_i915_error_state {
 			u32 tail;
 		} *requests;
 
+		struct drm_i915_error_waiter {
+			char comm[TASK_COMM_LEN];
+			pid_t pid;
+			u32 seqno;
+		} *waiters;
+
 		struct {
 			u32 gfx_mode;
 			union {
@@ -1415,7 +1422,7 @@ struct i915_gpu_error {
 #define I915_STOP_RING_ALLOW_WARN      (1 << 30)
 
 	/* For missed irq/seqno simulation. */
-	unsigned int test_irq_rings;
+	unsigned long test_irq_rings;
 };
 
 enum modeset_restore {
@@ -2941,7 +2948,6 @@ ibx_disable_display_interrupt(struct drm_i915_private *dev_priv, uint32_t bits)
 	ibx_display_interrupt_update(dev_priv, bits, 0);
 }
 
-
 /* i915_gem.c */
 int i915_gem_create_ioctl(struct drm_device *dev, void *data,
 			  struct drm_file *file_priv);
@@ -3815,4 +3821,33 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *engine,
 		i915_gem_request_assign(&engine->trace_irq_req, req);
 }
 
+static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
+{
+	/* Ensure our read of the seqno is coherent so that we
+	 * do not "miss an interrupt" (i.e. if this is the last
+	 * request and the seqno write from the GPU is not visible
+	 * by the time the interrupt fires, we will see that the
+	 * request is incomplete and go back to sleep awaiting
+	 * another interrupt that will never come.)
+	 *
+	 * Strictly, we only need to do this once after an interrupt,
+	 * but it is easier and safer to do it every time the waiter
+	 * is woken.
+	 */
+	if (i915_gem_request_completed(req, false))
+		return true;
+
+	/* We need to check whether any gpu reset happened in between
+	 * the request being submitted and now. If a reset has occurred,
+	 * the request is effectively complete (we either are in the
+	 * process of or have discarded the rendering and completely
+	 * reset the GPU. The results of the request are lost and we
+	 * are free to continue on with the original operation.
+	 */
+	if (req->reset_counter != i915_reset_counter(&req->i915->gpu_error))
+		return true;
+
+	return false;
+}
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 06013b9fbc6a..9348b5d7de0e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1123,17 +1123,6 @@ i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
 	return 0;
 }
 
-static void fake_irq(unsigned long data)
-{
-	wake_up_process((struct task_struct *)data);
-}
-
-static bool missed_irq(struct drm_i915_private *dev_priv,
-		       struct intel_engine_cs *engine)
-{
-	return test_bit(engine->id, &dev_priv->gpu_error.missed_irq_rings);
-}
-
 static unsigned long local_clock_us(unsigned *cpu)
 {
 	unsigned long t;
@@ -1166,7 +1155,7 @@ static bool busywait_stop(unsigned long timeout, unsigned cpu)
 	return this_cpu != cpu;
 }
 
-static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
+static bool __i915_spin_request(struct drm_i915_gem_request *req, int state)
 {
 	unsigned long timeout;
 	unsigned cpu;
@@ -1181,17 +1170,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 	 * takes to sleep on a request, on the order of a microsecond.
 	 */
 
-	if (req->engine->irq_refcount)
-		return -EBUSY;
-
 	/* Only spin if we know the GPU is processing this request */
 	if (!i915_gem_request_started(req, true))
-		return -EAGAIN;
+		return false;
 
 	timeout = local_clock_us(&cpu) + 5;
-	while (!need_resched()) {
+	do {
 		if (i915_gem_request_completed(req, true))
-			return 0;
+			return true;
 
 		if (signal_pending_state(state, current))
 			break;
@@ -1200,12 +1186,9 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 			break;
 
 		cpu_relax_lowlatency();
-	}
-
-	if (i915_gem_request_completed(req, false))
-		return 0;
+	} while (!need_resched());
 
-	return -EAGAIN;
+	return false;
 }
 
 /**
@@ -1229,17 +1212,13 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			s64 *timeout,
 			struct intel_rps_client *rps)
 {
-	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
-	struct drm_i915_private *dev_priv = req->i915;
-	const bool irq_test_in_progress =
-		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
-	DEFINE_WAIT(wait);
-	unsigned long timeout_expire;
+	struct intel_wait wait;
+	unsigned long timeout_remain;
 	s64 before = 0; /* Only to silence a compiler warning. */
-	int ret;
+	int ret = 0;
 
-	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
+	might_sleep();
 
 	if (list_empty(&req->list))
 		return 0;
@@ -1247,7 +1226,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (i915_gem_request_completed(req, true))
 		return 0;
 
-	timeout_expire = 0;
+	timeout_remain = MAX_SCHEDULE_TIMEOUT;
 	if (timeout) {
 		if (WARN_ON(*timeout < 0))
 			return -EINVAL;
@@ -1255,7 +1234,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		if (*timeout == 0)
 			return -ETIME;
 
-		timeout_expire = jiffies + nsecs_to_jiffies_timeout(*timeout);
+		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
 
 		/*
 		 * Record current time in case interrupted by signal, or wedged.
@@ -1263,78 +1242,57 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		before = ktime_get_raw_ns();
 	}
 
-	if (INTEL_INFO(dev_priv)->gen >= 6)
-		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
-
 	trace_i915_gem_request_wait_begin(req);
 
-	/* Optimistic spin for the next jiffie before touching IRQs */
-	ret = __i915_spin_request(req, state);
-	if (ret == 0)
-		goto out;
+	if (INTEL_INFO(req->i915)->gen >= 6)
+		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
-	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
-		ret = -ENODEV;
-		goto out;
-	}
-
-	for (;;) {
-		struct timer_list timer;
+	/* Optimistic spin for the next ~jiffie before touching IRQs */
+	if (__i915_spin_request(req, state))
+		goto complete;
 
-		prepare_to_wait(&engine->irq_queue, &wait, state);
-
-		/* We need to check whether any gpu reset happened in between
-		 * the request being submitted and now. If a reset has occurred,
-		 * the request is effectively complete (we either are in the
-		 * process of or have discarded the rendering and completely
-		 * reset the GPU. The results of the request are lost and we
-		 * are free to continue on with the original operation.
+	intel_wait_init(&wait, req->seqno);
+	set_current_state(state);
+	if (intel_engine_add_wait(req->engine, &wait))
+		/* In order to check that we haven't missed the interrupt
+		 * as we enabled it, we need to kick ourselves to do a
+		 * coherent check on the seqno before we sleep.
 		 */
-		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
-			ret = 0;
-			break;
-		}
-
-		if (i915_gem_request_completed(req, false)) {
-			ret = 0;
-			break;
-		}
+		goto wakeup;
 
+	for (;;) {
 		if (signal_pending_state(state, current)) {
 			ret = -ERESTARTSYS;
 			break;
 		}
 
-		if (timeout && time_after_eq(jiffies, timeout_expire)) {
+		/* Ensure that even if the GPU hangs, we get woken up. */
+		i915_queue_hangcheck(req->i915);
+
+		timeout_remain = io_schedule_timeout(timeout_remain);
+		if (timeout_remain == 0) {
 			ret = -ETIME;
 			break;
 		}
 
-		/* Ensure that even if the GPU hangs, we get woken up. */
-		i915_queue_hangcheck(dev_priv);
-
-		timer.function = NULL;
-		if (timeout || missed_irq(dev_priv, engine)) {
-			unsigned long expire;
-
-			setup_timer_on_stack(&timer, fake_irq, (unsigned long)current);
-			expire = missed_irq(dev_priv, engine) ? jiffies + 1 : timeout_expire;
-			mod_timer(&timer, expire);
-		}
+		if (intel_wait_complete(&wait))
+			break;
 
-		io_schedule();
+wakeup:
+		set_current_state(state);
 
-		if (timer.function) {
-			del_singleshot_timer_sync(&timer);
-			destroy_timer_on_stack(&timer);
-		}
+		/* Carefully check if the request is complete, giving time
+		 * for the seqno to be visible following the interrupt.
+		 * We also have to check in case we are kicked by the GPU
+		 * reset in order to drop the struct_mutex.
+		 */
+		if (__i915_request_irq_complete(req))
+			break;
 	}
-	if (!irq_test_in_progress)
-		engine->irq_put(engine);
 
-	finish_wait(&engine->irq_queue, &wait);
-
-out:
+	intel_engine_remove_wait(req->engine, &wait);
+	__set_current_state(TASK_RUNNING);
+complete:
 	trace_i915_gem_request_wait_end(req);
 
 	if (timeout) {
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 34ff2459ceea..89241ffcc676 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -463,6 +463,18 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 			}
 		}
 
+		if (error->ring[i].num_waiters) {
+			err_printf(m, "%s --- %d waiters\n",
+				   dev_priv->engine[i].name,
+				   error->ring[i].num_waiters);
+			for (j = 0; j < error->ring[i].num_waiters; j++) {
+				err_printf(m, " seqno 0x%08x for %s [%d]\n",
+					   error->ring[i].waiters[j].seqno,
+					   error->ring[i].waiters[j].comm,
+					   error->ring[i].waiters[j].pid);
+			}
+		}
+
 		if ((obj = error->ring[i].ringbuffer)) {
 			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
 				   dev_priv->engine[i].name,
@@ -605,8 +617,9 @@ static void i915_error_state_free(struct kref *error_ref)
 		i915_error_object_free(error->ring[i].ringbuffer);
 		i915_error_object_free(error->ring[i].hws_page);
 		i915_error_object_free(error->ring[i].ctx);
-		kfree(error->ring[i].requests);
 		i915_error_object_free(error->ring[i].wa_ctx);
+		kfree(error->ring[i].requests);
+		kfree(error->ring[i].waiters);
 	}
 
 	i915_error_object_free(error->semaphore_obj);
@@ -892,6 +905,47 @@ static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
 	}
 }
 
+static void engine_record_waiters(struct intel_engine_cs *engine,
+				  struct drm_i915_error_ring *ering)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct drm_i915_error_waiter *waiter;
+	struct rb_node *rb;
+	int count;
+
+	ering->num_waiters = 0;
+	ering->waiters = NULL;
+
+	spin_lock(&b->lock);
+	count = 0;
+	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
+		count++;
+	spin_unlock(&b->lock);
+
+	waiter = NULL;
+	if (count)
+		waiter = kmalloc(count*sizeof(struct drm_i915_error_waiter),
+				 GFP_ATOMIC);
+	if (!waiter)
+		return;
+
+	ering->waiters = waiter;
+
+	spin_lock(&b->lock);
+	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb)) {
+		struct intel_wait *w = container_of(rb, typeof(*w), node);
+
+		strcpy(waiter->comm, w->task->comm);
+		waiter->pid = w->task->pid;
+		waiter->seqno = w->seqno;
+		waiter++;
+
+		if (++ering->num_waiters == count)
+			break;
+	}
+	spin_unlock(&b->lock);
+}
+
 static void i915_record_ring_state(struct drm_i915_private *dev_priv,
 				   struct drm_i915_error_state *error,
 				   struct intel_engine_cs *engine,
@@ -926,7 +980,7 @@ static void i915_record_ring_state(struct drm_i915_private *dev_priv,
 		ering->instdone = I915_READ(GEN2_INSTDONE);
 	}
 
-	ering->waiting = waitqueue_active(&engine->irq_queue);
+	ering->waiting = intel_engine_has_waiter(engine);
 	ering->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
 	ering->acthd = intel_ring_get_active_head(engine);
 	ering->seqno = engine->get_seqno(engine);
@@ -1032,6 +1086,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 		error->ring[i].valid = true;
 
 		i915_record_ring_state(dev_priv, error, engine, &error->ring[i]);
+		engine_record_waiters(engine, &error->ring[i]);
 
 		request = i915_gem_find_active_request(engine);
 		if (request) {
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 27dd45a7291c..c49b8356a80d 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -988,13 +988,10 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 
 static void notify_ring(struct intel_engine_cs *engine)
 {
-	if (!intel_engine_initialized(engine))
-		return;
-
-	trace_i915_gem_request_notify(engine);
-	engine->user_interrupts++;
-
-	wake_up_all(&engine->irq_queue);
+	if (intel_engine_wakeup(engine)) {
+		trace_i915_gem_request_notify(engine);
+		engine->user_interrupts++;
+	}
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
@@ -1075,7 +1072,7 @@ static bool any_waiters(struct drm_i915_private *dev_priv)
 	struct intel_engine_cs *engine;
 
 	for_each_engine(engine, dev_priv)
-		if (engine->irq_refcount)
+		if (intel_engine_has_waiter(engine))
 			return true;
 
 	return false;
@@ -2505,8 +2502,6 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 			       bool reset_completed)
 {
-	struct intel_engine_cs *engine;
-
 	/*
 	 * Notify all waiters for GPU completion events that reset state has
 	 * been changed, and that they need to restart their wait after
@@ -2514,9 +2509,8 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 	 * a gpu reset pending so that i915_error_work_func can acquire them).
 	 */
 
-	/* Wake up __wait_seqno, potentially holding dev->struct_mutex. */
-	for_each_engine(engine, dev_priv)
-		wake_up_all(&engine->irq_queue);
+	/* Wake up i915_wait_request, potentially holding dev->struct_mutex. */
+	intel_kick_waiters(dev_priv);
 
 	/* Wake up intel_crtc_wait_for_pending_flips, holding crtc->mutex. */
 	wake_up_all(&dev_priv->pending_flip_queue);
@@ -3098,13 +3092,14 @@ static unsigned kick_waiters(struct intel_engine_cs *engine)
 
 	if (engine->hangcheck.user_interrupts == user_interrupts &&
 	    !test_and_set_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
-		if (!(i915->gpu_error.test_irq_rings & intel_engine_flag(engine)))
+		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
 			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
 				  engine->name);
 		else
 			DRM_INFO("Fake missed irq on %s\n",
 				 engine->name);
-		wake_up_all(&engine->irq_queue);
+
+		intel_engine_enable_fake_irq(engine);
 	}
 
 	return user_interrupts;
@@ -3148,7 +3143,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
 
 	for_each_engine_id(engine, dev_priv, id) {
-		bool busy = waitqueue_active(&engine->irq_queue);
+		bool busy = intel_engine_has_waiter(engine);
 		u64 acthd;
 		u32 seqno;
 		unsigned user_interrupts;
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
new file mode 100644
index 000000000000..c82ecbf7470a
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -0,0 +1,333 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+
+static void intel_breadcrumbs_fake_irq(unsigned long data)
+{
+	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
+
+	/*
+	 * The timer persists in case we cannot enable interrupts,
+	 * or if we have previously seen seqno/interrupt incoherency
+	 * ("missed interrupt" syndrome). Here the worker will wake up
+	 * every jiffie in order to kick the oldest waiter to do the
+	 * coherent seqno check.
+	 */
+	rcu_read_lock();
+	if (intel_engine_wakeup(engine))
+		mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+	rcu_read_unlock();
+}
+
+static void irq_enable(struct intel_engine_cs *engine)
+{
+	WARN_ON(!engine->irq_get(engine));
+}
+
+static void irq_disable(struct intel_engine_cs *engine)
+{
+	engine->irq_put(engine);
+}
+
+static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *engine =
+		container_of(b, struct intel_engine_cs, breadcrumbs);
+	struct drm_i915_private *i915 = engine->i915;
+	bool irq_posted = false;
+
+	assert_spin_locked(&b->lock);
+	if (b->rpm_wakelock)
+		return false;
+
+	/* Since we are waiting on a request, the GPU should be busy
+	 * and should have its own rpm reference. For completeness,
+	 * record an rpm reference for ourselves to cover the
+	 * interrupt we unmask.
+	 */
+	intel_runtime_pm_get_noresume(i915);
+	b->rpm_wakelock = true;
+
+	/* No interrupts? Kick the waiter every jiffie! */
+	if (intel_irqs_enabled(i915)) {
+		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings)) {
+			irq_enable(engine);
+			irq_posted = true;
+		}
+		b->irq_enabled = true;
+	}
+
+	if (!b->irq_enabled ||
+	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings))
+		mod_timer(&b->fake_irq, jiffies + 1);
+
+	return irq_posted;
+}
+
+static void __intel_breadcrumbs_disable_irq(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *engine =
+		container_of(b, struct intel_engine_cs, breadcrumbs);
+
+	assert_spin_locked(&b->lock);
+	if (!b->rpm_wakelock)
+		return;
+
+	if (b->irq_enabled) {
+		irq_disable(engine);
+		b->irq_enabled = false;
+	}
+
+	intel_runtime_pm_put(engine->i915);
+	b->rpm_wakelock = false;
+}
+
+static inline struct intel_wait *to_wait(struct rb_node *node)
+{
+	return container_of(node, struct intel_wait, node);
+}
+
+static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
+					      struct intel_wait *wait)
+{
+	assert_spin_locked(&b->lock);
+
+	/* This request is completed, so remove it from the tree, mark it as
+	 * complete, and *then* wake up the associated task.
+	 */
+	rb_erase(&wait->node, &b->waiters);
+	RB_CLEAR_NODE(&wait->node);
+
+	wake_up_process(wait->task); /* implicit smp_wmb() */
+}
+
+bool intel_engine_add_wait(struct intel_engine_cs *engine,
+			   struct intel_wait *wait)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct rb_node **p, *parent, *completed;
+	bool first;
+	u32 seqno;
+
+	spin_lock(&b->lock);
+
+	/* Insert the request into the retirement ordered list
+	 * of waiters by walking the rbtree. If we are the oldest
+	 * seqno in the tree (the first to be retired), then
+	 * set ourselves as the bottom-half.
+	 *
+	 * As we descend the tree, prune completed branches: since we hold the
+	 * spinlock, we know that the first_waiter must be delayed and can
+	 * reduce some of the sequential wake up latency if we take action
+	 * ourselves and wake up the completed tasks in parallel. Also, by
+	 * removing stale elements in the tree, we may be able to reduce the
+	 * ping-pong between the old bottom-half and ourselves as first-waiter.
+	 */
+	first = true;
+	parent = NULL;
+	completed = NULL;
+	seqno = engine->get_seqno(engine);
+
+	p = &b->waiters.rb_node;
+	while (*p) {
+		parent = *p;
+		if (wait->seqno == to_wait(parent)->seqno) {
+			/* We have multiple waiters on the same seqno, select
+			 * the highest priority task (that with the smallest
+			 * task->prio) to serve as the bottom-half for this
+			 * group.
+			 */
+			if (wait->task->prio > to_wait(parent)->task->prio) {
+				p = &parent->rb_right;
+				first = false;
+			} else
+				p = &parent->rb_left;
+		} else if (i915_seqno_passed(wait->seqno,
+					     to_wait(parent)->seqno)) {
+			p = &parent->rb_right;
+			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
+				completed = parent;
+			else
+				first = false;
+		} else
+			p = &parent->rb_left;
+	}
+	rb_link_node(&wait->node, parent, p);
+	rb_insert_color(&wait->node, &b->waiters);
+
+	if (completed != NULL) {
+		struct rb_node *next = rb_next(completed);
+
+		if (next && next != &wait->node) {
+			GEM_BUG_ON(first);
+			smp_store_mb(b->first_waiter, to_wait(next)->task);
+			/* As there is a delay between reading the current
+			 * seqno, processing the completed tasks and selecting
+			 * the next waiter, we may have missed the interrupt
+			 * and so need for the next bottom-half to wakeup.
+			 *
+			 * Also as we enable the IRQ, we may miss the
+			 * interrupt for that seqno, so we have to wake up
+			 * the next bottom-half in order to do a coherent check
+			 * in case the seqno passed.
+			 */
+			__intel_breadcrumbs_enable_irq(b);
+			wake_up_process(to_wait(next)->task);
+		}
+
+		do {
+			struct intel_wait *crumb = to_wait(completed);
+			completed = rb_prev(completed);
+			__intel_breadcrumbs_finish(b, crumb);
+		} while (completed != NULL);
+	}
+
+	if (first) {
+		smp_store_mb(b->first_waiter, wait->task);
+		first = __intel_breadcrumbs_enable_irq(b);
+	}
+	GEM_BUG_ON(b->first_waiter == NULL);
+
+	spin_unlock(&b->lock);
+
+	return first;
+}
+
+void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
+{
+	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+}
+
+static inline bool chain_wakeup(struct rb_node *rb, int priority)
+{
+	return rb && to_wait(rb)->task->prio <= priority;
+}
+
+void intel_engine_remove_wait(struct intel_engine_cs *engine,
+			      struct intel_wait *wait)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	/* Quick check to see if this waiter was already decoupled from
+	 * the tree by the bottom-half to avoid contention on the spinlock
+	 * by the herd.
+	 */
+	if (RB_EMPTY_NODE(&wait->node))
+		return;
+
+	spin_lock(&b->lock);
+
+	if (b->first_waiter == wait->task) {
+		struct rb_node *next;
+		struct task_struct *task;
+		const int priority = wait->task->prio;
+
+		/* We are the current bottom-half. Find the next candidate,
+		 * the first waiter in the queue on the remaining oldest
+		 * request. As multiple seqnos may complete in the time it
+		 * takes us to wake up and find the next waiter, we have to
+		 * wake up that waiter for it to perform its own coherent
+		 * completion check.
+		 */
+		next = rb_next(&wait->node);
+		if (chain_wakeup(next, priority)) {
+			/* If the next waiter is already complete,
+			 * wake it up and continue onto the next waiter. So
+			 * if we have a small herd, they will wake up in parallel
+			 * rather than sequentially, which should reduce
+			 * the overall latency in waking all the completed
+			 * clients.
+			 *
+			 * However, waking up a chain adds extra latency to
+			 * the first_waiter. This is undesirable if that
+			 * waiter is a high priority task.
+			 */
+			u32 seqno = engine->get_seqno(engine);
+			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
+				struct rb_node *n = rb_next(next);
+				__intel_breadcrumbs_finish(b, to_wait(next));
+				next = n;
+				if (!chain_wakeup(next, priority))
+					break;
+			}
+		}
+		task = next ? to_wait(next)->task : NULL;
+
+		smp_store_mb(b->first_waiter, task);
+		if (task) {
+			/* In our haste, we may have completed the first waiter
+			 * before we enabled the interrupt. Do so now as we
+			 * have a second waiter for a future seqno. Afterwards,
+			 * we have to wake up that waiter in case we missed
+			 * the interrupt, or if we have to handle an
+			 * exception rather than a seqno completion.
+			 */
+			if (to_wait(next)->seqno != wait->seqno)
+				__intel_breadcrumbs_enable_irq(b);
+			wake_up_process(task);
+		} else
+			__intel_breadcrumbs_disable_irq(b);
+	}
+
+	if (!RB_EMPTY_NODE(&wait->node))
+		rb_erase(&wait->node, &b->waiters);
+	spin_unlock(&b->lock);
+}
+
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	spin_lock_init(&b->lock);
+	setup_timer(&b->fake_irq,
+		    intel_breadcrumbs_fake_irq,
+		    (unsigned long)engine);
+}
+
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	del_timer_sync(&b->fake_irq);
+}
+
+unsigned intel_kick_waiters(struct drm_i915_private *i915)
+{
+	struct intel_engine_cs *engine;
+	unsigned mask = 0;
+
+	/* To avoid the task_struct disappearing beneath us as we wake up
+	 * the process, we must first inspect the task_struct->state under the
+	 * RCU lock, i.e. as we call wake_up_process() we must be holding the
+	 * rcu_read_lock().
+	 */
+	rcu_read_lock();
+	for_each_engine(engine, i915)
+		if (unlikely(intel_engine_wakeup(engine)))
+			mask |= intel_engine_flag(engine);
+	rcu_read_unlock();
+
+	return mask;
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index db10c961e0f4..34c6c3fd6296 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1894,6 +1894,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
 	i915_cmd_parser_fini_ring(engine);
 	i915_gem_batch_pool_fini(&engine->batch_pool);
 
+	intel_engine_fini_breadcrumbs(engine);
+
 	if (engine->status_page.obj) {
 		i915_gem_object_unpin_map(engine->status_page.obj);
 		engine->status_page.obj = NULL;
@@ -1931,7 +1933,7 @@ logical_ring_default_irqs(struct intel_engine_cs *engine, unsigned shift)
 {
 	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << shift;
 	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
-	init_waitqueue_head(&engine->irq_queue);
+	intel_engine_init_breadcrumbs(engine);
 }
 
 static int
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 8d35a3978f9b..9bd3dd756d3d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2258,7 +2258,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	memset(engine->semaphore.sync_seqno, 0,
 	       sizeof(engine->semaphore.sync_seqno));
 
-	init_waitqueue_head(&engine->irq_queue);
+	intel_engine_init_breadcrumbs(engine);
 
 	ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
 	if (IS_ERR(ringbuf)) {
@@ -2327,6 +2327,7 @@ void intel_cleanup_engine(struct intel_engine_cs *engine)
 
 	i915_cmd_parser_fini_ring(engine);
 	i915_gem_batch_pool_fini(&engine->batch_pool);
+	intel_engine_fini_breadcrumbs(engine);
 	engine->i915 = NULL;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 929e7b4af2a4..28597ea8fbf9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -160,6 +160,31 @@ struct intel_engine_cs {
 	struct intel_ringbuffer *buffer;
 	struct list_head buffers;
 
+	/* Rather than have every client wait upon all user interrupts,
+	 * with the herd waking after every interrupt and each doing the
+	 * heavyweight seqno dance, we delegate the task (of being the
+	 * bottom-half of the user interrupt) to the first client. After
+	 * every interrupt, we wake up one client, who does the heavyweight
+	 * coherent seqno read and either goes back to sleep (if incomplete),
+	 * or wakes up all the completed clients in parallel, before then
+	 * transferring the bottom-half status to the next client in the queue.
+	 *
+	 * Compared to walking the entire list of waiters in a single dedicated
+	 * bottom-half, we reduce the latency of the first waiter by avoiding
+	 * a context switch, but incur additional coherent seqno reads when
+	 * following the chain of request breadcrumbs. Since it is most likely
+	 * that we have a single client waiting on each seqno, then reducing
+	 * the overhead of waking that client is much preferred.
+	 */
+	struct intel_breadcrumbs {
+		spinlock_t lock; /* protects the lists of requests */
+		struct rb_root waiters; /* sorted by retirement, priority */
+		struct task_struct *first_waiter; /* bh for user interrupts */
+		struct timer_list fake_irq; /* used after a missed interrupt */
+		bool irq_enabled;
+		bool rpm_wakelock;
+	} breadcrumbs;
+
 	/*
 	 * A pool of objects to use as shadow copies of client batch buffers
 	 * when the command parser is enabled. Prevents the client from
@@ -308,8 +333,6 @@ struct intel_engine_cs {
 
 	bool gpu_caches_dirty;
 
-	wait_queue_head_t irq_queue;
-
 	struct intel_context *last_context;
 
 	struct intel_ring_hangcheck hangcheck;
@@ -495,4 +518,44 @@ static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
 	return engine->status_page.gfx_addr + I915_GEM_HWS_INDEX_ADDR;
 }
 
+/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
+struct intel_wait {
+	struct rb_node node;
+	struct task_struct *task;
+	u32 seqno;
+};
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
+static inline void intel_wait_init(struct intel_wait *wait, u32 seqno)
+{
+	wait->task = current;
+	wait->seqno = seqno;
+}
+static inline bool intel_wait_complete(const struct intel_wait *wait)
+{
+	return RB_EMPTY_NODE(&wait->node);
+}
+bool intel_engine_add_wait(struct intel_engine_cs *engine,
+			   struct intel_wait *wait);
+void intel_engine_remove_wait(struct intel_engine_cs *engine,
+			      struct intel_wait *wait);
+static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
+{
+	return READ_ONCE(engine->breadcrumbs.first_waiter);
+}
+static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
+{
+	bool wakeup = false;
+	struct task_struct *task = READ_ONCE(engine->breadcrumbs.first_waiter);
+	/* Note that for this not to dangerously chase a dangling pointer,
+	 * the caller is responsible for ensuring that the task remains valid for
+	 * wake_up_process(), i.e. that the RCU grace period cannot expire.
+	 */
+	if (task)
+		wakeup = wake_up_process(task);
+	return wakeup;
+}
+void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
+unsigned intel_kick_waiters(struct drm_i915_private *i915);
+
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
2.8.1

* [CI 07/20] drm/i915: Remove the lazy_coherency parameter from request-completed?
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (4 preceding siblings ...)
  2016-05-19 11:32 ` [CI 06/20] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 08/20] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

Now that we have split out the seqno-barrier from the
engine->get_seqno() callback itself, we can move the users of the
seqno-barrier to the required callsites, simplifying the common code and
making the required workaround handling much more explicit.
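
Callers that must not miss a just-signalled completion now apply the
barrier explicitly before the (now cheap) seqno comparison, as
__i915_request_irq_complete() does in the hunk below:

  if (engine->irq_seqno_barrier)
          engine->irq_seqno_barrier(engine);

  if (i915_gem_request_completed(req))
          return true;

Paths that do not need the extra delay, such as
i915_gem_find_active_request(), now skip the barrier and gain a comment
explaining why.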

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h      | 17 ++++++++---------
 drivers/gpu/drm/i915/i915_gem.c      | 24 ++++++++++++++++--------
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
 5 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 02a923feeb7d..a4287d729fd9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -632,7 +632,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
 					   engine->get_seqno(engine),
-					   i915_gem_request_completed(work->flip_queued_req, true));
+					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
 			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7b329464e8eb..6e24b404542d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3142,20 +3142,14 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 	return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
-					   bool lazy_coherency)
+static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
 {
-	if (!lazy_coherency && req->engine->irq_seqno_barrier)
-		req->engine->irq_seqno_barrier(req->engine);
 	return i915_seqno_passed(req->engine->get_seqno(req->engine),
 				 req->previous_seqno);
 }
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
-	if (!lazy_coherency && req->engine->irq_seqno_barrier)
-		req->engine->irq_seqno_barrier(req->engine);
 	return i915_seqno_passed(req->engine->get_seqno(req->engine),
 				 req->seqno);
 }
@@ -3823,6 +3817,8 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *engine,
 
 static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 {
+	struct intel_engine_cs *engine = req->engine;
+
 	/* Ensure our read of the seqno is coherent so that we
 	 * do not "miss an interrupt" (i.e. if this is the last
 	 * request and the seqno write from the GPU is not visible
@@ -3834,7 +3830,10 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	 * but it is easier and safer to do it every time the waiter
 	 * is woken.
 	 */
-	if (i915_gem_request_completed(req, false))
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
+
+	if (i915_gem_request_completed(req))
 		return true;
 
 	/* We need to check whether any gpu reset happened in between
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9348b5d7de0e..16ef7682ed0c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1171,12 +1171,12 @@ static bool __i915_spin_request(struct drm_i915_gem_request *req, int state)
 	 */
 
 	/* Only spin if we know the GPU is processing this request */
-	if (!i915_gem_request_started(req, true))
+	if (!i915_gem_request_started(req))
 		return false;
 
 	timeout = local_clock_us(&cpu) + 5;
 	do {
-		if (i915_gem_request_completed(req, true))
+		if (i915_gem_request_completed(req))
 			return true;
 
 		if (signal_pending_state(state, current))
@@ -1223,7 +1223,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	timeout_remain = MAX_SCHEDULE_TIMEOUT;
@@ -2776,8 +2776,16 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *request;
 
+	/* We are called by the error capture and reset at a random
+	 * point in time. In particular, note that neither is crucially
+	 * ordered with an interrupt. After a hang, the GPU is dead and we
+	 * assume that no more writes can happen (we waited long enough for
+	 * all writes that were in transaction to be flushed) - adding an
+	 * extra delay for a recent interrupt is pointless. Hence, we do
+	 * not need an engine->irq_seqno_barrier() before the seqno reads.
+	 */
 	list_for_each_entry(request, &engine->request_list, list) {
-		if (i915_gem_request_completed(request, false))
+		if (i915_gem_request_completed(request))
 			continue;
 
 		return request;
@@ -2908,7 +2916,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_completed(request))
 			break;
 
 		i915_gem_request_retire(request);
@@ -2932,7 +2940,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 	}
 
 	if (unlikely(engine->trace_irq_req &&
-		     i915_gem_request_completed(engine->trace_irq_req, true))) {
+		     i915_gem_request_completed(engine->trace_irq_req))) {
 		engine->irq_put(engine);
 		i915_gem_request_assign(&engine->trace_irq_req, NULL);
 	}
@@ -3029,7 +3037,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (req == NULL)
 			continue;
 
-		if (i915_gem_request_completed(req, true))
+		if (i915_gem_request_completed(req))
 			i915_gem_object_retire__read(obj, i);
 	}
 
@@ -3135,7 +3143,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (to == from)
 		return 0;
 
-	if (i915_gem_request_completed(from_req, true))
+	if (i915_gem_request_completed(from_req))
 		return 0;
 
 	if (!i915_semaphore_is_enabled(to_i915(obj->base.dev))) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 94d28c795e22..4f90920b10a7 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11449,7 +11449,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
 
 	if (work->flip_ready_vblank == 0) {
 		if (work->flip_queued_req &&
-		    !i915_gem_request_completed(work->flip_queued_req, true))
+		    !i915_gem_request_completed(work->flip_queued_req))
 			return false;
 
 		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index adb64638f595..abb6ab84ad3c 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7531,7 +7531,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 	struct drm_i915_gem_request *req = boost->req;
 
-	if (!i915_gem_request_completed(req, true))
+	if (!i915_gem_request_completed(req))
 		gen6_rps_boost(req->i915, NULL, req->emitted_jiffies);
 
 	i915_gem_request_unreference(req);
@@ -7545,7 +7545,7 @@ void intel_queue_rps_boost_for_request(struct drm_i915_gem_request *req)
 	if (req == NULL || INTEL_GEN(req->i915) < 6)
 		return;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return;
 
 	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 08/20] drm/i915: Use HWS for seqno tracking everywhere
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (5 preceding siblings ...)
  2016-05-19 11:32 ` [CI 07/20] drm/i915: Remove the lazy_coherency parameter from request-completed? Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 09/20] drm/i915: Stop mapping the scratch page into CPU space Chris Wilson
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

By using the same address for storing the HWS on every platform, we can
remove the platform specific vfuncs and reduce the get-seqno routine to
a single read of a cached memory location.

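As a rough picture of the end state (simplified types and an invented
slot index, not the real HWS layout), the per-engine seqno read reduces
to a single load from a cached status page:

#include <stdint.h>
#include <stdio.h>

#define HWS_SEQNO_INDEX 0x30    /* illustrative slot, not the real layout */

struct engine {
        uint32_t status_page[256];      /* CPU-cached view of the HWS */
};

static inline uint32_t engine_get_seqno(const struct engine *e)
{
        /* A single read of a cached memory location, identical on all gens. */
        return e->status_page[HWS_SEQNO_INDEX];
}

int main(void)
{
        struct engine e = { { 0 } };

        e.status_page[HWS_SEQNO_INDEX] = 42;    /* as if written by the GPU */
        printf("seqno = %u\n", engine_get_seqno(&e));
        return 0;
}
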
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c      |  6 +--
 drivers/gpu/drm/i915/i915_drv.h          |  4 +-
 drivers/gpu/drm/i915/i915_gpu_error.c    |  2 +-
 drivers/gpu/drm/i915/i915_irq.c          |  4 +-
 drivers/gpu/drm/i915/i915_trace.h        |  2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c |  4 +-
 drivers/gpu/drm/i915/intel_lrc.c         | 26 +---------
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 83 ++++++++------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  7 +--
 9 files changed, 36 insertions(+), 102 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a4287d729fd9..d2dbda2d6298 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -631,7 +631,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   engine->name,
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
-					   engine->get_seqno(engine),
+					   intel_engine_get_seqno(engine),
 					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
@@ -765,7 +765,7 @@ static void i915_ring_seqno_info(struct seq_file *m,
 	struct rb_node *rb;
 
 	seq_printf(m, "Current sequence (%s): %x\n",
-		   engine->name, engine->get_seqno(engine));
+		   engine->name, intel_engine_get_seqno(engine));
 	seq_printf(m, "Current user interrupts (%s): %x\n",
 		   engine->name, READ_ONCE(engine->user_interrupts));
 
@@ -1391,7 +1391,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 
 	for_each_engine_id(engine, dev_priv, id) {
 		acthd[id] = intel_ring_get_active_head(engine);
-		seqno[id] = engine->get_seqno(engine);
+		seqno[id] = intel_engine_get_seqno(engine);
 	}
 
 	i915_get_extra_instdone(dev_priv, instdone);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6e24b404542d..8c5fde4be4a6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3144,13 +3144,13 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 
 static inline bool i915_gem_request_started(struct drm_i915_gem_request *req)
 {
-	return i915_seqno_passed(req->engine->get_seqno(req->engine),
+	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
 				 req->previous_seqno);
 }
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
-	return i915_seqno_passed(req->engine->get_seqno(req->engine),
+	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
 				 req->seqno);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 89241ffcc676..81341fc4e61a 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -983,7 +983,7 @@ static void i915_record_ring_state(struct drm_i915_private *dev_priv,
 	ering->waiting = intel_engine_has_waiter(engine);
 	ering->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
 	ering->acthd = intel_ring_get_active_head(engine);
-	ering->seqno = engine->get_seqno(engine);
+	ering->seqno = intel_engine_get_seqno(engine);
 	ering->last_seqno = engine->last_submitted_seqno;
 	ering->start = I915_READ_START(engine);
 	ering->head = I915_READ_HEAD(engine);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index c49b8356a80d..46462c5c66a0 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2973,7 +2973,7 @@ static int semaphore_passed(struct intel_engine_cs *engine)
 	if (signaller->hangcheck.deadlock >= I915_NUM_ENGINES)
 		return -1;
 
-	if (i915_seqno_passed(signaller->get_seqno(signaller), seqno))
+	if (i915_seqno_passed(intel_engine_get_seqno(signaller), seqno))
 		return 1;
 
 	/* cursory check for an unkickable deadlock */
@@ -3161,7 +3161,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 			engine->irq_seqno_barrier(engine);
 
 		acthd = intel_ring_get_active_head(engine);
-		seqno = engine->get_seqno(engine);
+		seqno = intel_engine_get_seqno(engine);
 
 		/* Reset stuck interrupts between batch advances */
 		user_interrupts = 0;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 20b2e4039792..09bbb71e9ec5 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -558,7 +558,7 @@ TRACE_EVENT(i915_gem_request_notify,
 	    TP_fast_assign(
 			   __entry->dev = engine->i915->dev->primary->index;
 			   __entry->ring = engine->id;
-			   __entry->seqno = engine->get_seqno(engine);
+			   __entry->seqno = intel_engine_get_seqno(engine);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u",
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index c82ecbf7470a..37f8fe19f122 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -148,7 +148,7 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 	first = true;
 	parent = NULL;
 	completed = NULL;
-	seqno = engine->get_seqno(engine);
+	seqno = intel_engine_get_seqno(engine);
 
 	p = &b->waiters.rb_node;
 	while (*p) {
@@ -264,7 +264,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			 * the first_waiter. This is undesirable if that
 			 * waiter is a high priority task.
 			 */
-			u32 seqno = engine->get_seqno(engine);
+			u32 seqno = intel_engine_get_seqno(engine);
 			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
 				struct rb_node *n = rb_next(next);
 				__intel_breadcrumbs_finish(b, to_wait(next));
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 34c6c3fd6296..47f730a89002 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1716,16 +1716,6 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 	return 0;
 }
 
-static u32 gen8_get_seqno(struct intel_engine_cs *engine)
-{
-	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
-}
-
-static void gen8_set_seqno(struct intel_engine_cs *engine, u32 seqno)
-{
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-}
-
 static void bxt_a_seqno_barrier(struct intel_engine_cs *engine)
 {
 	/*
@@ -1741,14 +1731,6 @@ static void bxt_a_seqno_barrier(struct intel_engine_cs *engine)
 	intel_flush_status_page(engine, I915_GEM_HWS_INDEX);
 }
 
-static void bxt_a_set_seqno(struct intel_engine_cs *engine, u32 seqno)
-{
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-
-	/* See bxt_a_get_seqno() explaining the reason for the clflush. */
-	intel_flush_status_page(engine, I915_GEM_HWS_INDEX);
-}
-
 /*
  * Reserve space for 2 NOOPs at the end of each request to be
  * used as a workaround for not being allowed to do lite
@@ -1774,7 +1756,7 @@ static int gen8_emit_request(struct drm_i915_gem_request *request)
 				intel_hws_seqno_address(request->engine) |
 				MI_FLUSH_DW_USE_GTT);
 	intel_logical_ring_emit(ringbuf, 0);
-	intel_logical_ring_emit(ringbuf, i915_gem_request_get_seqno(request));
+	intel_logical_ring_emit(ringbuf, request->seqno);
 	intel_logical_ring_emit(ringbuf, MI_USER_INTERRUPT);
 	intel_logical_ring_emit(ringbuf, MI_NOOP);
 	return intel_logical_ring_advance_and_submit(request);
@@ -1920,12 +1902,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->irq_get = gen8_logical_ring_get_irq;
 	engine->irq_put = gen8_logical_ring_put_irq;
 	engine->emit_bb_start = gen8_emit_bb_start;
-	engine->get_seqno = gen8_get_seqno;
-	engine->set_seqno = gen8_set_seqno;
-	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1)) {
+	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1))
 		engine->irq_seqno_barrier = bxt_a_seqno_barrier;
-		engine->set_seqno = bxt_a_set_seqno;
-	}
 }
 
 static inline void
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 9bd3dd756d3d..b3fc5005577d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1281,19 +1281,17 @@ static int gen8_rcs_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_engine_id(waiter, dev_priv, id) {
-		u32 seqno;
 		u64 gtt_offset = signaller->semaphore.signal_ggtt[id];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
-		seqno = i915_gem_request_get_seqno(signaller_req);
 		intel_ring_emit(signaller, GFX_OP_PIPE_CONTROL(6));
 		intel_ring_emit(signaller, PIPE_CONTROL_GLOBAL_GTT_IVB |
 					   PIPE_CONTROL_QW_WRITE |
 					   PIPE_CONTROL_CS_STALL);
 		intel_ring_emit(signaller, lower_32_bits(gtt_offset));
 		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
-		intel_ring_emit(signaller, seqno);
+		intel_ring_emit(signaller, signaller_req->seqno);
 		intel_ring_emit(signaller, 0);
 		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
 					   MI_SEMAPHORE_TARGET(waiter->hw_id));
@@ -1322,18 +1320,16 @@ static int gen8_xcs_signal(struct drm_i915_gem_request *signaller_req,
 		return ret;
 
 	for_each_engine_id(waiter, dev_priv, id) {
-		u32 seqno;
 		u64 gtt_offset = signaller->semaphore.signal_ggtt[id];
 		if (gtt_offset == MI_SEMAPHORE_SYNC_INVALID)
 			continue;
 
-		seqno = i915_gem_request_get_seqno(signaller_req);
 		intel_ring_emit(signaller, (MI_FLUSH_DW + 1) |
 					   MI_FLUSH_DW_OP_STOREDW);
 		intel_ring_emit(signaller, lower_32_bits(gtt_offset) |
 					   MI_FLUSH_DW_USE_GTT);
 		intel_ring_emit(signaller, upper_32_bits(gtt_offset));
-		intel_ring_emit(signaller, seqno);
+		intel_ring_emit(signaller, signaller_req->seqno);
 		intel_ring_emit(signaller, MI_SEMAPHORE_SIGNAL |
 					   MI_SEMAPHORE_TARGET(waiter->hw_id));
 		intel_ring_emit(signaller, 0);
@@ -1364,11 +1360,9 @@ static int gen6_signal(struct drm_i915_gem_request *signaller_req,
 		i915_reg_t mbox_reg = signaller->semaphore.mbox.signal[id];
 
 		if (i915_mmio_reg_valid(mbox_reg)) {
-			u32 seqno = i915_gem_request_get_seqno(signaller_req);
-
 			intel_ring_emit(signaller, MI_LOAD_REGISTER_IMM(1));
 			intel_ring_emit_reg(signaller, mbox_reg);
-			intel_ring_emit(signaller, seqno);
+			intel_ring_emit(signaller, signaller_req->seqno);
 		}
 	}
 
@@ -1404,7 +1398,7 @@ gen6_add_request(struct drm_i915_gem_request *req)
 	intel_ring_emit(engine, MI_STORE_DWORD_INDEX);
 	intel_ring_emit(engine,
 			I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
+	intel_ring_emit(engine, req->seqno);
 	intel_ring_emit(engine, MI_USER_INTERRUPT);
 	__intel_ring_advance(engine);
 
@@ -1543,7 +1537,9 @@ static int
 pc_render_add_request(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *engine = req->engine;
-	u32 scratch_addr = engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 addr = engine->status_page.gfx_addr +
+		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
+	u32 scratch_addr = addr;
 	int ret;
 
 	/* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently
@@ -1559,12 +1555,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 		return ret;
 
 	intel_ring_emit(engine,
-			GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
+			GFX_OP_PIPE_CONTROL(4) |
+			PIPE_CONTROL_QW_WRITE |
 			PIPE_CONTROL_WRITE_FLUSH |
 			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
-	intel_ring_emit(engine,
-			engine->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
+	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
+	intel_ring_emit(engine, req->seqno);
 	intel_ring_emit(engine, 0);
 	PIPE_CONTROL_FLUSH(engine, scratch_addr);
 	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
@@ -1579,13 +1575,12 @@ pc_render_add_request(struct drm_i915_gem_request *req)
 	PIPE_CONTROL_FLUSH(engine, scratch_addr);
 
 	intel_ring_emit(engine,
-			GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
+		       	GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
 			PIPE_CONTROL_WRITE_FLUSH |
 			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
 			PIPE_CONTROL_NOTIFY);
-	intel_ring_emit(engine,
-			engine->scratch.gtt_offset | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
+	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
+	intel_ring_emit(engine, req->seqno);
 	intel_ring_emit(engine, 0);
 	__intel_ring_advance(engine);
 
@@ -1617,30 +1612,6 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
 	spin_unlock_irq(&dev_priv->uncore.lock);
 }
 
-static u32
-ring_get_seqno(struct intel_engine_cs *engine)
-{
-	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
-}
-
-static void
-ring_set_seqno(struct intel_engine_cs *engine, u32 seqno)
-{
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-}
-
-static u32
-pc_render_get_seqno(struct intel_engine_cs *engine)
-{
-	return engine->scratch.cpu_page[0];
-}
-
-static void
-pc_render_set_seqno(struct intel_engine_cs *engine, u32 seqno)
-{
-	engine->scratch.cpu_page[0] = seqno;
-}
-
 static bool
 gen5_ring_get_irq(struct intel_engine_cs *engine)
 {
@@ -1770,8 +1741,8 @@ i9xx_add_request(struct drm_i915_gem_request *req)
 
 	intel_ring_emit(engine, MI_STORE_DWORD_INDEX);
 	intel_ring_emit(engine,
-			I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	intel_ring_emit(engine, i915_gem_request_get_seqno(req));
+		       	I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
+	intel_ring_emit(engine, req->seqno);
 	intel_ring_emit(engine, MI_USER_INTERRUPT);
 	__intel_ring_advance(engine);
 
@@ -2523,7 +2494,9 @@ void intel_ring_init_seqno(struct intel_engine_cs *engine, u32 seqno)
 	memset(engine->semaphore.sync_seqno, 0,
 	       sizeof(engine->semaphore.sync_seqno));
 
-	engine->set_seqno(engine, seqno);
+	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
 	engine->last_submitted_seqno = seqno;
 
 	engine->hangcheck.seqno = seqno;
@@ -2765,8 +2738,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		engine->irq_get = gen8_ring_get_irq;
 		engine->irq_put = gen8_ring_put_irq;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
-		engine->get_seqno = ring_get_seqno;
-		engine->set_seqno = ring_set_seqno;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			WARN_ON(!dev_priv->semaphore_obj);
 			engine->semaphore.sync_to = gen8_ring_sync;
@@ -2783,8 +2754,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		engine->irq_put = gen6_ring_put_irq;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		engine->irq_seqno_barrier = gen6_seqno_barrier;
-		engine->get_seqno = ring_get_seqno;
-		engine->set_seqno = ring_set_seqno;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			engine->semaphore.sync_to = gen6_ring_sync;
 			engine->semaphore.signal = gen6_signal;
@@ -2809,8 +2778,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	} else if (IS_GEN5(dev_priv)) {
 		engine->add_request = pc_render_add_request;
 		engine->flush = gen4_render_ring_flush;
-		engine->get_seqno = pc_render_get_seqno;
-		engine->set_seqno = pc_render_set_seqno;
 		engine->irq_get = gen5_ring_get_irq;
 		engine->irq_put = gen5_ring_put_irq;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT |
@@ -2821,8 +2788,6 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 			engine->flush = gen2_render_ring_flush;
 		else
 			engine->flush = gen4_render_ring_flush;
-		engine->get_seqno = ring_get_seqno;
-		engine->set_seqno = ring_set_seqno;
 		if (IS_GEN2(dev_priv)) {
 			engine->irq_get = i8xx_ring_get_irq;
 			engine->irq_put = i8xx_ring_put_irq;
@@ -2900,8 +2865,6 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		engine->flush = gen6_bsd_ring_flush;
 		engine->add_request = gen6_add_request;
 		engine->irq_seqno_barrier = gen6_seqno_barrier;
-		engine->get_seqno = ring_get_seqno;
-		engine->set_seqno = ring_set_seqno;
 		if (INTEL_GEN(dev_priv) >= 8) {
 			engine->irq_enable_mask =
 				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
@@ -2939,8 +2902,6 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		engine->mmio_base = BSD_RING_BASE;
 		engine->flush = bsd_ring_flush;
 		engine->add_request = i9xx_add_request;
-		engine->get_seqno = ring_get_seqno;
-		engine->set_seqno = ring_set_seqno;
 		if (IS_GEN5(dev_priv)) {
 			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
 			engine->irq_get = gen5_ring_get_irq;
@@ -2975,8 +2936,6 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	engine->flush = gen6_bsd_ring_flush;
 	engine->add_request = gen6_add_request;
 	engine->irq_seqno_barrier = gen6_seqno_barrier;
-	engine->get_seqno = ring_get_seqno;
-	engine->set_seqno = ring_set_seqno;
 	engine->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
 	engine->irq_get = gen8_ring_get_irq;
@@ -3008,8 +2967,6 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 	engine->flush = gen6_ring_flush;
 	engine->add_request = gen6_add_request;
 	engine->irq_seqno_barrier = gen6_seqno_barrier;
-	engine->get_seqno = ring_get_seqno;
-	engine->set_seqno = ring_set_seqno;
 	if (INTEL_GEN(dev_priv) >= 8) {
 		engine->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
@@ -3068,8 +3025,6 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 	engine->flush = gen6_ring_flush;
 	engine->add_request = gen6_add_request;
 	engine->irq_seqno_barrier = gen6_seqno_barrier;
-	engine->get_seqno = ring_get_seqno;
-	engine->set_seqno = ring_set_seqno;
 
 	if (INTEL_GEN(dev_priv) >= 8) {
 		engine->irq_enable_mask =
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 28597ea8fbf9..559be04b5456 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -218,9 +218,6 @@ struct intel_engine_cs {
 	 * monotonic, even if not coherent.
 	 */
 	void		(*irq_seqno_barrier)(struct intel_engine_cs *ring);
-	u32		(*get_seqno)(struct intel_engine_cs *ring);
-	void		(*set_seqno)(struct intel_engine_cs *ring,
-				     u32 seqno);
 	int		(*dispatch_execbuffer)(struct drm_i915_gem_request *req,
 					       u64 offset, u32 length,
 					       unsigned dispatch_flags);
@@ -496,6 +493,10 @@ int intel_init_blt_ring_buffer(struct drm_device *dev);
 int intel_init_vebox_ring_buffer(struct drm_device *dev);
 
 u64 intel_ring_get_active_head(struct intel_engine_cs *engine);
+static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine)
+{
+	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
+}
 
 int init_workarounds_ring(struct intel_engine_cs *engine);
 
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 09/20] drm/i915: Stop mapping the scratch page into CPU space
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (6 preceding siblings ...)
  2016-05-19 11:32 ` [CI 08/20] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 10/20] drm/i915: Allocate scratch page from stolen Chris Wilson
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

After the elimination of using the scratch page for Ironlake's
breadcrumb, we no longer need to kmap the object. We therefore can move
it into the high unmappable space and do not need to force the object to
be coherent (i.e. snooped on !llc platforms).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 40 +++++++++------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 2 files changed, 11 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b3fc5005577d..90dfabb372a8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -643,58 +643,40 @@ out:
 	return ret;
 }
 
-void
-intel_fini_pipe_control(struct intel_engine_cs *engine)
+void intel_fini_pipe_control(struct intel_engine_cs *engine)
 {
 	if (engine->scratch.obj == NULL)
 		return;
 
-	if (INTEL_GEN(engine->i915) >= 5) {
-		kunmap(sg_page(engine->scratch.obj->pages->sgl));
-		i915_gem_object_ggtt_unpin(engine->scratch.obj);
-	}
-
+	i915_gem_object_ggtt_unpin(engine->scratch.obj);
 	drm_gem_object_unreference(&engine->scratch.obj->base);
 	engine->scratch.obj = NULL;
 }
 
-int
-intel_init_pipe_control(struct intel_engine_cs *engine)
+int intel_init_pipe_control(struct intel_engine_cs *engine)
 {
+	struct drm_i915_gem_object *obj;
 	int ret;
 
 	WARN_ON(engine->scratch.obj);
 
-	engine->scratch.obj = i915_gem_object_create(engine->i915->dev, 4096);
-	if (IS_ERR(engine->scratch.obj)) {
-		DRM_ERROR("Failed to allocate seqno page\n");
-		ret = PTR_ERR(engine->scratch.obj);
-		engine->scratch.obj = NULL;
+	obj = i915_gem_object_create(engine->i915->dev, 4096);
+	if (IS_ERR(obj)) {
+		DRM_ERROR("Failed to allocate scratch page\n");
+		ret = PTR_ERR(obj);
 		goto err;
 	}
 
-	ret = i915_gem_object_set_cache_level(engine->scratch.obj,
-					      I915_CACHE_LLC);
+	ret = i915_gem_obj_ggtt_pin(obj, 4096, PIN_HIGH);
 	if (ret)
 		goto err_unref;
 
-	ret = i915_gem_obj_ggtt_pin(engine->scratch.obj, 4096, 0);
-	if (ret)
-		goto err_unref;
-
-	engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(engine->scratch.obj);
-	engine->scratch.cpu_page = kmap(sg_page(engine->scratch.obj->pages->sgl));
-	if (engine->scratch.cpu_page == NULL) {
-		ret = -ENOMEM;
-		goto err_unpin;
-	}
-
+	engine->scratch.obj = obj;
+	engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
 	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08x\n",
 			 engine->name, engine->scratch.gtt_offset);
 	return 0;
 
-err_unpin:
-	i915_gem_object_ggtt_unpin(engine->scratch.obj);
 err_unref:
 	drm_gem_object_unreference(&engine->scratch.obj->base);
 err:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 559be04b5456..960de2e44079 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -337,7 +337,6 @@ struct intel_engine_cs {
 	struct {
 		struct drm_i915_gem_object *obj;
 		u32 gtt_offset;
-		volatile u32 *cpu_page;
 	} scratch;
 
 	bool needs_cmd_parser;
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 10/20] drm/i915: Allocate scratch page from stolen
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (7 preceding siblings ...)
  2016-05-19 11:32 ` [CI 09/20] drm/i915: Stop mapping the scratch page into CPU space Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 11/20] drm/i915: Refactor scratch object allocation for gen2 w/a buffer Chris Wilson
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

With the last direct CPU access to the scratch page removed, we can now
allocate it from our small amount of reserved system pages (stolen
memory).

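The allocation strategy amounts to a simple best-effort fallback; a
small sketch with hypothetical helpers standing in for the stolen and
shmem object constructors:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins: prefer the small reserved (stolen) pool and
 * fall back to a normal allocation when it is exhausted.
 */
static void *alloc_from_stolen(size_t size)
{
        (void)size;
        return NULL;            /* pretend stolen memory is all used up */
}

static void *alloc_scratch(size_t size)
{
        void *obj = alloc_from_stolen(size);

        if (obj == NULL)        /* stolen allocation is best-effort only */
                obj = malloc(size);
        return obj;
}

int main(void)
{
        void *scratch = alloc_scratch(4096);

        printf("scratch %s\n", scratch ? "allocated" : "failed");
        free(scratch);
        return 0;
}
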
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 90dfabb372a8..aeb61d85bb37 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -660,7 +660,9 @@ int intel_init_pipe_control(struct intel_engine_cs *engine)
 
 	WARN_ON(engine->scratch.obj);
 
-	obj = i915_gem_object_create(engine->i915->dev, 4096);
+	obj = i915_gem_object_create_stolen(engine->i915->dev, 4096);
+	if (obj == NULL)
+		obj = i915_gem_object_create(engine->i915->dev, 4096);
 	if (IS_ERR(obj)) {
 		DRM_ERROR("Failed to allocate scratch page\n");
 		ret = PTR_ERR(obj);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 11/20] drm/i915: Refactor scratch object allocation for gen2 w/a buffer
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (8 preceding siblings ...)
  2016-05-19 11:32 ` [CI 10/20] drm/i915: Allocate scratch page from stolen Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 12/20] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) Chris Wilson
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

The gen2 w/a buffer is stuffed into the same slot as the gen5+ scratch
buffer. If we pass in the size we want to allocate for the scratch
buffer, both callers can use the same routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c        |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 31 ++++++++-----------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +-
 3 files changed, 10 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 47f730a89002..2991fd2352b9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2082,7 +2082,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 	engine->emit_flush = gen8_emit_flush_render;
 	engine->emit_request = gen8_emit_request_render;
 
-	ret = intel_init_pipe_control(engine);
+	ret = intel_init_pipe_control(engine, 4096);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index aeb61d85bb37..1a916b57fbd4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -653,16 +653,16 @@ void intel_fini_pipe_control(struct intel_engine_cs *engine)
 	engine->scratch.obj = NULL;
 }
 
-int intel_init_pipe_control(struct intel_engine_cs *engine)
+int intel_init_pipe_control(struct intel_engine_cs *engine, int size)
 {
 	struct drm_i915_gem_object *obj;
 	int ret;
 
 	WARN_ON(engine->scratch.obj);
 
-	obj = i915_gem_object_create_stolen(engine->i915->dev, 4096);
+	obj = i915_gem_object_create_stolen(engine->i915->dev, size);
 	if (obj == NULL)
-		obj = i915_gem_object_create(engine->i915->dev, 4096);
+		obj = i915_gem_object_create(engine->i915->dev, size);
 	if (IS_ERR(obj)) {
 		DRM_ERROR("Failed to allocate scratch page\n");
 		ret = PTR_ERR(obj);
@@ -2798,31 +2798,16 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	engine->init_hw = init_render_ring;
 	engine->cleanup = render_ring_cleanup;
 
-	/* Workaround batchbuffer to combat CS tlb bug. */
-	if (HAS_BROKEN_CS_TLB(dev_priv)) {
-		obj = i915_gem_object_create(dev, I830_WA_SIZE);
-		if (IS_ERR(obj)) {
-			DRM_ERROR("Failed to allocate batch bo\n");
-			return PTR_ERR(obj);
-		}
-
-		ret = i915_gem_obj_ggtt_pin(obj, 0, 0);
-		if (ret != 0) {
-			drm_gem_object_unreference(&obj->base);
-			DRM_ERROR("Failed to ping batch bo\n");
-			return ret;
-		}
-
-		engine->scratch.obj = obj;
-		engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
-	}
-
 	ret = intel_init_ring_buffer(dev, engine);
 	if (ret)
 		return ret;
 
 	if (INTEL_GEN(dev_priv) >= 5) {
-		ret = intel_init_pipe_control(engine);
+		ret = intel_init_pipe_control(engine, 4096);
+		if (ret)
+			return ret;
+	} else if (HAS_BROKEN_CS_TLB(dev_priv)) {
+		ret = intel_init_pipe_control(engine, I830_WA_SIZE);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 960de2e44079..8c45435d7865 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -482,8 +482,8 @@ void intel_ring_init_seqno(struct intel_engine_cs *engine, u32 seqno);
 int intel_ring_flush_all_caches(struct drm_i915_gem_request *req);
 int intel_ring_invalidate_all_caches(struct drm_i915_gem_request *req);
 
+int intel_init_pipe_control(struct intel_engine_cs *engine, int size);
 void intel_fini_pipe_control(struct intel_engine_cs *engine);
-int intel_init_pipe_control(struct intel_engine_cs *engine);
 
 int intel_init_render_ring_buffer(struct drm_device *dev);
 int intel_init_bsd_ring_buffer(struct drm_device *dev);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 12/20] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk)
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (9 preceding siblings ...)
  2016-05-19 11:32 ` [CI 11/20] drm/i915: Refactor scratch object allocation for gen2 w/a buffer Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 13/20] drm/i915: Check the CPU cached value of seqno after waking the waiter Chris Wilson
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

On Ironlake, there is no command or register to ensure that the write
from an MI_STORE command is completed (and coherent on the CPU) before the
command parser continues. This means that the ordering between the seqno
write and the subsequent user interrupt is undefined (like gen6+). So to
ensure that the seqno write is completed after the final user interrupt
we need to delay the read sufficiently to allow the write to complete.
This delay is undefined by the bspec, and empirically requires 75us even
though a register read combined with a clflush is less than 500ns. Hence,
the delay is due to an on-chip buffer rather than the latency of the write
to memory.

Note that the render ring controls this by filling the PIPE_CONTROL fifo
with stalling commands that force the earliest pipe-control with the
seqno to be completed before the command parser continues. Given that we
need a barrier operation for BSD, we may as well forgo the extra
per-batch latency by using a common per-interrupt barrier.

Studying the impact of adding the usleep shows that the cost to both
sequences of and individual synchronous no-op batches is negligible for
the media engine (where the write is now unordered with the interrupt).
Converting the render engine over from the current glutton of
pipe-controls to the per-interrupt delay speeds up the sequential and
individual synchronous no-ops by 20% and 60%, respectively. The speed up
holds even for the throughput of small copies (4KiB->4MiB), both serial
and synchronous, which improves by about 20%. This is because, despite
adding a significant delay to the interrupt, in all likelihood we will
see the seqno write without having to apply the barrier (only in the
rare corner case where the write is still not visible at the final
check is the delay necessary).

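As a rough sketch of the idea (plain C with illustrative constants, not
the driver code): the barrier is just a bounded delay applied once per
interrupt-driven check, replacing the stream of stalling commands
previously emitted with every batch:

#include <stdint.h>
#include <unistd.h>

/* Sketch only: the MI_STORE result is buffered on-chip and no command or
 * register flushes it, so the only option is to wait long enough for the
 * write to become visible to the CPU.
 */
static void gen5_style_seqno_barrier(void)
{
        usleep(75);     /* the ~75us delay quoted in the patch */
}

/* Applied once per user interrupt before trusting the seqno read, instead
 * of filling every batch with stalling PIPE_CONTROLs.
 */
static uint32_t coherent_seqno_read(const volatile uint32_t *hw_seqno)
{
        gen5_style_seqno_barrier();
        return *hw_seqno;
}

int main(void)
{
        uint32_t slot = 5;      /* as if written by the GPU */

        return coherent_seqno_read(&slot) == 5 ? 0 : 1;
}
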
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94307
Testcase: igt/gem_sync #ilk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c         | 10 ++---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 80 ++++++++-------------------------
 2 files changed, 21 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 46462c5c66a0..795e60b47eab 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1276,8 +1276,7 @@ static void ivybridge_parity_error_irq_handler(struct drm_i915_private *dev_priv
 static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
-	if (gt_iir &
-	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
+	if (gt_iir & GT_RENDER_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[RCS]);
 	if (gt_iir & ILK_BSD_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[VCS]);
@@ -1286,9 +1285,7 @@ static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv,
 static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
-
-	if (gt_iir &
-	    (GT_RENDER_USER_INTERRUPT | GT_RENDER_PIPECTL_NOTIFY_INTERRUPT))
+	if (gt_iir & GT_RENDER_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[RCS]);
 	if (gt_iir & GT_BSD_USER_INTERRUPT)
 		notify_ring(&dev_priv->engine[VCS]);
@@ -3622,8 +3619,7 @@ static void gen5_gt_irq_postinstall(struct drm_device *dev)
 
 	gt_irqs |= GT_RENDER_USER_INTERRUPT;
 	if (IS_GEN5(dev)) {
-		gt_irqs |= GT_RENDER_PIPECTL_NOTIFY_INTERRUPT |
-			   ILK_BSD_USER_INTERRUPT;
+		gt_irqs |= ILK_BSD_USER_INTERRUPT;
 	} else {
 		gt_irqs |= GT_BLT_USER_INTERRUPT | GT_BSD_USER_INTERRUPT;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1a916b57fbd4..63a0c78d6a22 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1508,67 +1508,22 @@ gen6_ring_sync(struct drm_i915_gem_request *waiter_req,
 	return 0;
 }
 
-#define PIPE_CONTROL_FLUSH(ring__, addr__)					\
-do {									\
-	intel_ring_emit(ring__, GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |		\
-		 PIPE_CONTROL_DEPTH_STALL);				\
-	intel_ring_emit(ring__, (addr__) | PIPE_CONTROL_GLOBAL_GTT);			\
-	intel_ring_emit(ring__, 0);							\
-	intel_ring_emit(ring__, 0);							\
-} while (0)
-
-static int
-pc_render_add_request(struct drm_i915_gem_request *req)
+static void
+gen5_seqno_barrier(struct intel_engine_cs *ring)
 {
-	struct intel_engine_cs *engine = req->engine;
-	u32 addr = engine->status_page.gfx_addr +
-		(I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT);
-	u32 scratch_addr = addr;
-	int ret;
-
-	/* For Ironlake, MI_USER_INTERRUPT was deprecated and apparently
-	 * incoherent with writes to memory, i.e. completely fubar,
-	 * so we need to use PIPE_NOTIFY instead.
+	/* MI_STORE are internally buffered by the GPU and not flushed
+	 * either by MI_FLUSH or SyncFlush or any other combination of
+	 * MI commands.
 	 *
-	 * However, we also need to workaround the qword write
-	 * incoherence by flushing the 6 PIPE_NOTIFY buffers out to
-	 * memory before requesting an interrupt.
+	 * "Only the submission of the store operation is guaranteed.
+	 * The write result will be complete (coherent) some time later
+	 * (this is practically a finite period but there is no guaranteed
+	 * latency)."
+	 *
+	 * Empirically, we observe that we need a delay of at least 75us to
+	 * be sure that the seqno write is visible by the CPU.
 	 */
-	ret = intel_ring_begin(req, 32);
-	if (ret)
-		return ret;
-
-	intel_ring_emit(engine,
-			GFX_OP_PIPE_CONTROL(4) |
-			PIPE_CONTROL_QW_WRITE |
-			PIPE_CONTROL_WRITE_FLUSH |
-			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE);
-	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(engine, req->seqno);
-	intel_ring_emit(engine, 0);
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES; /* write to separate cachelines */
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-	scratch_addr += 2 * CACHELINE_BYTES;
-	PIPE_CONTROL_FLUSH(engine, scratch_addr);
-
-	intel_ring_emit(engine,
-		       	GFX_OP_PIPE_CONTROL(4) | PIPE_CONTROL_QW_WRITE |
-			PIPE_CONTROL_WRITE_FLUSH |
-			PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
-			PIPE_CONTROL_NOTIFY);
-	intel_ring_emit(engine, addr | PIPE_CONTROL_GLOBAL_GTT);
-	intel_ring_emit(engine, req->seqno);
-	intel_ring_emit(engine, 0);
-	__intel_ring_advance(engine);
-
-	return 0;
+	usleep_range(75, 250);
 }
 
 static void
@@ -2760,12 +2715,12 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 			engine->semaphore.mbox.signal[VCS2] = GEN6_NOSYNC;
 		}
 	} else if (IS_GEN5(dev_priv)) {
-		engine->add_request = pc_render_add_request;
+		engine->add_request = i9xx_add_request;
 		engine->flush = gen4_render_ring_flush;
 		engine->irq_get = gen5_ring_get_irq;
 		engine->irq_put = gen5_ring_put_irq;
-		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT |
-					GT_RENDER_PIPECTL_NOTIFY_INTERRUPT;
+		engine->irq_seqno_barrier = gen5_seqno_barrier;
+		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 	} else {
 		engine->add_request = i9xx_add_request;
 		if (INTEL_GEN(dev_priv) < 4)
@@ -2802,7 +2757,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	if (ret)
 		return ret;
 
-	if (INTEL_GEN(dev_priv) >= 5) {
+	if (INTEL_GEN(dev_priv) >= 6) {
 		ret = intel_init_pipe_control(engine, 4096);
 		if (ret)
 			return ret;
@@ -2875,6 +2830,7 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
 			engine->irq_get = gen5_ring_get_irq;
 			engine->irq_put = gen5_ring_put_irq;
+			engine->irq_seqno_barrier = gen5_seqno_barrier;
 		} else {
 			engine->irq_enable_mask = I915_BSD_USER_INTERRUPT;
 			engine->irq_get = i9xx_ring_get_irq;
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 13/20] drm/i915: Check the CPU cached value of seqno after waking the waiter
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (10 preceding siblings ...)
  2016-05-19 11:32 ` [CI 12/20] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 14/20] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

If we have multiple waiters, we may find that many complete on the same
wake up. If we first inspect the seqno from the CPU cache, we may reduce
the number of heavyweight coherent seqno reads we require.

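A minimal sketch of the ordering (invented names, standalone C): check
the value already visible to the CPU first, and only fall back to the
heavyweight barrier plus a re-read when that check fails:

#include <stdbool.h>
#include <stdint.h>

struct engine {
        volatile uint32_t *hw_seqno;                    /* cached status-page slot */
        void (*irq_seqno_barrier)(struct engine *e);    /* may be NULL */
};

static bool seqno_passed(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) >= 0;
}

static bool request_irq_complete(struct engine *e, uint32_t seqno)
{
        /* Cheap check first: many waiters woken by the same interrupt will
         * already see the new value in their CPU cache.
         */
        if (seqno_passed(*e->hw_seqno, seqno))
                return true;

        /* Only the unlucky waiter pays for the coherent read. */
        if (e->irq_seqno_barrier) {
                e->irq_seqno_barrier(e);
                if (seqno_passed(*e->hw_seqno, seqno))
                        return true;
        }

        return false;
}

int main(void)
{
        uint32_t slot = 3;
        struct engine e = { .hw_seqno = &slot, .irq_seqno_barrier = 0 };

        return request_irq_complete(&e, 3) ? 0 : 1;
}
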
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8c5fde4be4a6..685a2e4f4415 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3819,6 +3819,12 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *engine = req->engine;
 
+	/* Before we do the heavier coherent read of the seqno,
+	 * check the value (hopefully) in the CPU cacheline.
+	 */
+	if (i915_gem_request_completed(req))
+		return true;
+
 	/* Ensure our read of the seqno is coherent so that we
 	 * do not "miss an interrupt" (i.e. if this is the last
 	 * request and the seqno write from the GPU is not visible
@@ -3830,11 +3836,11 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	 * but it is easier and safer to do it every time the waiter
 	 * is woken.
 	 */
-	if (engine->irq_seqno_barrier)
+	if (engine->irq_seqno_barrier) {
 		engine->irq_seqno_barrier(engine);
-
-	if (i915_gem_request_completed(req))
-		return true;
+		if (i915_gem_request_completed(req))
+			return true;
+	}
 
 	/* We need to check whether any gpu reset happened in between
 	 * the request being submitted and now. If a reset has occurred,
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 14/20] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (11 preceding siblings ...)
  2016-05-19 11:32 ` [CI 13/20] drm/i915: Check the CPU cached value of seqno after waking the waiter Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 15/20] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

If we flag the seqno as potentially stale upon receiving an interrupt,
we can use that information to reduce the frequency that we apply the
heavyweight coherent seqno read (i.e. if we wake up a chain of waiters).

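A simplified userspace sketch of the flag handling (C11 atomics stand in
for the kernel's smp_store_mb()/READ_ONCE/WRITE_ONCE): the interrupt
handler sets a sticky bit and the waiter clears it before applying the
barrier, so an interrupt racing with the barrier re-arms another pass:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct engine {
        atomic_bool irq_posted;         /* set by the interrupt handler */
        volatile uint32_t *hw_seqno;
        void (*irq_seqno_barrier)(struct engine *e);
};

/* Interrupt side: publish "a breadcrumb interrupt happened" before waking. */
static void notify_ring(struct engine *e)
{
        atomic_store(&e->irq_posted, true);
        /* ... wake the first waiter ... */
}

/* Waiter side: clear the flag before the barrier.  If another interrupt
 * lands while we are inside the barrier, irq_posted is set again and the
 * next pass reapplies the barrier, so no seqno update can be missed.
 */
static bool check_after_wakeup(struct engine *e, uint32_t seqno)
{
        if ((int32_t)(*e->hw_seqno - seqno) >= 0)
                return true;

        if (e->irq_seqno_barrier && atomic_exchange(&e->irq_posted, false)) {
                e->irq_seqno_barrier(e);
                if ((int32_t)(*e->hw_seqno - seqno) >= 0)
                        return true;
        }
        return false;
}

int main(void)
{
        uint32_t slot = 1;
        struct engine e;

        atomic_init(&e.irq_posted, false);
        e.hw_seqno = &slot;
        e.irq_seqno_barrier = 0;

        notify_ring(&e);
        return check_after_wakeup(&e, 1) ? 0 : 1;
}
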
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          | 15 ++++++++++++++-
 drivers/gpu/drm/i915/i915_irq.c          |  1 +
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 11 ++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  1 +
 4 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 685a2e4f4415..ce685a83bb03 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3836,7 +3836,20 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	 * but it is easier and safer to do it every time the waiter
 	 * is woken.
 	 */
-	if (engine->irq_seqno_barrier) {
+	if (engine->irq_seqno_barrier && READ_ONCE(engine->irq_posted)) {
+		/* The ordering of irq_posted versus applying the barrier
+		 * is crucial. The clearing of the current irq_posted must
+		 * be visible before we perform the barrier operation,
+		 * such that if a subsequent interrupt arrives, irq_posted
+		 * is reasserted and our task rewoken (which causes us to
+		 * do another __i915_request_irq_complete() immediately
+		 * and reapply the barrier). Conversely, if the clear
+		 * occurs after the barrier, then an interrupt that arrived
+		 * whilst we waited on the barrier would not trigger a
+		 * barrier on the next pass, and the read may not see the
+		 * seqno update.
+		 */
+		WRITE_ONCE(engine->irq_posted, false);
 		engine->irq_seqno_barrier(engine);
 		if (i915_gem_request_completed(req))
 			return true;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 795e60b47eab..f62fcf3f6ea8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -988,6 +988,7 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 
 static void notify_ring(struct intel_engine_cs *engine)
 {
+	smp_store_mb(engine->irq_posted, true);
 	if (intel_engine_wakeup(engine)) {
 		trace_i915_gem_request_notify(engine);
 		engine->user_interrupts++;
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 37f8fe19f122..578de43cb07e 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -43,12 +43,20 @@ static void intel_breadcrumbs_fake_irq(unsigned long data)
 
 static void irq_enable(struct intel_engine_cs *engine)
 {
+	/* Enabling the IRQ may miss the generation of the interrupt, but
+	 * we still need to force the barrier before reading the seqno,
+	 * just in case.
+	 */
+	engine->irq_posted = true;
+
 	WARN_ON(!engine->irq_get(engine));
 }
 
 static void irq_disable(struct intel_engine_cs *engine)
 {
 	engine->irq_put(engine);
+
+	engine->irq_posted = false;
 }
 
 static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
@@ -194,7 +202,8 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 			 * in case the seqno passed.
 			 */
 			__intel_breadcrumbs_enable_irq(b);
-			wake_up_process(to_wait(next)->task);
+			if (READ_ONCE(engine->irq_posted))
+				wake_up_process(to_wait(next)->task);
 		}
 
 		do {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 8c45435d7865..a2e988068cb4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -196,6 +196,7 @@ struct intel_engine_cs {
 	struct i915_ctx_workarounds wa_ctx;
 
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
+	bool		irq_posted;
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
 	struct drm_i915_gem_request *trace_irq_req;
 	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 15/20] drm/i915: Stop setting wraparound seqno on initialisation
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (12 preceding siblings ...)
  2016-05-19 11:32 ` [CI 14/20] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 16/20] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

We have testcases to ensure that seqno wraparound works fine, so we can
forgo forcing everyone to encounter seqno wraparound during early
uptime. Seqno wraparound incurs a full GPU stall, so not forcing it
eliminates one source of jitter from the early life of the system. The
testcases give us very deterministic coverage and, given how difficult
it would be to debug an issue (GPU hang) stemming from a wraparound
using pure postmortem analysis, I see no value in forcing a wrap during
boot.

Advancing the global next_seqno after a GPU reset is equally pointless.

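The wraparound behaviour those testcases exercise boils down to the
signed-difference comparison used by i915_seqno_passed(); a standalone
sketch of why the ordering survives the 32-bit wrap:

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the signed-difference trick in i915_seqno_passed(): "a passed b"
 * stays correct across the 32-bit wrap as long as the two values are within
 * 2^31 of each other.
 */
static bool seqno_passed(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) >= 0;
}

int main(void)
{
        assert(seqno_passed(2, 1));                     /* ordinary case */
        assert(seqno_passed(0x00000002, 0xfffffffe));   /* across the wrap */
        assert(!seqno_passed(0xfffffffe, 0x00000002));
        return 0;
}
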
References: https://bugs.freedesktop.org/show_bug.cgi?id=95023
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 16ef7682ed0c..b48a3b46e86f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4842,12 +4842,6 @@ i915_gem_init_hw(struct drm_device *dev)
 		}
 	}
 
-	/*
-	 * Increment the next seqno by 0x100 so we have a visible break
-	 * on re-initialisation
-	 */
-	ret = i915_gem_set_seqno(dev, dev_priv->next_seqno+0x100);
-
 out:
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 	return ret;
@@ -4992,14 +4986,6 @@ i915_gem_load_init(struct drm_device *dev)
 
 	dev_priv->relative_constants_mode = I915_EXEC_CONSTANTS_REL_GENERAL;
 
-	/*
-	 * Set initial sequence number for requests.
-	 * Using this number allows the wraparound to happen early,
-	 * catching any obvious problems.
-	 */
-	dev_priv->next_seqno = ((u32)~0 - 0x1100);
-	dev_priv->last_seqno = ((u32)~0 - 0x1101);
-
 	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
 
 	init_waitqueue_head(&dev_priv->pending_flip_queue);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 16/20] drm/i915: Only query timestamp when measuring elapsed time
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (13 preceding siblings ...)
  2016-05-19 11:32 ` [CI 15/20] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 15:44   ` Tvrtko Ursulin
  2016-05-19 11:32 ` [CI 17/20] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

Avoid the two calls to ktime_get_raw_ns() (at best it reads the TSC) as
we only need to compute the elapsed time for a timed wait.

v2: Eliminate the unused local variable, reducing the function size by
64 bytes (using the storage space on the caller's stack rather than
adding to our stack frame)

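The bookkeeping can be pictured with a short standalone sketch (not the
driver code; the clock choice is illustrative): fold the deadline into
the caller-supplied timeout on entry and convert it back to a remainder
on exit, so untimed waits never read the clock:

#include <inttypes.h>
#include <stdio.h>
#include <time.h>

static int64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* One timestamp read on entry and one on exit, and only for timed waits. */
static void timed_wait(int64_t *timeout_ns)
{
        if (timeout_ns)
                *timeout_ns += now_ns();        /* relative -> absolute deadline */

        /* ... sleep/poll for the event here ... */

        if (timeout_ns) {
                *timeout_ns -= now_ns();        /* back to "time remaining" */
                if (*timeout_ns < 0)
                        *timeout_ns = 0;
        }
}

int main(void)
{
        int64_t timeout = 1000000;      /* 1ms budget */

        timed_wait(&timeout);
        printf("remaining: %" PRId64 " ns\n", timeout);
        return 0;
}
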
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b48a3b46e86f..2c254cf49c15 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1215,7 +1215,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
 	struct intel_wait wait;
 	unsigned long timeout_remain;
-	s64 before = 0; /* Only to silence a compiler warning. */
 	int ret = 0;
 
 	might_sleep();
@@ -1234,12 +1233,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		if (*timeout == 0)
 			return -ETIME;
 
+		/* Record current time in case interrupted, or wedged */
 		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
-
-		/*
-		 * Record current time in case interrupted by signal, or wedged.
-		 */
-		before = ktime_get_raw_ns();
+		*timeout += ktime_get_raw_ns();
 	}
 
 	trace_i915_gem_request_wait_begin(req);
@@ -1296,9 +1292,9 @@ complete:
 	trace_i915_gem_request_wait_end(req);
 
 	if (timeout) {
-		s64 tres = *timeout - (ktime_get_raw_ns() - before);
-
-		*timeout = tres < 0 ? 0 : tres;
+		*timeout -= ktime_get_raw_ns();
+		if (*timeout < 0)
+			*timeout = 0;
 
 		/*
 		 * Apparently ktime isn't accurate enough and occasionally has a
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 17/20] drm/i915: Convert trace-irq to the breadcrumb waiter
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (14 preceding siblings ...)
  2016-05-19 11:32 ` [CI 16/20] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 18/20] drm/i915: Move the get/put irq locking into the caller Chris Wilson
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

If we convert the tracing from direct use of ring->irq_get() over to the
breadcrumb infrastructure, we are left with a single user of
ring->irq_get() and so will be able to simplify the driver routines
(eliminating the redundant validation and irq refcounting).

v2: Move to a signaling framework based upon the waiter.
v3: Track the first-signal to avoid having to walk the rbtree every time.
v4: Mark the signaler thread as RT priority to reduce latency in the
indirect wakeups.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          |   8 --
 drivers/gpu/drm/i915/i915_gem.c          |   6 --
 drivers/gpu/drm/i915/i915_trace.h        |   2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 176 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h  |   7 +-
 5 files changed, 183 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ce685a83bb03..4468fff349d9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3807,14 +3807,6 @@ wait_remaining_ms_from_jiffies(unsigned long timestamp_jiffies, int to_wait_ms)
 			    schedule_timeout_uninterruptible(remaining_jiffies);
 	}
 }
-
-static inline void i915_trace_irq_get(struct intel_engine_cs *engine,
-				      struct drm_i915_gem_request *req)
-{
-	if (engine->trace_irq_req == NULL && engine->irq_get(engine))
-		i915_gem_request_assign(&engine->trace_irq_req, req);
-}
-
 static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 {
 	struct intel_engine_cs *engine = req->engine;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2c254cf49c15..0ec9ee8fd941 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2935,12 +2935,6 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 		i915_gem_object_retire__read(obj, engine->id);
 	}
 
-	if (unlikely(engine->trace_irq_req &&
-		     i915_gem_request_completed(engine->trace_irq_req))) {
-		engine->irq_put(engine);
-		i915_gem_request_assign(&engine->trace_irq_req, NULL);
-	}
-
 	WARN_ON(i915_verify_lists(engine->dev));
 }
 
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 09bbb71e9ec5..f40ca977321c 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -490,7 +490,7 @@ TRACE_EVENT(i915_gem_ring_dispatch,
 			   __entry->ring = req->engine->id;
 			   __entry->seqno = req->seqno;
 			   __entry->flags = flags;
-			   i915_trace_irq_get(req->engine, req);
+			   intel_engine_enable_signaling(req);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 578de43cb07e..14fb9fdcde3a 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -22,6 +22,8 @@
  *
  */
 
+#include <linux/kthread.h>
+
 #include "i915_drv.h"
 
 static void intel_breadcrumbs_fake_irq(unsigned long data)
@@ -315,10 +317,184 @@ void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 		    (unsigned long)engine);
 }
 
+struct signal {
+	struct rb_node node;
+	struct intel_wait wait;
+	struct drm_i915_gem_request *request;
+};
+
+static bool signal_complete(struct signal *signal)
+{
+	if (signal == NULL)
+		return false;
+
+	/* If another process served as the bottom-half it may have already
+	 * signalled that this wait is already completed.
+	 */
+	if (intel_wait_complete(&signal->wait))
+		return true;
+
+	/* Carefully check if the request is complete, giving time for the
+	 * seqno to be visible or if the GPU hung.
+	 */
+	if (__i915_request_irq_complete(signal->request))
+		return true;
+
+	return false;
+}
+
+static struct signal *to_signal(struct rb_node *rb)
+{
+	return container_of(rb, struct signal, node);
+}
+
+static void signaler_set_rtpriority(void)
+{
+	 struct sched_param param = { .sched_priority = 1 };
+	 sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
+}
+
+static int intel_breadcrumbs_signaler(void *arg)
+{
+	struct intel_engine_cs *engine = arg;
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct signal *signal;
+
+	/* Install ourselves with high priority to reduce signalling latency */
+	signaler_set_rtpriority();
+
+	do {
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		/* We are either woken up by the interrupt bottom-half,
+		 * or by a client adding a new signaller. In both cases,
+		 * the GPU seqno may have advanced beyond our oldest signal.
+		 * If it has, propagate the signal, remove the waiter and
+		 * check again with the next oldest signal. Otherwise we
+		 * need to wait for a new interrupt from the GPU or for
+		 * a new client.
+		 */
+		signal = READ_ONCE(b->first_signal);
+		if (signal_complete(signal)) {
+			/* Wake up all other completed waiters and select the
+			 * next bottom-half for the next user interrupt.
+			 */
+			intel_engine_remove_wait(engine, &signal->wait);
+
+			i915_gem_request_unreference(signal->request);
+
+			/* Find the next oldest signal. Note that as we have
+			 * not been holding the lock, another client may
+			 * have installed an even older signal than the one
+			 * we just completed - so double check we are still
+			 * the oldest before picking the next one.
+			 */
+			spin_lock(&b->lock);
+			if (signal == b->first_signal)
+				b->first_signal = rb_next(&signal->node);
+			rb_erase(&signal->node, &b->signals);
+			spin_unlock(&b->lock);
+
+			kfree(signal);
+		} else {
+			if (kthread_should_stop())
+				break;
+
+			schedule();
+		}
+	} while (1);
+
+	return 0;
+}
+
+int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
+{
+	struct intel_engine_cs *engine = request->engine;
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct rb_node *parent, **p;
+	struct task_struct *task;
+	struct signal *signal;
+	bool first;
+
+	signal = kmalloc(sizeof(*signal), GFP_ATOMIC);
+	if (unlikely(signal == NULL))
+		return -ENOMEM;
+
+	/* Spawn a thread to provide a common bottom-half for all signals.
+	 * As this is an asynchronous interface we cannot steal the current
+	 * task for handling the bottom-half to the user interrupt, therefore
+	 * we create a thread to do the coherent seqno dance after the
+	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
+	 */
+	task = READ_ONCE(b->signaler);
+	if (unlikely(task == NULL)) {
+		spin_lock(&b->lock);
+		task = b->signaler;
+		if (task == NULL) {
+			task = kthread_create(intel_breadcrumbs_signaler,
+					      engine,
+					      "irq/i915:%d",
+					      engine->id);
+			if (!IS_ERR(task))
+				b->signaler = task;
+		}
+		spin_unlock(&b->lock);
+
+		if (IS_ERR(task)) {
+			kfree(signal);
+			return PTR_ERR(task);
+		}
+	}
+
+	signal->wait.task = task;
+	signal->wait.seqno = request->seqno;
+
+	signal->request = i915_gem_request_reference(request);
+
+	/* Insert ourselves into the retirement ordered list of signals
+	 * on this engine. We track the oldest seqno as that will be the
+	 * first signal to complete.
+	 */
+	spin_lock(&b->lock);
+	parent = NULL;
+	first = true;
+	p = &b->signals.rb_node;
+	while (*p) {
+		parent = *p;
+		if (i915_seqno_passed(signal->wait.seqno,
+				      to_signal(parent)->wait.seqno)) {
+			p = &parent->rb_right;
+			first = false;
+		} else
+			p = &parent->rb_left;
+	}
+	rb_link_node(&signal->node, parent, p);
+	rb_insert_color(&signal->node, &b->signals);
+	if (first)
+		smp_store_mb(b->first_signal, signal);
+	spin_unlock(&b->lock);
+
+	/* Now add ourselves into the list of waiters, but register our
+	 * bottom-half as the signaller thread. As per usual, only the oldest
+	 * waiter (not just signaller) is tasked as the bottom-half waking
+	 * up all completed waiters after the user interrupt.
+	 *
+	 * If we are the oldest waiter, enable the irq (after which we
+	 * must double check that the seqno did not complete).
+	 */
+	if (intel_engine_add_wait(engine, &signal->wait))
+		wake_up_process(task);
+
+	return 0;
+}
+
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 
+	if (b->signaler)
+		kthread_stop(b->signaler);
+
 	del_timer_sync(&b->fake_irq);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a2e988068cb4..ca611fe6997e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -141,6 +141,8 @@ struct  i915_ctx_workarounds {
 	struct drm_i915_gem_object *obj;
 };
 
+struct drm_i915_gem_request;
+
 struct intel_engine_cs {
 	struct drm_i915_private *i915;
 	const char	*name;
@@ -179,7 +181,10 @@ struct intel_engine_cs {
 	struct intel_breadcrumbs {
 		spinlock_t lock; /* protects the lists of requests */
 		struct rb_root waiters; /* sorted by retirement, priority */
+		struct rb_root signals; /* sorted by retirement */
 		struct task_struct *first_waiter; /* bh for user interrupts */
+		struct task_struct *signaler; /* used for fence signalling */
+		void *first_signal;
 		struct timer_list fake_irq; /* used after a missed interrupt */
 		bool irq_enabled;
 		bool rpm_wakelock;
@@ -198,7 +203,6 @@ struct intel_engine_cs {
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	bool		irq_posted;
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
-	struct drm_i915_gem_request *trace_irq_req;
 	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
 	void		(*irq_put)(struct intel_engine_cs *ring);
 
@@ -539,6 +543,7 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 			   struct intel_wait *wait);
 void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			      struct intel_wait *wait);
+int intel_engine_enable_signaling(struct drm_i915_gem_request *request);
 static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
 {
 	return READ_ONCE(engine->breadcrumbs.first_waiter);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 18/20] drm/i915: Move the get/put irq locking into the caller
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (15 preceding siblings ...)
  2016-05-19 11:32 ` [CI 17/20] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 19/20] drm/i915: Simplify enabling user-interrupts with L3-remapping Chris Wilson
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

With only a single callsite for intel_engine_cs->irq_get and ->irq_put,
we can reduce the code size by moving the common preamble into the
caller, and we can also eliminate the reference counting.

For completeness, as we are no longer doing reference counting on irq,
rename the get/put vfunctions to enable/disable respectively.
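
As a rough sketch of the resulting split (illustrative, paraphrasing the
intel_breadcrumbs.c hunk below): the lone caller takes dev_priv->irq_lock
itself and each vfunc shrinks to the bare IMR update.

/* Sketch: the single caller owns the locking... */
static void irq_enable(struct intel_engine_cs *engine)
{
	spin_lock_irq(&engine->i915->irq_lock);
	engine->irq_enable(engine);	/* ...so the vfunc is just the register write */
	spin_unlock_irq(&engine->i915->irq_lock);
}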

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c          |   8 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c |   8 +-
 drivers/gpu/drm/i915/intel_lrc.c         |  34 +---
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 269 ++++++++++---------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h  |   5 +-
 5 files changed, 106 insertions(+), 218 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f62fcf3f6ea8..70f7617b5bf1 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -259,12 +259,12 @@ static void ilk_update_gt_irq(struct drm_i915_private *dev_priv,
 	dev_priv->gt_irq_mask &= ~interrupt_mask;
 	dev_priv->gt_irq_mask |= (~enabled_irq_mask & interrupt_mask);
 	I915_WRITE(GTIMR, dev_priv->gt_irq_mask);
-	POSTING_READ(GTIMR);
 }
 
 void gen5_enable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
 {
 	ilk_update_gt_irq(dev_priv, mask, mask);
+	POSTING_READ_FW(GTIMR);
 }
 
 void gen5_disable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
@@ -2840,9 +2840,9 @@ ring_idle(struct intel_engine_cs *engine, u32 seqno)
 }
 
 static bool
-ipehr_is_semaphore_wait(struct drm_i915_private *dev_priv, u32 ipehr)
+ipehr_is_semaphore_wait(struct intel_engine_cs *engine, u32 ipehr)
 {
-	if (INTEL_GEN(dev_priv) >= 8) {
+	if (INTEL_GEN(engine->i915) >= 8) {
 		return (ipehr >> 23) == 0x1c;
 	} else {
 		ipehr &= ~MI_SEMAPHORE_SYNC_MASK;
@@ -2913,7 +2913,7 @@ semaphore_waits_for(struct intel_engine_cs *engine, u32 *seqno)
 		return NULL;
 
 	ipehr = I915_READ(RING_IPEHR(engine->mmio_base));
-	if (!ipehr_is_semaphore_wait(engine->i915, ipehr))
+	if (!ipehr_is_semaphore_wait(engine, ipehr))
 		return NULL;
 
 	/*
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 14fb9fdcde3a..475c454d11bf 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -51,12 +51,16 @@ static void irq_enable(struct intel_engine_cs *engine)
 	 */
 	engine->irq_posted = true;
 
-	WARN_ON(!engine->irq_get(engine));
+	spin_lock_irq(&engine->i915->irq_lock);
+	engine->irq_enable(engine);
+	spin_unlock_irq(&engine->i915->irq_lock);
 }
 
 static void irq_disable(struct intel_engine_cs *engine)
 {
-	engine->irq_put(engine);
+	spin_lock_irq(&engine->i915->irq_lock);
+	engine->irq_disable(engine);
+	spin_unlock_irq(&engine->i915->irq_lock);
 
 	engine->irq_posted = false;
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2991fd2352b9..8ac0f1e5a36f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1582,36 +1582,18 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 	return 0;
 }
 
-static bool gen8_logical_ring_get_irq(struct intel_engine_cs *engine)
+static void gen8_logical_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		I915_WRITE_IMR(engine,
-			       ~(engine->irq_enable_mask | engine->irq_keep_mask));
-		POSTING_READ(RING_IMR(engine->mmio_base));
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	I915_WRITE_IMR(engine,
+		       ~(engine->irq_enable_mask | engine->irq_keep_mask));
+	POSTING_READ_FW(RING_IMR(engine->mmio_base));
 }
 
-static void gen8_logical_ring_put_irq(struct intel_engine_cs *engine)
+static void gen8_logical_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
-		POSTING_READ(RING_IMR(engine->mmio_base));
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
 }
 
 static int gen8_emit_flush(struct drm_i915_gem_request *request,
@@ -1899,8 +1881,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->init_hw = gen8_init_common_ring;
 	engine->emit_request = gen8_emit_request;
 	engine->emit_flush = gen8_emit_flush;
-	engine->irq_get = gen8_logical_ring_get_irq;
-	engine->irq_put = gen8_logical_ring_put_irq;
+	engine->irq_enable = gen8_logical_ring_enable_irq;
+	engine->irq_disable = gen8_logical_ring_disable_irq;
 	engine->emit_bb_start = gen8_emit_bb_start;
 	if (IS_BXT_REVID(engine->i915, 0, BXT_REVID_A1))
 		engine->irq_seqno_barrier = bxt_a_seqno_barrier;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 63a0c78d6a22..25b7ad684a28 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1551,103 +1551,54 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
 	spin_unlock_irq(&dev_priv->uncore.lock);
 }
 
-static bool
-gen5_ring_get_irq(struct intel_engine_cs *engine)
+static void
+gen5_ring_enable_irq(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0)
-		gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	gen5_enable_gt_irq(engine->i915, engine->irq_enable_mask);
 }
 
 static void
-gen5_ring_put_irq(struct intel_engine_cs *engine)
+gen5_ring_disable_irq(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0)
-		gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	gen5_disable_gt_irq(engine->i915, engine->irq_enable_mask);
 }
 
-static bool
-i9xx_ring_get_irq(struct intel_engine_cs *engine)
+static void
+i9xx_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	if (!intel_irqs_enabled(dev_priv))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		dev_priv->irq_mask &= ~engine->irq_enable_mask;
-		I915_WRITE(IMR, dev_priv->irq_mask);
-		POSTING_READ(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
 
-	return true;
+	dev_priv->irq_mask &= ~engine->irq_enable_mask;
+	I915_WRITE(IMR, dev_priv->irq_mask);
+	POSTING_READ_FW(RING_IMR(engine->mmio_base));
 }
 
 static void
-i9xx_ring_put_irq(struct intel_engine_cs *engine)
+i9xx_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		dev_priv->irq_mask |= engine->irq_enable_mask;
-		I915_WRITE(IMR, dev_priv->irq_mask);
-		POSTING_READ(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	dev_priv->irq_mask |= engine->irq_enable_mask;
+	I915_WRITE(IMR, dev_priv->irq_mask);
 }
 
-static bool
-i8xx_ring_get_irq(struct intel_engine_cs *engine)
+static void
+i8xx_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	if (!intel_irqs_enabled(dev_priv))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		dev_priv->irq_mask &= ~engine->irq_enable_mask;
-		I915_WRITE16(IMR, dev_priv->irq_mask);
-		POSTING_READ16(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	dev_priv->irq_mask &= ~engine->irq_enable_mask;
+	I915_WRITE16(IMR, dev_priv->irq_mask);
+	POSTING_READ16(RING_IMR(engine->mmio_base));
 }
 
 static void
-i8xx_ring_put_irq(struct intel_engine_cs *engine)
+i8xx_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		dev_priv->irq_mask |= engine->irq_enable_mask;
-		I915_WRITE16(IMR, dev_priv->irq_mask);
-		POSTING_READ16(IMR);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	dev_priv->irq_mask |= engine->irq_enable_mask;
+	I915_WRITE16(IMR, dev_priv->irq_mask);
 }
 
 static int
@@ -1688,122 +1639,74 @@ i9xx_add_request(struct drm_i915_gem_request *req)
 	return 0;
 }
 
-static bool
-gen6_ring_get_irq(struct intel_engine_cs *engine)
+static void
+gen6_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-			I915_WRITE_IMR(engine,
-				       ~(engine->irq_enable_mask |
-					 GT_PARITY_ERROR(dev_priv)));
-		else
-			I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
-		gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
+		I915_WRITE_IMR(engine,
+			       ~(engine->irq_enable_mask |
+				 GT_PARITY_ERROR(dev_priv)));
+	else
+		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
+	gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
 }
 
 static void
-gen6_ring_put_irq(struct intel_engine_cs *engine)
+gen6_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-			I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
-		else
-			I915_WRITE_IMR(engine, ~0);
-		gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
+		I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
+	else
+		I915_WRITE_IMR(engine, ~0);
+	gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
 }
 
-static bool
-hsw_vebox_get_irq(struct intel_engine_cs *engine)
+static void
+hsw_vebox_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
-
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
-		gen6_enable_pm_irq(dev_priv, engine->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
+	gen6_enable_pm_irq(dev_priv, engine->irq_enable_mask);
 }
 
 static void
-hsw_vebox_put_irq(struct intel_engine_cs *engine)
+hsw_vebox_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		I915_WRITE_IMR(engine, ~0);
-		gen6_disable_pm_irq(dev_priv, engine->irq_enable_mask);
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	I915_WRITE_IMR(engine, ~0);
+	gen6_disable_pm_irq(dev_priv, engine->irq_enable_mask);
 }
 
-static bool
-gen8_ring_get_irq(struct intel_engine_cs *engine)
+static void
+gen8_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return false;
-
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (engine->irq_refcount++ == 0) {
-		if (HAS_L3_DPF(dev_priv) && engine->id == RCS) {
-			I915_WRITE_IMR(engine,
-				       ~(engine->irq_enable_mask |
-					 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
-		} else {
-			I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
-		}
-		POSTING_READ(RING_IMR(engine->mmio_base));
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
-
-	return true;
+	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
+		I915_WRITE_IMR(engine,
+			       ~(engine->irq_enable_mask |
+				 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
+	else
+		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
+	POSTING_READ_FW(RING_IMR(engine->mmio_base));
 }
 
 static void
-gen8_ring_put_irq(struct intel_engine_cs *engine)
+gen8_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
-	unsigned long flags;
 
-	spin_lock_irqsave(&dev_priv->irq_lock, flags);
-	if (--engine->irq_refcount == 0) {
-		if (HAS_L3_DPF(dev_priv) && engine->id == RCS) {
-			I915_WRITE_IMR(engine,
-				       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
-		} else {
-			I915_WRITE_IMR(engine, ~0);
-		}
-		POSTING_READ(RING_IMR(engine->mmio_base));
-	}
-	spin_unlock_irqrestore(&dev_priv->irq_lock, flags);
+	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
+		I915_WRITE_IMR(engine,
+			       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
+	else
+		I915_WRITE_IMR(engine, ~0);
 }
 
 static int
@@ -2674,8 +2577,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		engine->init_context = intel_rcs_ctx_init;
 		engine->add_request = gen8_render_add_request;
 		engine->flush = gen8_render_ring_flush;
-		engine->irq_get = gen8_ring_get_irq;
-		engine->irq_put = gen8_ring_put_irq;
+		engine->irq_enable = gen8_ring_enable_irq;
+		engine->irq_disable = gen8_ring_disable_irq;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			WARN_ON(!dev_priv->semaphore_obj);
@@ -2689,8 +2592,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		engine->flush = gen7_render_ring_flush;
 		if (IS_GEN6(dev_priv))
 			engine->flush = gen6_render_ring_flush;
-		engine->irq_get = gen6_ring_get_irq;
-		engine->irq_put = gen6_ring_put_irq;
+		engine->irq_enable = gen6_ring_enable_irq;
+		engine->irq_disable = gen6_ring_disable_irq;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 		engine->irq_seqno_barrier = gen6_seqno_barrier;
 		if (i915_semaphore_is_enabled(dev_priv)) {
@@ -2717,8 +2620,8 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	} else if (IS_GEN5(dev_priv)) {
 		engine->add_request = i9xx_add_request;
 		engine->flush = gen4_render_ring_flush;
-		engine->irq_get = gen5_ring_get_irq;
-		engine->irq_put = gen5_ring_put_irq;
+		engine->irq_enable = gen5_ring_enable_irq;
+		engine->irq_disable = gen5_ring_disable_irq;
 		engine->irq_seqno_barrier = gen5_seqno_barrier;
 		engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT;
 	} else {
@@ -2728,11 +2631,11 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 		else
 			engine->flush = gen4_render_ring_flush;
 		if (IS_GEN2(dev_priv)) {
-			engine->irq_get = i8xx_ring_get_irq;
-			engine->irq_put = i8xx_ring_put_irq;
+			engine->irq_enable = i8xx_ring_enable_irq;
+			engine->irq_disable = i8xx_ring_disable_irq;
 		} else {
-			engine->irq_get = i9xx_ring_get_irq;
-			engine->irq_put = i9xx_ring_put_irq;
+			engine->irq_enable = i9xx_ring_enable_irq;
+			engine->irq_disable = i9xx_ring_disable_irq;
 		}
 		engine->irq_enable_mask = I915_USER_INTERRUPT;
 	}
@@ -2792,8 +2695,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		if (INTEL_GEN(dev_priv) >= 8) {
 			engine->irq_enable_mask =
 				GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT;
-			engine->irq_get = gen8_ring_get_irq;
-			engine->irq_put = gen8_ring_put_irq;
+			engine->irq_enable = gen8_ring_enable_irq;
+			engine->irq_disable = gen8_ring_disable_irq;
 			engine->dispatch_execbuffer =
 				gen8_ring_dispatch_execbuffer;
 			if (i915_semaphore_is_enabled(dev_priv)) {
@@ -2803,8 +2706,8 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 			}
 		} else {
 			engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
-			engine->irq_get = gen6_ring_get_irq;
-			engine->irq_put = gen6_ring_put_irq;
+			engine->irq_enable = gen6_ring_enable_irq;
+			engine->irq_disable = gen6_ring_disable_irq;
 			engine->dispatch_execbuffer =
 				gen6_ring_dispatch_execbuffer;
 			if (i915_semaphore_is_enabled(dev_priv)) {
@@ -2828,13 +2731,13 @@ int intel_init_bsd_ring_buffer(struct drm_device *dev)
 		engine->add_request = i9xx_add_request;
 		if (IS_GEN5(dev_priv)) {
 			engine->irq_enable_mask = ILK_BSD_USER_INTERRUPT;
-			engine->irq_get = gen5_ring_get_irq;
-			engine->irq_put = gen5_ring_put_irq;
+			engine->irq_enable = gen5_ring_enable_irq;
+			engine->irq_disable = gen5_ring_disable_irq;
 			engine->irq_seqno_barrier = gen5_seqno_barrier;
 		} else {
 			engine->irq_enable_mask = I915_BSD_USER_INTERRUPT;
-			engine->irq_get = i9xx_ring_get_irq;
-			engine->irq_put = i9xx_ring_put_irq;
+			engine->irq_enable = i9xx_ring_enable_irq;
+			engine->irq_disable = i9xx_ring_disable_irq;
 		}
 		engine->dispatch_execbuffer = i965_dispatch_execbuffer;
 	}
@@ -2863,8 +2766,8 @@ int intel_init_bsd2_ring_buffer(struct drm_device *dev)
 	engine->irq_seqno_barrier = gen6_seqno_barrier;
 	engine->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT;
-	engine->irq_get = gen8_ring_get_irq;
-	engine->irq_put = gen8_ring_put_irq;
+	engine->irq_enable = gen8_ring_enable_irq;
+	engine->irq_disable = gen8_ring_disable_irq;
 	engine->dispatch_execbuffer =
 			gen8_ring_dispatch_execbuffer;
 	if (i915_semaphore_is_enabled(dev_priv)) {
@@ -2895,8 +2798,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 	if (INTEL_GEN(dev_priv) >= 8) {
 		engine->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT;
-		engine->irq_get = gen8_ring_get_irq;
-		engine->irq_put = gen8_ring_put_irq;
+		engine->irq_enable = gen8_ring_enable_irq;
+		engine->irq_disable = gen8_ring_disable_irq;
 		engine->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			engine->semaphore.sync_to = gen8_ring_sync;
@@ -2905,8 +2808,8 @@ int intel_init_blt_ring_buffer(struct drm_device *dev)
 		}
 	} else {
 		engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
-		engine->irq_get = gen6_ring_get_irq;
-		engine->irq_put = gen6_ring_put_irq;
+		engine->irq_enable = gen6_ring_enable_irq;
+		engine->irq_disable = gen6_ring_disable_irq;
 		engine->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			engine->semaphore.signal = gen6_signal;
@@ -2954,8 +2857,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 	if (INTEL_GEN(dev_priv) >= 8) {
 		engine->irq_enable_mask =
 			GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT;
-		engine->irq_get = gen8_ring_get_irq;
-		engine->irq_put = gen8_ring_put_irq;
+		engine->irq_enable = gen8_ring_enable_irq;
+		engine->irq_disable = gen8_ring_disable_irq;
 		engine->dispatch_execbuffer = gen8_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			engine->semaphore.sync_to = gen8_ring_sync;
@@ -2964,8 +2867,8 @@ int intel_init_vebox_ring_buffer(struct drm_device *dev)
 		}
 	} else {
 		engine->irq_enable_mask = PM_VEBOX_USER_INTERRUPT;
-		engine->irq_get = hsw_vebox_get_irq;
-		engine->irq_put = hsw_vebox_put_irq;
+		engine->irq_enable = hsw_vebox_enable_irq;
+		engine->irq_disable = hsw_vebox_disable_irq;
 		engine->dispatch_execbuffer = gen6_ring_dispatch_execbuffer;
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			engine->semaphore.sync_to = gen6_ring_sync;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ca611fe6997e..b591f5dd23cd 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -200,11 +200,10 @@ struct intel_engine_cs {
 	struct intel_hw_status_page status_page;
 	struct i915_ctx_workarounds wa_ctx;
 
-	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	bool		irq_posted;
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
-	bool __must_check (*irq_get)(struct intel_engine_cs *ring);
-	void		(*irq_put)(struct intel_engine_cs *ring);
+	void		(*irq_enable)(struct intel_engine_cs *ring);
+	void		(*irq_disable)(struct intel_engine_cs *ring);
 
 	int		(*init_hw)(struct intel_engine_cs *ring);
 
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 19/20] drm/i915: Simplify enabling user-interrupts with L3-remapping
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (16 preceding siblings ...)
  2016-05-19 11:32 ` [CI 18/20] drm/i915: Move the get/put irq locking into the caller Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 11:32 ` [CI 20/20] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts Chris Wilson
  2016-05-19 12:07 ` ✗ Ro.CI.BAT: warning for series starting with [CI,01/20] drm: Restore double clflush on the last partial cacheline Patchwork
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

Borrow the idea from intel_lrc.c of precomputing the mask of interrupts
we wish to keep enabled at all times, so that we avoid having lots of
conditionals inside the interrupt enable/disable routines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 35 +++++++++++----------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++--
 2 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 25b7ad684a28..055d344cbac4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1227,8 +1227,7 @@ static int init_render_ring(struct intel_engine_cs *engine)
 	if (IS_GEN(dev_priv, 6, 7))
 		I915_WRITE(INSTPM, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
 
-	if (HAS_L3_DPF(dev_priv))
-		I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
+	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
 
 	return init_workarounds_ring(engine);
 }
@@ -1644,12 +1643,9 @@ gen6_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 
-	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-		I915_WRITE_IMR(engine,
-			       ~(engine->irq_enable_mask |
-				 GT_PARITY_ERROR(dev_priv)));
-	else
-		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
+	I915_WRITE_IMR(engine,
+		       ~(engine->irq_enable_mask |
+			 engine->irq_keep_mask));
 	gen5_enable_gt_irq(dev_priv, engine->irq_enable_mask);
 }
 
@@ -1658,10 +1654,7 @@ gen6_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 
-	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-		I915_WRITE_IMR(engine, ~GT_PARITY_ERROR(dev_priv));
-	else
-		I915_WRITE_IMR(engine, ~0);
+	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
 	gen5_disable_gt_irq(dev_priv, engine->irq_enable_mask);
 }
 
@@ -1688,12 +1681,9 @@ gen8_ring_enable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 
-	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-		I915_WRITE_IMR(engine,
-			       ~(engine->irq_enable_mask |
-				 GT_RENDER_L3_PARITY_ERROR_INTERRUPT));
-	else
-		I915_WRITE_IMR(engine, ~engine->irq_enable_mask);
+	I915_WRITE_IMR(engine,
+		       ~(engine->irq_enable_mask |
+			 engine->irq_keep_mask));
 	POSTING_READ_FW(RING_IMR(engine->mmio_base));
 }
 
@@ -1702,11 +1692,7 @@ gen8_ring_disable_irq(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 
-	if (HAS_L3_DPF(dev_priv) && engine->id == RCS)
-		I915_WRITE_IMR(engine,
-			       ~GT_RENDER_L3_PARITY_ERROR_INTERRUPT);
-	else
-		I915_WRITE_IMR(engine, ~0);
+	I915_WRITE_IMR(engine, ~engine->irq_keep_mask);
 }
 
 static int
@@ -2556,6 +2542,9 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	engine->hw_id = 0;
 	engine->mmio_base = RENDER_RING_BASE;
 
+	if (HAS_L3_DPF(dev_priv))
+		engine->irq_keep_mask = GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
+
 	if (INTEL_GEN(dev_priv) >= 8) {
 		if (i915_semaphore_is_enabled(dev_priv)) {
 			obj = i915_gem_object_create(dev, 4096);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b591f5dd23cd..eb1ca680ad0b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -201,7 +201,8 @@ struct intel_engine_cs {
 	struct i915_ctx_workarounds wa_ctx;
 
 	bool		irq_posted;
-	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
+	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
+	u32		irq_enable_mask;/* bitmask to enable ring interrupt */
 	void		(*irq_enable)(struct intel_engine_cs *ring);
 	void		(*irq_disable)(struct intel_engine_cs *ring);
 
@@ -298,7 +299,6 @@ struct intel_engine_cs {
 	unsigned int idle_lite_restore_wa;
 	bool disable_lite_restore_wa;
 	u32 ctx_desc_template;
-	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct drm_i915_gem_request *request);
 	int		(*emit_flush)(struct drm_i915_gem_request *request,
 				      u32 invalidate_domains,
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [CI 20/20] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (17 preceding siblings ...)
  2016-05-19 11:32 ` [CI 19/20] drm/i915: Simplify enabling user-interrupts with L3-remapping Chris Wilson
@ 2016-05-19 11:32 ` Chris Wilson
  2016-05-19 12:07 ` ✗ Ro.CI.BAT: warning for series starting with [CI,01/20] drm: Restore double clflush on the last partial cacheline Patchwork
  19 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 11:32 UTC (permalink / raw)
  To: intel-gfx

Since the tests can and do explicitly check debugfs/i915_ring_missed_irqs
for the handling of a "missed interrupt", adding it to the dmesg at INFO
is just noise. When it happens for real, we still class it as an ERROR.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 70f7617b5bf1..313fe63ad56f 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3093,9 +3093,6 @@ static unsigned kick_waiters(struct intel_engine_cs *engine)
 		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
 			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
 				  engine->name);
-		else
-			DRM_INFO("Fake missed irq on %s\n",
-				 engine->name);
 
 		intel_engine_enable_fake_irq(engine);
 	}
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* ✗ Ro.CI.BAT: warning for series starting with [CI,01/20] drm: Restore double clflush on the last partial cacheline
  2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
                   ` (18 preceding siblings ...)
  2016-05-19 11:32 ` [CI 20/20] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts Chris Wilson
@ 2016-05-19 12:07 ` Patchwork
  19 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2016-05-19 12:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [CI,01/20] drm: Restore double clflush on the last partial cacheline
URL   : https://patchwork.freedesktop.org/series/7403/
State : warning

== Summary ==

Series 7403v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/7403/revisions/1/mbox

Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                pass       -> DMESG-WARN (fi-bdw-i7-5557u)
        Subgroup basic-batch-kernel-default-wb:
                pass       -> DMESG-WARN (ro-skl-i7-6700hq)
                pass       -> DMESG-WARN (ro-bdw-i7-5557U)
                pass       -> DMESG-WARN (fi-skl-i7-6700k)
        Subgroup basic-uc-pro-default:
                fail       -> PASS       (ro-byt-n2820)

fi-bdw-i7-5557u  total:219  pass:205  dwarn:1   dfail:0   fail:0   skip:13 
fi-bsw-n3050     total:218  pass:176  dwarn:0   dfail:0   fail:0   skip:42 
fi-byt-n2820     total:218  pass:176  dwarn:0   dfail:0   fail:1   skip:41 
fi-hsw-i7-4770k  total:219  pass:198  dwarn:0   dfail:0   fail:0   skip:21 
fi-hsw-i7-4770r  total:219  pass:193  dwarn:0   dfail:0   fail:0   skip:26 
fi-skl-i7-6700k  total:219  pass:190  dwarn:1   dfail:0   fail:0   skip:28 
ro-bdw-i5-5250u  total:219  pass:181  dwarn:0   dfail:0   fail:0   skip:38 
ro-bdw-i7-5557U  total:219  pass:205  dwarn:1   dfail:0   fail:0   skip:13 
ro-bdw-i7-5600u  total:219  pass:187  dwarn:0   dfail:0   fail:0   skip:32 
ro-bsw-n3050     total:219  pass:177  dwarn:0   dfail:0   fail:0   skip:42 
ro-byt-n2820     total:218  pass:176  dwarn:0   dfail:0   fail:1   skip:41 
ro-hsw-i3-4010u  total:218  pass:193  dwarn:0   dfail:0   fail:0   skip:25 
ro-hsw-i7-4770r  total:219  pass:194  dwarn:0   dfail:0   fail:0   skip:25 
ro-ilk-i7-620lm  total:219  pass:151  dwarn:0   dfail:0   fail:1   skip:67 
ro-ilk1-i5-650   total:214  pass:151  dwarn:0   dfail:0   fail:2   skip:61 
ro-ivb-i7-3770   total:219  pass:183  dwarn:0   dfail:0   fail:0   skip:36 
ro-ivb2-i7-3770  total:219  pass:187  dwarn:0   dfail:0   fail:0   skip:32 
ro-skl-i7-6700hq total:214  pass:188  dwarn:1   dfail:0   fail:0   skip:25 
ro-snb-i7-2620M  total:219  pass:177  dwarn:0   dfail:0   fail:1   skip:41 
fi-snb-i7-2600 failed to connect after reboot

Results at /archive/results/CI_IGT_test/RO_Patchwork_939/

a2499a0 drm-intel-nightly: 2016y-05m-18d-17h-17m-55s UTC integration manifest
1c0d80b drm/i915: Remove debug noise on detecting fault-injection of missed interrupts
681616b drm/i915: Simplify enabling user-interrupts with L3-remapping
6304f0f drm/i915: Move the get/put irq locking into the caller
20f1659 drm/i915: Convert trace-irq to the breadcrumb waiter
b64d50a drm/i915: Only query timestamp when measuring elapsed time
5e3557f drm/i915: Stop setting wraparound seqno on initialisation
7ce6b39 drm/i915: Only apply one barrier after a breadcrumb interrupt is posted
45ca6c5 drm/i915: Check the CPU cached value of seqno after waking the waiter
e4ae50c drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk)
664a481 drm/i915: Refactor scratch object allocation for gen2 w/a buffer
eec2227 drm/i915: Allocate scratch page from stolen
2cf8f74 drm/i915: Stop mapping the scratch page into CPU space
40aee12 drm/i915: Use HWS for seqno tracking everywhere
02c3232 drm/i915: Remove the lazy_coherency parameter from request-completed?
e8ac9d0 drm/i915: Slaughter the thundering i915_wait_request herd
994aedb drm/i915: Make queueing the hangcheck work inline
03e6a27 drm/i915: Remove the dedicated hangcheck workqueue
df36a24 drm/i915: Delay queuing hangcheck to wait-request
dc80249 drm/i915/shrinker: Flush active on objects before counting
7c64661 drm: Restore double clflush on the last partial cacheline

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 02/20] drm/i915/shrinker: Flush active on objects before counting
  2016-05-19 11:32 ` [CI 02/20] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
@ 2016-05-19 12:14   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2016-05-19 12:14 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/05/16 12:32, Chris Wilson wrote:
> As we inspect obj->active to decide how many objects we can shrink (we
> only shrink idle objects), it helps to flush the active lists first
> in order to have a more accurate count of available objects.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_shrinker.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index 538c30499848..d893014d28fe 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -265,6 +265,8 @@ i915_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
>   	if (!i915_gem_shrinker_lock(dev, &unlock))
>   		return 0;
>
> +	i915_gem_retire_requests(dev_priv);
> +
>   	count = 0;
>   	list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list)
>   		if (can_release_pages(obj))
>

I can't think of a way this could harm anything, and it gives a more
up-to-date count.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 03/20] drm/i915: Delay queuing hangcheck to wait-request
  2016-05-19 11:32 ` [CI 03/20] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
@ 2016-05-19 12:34   ` Tvrtko Ursulin
  2016-05-19 12:52     ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2016-05-19 12:34 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/05/16 12:32, Chris Wilson wrote:
> We can forgo queuing the hangcheck from the start of every request
> until we wait upon a request. This reduces the overhead of every
> request, but may increase the latency of detecting a hang. However, if
> nothing ever waits upon a hang, did it ever hang? It also improves the

Maybe it is better to detect and reset sooner rather than later, in case
the hang is burning power and may now only get looked at at some random
time in the future? It is probably quite a weak argument but it came to
mind.

> robustness of the wait-request by ensuring that the hangchecker is
> indeed running before we sleep indefinitely (and thereby ensuring that
> we never actually sleep forever waiting for a dead GPU).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c |  9 +++++----
>   drivers/gpu/drm/i915/i915_irq.c | 10 ++++------
>   2 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 24cab8802c2e..06013b9fbc6a 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1310,6 +1310,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   			break;
>   		}
>
> +		/* Ensure that even if the GPU hangs, we get woken up. */
> +		i915_queue_hangcheck(dev_priv);
> +
>   		timer.function = NULL;
>   		if (timeout || missed_irq(dev_priv, engine)) {
>   			unsigned long expire;
> @@ -2654,8 +2657,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>   	/* Not allowed to fail! */
>   	WARN(ret, "emit|add_request failed: %d!\n", ret);
>
> -	i915_queue_hangcheck(engine->i915);
> -
>   	queue_delayed_work(dev_priv->wq,
>   			   &dev_priv->mm.retire_work,
>   			   round_jiffies_up_relative(HZ));
> @@ -2999,8 +3000,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv)
>
>   	if (idle)
>   		mod_delayed_work(dev_priv->wq,
> -				   &dev_priv->mm.idle_work,
> -				   msecs_to_jiffies(100));
> +				 &dev_priv->mm.idle_work,
> +				 msecs_to_jiffies(100));
>
>   	return idle;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f0d941455bed..4818fcb5e960 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3148,10 +3148,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
>
>   	for_each_engine_id(engine, dev_priv, id) {
> +		bool busy = waitqueue_active(&engine->irq_queue);
>   		u64 acthd;
>   		u32 seqno;
>   		unsigned user_interrupts;
> -		bool busy = true;
>
>   		semaphore_clear_deadlocks(dev_priv);
>
> @@ -3174,12 +3174,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   		if (engine->hangcheck.seqno == seqno) {
>   			if (ring_idle(engine, seqno)) {
>   				engine->hangcheck.action = HANGCHECK_IDLE;
> -				if (waitqueue_active(&engine->irq_queue)) {
> +				if (busy) {
>   					/* Safeguard against driver failure */
>   					user_interrupts = kick_waiters(engine);
>   					engine->hangcheck.score += BUSY;
> -				} else
> -					busy = false;
> +				}
>   			} else {
>   				/* We always increment the hangcheck score
>   				 * if the ring is busy and still processing
> @@ -3253,9 +3252,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   		goto out;
>   	}
>
> +	/* Reset timer in case GPU hangs without another request being added */
>   	if (busy_count)
> -		/* Reset timer case chip hangs without another request
> -		 * being added */
>   		i915_queue_hangcheck(dev_priv);
>
>   out:
>

With a disclaimer that I don't really know hangcheck - it looks like
previously it was getting re-queued all the time until the engine went
idle, regardless of whether there were waiters. With this change that
appears to be gone: if there are no waiters it won't get re-queued. How
does incrementing the hangcheck score work now?

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 04/20] drm/i915: Remove the dedicated hangcheck workqueue
  2016-05-19 11:32 ` [CI 04/20] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
@ 2016-05-19 12:50   ` Tvrtko Ursulin
  2016-05-19 13:13     ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2016-05-19 12:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/05/16 12:32, Chris Wilson wrote:
> The queue only ever contains at most one item and has no special flags.
> It is just a very simple wrapper around the system-wq - a complication
> with no benefits.

How much time do we take in the reset case - is it acceptable to do that
work from the system wq?

Hm, why was it an ordered queue so far, given we only ever queue a single
work item on it?

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_dma.c | 8 --------
>   drivers/gpu/drm/i915/i915_drv.h | 1 -
>   drivers/gpu/drm/i915/i915_irq.c | 6 +++---
>   3 files changed, 3 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index a8c79f6512a4..aa312fd2bc10 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1026,15 +1026,8 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>   	if (dev_priv->hotplug.dp_wq == NULL)
>   		goto out_free_wq;
>
> -	dev_priv->gpu_error.hangcheck_wq =
> -		alloc_ordered_workqueue("i915-hangcheck", 0);
> -	if (dev_priv->gpu_error.hangcheck_wq == NULL)
> -		goto out_free_dp_wq;
> -
>   	return 0;
>
> -out_free_dp_wq:
> -	destroy_workqueue(dev_priv->hotplug.dp_wq);
>   out_free_wq:
>   	destroy_workqueue(dev_priv->wq);
>   out_err:
> @@ -1045,7 +1038,6 @@ out_err:
>
>   static void i915_workqueues_cleanup(struct drm_i915_private *dev_priv)
>   {
> -	destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
>   	destroy_workqueue(dev_priv->hotplug.dp_wq);
>   	destroy_workqueue(dev_priv->wq);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 72f0b02a8372..42b8038bf998 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1366,7 +1366,6 @@ struct i915_gpu_error {
>   	/* Hang gpu twice in this window and your context gets banned */
>   #define DRM_I915_CTX_BAN_PERIOD DIV_ROUND_UP(8*DRM_I915_HANGCHECK_PERIOD, 1000)
>
> -	struct workqueue_struct *hangcheck_wq;
>   	struct delayed_work hangcheck_work;
>
>   	/* For reset and error_state handling. */
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 4818fcb5e960..984b575ba081 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3262,7 +3262,7 @@ out:
>
>   void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
>   {
> -	struct i915_gpu_error *e = &dev_priv->gpu_error;
> +	unsigned long delay;
>
>   	if (!i915.enable_hangcheck)
>   		return;
> @@ -3272,8 +3272,8 @@ void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
>   	 * we will ignore a hung ring if a second ring is kept busy.
>   	 */
>
> -	queue_delayed_work(e->hangcheck_wq, &e->hangcheck_work,
> -			   round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES));
> +	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
> +	schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
>   }
>
>   static void ibx_irq_reset(struct drm_device *dev)
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 03/20] drm/i915: Delay queuing hangcheck to wait-request
  2016-05-19 12:34   ` Tvrtko Ursulin
@ 2016-05-19 12:52     ` Chris Wilson
  0 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 12:52 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, May 19, 2016 at 01:34:55PM +0100, Tvrtko Ursulin wrote:
> 
> On 19/05/16 12:32, Chris Wilson wrote:
> >We can forgo queuing the hangcheck from the start of every request
> >until we wait upon a request. This reduces the overhead of every
> >request, but may increase the latency of detecting a hang. However, if
> >nothing ever waits upon a hang, did it ever hang? It also improves the
> 
> Maybe it is better to detect and reset sooner rather than later in
> case the hang is burning power and may now only get looked at at
> some random time in the future? It is probably quite weak argument
> but it came to mind.

Yes. The counter argument is that it runs for seconds anyway until a
hang is found and for quick detection and recovery we have a watchdog
that we aren't using yet (but one day will switch to). So what I have in
mind here is to radically simplify the hangcheck and base it entirely on
whether the waiter (or the scheduler) is making progress. At some point,
we always need to wait on the GPU for a resource. (And then it just
becomes a plain DoS detector as a fallback to fine grained recovery.)

> >robustness of the wait-request by ensuring that the hangchecker is
> >indeed running before we sleep indefinitely (and thereby ensuring that
> >we never actually sleep forever waiting for a dead GPU).
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_gem.c |  9 +++++----
> >  drivers/gpu/drm/i915/i915_irq.c | 10 ++++------
> >  2 files changed, 9 insertions(+), 10 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 24cab8802c2e..06013b9fbc6a 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -1310,6 +1310,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> >  			break;
> >  		}
> >
> >+		/* Ensure that even if the GPU hangs, we get woken up. */
> >+		i915_queue_hangcheck(dev_priv);
> >+
> >  		timer.function = NULL;
> >  		if (timeout || missed_irq(dev_priv, engine)) {
> >  			unsigned long expire;
> >@@ -2654,8 +2657,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
> >  	/* Not allowed to fail! */
> >  	WARN(ret, "emit|add_request failed: %d!\n", ret);
> >
> >-	i915_queue_hangcheck(engine->i915);
> >-
> >  	queue_delayed_work(dev_priv->wq,
> >  			   &dev_priv->mm.retire_work,
> >  			   round_jiffies_up_relative(HZ));
> >@@ -2999,8 +3000,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv)
> >
> >  	if (idle)
> >  		mod_delayed_work(dev_priv->wq,
> >-				   &dev_priv->mm.idle_work,
> >-				   msecs_to_jiffies(100));
> >+				 &dev_priv->mm.idle_work,
> >+				 msecs_to_jiffies(100));
> >
> >  	return idle;
> >  }
> >diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >index f0d941455bed..4818fcb5e960 100644
> >--- a/drivers/gpu/drm/i915/i915_irq.c
> >+++ b/drivers/gpu/drm/i915/i915_irq.c
> >@@ -3148,10 +3148,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >  	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
> >
> >  	for_each_engine_id(engine, dev_priv, id) {
> >+		bool busy = waitqueue_active(&engine->irq_queue);
> >  		u64 acthd;
> >  		u32 seqno;
> >  		unsigned user_interrupts;
> >-		bool busy = true;
> >
> >  		semaphore_clear_deadlocks(dev_priv);
> >
> >@@ -3174,12 +3174,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >  		if (engine->hangcheck.seqno == seqno) {
> >  			if (ring_idle(engine, seqno)) {
> >  				engine->hangcheck.action = HANGCHECK_IDLE;
> >-				if (waitqueue_active(&engine->irq_queue)) {
> >+				if (busy) {
> >  					/* Safeguard against driver failure */
> >  					user_interrupts = kick_waiters(engine);
> >  					engine->hangcheck.score += BUSY;
> >-				} else
> >-					busy = false;
> >+				}
> >  			} else {
> >  				/* We always increment the hangcheck score
> >  				 * if the ring is busy and still processing
> >@@ -3253,9 +3252,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >  		goto out;
> >  	}
> >
> >+	/* Reset timer in case GPU hangs without another request being added */
> >  	if (busy_count)
> >-		/* Reset timer case chip hangs without another request
> >-		 * being added */
> >  		i915_queue_hangcheck(dev_priv);
> >
> >  out:
> >
> 
> With a disclaimer that I don't really know hangcheck - it looks that
> previously it was getting re-queued all the time until the engine
> when idle, regardless of if there were waiters. With this change it
> looks like that is gone. If there are no waiters it won't get
> re-queued. How does incrementing the hangcheck score work now?

It gets reset between waits (i.e. the hangcheck is idle as would now be
the case for GPU idle) and then incremented for as long as someone is
waiting until it fires. In practice, it doesn't alter the discovery rate
at all.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 05/20] drm/i915: Make queueing the hangcheck work inline
  2016-05-19 11:32 ` [CI 05/20] drm/i915: Make queueing the hangcheck work inline Chris Wilson
@ 2016-05-19 12:53   ` Tvrtko Ursulin
  2016-05-19 13:18     ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2016-05-19 12:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/05/16 12:32, Chris Wilson wrote:
> Since the function is a small wrapper around schedule_delayed_work(),
> move it inline to remove the function call overhead for the principal
> caller.

Alternatively move it to i915_gem.c and let the compiler decide?

But it is small enough, so anyway:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h | 17 ++++++++++++++++-
>   drivers/gpu/drm/i915/i915_irq.c | 16 ----------------
>   2 files changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 42b8038bf998..33553ea99a03 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2833,7 +2833,22 @@ void intel_hpd_cancel_work(struct drm_i915_private *dev_priv);
>   bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port);
>
>   /* i915_irq.c */
> -void i915_queue_hangcheck(struct drm_i915_private *dev_priv);
> +static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
> +{
> +	unsigned long delay;
> +
> +	if (unlikely(!i915.enable_hangcheck))
> +		return;
> +
> +	/* Don't continually defer the hangcheck so that it is always run at
> +	 * least once after work has been scheduled on any ring. Otherwise,
> +	 * we will ignore a hung ring if a second ring is kept busy.
> +	 */
> +
> +	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
> +	schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
> +}
> +
>   __printf(3, 4)
>   void i915_handle_error(struct drm_i915_private *dev_priv,
>   		       u32 engine_mask,
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 984b575ba081..27dd45a7291c 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3260,22 +3260,6 @@ out:
>   	ENABLE_RPM_WAKEREF_ASSERTS(dev_priv);
>   }
>
> -void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
> -{
> -	unsigned long delay;
> -
> -	if (!i915.enable_hangcheck)
> -		return;
> -
> -	/* Don't continually defer the hangcheck so that it is always run at
> -	 * least once after work has been scheduled on any ring. Otherwise,
> -	 * we will ignore a hung ring if a second ring is kept busy.
> -	 */
> -
> -	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
> -	schedule_delayed_work(&dev_priv->gpu_error.hangcheck_work, delay);
> -}
> -
>   static void ibx_irq_reset(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 04/20] drm/i915: Remove the dedicated hangcheck workqueue
  2016-05-19 12:50   ` Tvrtko Ursulin
@ 2016-05-19 13:13     ` Chris Wilson
  2016-05-20 12:07       ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 13:13 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, May 19, 2016 at 01:50:51PM +0100, Tvrtko Ursulin wrote:
> 
> On 19/05/16 12:32, Chris Wilson wrote:
> >The queue only ever contains at most one item and has no special flags.
> >It is just a very simple wrapper around the system-wq - a complication
> >with no benefits.
> 
> How much time do we take in the reset case - is it acceptable to do
> that work from the system wq?

Hangcheck is a handful of register reads and some pointer chasing per
engine. (There is a seqno_barrier in there which may be reasonably
expensive but not a cpu hog). The error capture is run from the
hangcheck context - and that is no small task (especially if we ever
apply the object compression patches), but for safety we need to call
stop_machine() so it really doesn't matter at that point.
 
> Hm, why was it an ordered queue so far - we only queue single work on it?

Copy-n-paste I think. The dev_priv->wq is ordered because we take
struct_mutex in everything and so only really support one concurrent task
at a time. We wanted a separate workqueue for hangcheck so it wouldn't
be interfered with by the driver blocking the ordered dev_priv->wq, but
that is just the same as using the unbound system workqueue. Hmm,
probably we do want system_unbound_wq if we ever see an issue.
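
Just as an illustrative sketch (not part of this series, and assuming we
keep the existing gpu_error.hangcheck_work), switching over would only
touch the queuing helper:

	static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
	{
		unsigned long delay;

		if (unlikely(!i915.enable_hangcheck))
			return;

		delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
		/* unbound: not serialised behind whatever else sits on system_wq */
		queue_delayed_work(system_unbound_wq,
				   &dev_priv->gpu_error.hangcheck_work, delay);
	}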
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 05/20] drm/i915: Make queueing the hangcheck work inline
  2016-05-19 12:53   ` Tvrtko Ursulin
@ 2016-05-19 13:18     ` Chris Wilson
  0 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2016-05-19 13:18 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, May 19, 2016 at 01:53:51PM +0100, Tvrtko Ursulin wrote:
> 
> 
> On 19/05/16 12:32, Chris Wilson wrote:
> >Since the function is a small wrapper around schedule_delayed_work(),
> >move it inline to remove the function call overhead for the principal
> >caller.
> 
> Alternatively move it to i915_gem.c and let the compiler decide?

Because then I would have to move it again! :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 16/20] drm/i915: Only query timestamp when measuring elapsed time
  2016-05-19 11:32 ` [CI 16/20] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
@ 2016-05-19 15:44   ` Tvrtko Ursulin
  2016-05-20 12:20     ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2016-05-19 15:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/05/16 12:32, Chris Wilson wrote:
> Avoid the two calls to ktime_get_raw_ns() (at best it reads the TSC) as
> we only need to compute the elapsed time for a timed wait.
>
> v2: Eliminate the unused local variable reducing the function size by 64
> bytes (using the storage space on the callers stack rather than adding
> to our stack frame)
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 14 +++++---------
>   1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b48a3b46e86f..2c254cf49c15 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1215,7 +1215,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>   	struct intel_wait wait;
>   	unsigned long timeout_remain;
> -	s64 before = 0; /* Only to silence a compiler warning. */
>   	int ret = 0;
>
>   	might_sleep();
> @@ -1234,12 +1233,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   		if (*timeout == 0)
>   			return -ETIME;
>
> +		/* Record current time in case interrupted, or wedged */
>   		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
> -
> -		/*
> -		 * Record current time in case interrupted by signal, or wedged.
> -		 */
> -		before = ktime_get_raw_ns();
> +		*timeout += ktime_get_raw_ns();
>   	}
>
>   	trace_i915_gem_request_wait_begin(req);
> @@ -1296,9 +1292,9 @@ complete:
>   	trace_i915_gem_request_wait_end(req);
>
>   	if (timeout) {
> -		s64 tres = *timeout - (ktime_get_raw_ns() - before);
> -
> -		*timeout = tres < 0 ? 0 : tres;
> +		*timeout -= ktime_get_raw_ns();
> +		if (*timeout < 0)
> +			*timeout = 0;
>
>   		/*
>   		 * Apparently ktime isn't accurate enough and occasionally has a
>

I think this is bad, better to have a local than play games with the
caller's storage.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 06/20] drm/i915: Slaughter the thundering i915_wait_request herd
  2016-05-19 11:32 ` [CI 06/20] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
@ 2016-05-20 12:04   ` Tvrtko Ursulin
  2016-05-20 12:19     ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2016-05-20 12:04 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/05/16 12:32, Chris Wilson wrote:
> One particularly stressful scenario consists of many independent tasks
> all competing for GPU time and waiting upon the results (e.g. realtime
> transcoding of many, many streams). One bottleneck in particular is that
> each client waits on its own results, but every client is woken up after
> every batchbuffer - hence the thunder of hooves as then every client must
> do its heavyweight dance to read a coherent seqno to see if it is the
> lucky one.
>
> Ideally, we only want one client to wake up after the interrupt and
> check its request for completion. Since the requests must retire in
> order, we can select the first client on the oldest request to be woken.
> Once that client has completed his wait, we can then wake up the
> next client and so on. However, all clients then incur latency as every
> process in the chain may be delayed for scheduling - this may also then
> cause some priority inversion. To reduce the latency, when a client
> is added or removed from the list, we scan the tree for completed
> seqno and wake up all the completed waiters in parallel.
>
> Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
> benchmark measures the number of GPU cycles between completion of a
> batch and the client waking up from a call to wait-ioctl. With many
> concurrent waiters, with each on a different request, we observe that
> the wakeup latency before the patch scales nearly linearly with the
> number of waiters (before external factors kick in making the scaling much
> worse). After applying the patch, we can see that only the single waiter
> for the request is being woken up, providing a constant wakeup latency
> for every operation. However, the situation is not quite as rosy for
> many waiters on the same request, though to the best of my knowledge this
> is much less likely in practice. Here, we can observe that the
> concurrent waiters incur extra latency from being woken up by the
> solitary bottom-half, rather than directly by the interrupt. This
> appears to be scheduler induced (having discounted adverse effects from
> having a rbtree walk/erase in the wakeup path), each additional
> wake_up_process() costs approximately 1us on big core. Another effect of
> performing the secondary wakeups from the first bottom-half is the
> incurred delay this imposes on high priority threads - rather than
> immediately returning to userspace and leaving the interrupt handler to
> wake the others.
>
> To offset the delay incurred with additional waiters on a request, we
> could use a hybrid scheme that did a quick read in the interrupt handler
> and dequeued all the completed waiters (incurring the overhead in the
> interrupt handler, not the best plan either as we then incur GPU
> submission latency) but we would still have to wake up the bottom-half
> every time to do the heavyweight slow read. Or we could only kick the
> waiters on the seqno with the same priority as the current task (i.e. in
> the realtime waiter scenario, only it is woken up immediately by the
> interrupt and simply queues the next waiter before returning to userspace,
> minimising its delay at the expense of the chain, and also reducing
> contention on its scheduler runqueue). This is effective at avoiding long
> pauses in the interrupt handler and at avoiding the extra latency in
> realtime/high-priority waiters.
>
> v2: Convert from a kworker per engine into a dedicated kthread for the
> bottom-half.
> v3: Rename request members and tweak comments.
> v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
> v5: Fix race in locklessly checking waiter status and kicking the task on
> adding a new waiter.
> v6: Fix deciding when to force the timer to hide missing interrupts.
> v7: Move the bottom-half from the kthread to the first client process.
> v8: Reword a few comments
> v9: Break the busy loop when the interrupt is unmasked or has fired.
> v10: Comments, unnecessary churn, better debugging from Tvrtko
> v11: Wake all completed waiters on removing the current bottom-half to
> reduce the latency of waking up a herd of clients all waiting on the
> same request.
> v12: Rearrange missed-interrupt fault injection so that it works with
> igt/drv_missed_irq_hang
> v13: Rename intel_breadcrumb and friends to intel_wait in preparation
> for signal handling.
> v14: RCU commentary, assert_spin_locked
> v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
> v16: Sort seqno-groups by priority so that first-waiter has the highest
> task priority (and so avoid priority inversion).
> v17: Add waiters to post-mortem GPU hang state.

A lot of time has passed since I last looked at this. :I A lot of 
complexity to recall. :)

>
> Testcase: igt/gem_concurrent_blit
> Testcase: igt/benchmarks/gem_latency
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Dave Gordon <david.s.gordon@intel.com>
> Cc: "Goel, Akash" <akash.goel@intel.com>
> ---
>   drivers/gpu/drm/i915/Makefile            |   1 +
>   drivers/gpu/drm/i915/i915_debugfs.c      |  15 +-
>   drivers/gpu/drm/i915/i915_drv.h          |  39 +++-
>   drivers/gpu/drm/i915/i915_gem.c          | 130 ++++--------
>   drivers/gpu/drm/i915/i915_gpu_error.c    |  59 +++++-
>   drivers/gpu/drm/i915/i915_irq.c          |  27 +--
>   drivers/gpu/drm/i915/intel_breadcrumbs.c | 333 +++++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.c         |   4 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c  |   3 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.h  |  67 ++++++-
>   10 files changed, 567 insertions(+), 111 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/intel_breadcrumbs.c
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 7e2944406b8f..4c4e19b3f4d7 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -37,6 +37,7 @@ i915-y += i915_cmd_parser.o \
>   	  i915_gem_userptr.o \
>   	  i915_gpu_error.o \
>   	  i915_trace_points.o \
> +	  intel_breadcrumbs.o \
>   	  intel_lrc.o \
>   	  intel_mocs.o \
>   	  intel_ringbuffer.o \
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 24f4105b910f..02a923feeb7d 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -761,10 +761,21 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>   static void i915_ring_seqno_info(struct seq_file *m,
>   				 struct intel_engine_cs *engine)
>   {
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct rb_node *rb;
> +
>   	seq_printf(m, "Current sequence (%s): %x\n",
>   		   engine->name, engine->get_seqno(engine));
>   	seq_printf(m, "Current user interrupts (%s): %x\n",
>   		   engine->name, READ_ONCE(engine->user_interrupts));
> +
> +	spin_lock(&b->lock);
> +	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb)) {
> +		struct intel_wait *w = container_of(rb, typeof(*w), node);
> +		seq_printf(m, "Waiting (%s): %s [%d] on %x\n",
> +			   engine->name, w->task->comm, w->task->pid, w->seqno);
> +	}
> +	spin_unlock(&b->lock);
>   }
>
>   static int i915_gem_seqno_info(struct seq_file *m, void *data)
> @@ -1400,6 +1411,8 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>   			   engine->hangcheck.seqno,
>   			   seqno[id],
>   			   engine->last_submitted_seqno);
> +		seq_printf(m, "\twaiters? %d\n",
> +			   intel_engine_has_waiter(engine));
>   		seq_printf(m, "\tuser interrupts = %x [current %x]\n",
>   			   engine->hangcheck.user_interrupts,
>   			   READ_ONCE(engine->user_interrupts));
> @@ -2384,7 +2397,7 @@ static int count_irq_waiters(struct drm_i915_private *i915)
>   	int count = 0;
>
>   	for_each_engine(engine, i915)
> -		count += engine->irq_refcount;
> +		count += intel_engine_has_waiter(engine);
>
>   	return count;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 33553ea99a03..7b329464e8eb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -503,6 +503,7 @@ struct drm_i915_error_state {
>   		bool valid;
>   		/* Software tracked state */
>   		bool waiting;
> +		int num_waiters;
>   		int hangcheck_score;
>   		enum intel_ring_hangcheck_action hangcheck_action;
>   		int num_requests;
> @@ -548,6 +549,12 @@ struct drm_i915_error_state {
>   			u32 tail;
>   		} *requests;
>
> +		struct drm_i915_error_waiter {
> +			char comm[TASK_COMM_LEN];
> +			pid_t pid;
> +			u32 seqno;
> +		} *waiters;
> +
>   		struct {
>   			u32 gfx_mode;
>   			union {
> @@ -1415,7 +1422,7 @@ struct i915_gpu_error {
>   #define I915_STOP_RING_ALLOW_WARN      (1 << 30)
>
>   	/* For missed irq/seqno simulation. */
> -	unsigned int test_irq_rings;
> +	unsigned long test_irq_rings;
>   };
>
>   enum modeset_restore {
> @@ -2941,7 +2948,6 @@ ibx_disable_display_interrupt(struct drm_i915_private *dev_priv, uint32_t bits)
>   	ibx_display_interrupt_update(dev_priv, bits, 0);
>   }
>
> -
>   /* i915_gem.c */
>   int i915_gem_create_ioctl(struct drm_device *dev, void *data,
>   			  struct drm_file *file_priv);
> @@ -3815,4 +3821,33 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *engine,
>   		i915_gem_request_assign(&engine->trace_irq_req, req);
>   }
>
> +static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
> +{
> +	/* Ensure our read of the seqno is coherent so that we
> +	 * do not "miss an interrupt" (i.e. if this is the last
> +	 * request and the seqno write from the GPU is not visible
> +	 * by the time the interrupt fires, we will see that the
> +	 * request is incomplete and go back to sleep awaiting
> +	 * another interrupt that will never come.)
> +	 *
> +	 * Strictly, we only need to do this once after an interrupt,
> +	 * but it is easier and safer to do it every time the waiter
> +	 * is woken.
> +	 */
> +	if (i915_gem_request_completed(req, false))
> +		return true;
> +
> +	/* We need to check whether any gpu reset happened in between
> +	 * the request being submitted and now. If a reset has occurred,
> +	 * the request is effectively complete (we either are in the
> +	 * process of or have discarded the rendering and completely
> +	 * reset the GPU. The results of the request are lost and we
> +	 * are free to continue on with the original operation.
> +	 */
> +	if (req->reset_counter != i915_reset_counter(&req->i915->gpu_error))
> +		return true;
> +
> +	return false;
> +}
> +
>   #endif
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 06013b9fbc6a..9348b5d7de0e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1123,17 +1123,6 @@ i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
>   	return 0;
>   }
>
> -static void fake_irq(unsigned long data)
> -{
> -	wake_up_process((struct task_struct *)data);
> -}
> -
> -static bool missed_irq(struct drm_i915_private *dev_priv,
> -		       struct intel_engine_cs *engine)
> -{
> -	return test_bit(engine->id, &dev_priv->gpu_error.missed_irq_rings);
> -}
> -
>   static unsigned long local_clock_us(unsigned *cpu)
>   {
>   	unsigned long t;
> @@ -1166,7 +1155,7 @@ static bool busywait_stop(unsigned long timeout, unsigned cpu)
>   	return this_cpu != cpu;
>   }
>
> -static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> +static bool __i915_spin_request(struct drm_i915_gem_request *req, int state)
>   {
>   	unsigned long timeout;
>   	unsigned cpu;
> @@ -1181,17 +1170,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>   	 * takes to sleep on a request, on the order of a microsecond.
>   	 */
>
> -	if (req->engine->irq_refcount)
> -		return -EBUSY;
> -
>   	/* Only spin if we know the GPU is processing this request */
>   	if (!i915_gem_request_started(req, true))
> -		return -EAGAIN;
> +		return false;
>
>   	timeout = local_clock_us(&cpu) + 5;
> -	while (!need_resched()) {
> +	do {
>   		if (i915_gem_request_completed(req, true))
> -			return 0;
> +			return true;
>
>   		if (signal_pending_state(state, current))
>   			break;
> @@ -1200,12 +1186,9 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>   			break;
>
>   		cpu_relax_lowlatency();
> -	}
> -
> -	if (i915_gem_request_completed(req, false))
> -		return 0;
> +	} while (!need_resched());
>
> -	return -EAGAIN;
> +	return false;
>   }
>
>   /**
> @@ -1229,17 +1212,13 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   			s64 *timeout,
>   			struct intel_rps_client *rps)
>   {
> -	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
> -	struct drm_i915_private *dev_priv = req->i915;
> -	const bool irq_test_in_progress =
> -		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>   	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> -	DEFINE_WAIT(wait);
> -	unsigned long timeout_expire;
> +	struct intel_wait wait;
> +	unsigned long timeout_remain;
>   	s64 before = 0; /* Only to silence a compiler warning. */
> -	int ret;
> +	int ret = 0;
>
> -	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
> +	might_sleep();
>
>   	if (list_empty(&req->list))
>   		return 0;
> @@ -1247,7 +1226,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   	if (i915_gem_request_completed(req, true))
>   		return 0;
>
> -	timeout_expire = 0;
> +	timeout_remain = MAX_SCHEDULE_TIMEOUT;
>   	if (timeout) {
>   		if (WARN_ON(*timeout < 0))
>   			return -EINVAL;
> @@ -1255,7 +1234,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   		if (*timeout == 0)
>   			return -ETIME;
>
> -		timeout_expire = jiffies + nsecs_to_jiffies_timeout(*timeout);
> +		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
>
>   		/*
>   		 * Record current time in case interrupted by signal, or wedged.
> @@ -1263,78 +1242,57 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   		before = ktime_get_raw_ns();
>   	}
>
> -	if (INTEL_INFO(dev_priv)->gen >= 6)
> -		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
> -
>   	trace_i915_gem_request_wait_begin(req);
>
> -	/* Optimistic spin for the next jiffie before touching IRQs */
> -	ret = __i915_spin_request(req, state);
> -	if (ret == 0)
> -		goto out;
> +	if (INTEL_INFO(req->i915)->gen >= 6)
> +		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
>
> -	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> -		ret = -ENODEV;
> -		goto out;
> -	}
> -
> -	for (;;) {
> -		struct timer_list timer;
> +	/* Optimistic spin for the next ~jiffie before touching IRQs */
> +	if (__i915_spin_request(req, state))
> +		goto complete;
>
> -		prepare_to_wait(&engine->irq_queue, &wait, state);
> -
> -		/* We need to check whether any gpu reset happened in between
> -		 * the request being submitted and now. If a reset has occurred,
> -		 * the request is effectively complete (we either are in the
> -		 * process of or have discarded the rendering and completely
> -		 * reset the GPU. The results of the request are lost and we
> -		 * are free to continue on with the original operation.
> +	intel_wait_init(&wait, req->seqno);
> +	set_current_state(state);
> +	if (intel_engine_add_wait(req->engine, &wait))
> +		/* In order to check that we haven't missed the interrupt
> +		 * as we enabled it, we need to kick ourselves to do a
> +		 * coherent check on the seqno before we sleep.
>   		 */
> -		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
> -			ret = 0;
> -			break;
> -		}
> -
> -		if (i915_gem_request_completed(req, false)) {
> -			ret = 0;
> -			break;
> -		}
> +		goto wakeup;
>
> +	for (;;) {
>   		if (signal_pending_state(state, current)) {
>   			ret = -ERESTARTSYS;
>   			break;
>   		}
>
> -		if (timeout && time_after_eq(jiffies, timeout_expire)) {
> +		/* Ensure that even if the GPU hangs, we get woken up. */
> +		i915_queue_hangcheck(req->i915);
> +
> +		timeout_remain = io_schedule_timeout(timeout_remain);
> +		if (timeout_remain == 0) {
>   			ret = -ETIME;
>   			break;
>   		}
>
> -		/* Ensure that even if the GPU hangs, we get woken up. */
> -		i915_queue_hangcheck(dev_priv);
> -
> -		timer.function = NULL;
> -		if (timeout || missed_irq(dev_priv, engine)) {
> -			unsigned long expire;
> -
> -			setup_timer_on_stack(&timer, fake_irq, (unsigned long)current);
> -			expire = missed_irq(dev_priv, engine) ? jiffies + 1 : timeout_expire;
> -			mod_timer(&timer, expire);
> -		}
> +		if (intel_wait_complete(&wait))
> +			break;
>
> -		io_schedule();
> +wakeup:
> +		set_current_state(state);
>
> -		if (timer.function) {
> -			del_singleshot_timer_sync(&timer);
> -			destroy_timer_on_stack(&timer);
> -		}
> +		/* Carefully check if the request is complete, giving time
> +		 * for the seqno to be visible following the interrupt.
> +		 * We also have to check in case we are kicked by the GPU
> +		 * reset in order to drop the struct_mutex.
> +		 */
> +		if (__i915_request_irq_complete(req))
> +			break;
>   	}
> -	if (!irq_test_in_progress)
> -		engine->irq_put(engine);
>
> -	finish_wait(&engine->irq_queue, &wait);
> -
> -out:
> +	intel_engine_remove_wait(req->engine, &wait);
> +	__set_current_state(TASK_RUNNING);
> +complete:
>   	trace_i915_gem_request_wait_end(req);
>
>   	if (timeout) {
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 34ff2459ceea..89241ffcc676 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -463,6 +463,18 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>   			}
>   		}
>
> +		if (error->ring[i].num_waiters) {
> +			err_printf(m, "%s --- %d waiters\n",
> +				   dev_priv->engine[i].name,
> +				   error->ring[i].num_waiters);
> +			for (j = 0; j < error->ring[i].num_waiters; j++) {
> +				err_printf(m, " seqno 0x%08x for %s [%d]\n",
> +					   error->ring[i].waiters[j].seqno,
> +					   error->ring[i].waiters[j].comm,
> +					   error->ring[i].waiters[j].pid);
> +			}
> +		}
> +
>   		if ((obj = error->ring[i].ringbuffer)) {
>   			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
>   				   dev_priv->engine[i].name,
> @@ -605,8 +617,9 @@ static void i915_error_state_free(struct kref *error_ref)
>   		i915_error_object_free(error->ring[i].ringbuffer);
>   		i915_error_object_free(error->ring[i].hws_page);
>   		i915_error_object_free(error->ring[i].ctx);
> -		kfree(error->ring[i].requests);
>   		i915_error_object_free(error->ring[i].wa_ctx);
> +		kfree(error->ring[i].requests);
> +		kfree(error->ring[i].waiters);
>   	}
>
>   	i915_error_object_free(error->semaphore_obj);
> @@ -892,6 +905,47 @@ static void gen6_record_semaphore_state(struct drm_i915_private *dev_priv,
>   	}
>   }
>
> +static void engine_record_waiters(struct intel_engine_cs *engine,
> +				  struct drm_i915_error_ring *ering)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct drm_i915_error_waiter *waiter;
> +	struct rb_node *rb;
> +	int count;
> +
> +	ering->num_waiters = 0;
> +	ering->waiters = NULL;
> +
> +	spin_lock(&b->lock);
> +	count = 0;
> +	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
> +		count++;
> +	spin_unlock(&b->lock);
> +
> +	waiter = NULL;
> +	if (count)
> +		waiter = kmalloc(count*sizeof(struct drm_i915_error_waiter),
> +				 GFP_ATOMIC);
> +	if (!waiter)
> +		return;
> +
> +	ering->waiters = waiter;
> +
> +	spin_lock(&b->lock);
> +	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb)) {
> +		struct intel_wait *w = container_of(rb, typeof(*w), node);
> +
> +		strcpy(waiter->comm, w->task->comm);
> +		waiter->pid = w->task->pid;
> +		waiter->seqno = w->seqno;
> +		waiter++;
> +
> +		if (++ering->num_waiters == count)
> +			break;
> +	}
> +	spin_unlock(&b->lock);
> +}
> +
>   static void i915_record_ring_state(struct drm_i915_private *dev_priv,
>   				   struct drm_i915_error_state *error,
>   				   struct intel_engine_cs *engine,
> @@ -926,7 +980,7 @@ static void i915_record_ring_state(struct drm_i915_private *dev_priv,
>   		ering->instdone = I915_READ(GEN2_INSTDONE);
>   	}
>
> -	ering->waiting = waitqueue_active(&engine->irq_queue);
> +	ering->waiting = intel_engine_has_waiter(engine);
>   	ering->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
>   	ering->acthd = intel_ring_get_active_head(engine);
>   	ering->seqno = engine->get_seqno(engine);
> @@ -1032,6 +1086,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
>   		error->ring[i].valid = true;
>
>   		i915_record_ring_state(dev_priv, error, engine, &error->ring[i]);
> +		engine_record_waiters(engine, &error->ring[i]);
>
>   		request = i915_gem_find_active_request(engine);
>   		if (request) {
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 27dd45a7291c..c49b8356a80d 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -988,13 +988,10 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
>
>   static void notify_ring(struct intel_engine_cs *engine)
>   {
> -	if (!intel_engine_initialized(engine))
> -		return;
> -
> -	trace_i915_gem_request_notify(engine);
> -	engine->user_interrupts++;
> -
> -	wake_up_all(&engine->irq_queue);
> +	if (intel_engine_wakeup(engine)) {
> +		trace_i915_gem_request_notify(engine);
> +		engine->user_interrupts++;
> +	}
>   }
>
>   static void vlv_c0_read(struct drm_i915_private *dev_priv,
> @@ -1075,7 +1072,7 @@ static bool any_waiters(struct drm_i915_private *dev_priv)
>   	struct intel_engine_cs *engine;
>
>   	for_each_engine(engine, dev_priv)
> -		if (engine->irq_refcount)
> +		if (intel_engine_has_waiter(engine))
>   			return true;
>
>   	return false;
> @@ -2505,8 +2502,6 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
>   static void i915_error_wake_up(struct drm_i915_private *dev_priv,
>   			       bool reset_completed)
>   {
> -	struct intel_engine_cs *engine;
> -
>   	/*
>   	 * Notify all waiters for GPU completion events that reset state has
>   	 * been changed, and that they need to restart their wait after
> @@ -2514,9 +2509,8 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
>   	 * a gpu reset pending so that i915_error_work_func can acquire them).
>   	 */
>
> -	/* Wake up __wait_seqno, potentially holding dev->struct_mutex. */
> -	for_each_engine(engine, dev_priv)
> -		wake_up_all(&engine->irq_queue);
> +	/* Wake up i915_wait_request, potentially holding dev->struct_mutex. */
> +	intel_kick_waiters(dev_priv);
>
>   	/* Wake up intel_crtc_wait_for_pending_flips, holding crtc->mutex. */
>   	wake_up_all(&dev_priv->pending_flip_queue);
> @@ -3098,13 +3092,14 @@ static unsigned kick_waiters(struct intel_engine_cs *engine)
>
>   	if (engine->hangcheck.user_interrupts == user_interrupts &&
>   	    !test_and_set_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
> -		if (!(i915->gpu_error.test_irq_rings & intel_engine_flag(engine)))
> +		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
>   			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
>   				  engine->name);
>   		else
>   			DRM_INFO("Fake missed irq on %s\n",
>   				 engine->name);
> -		wake_up_all(&engine->irq_queue);
> +
> +		intel_engine_enable_fake_irq(engine);
>   	}
>
>   	return user_interrupts;
> @@ -3148,7 +3143,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
>
>   	for_each_engine_id(engine, dev_priv, id) {
> -		bool busy = waitqueue_active(&engine->irq_queue);
> +		bool busy = intel_engine_has_waiter(engine);
>   		u64 acthd;
>   		u32 seqno;
>   		unsigned user_interrupts;
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> new file mode 100644
> index 000000000000..c82ecbf7470a
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -0,0 +1,333 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> + * IN THE SOFTWARE.
> + *
> + */
> +
> +#include "i915_drv.h"
> +
> +static void intel_breadcrumbs_fake_irq(unsigned long data)
> +{
> +	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> +
> +	/*
> +	 * The timer persists in case we cannot enable interrupts,
> +	 * or if we have previously seen seqno/interrupt incoherency
> +	 * ("missed interrupt" syndrome). Here the worker will wake up
> +	 * every jiffie in order to kick the oldest waiter to do the
> +	 * coherent seqno check.
> +	 */
> +	rcu_read_lock();
> +	if (intel_engine_wakeup(engine))
> +		mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
> +	rcu_read_unlock();
> +}
> +
> +static void irq_enable(struct intel_engine_cs *engine)
> +{
> +	WARN_ON(!engine->irq_get(engine));
> +}
> +
> +static void irq_disable(struct intel_engine_cs *engine)
> +{
> +	engine->irq_put(engine);
> +}
> +
> +static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
> +{
> +	struct intel_engine_cs *engine =
> +		container_of(b, struct intel_engine_cs, breadcrumbs);
> +	struct drm_i915_private *i915 = engine->i915;
> +	bool irq_posted = false;
> +
> +	assert_spin_locked(&b->lock);
> +	if (b->rpm_wakelock)
> +		return false;
> +
> +	/* Since we are waiting on a request, the GPU should be busy
> +	 * and should have its own rpm reference. For completeness,
> +	 * record an rpm reference for ourselves to cover the
> +	 * interrupt we unmask.
> +	 */
> +	intel_runtime_pm_get_noresume(i915);
> +	b->rpm_wakelock = true;
> +
> +	/* No interrupts? Kick the waiter every jiffie! */
> +	if (intel_irqs_enabled(i915)) {
> +		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings)) {
> +			irq_enable(engine);
> +			irq_posted = true;
> +		}
> +		b->irq_enabled = true;
> +	}
> +
> +	if (!b->irq_enabled ||
> +	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings))
> +		mod_timer(&b->fake_irq, jiffies + 1);
> +
> +	return irq_posted;
> +}
> +
> +static void __intel_breadcrumbs_disable_irq(struct intel_breadcrumbs *b)
> +{
> +	struct intel_engine_cs *engine =
> +		container_of(b, struct intel_engine_cs, breadcrumbs);
> +
> +	assert_spin_locked(&b->lock);
> +	if (!b->rpm_wakelock)
> +		return;
> +
> +	if (b->irq_enabled) {
> +		irq_disable(engine);
> +		b->irq_enabled = false;
> +	}
> +
> +	intel_runtime_pm_put(engine->i915);
> +	b->rpm_wakelock = false;
> +}
> +
> +static inline struct intel_wait *to_wait(struct rb_node *node)
> +{
> +	return container_of(node, struct intel_wait, node);
> +}
> +
> +static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
> +					      struct intel_wait *wait)
> +{
> +	assert_spin_locked(&b->lock);
> +
> +	/* This request is completed, so remove it from the tree, mark it as
> +	 * complete, and *then* wake up the associated task.
> +	 */
> +	rb_erase(&wait->node, &b->waiters);
> +	RB_CLEAR_NODE(&wait->node);
> +
> +	wake_up_process(wait->task); /* implicit smp_wmb() */
> +}
> +
> +bool intel_engine_add_wait(struct intel_engine_cs *engine,
> +			   struct intel_wait *wait)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct rb_node **p, *parent, *completed;
> +	bool first;
> +	u32 seqno;
> +
> +	spin_lock(&b->lock);
> +
> +	/* Insert the request into the retirement ordered list
> +	 * of waiters by walking the rbtree. If we are the oldest
> +	 * seqno in the tree (the first to be retired), then
> +	 * set ourselves as the bottom-half.
> +	 *
> +	 * As we descend the tree, prune completed branches since we hold the
> +	 * spinlock we know that the first_waiter must be delayed and can
> +	 * reduce some of the sequential wake up latency if we take action
> +	 * ourselves and wake up the completed tasks in parallel. Also, by
> +	 * removing stale elements in the tree, we may be able to reduce the
> +	 * ping-pong between the old bottom-half and ourselves as first-waiter.
> +	 */
> +	first = true;
> +	parent = NULL;
> +	completed = NULL;
> +	seqno = engine->get_seqno(engine);
> +
> +	p = &b->waiters.rb_node;
> +	while (*p) {
> +		parent = *p;
> +		if (wait->seqno == to_wait(parent)->seqno) {
> +			/* We have multiple waiters on the same seqno, select
> +			 * the highest priority task (that with the smallest
> +			 * task->prio) to serve as the bottom-half for this
> +			 * group.
> +			 */
> +			if (wait->task->prio > to_wait(parent)->task->prio) {
> +				p = &parent->rb_right;
> +				first = false;
> +			} else
> +				p = &parent->rb_left;
> +		} else if (i915_seqno_passed(wait->seqno,
> +					     to_wait(parent)->seqno)) {
> +			p = &parent->rb_right;
> +			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
> +				completed = parent;

Hm don't you need the completed handling in the equal seqno case as well?

> +			else
> +				first = false;
> +		} else
> +			p = &parent->rb_left;
> +	}
> +	rb_link_node(&wait->node, parent, p);
> +	rb_insert_color(&wait->node, &b->waiters);
> +
> +	if (completed != NULL) {
> +		struct rb_node *next = rb_next(completed);
> +
> +		if (next && next != &wait->node) {
> +			GEM_BUG_ON(first);
> +			smp_store_mb(b->first_waiter, to_wait(next)->task);
> +			/* As there is a delay between reading the current
> +			 * seqno, processing the completed tasks and selecting
> +			 * the next waiter, we may have missed the interrupt
> +			 * and so need for the next bottom-half to wakeup.
> +			 *
> +			 * Also as we enable the IRQ, we may miss the
> +			 * interrupt for that seqno, so we have to wake up
> +			 * the next bottom-half in order to do a coherent check
> +			 * in case the seqno passed.
> +			 */
> +			__intel_breadcrumbs_enable_irq(b);
> +			wake_up_process(to_wait(next)->task);
> +		}
> +
> +		do {
> +			struct intel_wait *crumb = to_wait(completed);
> +			completed = rb_prev(completed);
> +			__intel_breadcrumbs_finish(b, crumb);
> +		} while (completed != NULL);
> +	}
> +
> +	if (first) {
> +		smp_store_mb(b->first_waiter, wait->task);
> +		first =__intel_breadcrumbs_enable_irq(b);
> +	}
> +	GEM_BUG_ON(b->first_waiter == NULL);
> +
> +	spin_unlock(&b->lock);
> +
> +	return first;
> +}
> +
> +void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
> +{
> +	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
> +}
> +
> +static inline bool chain_wakeup(struct rb_node *rb, int priority)
> +{
> +	return rb && to_wait(rb)->task->prio <= priority;
> +}
> +
> +void intel_engine_remove_wait(struct intel_engine_cs *engine,
> +			      struct intel_wait *wait)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +
> +	/* Quick check to see if this waiter was already decoupled from
> +	 * the tree by the bottom-half to avoid contention on the spinlock
> +	 * by the herd.
> +	 */
> +	if (RB_EMPTY_NODE(&wait->node))
> +		return;
> +
> +	spin_lock(&b->lock);
> +
> +	if (b->first_waiter == wait->task) {
> +		struct rb_node *next;
> +		struct task_struct *task;
> +		const int priority = wait->task->prio;
> +
> +		/* We are the current bottom-half. Find the next candidate,
> +		 * the first waiter in the queue on the remaining oldest
> +		 * request. As multiple seqnos may complete in the time it
> +		 * takes us to wake up and find the next waiter, we have to
> +		 * wake up that waiter for it to perform its own coherent
> +		 * completion check.
> +		 */
> +		next = rb_next(&wait->node);
> +		if (chain_wakeup(next, priority)) {

Don't get this, the next waiter may be a different seqno so how is the 
priority check relevant?

Also, how can the next node be smaller priority anyway, if equal seqno 
it has to be equal or greater I thought?

Then since add_waiter will wake up one extra waiter to handle the missed 
irq case, here you may skip checking them based simply on task priority 
which seems wrong.

> +			/* If the next waiter is already complete,
> +			 * wake it up and continue onto the next waiter. So
> +			 * if have a small herd, they will wake up in parallel
> +			 * rather than sequentially, which should reduce
> +			 * the overall latency in waking all the completed
> +			 * clients.
> +			 *
> +			 * However, waking up a chain adds extra latency to
> +			 * the first_waiter. This is undesirable if that
> +			 * waiter is a high priority task.
> +			 */
> +			u32 seqno = engine->get_seqno(engine);
> +			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
> +				struct rb_node *n = rb_next(next);
> +				__intel_breadcrumbs_finish(b, to_wait(next));
> +				next = n;
> +				if (!chain_wakeup(next, priority))
> +					break;
> +			}
> +		}
> +		task = next ? to_wait(next)->task : NULL;
> +
> +		smp_store_mb(b->first_waiter, task);
> +		if (task) {
> +			/* In our haste, we may have completed the first waiter
> +			 * before we enabled the interrupt. Do so now as we
> +			 * have a second waiter for a future seqno. Afterwards,
> +			 * we have to wake up that waiter in case we missed
> +			 * the interrupt, or if we have to handle an
> +			 * exception rather than a seqno completion.
> +			 */
> +			if (to_wait(next)->seqno != wait->seqno)
> +				__intel_breadcrumbs_enable_irq(b);
> +			wake_up_process(task);
> +		} else
> +			__intel_breadcrumbs_disable_irq(b);
> +	}
> +
> +	if (!RB_EMPTY_NODE(&wait->node))
> +		rb_erase(&wait->node, &b->waiters);
> +	spin_unlock(&b->lock);
> +}
> +
> +void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +
> +	spin_lock_init(&b->lock);
> +	setup_timer(&b->fake_irq,
> +		    intel_breadcrumbs_fake_irq,
> +		    (unsigned long)engine);
> +}
> +
> +void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +
> +	del_timer_sync(&b->fake_irq);
> +}
> +
> +unsigned intel_kick_waiters(struct drm_i915_private *i915)
> +{
> +	struct intel_engine_cs *engine;
> +	unsigned mask = 0;
> +
> +	/* To avoid the task_struct disappearing beneath us as we wake up
> +	 * the process, we must first inspect the task_struct->state under the
> +	 * RCU lock, i.e. as we call wake_up_process() we must be holding the
> +	 * rcu_read_lock().
> +	 */
> +	rcu_read_lock();
> +	for_each_engine(engine, i915)
> +		if (unlikely(intel_engine_wakeup(engine)))
> +			mask |= intel_engine_flag(engine);
> +	rcu_read_unlock();
> +
> +	return mask;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index db10c961e0f4..34c6c3fd6296 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1894,6 +1894,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
>   	i915_cmd_parser_fini_ring(engine);
>   	i915_gem_batch_pool_fini(&engine->batch_pool);
>
> +	intel_engine_fini_breadcrumbs(engine);
> +
>   	if (engine->status_page.obj) {
>   		i915_gem_object_unpin_map(engine->status_page.obj);
>   		engine->status_page.obj = NULL;
> @@ -1931,7 +1933,7 @@ logical_ring_default_irqs(struct intel_engine_cs *engine, unsigned shift)
>   {
>   	engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT << shift;
>   	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
> -	init_waitqueue_head(&engine->irq_queue);
> +	intel_engine_init_breadcrumbs(engine);
>   }
>
>   static int
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 8d35a3978f9b..9bd3dd756d3d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2258,7 +2258,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>   	memset(engine->semaphore.sync_seqno, 0,
>   	       sizeof(engine->semaphore.sync_seqno));
>
> -	init_waitqueue_head(&engine->irq_queue);
> +	intel_engine_init_breadcrumbs(engine);
>
>   	ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
>   	if (IS_ERR(ringbuf)) {
> @@ -2327,6 +2327,7 @@ void intel_cleanup_engine(struct intel_engine_cs *engine)
>
>   	i915_cmd_parser_fini_ring(engine);
>   	i915_gem_batch_pool_fini(&engine->batch_pool);
> +	intel_engine_fini_breadcrumbs(engine);
>   	engine->i915 = NULL;
>   }
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 929e7b4af2a4..28597ea8fbf9 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -160,6 +160,31 @@ struct intel_engine_cs {
>   	struct intel_ringbuffer *buffer;
>   	struct list_head buffers;
>
> +	/* Rather than have every client wait upon all user interrupts,
> +	 * with the herd waking after every interrupt and each doing the
> +	 * heavyweight seqno dance, we delegate the task (of being the
> +	 * bottom-half of the user interrupt) to the first client. After
> +	 * every interrupt, we wake up one client, who does the heavyweight
> +	 * coherent seqno read and either goes back to sleep (if incomplete),
> +	 * or wakes up all the completed clients in parallel, before then
> +	 * transferring the bottom-half status to the next client in the queue.
> +	 *
> +	 * Compared to walking the entire list of waiters in a single dedicated
> +	 * bottom-half, we reduce the latency of the first waiter by avoiding
> +	 * a context switch, but incur additional coherent seqno reads when
> +	 * following the chain of request breadcrumbs. Since it is most likely
> +	 * that we have a single client waiting on each seqno, then reducing
> +	 * the overhead of waking that client is much preferred.
> +	 */
> +	struct intel_breadcrumbs {
> +		spinlock_t lock; /* protects the lists of requests */
> +		struct rb_root waiters; /* sorted by retirement, priority */
> +		struct task_struct *first_waiter; /* bh for user interrupts */
> +		struct timer_list fake_irq; /* used after a missed interrupt */
> +		bool irq_enabled;
> +		bool rpm_wakelock;
> +	} breadcrumbs;
> +
>   	/*
>   	 * A pool of objects to use as shadow copies of client batch buffers
>   	 * when the command parser is enabled. Prevents the client from
> @@ -308,8 +333,6 @@ struct intel_engine_cs {
>
>   	bool gpu_caches_dirty;
>
> -	wait_queue_head_t irq_queue;
> -
>   	struct intel_context *last_context;
>
>   	struct intel_ring_hangcheck hangcheck;
> @@ -495,4 +518,44 @@ static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
>   	return engine->status_page.gfx_addr + I915_GEM_HWS_INDEX_ADDR;
>   }
>
> +/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
> +struct intel_wait {
> +	struct rb_node node;
> +	struct task_struct *task;
> +	u32 seqno;
> +};
> +void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
> +static inline void intel_wait_init(struct intel_wait *wait, u32 seqno)
> +{
> +	wait->task = current;
> +	wait->seqno = seqno;
> +}
> +static inline bool intel_wait_complete(const struct intel_wait *wait)
> +{
> +	return RB_EMPTY_NODE(&wait->node);
> +}
> +bool intel_engine_add_wait(struct intel_engine_cs *engine,
> +			   struct intel_wait *wait);
> +void intel_engine_remove_wait(struct intel_engine_cs *engine,
> +			      struct intel_wait *wait);
> +static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
> +{
> +	return READ_ONCE(engine->breadcrumbs.first_waiter);
> +}
> +static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
> +{
> +	bool wakeup = false;
> +	struct task_struct *task = READ_ONCE(engine->breadcrumbs.first_waiter);
> +	/* Note that for this not to dangerously chase a dangling pointer,
> +	 * the caller is responsible for ensure that the task remain valid for
> +	 * wake_up_process() i.e. that the RCU grace period cannot expire.
> +	 */

This gets called from hard irq context and I did not manage to figure 
out what makes it safe in the remove waiter path? Why couldn't the hard 
irq sample NULL here?

> +	if (task)
> +		wakeup = wake_up_process(task);
> +	return wakeup;
> +}
> +void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
> +void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
> +unsigned intel_kick_waiters(struct drm_i915_private *i915);
> +
>   #endif /* _INTEL_RINGBUFFER_H_ */
>

Otherwise still looks like a good idea! :)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 04/20] drm/i915: Remove the dedicated hangcheck workqueue
  2016-05-19 13:13     ` Chris Wilson
@ 2016-05-20 12:07       ` Tvrtko Ursulin
  2016-05-20 12:23         ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2016-05-20 12:07 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/05/16 14:13, Chris Wilson wrote:
> On Thu, May 19, 2016 at 01:50:51PM +0100, Tvrtko Ursulin wrote:
>>
>> On 19/05/16 12:32, Chris Wilson wrote:
>>> The queue only ever contains at most one item and has no special flags.
>>> It is just a very simple wrapper around the system-wq - a complication
>>> with no benefits.
>>
>> How much time do we take in the reset case - is it acceptable to do
>> that work from the system wq?
>
> Hangcheck is a handful of register reads and some pointer chasing per
> engine. (There is a seqno_barrier in there which may be reasonably
> expensive but not a cpu hog). The error capture is run from the
> hangcheck context - and that is no small task (especially if we ever
> apply the object compression patches), but for safety we need to call
> stop_machine() so it really doesn't matter at that point.

I don't see a stop_machine? So until there is one, using the system wq 
is a bit impolite in the error capture state, agreed?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 06/20] drm/i915: Slaughter the thundering i915_wait_request herd
  2016-05-20 12:04   ` Tvrtko Ursulin
@ 2016-05-20 12:19     ` Chris Wilson
  2016-05-23  8:53       ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2016-05-20 12:19 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, May 20, 2016 at 01:04:13PM +0100, Tvrtko Ursulin wrote:
> >+	p = &b->waiters.rb_node;
> >+	while (*p) {
> >+		parent = *p;
> >+		if (wait->seqno == to_wait(parent)->seqno) {
> >+			/* We have multiple waiters on the same seqno, select
> >+			 * the highest priority task (that with the smallest
> >+			 * task->prio) to serve as the bottom-half for this
> >+			 * group.
> >+			 */
> >+			if (wait->task->prio > to_wait(parent)->task->prio) {
> >+				p = &parent->rb_right;
> >+				first = false;
> >+			} else
> >+				p = &parent->rb_left;
> >+		} else if (i915_seqno_passed(wait->seqno,
> >+					     to_wait(parent)->seqno)) {
> >+			p = &parent->rb_right;
> >+			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
> >+				completed = parent;
> 
> Hm don't you need the completed handling in the equal seqno case as well?

i915_seqno_passed() returns true if seqnoA == seqnoB
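
For reference, i915_seqno_passed() is just the usual wrap-safe seqno
comparison, roughly:

	static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
	{
		/* true for seq1 == seq2 as well as seq1 being ahead, modulo wrap */
		return (s32)(seq1 - seq2) >= 0;
	}

so equality counts as "passed" and the completed handling above covers it.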

> >+void intel_engine_remove_wait(struct intel_engine_cs *engine,
> >+			      struct intel_wait *wait)
> >+{
> >+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >+
> >+	/* Quick check to see if this waiter was already decoupled from
> >+	 * the tree by the bottom-half to avoid contention on the spinlock
> >+	 * by the herd.
> >+	 */
> >+	if (RB_EMPTY_NODE(&wait->node))
> >+		return;
> >+
> >+	spin_lock(&b->lock);
> >+
> >+	if (b->first_waiter == wait->task) {
> >+		struct rb_node *next;
> >+		struct task_struct *task;
> >+		const int priority = wait->task->prio;
> >+
> >+		/* We are the current bottom-half. Find the next candidate,
> >+		 * the first waiter in the queue on the remaining oldest
> >+		 * request. As multiple seqnos may complete in the time it
> >+		 * takes us to wake up and find the next waiter, we have to
> >+		 * wake up that waiter for it to perform its own coherent
> >+		 * completion check.
> >+		 */
> >+		next = rb_next(&wait->node);
> >+		if (chain_wakeup(next, priority)) {
> 
> Don't get this, next waiter may be for a different seqno so how is the
> priority check relevant?

We only want to call try_to_wake_up() on processes at least as important
as us.

> Also, how can the next node be smaller priority anyway, if equal
> seqno it has to be equal or greater, I thought?

Next seqno can have a smaller priority (and be complete). chain_wakeup()
just asks if the next waiter is as important as ourselves (the next
waiter may be for a different seqno).
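
In other words chain_wakeup() is little more than a priority filter on the
next rbtree node; a minimal sketch of what is being described (inferred
from the discussion, not quoted from the patch):

	static inline bool chain_wakeup(struct rb_node *rb, int priority)
	{
		/* Only wake the next waiter if it is at least as important
		 * (i.e. has an equal or smaller task->prio) as the waiter
		 * that is leaving now.
		 */
		return rb && to_wait(rb)->task->prio <= priority;
	}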
 
> Then since add_waiter will wake up one extra waiter to handle the
> missed irq case, here you may skip checking them based simply on
> task priority which seems wrong.

Correct. They will be handled by the next waiter in the queue, not us.
Our job is to wake as many completed (+ the potentially completed bh) as
possible without incurring undue delay for ourselves. All completed waiters
will be woken in turn as the next bh to run will look at the list and
wake up those at the same priority as itself (+ the next potentially
completed bh).

> >+static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
> >+{
> >+	bool wakeup = false;
> >+	struct task_struct *task = READ_ONCE(engine->breadcrumbs.first_waiter);
> >+	/* Note that for this not to dangerously chase a dangling pointer,
> >+	 * the caller is responsible for ensure that the task remain valid for
> >+	 * wake_up_process() i.e. that the RCU grace period cannot expire.
> >+	 */
> 
> This gets called from hard irq context and I did not manage to
> figure out what makes it safe in the remove waiter path. Why couldn't
> the hard irq sample a stale pointer here?

Because we only need RCU access to task, which is provided by the hard
irq context.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 16/20] drm/i915: Only query timestamp when measuring elapsed time
  2016-05-19 15:44   ` Tvrtko Ursulin
@ 2016-05-20 12:20     ` Chris Wilson
  2016-05-23  8:54       ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2016-05-20 12:20 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, May 19, 2016 at 04:44:03PM +0100, Tvrtko Ursulin wrote:
> 
> On 19/05/16 12:32, Chris Wilson wrote:
> >Avoid the two calls to ktime_get_raw_ns() (at best it reads the TSC) as
> >we only need to compute the elapsed time for a timed wait.
> >
> >v2: Eliminate the unused local variable reducing the function size by 64
> >bytes (using the storage space on the callers stack rather than adding
> >to our stack frame)
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> >  drivers/gpu/drm/i915/i915_gem.c | 14 +++++---------
> >  1 file changed, 5 insertions(+), 9 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index b48a3b46e86f..2c254cf49c15 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -1215,7 +1215,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> >  	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> >  	struct intel_wait wait;
> >  	unsigned long timeout_remain;
> >-	s64 before = 0; /* Only to silence a compiler warning. */
> >  	int ret = 0;
> >
> >  	might_sleep();
> >@@ -1234,12 +1233,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> >  		if (*timeout == 0)
> >  			return -ETIME;
> >
> >+		/* Record current time in case interrupted, or wedged */
> >  		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
> >-
> >-		/*
> >-		 * Record current time in case interrupted by signal, or wedged.
> >-		 */
> >-		before = ktime_get_raw_ns();
> >+		*timeout += ktime_get_raw_ns();
> >  	}
> >
> >  	trace_i915_gem_request_wait_begin(req);
> >@@ -1296,9 +1292,9 @@ complete:
> >  	trace_i915_gem_request_wait_end(req);
> >
> >  	if (timeout) {
> >-		s64 tres = *timeout - (ktime_get_raw_ns() - before);
> >-
> >-		*timeout = tres < 0 ? 0 : tres;
> >+		*timeout -= ktime_get_raw_ns();
> >+		if (*timeout < 0)
> >+			*timeout = 0;
> >
> >  		/*
> >  		 * Apparently ktime isn't accurate enough and occasionally has a
> >
> 
> I think this is bad, better have a local than play games with
> caller's storage.

It's smaller faster code :-)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 04/20] drm/i915: Remove the dedicated hangcheck workqueue
  2016-05-20 12:07       ` Tvrtko Ursulin
@ 2016-05-20 12:23         ` Chris Wilson
  2016-05-23  8:55           ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2016-05-20 12:23 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, May 20, 2016 at 01:07:29PM +0100, Tvrtko Ursulin wrote:
> 
> On 19/05/16 14:13, Chris Wilson wrote:
> >On Thu, May 19, 2016 at 01:50:51PM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 19/05/16 12:32, Chris Wilson wrote:
> >>>The queue only ever contains at most one item and has no special flags.
> >>>It is just a very simple wrapper around the system-wq - a complication
> >>>with no benefits.
> >>
> >>How much time do we take in the reset case - is it acceptable to do
> >>that work from the system wq?
> >
> >Hangcheck is a handful of register reads and some pointer chasing per
> >engine. (There is a seqno_barrier in there which may be reasonably
> >expensive but not a cpu hog). The error capture is run from the
> >hangcheck context - and that is no small task (especially if we ever
> >apply the object compression patches), but for safety we need to call
> >stop_machine() so it really doesn't matter at that point.
> 
> I don't see a stop_machine? So until there is one, using the system
> wq is a bit impolite in the error capture state, agreed?

It's in the queue to fix the odd oops we get during capture.

You'd be happy with system_long_wq in the meantime?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 06/20] drm/i915: Slaughter the thundering i915_wait_request herd
  2016-05-20 12:19     ` Chris Wilson
@ 2016-05-23  8:53       ` Tvrtko Ursulin
  2016-06-06 10:14         ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2016-05-23  8:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 20/05/16 13:19, Chris Wilson wrote:
> On Fri, May 20, 2016 at 01:04:13PM +0100, Tvrtko Ursulin wrote:
>>> +	p = &b->waiters.rb_node;
>>> +	while (*p) {
>>> +		parent = *p;
>>> +		if (wait->seqno == to_wait(parent)->seqno) {
>>> +			/* We have multiple waiters on the same seqno, select
>>> +			 * the highest priority task (that with the smallest
>>> +			 * task->prio) to serve as the bottom-half for this
>>> +			 * group.
>>> +			 */
>>> +			if (wait->task->prio > to_wait(parent)->task->prio) {
>>> +				p = &parent->rb_right;
>>> +				first = false;
>>> +			} else
>>> +				p = &parent->rb_left;
>>> +		} else if (i915_seqno_passed(wait->seqno,
>>> +					     to_wait(parent)->seqno)) {
>>> +			p = &parent->rb_right;
>>> +			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
>>> +				completed = parent;
>>
>> Hm don't you need the completed handling in the equal seqno case as well?
>
> i915_seqno_passed() returns true if seqnoA == seqnoB

I meant in the first branch, multiple waiters on the same seqno?

>>> +void intel_engine_remove_wait(struct intel_engine_cs *engine,
>>> +			      struct intel_wait *wait)
>>> +{
>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>> +
>>> +	/* Quick check to see if this waiter was already decoupled from
>>> +	 * the tree by the bottom-half to avoid contention on the spinlock
>>> +	 * by the herd.
>>> +	 */
>>> +	if (RB_EMPTY_NODE(&wait->node))
>>> +		return;
>>> +
>>> +	spin_lock(&b->lock);
>>> +
>>> +	if (b->first_waiter == wait->task) {
>>> +		struct rb_node *next;
>>> +		struct task_struct *task;
>>> +		const int priority = wait->task->prio;
>>> +
>>> +		/* We are the current bottom-half. Find the next candidate,
>>> +		 * the first waiter in the queue on the remaining oldest
>>> +		 * request. As multiple seqnos may complete in the time it
>>> +		 * takes us to wake up and find the next waiter, we have to
>>> +		 * wake up that waiter for it to perform its own coherent
>>> +		 * completion check.
>>> +		 */
>>> +		next = rb_next(&wait->node);
>>> +		if (chain_wakeup(next, priority)) {
>>
>> Don't get this, next waiter may be for a different seqno so how is the
>> priority check relevant?
>
> We only want to call try_to_wake_up() on processes at least as important
> as us.

But what is the comment saying about having to wake up the following 
waiter, because of multiple seqnos potentially completing? It says that 
and then it may not wake up anyone depending on relative priorities.

>
>> Also, how can the next node be smaller priority anyway, if equal
>> seqno it has to be equal or greater, I thought?
>
> Next seqno can have a smaller priority (and be complete). chain_wakeup()
> just asks if the next waiter is as important as ourselves (the next
> waiter may be for a different seqno).
>
>> Then since add_waiter will wake up one extra waiter to handle the
>> missed irq case, here you may skip checking them based simply on
>> task priority which seems wrong.
>
> Correct. They will be handled by the next waiter in the queue, not us.
> Our job is to wake as many completed (+ the potentially completed bh) as
> possible without incurring undue delay for ourselves. All completed waiters
> will be woken in turn as the next bh to run will look at the list and
> wake up those at the same priority as itself (+ the next potentially
> completed bh).

Who will wake up the next waiter? add_waiter will wake up one, and then 
the next waiter here (in remove_waiter) may not wake up any further 
based on priority.

Is the assumption that it is only possible to miss one interrupt?

>>> +static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
>>> +{
>>> +	bool wakeup = false;
>>> +	struct task_struct *task = READ_ONCE(engine->breadcrumbs.first_waiter);
>>> +	/* Note that for this not to dangerously chase a dangling pointer,
>>> +	 * the caller is responsible for ensure that the task remain valid for
>>> +	 * wake_up_process() i.e. that the RCU grace period cannot expire.
>>> +	 */
>>
>> This gets called from hard irq context and I did not manage to
>> figure out what makes it safe in the remove waiter path. Why couldn't
>> the hard irq sample a stale pointer here?
>
> Because we only need RCU access to task, which is provided by the hard
> irq context.

What does the comment mean by saying callers are responsible for the RCU 
period? From which point to which? I still can't tie whatever callers 
might be doing with the unrelated hardirq.

Regards,

Tvrtko



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 16/20] drm/i915: Only query timestamp when measuring elapsed time
  2016-05-20 12:20     ` Chris Wilson
@ 2016-05-23  8:54       ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2016-05-23  8:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 20/05/16 13:20, Chris Wilson wrote:
> On Thu, May 19, 2016 at 04:44:03PM +0100, Tvrtko Ursulin wrote:
>>
>> On 19/05/16 12:32, Chris Wilson wrote:
>>> Avoid the two calls to ktime_get_raw_ns() (at best it reads the TSC) as
>>> we only need to compute the elapsed time for a timed wait.
>>>
>>> v2: Eliminate the unused local variable reducing the function size by 64
>>> bytes (using the storage space on the callers stack rather than adding
>>> to our stack frame)
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem.c | 14 +++++---------
>>>   1 file changed, 5 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index b48a3b46e86f..2c254cf49c15 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -1215,7 +1215,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>   	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>>>   	struct intel_wait wait;
>>>   	unsigned long timeout_remain;
>>> -	s64 before = 0; /* Only to silence a compiler warning. */
>>>   	int ret = 0;
>>>
>>>   	might_sleep();
>>> @@ -1234,12 +1233,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>   		if (*timeout == 0)
>>>   			return -ETIME;
>>>
>>> +		/* Record current time in case interrupted, or wedged */
>>>   		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
>>> -
>>> -		/*
>>> -		 * Record current time in case interrupted by signal, or wedged.
>>> -		 */
>>> -		before = ktime_get_raw_ns();
>>> +		*timeout += ktime_get_raw_ns();
>>>   	}
>>>
>>>   	trace_i915_gem_request_wait_begin(req);
>>> @@ -1296,9 +1292,9 @@ complete:
>>>   	trace_i915_gem_request_wait_end(req);
>>>
>>>   	if (timeout) {
>>> -		s64 tres = *timeout - (ktime_get_raw_ns() - before);
>>> -
>>> -		*timeout = tres < 0 ? 0 : tres;
>>> +		*timeout -= ktime_get_raw_ns();
>>> +		if (*timeout < 0)
>>> +			*timeout = 0;
>>>
>>>   		/*
>>>   		 * Apparently ktime isn't accurate enough and occasionally has a
>>>
>>
>> I think this is bad, better have a local than play games with
>> caller's storage.
>
> It's smaller faster code :-)

You know I am normally all for that but this one I cannot approve.

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 04/20] drm/i915: Remove the dedicated hangcheck workqueue
  2016-05-20 12:23         ` Chris Wilson
@ 2016-05-23  8:55           ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2016-05-23  8:55 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 20/05/16 13:23, Chris Wilson wrote:
> On Fri, May 20, 2016 at 01:07:29PM +0100, Tvrtko Ursulin wrote:
>>
>> On 19/05/16 14:13, Chris Wilson wrote:
>>> On Thu, May 19, 2016 at 01:50:51PM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 19/05/16 12:32, Chris Wilson wrote:
>>>>> The queue only ever contains at most one item and has no special flags.
>>>>> It is just a very simple wrapper around the system-wq - a complication
>>>>> with no benefits.
>>>>
>>>> How much time do we take in the reset case - is it acceptable to do
>>>> that work from the system wq?
>>>
>>> Hangcheck is a handful of register reads and some pointer chasing per
>>> engine. (There is a seqno_barrier in there which may be reasonably
>>> expensive but not a cpu hog). The error capture is run from the
>>> hangcheck context - and that is no small task (especially if we ever
>>> apply the object compression patches), but for safety we need to call
>>> stop_machine() so it really doesn't matter at that point.
>>
>> I don't see a stop_machine? So until there is one, using the system
>> wq is a bit impolite in the error capture state, agreed?
>
> It's in the queue to fix the odd oops we get during capture.
>
> You'd be happy with system_long_wq in the meantime?

Having read the comment in include/linux/workqueue.h I think that would 
be more appropriate in general, so yes.
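
Concretely, the interim change would presumably amount to retargeting the
queueing call, along these lines (a sketch only; the field and macro names
are assumptions based on the driver of that era):

	/* Queue hangcheck on the long-running system workqueue so that a
	 * slow error capture does not hold up ordinary system_wq work.
	 */
	queue_delayed_work(system_long_wq,
			   &dev_priv->gpu_error.hangcheck_work,
			   round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES));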

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 06/20] drm/i915: Slaughter the thundering i915_wait_request herd
  2016-05-23  8:53       ` Tvrtko Ursulin
@ 2016-06-06 10:14         ` Chris Wilson
  2016-06-06 11:04           ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2016-06-06 10:14 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Mon, May 23, 2016 at 09:53:40AM +0100, Tvrtko Ursulin wrote:
> 
> On 20/05/16 13:19, Chris Wilson wrote:
> >On Fri, May 20, 2016 at 01:04:13PM +0100, Tvrtko Ursulin wrote:
> >>>+	p = &b->waiters.rb_node;
> >>>+	while (*p) {
> >>>+		parent = *p;
> >>>+		if (wait->seqno == to_wait(parent)->seqno) {
> >>>+			/* We have multiple waiters on the same seqno, select
> >>>+			 * the highest priority task (that with the smallest
> >>>+			 * task->prio) to serve as the bottom-half for this
> >>>+			 * group.
> >>>+			 */
> >>>+			if (wait->task->prio > to_wait(parent)->task->prio) {
> >>>+				p = &parent->rb_right;
> >>>+				first = false;
> >>>+			} else
> >>>+				p = &parent->rb_left;
> >>>+		} else if (i915_seqno_passed(wait->seqno,
> >>>+					     to_wait(parent)->seqno)) {
> >>>+			p = &parent->rb_right;
> >>>+			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
> >>>+				completed = parent;
> >>
> >>Hm don't you need the completed handling in the equal seqno case as well?
> >
> >i915_seqno_passed() returns true if seqnoA == seqnoB
> 
> I meant in the first branch, multiple waiters on the same seqno?

It could be applied there as well. It is just a bit hairier to keep the
rbtree and call semantics intact. Hmm, for ourselves it is probably
better to return early and let the pending wakeup handle the rbtree.

> >>>+void intel_engine_remove_wait(struct intel_engine_cs *engine,
> >>>+			      struct intel_wait *wait)
> >>>+{
> >>>+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >>>+
> >>>+	/* Quick check to see if this waiter was already decoupled from
> >>>+	 * the tree by the bottom-half to avoid contention on the spinlock
> >>>+	 * by the herd.
> >>>+	 */
> >>>+	if (RB_EMPTY_NODE(&wait->node))
> >>>+		return;
> >>>+
> >>>+	spin_lock(&b->lock);
> >>>+
> >>>+	if (b->first_waiter == wait->task) {
> >>>+		struct rb_node *next;
> >>>+		struct task_struct *task;
> >>>+		const int priority = wait->task->prio;
> >>>+
> >>>+		/* We are the current bottom-half. Find the next candidate,
> >>>+		 * the first waiter in the queue on the remaining oldest
> >>>+		 * request. As multiple seqnos may complete in the time it
> >>>+		 * takes us to wake up and find the next waiter, we have to
> >>>+		 * wake up that waiter for it to perform its own coherent
> >>>+		 * completion check.
> >>>+		 */
> >>>+		next = rb_next(&wait->node);
> >>>+		if (chain_wakeup(next, priority)) {
> >>
> >>Don't get this, next waiter may be for a different seqno so how is the
> >>priority check relevant?
> >
> >We only want to call try_to_wake_up() on processes at least as important
> >as us.
> 
> But what is the comment saying about having to wake up the following
> waiter, because of multiple seqnos potentially completing? It says
> that and then it may not wake up anyone depending on relative
> priorities.

It does, it always wakes the next bottom half.

> >>Also, how can the next node be smaller priority anyway, if equal
> >>seqno it has to be equal or greater, I thought?
> >
> >Next seqno can have a smaller priority (and be complete). chain_wakeup()
> >just asks if the next waiter is as important as ourselves (the next
> >waiter may be for a different seqno).
> >
> >>Then since add_waiter will wake up one extra waiter to handle the
> >>missed irq case, here you may skip checking them based simply on
> >>task priority which seems wrong.
> >
> >Correct. They will be handled by the next waiter in the queue, not us.
> >Our job is to wake as many completed (+ the potentially completed bh) as
> >possible without incurring undue delay for ourselves. All completed waiters
> >will be woken in turn as the next bh to run will look at the list and
> >wake up those at the same priority as itself (+ the next potentially
> >completed bh).
> 
> Who will wake up the next waiter? add_waiter will wake up one, and
> then the next waiter here (in remove_waiter) may not wake up any
> further based on priority.

We do. We wake up all waiters that are complete at the same priority or
better than us, plus the next waiter to take over as the bottom half.
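
A simplified sketch of that hand-over, written from the description above
(tree bookkeeping and locking omitted, so not the exact patch code):

	next = rb_next(&wait->node);
	if (chain_wakeup(next, priority)) {
		u32 seqno = engine->get_seqno(engine); /* assumed accessor */

		/* Wake every already-completed waiter that is at least as
		 * important as the departing bottom-half.
		 */
		while (next && i915_seqno_passed(seqno, to_wait(next)->seqno)) {
			struct rb_node *n = rb_next(next);

			wake_up_process(to_wait(next)->task);
			next = n;
			if (!chain_wakeup(next, priority))
				break;
		}
	}
	if (next)
		wake_up_process(to_wait(next)->task); /* new bottom-half */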

> Is the assumption that it is only possible to miss one interrupt?
> 
> >>>+static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
> >>>+{
> >>>+	bool wakeup = false;
> >>>+	struct task_struct *task = READ_ONCE(engine->breadcrumbs.first_waiter);
> >>>+	/* Note that for this not to dangerously chase a dangling pointer,
> >>>+	 * the caller is responsible for ensure that the task remain valid for
> >>>+	 * wake_up_process() i.e. that the RCU grace period cannot expire.
> >>>+	 */
> >>
> >>This gets called from hard irq context and I did not manage to
> >>figure out what makes it safe in the remove waiter path. Why couldn't
> >>the hard irq sample a stale pointer here?
> >
> >Because we only need RCU access to task, which is provided by the hard
> >irq context.
> 
> What does the comment mean by saying callers are responsible for the
> RCU period? From which point to which? I still can't tie whatever
> callers might be doing with the unrelated hardirq.

Ensuring that the preempt count is raised to prevent the RCU grace
period from expiring between the read of task and the *task, i.e.
rcu_read_lock() or other means.
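
As a sketch of that caller-side contract (the pattern under discussion, not
code from the series): the read of first_waiter and the dereference of the
task must sit inside one RCU read-side critical section, which a hard irq
handler gets implicitly:

	rcu_read_lock();		/* implicit in hard irq context */
	intel_engine_wakeup(engine);	/* READ_ONCE(first_waiter) + wake_up_process() */
	rcu_read_unlock();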
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [CI 06/20] drm/i915: Slaughter the thundering i915_wait_request herd
  2016-06-06 10:14         ` Chris Wilson
@ 2016-06-06 11:04           ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2016-06-06 11:04 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/06/16 11:14, Chris Wilson wrote:
> On Mon, May 23, 2016 at 09:53:40AM +0100, Tvrtko Ursulin wrote:
>>
>> On 20/05/16 13:19, Chris Wilson wrote:
>>> On Fri, May 20, 2016 at 01:04:13PM +0100, Tvrtko Ursulin wrote:
>>>>> +	p = &b->waiters.rb_node;
>>>>> +	while (*p) {
>>>>> +		parent = *p;
>>>>> +		if (wait->seqno == to_wait(parent)->seqno) {
>>>>> +			/* We have multiple waiters on the same seqno, select
>>>>> +			 * the highest priority task (that with the smallest
>>>>> +			 * task->prio) to serve as the bottom-half for this
>>>>> +			 * group.
>>>>> +			 */
>>>>> +			if (wait->task->prio > to_wait(parent)->task->prio) {
>>>>> +				p = &parent->rb_right;
>>>>> +				first = false;
>>>>> +			} else
>>>>> +				p = &parent->rb_left;
>>>>> +		} else if (i915_seqno_passed(wait->seqno,
>>>>> +					     to_wait(parent)->seqno)) {
>>>>> +			p = &parent->rb_right;
>>>>> +			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
>>>>> +				completed = parent;
>>>>
>>>> Hm don't you need the completed handling in the equal seqno case as well?
>>>
>>> i915_seqno_passed() returns true if seqnoA == seqnoB
>>
>> I meant in the first branch, multiple waiters on the same seqno?
>
> It could be applied there as well. It is just a bit hairier to keep the
> rbtree and call semantics intact. Hmm, for ourselves it is probably
> better to return early and let the pending wakeup handle the rbtree.

Did not understand this, especially the last sentence. But that's fine, I
forgot what I meant there anyway after two weeks. :)

>>>>> +void intel_engine_remove_wait(struct intel_engine_cs *engine,
>>>>> +			      struct intel_wait *wait)
>>>>> +{
>>>>> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>>>> +
>>>>> +	/* Quick check to see if this waiter was already decoupled from
>>>>> +	 * the tree by the bottom-half to avoid contention on the spinlock
>>>>> +	 * by the herd.
>>>>> +	 */
>>>>> +	if (RB_EMPTY_NODE(&wait->node))
>>>>> +		return;
>>>>> +
>>>>> +	spin_lock(&b->lock);
>>>>> +
>>>>> +	if (b->first_waiter == wait->task) {
>>>>> +		struct rb_node *next;
>>>>> +		struct task_struct *task;
>>>>> +		const int priority = wait->task->prio;
>>>>> +
>>>>> +		/* We are the current bottom-half. Find the next candidate,
>>>>> +		 * the first waiter in the queue on the remaining oldest
>>>>> +		 * request. As multiple seqnos may complete in the time it
>>>>> +		 * takes us to wake up and find the next waiter, we have to
>>>>> +		 * wake up that waiter for it to perform its own coherent
>>>>> +		 * completion check.
>>>>> +		 */
>>>>> +		next = rb_next(&wait->node);
>>>>> +		if (chain_wakeup(next, priority)) {
>>>>
>>>> Don't get this, next waiter may be for a different seqno so how is the
>>>> priority check relevant?
>>>
>>> We only want to call try_to_wake_up() on processes at least as important
>>> as us.
>>
>> But what is the comment saying about having to wake up the following
>> waiter, because of multiple seqnos potentially completing? It says
>> that and then it may not wake up anyone depending on relative
>> priorities.
>
> It does, it always wakes the next bottom half.
>
>>>> Also, how can the next node be smaller priority anyway, if equal
>>>> seqno it has to be equal or greater, I thought?
>>>
>>> Next seqno can have a smaller priority (and be complete). chain_wakeup()
>>> just asks if the next waiter is as important as ourselves (the next
>>> waiter may be for a different seqno).
>>>
>>>> Then since add_waiter will wake up one extra waiter to handle the
>>>> missed irq case, here you may skip checking them based simply on
>>>> task priority which seems wrong.
>>>
>>> Correct. They will be handled by the next waiter in the queue, not us.
>>> Our job is to wake as many completed (+ the potentially completed bh) as
>>> possible without incurring undue delay for ourselves. All completed waiters
>>> will be woken in turn as the next bh to run will look at the list and
>>> wake up those at the same priority as itself (+ the next potentially
>>> completed bh).
>>
>> Who will wake up the next waiter? add_waiter will wake up one, and
>> then the next waiter here (in remove_waiter) may not wake up any
>> further based on priority.
>
> We do. We wake up all waiters that are complete at the same priority or
> better than us, plus the next waiter to take over as the bottom half.

Oh right, the if (next) branch.

>> Is the assumption that it is only possible to miss one interrupt?
>>
>>>>> +static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
>>>>> +{
>>>>> +	bool wakeup = false;
>>>>> +	struct task_struct *task = READ_ONCE(engine->breadcrumbs.first_waiter);
>>>>> +	/* Note that for this not to dangerously chase a dangling pointer,
>>>>> +	 * the caller is responsible for ensure that the task remain valid for
>>>>> +	 * wake_up_process() i.e. that the RCU grace period cannot expire.
>>>>> +	 */
>>>>
>>>> This gets called from hard irq context and I did not manage to
>>>> figure out what makes it safe in the remove waiter path. Why couldn't
>>>> the hard irq sample a stale pointer here?
>>>
>>> Because we only need RCU access to task, which is provided by the hard
>>> irq context.
>>
>> What does the comment mean by saying callers are responsible for the
>> RCU period? From which point to which? I still can't tie whatever
>> callers might be doing with the unrelated hardirq.
>
> Ensuring that the preempt count is raised to prevent the RCU grace
> period from expiring between the read of task and the *task, i.e.
> rcu_read_lock() or other means.

Yes I agree it looks fine on second look. It is only reading the 
engine->breadcrumbs.tasklet once and that one will remain valid until 
the task exits plus a grace period.

Regards,

Tvrtko



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2016-06-06 11:04 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-19 11:32 [CI 01/20] drm: Restore double clflush on the last partial cacheline Chris Wilson
2016-05-19 11:32 ` [CI 02/20] drm/i915/shrinker: Flush active on objects before counting Chris Wilson
2016-05-19 12:14   ` Tvrtko Ursulin
2016-05-19 11:32 ` [CI 03/20] drm/i915: Delay queuing hangcheck to wait-request Chris Wilson
2016-05-19 12:34   ` Tvrtko Ursulin
2016-05-19 12:52     ` Chris Wilson
2016-05-19 11:32 ` [CI 04/20] drm/i915: Remove the dedicated hangcheck workqueue Chris Wilson
2016-05-19 12:50   ` Tvrtko Ursulin
2016-05-19 13:13     ` Chris Wilson
2016-05-20 12:07       ` Tvrtko Ursulin
2016-05-20 12:23         ` Chris Wilson
2016-05-23  8:55           ` Tvrtko Ursulin
2016-05-19 11:32 ` [CI 05/20] drm/i915: Make queueing the hangcheck work inline Chris Wilson
2016-05-19 12:53   ` Tvrtko Ursulin
2016-05-19 13:18     ` Chris Wilson
2016-05-19 11:32 ` [CI 06/20] drm/i915: Slaughter the thundering i915_wait_request herd Chris Wilson
2016-05-20 12:04   ` Tvrtko Ursulin
2016-05-20 12:19     ` Chris Wilson
2016-05-23  8:53       ` Tvrtko Ursulin
2016-06-06 10:14         ` Chris Wilson
2016-06-06 11:04           ` Tvrtko Ursulin
2016-05-19 11:32 ` [CI 07/20] drm/i915: Remove the lazy_coherency parameter from request-completed? Chris Wilson
2016-05-19 11:32 ` [CI 08/20] drm/i915: Use HWS for seqno tracking everywhere Chris Wilson
2016-05-19 11:32 ` [CI 09/20] drm/i915: Stop mapping the scratch page into CPU space Chris Wilson
2016-05-19 11:32 ` [CI 10/20] drm/i915: Allocate scratch page from stolen Chris Wilson
2016-05-19 11:32 ` [CI 11/20] drm/i915: Refactor scratch object allocation for gen2 w/a buffer Chris Wilson
2016-05-19 11:32 ` [CI 12/20] drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk) Chris Wilson
2016-05-19 11:32 ` [CI 13/20] drm/i915: Check the CPU cached value of seqno after waking the waiter Chris Wilson
2016-05-19 11:32 ` [CI 14/20] drm/i915: Only apply one barrier after a breadcrumb interrupt is posted Chris Wilson
2016-05-19 11:32 ` [CI 15/20] drm/i915: Stop setting wraparound seqno on initialisation Chris Wilson
2016-05-19 11:32 ` [CI 16/20] drm/i915: Only query timestamp when measuring elapsed time Chris Wilson
2016-05-19 15:44   ` Tvrtko Ursulin
2016-05-20 12:20     ` Chris Wilson
2016-05-23  8:54       ` Tvrtko Ursulin
2016-05-19 11:32 ` [CI 17/20] drm/i915: Convert trace-irq to the breadcrumb waiter Chris Wilson
2016-05-19 11:32 ` [CI 18/20] drm/i915: Move the get/put irq locking into the caller Chris Wilson
2016-05-19 11:32 ` [CI 19/20] drm/i915: Simplify enabling user-interrupts with L3-remapping Chris Wilson
2016-05-19 11:32 ` [CI 20/20] drm/i915: Remove debug noise on detecting fault-injection of missed interrupts Chris Wilson
2016-05-19 12:07 ` ✗ Ro.CI.BAT: warning for series starting with [CI,01/20] drm: Restore double clflush on the last partial cacheline Patchwork
