From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH 018/190] drm/i915: Slaughter the thundering i915_wait_request herd
Date: Mon, 11 Jan 2016 09:16:29 +0000
Message-ID: <1452503961-14837-18-git-send-email-chris@chris-wilson.co.uk>
In-Reply-To: <1452503961-14837-1-git-send-email-chris@chris-wilson.co.uk>

One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed its wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqnos and wake up all the completed waiters in parallel.
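
As a rough illustration of that scan (a standalone userspace sketch, not
the driver code): waiters are kept oldest-first and a wrap-safe comparison
decides which of them have completed. The names struct waiter,
seqno_passed() and wake_completed() are hypothetical, and a sorted array
stands in for the rbtree used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a waiter; the patch uses struct intel_wait. */
struct waiter {
	uint32_t seqno;	/* seqno this client is waiting for */
	bool woken;	/* set once the client has been woken */
};

/* Wrap-safe "a is at or after b", the same idea as i915_seqno_passed(). */
static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * Waiters are sorted in retirement order (oldest seqno first). Wake every
 * waiter whose seqno the hardware has already passed and return the index
 * of the first incomplete waiter, i.e. the next bottom-half candidate
 * (or count if every waiter has completed).
 */
static size_t wake_completed(struct waiter *w, size_t count, uint32_t hw_seqno)
{
	size_t i;

	assert(w != NULL || count == 0);

	for (i = 0; i < count && seqno_passed(hw_seqno, w[i].seqno); i++)
		w[i].woken = true;

	return i;
}
```

Because requests retire in order, the scan can stop at the first incomplete
waiter; nothing behind it can have completed yet.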

Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having an rbtree walk/erase in the wakeup path); each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.

To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoiding long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.
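
A minimal sketch of that priority-limited chain wakeup, assuming the kernel
convention that a smaller prio value means a higher priority. It is a
standalone userspace model, not the driver code: struct waiter and handoff()
are hypothetical names, and an array sorted by (seqno, prio) stands in for
the rbtree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a waiter; the patch uses struct intel_wait. */
struct waiter {
	uint32_t seqno;	/* seqno this client is waiting for */
	int prio;	/* smaller value == higher priority */
	bool woken;	/* set once the client has been woken */
};

/* Wrap-safe "a is at or after b", the same idea as i915_seqno_passed(). */
static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/*
 * The departing bottom-half (index self) wakes the completed waiters that
 * follow it, but only while they are of equal or higher priority than
 * itself. It stops at the first lower-priority or incomplete waiter and
 * returns that index (or count if none remain); the caller hands the
 * bottom-half role to that waiter, which continues the chain.
 */
static size_t handoff(struct waiter *w, size_t count, size_t self,
		      uint32_t hw_seqno)
{
	size_t next = self + 1;

	assert(self < count);

	while (next < count &&
	       w[next].prio <= w[self].prio &&
	       seqno_passed(hw_seqno, w[next].seqno)) {
		w[next].woken = true;
		next++;
	}

	return next;
}
```

In this model a realtime waiter leaves any lower-priority herd to the next
bottom-half instead of paying for their wakeups itself.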

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/Makefile            |   1 +
 drivers/gpu/drm/i915/i915_debugfs.c      |  19 +-
 drivers/gpu/drm/i915/i915_drv.h          |  32 ++-
 drivers/gpu/drm/i915/i915_gem.c          | 141 +++++--------
 drivers/gpu/drm/i915/i915_gpu_error.c    |   2 +-
 drivers/gpu/drm/i915/i915_irq.c          |  20 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 336 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c         |   5 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c  |   5 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  69 ++++++-
 10 files changed, 521 insertions(+), 109 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_breadcrumbs.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 1e9895b9a546..99ce591c8574 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -37,6 +37,7 @@ i915-y += i915_cmd_parser.o \
 	  i915_gem_userptr.o \
 	  i915_gpu_error.o \
 	  i915_trace_points.o \
+	  intel_breadcrumbs.o \
 	  intel_lrc.o \
 	  intel_mocs.o \
 	  intel_ringbuffer.o \
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6ff2d23faaa7..9396597b136d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -730,10 +730,22 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 static void i915_ring_seqno_info(struct seq_file *m,
 				 struct intel_engine_cs *ring)
 {
+	struct rb_node *rb;
+
 	if (ring->get_seqno) {
 		seq_printf(m, "Current sequence (%s): %x\n",
 			   ring->name, ring->get_seqno(ring, false));
 	}
+
+	spin_lock(&ring->breadcrumbs.lock);
+	for (rb = rb_first(&ring->breadcrumbs.waiters);
+	     rb != NULL;
+	     rb = rb_next(rb)) {
+		struct intel_wait *w = container_of(rb, typeof(*w), node);
+		seq_printf(m, "Waiting (%s): %s [%d] on %x\n",
+			   ring->name, w->task->comm, w->task->pid, w->seqno);
+	}
+	spin_unlock(&ring->breadcrumbs.lock);
 }
 
 static int i915_gem_seqno_info(struct seq_file *m, void *data)
@@ -1359,8 +1371,9 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 
 	for_each_ring(ring, dev_priv, i) {
 		seq_printf(m, "%s:\n", ring->name);
-		seq_printf(m, "\tseqno = %x [current %x]\n",
-			   ring->hangcheck.seqno, seqno[i]);
+		seq_printf(m, "\tseqno = %x [current %x], waiters? %d\n",
+			   ring->hangcheck.seqno, seqno[i],
+			   intel_engine_has_waiter(ring));
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)ring->hangcheck.acthd,
 			   (long long)acthd[i]);
@@ -2346,7 +2359,7 @@ static int count_irq_waiters(struct drm_i915_private *i915)
 	int i;
 
 	for_each_ring(ring, i915, i)
-		count += ring->irq_refcount;
+		count += intel_engine_has_waiter(ring);
 
 	return count;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 201dd330f66a..a9e8de57e848 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1379,7 +1379,7 @@ struct i915_gpu_error {
 #define I915_STOP_RING_ALLOW_WARN      (1 << 30)
 
 	/* For missed irq/seqno simulation. */
-	unsigned int test_irq_rings;
+	unsigned long test_irq_rings;
 };
 
 enum modeset_restore {
@@ -2813,7 +2813,6 @@ ibx_disable_display_interrupt(struct drm_i915_private *dev_priv, uint32_t bits)
 	ibx_display_interrupt_update(dev_priv, bits, 0);
 }
 
-
 /* i915_gem.c */
 int i915_gem_create_ioctl(struct drm_device *dev, void *data,
 			  struct drm_file *file_priv);
@@ -3631,4 +3630,33 @@ static inline void i915_trace_irq_get(struct intel_engine_cs *ring,
 		i915_gem_request_assign(&ring->trace_irq_req, req);
 }
 
+static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
+{
+	/* Ensure our read of the seqno is coherent so that we
+	 * do not "miss an interrupt" (i.e. if this is the last
+	 * request and the seqno write from the GPU is not visible
+	 * by the time the interrupt fires, we will see that the
+	 * request is incomplete and go back to sleep awaiting
+	 * another interrupt that will never come.)
+	 *
+	 * Strictly, we only need to do this once after an interrupt,
+	 * but it is easier and safer to do it every time the waiter
+	 * is woken.
+	 */
+	if (i915_gem_request_completed(req, false))
+		return true;
+
+	/* We need to check whether any gpu reset happened in between
+	 * the request being submitted and now. If a reset has occurred,
+	 * the request is effectively complete (we either are in the
+	 * process of or have discarded the rendering and completely
+	 * reset the GPU). The results of the request are lost and we
+	 * are free to continue on with the original operation.
+	 */
+	if (req->reset_counter != i915_reset_counter(&req->i915->gpu_error))
+		return true;
+
+	return false;
+}
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b4da8b354a3b..4b26529f1f44 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1121,17 +1121,6 @@ i915_gem_check_wedge(unsigned reset_counter, bool interruptible)
 	return 0;
 }
 
-static void fake_irq(unsigned long data)
-{
-	wake_up_process((struct task_struct *)data);
-}
-
-static bool missed_irq(struct drm_i915_private *dev_priv,
-		       struct intel_engine_cs *ring)
-{
-	return test_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings);
-}
-
 static unsigned long local_clock_us(unsigned *cpu)
 {
 	unsigned long t;
@@ -1164,7 +1153,9 @@ static bool busywait_stop(unsigned long timeout, unsigned cpu)
 	return this_cpu != cpu;
 }
 
-static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
+static bool __i915_spin_request(struct drm_i915_gem_request *req,
+				struct intel_wait *wait,
+				int state)
 {
 	unsigned long timeout;
 	unsigned cpu;
@@ -1179,31 +1170,30 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 	 * takes to sleep on a request, on the order of a microsecond.
 	 */
 
-	if (req->ring->irq_refcount)
-		return -EBUSY;
-
 	/* Only spin if we know the GPU is processing this request */
 	if (!i915_gem_request_started(req, true))
-		return -EAGAIN;
+		return false;
 
 	timeout = local_clock_us(&cpu) + 5;
-	while (!need_resched()) {
+	do {
 		if (i915_gem_request_completed(req, true))
-			return 0;
+			return true;
 
-		if (signal_pending_state(state, current))
+		if (signal_pending_state(state, wait->task))
 			break;
 
 		if (busywait_stop(timeout, cpu))
 			break;
 
 		cpu_relax_lowlatency();
-	}
 
-	if (i915_gem_request_completed(req, false))
-		return 0;
+		/* Break the loop if we have consumed the timeslice (or been
+		 * preempted) or when either the background thread has
+		 * enabled the interrupt, or the IRQ itself has fired.
+		 */
+	} while (!need_resched() && wait->task->state == state);
 
-	return -EAGAIN;
+	return false;
 }
 
 /**
@@ -1227,18 +1217,13 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			s64 *timeout,
 			struct intel_rps_client *rps)
 {
-	struct intel_engine_cs *ring = i915_gem_request_get_ring(req);
-	struct drm_device *dev = ring->dev;
-	struct drm_i915_private *dev_priv = dev->dev_private;
-	const bool irq_test_in_progress =
-		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_ring_flag(ring);
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
-	DEFINE_WAIT(wait);
-	unsigned long timeout_expire;
+	struct intel_wait wait;
+	unsigned long timeout_remain;
 	s64 before, now;
-	int ret;
+	int ret = 0;
 
-	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
+	might_sleep();
 
 	if (list_empty(&req->list))
 		return 0;
@@ -1246,7 +1231,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (i915_gem_request_completed(req, true))
 		return 0;
 
-	timeout_expire = 0;
+	timeout_remain = MAX_SCHEDULE_TIMEOUT;
 	if (timeout) {
 		if (WARN_ON(*timeout < 0))
 			return -EINVAL;
@@ -1254,83 +1239,65 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 		if (*timeout == 0)
 			return -ETIME;
 
-		timeout_expire = jiffies + nsecs_to_jiffies_timeout(*timeout);
+		timeout_remain = nsecs_to_jiffies_timeout(*timeout);
 	}
 
-	if (INTEL_INFO(dev_priv)->gen >= 6)
-		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
-
 	/* Record current time in case interrupted by signal, or wedged */
 	trace_i915_gem_request_wait_begin(req);
 	before = ktime_get_raw_ns();
 
-	/* Optimistic spin for the next jiffie before touching IRQs */
-	ret = __i915_spin_request(req, state);
-	if (ret == 0)
-		goto out;
+	if (INTEL_INFO(req->i915)->gen >= 6)
+		gen6_rps_boost(req->i915, rps, req->emitted_jiffies);
 
-	if (!irq_test_in_progress && WARN_ON(!ring->irq_get(ring))) {
-		ret = -ENODEV;
-		goto out;
-	}
-
-	for (;;) {
-		struct timer_list timer;
+	intel_wait_init(&wait, req->seqno);
+	set_task_state(wait.task, state);
 
-		prepare_to_wait(&ring->irq_queue, &wait, state);
+	/* Optimistic spin for the next ~jiffie before touching IRQs */
+	if (intel_engine_add_wait(req->ring, &wait)) {
+		if (__i915_spin_request(req, &wait, state))
+			goto complete;
 
-		/* We need to check whether any gpu reset happened in between
-		 * the request being submitted and now. If a reset has occurred,
-		 * the request is effectively complete (we either are in the
-		 * process of or have discarded the rendering and completely
-		 * reset the GPU. The results of the request are lost and we
-		 * are free to continue on with the original operation.
+		/* In order to check that we haven't missed the interrupt
+		 * as we enabled it, we need to kick ourselves to do a
+		 * coherent check on the seqno before we sleep.
 		 */
-		if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
-			ret = 0;
-			break;
-		}
-
-		if (i915_gem_request_completed(req, false)) {
-			ret = 0;
-			break;
-		}
+		if (intel_engine_enable_wait_irq(req->ring, &wait))
+			goto wakeup;
+	}
 
-		if (signal_pending_state(state, current)) {
+	for (;;) {
+		if (signal_pending_state(state, wait.task)) {
 			ret = -ERESTARTSYS;
 			break;
 		}
 
-		if (timeout && time_after_eq(jiffies, timeout_expire)) {
+		/* Ensure that even if the GPU hangs, we get woken up. */
+		i915_queue_hangcheck(req->i915);
+
+		timeout_remain = io_schedule_timeout(timeout_remain);
+		if (timeout_remain == 0) {
 			ret = -ETIME;
 			break;
 		}
 
-		/* Ensure that even if the GPU hangs, we get woken up. */
-		i915_queue_hangcheck(dev_priv);
-
-		timer.function = NULL;
-		if (timeout || missed_irq(dev_priv, ring)) {
-			unsigned long expire;
-
-			setup_timer_on_stack(&timer, fake_irq, (unsigned long)current);
-			expire = missed_irq(dev_priv, ring) ? jiffies + 1 : timeout_expire;
-			mod_timer(&timer, expire);
-		}
+		if (intel_wait_complete(&wait))
+			break;
 
-		io_schedule();
+wakeup:
+		set_task_state(wait.task, state);
 
-		if (timer.function) {
-			del_singleshot_timer_sync(&timer);
-			destroy_timer_on_stack(&timer);
-		}
+		/* Carefully check if the request is complete, giving time
+		 * for the seqno to be visible following the interrupt.
+		 * We also have to check in case we are kicked by the GPU
+		 * reset in order to drop the struct_mutex.
+		 */
+		if (__i915_request_irq_complete(req))
+			break;
 	}
-	if (!irq_test_in_progress)
-		ring->irq_put(ring);
-
-	finish_wait(&ring->irq_queue, &wait);
 
-out:
+complete:
+	intel_engine_remove_wait(req->ring, &wait);
+	__set_task_state(wait.task, TASK_RUNNING);
 	now = ktime_get_raw_ns();
 	trace_i915_gem_request_wait_end(req);
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 06ca4082735b..f805d117f3d1 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -900,7 +900,7 @@ static void i915_record_ring_state(struct drm_device *dev,
 		ering->instdone = I915_READ(GEN2_INSTDONE);
 	}
 
-	ering->waiting = waitqueue_active(&ring->irq_queue);
+	ering->waiting = intel_engine_has_waiter(ring);
 	ering->instpm = I915_READ(RING_INSTPM(ring->mmio_base));
 	ering->seqno = ring->get_seqno(ring, false);
 	ering->acthd = intel_ring_get_active_head(ring);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2a8a9694eec5..95b997a57da8 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1000,8 +1000,7 @@ static void notify_ring(struct intel_engine_cs *ring)
 		return;
 
 	trace_i915_gem_request_notify(ring);
-
-	wake_up_all(&ring->irq_queue);
+	intel_engine_wakeup(ring);
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
@@ -1083,7 +1082,7 @@ static bool any_waiters(struct drm_i915_private *dev_priv)
 	int i;
 
 	for_each_ring(ring, dev_priv, i)
-		if (ring->irq_refcount)
+		if (intel_engine_has_waiter(ring))
 			return true;
 
 	return false;
@@ -2431,9 +2430,6 @@ out:
 static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 			       bool reset_completed)
 {
-	struct intel_engine_cs *ring;
-	int i;
-
 	/*
 	 * Notify all waiters for GPU completion events that reset state has
 	 * been changed, and that they need to restart their wait after
@@ -2441,9 +2437,8 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 	 * a gpu reset pending so that i915_error_work_func can acquire them).
 	 */
 
-	/* Wake up __wait_seqno, potentially holding dev->struct_mutex. */
-	for_each_ring(ring, dev_priv, i)
-		wake_up_all(&ring->irq_queue);
+	/* Wake up i915_wait_request, potentially holding dev->struct_mutex. */
+	intel_kick_waiters(dev_priv);
 
 	/* Wake up intel_crtc_wait_for_pending_flips, holding crtc->mutex. */
 	wake_up_all(&dev_priv->pending_flip_queue);
@@ -3079,16 +3074,17 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 			if (ring_idle(ring, seqno)) {
 				ring->hangcheck.action = HANGCHECK_IDLE;
 
-				if (waitqueue_active(&ring->irq_queue)) {
+				if (intel_engine_has_waiter(ring)) {
 					/* Issue a wake-up to catch stuck h/w. */
 					if (!test_and_set_bit(ring->id, &dev_priv->gpu_error.missed_irq_rings)) {
-						if (!(dev_priv->gpu_error.test_irq_rings & intel_ring_flag(ring)))
+						if (!test_bit(ring->id, &dev_priv->gpu_error.test_irq_rings))
 							DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
 								  ring->name);
 						else
 							DRM_INFO("Fake missed irq on %s\n",
 								 ring->name);
-						wake_up_all(&ring->irq_queue);
+
+						intel_engine_enable_fake_irq(ring);
 					}
 					/* Safeguard against driver failure */
 					ring->hangcheck.score += BUSY;
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
new file mode 100644
index 000000000000..9f756583a44e
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -0,0 +1,336 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+
+static void intel_breadcrumbs_fake_irq(unsigned long data)
+{
+	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
+
+	/*
+	 * The timer persists in case we cannot enable interrupts,
+	 * or if we have previously seen seqno/interrupt incoherency
+	 * ("missed interrupt" syndrome). Here the worker will wake up
+	 * every jiffie in order to kick the oldest waiter to do the
+	 * coherent seqno check.
+	 */
+	rcu_read_lock();
+	if (intel_engine_wakeup(engine))
+		mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+	rcu_read_unlock();
+}
+
+static void irq_enable(struct intel_engine_cs *engine)
+{
+	WARN_ON(!engine->irq_get(engine));
+}
+
+static void irq_disable(struct intel_engine_cs *engine)
+{
+	engine->irq_put(engine);
+}
+
+static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *engine =
+		container_of(b, struct intel_engine_cs, breadcrumbs);
+	bool noirq;
+
+	assert_spin_locked(&b->lock);
+	if (b->rpm_wakelock)
+		return false;
+
+	/* Since we are waiting on a request, the GPU should be busy
+	 * and should have its own rpm reference. For completeness,
+	 * record an rpm reference for ourselves to cover the
+	 * interrupt we unmask.
+	 */
+	intel_runtime_pm_get_noresume(engine->i915);
+	b->rpm_wakelock = true;
+
+	/* No interrupts? Kick the waiter every jiffie! */
+	noirq = true;
+	if (intel_irqs_enabled(engine->i915)) {
+		noirq = test_bit(engine->id,
+				 &engine->i915->gpu_error.missed_irq_rings);
+		if (!test_bit(engine->id,
+			      &engine->i915->gpu_error.test_irq_rings)) {
+			irq_enable(engine);
+			b->irq_enabled = true;
+		}
+	}
+	if (noirq)
+		mod_timer(&b->fake_irq, jiffies + 1);
+
+	return b->irq_enabled;
+}
+
+static void __intel_breadcrumbs_disable_irq(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *engine =
+		container_of(b, struct intel_engine_cs, breadcrumbs);
+
+	assert_spin_locked(&b->lock);
+	if (!b->rpm_wakelock)
+		return;
+
+	if (b->irq_enabled) {
+		irq_disable(engine);
+		b->irq_enabled = false;
+	}
+
+	intel_runtime_pm_put(engine->i915);
+	b->rpm_wakelock = false;
+}
+
+static inline struct intel_wait *to_wait(struct rb_node *node)
+{
+	return container_of(node, struct intel_wait, node);
+}
+
+static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
+					      struct intel_wait *wait)
+{
+	assert_spin_locked(&b->lock);
+
+	/* This request is completed, so remove it from the tree, mark it as
+	 * complete, and *then* wake up the associated task.
+	 */
+	rb_erase(&wait->node, &b->waiters);
+	RB_CLEAR_NODE(&wait->node);
+
+	wake_up_process(wait->task); /* implicit smp_wmb() */
+}
+
+bool intel_engine_add_wait(struct intel_engine_cs *engine,
+			   struct intel_wait *wait)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	u32 seqno = engine->get_seqno(engine, true);
+	struct rb_node **p, *parent, *completed;
+	bool first;
+
+	spin_lock(&b->lock);
+
+	/* Insert the request into the retirement ordered list
+	 * of waiters by walking the rbtree. If we are the oldest
+	 * seqno in the tree (the first to be retired), then
+	 * set ourselves as the bottom-half.
+	 *
+	 * As we descend the tree, prune completed branches. Since we hold
+	 * the spinlock, we know that the first_waiter must be delayed, and
+	 * we can reduce some of the sequential wake up latency if we take
+	 * action ourselves and wake up the completed tasks in parallel. By
+	 * removing stale elements in the tree, we may also be able to reduce
+	 * ping-pong between the old bottom-half and ourselves as first-waiter.
+	 */
+	first = true;
+	parent = NULL;
+	completed = NULL;
+	p = &b->waiters.rb_node;
+	while (*p) {
+		parent = *p;
+		if (wait->seqno == to_wait(parent)->seqno) {
+			/* We have multiple waiters on the same seqno, select
+			 * the highest priority task (that with the smallest
+			 * task->prio) to serve as the bottom-half for this
+			 * group.
+			 */
+			if (wait->task->prio > to_wait(parent)->task->prio) {
+				p = &parent->rb_right;
+				first = false;
+			} else
+				p = &parent->rb_left;
+		} else if (i915_seqno_passed(wait->seqno,
+					     to_wait(parent)->seqno)) {
+			p = &parent->rb_right;
+			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
+				completed = parent;
+			else
+				first = false;
+		} else
+			p = &parent->rb_left;
+	}
+	rb_link_node(&wait->node, parent, p);
+	rb_insert_color(&wait->node, &b->waiters);
+
+	if (completed != NULL) {
+		struct rb_node *next = rb_next(completed);
+
+		if (next && next != &wait->node) {
+			GEM_BUG_ON(first);
+			smp_store_mb(b->first_waiter, to_wait(next)->task);
+			/* If we enable the IRQ, we may have missed the
+			 * interrupt for that seqno, so we have to wake up
+			 * that bottom-half in order to do a coherent check
+			 * in case the seqno passed.
+			 */
+			if (__intel_breadcrumbs_enable_irq(b))
+				wake_up_process(to_wait(next)->task);
+		}
+
+		do {
+			struct intel_wait *crumb = to_wait(completed);
+			completed = rb_prev(completed);
+			__intel_breadcrumbs_finish(b, crumb);
+		} while (completed != NULL);
+	}
+
+	if (first)
+		smp_store_mb(b->first_waiter, wait->task);
+	GEM_BUG_ON(b->first_waiter == NULL);
+
+	spin_unlock(&b->lock);
+
+	return first;
+}
+
+bool intel_engine_enable_wait_irq(struct intel_engine_cs *engine,
+				  const struct intel_wait *wait)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	bool first = false;
+
+	spin_lock(&b->lock);
+	if (b->first_waiter == wait->task)
+		first = __intel_breadcrumbs_enable_irq(b);
+	spin_unlock(&b->lock);
+
+	return first;
+}
+
+void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
+{
+	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+}
+
+static inline bool chain_wakeup(struct rb_node *rb, int priority)
+{
+	return rb && to_wait(rb)->task->prio <= priority;
+}
+
+void intel_engine_remove_wait(struct intel_engine_cs *engine,
+			      struct intel_wait *wait)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	/* Quick check to see if this waiter was already decoupled from
+	 * the tree by the bottom-half to avoid contention on the spinlock
+	 * by the herd.
+	 */
+	if (RB_EMPTY_NODE(&wait->node))
+		return;
+
+	spin_lock(&b->lock);
+
+	if (b->first_waiter == wait->task) {
+		struct rb_node *next;
+		struct task_struct *task;
+		const int priority = wait->task->prio;
+
+		/* We are the current bottom-half. Find the next candidate,
+		 * the first waiter in the queue on the remaining oldest
+		 * request. As multiple seqnos may complete in the time it
+		 * takes us to wake up and find the next waiter, we have to
+		 * wake up that waiter for it to perform its own coherent
+		 * completion check.
+		 */
+		next = rb_next(&wait->node);
+		if (chain_wakeup(next, priority)) {
+			/* If the next waiter is already complete,
+			 * wake it up and continue onto the next waiter. So
+			 * if we have a small herd, they wake up in parallel
+			 * rather than sequentially, which should reduce
+			 * the overall latency in waking all the completed
+			 * clients.
+			 *
+			 * However, waking up a chain adds extra latency to
+			 * the first_waiter. This is undesirable if that
+			 * waiter is a high priority task.
+			 */
+			u32 seqno = engine->get_seqno(engine, true);
+			while (i915_seqno_passed(seqno,
+						 to_wait(next)->seqno)) {
+				struct rb_node *n = rb_next(next);
+				__intel_breadcrumbs_finish(b, to_wait(next));
+				next = n;
+				if (!chain_wakeup(next, priority))
+					break;
+			}
+		}
+		task = next ? to_wait(next)->task : NULL;
+
+		smp_store_mb(b->first_waiter, task);
+		if (task) {
+			/* In our haste, we may have completed the first waiter
+			 * before we enabled the interrupt. Do so now as we
+			 * have a second waiter for a future seqno. Afterwards,
+			 * we have to wake up that waiter in case we missed
+			 * the interrupt, or if we have to handle an
+			 * exception rather than a seqno completion.
+			 */
+			if (to_wait(next)->seqno != wait->seqno)
+				__intel_breadcrumbs_enable_irq(b);
+			wake_up_process(task);
+		} else
+			__intel_breadcrumbs_disable_irq(b);
+	}
+
+	if (!RB_EMPTY_NODE(&wait->node))
+		rb_erase(&wait->node, &b->waiters);
+	spin_unlock(&b->lock);
+}
+
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	spin_lock_init(&b->lock);
+	setup_timer(&b->fake_irq,
+		    intel_breadcrumbs_fake_irq,
+		    (unsigned long)engine);
+}
+
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	del_timer_sync(&b->fake_irq);
+}
+
+void intel_kick_waiters(struct drm_i915_private *i915)
+{
+	struct intel_engine_cs *engine;
+	int i;
+
+	/* To avoid the task_struct disappearing beneath us as we wake up
+	 * the process, we must first inspect the task_struct->state under the
+	 * RCU lock, i.e. as we call wake_up_process() we must be holding the
+	 * rcu_read_lock().
+	 */
+	rcu_read_lock();
+	for_each_ring(engine, i915, i)
+		intel_engine_wakeup(engine);
+	rcu_read_unlock();
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 32644338e6f8..16fa58a0a930 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1928,6 +1928,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 	i915_cmd_parser_fini_ring(ring);
 	i915_gem_batch_pool_fini(&ring->batch_pool);
 
+	intel_engine_fini_breadcrumbs(ring);
+
 	if (ring->status_page.obj) {
 		kunmap(sg_page(ring->status_page.obj->pages->sgl));
 		ring->status_page.obj = NULL;
@@ -1945,10 +1947,11 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->buffer = NULL;
 
 	ring->dev = dev;
+	ring->i915 = to_i915(dev);
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
-	init_waitqueue_head(&ring->irq_queue);
+	intel_engine_init_breadcrumbs(ring);
 
 	INIT_LIST_HEAD(&ring->buffers);
 	INIT_LIST_HEAD(&ring->execlist_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a1d43b2c7077..60b0df2c5399 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2152,6 +2152,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	WARN_ON(ring->buffer);
 
 	ring->dev = dev;
+	ring->i915 = to_i915(dev);
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
@@ -2159,7 +2160,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
 
-	init_waitqueue_head(&ring->irq_queue);
+	intel_engine_init_breadcrumbs(ring);
 
 	ringbuf = intel_engine_create_ringbuffer(ring, 32 * PAGE_SIZE);
 	if (IS_ERR(ringbuf)) {
@@ -2223,6 +2224,8 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 
 	i915_cmd_parser_fini_ring(ring);
 	i915_gem_batch_pool_fini(&ring->batch_pool);
+	intel_engine_fini_breadcrumbs(ring);
+
 	ring->dev = NULL;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7349d9258191..51fcb66bfc4a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -158,9 +158,35 @@ struct  intel_engine_cs {
 #define LAST_USER_RING (VECS + 1)
 	u32		mmio_base;
 	struct		drm_device *dev;
+	struct drm_i915_private *i915;
 	struct intel_ringbuffer *buffer;
 	struct list_head buffers;
 
+	/* Rather than have every client wait upon all user interrupts,
+	 * with the herd waking after every interrupt and each doing the
+	 * heavyweight seqno dance, we delegate the task (of being the
+	 * bottom-half of the user interrupt) to the first client. After
+	 * every interrupt, we wake up one client, who does the heavyweight
+	 * coherent seqno read and either goes back to sleep (if incomplete),
+	 * or wakes up all the completed clients in parallel, before then
+	 * transferring the bottom-half status to the next client in the queue.
+	 *
+	 * Compared to walking the entire list of waiters in a single dedicated
+	 * bottom-half, we reduce the latency of the first waiter by avoiding
+	 * a context switch, but incur additional coherent seqno reads when
+	 * following the chain of request breadcrumbs. Since it is most likely
+	 * that we have a single client waiting on each seqno, then reducing
+	 * the overhead of waking that client is much preferred.
+	 */
+	struct intel_breadcrumbs {
+		spinlock_t lock; /* protects the lists of requests */
+		struct rb_root waiters; /* sorted by retirement, priority */
+		struct task_struct *first_waiter; /* bh for user interrupts */
+		struct timer_list fake_irq; /* used after a missed interrupt */
+		bool irq_enabled;
+		bool rpm_wakelock;
+	} breadcrumbs;
+
 	/*
 	 * A pool of objects to use as shadow copies of client batch buffers
 	 * when the command parser is enabled. Prevents the client from
@@ -304,8 +330,6 @@ struct  intel_engine_cs {
 
 	bool gpu_caches_dirty;
 
-	wait_queue_head_t irq_queue;
-
 	struct intel_context *default_context;
 	struct intel_context *last_context;
 
@@ -511,4 +535,45 @@ void intel_ring_reserved_space_end(struct intel_ringbuffer *ringbuf);
 /* Legacy ringbuffer specific portion of reservation code: */
 int intel_ring_reserve_space(struct drm_i915_gem_request *request);
 
+/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
+struct intel_wait {
+	struct rb_node node;
+	struct task_struct *task;
+	u32 seqno;
+};
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
+static inline void intel_wait_init(struct intel_wait *wait, u32 seqno)
+{
+	wait->task = current;
+	wait->seqno = seqno;
+}
+static inline bool intel_wait_complete(const struct intel_wait *wait)
+{
+	return RB_EMPTY_NODE(&wait->node);
+}
+bool intel_engine_add_wait(struct intel_engine_cs *engine,
+			   struct intel_wait *wait);
+bool intel_engine_enable_wait_irq(struct intel_engine_cs *engine,
+				  const struct intel_wait *wait);
+void intel_engine_remove_wait(struct intel_engine_cs *engine,
+			      struct intel_wait *wait);
+static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
+{
+	return READ_ONCE(engine->breadcrumbs.first_waiter);
+}
+static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
+{
+	struct task_struct *task = READ_ONCE(engine->breadcrumbs.first_waiter);
+	/* Note that for this not to dangerously chase a dangling pointer,
+	 * the caller is responsible for ensuring that the task remains valid
+	 * for wake_up_process(), i.e. that the RCU grace period cannot expire.
+	 */
+	if (task)
+		wake_up_process(task);
+	return task != NULL;
+}
+void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
+void intel_kick_waiters(struct drm_i915_private *i915);
+
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
2.7.0.rc3
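
For readers following the comment on struct intel_breadcrumbs, the
delegation scheme can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the code from this patch: a
sorted singly linked list stands in for the rb-tree, there is no locking
or RCU, a "woken" flag stands in for wake_up_process(), and every name
(sketch_wait, sketch_breadcrumbs, sketch_add_wait, sketch_complete,
seqno_passed) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe test for "a has reached or passed b" on 32-bit seqnos. */
static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Hypothetical stand-in for struct intel_wait. */
struct sketch_wait {
	struct sketch_wait *next; /* kept in retirement order, oldest first */
	uint32_t seqno;
	bool queued; /* stands in for !RB_EMPTY_NODE(&wait->node) */
	bool woken;  /* stands in for wake_up_process(wait->task) */
};

/* Hypothetical stand-in for struct intel_breadcrumbs. */
struct sketch_breadcrumbs {
	struct sketch_wait *first_waiter; /* acts as the bottom-half */
};

/*
 * Queue a waiter in seqno order (FIFO among equal seqnos).
 * Returns true if the new waiter became the first waiter, i.e. it is
 * now the one that would be woken by the user interrupt.
 */
static bool sketch_add_wait(struct sketch_breadcrumbs *b,
			    struct sketch_wait *w)
{
	struct sketch_wait **p = &b->first_waiter;

	while (*p && seqno_passed(w->seqno, (*p)->seqno))
		p = &(*p)->next;

	w->next = *p;
	w->queued = true;
	w->woken = false;
	*p = w;

	return b->first_waiter == w;
}

/*
 * What the first waiter does after being woken: given the seqno read
 * back from the hardware, dequeue and wake every completed waiter.
 * Whoever is left at the head becomes the next bottom-half.
 * Returns the number of waiters woken.
 */
static unsigned int sketch_complete(struct sketch_breadcrumbs *b,
				    uint32_t hw_seqno)
{
	unsigned int count = 0;

	while (b->first_waiter &&
	       seqno_passed(hw_seqno, b->first_waiter->seqno)) {
		struct sketch_wait *w = b->first_waiter;

		assert(w->queued);
		b->first_waiter = w->next;
		w->next = NULL;
		w->queued = false;
		w->woken = true;
		count++;
	}

	return count;
}
```

In this sketch only the head of the list ever looks at the hardware
seqno; everyone behind it sleeps until they are either woken as complete
or promoted to first waiter. That is the property the comment describes,
though the real patch also has to deal with locking, task lifetime and
missed interrupts.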

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

