* First class VMA, take 2
@ 2016-08-07 14:45 Chris Wilson
  2016-08-07 14:45 ` [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance Chris Wilson
                   ` (38 more replies)
  0 siblings, 39 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Same series as before with a couple of fixes up front and the larger
patches at the rear broken up into a dozen or more separate patches.
This is just a small set to set us on the path of tracking VMA and
moving more information onto them.
-Chris

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-08  9:12   ` Daniel Vetter
  2016-08-07 14:45 ` [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation Chris Wilson
                   ` (37 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

In the debate as to whether the second read of active->request is
ordered after the dependent reads of the first read of active->request,
just give in and throw a smp_rmb() in there so that ordering of loads is
assured.
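The pattern being fixed can be sketched in userspace (an illustrative analog using C11 atomics, not the kernel code itself; the struct layout and names here are invented for the sketch). The acquire fence plays the role of smp_rmb(): it orders the dependent load through the first read of the pointer before the confirming re-read of the pointer.

```c
#include <stdatomic.h>
#include <stddef.h>

struct request { unsigned int exec_id; };

struct active { _Atomic(struct request *) request; };

/* Return the engine id if the request is still the tracked one, 0 if idle. */
unsigned int busy_flag(struct active *active)
{
	struct request *request;
	unsigned int id;

	do {
		request = atomic_load_explicit(&active->request,
					       memory_order_consume);
		if (!request)
			return 0;

		id = request->exec_id;

		/* The analog of the manual smp_rmb(): ensure the load of
		 * ->exec_id above completes before the confirming re-read
		 * of the pointer below.
		 */
		atomic_thread_fence(memory_order_acquire);

		if (request == atomic_load_explicit(&active->request,
						    memory_order_relaxed))
			return id;
	} while (1);
}
```

If the pointer was reassigned between the two loads, the id read may belong to a recycled request, so the loop retries until a consistent pair of loads is observed.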

v2: Explain the manual smp_rmb()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c         | 25 ++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_request.h |  3 +++
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f4f8eaa90f2a..654f0b015f97 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3735,7 +3735,7 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
 	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
 }
 
-static __always_inline unsigned __busy_read_flag(unsigned int id)
+static __always_inline unsigned int __busy_read_flag(unsigned int id)
 {
 	/* Note that we could alias engines in the execbuf API, but
 	 * that would be very unwise as it prevents userspace from
@@ -3753,7 +3753,7 @@ static __always_inline unsigned int __busy_write_id(unsigned int id)
 	return id;
 }
 
-static __always_inline unsigned
+static __always_inline unsigned int
 __busy_set_if_active(const struct i915_gem_active *active,
 		     unsigned int (*flag)(unsigned int id))
 {
@@ -3770,19 +3770,34 @@ __busy_set_if_active(const struct i915_gem_active *active,
 
 		id = request->engine->exec_id;
 
-		/* Check that the pointer wasn't reassigned and overwritten. */
+		/* Check that the pointer wasn't reassigned and overwritten.
+		 *
+		 * In __i915_gem_active_get_rcu(), we enforce ordering between
+		 * the first rcu pointer dereference (imposing a
+		 * read-dependency only on access through the pointer) and
+		 * the second lockless access through the memory barrier
+		 * following a successful atomic_inc_not_zero(). Here there
+		 * is no such barrier, and so we must manually insert an
+		 * explicit read barrier to ensure that the following
+		 * access occurs after all the loads through the first
+		 * pointer.
+		 *
+		 * The corresponding write barrier is part of
+		 * rcu_assign_pointer().
+		 */
+		smp_rmb();
 		if (request == rcu_access_pointer(active->request))
 			return flag(id);
 	} while (1);
 }
 
-static inline unsigned
+static __always_inline unsigned int
 busy_check_reader(const struct i915_gem_active *active)
 {
 	return __busy_set_if_active(active, __busy_read_flag);
 }
 
-static inline unsigned
+static __always_inline unsigned int
 busy_check_writer(const struct i915_gem_active *active)
 {
 	return __busy_set_if_active(active, __busy_write_id);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 3496e28785e7..b2456dede3ad 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -497,6 +497,9 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
 		 * incremented) then the following read for rcu_access_pointer()
 		 * must occur after the atomic operation and so confirm
 		 * that this request is the one currently being tracked.
+		 *
+		 * The corresponding write barrier is part of
+		 * rcu_assign_pointer().
 		 */
 		if (!request || request == rcu_access_pointer(active->request))
 			return rcu_pointer_handoff(request);
-- 
2.8.1

* [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
  2016-08-07 14:45 ` [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-08  9:25   ` Daniel Vetter
  2016-08-07 14:45 ` [PATCH 03/33] drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs Chris Wilson
                   ` (36 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Goel, Akash

When using RCU lookup for the request, commit 0eafec6d3244 ("drm/i915:
Enable lockless lookup of request tracking via RCU"), we acknowledged that
we may race with another thread that could have reallocated the request.
In order for the first thread not to blow up, the second thread must not
clear the request (making it appear completed) before overwriting it. In
the RCU lookup, we allow for the engine/seqno to be replaced, but we do
not allow for it to be zeroed.

The choice is either to add extra checking to the RCU lookup, or to
embrace the inherent races (as intended). Embracing the races is more
complicated, as we need to manually clear everything we depend upon
being zero-initialised, but we benefit from not emitting the memset()
to clear the entire frequently allocated structure (that memset turns
up in throughput profiles). At the same time, the lookup remains
flexible for future adjustments.
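As a rough userspace sketch (an analog with C11 atomics; the names, the refcount encoding, and get_unless_zero() are stand-ins for the kernel's fence refcount and atomic_inc_not_zero()), the lookup that the non-zeroed reallocation must keep safe works like this: take a reference only via "increment if not zero", then confirm the pointer was not recycled underneath us before handing the request back.

```c
#include <stdatomic.h>
#include <stddef.h>

struct request {
	atomic_uint refcount;	/* 0 => freed (or recycled, not yet live) */
};

struct active { _Atomic(struct request *) request; };

/* Analog of atomic_inc_not_zero(): grab a reference only if one exists. */
static int get_unless_zero(struct request *req)
{
	unsigned int old = atomic_load(&req->refcount);

	while (old) {
		if (atomic_compare_exchange_weak(&req->refcount,
						 &old, old + 1))
			return 1;
	}
	return 0;
}

struct request *active_get(struct active *active)
{
	do {
		struct request *request = atomic_load(&active->request);

		if (!request)
			return NULL;

		/* A zero refcount means the request was freed, i.e. it
		 * completed; report idle. Crucially, a recycled request
		 * must never have its fields zeroed, or this dereference
		 * would crash instead of reading stale-but-valid data.
		 */
		if (!get_unless_zero(request))
			return NULL;

		/* The slab may have recycled the object for a new request;
		 * only if it is still the tracked request is our reference
		 * valid for this tracker. Otherwise drop it and retry.
		 */
		if (request == atomic_load(&active->request))
			return request;

		atomic_fetch_sub(&request->refcount, 1);
	} while (1);
}
```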

v2: Old style LRC requires another variable to be initialized. (The
danger inherent in not zeroing everything.)
v3: request->batch also needs to be cleared

Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 37 ++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_request.h | 11 ++++++++++
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 6a1661643d3d..b7ffde002a62 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -355,7 +355,35 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	if (req && i915_gem_request_completed(req))
 		i915_gem_request_retire(req);
 
-	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
+	/* Beware: Dragons be flying overhead.
+	 *
+	 * We use RCU to look up requests in flight. The lookups may
+	 * race with the request being allocated from the slab freelist.
+	 * That is the request we are writing to here, may be in the process
+	 * of being read by __i915_gem_active_get_request_rcu(). As such,
+	 * we have to be very careful when overwriting the contents. During
+	 * the RCU lookup, we chase the request->engine pointer,
+	 * read the request->fence.seqno and increment the reference count.
+	 *
+	 * The reference count is incremented atomically. If it is zero,
+	 * the lookup knows the request is unallocated and complete. Otherwise,
+	 * it is either still in use, or has been reallocated and reset
+	 * with fence_init(). This increment is safe for release, as we check
+	 * that the request we have a reference to matches the active
+	 * request.
+	 *
+	 * Before we increment the refcount, we chase the request->engine
+	 * pointer. We must not call kmem_cache_zalloc() or else we set
+	 * that pointer to NULL and cause a crash during the lookup. If
+	 * we see the request is completed (based on the value of the
+	 * old engine and seqno), the lookup is complete and reports NULL.
+	 * If we decide the request is not completed (new engine or seqno),
+	 * then we grab a reference and double check that it is still the
+	 * active request - which it won't be and restart the lookup.
+	 *
+	 * Do not use kmem_cache_zalloc() here!
+	 */
+	req = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);
 	if (!req)
 		return ERR_PTR(-ENOMEM);
 
@@ -375,6 +403,13 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	req->engine = engine;
 	req->ctx = i915_gem_context_get(ctx);
 
+	/* No zalloc, must clear what we need by hand */
+	req->signaling.wait.tsk = NULL;
+	req->previous_context = NULL;
+	req->file_priv = NULL;
+	req->batch_obj = NULL;
+	req->elsp_submitted = 0;
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index b2456dede3ad..721eb8cbce9b 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -51,6 +51,13 @@ struct intel_signal_node {
  * emission time to be associated with the request for tracking how far ahead
  * of the GPU the submission is.
  *
+ * When modifying this structure be very aware that we perform a lockless
+ * RCU lookup of it that may race against reallocation of the struct
+ * from the slab freelist. We intentionally do not zero the structure on
+ * allocation so that the lookup can use the dangling pointers (and is
+ * cognisant that those pointers may be wrong). Instead, everything that
+ * needs to be initialised must be done so explicitly.
+ *
  * The requests are reference counted.
  */
 struct drm_i915_gem_request {
@@ -465,6 +472,10 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
 	 * just report the active tracker is idle. If the new request is
 	 * incomplete, then we acquire a reference on it and check that
 	 * it remained the active request.
+	 *
+	 * It is then imperative that we do not zero the request on
+	 * reallocation, so that we can chase the dangling pointers!
+	 * See i915_gem_request_alloc().
 	 */
 	do {
 		struct drm_i915_gem_request *request;
-- 
2.8.1

* [PATCH 03/33] drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
  2016-08-07 14:45 ` [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance Chris Wilson
  2016-08-07 14:45 ` [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-09 14:08   ` [PATCH v2] " Chris Wilson
  2016-08-09 14:10   ` [PATCH v3] " Chris Wilson
  2016-08-07 14:45 ` [PATCH 04/33] drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh Chris Wilson
                   ` (35 subsequent siblings)
  38 siblings, 2 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

In commit 2529d57050af ("drm/i915: Drop racy markup of missed-irqs from
idle-worker") the racy detection of missed interrupts was removed when
we went idle. This however opened up the issue that the stuck waiters
were not being reported, causing a test case failure. If we move the
stuck waiter detection out of hangcheck and into the breadcrumb
mechanims (i.e. the waiter) itself, we can avoid this issue entirely.
This leaves hangcheck looking for a stuck GPU (inspecting for request
advancement and HEAD motion), and breadcrumbs looking for a stuck
waiter - hopefully make both easier to understand by their segregation.
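The decision the new per-engine breadcrumbs hangcheck makes each time it fires can be sketched as pure logic (a hypothetical userspace analog; the enum and function names are invented, and the mod_timer()/fake-irq actions are reduced to return codes): re-arm while the deadline is in the future, otherwise flag a missed interrupt and fall back to polling via the fake irq.

```c
#include <stdbool.h>

/* time_before() analog for wrapping jiffies-style counters. */
static bool time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

enum hangcheck_action {
	HANGCHECK_IGNORE,	/* no waiter, nothing to watch */
	HANGCHECK_REARM,	/* mod_timer(&b->hangcheck, b->timeout) */
	HANGCHECK_MISSED_IRQ,	/* report and enable the fake irq */
};

enum hangcheck_action breadcrumbs_hangcheck(bool irq_enabled,
					    unsigned long timeout,
					    unsigned long now)
{
	if (!irq_enabled)
		return HANGCHECK_IGNORE;

	if (time_before(now, timeout))
		return HANGCHECK_REARM;

	return HANGCHECK_MISSED_IRQ;
}
```

In the patch itself, the timeout is refreshed (wait_timeout()) whenever a new first waiter is installed, so the timer only expires when a waiter has sat unwoken for a full hangcheck interval.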

v2: Reduce the error message as we now run independently of hangcheck,
and the hanging batch used by igt also counts as a stuck waiter causing
extra warnings in dmesg.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104
Fixes: 2529d57050af ("drm/i915: Drop racy markup of missed-irqs...")
Testcase: igt/drv_missed_irq
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c      | 11 +++----
 drivers/gpu/drm/i915/i915_gem.c          | 10 -------
 drivers/gpu/drm/i915/i915_irq.c          | 26 +----------------
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 49 +++++++++++++++++++++++---------
 drivers/gpu/drm/i915/intel_engine_cs.c   |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  6 ++--
 6 files changed, 45 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 9bd41581b592..0627e170ea25 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -787,8 +787,6 @@ static void i915_ring_seqno_info(struct seq_file *m,
 
 	seq_printf(m, "Current sequence (%s): %x\n",
 		   engine->name, intel_engine_get_seqno(engine));
-	seq_printf(m, "Current user interrupts (%s): %lx\n",
-		   engine->name, READ_ONCE(engine->breadcrumbs.irq_wakeups));
 
 	spin_lock(&b->lock);
 	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
@@ -1434,11 +1432,10 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 			   engine->hangcheck.seqno,
 			   seqno[id],
 			   engine->last_submitted_seqno);
-		seq_printf(m, "\twaiters? %d\n",
-			   intel_engine_has_waiter(engine));
-		seq_printf(m, "\tuser interrupts = %lx [current %lx]\n",
-			   engine->hangcheck.user_interrupts,
-			   READ_ONCE(engine->breadcrumbs.irq_wakeups));
+		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
+			   yesno(intel_engine_has_waiter(engine)),
+			   yesno(test_bit(engine->id,
+					  &dev_priv->gpu_error.missed_irq_rings)));
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)engine->hangcheck.acthd,
 			   (long long)acthd[id]);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 654f0b015f97..cf94c6ed0ff5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2526,7 +2526,6 @@ i915_gem_idle_work_handler(struct work_struct *work)
 		container_of(work, typeof(*dev_priv), gt.idle_work.work);
 	struct drm_device *dev = &dev_priv->drm;
 	struct intel_engine_cs *engine;
-	unsigned int stuck_engines;
 	bool rearm_hangcheck;
 
 	if (!READ_ONCE(dev_priv->gt.awake))
@@ -2556,15 +2555,6 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	dev_priv->gt.awake = false;
 	rearm_hangcheck = false;
 
-	/* As we have disabled hangcheck, we need to unstick any waiters still
-	 * hanging around. However, as we may be racing against the interrupt
-	 * handler or the waiters themselves, we skip enabling the fake-irq.
-	 */
-	stuck_engines = intel_kick_waiters(dev_priv);
-	if (unlikely(stuck_engines))
-		DRM_DEBUG_DRIVER("kicked stuck waiters (%x)...missed irq?\n",
-				 stuck_engines);
-
 	if (INTEL_GEN(dev_priv) >= 6)
 		gen6_rps_idle(dev_priv);
 	intel_runtime_pm_put(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 591f452ece68..ebb83d5a448b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -972,10 +972,8 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 static void notify_ring(struct intel_engine_cs *engine)
 {
 	smp_store_mb(engine->breadcrumbs.irq_posted, true);
-	if (intel_engine_wakeup(engine)) {
+	if (intel_engine_wakeup(engine))
 		trace_i915_gem_request_notify(engine);
-		engine->breadcrumbs.irq_wakeups++;
-	}
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
@@ -3044,22 +3042,6 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 	return HANGCHECK_HUNG;
 }
 
-static unsigned long kick_waiters(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *i915 = engine->i915;
-	unsigned long irq_count = READ_ONCE(engine->breadcrumbs.irq_wakeups);
-
-	if (engine->hangcheck.user_interrupts == irq_count &&
-	    !test_and_set_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
-		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
-			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
-				  engine->name);
-
-		intel_engine_enable_fake_irq(engine);
-	}
-
-	return irq_count;
-}
 /*
  * This is called when the chip hasn't reported back with completed
  * batchbuffers in a long time. We keep track per ring seqno progress and
@@ -3097,7 +3079,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		bool busy = intel_engine_has_waiter(engine);
 		u64 acthd;
 		u32 seqno;
-		unsigned user_interrupts;
 
 		semaphore_clear_deadlocks(dev_priv);
 
@@ -3114,15 +3095,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		acthd = intel_engine_get_active_head(engine);
 		seqno = intel_engine_get_seqno(engine);
 
-		/* Reset stuck interrupts between batch advances */
-		user_interrupts = 0;
-
 		if (engine->hangcheck.seqno == seqno) {
 			if (!intel_engine_is_active(engine)) {
 				engine->hangcheck.action = HANGCHECK_IDLE;
 				if (busy) {
 					/* Safeguard against driver failure */
-					user_interrupts = kick_waiters(engine);
 					engine->hangcheck.score += BUSY;
 				}
 			} else {
@@ -3185,7 +3162,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
 		engine->hangcheck.seqno = seqno;
 		engine->hangcheck.acthd = acthd;
-		engine->hangcheck.user_interrupts = user_interrupts;
 		busy_count += busy;
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 90867446f1a5..8ecb3b6538fc 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -26,6 +26,29 @@
 
 #include "i915_drv.h"
 
+static void intel_breadcrumbs_hangcheck(unsigned long data)
+{
+	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	if (!b->irq_enabled)
+		return;
+
+	if (time_before(jiffies, b->timeout)) {
+		mod_timer(&b->hangcheck, b->timeout);
+		return;
+	}
+
+	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
+	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
+	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+}
+
+static unsigned long wait_timeout(void)
+{
+	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
+}
+
 static void intel_breadcrumbs_fake_irq(unsigned long data)
 {
 	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
@@ -51,13 +74,6 @@ static void irq_enable(struct intel_engine_cs *engine)
 	 */
 	engine->breadcrumbs.irq_posted = true;
 
-	/* Make sure the current hangcheck doesn't falsely accuse a just
-	 * started irq handler from missing an interrupt (because the
-	 * interrupt count still matches the stale value from when
-	 * the irq handler was disabled, many hangchecks ago).
-	 */
-	engine->breadcrumbs.irq_wakeups++;
-
 	spin_lock_irq(&engine->i915->irq_lock);
 	engine->irq_enable(engine);
 	spin_unlock_irq(&engine->i915->irq_lock);
@@ -98,8 +114,13 @@ static void __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
 	}
 
 	if (!b->irq_enabled ||
-	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings))
+	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
 		mod_timer(&b->fake_irq, jiffies + 1);
+	} else {
+		/* Ensure we never sleep indefinitely */
+		GEM_BUG_ON(!time_after(b->timeout, jiffies));
+		mod_timer(&b->hangcheck, b->timeout);
+	}
 
 	/* Ensure that even if the GPU hangs, we get woken up.
 	 *
@@ -219,6 +240,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 		GEM_BUG_ON(!next && !first);
 		if (next && next != &wait->node) {
 			GEM_BUG_ON(first);
+			b->timeout = wait_timeout();
 			b->first_wait = to_wait(next);
 			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
 			/* As there is a delay between reading the current
@@ -245,6 +267,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 
 	if (first) {
 		GEM_BUG_ON(rb_first(&b->waiters) != &wait->node);
+		b->timeout = wait_timeout();
 		b->first_wait = wait;
 		smp_store_mb(b->irq_seqno_bh, wait->tsk);
 		/* After assigning ourselves as the new bottom-half, we must
@@ -277,11 +300,6 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 	return first;
 }
 
-void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
-{
-	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
-}
-
 static inline bool chain_wakeup(struct rb_node *rb, int priority)
 {
 	return rb && to_wait(rb)->tsk->prio <= priority;
@@ -359,6 +377,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			 * the interrupt, or if we have to handle an
 			 * exception rather than a seqno completion.
 			 */
+			b->timeout = wait_timeout();
 			b->first_wait = to_wait(next);
 			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
 			if (b->first_wait->seqno != wait->seqno)
@@ -533,6 +552,9 @@ int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 	struct task_struct *tsk;
 
 	spin_lock_init(&b->lock);
+	setup_timer(&b->hangcheck,
+		    intel_breadcrumbs_hangcheck,
+		    (unsigned long)engine);
 	setup_timer(&b->fake_irq,
 		    intel_breadcrumbs_fake_irq,
 		    (unsigned long)engine);
@@ -561,6 +583,7 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 		kthread_stop(b->signaler);
 
 	del_timer_sync(&b->fake_irq);
+	del_timer_sync(&b->hangcheck);
 }
 
 unsigned int intel_kick_waiters(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index e9b301ae2d0c..0dd3d1de18aa 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -164,6 +164,7 @@ cleanup:
 void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
 {
 	memset(&engine->hangcheck, 0, sizeof(engine->hangcheck));
+	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 }
 
 static void intel_engine_init_requests(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 43e545e44352..4aed4586b0b6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -75,7 +75,6 @@ enum intel_engine_hangcheck_action {
 
 struct intel_engine_hangcheck {
 	u64 acthd;
-	unsigned long user_interrupts;
 	u32 seqno;
 	int score;
 	enum intel_engine_hangcheck_action action;
@@ -173,7 +172,6 @@ struct intel_engine_cs {
 	 */
 	struct intel_breadcrumbs {
 		struct task_struct *irq_seqno_bh; /* bh for user interrupts */
-		unsigned long irq_wakeups;
 		bool irq_posted;
 
 		spinlock_t lock; /* protects the lists of requests */
@@ -183,6 +181,9 @@ struct intel_engine_cs {
 		struct task_struct *signaler; /* used for fence signalling */
 		struct drm_i915_gem_request *first_signal;
 		struct timer_list fake_irq; /* used after a missed interrupt */
+		struct timer_list hangcheck; /* detect missed interrupts */
+
+		unsigned long timeout;
 
 		bool irq_enabled : 1;
 		bool rpm_wakelock : 1;
@@ -560,7 +561,6 @@ static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
 	return wakeup;
 }
 
-void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 unsigned int intel_kick_waiters(struct drm_i915_private *i915);
 unsigned int intel_kick_signalers(struct drm_i915_private *i915);
-- 
2.8.1

* [PATCH 04/33] drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (2 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 03/33] drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-08  9:33   ` Daniel Vetter
  2016-08-12  9:56   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 05/33] drm/i915: Reduce amount of duplicate buffer information captured on error Chris Wilson
                   ` (34 subsequent siblings)
  38 siblings, 2 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

The bottom-half we use for processing the breadcrumb interrupt is a
task, which is an RCU-protected struct. When accessing this struct, we
need to be holding the RCU read lock to prevent it disappearing beneath
us. We can use the RCU annotation to mark our irq_seqno_bh pointer as
being under RCU guard, and then use the RCU accessors to provide the
correct ordering of access through the pointer.

Most notably, this fixes the access from hard irq context to use the RCU
read lock, which both Daniel and Tvrtko complained about.
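The fixed wakeup path can be sketched in userspace (an illustrative analog; the atomic pointer load stands in for rcu_access_pointer()/rcu_dereference(), the empty lock functions for rcu_read_lock()/rcu_read_unlock(), and the wakeup counter for wake_up_process() - all names here are placeholders): peek cheaply first, and only enter the read-side critical section when there is a waiter to wake.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct task { atomic_int wakeups; };

/* The current bottom-half waiter, if any (zero-initialised to NULL). */
static _Atomic(struct task *) irq_seqno_bh;

/* In the kernel these keep tsk alive across the wakeup; here they are
 * no-ops, marking the span where the pointer may be safely chased.
 */
static void rcu_read_lock(void) { }
static void rcu_read_unlock(void) { }

bool engine_wakeup(void)
{
	bool woken = false;

	/* Cheap unlocked peek (rcu_access_pointer() analog). */
	if (atomic_load(&irq_seqno_bh)) {
		struct task *tsk;

		rcu_read_lock();
		/* rcu_dereference() analog; may have become NULL since
		 * the peek, so re-check under the lock.
		 */
		tsk = atomic_load(&irq_seqno_bh);
		if (tsk) {
			atomic_fetch_add(&tsk->wakeups, 1);
			woken = true;
		}
		rcu_read_unlock();
	}

	return woken;
}
```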

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          |  2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 22 +++++++++-------------
 drivers/gpu/drm/i915/intel_ringbuffer.c  |  2 --
 drivers/gpu/drm/i915/intel_ringbuffer.h  | 21 ++++++++++++++-------
 4 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index feec00f769e1..3d546b5c2e4c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3848,7 +3848,7 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	 * is woken.
 	 */
 	if (engine->irq_seqno_barrier &&
-	    READ_ONCE(engine->breadcrumbs.irq_seqno_bh) == current &&
+	    rcu_access_pointer(engine->breadcrumbs.irq_seqno_bh) == current &&
 	    cmpxchg_relaxed(&engine->breadcrumbs.irq_posted, 1, 0)) {
 		struct task_struct *tsk;
 
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 8ecb3b6538fc..7552bd039565 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -60,10 +60,8 @@ static void intel_breadcrumbs_fake_irq(unsigned long data)
 	 * every jiffie in order to kick the oldest waiter to do the
 	 * coherent seqno check.
 	 */
-	rcu_read_lock();
 	if (intel_engine_wakeup(engine))
 		mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
-	rcu_read_unlock();
 }
 
 static void irq_enable(struct intel_engine_cs *engine)
@@ -232,7 +230,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 	}
 	rb_link_node(&wait->node, parent, p);
 	rb_insert_color(&wait->node, &b->waiters);
-	GEM_BUG_ON(!first && !b->irq_seqno_bh);
+	GEM_BUG_ON(!first && !rcu_access_pointer(b->irq_seqno_bh));
 
 	if (completed) {
 		struct rb_node *next = rb_next(completed);
@@ -242,7 +240,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 			GEM_BUG_ON(first);
 			b->timeout = wait_timeout();
 			b->first_wait = to_wait(next);
-			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
+			rcu_assign_pointer(b->irq_seqno_bh, b->first_wait->tsk);
 			/* As there is a delay between reading the current
 			 * seqno, processing the completed tasks and selecting
 			 * the next waiter, we may have missed the interrupt
@@ -269,7 +267,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 		GEM_BUG_ON(rb_first(&b->waiters) != &wait->node);
 		b->timeout = wait_timeout();
 		b->first_wait = wait;
-		smp_store_mb(b->irq_seqno_bh, wait->tsk);
+		rcu_assign_pointer(b->irq_seqno_bh, wait->tsk);
 		/* After assigning ourselves as the new bottom-half, we must
 		 * perform a cursory check to prevent a missed interrupt.
 		 * Either we miss the interrupt whilst programming the hardware,
@@ -280,7 +278,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 		 */
 		__intel_breadcrumbs_enable_irq(b);
 	}
-	GEM_BUG_ON(!b->irq_seqno_bh);
+	GEM_BUG_ON(!rcu_access_pointer(b->irq_seqno_bh));
 	GEM_BUG_ON(!b->first_wait);
 	GEM_BUG_ON(rb_first(&b->waiters) != &b->first_wait->node);
 
@@ -335,7 +333,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
 		const int priority = wakeup_priority(b, wait->tsk);
 		struct rb_node *next;
 
-		GEM_BUG_ON(b->irq_seqno_bh != wait->tsk);
+		GEM_BUG_ON(rcu_access_pointer(b->irq_seqno_bh) != wait->tsk);
 
 		/* We are the current bottom-half. Find the next candidate,
 		 * the first waiter in the queue on the remaining oldest
@@ -379,13 +377,13 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			 */
 			b->timeout = wait_timeout();
 			b->first_wait = to_wait(next);
-			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
+			rcu_assign_pointer(b->irq_seqno_bh, b->first_wait->tsk);
 			if (b->first_wait->seqno != wait->seqno)
 				__intel_breadcrumbs_enable_irq(b);
-			wake_up_process(b->irq_seqno_bh);
+			wake_up_process(b->first_wait->tsk);
 		} else {
 			b->first_wait = NULL;
-			WRITE_ONCE(b->irq_seqno_bh, NULL);
+			rcu_assign_pointer(b->irq_seqno_bh, NULL);
 			__intel_breadcrumbs_disable_irq(b);
 		}
 	} else {
@@ -399,7 +397,7 @@ out_unlock:
 	GEM_BUG_ON(b->first_wait == wait);
 	GEM_BUG_ON(rb_first(&b->waiters) !=
 		   (b->first_wait ? &b->first_wait->node : NULL));
-	GEM_BUG_ON(!b->irq_seqno_bh ^ RB_EMPTY_ROOT(&b->waiters));
+	GEM_BUG_ON(!rcu_access_pointer(b->irq_seqno_bh) ^ RB_EMPTY_ROOT(&b->waiters));
 	spin_unlock(&b->lock);
 }
 
@@ -596,11 +594,9 @@ unsigned int intel_kick_waiters(struct drm_i915_private *i915)
 	 * RCU lock, i.e. as we call wake_up_process() we must be holding the
 	 * rcu_read_lock().
 	 */
-	rcu_read_lock();
 	for_each_engine(engine, i915)
 		if (unlikely(intel_engine_wakeup(engine)))
 			mask |= intel_engine_flag(engine);
-	rcu_read_unlock();
 
 	return mask;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e08a1e1b04e4..16b726fe33eb 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2410,9 +2410,7 @@ void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno)
 	/* After manually advancing the seqno, fake the interrupt in case
 	 * there are any waiters for that seqno.
 	 */
-	rcu_read_lock();
 	intel_engine_wakeup(engine);
-	rcu_read_unlock();
 }
 
 static void gen6_bsd_submit_request(struct drm_i915_gem_request *request)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 4aed4586b0b6..66dc93469076 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -171,7 +171,7 @@ struct intel_engine_cs {
 	 * the overhead of waking that client is much preferred.
 	 */
 	struct intel_breadcrumbs {
-		struct task_struct *irq_seqno_bh; /* bh for user interrupts */
+		struct task_struct __rcu *irq_seqno_bh; /* bh for interrupts */
 		bool irq_posted;
 
 		spinlock_t lock; /* protects the lists of requests */
@@ -541,23 +541,30 @@ void intel_engine_enable_signaling(struct drm_i915_gem_request *request);
 
 static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
 {
-	return READ_ONCE(engine->breadcrumbs.irq_seqno_bh);
+	return rcu_access_pointer(engine->breadcrumbs.irq_seqno_bh);
 }
 
 static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
 {
 	bool wakeup = false;
-	struct task_struct *tsk = READ_ONCE(engine->breadcrumbs.irq_seqno_bh);
+
 	/* Note that for this not to dangerously chase a dangling pointer,
-	 * the caller is responsible for ensure that the task remain valid for
-	 * wake_up_process() i.e. that the RCU grace period cannot expire.
+	 * we must hold the rcu_read_lock here.
 	 *
 	 * Also note that tsk is likely to be in !TASK_RUNNING state so an
 	 * early test for tsk->state != TASK_RUNNING before wake_up_process()
 	 * is unlikely to be beneficial.
 	 */
-	if (tsk)
-		wakeup = wake_up_process(tsk);
+	if (rcu_access_pointer(engine->breadcrumbs.irq_seqno_bh)) {
+		struct task_struct *tsk;
+
+		rcu_read_lock();
+		tsk = rcu_dereference(engine->breadcrumbs.irq_seqno_bh);
+		if (tsk)
+			wakeup = wake_up_process(tsk);
+		rcu_read_unlock();
+	}
+
 	return wakeup;
 }
 
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 05/33] drm/i915: Reduce amount of duplicate buffer information captured on error
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (3 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 04/33] drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-10  7:04   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 06/33] drm/i915: Stop the machine whilst capturing the GPU crash dump Chris Wilson
                   ` (33 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

When capturing the error state, we do not need to know about every
address space - just those that are related to the error. We know which
context is active at the time, therefore we know which VMs are implicated
in the error. We can then restrict the VMs which we report to the
relevant subset.
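(As an aside, a minimal userspace sketch of the de-duplication this
enables - the structure names and the engine count here are illustrative
stand-ins, not the driver's: each engine records the VM implicated in its
error, and each distinct VM is captured only once.)

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ENGINES 4 /* illustrative; the driver uses I915_NUM_ENGINES */

struct vm { int id; }; /* stand-in for struct i915_address_space */

/* Walk the per-engine VM slots and count each distinct VM exactly once,
 * mirroring the "seen on an earlier engine?" test used when capturing
 * active buffers per implicated VM. */
static int count_unique_vms(struct vm *engine_vm[NUM_ENGINES])
{
	int cnt = 0, i, j;

	for (i = 0; i < NUM_ENGINES; i++) {
		if (!engine_vm[i])
			continue;

		for (j = 0; j < i; j++)
			if (engine_vm[j] == engine_vm[i])
				break;
		if (j != i)
			continue; /* already counted via an earlier engine */

		cnt++;
	}

	return cnt;
}
```

With a shared ppgtt appearing on two engines, it is counted once rather
than per engine, which is exactly the duplication the patch removes.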

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       |   9 +-
 drivers/gpu/drm/i915/i915_gpu_error.c | 202 ++++++++++++++--------------------
 2 files changed, 89 insertions(+), 122 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3d546b5c2e4c..15c41158b4cf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -517,6 +517,7 @@ struct drm_i915_error_state {
 		int num_waiters;
 		int hangcheck_score;
 		enum intel_engine_hangcheck_action hangcheck_action;
+		struct i915_address_space *vm;
 		int num_requests;
 
 		/* our own tracking of ring head and tail */
@@ -586,17 +587,15 @@ struct drm_i915_error_state {
 		u32 read_domains;
 		u32 write_domain;
 		s32 fence_reg:I915_MAX_NUM_FENCE_BITS;
-		s32 pinned:2;
 		u32 tiling:2;
 		u32 dirty:1;
 		u32 purgeable:1;
 		u32 userptr:1;
 		s32 engine:4;
 		u32 cache_level:3;
-	} **active_bo, **pinned_bo;
-
-	u32 *active_bo_count, *pinned_bo_count;
-	u32 vm_count;
+	} *active_bo[I915_NUM_ENGINES], *pinned_bo;
+	u32 active_bo_count[I915_NUM_ENGINES], pinned_bo_count;
+	struct i915_address_space *active_vm[I915_NUM_ENGINES];
 };
 
 struct intel_connector;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index eecb87063c88..ced296983caa 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -42,16 +42,6 @@ static const char *engine_str(int engine)
 	}
 }
 
-static const char *pin_flag(int pinned)
-{
-	if (pinned > 0)
-		return " P";
-	else if (pinned < 0)
-		return " p";
-	else
-		return "";
-}
-
 static const char *tiling_flag(int tiling)
 {
 	switch (tiling) {
@@ -189,7 +179,7 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 {
 	int i;
 
-	err_printf(m, "  %s [%d]:\n", name, count);
+	err_printf(m, "%s [%d]:\n", name, count);
 
 	while (count--) {
 		err_printf(m, "    %08x_%08x %8u %02x %02x [ ",
@@ -202,7 +192,6 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 			err_printf(m, "%02x ", err->rseqno[i]);
 
 		err_printf(m, "] %02x", err->wseqno);
-		err_puts(m, pin_flag(err->pinned));
 		err_puts(m, tiling_flag(err->tiling));
 		err_puts(m, dirty_flag(err->dirty));
 		err_puts(m, purgeable_flag(err->purgeable));
@@ -414,18 +403,25 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 			error_print_engine(m, &error->engine[i]);
 	}
 
-	for (i = 0; i < error->vm_count; i++) {
-		err_printf(m, "vm[%d]\n", i);
+	for (i = 0; i < I915_NUM_ENGINES; i++) {
+		if (!error->active_vm[i])
+			break;
 
-		print_error_buffers(m, "Active",
+		err_printf(m, "Active vm[%d]\n", i);
+		for (j = 0; j < I915_NUM_ENGINES; j++) {
+			if (error->engine[j].vm == error->active_vm[i])
+				err_printf(m, "    %s\n",
+					   dev_priv->engine[j].name);
+		}
+		print_error_buffers(m, "  Buffers",
 				    error->active_bo[i],
 				    error->active_bo_count[i]);
-
-		print_error_buffers(m, "Pinned",
-				    error->pinned_bo[i],
-				    error->pinned_bo_count[i]);
 	}
 
+	print_error_buffers(m, "Pinned (global)",
+			    error->pinned_bo,
+			    error->pinned_bo_count);
+
 	for (i = 0; i < ARRAY_SIZE(error->engine); i++) {
 		struct drm_i915_error_engine *ee = &error->engine[i];
 
@@ -626,13 +622,10 @@ static void i915_error_state_free(struct kref *error_ref)
 
 	i915_error_object_free(error->semaphore_obj);
 
-	for (i = 0; i < error->vm_count; i++)
+	for (i = 0; i < ARRAY_SIZE(error->active_bo); i++)
 		kfree(error->active_bo[i]);
-
-	kfree(error->active_bo);
-	kfree(error->active_bo_count);
 	kfree(error->pinned_bo);
-	kfree(error->pinned_bo_count);
+
 	kfree(error->overlay);
 	kfree(error->display);
 	kfree(error);
@@ -778,9 +771,6 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->read_domains = obj->base.read_domains;
 	err->write_domain = obj->base.write_domain;
 	err->fence_reg = obj->fence_reg;
-	err->pinned = 0;
-	if (i915_gem_obj_is_pinned(obj))
-		err->pinned = 1;
 	err->tiling = i915_gem_object_get_tiling(obj);
 	err->dirty = obj->dirty;
 	err->purgeable = obj->madv != I915_MADV_WILLNEED;
@@ -788,13 +778,17 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->cache_level = obj->cache_level;
 }
 
-static u32 capture_active_bo(struct drm_i915_error_buffer *err,
-			     int count, struct list_head *head)
+static u32 capture_error_bo(struct drm_i915_error_buffer *err,
+			    int count, struct list_head *head,
+			    bool pinned_only)
 {
 	struct i915_vma *vma;
 	int i = 0;
 
 	list_for_each_entry(vma, head, vm_link) {
+		if (pinned_only && !i915_vma_is_pinned(vma))
+			continue;
+
 		capture_bo(err++, vma);
 		if (++i == count)
 			break;
@@ -803,28 +797,6 @@ static u32 capture_active_bo(struct drm_i915_error_buffer *err,
 	return i;
 }
 
-static u32 capture_pinned_bo(struct drm_i915_error_buffer *err,
-			     int count, struct list_head *head,
-			     struct i915_address_space *vm)
-{
-	struct drm_i915_gem_object *obj;
-	struct drm_i915_error_buffer * const first = err;
-	struct drm_i915_error_buffer * const last = err + count;
-
-	list_for_each_entry(obj, head, global_list) {
-		struct i915_vma *vma;
-
-		if (err == last)
-			break;
-
-		list_for_each_entry(vma, &obj->vma_list, obj_link)
-			if (vma->vm == vm && i915_vma_is_pinned(vma))
-				capture_bo(err++, vma);
-	}
-
-	return err - first;
-}
-
 /* Generate a semi-unique error code. The code is not meant to have meaning, The
  * code's only purpose is to try to prevent false duplicated bug reports by
  * grossly estimating a GPU error state.
@@ -1062,7 +1034,6 @@ static void error_record_engine_registers(struct drm_i915_error_state *error,
 	}
 }
 
-
 static void i915_gem_record_active_context(struct intel_engine_cs *engine,
 					   struct drm_i915_error_state *error,
 					   struct drm_i915_error_engine *ee)
@@ -1115,10 +1086,9 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 
 		request = i915_gem_find_active_request(engine);
 		if (request) {
-			struct i915_address_space *vm;
 			struct intel_ring *ring;
 
-			vm = request->ctx->ppgtt ?
+			ee->vm = request->ctx->ppgtt ?
 				&request->ctx->ppgtt->base : &ggtt->base;
 
 			/* We need to copy these to an anonymous buffer
@@ -1128,7 +1098,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 			ee->batchbuffer =
 				i915_error_object_create(dev_priv,
 							 request->batch_obj,
-							 vm);
+							 ee->vm);
 
 			if (HAS_BROKEN_CS_TLB(dev_priv))
 				ee->wa_batchbuffer =
@@ -1210,89 +1180,85 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 	}
 }
 
-/* FIXME: Since pin count/bound list is global, we duplicate what we capture per
- * VM.
- */
 static void i915_gem_capture_vm(struct drm_i915_private *dev_priv,
 				struct drm_i915_error_state *error,
 				struct i915_address_space *vm,
 				const int ndx)
 {
-	struct drm_i915_error_buffer *active_bo = NULL, *pinned_bo = NULL;
-	struct drm_i915_gem_object *obj;
+	struct drm_i915_error_buffer *active_bo;
 	struct i915_vma *vma;
 	int i;
 
 	i = 0;
 	list_for_each_entry(vma, &vm->active_list, vm_link)
 		i++;
-	error->active_bo_count[ndx] = i;
-
-	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		list_for_each_entry(vma, &obj->vma_list, obj_link)
-			if (vma->vm == vm && i915_vma_is_pinned(vma))
-				i++;
-	}
-	error->pinned_bo_count[ndx] = i - error->active_bo_count[ndx];
 
-	if (i) {
+	active_bo = NULL;
+	if (i)
 		active_bo = kcalloc(i, sizeof(*active_bo), GFP_ATOMIC);
-		if (active_bo)
-			pinned_bo = active_bo + error->active_bo_count[ndx];
-	}
-
 	if (active_bo)
-		error->active_bo_count[ndx] =
-			capture_active_bo(active_bo,
-					  error->active_bo_count[ndx],
-					  &vm->active_list);
-
-	if (pinned_bo)
-		error->pinned_bo_count[ndx] =
-			capture_pinned_bo(pinned_bo,
-					  error->pinned_bo_count[ndx],
-					  &dev_priv->mm.bound_list, vm);
+		i = capture_error_bo(active_bo, i, &vm->active_list, false);
+	else
+		i = 0;
+
 	error->active_bo[ndx] = active_bo;
-	error->pinned_bo[ndx] = pinned_bo;
+	error->active_bo_count[ndx] = i;
+	error->active_vm[ndx] = vm;
 }
 
-static void i915_gem_capture_buffers(struct drm_i915_private *dev_priv,
-				     struct drm_i915_error_state *error)
+static void i915_capture_active_buffers(struct drm_i915_private *dev_priv,
+					struct drm_i915_error_state *error)
 {
-	struct i915_address_space *vm;
-	int cnt = 0, i = 0;
-
-	list_for_each_entry(vm, &dev_priv->vm_list, global_link)
-		cnt++;
-
-	error->active_bo = kcalloc(cnt, sizeof(*error->active_bo), GFP_ATOMIC);
-	error->pinned_bo = kcalloc(cnt, sizeof(*error->pinned_bo), GFP_ATOMIC);
-	error->active_bo_count = kcalloc(cnt, sizeof(*error->active_bo_count),
-					 GFP_ATOMIC);
-	error->pinned_bo_count = kcalloc(cnt, sizeof(*error->pinned_bo_count),
-					 GFP_ATOMIC);
-
-	if (error->active_bo == NULL ||
-	    error->pinned_bo == NULL ||
-	    error->active_bo_count == NULL ||
-	    error->pinned_bo_count == NULL) {
-		kfree(error->active_bo);
-		kfree(error->active_bo_count);
-		kfree(error->pinned_bo);
-		kfree(error->pinned_bo_count);
-
-		error->active_bo = NULL;
-		error->active_bo_count = NULL;
-		error->pinned_bo = NULL;
-		error->pinned_bo_count = NULL;
-	} else {
-		list_for_each_entry(vm, &dev_priv->vm_list, global_link)
-			i915_gem_capture_vm(dev_priv, error, vm, i++);
+	int cnt = 0, i, j;
+
+	BUILD_BUG_ON(ARRAY_SIZE(error->engine) > ARRAY_SIZE(error->active_bo));
+	BUILD_BUG_ON(ARRAY_SIZE(error->active_bo) != ARRAY_SIZE(error->active_vm));
+	BUILD_BUG_ON(ARRAY_SIZE(error->active_bo) != ARRAY_SIZE(error->active_bo_count));
 
-		error->vm_count = cnt;
+	for (i = 0; i < I915_NUM_ENGINES; i++) {
+		struct drm_i915_error_engine *ee = &error->engine[i];
+
+		if (!ee->vm)
+			continue;
+
+		for (j = 0; j < i; j++)
+			if (error->engine[j].vm == ee->vm)
+				break;
+		if (j != i)
+			continue;
+
+		i915_gem_capture_vm(dev_priv, error, ee->vm, cnt++);
 	}
 }
 
+static void i915_capture_pinned_buffers(struct drm_i915_private *dev_priv,
+					struct drm_i915_error_state *error)
+{
+	struct i915_address_space *vm = &dev_priv->ggtt.base;
+	struct drm_i915_error_buffer *bo;
+	struct i915_vma *vma;
+	int i, j;
+
+	i = 0;
+	list_for_each_entry(vma, &vm->active_list, vm_link)
+		i++;
+
+	j = 0;
+	list_for_each_entry(vma, &vm->inactive_list, vm_link)
+		j++;
+
+	bo = NULL;
+	if (i + j)
+		bo = kcalloc(i + j, sizeof(*bo), GFP_ATOMIC);
+	if (!bo)
+		return;
+
+	i = capture_error_bo(bo, i, &vm->active_list, true);
+	j = capture_error_bo(bo + i, j, &vm->inactive_list, true);
+	error->pinned_bo_count = i + j;
+	error->pinned_bo = bo;
+}
+
 /* Capture all registers which don't fit into another category. */
 static void i915_capture_reg_state(struct drm_i915_private *dev_priv,
 				   struct drm_i915_error_state *error)
@@ -1436,10 +1402,12 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
 
 	i915_capture_gen_state(dev_priv, error);
 	i915_capture_reg_state(dev_priv, error);
-	i915_gem_capture_buffers(dev_priv, error);
 	i915_gem_record_fences(dev_priv, error);
 	i915_gem_record_rings(dev_priv, error);
 
+	i915_capture_active_buffers(dev_priv, error);
+	i915_capture_pinned_buffers(dev_priv, error);
+
 	do_gettimeofday(&error->time);
 
 	error->overlay = intel_overlay_capture_error_state(dev_priv);
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 06/33] drm/i915: Stop the machine whilst capturing the GPU crash dump
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (4 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 05/33] drm/i915: Reduce amount of duplicate buffer information captured on error Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-07 14:45 ` [PATCH 07/33] drm/i915: Store the active context object on all engines upon error Chris Wilson
                   ` (32 subsequent siblings)
  38 siblings, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

The error state is purposefully racy as we expect it to be called at any
time and so have avoided any locking whilst capturing the crash dump.
However, with multi-engine GPUs and multiple CPUs, those races can
manifest into OOPSes as we attempt to chase dangling pointers freed on
other CPUs. Under discussion are lots of ways to slow down normal
operation in order to protect the post-mortem error capture, but what if
we take the opposite approach and freeze the machine whilst the error
capture runs (note the GPU may still be running, but as long as we don't
process any of the results the driver's bookkeeping will be static)?

Note that by itself, this is not a complete fix. It also depends on
the compiler barriers in list_add/list_del to prevent traversing the
lists into the void. We also depend on only requiring state from
carefully controlled sources - i.e. all the state we require for
post-mortem debugging should be reachable from the request itself so
that we only have to worry about retrieving the request carefully. Once
we have the request, we know that all pointers from it are intact.

v2: Avoid drm_clflush_pages() inside stop_machine() as it may use
stop_machine() itself for its wbinvd fallback.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Kconfig          |  1 +
 drivers/gpu/drm/i915/i915_drv.h       |  2 ++
 drivers/gpu/drm/i915/i915_gpu_error.c | 48 +++++++++++++++++++++--------------
 3 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 7769e469118f..7badcee88ebf 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -4,6 +4,7 @@ config DRM_I915
 	depends on X86 && PCI
 	select INTEL_GTT
 	select INTERVAL_TREE
+	select STOP_MACHINE
 	# we need shmfs for the swappable backing store, and in particular
 	# the shmem_readpage() which depends upon tmpfs
 	select SHMEM
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 15c41158b4cf..826486d03e8e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -481,6 +481,8 @@ struct drm_i915_error_state {
 	struct kref ref;
 	struct timeval time;
 
+	struct drm_i915_private *i915;
+
 	char error_msg[128];
 	bool simulated;
 	int iommu;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index ced296983caa..b94a59733cf8 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -28,6 +28,7 @@
  */
 
 #include <generated/utsrelease.h>
+#include <linux/stop_machine.h>
 #include "i915_drv.h"
 
 static const char *engine_str(int engine)
@@ -684,14 +685,12 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 
 	dst->page_count = num_pages;
 	while (num_pages--) {
-		unsigned long flags;
 		void *d;
 
 		d = kmalloc(PAGE_SIZE, GFP_ATOMIC);
 		if (d == NULL)
 			goto unwind;
 
-		local_irq_save(flags);
 		if (use_ggtt) {
 			void __iomem *s;
 
@@ -710,15 +709,10 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 
 			page = i915_gem_object_get_page(src, i);
 
-			drm_clflush_pages(&page, 1);
-
 			s = kmap_atomic(page);
 			memcpy(d, s, PAGE_SIZE);
 			kunmap_atomic(s);
-
-			drm_clflush_pages(&page, 1);
 		}
-		local_irq_restore(flags);
 
 		dst->pages[i++] = d;
 		reloc_offset += PAGE_SIZE;
@@ -1371,6 +1365,32 @@ static void i915_capture_gen_state(struct drm_i915_private *dev_priv,
 	error->suspend_count = dev_priv->suspend_count;
 }
 
+static int capture(void *data)
+{
+	struct drm_i915_error_state *error = data;
+
+	/* Ensure that what we readback from memory matches what the GPU sees */
+	wbinvd();
+
+	i915_capture_gen_state(error->i915, error);
+	i915_capture_reg_state(error->i915, error);
+	i915_gem_record_fences(error->i915, error);
+	i915_gem_record_rings(error->i915, error);
+
+	i915_capture_active_buffers(error->i915, error);
+	i915_capture_pinned_buffers(error->i915, error);
+
+	do_gettimeofday(&error->time);
+
+	error->overlay = intel_overlay_capture_error_state(error->i915);
+	error->display = intel_display_capture_error_state(error->i915);
+
+	/* And make sure we don't leave trash in the CPU cache */
+	wbinvd();
+
+	return 0;
+}
+
 /**
  * i915_capture_error_state - capture an error record for later analysis
  * @dev: drm device
@@ -1399,19 +1419,9 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
 	}
 
 	kref_init(&error->ref);
+	error->i915 = dev_priv;
 
-	i915_capture_gen_state(dev_priv, error);
-	i915_capture_reg_state(dev_priv, error);
-	i915_gem_record_fences(dev_priv, error);
-	i915_gem_record_rings(dev_priv, error);
-
-	i915_capture_active_buffers(dev_priv, error);
-	i915_capture_pinned_buffers(dev_priv, error);
-
-	do_gettimeofday(&error->time);
-
-	error->overlay = intel_overlay_capture_error_state(dev_priv);
-	error->display = intel_display_capture_error_state(dev_priv);
+	stop_machine(capture, error, NULL);
 
 	i915_error_capture_msg(dev_priv, error, engine_mask, error_msg);
 	DRM_INFO("%s\n", error->error_msg);
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 07/33] drm/i915: Store the active context object on all engines upon error
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (5 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 06/33] drm/i915: Stop the machine whilst capturing the GPU crash dump Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-09  9:02   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 08/33] drm/i915: Move setting of request->batch into its single callsite Chris Wilson
                   ` (31 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

With execlists, we have context objects everywhere, not just on RCS. So
store them for post-mortem debugging. This also has a secondary effect
of removing one more unsafe list iteration, by instead using preserved
state from the hanging request. And now we can cross-reference the
request's context state with that loaded by the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 28 ++++------------------------
 1 file changed, 4 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index b94a59733cf8..c621fa23cd28 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1028,28 +1028,6 @@ static void error_record_engine_registers(struct drm_i915_error_state *error,
 	}
 }
 
-static void i915_gem_record_active_context(struct intel_engine_cs *engine,
-					   struct drm_i915_error_state *error,
-					   struct drm_i915_error_engine *ee)
-{
-	struct drm_i915_private *dev_priv = engine->i915;
-	struct drm_i915_gem_object *obj;
-
-	/* Currently render ring is the only HW context user */
-	if (engine->id != RCS || !error->ccid)
-		return;
-
-	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		if (!i915_gem_obj_ggtt_bound(obj))
-			continue;
-
-		if ((error->ccid & PAGE_MASK) == i915_gem_obj_ggtt_offset(obj)) {
-			ee->ctx = i915_error_ggtt_object_create(dev_priv, obj);
-			break;
-		}
-	}
-}
-
 static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 				  struct drm_i915_error_state *error)
 {
@@ -1099,6 +1077,10 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 					i915_error_ggtt_object_create(dev_priv,
 								      engine->scratch.obj);
 
+			ee->ctx =
+				i915_error_ggtt_object_create(dev_priv,
+							      request->ctx->engine[i].state);
+
 			if (request->pid) {
 				struct task_struct *task;
 
@@ -1129,8 +1111,6 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 		ee->wa_ctx = i915_error_ggtt_object_create(dev_priv,
 							   engine->wa_ctx.obj);
 
-		i915_gem_record_active_context(engine, error, ee);
-
 		count = 0;
 		list_for_each_entry(request, &engine->request_list, link)
 			count++;
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 08/33] drm/i915: Move setting of request->batch into its single callsite
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (6 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 07/33] drm/i915: Store the active context object on all engines upon error Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-09 15:53   ` Mika Kuoppala
  2016-08-10  7:19   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 09/33] drm/i915: Mark unmappable GGTT entries as PIN_HIGH Chris Wilson
                   ` (30 subsequent siblings)
  38 siblings, 2 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

request->batch_obj is only set by execbuffer for the convenience of
debugging hangs. By moving that operation to the callsite, we can
simplify all other callers and future patches. We also move the
complications of reference handling of the request->batch_obj next to
where the active tracking is set up for the request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +++++++++-
 drivers/gpu/drm/i915/i915_gem_request.c    | 12 +-----------
 drivers/gpu/drm/i915/i915_gem_request.h    |  8 +++-----
 3 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c494b79ded20..c8d13fea4b25 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1702,6 +1702,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto err_batch_unpin;
 	}
 
+	/* Whilst this request exists, batch_obj will be on the
+	 * active_list, and so will hold the active reference. Only when this
+	 * request is retired will the batch_obj be moved onto the
+	 * inactive_list and lose its active reference. Hence we do not need
+	 * to explicitly hold another reference here.
+	 */
+	params->request->batch_obj = params->batch->obj;
+
 	ret = i915_gem_request_add_to_client(params->request, file);
 	if (ret)
 		goto err_request;
@@ -1720,7 +1728,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	ret = execbuf_submit(params, args, &eb->vmas);
 err_request:
-	__i915_add_request(params->request, params->batch->obj, ret == 0);
+	__i915_add_request(params->request, ret == 0);
 
 err_batch_unpin:
 	/*
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index b7ffde002a62..c6f523e2879c 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -461,9 +461,7 @@ static void i915_gem_mark_busy(const struct intel_engine_cs *engine)
  * request is not being tracked for completion but the work itself is
  * going to happen on the hardware. This would be a Bad Thing(tm).
  */
-void __i915_add_request(struct drm_i915_gem_request *request,
-			struct drm_i915_gem_object *obj,
-			bool flush_caches)
+void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 {
 	struct intel_engine_cs *engine;
 	struct intel_ring *ring;
@@ -504,14 +502,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 
 	request->head = request_start;
 
-	/* Whilst this request exists, batch_obj will be on the
-	 * active_list, and so will hold the active reference. Only when this
-	 * request is retired will the the batch_obj be moved onto the
-	 * inactive_list and lose its active reference. Hence we do not need
-	 * to explicitly hold another reference here.
-	 */
-	request->batch_obj = obj;
-
 	/* Seal the request and mark it as pending execution. Note that
 	 * we may inspect this state, without holding any locks, during
 	 * hangcheck. Hence we apply the barrier to ensure that we do not
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 721eb8cbce9b..d5176f9cc22f 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -225,13 +225,11 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 	*pdst = src;
 }
 
-void __i915_add_request(struct drm_i915_gem_request *req,
-			struct drm_i915_gem_object *batch_obj,
-			bool flush_caches);
+void __i915_add_request(struct drm_i915_gem_request *req, bool flush_caches);
 #define i915_add_request(req) \
-	__i915_add_request(req, NULL, true)
+	__i915_add_request(req, true)
 #define i915_add_request_no_flush(req) \
-	__i915_add_request(req, NULL, false)
+	__i915_add_request(req, false)
 
 struct intel_rps_client;
 #define NO_WAITBOOST ERR_PTR(-1)
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 09/33] drm/i915: Mark unmappable GGTT entries as PIN_HIGH
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (7 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 08/33] drm/i915: Move setting of request->batch into its single callsite Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-08  9:09   ` Joonas Lahtinen
  2016-08-09 11:05   ` Tvrtko Ursulin
  2016-08-07 14:45 ` [PATCH 10/33] drm/i915: Remove inactive/active list from debugfs Chris Wilson
                   ` (29 subsequent siblings)
  38 siblings, 2 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

We allocate a few objects into the GGTT that we never need to access via
the mappable aperture (such as contexts, status pages). We can request
that these are bound high in the VM to increase the amount of mappable
aperture available. However, for anything that may be frequently pinned
(such as logical contexts), we want to keep the fast search & insert.
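(A toy userspace model of why PIN_HIGH helps - the names and sizes below
are illustrative, not the driver's: carving objects that never need CPU
access from the top of the GGTT leaves the CPU-mappable aperture at the
bottom free for objects that do.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy GGTT: only [0, MAPPABLE_END) is visible through the CPU aperture. */
#define GGTT_SIZE    (1u << 20)
#define MAPPABLE_END (1u << 18)

struct toy_ggtt { uint32_t bottom, top; }; /* two-ended bump allocator */

/* pin_high mimics PIN_HIGH: allocate from the top of the VM so the
 * mappable range below stays available for objects that need it. */
static uint32_t toy_pin(struct toy_ggtt *ggtt, uint32_t size, bool pin_high)
{
	if (pin_high) {
		ggtt->top -= size;
		return ggtt->top;
	} else {
		uint32_t offset = ggtt->bottom;

		ggtt->bottom += size;
		return offset;
	}
}
```

In this model, binding a context or status page with pin_high keeps its
pages above MAPPABLE_END, so a later object that genuinely needs the
aperture still finds room at the bottom.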

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 309c5d9b1c57..c7f4b64b16f6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1182,7 +1182,7 @@ static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *engine, u32 size)
 	}
 
 	ret = i915_gem_object_ggtt_pin(engine->wa_ctx.obj, NULL,
-				       0, PAGE_SIZE, 0);
+				       0, PAGE_SIZE, PIN_HIGH);
 	if (ret) {
 		DRM_DEBUG_DRIVER("pin LRC WA ctx backing obj failed: %d\n",
 				 ret);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 16b726fe33eb..09f01c641c14 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2093,7 +2093,7 @@ static int intel_ring_context_pin(struct i915_gem_context *ctx,
 
 	if (ce->state) {
 		ret = i915_gem_object_ggtt_pin(ce->state, NULL, 0,
-					       ctx->ggtt_alignment, 0);
+					       ctx->ggtt_alignment, PIN_HIGH);
 		if (ret)
 			goto error;
 	}
@@ -2629,7 +2629,8 @@ static void intel_ring_init_semaphores(struct drm_i915_private *dev_priv,
 			i915.semaphores = 0;
 		} else {
 			i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
-			ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
+			ret = i915_gem_object_ggtt_pin(obj, NULL,
+						       0, 0, PIN_HIGH);
 			if (ret != 0) {
 				i915_gem_object_put(obj);
 				DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n");
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 10/33] drm/i915: Remove inactive/active list from debugfs
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (8 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 09/33] drm/i915: Mark unmappable GGTT entries as PIN_HIGH Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-09 10:29   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 11/33] drm/i915: Focus debugfs/i915_gem_pinned to show only display pins Chris Wilson
                   ` (28 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

These two files (i915_gem_active, i915_gem_inactive) no longer give
pertinent information: active/inactive tracking is now per-vm, so the
information would need to be reported per-vm. They are obsolete, so
remove them.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 49 -------------------------------------
 1 file changed, 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0627e170ea25..8de458dcffaa 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -210,53 +210,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		seq_printf(m, " (frontbuffer: 0x%03x)", frontbuffer_bits);
 }
 
-static int i915_gem_object_list_info(struct seq_file *m, void *data)
-{
-	struct drm_info_node *node = m->private;
-	uintptr_t list = (uintptr_t) node->info_ent->data;
-	struct list_head *head;
-	struct drm_device *dev = node->minor->dev;
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	struct i915_vma *vma;
-	u64 total_obj_size, total_gtt_size;
-	int count, ret;
-
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
-
-	/* FIXME: the user of this interface might want more than just GGTT */
-	switch (list) {
-	case ACTIVE_LIST:
-		seq_puts(m, "Active:\n");
-		head = &ggtt->base.active_list;
-		break;
-	case INACTIVE_LIST:
-		seq_puts(m, "Inactive:\n");
-		head = &ggtt->base.inactive_list;
-		break;
-	default:
-		mutex_unlock(&dev->struct_mutex);
-		return -EINVAL;
-	}
-
-	total_obj_size = total_gtt_size = count = 0;
-	list_for_each_entry(vma, head, vm_link) {
-		seq_printf(m, "   ");
-		describe_obj(m, vma->obj);
-		seq_printf(m, "\n");
-		total_obj_size += vma->obj->base.size;
-		total_gtt_size += vma->node.size;
-		count++;
-	}
-	mutex_unlock(&dev->struct_mutex);
-
-	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
-		   count, total_obj_size, total_gtt_size);
-	return 0;
-}
-
 static int obj_rank_by_stolen(void *priv,
 			      struct list_head *A, struct list_head *B)
 {
@@ -5375,8 +5328,6 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_gem_objects", i915_gem_object_info, 0},
 	{"i915_gem_gtt", i915_gem_gtt_info, 0},
 	{"i915_gem_pinned", i915_gem_gtt_info, 0, (void *) PINNED_LIST},
-	{"i915_gem_active", i915_gem_object_list_info, 0, (void *) ACTIVE_LIST},
-	{"i915_gem_inactive", i915_gem_object_list_info, 0, (void *) INACTIVE_LIST},
 	{"i915_gem_stolen", i915_gem_stolen_list_info },
 	{"i915_gem_pageflip", i915_gem_pageflip_info, 0},
 	{"i915_gem_request", i915_gem_request_info, 0},
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 11/33] drm/i915: Focus debugfs/i915_gem_pinned to show only display pins
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (9 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 10/33] drm/i915: Remove inactive/active list from debugfs Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-09 10:39   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 12/33] drm/i915: Reduce i915_gem_objects to only show object information Chris Wilson
                   ` (27 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Only those objects pinned to the display have semi-permanent pins of a
global nature (other pins are transient within their local vm). Simplify
i915_gem_pinned to only show the pertinent information about the pinned
objects within the GGTT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 8de458dcffaa..9911594acbc9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -40,12 +40,6 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 
-enum {
-	ACTIVE_LIST,
-	INACTIVE_LIST,
-	PINNED_LIST,
-};
-
 /* As the drm_debugfs_init() routines are called before dev->dev_private is
  * allocated we need to hook into the minor for release. */
 static int
@@ -537,7 +531,6 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = m->private;
 	struct drm_device *dev = node->minor->dev;
-	uintptr_t list = (uintptr_t) node->info_ent->data;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
 	u64 total_obj_size, total_gtt_size;
@@ -549,7 +542,7 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
 
 	total_obj_size = total_gtt_size = count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		if (list == PINNED_LIST && !i915_gem_obj_is_pinned(obj))
+		if (!obj->pin_display)
 			continue;
 
 		seq_puts(m, "   ");
@@ -5327,7 +5320,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_capabilities", i915_capabilities, 0},
 	{"i915_gem_objects", i915_gem_object_info, 0},
 	{"i915_gem_gtt", i915_gem_gtt_info, 0},
-	{"i915_gem_pinned", i915_gem_gtt_info, 0, (void *) PINNED_LIST},
+	{"i915_gem_pinned", i915_gem_gtt_info, 0, 0},
 	{"i915_gem_stolen", i915_gem_stolen_list_info },
 	{"i915_gem_pageflip", i915_gem_pageflip_info, 0},
 	{"i915_gem_request", i915_gem_request_info, 0},
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 12/33] drm/i915: Reduce i915_gem_objects to only show object information
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (10 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 11/33] drm/i915: Focus debugfs/i915_gem_pinned to show only display pins Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-10  7:29   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 13/33] drm/i915: Remove redundant WARN_ON from __i915_add_request() Chris Wilson
                   ` (26 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Knowing how much of the GTT is consumed (both inside the mappable
aperture and beyond) is no longer relevant, and that output clutters the
real information - how many objects are allocated and bound (and by
whom) - which is what we need in order to quickly spot a leak.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 100 ++++++++++--------------------------
 1 file changed, 28 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 9911594acbc9..b41c05767def 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -269,17 +269,6 @@ static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	return 0;
 }
 
-#define count_objects(list, member) do { \
-	list_for_each_entry(obj, list, member) { \
-		size += i915_gem_obj_total_ggtt_size(obj); \
-		++count; \
-		if (obj->map_and_fenceable) { \
-			mappable_size += i915_gem_obj_ggtt_size(obj); \
-			++mappable_count; \
-		} \
-	} \
-} while (0)
-
 struct file_stats {
 	struct drm_i915_file_private *file_priv;
 	unsigned long count;
@@ -394,30 +383,16 @@ static void print_context_stats(struct seq_file *m,
 	print_file_stats(m, "[k]contexts", stats);
 }
 
-#define count_vmas(list, member) do { \
-	list_for_each_entry(vma, list, member) { \
-		size += i915_gem_obj_total_ggtt_size(vma->obj); \
-		++count; \
-		if (vma->obj->map_and_fenceable) { \
-			mappable_size += i915_gem_obj_ggtt_size(vma->obj); \
-			++mappable_count; \
-		} \
-	} \
-} while (0)
-
 static int i915_gem_object_info(struct seq_file *m, void* data)
 {
 	struct drm_info_node *node = m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	u32 count, mappable_count, purgeable_count;
-	u64 size, mappable_size, purgeable_size;
-	unsigned long pin_mapped_count = 0, pin_mapped_purgeable_count = 0;
-	u64 pin_mapped_size = 0, pin_mapped_purgeable_size = 0;
+	u32 count, mapped_count, purgeable_count, pin_count;
+	u64 size, mapped_size, purgeable_size, pin_size;
 	struct drm_i915_gem_object *obj;
 	struct drm_file *file;
-	struct i915_vma *vma;
 	int ret;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
@@ -428,70 +403,51 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 		   dev_priv->mm.object_count,
 		   dev_priv->mm.object_memory);
 
-	size = count = mappable_size = mappable_count = 0;
-	count_objects(&dev_priv->mm.bound_list, global_list);
-	seq_printf(m, "%u [%u] objects, %llu [%llu] bytes in gtt\n",
-		   count, mappable_count, size, mappable_size);
-
-	size = count = mappable_size = mappable_count = 0;
-	count_vmas(&ggtt->base.active_list, vm_link);
-	seq_printf(m, "  %u [%u] active objects, %llu [%llu] bytes\n",
-		   count, mappable_count, size, mappable_size);
-
-	size = count = mappable_size = mappable_count = 0;
-	count_vmas(&ggtt->base.inactive_list, vm_link);
-	seq_printf(m, "  %u [%u] inactive objects, %llu [%llu] bytes\n",
-		   count, mappable_count, size, mappable_size);
-
 	size = count = purgeable_size = purgeable_count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.unbound_list, global_list) {
-		size += obj->base.size, ++count;
-		if (obj->madv == I915_MADV_DONTNEED)
-			purgeable_size += obj->base.size, ++purgeable_count;
+		size += obj->base.size;
+		++count;
+
+		if (obj->madv == I915_MADV_DONTNEED) {
+			purgeable_size += obj->base.size;
+			++purgeable_count;
+		}
+
 		if (obj->mapping) {
-			pin_mapped_count++;
-			pin_mapped_size += obj->base.size;
-			if (obj->pages_pin_count == 0) {
-				pin_mapped_purgeable_count++;
-				pin_mapped_purgeable_size += obj->base.size;
-			}
+			mapped_count++;
+			mapped_size += obj->base.size;
 		}
 	}
 	seq_printf(m, "%u unbound objects, %llu bytes\n", count, size);
 
-	size = count = mappable_size = mappable_count = 0;
+	size = count = pin_size = pin_count = 0;
 	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
-		if (obj->fault_mappable) {
-			size += i915_gem_obj_ggtt_size(obj);
-			++count;
-		}
+		size += obj->base.size;
+		++count;
+
 		if (obj->pin_display) {
-			mappable_size += i915_gem_obj_ggtt_size(obj);
-			++mappable_count;
+			pin_size += obj->base.size;
+			++pin_count;
 		}
+
 		if (obj->madv == I915_MADV_DONTNEED) {
 			purgeable_size += obj->base.size;
 			++purgeable_count;
 		}
+
 		if (obj->mapping) {
-			pin_mapped_count++;
-			pin_mapped_size += obj->base.size;
-			if (obj->pages_pin_count == 0) {
-				pin_mapped_purgeable_count++;
-				pin_mapped_purgeable_size += obj->base.size;
-			}
+			mapped_count++;
+			mapped_size += obj->base.size;
 		}
 	}
+	seq_printf(m, "%u bound objects, %llu bytes\n",
+		   count, size);
 	seq_printf(m, "%u purgeable objects, %llu bytes\n",
 		   purgeable_count, purgeable_size);
-	seq_printf(m, "%u pinned mappable objects, %llu bytes\n",
-		   mappable_count, mappable_size);
-	seq_printf(m, "%u fault mappable objects, %llu bytes\n",
-		   count, size);
-	seq_printf(m,
-		   "%lu [%lu] pin mapped objects, %llu [%llu] bytes [purgeable]\n",
-		   pin_mapped_count, pin_mapped_purgeable_count,
-		   pin_mapped_size, pin_mapped_purgeable_size);
+	seq_printf(m, "%u mapped objects, %llu bytes\n",
+		   mapped_count, mapped_size);
+	seq_printf(m, "%u pinned objects, %llu bytes\n",
+		   pin_count, pin_size);
 
 	seq_printf(m, "%llu [%llu] gtt total\n",
 		   ggtt->base.total, ggtt->mappable_end - ggtt->base.start);
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 13/33] drm/i915: Remove redundant WARN_ON from __i915_add_request()
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (11 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 12/33] drm/i915: Reduce i915_gem_objects to only show object information Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-08  9:03   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 14/33] drm/i915: Create a VMA for an object Chris Wilson
                   ` (25 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

It's an outright programming error, so explode if it is ever hit.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_request.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index c6f523e2879c..0092f5e90cb2 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -463,18 +463,12 @@ static void i915_gem_mark_busy(const struct intel_engine_cs *engine)
  */
 void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
 {
-	struct intel_engine_cs *engine;
-	struct intel_ring *ring;
+	struct intel_engine_cs *engine = request->engine;
+	struct intel_ring *ring = request->ring;
 	u32 request_start;
 	u32 reserved_tail;
 	int ret;
 
-	if (WARN_ON(!request))
-		return;
-
-	engine = request->engine;
-	ring = request->ring;
-
 	/*
 	 * To ensure that this call will not fail, space for its emissions
 	 * should already have been reserved in the ring buffer. Let the ring
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 14/33] drm/i915: Create a VMA for an object
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (12 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 13/33] drm/i915: Remove redundant WARN_ON from __i915_add_request() Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-08  9:01   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 15/33] drm/i915: Track pinned vma inside guc Chris Wilson
                   ` (24 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

In many places we wish to store the VMA in preference to the object
itself, so having a means to explicitly create a persistent VMA is
useful.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h     |  2 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c | 10 ++++++++++
 drivers/gpu/drm/i915/i915_gem_gtt.h |  5 +++++
 3 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 826486d03e8e..2d8f32cd726d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3903,4 +3903,6 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
 	return false;
 }
 
+#define nullify(ptr) ({typeof(*ptr) T = *(ptr); *(ptr) = NULL; T;})
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 18c7c9644761..ce53f08186fa 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3388,6 +3388,16 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 }
 
 struct i915_vma *
+i915_vma_create(struct drm_i915_gem_object *obj,
+		struct i915_address_space *vm,
+		const struct i915_ggtt_view *view)
+{
+	GEM_BUG_ON(view ? i915_gem_obj_to_ggtt_view(obj, view) : i915_gem_obj_to_vma(obj, vm));
+
+	return __i915_gem_vma_create(obj, vm, view ?: &i915_ggtt_view_normal);
+}
+
+struct i915_vma *
 i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
 				  struct i915_address_space *vm)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index cc56206a1600..ac47663a4d32 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -232,6 +232,11 @@ struct i915_vma {
 	struct drm_i915_gem_exec_object2 *exec_entry;
 };
 
+struct i915_vma *
+i915_vma_create(struct drm_i915_gem_object *obj,
+		struct i915_address_space *vm,
+		const struct i915_ggtt_view *view);
+
 static inline bool i915_vma_is_ggtt(const struct i915_vma *vma)
 {
 	return vma->flags & I915_VMA_GGTT;
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 15/33] drm/i915: Track pinned vma inside guc
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (13 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 14/33] drm/i915: Create a VMA for an object Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-11 16:19   ` Dave Gordon
  2016-08-07 14:45 ` [PATCH 16/33] drm/i915: Convert fence computations to use vma directly Chris Wilson
                   ` (23 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Since the GuC allocates and pins an object into the GGTT for its usage,
it is more natural to use that pinned VMA as our resource cookie.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  10 +--
 drivers/gpu/drm/i915/i915_guc_submission.c | 131 ++++++++++++++---------------
 drivers/gpu/drm/i915/intel_guc.h           |   9 +-
 drivers/gpu/drm/i915/intel_guc_loader.c    |   7 +-
 4 files changed, 77 insertions(+), 80 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b41c05767def..e2a9fc353ef3 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2524,15 +2524,15 @@ static int i915_guc_log_dump(struct seq_file *m, void *data)
 	struct drm_info_node *node = m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct drm_i915_gem_object *log_obj = dev_priv->guc.log_obj;
-	u32 *log;
+	struct drm_i915_gem_object *obj;
 	int i = 0, pg;
 
-	if (!log_obj)
+	if (!dev_priv->guc.log)
 		return 0;
 
-	for (pg = 0; pg < log_obj->base.size / PAGE_SIZE; pg++) {
-		log = kmap_atomic(i915_gem_object_get_page(log_obj, pg));
+	obj = dev_priv->guc.log->obj;
+	for (pg = 0; pg < obj->base.size / PAGE_SIZE; pg++) {
+		u32 *log = kmap_atomic(i915_gem_object_get_page(obj, pg));
 
 		for (i = 0; i < PAGE_SIZE / sizeof(u32); i += 4)
 			seq_printf(m, "0x%08x 0x%08x 0x%08x 0x%08x\n",
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 03a5cef353eb..f56d68173ae6 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -183,7 +183,7 @@ static int guc_update_doorbell_id(struct intel_guc *guc,
 				  struct i915_guc_client *client,
 				  u16 new_id)
 {
-	struct sg_table *sg = guc->ctx_pool_obj->pages;
+	struct sg_table *sg = guc->ctx_pool->obj->pages;
 	void *doorbell_bitmap = guc->doorbell_bitmap;
 	struct guc_doorbell_info *doorbell;
 	struct guc_context_desc desc;
@@ -325,8 +325,8 @@ static void guc_init_proc_desc(struct intel_guc *guc,
 static void guc_init_ctx_desc(struct intel_guc *guc,
 			      struct i915_guc_client *client)
 {
-	struct drm_i915_gem_object *client_obj = client->client_obj;
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
+	struct drm_i915_gem_object *client_obj = client->client->obj;
 	struct intel_engine_cs *engine;
 	struct i915_gem_context *ctx = client->owner;
 	struct guc_context_desc desc;
@@ -380,7 +380,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 	 * The doorbell, process descriptor, and workqueue are all parts
 	 * of the client object, which the GuC will reference via the GGTT
 	 */
-	gfx_addr = i915_gem_obj_ggtt_offset(client_obj);
+	gfx_addr = client->client->node.start;
 	desc.db_trigger_phy = sg_dma_address(client_obj->pages->sgl) +
 				client->doorbell_offset;
 	desc.db_trigger_cpu = (uintptr_t)client->client_base +
@@ -397,7 +397,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 	desc.desc_private = (uintptr_t)client;
 
 	/* Pool context is pinned already */
-	sg = guc->ctx_pool_obj->pages;
+	sg = guc->ctx_pool->obj->pages;
 	sg_pcopy_from_buffer(sg->sgl, sg->nents, &desc, sizeof(desc),
 			     sizeof(desc) * client->ctx_index);
 }
@@ -410,7 +410,7 @@ static void guc_fini_ctx_desc(struct intel_guc *guc,
 
 	memset(&desc, 0, sizeof(desc));
 
-	sg = guc->ctx_pool_obj->pages;
+	sg = guc->ctx_pool->obj->pages;
 	sg_pcopy_from_buffer(sg->sgl, sg->nents, &desc, sizeof(desc),
 			     sizeof(desc) * client->ctx_index);
 }
@@ -492,7 +492,7 @@ static void guc_add_workqueue_item(struct i915_guc_client *gc,
 	/* WQ starts from the page after doorbell / process_desc */
 	wq_page = (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT;
 	wq_off &= PAGE_SIZE - 1;
-	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, wq_page));
+	base = kmap_atomic(i915_gem_object_get_page(gc->client->obj, wq_page));
 	wqi = (struct guc_wq_item *)((char *)base + wq_off);
 
 	/* Now fill in the 4-word work queue item */
@@ -611,8 +611,8 @@ static void i915_guc_submit(struct drm_i915_gem_request *rq)
  */
 
 /**
- * gem_allocate_guc_obj() - Allocate gem object for GuC usage
- * @dev_priv:	driver private data structure
+ * guc_allocate_vma() - Allocate a GGTT VMA for GuC usage
+ * @guc:	the guc
  * @size:	size of object
  *
  * This is a wrapper to create a gem obj. In order to use it inside GuC, the
@@ -621,45 +621,49 @@ static void i915_guc_submit(struct drm_i915_gem_request *rq)
  *
  * Return:	A drm_i915_gem_object if successful, otherwise NULL.
  */
-static struct drm_i915_gem_object *
-gem_allocate_guc_obj(struct drm_i915_private *dev_priv, u32 size)
+static struct i915_vma *guc_allocate_vma(struct intel_guc *guc, u32 size)
 {
+	struct drm_i915_private *dev_priv = guc_to_i915(guc);
 	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int ret;
 
 	obj = i915_gem_object_create(&dev_priv->drm, size);
 	if (IS_ERR(obj))
-		return NULL;
+		return ERR_CAST(obj);
 
-	if (i915_gem_object_get_pages(obj)) {
-		i915_gem_object_put(obj);
-		return NULL;
-	}
+	vma = i915_vma_create(obj, &dev_priv->ggtt.base, NULL);
+	if (IS_ERR(vma))
+		goto err;
 
-	if (i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE,
-				     PIN_OFFSET_BIAS | GUC_WOPCM_TOP)) {
-		i915_gem_object_put(obj);
-		return NULL;
+	ret = i915_vma_pin(vma, 0, PAGE_SIZE,
+			   PIN_GLOBAL | PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
+	if (ret) {
+		vma = ERR_PTR(ret);
+		goto err;
 	}
 
 	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
 	I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
 
-	return obj;
+	return vma;
+
+err:
+	i915_gem_object_put(obj);
+	return vma;
 }
 
 /**
- * gem_release_guc_obj() - Release gem object allocated for GuC usage
- * @obj:	gem obj to be released
+ * guc_release_vma() - Release the vma (and object) allocated for GuC usage
+ * @vma:	vma to be released
  */
-static void gem_release_guc_obj(struct drm_i915_gem_object *obj)
+static void guc_release_vma(struct i915_vma *vma)
 {
-	if (!obj)
+	if (!vma)
 		return;
 
-	if (i915_gem_obj_is_pinned(obj))
-		i915_gem_object_ggtt_unpin(obj);
-
-	i915_gem_object_put(obj);
+	i915_vma_unpin(vma);
+	i915_gem_object_put(vma->obj);
 }
 
 static void
@@ -686,7 +690,7 @@ guc_client_free(struct drm_i915_private *dev_priv,
 		kunmap(kmap_to_page(client->client_base));
 	}
 
-	gem_release_guc_obj(client->client_obj);
+	guc_release_vma(client->client);
 
 	if (client->ctx_index != GUC_INVALID_CTX_ID) {
 		guc_fini_ctx_desc(guc, client);
@@ -757,7 +761,7 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
 {
 	struct i915_guc_client *client;
 	struct intel_guc *guc = &dev_priv->guc;
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	uint16_t db_id;
 
 	client = kzalloc(sizeof(*client), GFP_KERNEL);
@@ -777,13 +781,13 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
 	}
 
 	/* The first page is doorbell/proc_desc. Two followed pages are wq. */
-	obj = gem_allocate_guc_obj(dev_priv, GUC_DB_SIZE + GUC_WQ_SIZE);
-	if (!obj)
+	vma = guc_allocate_vma(guc, GUC_DB_SIZE + GUC_WQ_SIZE);
+	if (IS_ERR(vma))
 		goto err;
 
 	/* We'll keep just the first (doorbell/proc) page permanently kmap'd. */
-	client->client_obj = obj;
-	client->client_base = kmap(i915_gem_object_get_page(obj, 0));
+	client->client = vma;
+	client->client_base = kmap(i915_gem_object_get_page(vma->obj, 0));
 	client->wq_offset = GUC_DB_SIZE;
 	client->wq_size = GUC_WQ_SIZE;
 
@@ -825,8 +829,7 @@ err:
 
 static void guc_create_log(struct intel_guc *guc)
 {
-	struct drm_i915_private *dev_priv = guc_to_i915(guc);
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	unsigned long offset;
 	uint32_t size, flags;
 
@@ -842,16 +845,16 @@ static void guc_create_log(struct intel_guc *guc)
 		GUC_LOG_ISR_PAGES + 1 +
 		GUC_LOG_CRASH_PAGES + 1) << PAGE_SHIFT;
 
-	obj = guc->log_obj;
-	if (!obj) {
-		obj = gem_allocate_guc_obj(dev_priv, size);
-		if (!obj) {
+	vma = guc->log;
+	if (!vma) {
+		vma = guc_allocate_vma(guc, size);
+		if (IS_ERR(vma)) {
 			/* logging will be off */
 			i915.guc_log_level = -1;
 			return;
 		}
 
-		guc->log_obj = obj;
+		guc->log = vma;
 	}
 
 	/* each allocated unit is a page */
@@ -860,7 +863,7 @@ static void guc_create_log(struct intel_guc *guc)
 		(GUC_LOG_ISR_PAGES << GUC_LOG_ISR_SHIFT) |
 		(GUC_LOG_CRASH_PAGES << GUC_LOG_CRASH_SHIFT);
 
-	offset = i915_gem_obj_ggtt_offset(obj) >> PAGE_SHIFT; /* in pages */
+	offset = vma->node.start >> PAGE_SHIFT; /* in pages */
 	guc->log_flags = (offset << GUC_LOG_BUF_ADDR_SHIFT) | flags;
 }
 
@@ -889,7 +892,7 @@ static void init_guc_policies(struct guc_policies *policies)
 static void guc_create_ads(struct intel_guc *guc)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	struct guc_ads *ads;
 	struct guc_policies *policies;
 	struct guc_mmio_reg_state *reg_state;
@@ -902,16 +905,16 @@ static void guc_create_ads(struct intel_guc *guc)
 			sizeof(struct guc_mmio_reg_state) +
 			GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE;
 
-	obj = guc->ads_obj;
-	if (!obj) {
-		obj = gem_allocate_guc_obj(dev_priv, PAGE_ALIGN(size));
-		if (!obj)
+	vma = guc->ads;
+	if (!vma) {
+		vma = guc_allocate_vma(guc, PAGE_ALIGN(size));
+		if (IS_ERR(vma))
 			return;
 
-		guc->ads_obj = obj;
+		guc->ads = vma;
 	}
 
-	page = i915_gem_object_get_page(obj, 0);
+	page = i915_gem_object_get_page(vma->obj, 0);
 	ads = kmap(page);
 
 	/*
@@ -931,8 +934,7 @@ static void guc_create_ads(struct intel_guc *guc)
 	policies = (void *)ads + sizeof(struct guc_ads);
 	init_guc_policies(policies);
 
-	ads->scheduler_policies = i915_gem_obj_ggtt_offset(obj) +
-			sizeof(struct guc_ads);
+	ads->scheduler_policies = vma->node.start + sizeof(struct guc_ads);
 
 	/* MMIO reg state */
 	reg_state = (void *)policies + sizeof(struct guc_policies);
@@ -960,10 +962,9 @@ static void guc_create_ads(struct intel_guc *guc)
  */
 int i915_guc_submission_init(struct drm_i915_private *dev_priv)
 {
-	const size_t ctxsize = sizeof(struct guc_context_desc);
-	const size_t poolsize = GUC_MAX_GPU_CONTEXTS * ctxsize;
-	const size_t gemsize = round_up(poolsize, PAGE_SIZE);
 	struct intel_guc *guc = &dev_priv->guc;
+	struct i915_vma *vma;
+	u32 size;
 
 	/* Wipe bitmap & delete client in case of reinitialisation */
 	bitmap_clear(guc->doorbell_bitmap, 0, GUC_MAX_DOORBELLS);
@@ -972,13 +973,15 @@ int i915_guc_submission_init(struct drm_i915_private *dev_priv)
 	if (!i915.enable_guc_submission)
 		return 0; /* not enabled  */
 
-	if (guc->ctx_pool_obj)
+	if (guc->ctx_pool)
 		return 0; /* already allocated */
 
-	guc->ctx_pool_obj = gem_allocate_guc_obj(dev_priv, gemsize);
-	if (!guc->ctx_pool_obj)
-		return -ENOMEM;
+	size = PAGE_ALIGN(GUC_MAX_GPU_CONTEXTS*sizeof(struct guc_context_desc));
+	vma = guc_allocate_vma(guc, size);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
+	guc->ctx_pool = vma;
 	ida_init(&guc->ctx_ids);
 	guc_create_log(guc);
 	guc_create_ads(guc);
@@ -1030,16 +1033,12 @@ void i915_guc_submission_fini(struct drm_i915_private *dev_priv)
 {
 	struct intel_guc *guc = &dev_priv->guc;
 
-	gem_release_guc_obj(dev_priv->guc.ads_obj);
-	guc->ads_obj = NULL;
-
-	gem_release_guc_obj(dev_priv->guc.log_obj);
-	guc->log_obj = NULL;
+	guc_release_vma(nullify(&guc->ads));
+	guc_release_vma(nullify(&guc->log));
 
-	if (guc->ctx_pool_obj)
+	if (guc->ctx_pool)
 		ida_destroy(&guc->ctx_ids);
-	gem_release_guc_obj(guc->ctx_pool_obj);
-	guc->ctx_pool_obj = NULL;
+	guc_release_vma(nullify(&guc->ctx_pool));
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 623cf26cd784..a8da563cadb7 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -63,7 +63,7 @@ struct drm_i915_gem_request;
  *   retcode: errno from last guc_submit()
  */
 struct i915_guc_client {
-	struct drm_i915_gem_object *client_obj;
+	struct i915_vma *client;
 	void *client_base;		/* first page (only) of above	*/
 	struct i915_gem_context *owner;
 	struct intel_guc *guc;
@@ -125,11 +125,10 @@ struct intel_guc_fw {
 struct intel_guc {
 	struct intel_guc_fw guc_fw;
 	uint32_t log_flags;
-	struct drm_i915_gem_object *log_obj;
+	struct i915_vma *log;
 
-	struct drm_i915_gem_object *ads_obj;
-
-	struct drm_i915_gem_object *ctx_pool_obj;
+	struct i915_vma *ads;
+	struct i915_vma *ctx_pool;
 	struct ida ctx_ids;
 
 	struct i915_guc_client *execbuf_client;
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index 3763e30cc165..58ef4418a2ef 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -181,16 +181,15 @@ static void set_guc_init_params(struct drm_i915_private *dev_priv)
 			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
 	}
 
-	if (guc->ads_obj) {
-		u32 ads = (u32)i915_gem_obj_ggtt_offset(guc->ads_obj)
-				>> PAGE_SHIFT;
+	if (guc->ads) {
+		u32 ads = (u32)guc->ads->node.start >> PAGE_SHIFT;
 		params[GUC_CTL_DEBUG] |= ads << GUC_ADS_ADDR_SHIFT;
 		params[GUC_CTL_DEBUG] |= GUC_ADS_ENABLED;
 	}
 
 	/* If GuC submission is enabled, set up additional parameters here */
 	if (i915.enable_guc_submission) {
-		u32 pgs = i915_gem_obj_ggtt_offset(dev_priv->guc.ctx_pool_obj);
+		u32 pgs = dev_priv->guc.ctx_pool->node.start;
 		u32 ctx_in_16 = GUC_MAX_GPU_CONTEXTS / 16;
 
 		pgs >>= PAGE_SHIFT;
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 16/33] drm/i915: Convert fence computations to use vma directly
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (14 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 15/33] drm/i915: Track pinned vma inside guc Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-09 10:27   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 17/33] drm/i915: Use VMA directly for checking tiling parameters Chris Wilson
                   ` (22 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Look up the GGTT VMA once for the object assigned to the fence, and then
derive everything from that VMA.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_fence.c | 55 +++++++++++++++++------------------
 1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
index 9e8173fe2a09..60749cd23f20 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence.c
@@ -85,22 +85,19 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 	POSTING_READ(fence_reg_lo);
 
 	if (obj) {
-		u32 size = i915_gem_obj_ggtt_size(obj);
+		struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
 		unsigned int tiling = i915_gem_object_get_tiling(obj);
 		unsigned int stride = i915_gem_object_get_stride(obj);
-		uint64_t val;
+		u64 size = vma->node.size;
+		u32 row_size = stride * (tiling == I915_TILING_Y ? 32 : 8);
+		u64 val;
 
 		/* Adjust fence size to match tiled area */
-		if (tiling != I915_TILING_NONE) {
-			uint32_t row_size = stride *
-				(tiling == I915_TILING_Y ? 32 : 8);
-			size = (size / row_size) * row_size;
-		}
+		size = size / row_size * row_size;
 
-		val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
-				 0xfffff000) << 32;
-		val |= i915_gem_obj_ggtt_offset(obj) & 0xfffff000;
-		val |= (uint64_t)((stride / 128) - 1) << fence_pitch_shift;
+		val = ((vma->node.start + size - 4096) & 0xfffff000) << 32;
+		val |= vma->node.start & 0xfffff000;
+		val |= (u64)((stride / 128) - 1) << fence_pitch_shift;
 		if (tiling == I915_TILING_Y)
 			val |= 1 << I965_FENCE_TILING_Y_SHIFT;
 		val |= I965_FENCE_REG_VALID;
@@ -123,17 +120,17 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
 	u32 val;
 
 	if (obj) {
-		u32 size = i915_gem_obj_ggtt_size(obj);
+		struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
 		unsigned int tiling = i915_gem_object_get_tiling(obj);
 		unsigned int stride = i915_gem_object_get_stride(obj);
 		int pitch_val;
 		int tile_width;
 
-		WARN((i915_gem_obj_ggtt_offset(obj) & ~I915_FENCE_START_MASK) ||
-		     (size & -size) != size ||
-		     (i915_gem_obj_ggtt_offset(obj) & (size - 1)),
-		     "object 0x%08llx [fenceable? %d] not 1M or pot-size (0x%08x) aligned\n",
-		     i915_gem_obj_ggtt_offset(obj), obj->map_and_fenceable, size);
+		WARN((vma->node.start & ~I915_FENCE_START_MASK) ||
+		     !is_power_of_2(vma->node.size) ||
+		     (vma->node.start & (vma->node.size - 1)),
+		     "object 0x%08llx [fenceable? %d] not 1M or pot-size (0x%08llx) aligned\n",
+		     vma->node.start, obj->map_and_fenceable, vma->node.size);
 
 		if (tiling == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev))
 			tile_width = 128;
@@ -144,10 +141,10 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
 		pitch_val = stride / tile_width;
 		pitch_val = ffs(pitch_val) - 1;
 
-		val = i915_gem_obj_ggtt_offset(obj);
+		val = vma->node.start;
 		if (tiling == I915_TILING_Y)
 			val |= 1 << I830_FENCE_TILING_Y_SHIFT;
-		val |= I915_FENCE_SIZE_BITS(size);
+		val |= I915_FENCE_SIZE_BITS(vma->node.size);
 		val |= pitch_val << I830_FENCE_PITCH_SHIFT;
 		val |= I830_FENCE_REG_VALID;
 	} else
@@ -161,27 +158,27 @@ static void i830_write_fence_reg(struct drm_device *dev, int reg,
 				struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
-	uint32_t val;
+	u32 val;
 
 	if (obj) {
-		u32 size = i915_gem_obj_ggtt_size(obj);
+		struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
 		unsigned int tiling = i915_gem_object_get_tiling(obj);
 		unsigned int stride = i915_gem_object_get_stride(obj);
-		uint32_t pitch_val;
+		u32 pitch_val;
 
-		WARN((i915_gem_obj_ggtt_offset(obj) & ~I830_FENCE_START_MASK) ||
-		     (size & -size) != size ||
-		     (i915_gem_obj_ggtt_offset(obj) & (size - 1)),
-		     "object 0x%08llx not 512K or pot-size 0x%08x aligned\n",
-		     i915_gem_obj_ggtt_offset(obj), size);
+		WARN((vma->node.start & ~I830_FENCE_START_MASK) ||
+		     !is_power_of_2(vma->node.size) ||
+		     (vma->node.start & (vma->node.size - 1)),
+		     "object 0x%08llx not 512K or pot-size 0x%08llx aligned\n",
+		     vma->node.start, vma->node.size);
 
 		pitch_val = stride / 128;
 		pitch_val = ffs(pitch_val) - 1;
 
-		val = i915_gem_obj_ggtt_offset(obj);
+		val = vma->node.start;
 		if (tiling == I915_TILING_Y)
 			val |= 1 << I830_FENCE_TILING_Y_SHIFT;
-		val |= I830_FENCE_SIZE_BITS(size);
+		val |= I830_FENCE_SIZE_BITS(vma->node.size);
 		val |= pitch_val << I830_FENCE_PITCH_SHIFT;
 		val |= I830_FENCE_REG_VALID;
 	} else
-- 
2.8.1
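One detail worth spelling out from the hunks above: the patch replaces the open-coded `(size & -size) != size` test with `!is_power_of_2(vma->node.size)`. A small kernel-independent sketch (names assumed) showing the two forms agree for every nonzero size:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old open-coded form: size & -size isolates the lowest set bit, which
 * equals size itself only when exactly one bit is set. Note it also
 * accepts size == 0, unlike is_power_of_2(). */
bool open_coded_pot(uint64_t size)
{
    return (size & -size) == size;
}

/* Shape of the kernel's is_power_of_2() helper, sketched here. */
bool is_power_of_2_sketch(uint64_t size)
{
    return size != 0 && (size & (size - 1)) == 0;
}
```

Only size == 0 distinguishes them, which should not occur for a bound vma node.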


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 17/33] drm/i915: Use VMA directly for checking tiling parameters
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (15 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 16/33] drm/i915: Convert fence computations to use vma directly Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-09  6:18   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 18/33] drm/i915: Use VMA as the primary object for context state Chris Wilson
                   ` (21 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_tiling.c | 47 ++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index f4b984de83b5..2ceaddc959d3 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -117,34 +117,45 @@ i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode)
 }
 
 /* Is the current GTT allocation valid for the change in tiling? */
-static bool
+static int
 i915_gem_object_fence_ok(struct drm_i915_gem_object *obj, int tiling_mode)
 {
 	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
+	struct i915_vma *vma;
 	u32 size;
 
 	if (tiling_mode == I915_TILING_NONE)
-		return true;
+		return 0;
 
 	if (INTEL_GEN(dev_priv) >= 4)
-		return true;
+		return 0;
+
+	vma = i915_gem_obj_to_ggtt(obj);
+	if (!vma)
+		return 0;
+
+	if (!obj->map_and_fenceable)
+		return 0;
 
 	if (IS_GEN3(dev_priv)) {
-		if (i915_gem_obj_ggtt_offset(obj) & ~I915_FENCE_START_MASK)
-			return false;
+		if (vma->node.start & ~I915_FENCE_START_MASK)
+			goto bad;
 	} else {
-		if (i915_gem_obj_ggtt_offset(obj) & ~I830_FENCE_START_MASK)
-			return false;
+		if (vma->node.start & ~I830_FENCE_START_MASK)
+			goto bad;
 	}
 
 	size = i915_gem_get_ggtt_size(dev_priv, obj->base.size, tiling_mode);
-	if (i915_gem_obj_ggtt_size(obj) != size)
-		return false;
+	if (vma->node.size < size)
+		goto bad;
 
-	if (i915_gem_obj_ggtt_offset(obj) & (size - 1))
-		return false;
+	if (vma->node.start & (size - 1))
+		goto bad;
 
-	return true;
+	return 0;
+
+bad:
+	return i915_vma_unbind(vma);
 }
 
 /**
@@ -168,7 +179,7 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 	struct drm_i915_gem_set_tiling *args = data;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj;
-	int ret = 0;
+	int err = 0;
 
 	/* Make sure we don't cross-contaminate obj->tiling_and_stride */
 	BUILD_BUG_ON(I915_TILING_LAST & STRIDE_MASK);
@@ -187,7 +198,7 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 
 	mutex_lock(&dev->struct_mutex);
 	if (obj->pin_display || obj->framebuffer_references) {
-		ret = -EBUSY;
+		err = -EBUSY;
 		goto err;
 	}
 
@@ -234,11 +245,9 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
 		 * has to also include the unfenced register the GPU uses
 		 * whilst executing a fenced command for an untiled object.
 		 */
-		if (obj->map_and_fenceable &&
-		    !i915_gem_object_fence_ok(obj, args->tiling_mode))
-			ret = i915_vma_unbind(i915_gem_obj_to_ggtt(obj));
 
-		if (ret == 0) {
+		err = i915_gem_object_fence_ok(obj, args->tiling_mode);
+		if (!err) {
 			if (obj->pages &&
 			    obj->madv == I915_MADV_WILLNEED &&
 			    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
@@ -281,7 +290,7 @@ err:
 
 	intel_runtime_pm_put(dev_priv);
 
-	return ret;
+	return err;
 }
 
 /**
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 18/33] drm/i915: Use VMA as the primary object for context state
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (16 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 17/33] drm/i915: Use VMA directly for checking tiling parameters Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-10  8:03   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 19/33] drm/i915: Only clflush the context object when binding Chris Wilson
                   ` (20 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

When working with contexts, what we want first and foremost is the GGTT
VMA for the context state. Since the object is available via the VMA, we
need only store the VMA.
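A minimal sketch of the access pattern this enables (struct fields pared down to the ones used here; the page-index constant is an assumption): once the VMA is the stored handle, both the GGTT offset and the backing object are one dereference away.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced stand-ins for the driver structs touched by the patch. */
struct drm_mm_node { uint64_t start, size; };
struct drm_i915_gem_object_sketch { int dirty; };
struct i915_vma_sketch {
    struct drm_i915_gem_object_sketch *obj;
    struct drm_mm_node node;
};

#define PAGE_SIZE_SKETCH 4096ull
#define LRC_STATE_PN_SKETCH 1   /* assumed: state page follows the PPHWSP */

/* The register state lives LRC_STATE_PN pages into the context image,
 * so its GGTT address falls straight out of the vma. */
uint64_t lrc_state_offset(const struct i915_vma_sketch *state)
{
    return state->node.start + LRC_STATE_PN_SKETCH * PAGE_SIZE_SKETCH;
}
```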

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        | 33 +++++++--------
 drivers/gpu/drm/i915/i915_drv.h            |  3 +-
 drivers/gpu/drm/i915/i915_gem_context.c    | 60 ++++++++++++++--------------
 drivers/gpu/drm/i915/i915_gpu_error.c      |  7 ++--
 drivers/gpu/drm/i915/i915_guc_submission.c |  6 +--
 drivers/gpu/drm/i915/intel_lrc.c           | 64 +++++++++++++++---------------
 drivers/gpu/drm/i915/intel_ringbuffer.c    |  6 +--
 7 files changed, 88 insertions(+), 91 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e2a9fc353ef3..6a03fa7b6264 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -354,7 +354,7 @@ static int per_file_ctx_stats(int id, void *ptr, void *data)
 
 	for (n = 0; n < ARRAY_SIZE(ctx->engine); n++) {
 		if (ctx->engine[n].state)
-			per_file_stats(0, ctx->engine[n].state, data);
+			per_file_stats(0, ctx->engine[n].state->obj, data);
 		if (ctx->engine[n].ring)
 			per_file_stats(0, ctx->engine[n].ring->obj, data);
 	}
@@ -1976,7 +1976,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 			seq_printf(m, "%s: ", engine->name);
 			seq_putc(m, ce->initialised ? 'I' : 'i');
 			if (ce->state)
-				describe_obj(m, ce->state);
+				describe_obj(m, ce->state->obj);
 			if (ce->ring)
 				describe_ctx_ring(m, ce->ring);
 			seq_putc(m, '\n');
@@ -1994,36 +1994,33 @@ static void i915_dump_lrc_obj(struct seq_file *m,
 			      struct i915_gem_context *ctx,
 			      struct intel_engine_cs *engine)
 {
-	struct drm_i915_gem_object *ctx_obj = ctx->engine[engine->id].state;
+	struct i915_vma *vma = ctx->engine[engine->id].state;
 	struct page *page;
-	uint32_t *reg_state;
 	int j;
-	unsigned long ggtt_offset = 0;
 
 	seq_printf(m, "CONTEXT: %s %u\n", engine->name, ctx->hw_id);
 
-	if (ctx_obj == NULL) {
-		seq_puts(m, "\tNot allocated\n");
+	if (!vma) {
+		seq_puts(m, "\tFake context\n");
 		return;
 	}
 
-	if (!i915_gem_obj_ggtt_bound(ctx_obj))
-		seq_puts(m, "\tNot bound in GGTT\n");
-	else
-		ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
+	if (vma->flags & I915_VMA_GLOBAL_BIND)
+		seq_printf(m, "\tBound in GGTT at 0x%x\n",
+			   lower_32_bits(vma->node.start));
 
-	if (i915_gem_object_get_pages(ctx_obj)) {
-		seq_puts(m, "\tFailed to get pages for context object\n");
+	if (i915_gem_object_get_pages(vma->obj)) {
+		seq_puts(m, "\tFailed to get pages for context object\n\n");
 		return;
 	}
 
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
-	if (!WARN_ON(page == NULL)) {
-		reg_state = kmap_atomic(page);
+	page = i915_gem_object_get_page(vma->obj, LRC_STATE_PN);
+	if (page) {
+		u32 *reg_state = kmap_atomic(page);
 
 		for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
-			seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
-				   ggtt_offset + 4096 + (j * 4),
+			seq_printf(m, "\t[0x%08llx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
+				   vma->node.start + 4096 + (j * 4),
 				   reg_state[j], reg_state[j + 1],
 				   reg_state[j + 2], reg_state[j + 3]);
 		}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2d8f32cd726d..143b42b6545e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -892,9 +892,8 @@ struct i915_gem_context {
 	u32 ggtt_alignment;
 
 	struct intel_context {
-		struct drm_i915_gem_object *state;
+		struct i915_vma *state;
 		struct intel_ring *ring;
-		struct i915_vma *lrc_vma;
 		uint32_t *lrc_reg_state;
 		u64 lrc_desc;
 		int pin_count;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index bb72af5320b0..aa0419faeb34 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -155,7 +155,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 		if (ce->ring)
 			intel_ring_free(ce->ring);
 
-		i915_gem_object_put(ce->state);
+		i915_gem_object_put(ce->state->obj);
 	}
 
 	list_del(&ctx->link);
@@ -281,13 +281,24 @@ __create_hw_context(struct drm_device *dev,
 	ctx->ggtt_alignment = get_context_alignment(dev_priv);
 
 	if (dev_priv->hw_context_size) {
-		struct drm_i915_gem_object *obj =
-				i915_gem_alloc_context_obj(dev, dev_priv->hw_context_size);
+		struct drm_i915_gem_object *obj;
+		struct i915_vma *vma;
+
+		obj = i915_gem_alloc_context_obj(dev,
+						 dev_priv->hw_context_size);
 		if (IS_ERR(obj)) {
 			ret = PTR_ERR(obj);
 			goto err_out;
 		}
-		ctx->engine[RCS].state = obj;
+
+		vma = i915_vma_create(obj, &dev_priv->ggtt.base, NULL);
+		if (IS_ERR(vma)) {
+			i915_gem_object_put(obj);
+			ret = PTR_ERR(vma);
+			goto err_out;
+		}
+
+		ctx->engine[RCS].state = vma;
 	}
 
 	/* Default context will never have a file_priv */
@@ -399,7 +410,7 @@ static void i915_gem_context_unpin(struct i915_gem_context *ctx,
 		struct intel_context *ce = &ctx->engine[engine->id];
 
 		if (ce->state)
-			i915_gem_object_ggtt_unpin(ce->state);
+			i915_vma_unpin(ce->state);
 
 		i915_gem_context_put(ctx);
 	}
@@ -620,9 +631,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_emit(ring, MI_SET_CONTEXT);
-	intel_ring_emit(ring,
-			i915_gem_obj_ggtt_offset(req->ctx->engine[RCS].state) |
-			flags);
+	intel_ring_emit(ring, req->ctx->engine[RCS].state->node.start | flags);
 	/*
 	 * w/a: MI_SET_CONTEXT must always be followed by MI_NOOP
 	 * WaMiSetContext_Hang:snb,ivb,vlv
@@ -763,8 +772,9 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
 		return 0;
 
 	/* Trying to pin first makes error handling easier. */
-	ret = i915_gem_object_ggtt_pin(to->engine[RCS].state, NULL, 0,
-				       to->ggtt_alignment, 0);
+	ret = i915_vma_pin(to->engine[RCS].state,
+			   0, to->ggtt_alignment,
+			   PIN_GLOBAL);
 	if (ret)
 		return ret;
 
@@ -778,16 +788,12 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
 	from = engine->last_context;
 
 	/*
-	 * Clear this page out of any CPU caches for coherent swap-in/out. Note
-	 * that thanks to write = false in this call and us not setting any gpu
-	 * write domains when putting a context object onto the active list
-	 * (when switching away from it), this won't block.
-	 *
-	 * XXX: We need a real interface to do this instead of trickery.
+	 * Clear this page out of any CPU caches for coherent swap-in/out.
 	 */
-	ret = i915_gem_object_set_to_gtt_domain(to->engine[RCS].state, false);
+	ret = i915_gem_object_set_to_gtt_domain(to->engine[RCS].state->obj,
+						false);
 	if (ret)
-		goto unpin_out;
+		goto unpin_vma;
 
 	if (needs_pd_load_pre(ppgtt, engine, to)) {
 		/* Older GENs and non render rings still want the load first,
@@ -797,7 +803,7 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
 		trace_switch_mm(engine, to);
 		ret = ppgtt->switch_mm(ppgtt, req);
 		if (ret)
-			goto unpin_out;
+			goto unpin_vma;
 	}
 
 	if (!to->engine[RCS].initialised || i915_gem_context_is_default(to))
@@ -814,7 +820,7 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
 	if (to != from || (hw_flags & MI_FORCE_RESTORE)) {
 		ret = mi_set_context(req, hw_flags);
 		if (ret)
-			goto unpin_out;
+			goto unpin_vma;
 	}
 
 	/* The backing object for the context is done after switching to the
@@ -824,8 +830,6 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
 	 * MI_SET_CONTEXT instead of when the next seqno has completed.
 	 */
 	if (from != NULL) {
-		struct drm_i915_gem_object *obj = from->engine[RCS].state;
-
 		/* As long as MI_SET_CONTEXT is serializing, ie. it flushes the
 		 * whole damn pipeline, we don't need to explicitly mark the
 		 * object dirty. The only exception is that the context must be
@@ -833,11 +837,9 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
 		 * able to defer doing this until we know the object would be
 		 * swapped, but there is no way to do that yet.
 		 */
-		obj->base.read_domains = I915_GEM_DOMAIN_INSTRUCTION;
-		i915_vma_move_to_active(i915_gem_obj_to_ggtt(obj), req, 0);
-
-		/* obj is kept alive until the next request by its active ref */
-		i915_gem_object_ggtt_unpin(obj);
+		i915_vma_move_to_active(from->engine[RCS].state, req, 0);
+		/* state is kept alive until the next request */
+		i915_vma_unpin(from->engine[RCS].state);
 		i915_gem_context_put(from);
 	}
 	engine->last_context = i915_gem_context_get(to);
@@ -882,8 +884,8 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
 
 	return 0;
 
-unpin_out:
-	i915_gem_object_ggtt_unpin(to->engine[RCS].state);
+unpin_vma:
+	i915_vma_unpin(to->engine[RCS].state);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index c621fa23cd28..21a4d0220c17 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1077,9 +1077,10 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 					i915_error_ggtt_object_create(dev_priv,
 								      engine->scratch.obj);
 
-			ee->ctx =
-				i915_error_ggtt_object_create(dev_priv,
-							      request->ctx->engine[i].state);
+			if (request->ctx->engine[i].state) {
+				ee->ctx = i915_error_ggtt_object_create(dev_priv,
+									request->ctx->engine[i].state->obj);
+			}
 
 			if (request->pid) {
 				struct task_struct *task;
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index f56d68173ae6..03a4d2ae71db 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -358,7 +358,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 		lrc->context_desc = lower_32_bits(ce->lrc_desc);
 
 		/* The state page is after PPHWSP */
-		gfx_addr = i915_gem_obj_ggtt_offset(ce->state);
+		gfx_addr = ce->state->node.start;
 		lrc->ring_lcra = gfx_addr + LRC_STATE_PN * PAGE_SIZE;
 		lrc->context_id = (client->ctx_index << GUC_ELC_CTXID_OFFSET) |
 				(engine->guc_id << GUC_ELC_ENGINE_OFFSET);
@@ -1061,7 +1061,7 @@ int intel_guc_suspend(struct drm_device *dev)
 	/* any value greater than GUC_POWER_D0 */
 	data[1] = GUC_POWER_D1;
 	/* first page is shared data with GuC */
-	data[2] = i915_gem_obj_ggtt_offset(ctx->engine[RCS].state);
+	data[2] = ctx->engine[RCS].state->node.start;
 
 	return host2guc_action(guc, data, ARRAY_SIZE(data));
 }
@@ -1086,7 +1086,7 @@ int intel_guc_resume(struct drm_device *dev)
 	data[0] = HOST2GUC_ACTION_EXIT_S_STATE;
 	data[1] = GUC_POWER_D0;
 	/* first page is shared data with GuC */
-	data[2] = i915_gem_obj_ggtt_offset(ctx->engine[RCS].state);
+	data[2] = ctx->engine[RCS].state->node.start;
 
 	return host2guc_action(guc, data, ARRAY_SIZE(data));
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c7f4b64b16f6..74c08bf5d136 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -315,7 +315,7 @@ intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
 
 	desc = ctx->desc_template;				/* bits  3-4  */
 	desc |= engine->ctx_desc_template;			/* bits  0-11 */
-	desc |= ce->lrc_vma->node.start + LRC_PPHWSP_PN * PAGE_SIZE;
+	desc |= ce->state->node.start + LRC_PPHWSP_PN * PAGE_SIZE;
 								/* bits 12-31 */
 	desc |= (u64)ctx->hw_id << GEN8_CTX_ID_SHIFT;		/* bits 32-52 */
 
@@ -763,7 +763,6 @@ void intel_execlists_cancel_requests(struct intel_engine_cs *engine)
 static int intel_lr_context_pin(struct i915_gem_context *ctx,
 				struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = ctx->i915;
 	struct intel_context *ce = &ctx->engine[engine->id];
 	void *vaddr;
 	u32 *lrc_reg_state;
@@ -774,16 +773,15 @@ static int intel_lr_context_pin(struct i915_gem_context *ctx,
 	if (ce->pin_count++)
 		return 0;
 
-	ret = i915_gem_object_ggtt_pin(ce->state, NULL,
-				       0, GEN8_LR_CONTEXT_ALIGN,
-				       PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
+	ret = i915_vma_pin(ce->state, 0, GEN8_LR_CONTEXT_ALIGN,
+			   PIN_OFFSET_BIAS | GUC_WOPCM_TOP | PIN_GLOBAL);
 	if (ret)
 		goto err;
 
-	vaddr = i915_gem_object_pin_map(ce->state);
+	vaddr = i915_gem_object_pin_map(ce->state->obj);
 	if (IS_ERR(vaddr)) {
 		ret = PTR_ERR(vaddr);
-		goto unpin_ctx_obj;
+		goto unpin_vma;
 	}
 
 	lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
@@ -792,24 +790,25 @@ static int intel_lr_context_pin(struct i915_gem_context *ctx,
 	if (ret)
 		goto unpin_map;
 
-	ce->lrc_vma = i915_gem_obj_to_ggtt(ce->state);
 	intel_lr_context_descriptor_update(ctx, engine);
 
 	lrc_reg_state[CTX_RING_BUFFER_START+1] = ce->ring->vma->node.start;
 	ce->lrc_reg_state = lrc_reg_state;
-	ce->state->dirty = true;
+	ce->state->obj->dirty = true;
 
 	/* Invalidate GuC TLB. */
-	if (i915.enable_guc_submission)
+	if (i915.enable_guc_submission) {
+		struct drm_i915_private *dev_priv = ctx->i915;
 		I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
+	}
 
 	i915_gem_context_get(ctx);
 	return 0;
 
 unpin_map:
-	i915_gem_object_unpin_map(ce->state);
-unpin_ctx_obj:
-	i915_gem_object_ggtt_unpin(ce->state);
+	i915_gem_object_unpin_map(ce->state->obj);
+unpin_vma:
+	__i915_vma_unpin(ce->state);
 err:
 	ce->pin_count = 0;
 	return ret;
@@ -828,12 +827,8 @@ void intel_lr_context_unpin(struct i915_gem_context *ctx,
 
 	intel_ring_unpin(ce->ring);
 
-	i915_gem_object_unpin_map(ce->state);
-	i915_gem_object_ggtt_unpin(ce->state);
-
-	ce->lrc_vma = NULL;
-	ce->lrc_desc = 0;
-	ce->lrc_reg_state = NULL;
+	i915_gem_object_unpin_map(ce->state->obj);
+	i915_vma_unpin(ce->state);
 
 	i915_gem_context_put(ctx);
 }
@@ -1747,19 +1742,18 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
 }
 
 static int
-lrc_setup_hws(struct intel_engine_cs *engine,
-	      struct drm_i915_gem_object *dctx_obj)
+lrc_setup_hws(struct intel_engine_cs *engine, struct i915_vma *vma)
 {
 	void *hws;
 
 	/* The HWSP is part of the default context object in LRC mode. */
-	engine->status_page.gfx_addr = i915_gem_obj_ggtt_offset(dctx_obj) +
-				       LRC_PPHWSP_PN * PAGE_SIZE;
-	hws = i915_gem_object_pin_map(dctx_obj);
+	engine->status_page.gfx_addr =
+		vma->node.start + LRC_PPHWSP_PN * PAGE_SIZE;
+	hws = i915_gem_object_pin_map(vma->obj);
 	if (IS_ERR(hws))
 		return PTR_ERR(hws);
 	engine->status_page.page_addr = hws + LRC_PPHWSP_PN * PAGE_SIZE;
-	engine->status_page.obj = dctx_obj;
+	engine->status_page.obj = vma->obj;
 
 	return 0;
 }
@@ -2131,6 +2125,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 {
 	struct drm_i915_gem_object *ctx_obj;
 	struct intel_context *ce = &ctx->engine[engine->id];
+	struct i915_vma *vma;
 	uint32_t context_size;
 	struct intel_ring *ring;
 	int ret;
@@ -2148,6 +2143,12 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 		return PTR_ERR(ctx_obj);
 	}
 
+	vma = i915_vma_create(ctx_obj, &ctx->i915->ggtt.base, NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto error_deref_obj;
+	}
+
 	ring = intel_engine_create_ring(engine, ctx->ring_size);
 	if (IS_ERR(ring)) {
 		ret = PTR_ERR(ring);
@@ -2161,7 +2162,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 	}
 
 	ce->ring = ring;
-	ce->state = ctx_obj;
+	ce->state = vma;
 	ce->initialised = engine->init_context == NULL;
 
 	return 0;
@@ -2170,8 +2171,6 @@ error_ring_free:
 	intel_ring_free(ring);
 error_deref_obj:
 	i915_gem_object_put(ctx_obj);
-	ce->ring = NULL;
-	ce->state = NULL;
 	return ret;
 }
 
@@ -2182,24 +2181,23 @@ void intel_lr_context_reset(struct drm_i915_private *dev_priv,
 
 	for_each_engine(engine, dev_priv) {
 		struct intel_context *ce = &ctx->engine[engine->id];
-		struct drm_i915_gem_object *ctx_obj = ce->state;
 		void *vaddr;
 		uint32_t *reg_state;
 
-		if (!ctx_obj)
+		if (!ce->state)
 			continue;
 
-		vaddr = i915_gem_object_pin_map(ctx_obj);
+		vaddr = i915_gem_object_pin_map(ce->state->obj);
 		if (WARN_ON(IS_ERR(vaddr)))
 			continue;
 
 		reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
-		ctx_obj->dirty = true;
 
 		reg_state[CTX_RING_HEAD+1] = 0;
 		reg_state[CTX_RING_TAIL+1] = 0;
 
-		i915_gem_object_unpin_map(ctx_obj);
+		ce->state->obj->dirty = true;
+		i915_gem_object_unpin_map(ce->state->obj);
 
 		ce->ring->head = 0;
 		ce->ring->tail = 0;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 09f01c641c14..5a383430e91d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2092,8 +2092,8 @@ static int intel_ring_context_pin(struct i915_gem_context *ctx,
 		return 0;
 
 	if (ce->state) {
-		ret = i915_gem_object_ggtt_pin(ce->state, NULL, 0,
-					       ctx->ggtt_alignment, PIN_HIGH);
+		ret = i915_vma_pin(ce->state, 0, ctx->ggtt_alignment,
+				   PIN_GLOBAL | PIN_HIGH);
 		if (ret)
 			goto error;
 	}
@@ -2127,7 +2127,7 @@ static void intel_ring_context_unpin(struct i915_gem_context *ctx,
 		return;
 
 	if (ce->state)
-		i915_gem_object_ggtt_unpin(ce->state);
+		i915_vma_unpin(ce->state);
 
 	i915_gem_context_put(ctx);
 }
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 19/33] drm/i915: Only clflush the context object when binding
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (17 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 18/33] drm/i915: Use VMA as the primary object for context state Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-10  8:41   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 20/33] drm/i915: Use VMA for ringbuffer tracking Chris Wilson
                   ` (19 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

We know that the only access to the context object is via the GPU, and
the only time when it can be out of the GPU domain is when it is swapped
out and unbound. Therefore we only need to clflush the object when
binding.
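A hedged model of that rule (the flag value below is an assumption, not the driver's actual I915_VMA_GLOBAL_BIND encoding):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag; the real I915_VMA_GLOBAL_BIND lives in the driver
 * headers. */
#define VMA_GLOBAL_BIND_SKETCH 0x1u

/* The CPU-cache flush (set_to_gtt_domain/clflush) is needed only when
 * the context VMA is about to be bound into the global GTT; once bound,
 * the object is accessed solely by the GPU and stays in the GTT domain. */
bool needs_clflush_before_pin(unsigned int vma_flags)
{
    return (vma_flags & VMA_GLOBAL_BIND_SKETCH) == 0;
}
```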

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 12 +++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.c |  4 ++++
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index aa0419faeb34..5d42fee75464 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -771,6 +771,13 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
 	if (skip_rcs_switch(ppgtt, engine, to))
 		return 0;
 
+	if (!(to->engine[RCS].state->flags & I915_VMA_GLOBAL_BIND)) {
+		ret = i915_gem_object_set_to_gtt_domain(to->engine[RCS].state->obj,
+							false);
+		if (ret)
+			return ret;
+	}
+
 	/* Trying to pin first makes error handling easier. */
 	ret = i915_vma_pin(to->engine[RCS].state,
 			   0, to->ggtt_alignment,
@@ -790,11 +797,6 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
 	/*
 	 * Clear this page out of any CPU caches for coherent swap-in/out.
 	 */
-	ret = i915_gem_object_set_to_gtt_domain(to->engine[RCS].state->obj,
-						false);
-	if (ret)
-		goto unpin_vma;
-
 	if (needs_pd_load_pre(ppgtt, engine, to)) {
 		/* Older GENs and non render rings still want the load first,
 		 * "PP_DCLV followed by PP_DIR_BASE register through Load
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 5a383430e91d..f24e4e83afd7 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2092,6 +2092,10 @@ static int intel_ring_context_pin(struct i915_gem_context *ctx,
 		return 0;
 
 	if (ce->state) {
+		ret = i915_gem_object_set_to_gtt_domain(ce->state->obj, false);
+		if (ret)
+			goto error;
+
 		ret = i915_vma_pin(ce->state, 0, ctx->ggtt_alignment,
 				   PIN_GLOBAL | PIN_HIGH);
 		if (ret)
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 20/33] drm/i915: Use VMA for ringbuffer tracking
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (18 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 19/33] drm/i915: Only clflush the context object when binding Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-11  9:32   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 21/33] drm/i915: Use VMA for scratch page tracking Chris Wilson
                   ` (18 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Use the GGTT VMA as the primary cookie for handling ring objects, as the
most common actions upon the ring are mapping and unmapping, which act
upon the VMA itself. By restructuring the code to work with the ring
VMA, we can shrink the code and remove a few cycles from context pinning.
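For illustration, the GuC descriptor fields derived from the ring VMA in guc_init_ctx_desc() reduce to the following (struct pared down, names assumed):

```c
#include <assert.h>
#include <stdint.h>

/* Pared-down vma node: start and size of the ring in the GGTT. */
struct ring_node_sketch { uint64_t start, size; };

/* Start of the ring window, as written into lrc->ring_begin. */
uint64_t ring_begin(const struct ring_node_sketch *n)
{
    return n->start;
}

/* Inclusive end of the ring window, as written into lrc->ring_end. */
uint64_t ring_end(const struct ring_node_sketch *n)
{
    return n->start + n->size - 1;
}
```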

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |   2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c      |   4 +-
 drivers/gpu/drm/i915/i915_guc_submission.c |  18 +--
 drivers/gpu/drm/i915/intel_lrc.c           |  17 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 245 +++++++++++++----------------
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  14 +-
 6 files changed, 139 insertions(+), 161 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6a03fa7b6264..09b1d05d003a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -356,7 +356,7 @@ static int per_file_ctx_stats(int id, void *ptr, void *data)
 		if (ctx->engine[n].state)
 			per_file_stats(0, ctx->engine[n].state->obj, data);
 		if (ctx->engine[n].ring)
-			per_file_stats(0, ctx->engine[n].ring->obj, data);
+			per_file_stats(0, ctx->engine[n].ring->vma->obj, data);
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 21a4d0220c17..09c3ae0c282a 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1102,12 +1102,12 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 			ee->cpu_ring_tail = ring->tail;
 			ee->ringbuffer =
 				i915_error_ggtt_object_create(dev_priv,
-							      ring->obj);
+							      ring->vma->obj);
 		}
 
 		ee->hws_page =
 			i915_error_ggtt_object_create(dev_priv,
-						      engine->status_page.obj);
+						      engine->status_page.vma->obj);
 
 		ee->wa_ctx = i915_error_ggtt_object_create(dev_priv,
 							   engine->wa_ctx.obj);
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 03a4d2ae71db..761201ff6b34 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -343,7 +343,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 	for_each_engine(engine, dev_priv) {
 		struct intel_context *ce = &ctx->engine[engine->id];
 		struct guc_execlist_context *lrc = &desc.lrc[engine->guc_id];
-		struct drm_i915_gem_object *obj;
+		struct i915_vma *vma;
 
 		/* TODO: We have a design issue to be solved here. Only when we
 		 * receive the first batch, we know which engine is used by the
@@ -358,17 +358,15 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
 		lrc->context_desc = lower_32_bits(ce->lrc_desc);
 
 		/* The state page is after PPHWSP */
-		gfx_addr = ce->state->node.start;
-		lrc->ring_lcra = gfx_addr + LRC_STATE_PN * PAGE_SIZE;
+		vma = ce->state;
+		lrc->ring_lcra = vma->node.start + LRC_STATE_PN * PAGE_SIZE;
 		lrc->context_id = (client->ctx_index << GUC_ELC_CTXID_OFFSET) |
 				(engine->guc_id << GUC_ELC_ENGINE_OFFSET);
 
-		obj = ce->ring->obj;
-		gfx_addr = i915_gem_obj_ggtt_offset(obj);
-
-		lrc->ring_begin = gfx_addr;
-		lrc->ring_end = gfx_addr + obj->base.size - 1;
-		lrc->ring_next_free_location = gfx_addr;
+		vma = ce->ring->vma;
+		lrc->ring_begin = vma->node.start;
+		lrc->ring_end = vma->node.start + vma->node.size - 1;
+		lrc->ring_next_free_location = lrc->ring_begin;
 		lrc->ring_current_tail_pointer_value = 0;
 
 		desc.engines_used |= (1 << engine->guc_id);
@@ -925,7 +923,7 @@ static void guc_create_ads(struct intel_guc *guc)
 	 * to find it.
 	 */
 	engine = &dev_priv->engine[RCS];
-	ads->golden_context_lrca = engine->status_page.gfx_addr;
+	ads->golden_context_lrca = engine->status_page.ggtt_offset;
 
 	for_each_engine(engine, dev_priv)
 		ads->eng_state_size[engine->guc_id] = intel_lr_context_size(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 74c08bf5d136..198d59b272b2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1273,7 +1273,7 @@ static void lrc_init_hws(struct intel_engine_cs *engine)
 	struct drm_i915_private *dev_priv = engine->i915;
 
 	I915_WRITE(RING_HWS_PGA(engine->mmio_base),
-		   (u32)engine->status_page.gfx_addr);
+		   engine->status_page.ggtt_offset);
 	POSTING_READ(RING_HWS_PGA(engine->mmio_base));
 }
 
@@ -1695,9 +1695,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
 
 	intel_engine_cleanup_common(engine);
 
-	if (engine->status_page.obj) {
-		i915_gem_object_unpin_map(engine->status_page.obj);
-		engine->status_page.obj = NULL;
+	if (engine->status_page.vma) {
+		i915_gem_object_unpin_map(engine->status_page.vma->obj);
+		engine->status_page.vma = NULL;
 	}
 	intel_lr_context_unpin(dev_priv->kernel_context, engine);
 
@@ -1744,16 +1744,17 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
 static int
 lrc_setup_hws(struct intel_engine_cs *engine, struct i915_vma *vma)
 {
+#define HWS_OFFSET (LRC_PPHWSP_PN * PAGE_SIZE)
 	void *hws;
 
 	/* The HWSP is part of the default context object in LRC mode. */
-	engine->status_page.gfx_addr =
-		vma->node.start + LRC_PPHWSP_PN * PAGE_SIZE;
 	hws = i915_gem_object_pin_map(vma->obj);
 	if (IS_ERR(hws))
 		return PTR_ERR(hws);
-	engine->status_page.page_addr = hws + LRC_PPHWSP_PN * PAGE_SIZE;
-	engine->status_page.obj = vma->obj;
+
+	engine->status_page.page_addr = hws + HWS_OFFSET;
+	engine->status_page.ggtt_offset = vma->node.start + HWS_OFFSET;
+	engine->status_page.vma = vma;
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index f24e4e83afd7..cff9935fe36f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -466,7 +466,7 @@ static void intel_ring_setup_status_page(struct intel_engine_cs *engine)
 		mmio = RING_HWS_PGA(engine->mmio_base);
 	}
 
-	I915_WRITE(mmio, (u32)engine->status_page.gfx_addr);
+	I915_WRITE(mmio, engine->status_page.ggtt_offset);
 	POSTING_READ(mmio);
 
 	/*
@@ -531,7 +531,6 @@ static int init_ring_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 	struct intel_ring *ring = engine->buffer;
-	struct drm_i915_gem_object *obj = ring->obj;
 	int ret = 0;
 
 	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
@@ -571,7 +570,7 @@ static int init_ring_common(struct intel_engine_cs *engine)
 	 * registers with the above sequence (the readback of the HEAD registers
 	 * also enforces ordering), otherwise the hw might lose the new ring
 	 * register values. */
-	I915_WRITE_START(engine, i915_gem_obj_ggtt_offset(obj));
+	I915_WRITE_START(engine, ring->vma->node.start);
 
 	/* WaClearRingBufHeadRegAtInit:ctg,elk */
 	if (I915_READ_HEAD(engine))
@@ -586,16 +585,16 @@ static int init_ring_common(struct intel_engine_cs *engine)
 
 	/* If the head is still not zero, the ring is dead */
 	if (wait_for((I915_READ_CTL(engine) & RING_VALID) != 0 &&
-		     I915_READ_START(engine) == i915_gem_obj_ggtt_offset(obj) &&
+		     I915_READ_START(engine) == ring->vma->node.start &&
 		     (I915_READ_HEAD(engine) & HEAD_ADDR) == 0, 50)) {
 		DRM_ERROR("%s initialization failed "
-			  "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08lx]\n",
+			  "ctl %08x (valid? %d) head %08x tail %08x start %08x [expected %08llx]\n",
 			  engine->name,
 			  I915_READ_CTL(engine),
 			  I915_READ_CTL(engine) & RING_VALID,
 			  I915_READ_HEAD(engine), I915_READ_TAIL(engine),
 			  I915_READ_START(engine),
-			  (unsigned long)i915_gem_obj_ggtt_offset(obj));
+			  ring->vma->node.start);
 		ret = -EIO;
 		goto out;
 	}
@@ -1853,79 +1852,78 @@ static void cleanup_phys_status_page(struct intel_engine_cs *engine)
 
 static void cleanup_status_page(struct intel_engine_cs *engine)
 {
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 
-	obj = engine->status_page.obj;
-	if (obj == NULL)
+	vma = nullify(&engine->status_page.vma);
+	if (!vma)
 		return;
 
-	kunmap(sg_page(obj->pages->sgl));
-	i915_gem_object_ggtt_unpin(obj);
-	i915_gem_object_put(obj);
-	engine->status_page.obj = NULL;
+	i915_vma_unpin(vma);
+	i915_gem_object_unpin_map(vma->obj);
+	i915_gem_object_put(vma->obj);
 }
 
 static int init_status_page(struct intel_engine_cs *engine)
 {
-	struct drm_i915_gem_object *obj = engine->status_page.obj;
-
-	if (obj == NULL) {
-		unsigned flags;
-		int ret;
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	unsigned flags;
+	int ret;
 
-		obj = i915_gem_object_create(&engine->i915->drm, 4096);
-		if (IS_ERR(obj)) {
-			DRM_ERROR("Failed to allocate status page\n");
-			return PTR_ERR(obj);
-		}
+	obj = i915_gem_object_create(&engine->i915->drm, 4096);
+	if (IS_ERR(obj)) {
+		DRM_ERROR("Failed to allocate status page\n");
+		return PTR_ERR(obj);
+	}
 
-		ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
-		if (ret)
-			goto err_unref;
-
-		flags = 0;
-		if (!HAS_LLC(engine->i915))
-			/* On g33, we cannot place HWS above 256MiB, so
-			 * restrict its pinning to the low mappable arena.
-			 * Though this restriction is not documented for
-			 * gen4, gen5, or byt, they also behave similarly
-			 * and hang if the HWS is placed at the top of the
-			 * GTT. To generalise, it appears that all !llc
-			 * platforms have issues with us placing the HWS
-			 * above the mappable region (even though we never
-			 * actualy map it).
-			 */
-			flags |= PIN_MAPPABLE;
-		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 4096, flags);
-		if (ret) {
-err_unref:
-			i915_gem_object_put(obj);
-			return ret;
-		}
+	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+	if (ret)
+		goto err_unref;
 
-		engine->status_page.obj = obj;
+	vma = i915_vma_create(obj, &engine->i915->ggtt.base, NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto err_unref;
 	}
 
-	engine->status_page.gfx_addr = i915_gem_obj_ggtt_offset(obj);
-	engine->status_page.page_addr = kmap(sg_page(obj->pages->sgl));
-	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
+	flags = PIN_GLOBAL;
+	if (!HAS_LLC(engine->i915))
+		/* On g33, we cannot place HWS above 256MiB, so
+		 * restrict its pinning to the low mappable arena.
+		 * Though this restriction is not documented for
+		 * gen4, gen5, or byt, they also behave similarly
+		 * and hang if the HWS is placed at the top of the
+		 * GTT. To generalise, it appears that all !llc
+		 * platforms have issues with us placing the HWS
+		 * above the mappable region (even though we never
+			 * actually map it).
+		 */
+		flags |= PIN_MAPPABLE;
+	ret = i915_vma_pin(vma, 0, 4096, flags);
+	if (ret)
+		goto err_unref;
 
-	DRM_DEBUG_DRIVER("%s hws offset: 0x%08x\n",
-			engine->name, engine->status_page.gfx_addr);
+	engine->status_page.vma = vma;
+	engine->status_page.ggtt_offset = vma->node.start;
+	engine->status_page.page_addr = i915_gem_object_pin_map(obj);
 
+	DRM_DEBUG_DRIVER("%s hws offset: 0x%08llx\n",
+			 engine->name, vma->node.start);
 	return 0;
+
+err_unref:
+	i915_gem_object_put(obj);
+	return ret;
 }
 
 static int init_phys_status_page(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
 
-	if (!dev_priv->status_page_dmah) {
-		dev_priv->status_page_dmah =
-			drm_pci_alloc(&dev_priv->drm, PAGE_SIZE, PAGE_SIZE);
-		if (!dev_priv->status_page_dmah)
-			return -ENOMEM;
-	}
+	dev_priv->status_page_dmah =
+		drm_pci_alloc(&dev_priv->drm, PAGE_SIZE, PAGE_SIZE);
+	if (!dev_priv->status_page_dmah)
+		return -ENOMEM;
 
 	engine->status_page.page_addr = dev_priv->status_page_dmah->vaddr;
 	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
@@ -1935,55 +1933,31 @@ static int init_phys_status_page(struct intel_engine_cs *engine)
 
 int intel_ring_pin(struct intel_ring *ring)
 {
-	struct drm_i915_private *dev_priv = ring->engine->i915;
-	struct drm_i915_gem_object *obj = ring->obj;
 	/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
-	unsigned flags = PIN_OFFSET_BIAS | 4096;
+	unsigned int flags = PIN_GLOBAL | PIN_OFFSET_BIAS | 4096;
 	void *addr;
 	int ret;
 
-	if (HAS_LLC(dev_priv) && !obj->stolen) {
-		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE, flags);
-		if (ret)
-			return ret;
-
-		ret = i915_gem_object_set_to_cpu_domain(obj, true);
-		if (ret)
-			goto err_unpin;
-
-		addr = i915_gem_object_pin_map(obj);
-		if (IS_ERR(addr)) {
-			ret = PTR_ERR(addr);
-			goto err_unpin;
-		}
-	} else {
-		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE,
-					       flags | PIN_MAPPABLE);
-		if (ret)
-			return ret;
+	GEM_BUG_ON(ring->vaddr);
 
-		ret = i915_gem_object_set_to_gtt_domain(obj, true);
-		if (ret)
-			goto err_unpin;
+	if (ring->vmap)
+		flags |= PIN_MAPPABLE;
 
-		/* Access through the GTT requires the device to be awake. */
-		assert_rpm_wakelock_held(dev_priv);
+	ret = i915_vma_pin(ring->vma, 0, PAGE_SIZE, flags);
+	if (unlikely(ret))
+		return ret;
 
-		addr = (void __force *)
-			i915_vma_pin_iomap(i915_gem_obj_to_ggtt(obj));
-		if (IS_ERR(addr)) {
-			ret = PTR_ERR(addr);
-			goto err_unpin;
-		}
+	if (ring->vmap)
+		addr = i915_gem_object_pin_map(ring->vma->obj);
+	else
+		addr = (void __force *)i915_vma_pin_iomap(ring->vma);
+	if (IS_ERR(addr)) {
+		i915_vma_unpin(ring->vma);
+		return PTR_ERR(addr);
 	}
 
 	ring->vaddr = addr;
-	ring->vma = i915_gem_obj_to_ggtt(obj);
 	return 0;
-
-err_unpin:
-	i915_gem_object_ggtt_unpin(obj);
-	return ret;
 }
 
 void intel_ring_unpin(struct intel_ring *ring)
@@ -1991,60 +1965,66 @@ void intel_ring_unpin(struct intel_ring *ring)
 	GEM_BUG_ON(!ring->vma);
 	GEM_BUG_ON(!ring->vaddr);
 
-	if (HAS_LLC(ring->engine->i915) && !ring->obj->stolen)
-		i915_gem_object_unpin_map(ring->obj);
+	if (ring->vmap)
+		i915_gem_object_unpin_map(ring->vma->obj);
 	else
 		i915_vma_unpin_iomap(ring->vma);
 	ring->vaddr = NULL;
 
-	i915_gem_object_ggtt_unpin(ring->obj);
-	ring->vma = NULL;
+	i915_vma_unpin(ring->vma);
 }
 
-static void intel_destroy_ringbuffer_obj(struct intel_ring *ring)
-{
-	i915_gem_object_put(ring->obj);
-	ring->obj = NULL;
-}
-
-static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
-				      struct intel_ring *ring)
+static struct i915_vma *
+intel_ring_create_vma(struct drm_i915_private *dev_priv, int size)
 {
 	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int ret;
 
-	obj = NULL;
-	if (!HAS_LLC(dev))
-		obj = i915_gem_object_create_stolen(dev, ring->size);
-	if (obj == NULL)
-		obj = i915_gem_object_create(dev, ring->size);
+	obj = ERR_PTR(-ENODEV);
+	if (!HAS_LLC(dev_priv))
+		obj = i915_gem_object_create_stolen(&dev_priv->drm, size);
 	if (IS_ERR(obj))
-		return PTR_ERR(obj);
+		obj = i915_gem_object_create(&dev_priv->drm, size);
+	if (IS_ERR(obj))
+		return ERR_CAST(obj);
 
 	/* mark ring buffers as read-only from GPU side by default */
 	obj->gt_ro = 1;
 
-	ring->obj = obj;
+	if (HAS_LLC(dev_priv) && !obj->stolen)
+		ret = i915_gem_object_set_to_cpu_domain(obj, true);
+	else
+		ret = i915_gem_object_set_to_gtt_domain(obj, true);
+	if (ret) {
+		vma = ERR_PTR(ret);
+		goto err;
+	}
 
-	return 0;
+	vma = i915_vma_create(obj, &dev_priv->ggtt.base, NULL);
+	if (IS_ERR(vma))
+		goto err;
+
+	return vma;
+
+err:
+	i915_gem_object_put(obj);
+	return vma;
 }
 
 struct intel_ring *
 intel_engine_create_ring(struct intel_engine_cs *engine, int size)
 {
 	struct intel_ring *ring;
-	int ret;
+	struct i915_vma *vma;
 
 	GEM_BUG_ON(!is_power_of_2(size));
 
 	ring = kzalloc(sizeof(*ring), GFP_KERNEL);
-	if (ring == NULL) {
-		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s\n",
-				 engine->name);
+	if (!ring)
 		return ERR_PTR(-ENOMEM);
-	}
 
 	ring->engine = engine;
-	list_add(&ring->link, &engine->buffers);
 
 	INIT_LIST_HEAD(&ring->request_list);
 
@@ -2060,22 +2040,23 @@ intel_engine_create_ring(struct intel_engine_cs *engine, int size)
 	ring->last_retired_head = -1;
 	intel_ring_update_space(ring);
 
-	ret = intel_alloc_ringbuffer_obj(&engine->i915->drm, ring);
-	if (ret) {
-		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s: %d\n",
-				 engine->name, ret);
-		list_del(&ring->link);
+	vma = intel_ring_create_vma(engine->i915, size);
+	if (IS_ERR(vma)) {
 		kfree(ring);
-		return ERR_PTR(ret);
+		return ERR_CAST(vma);
 	}
+	ring->vma = vma;
+	if (HAS_LLC(engine->i915) && !vma->obj->stolen)
+		ring->vmap = true;
 
+	list_add(&ring->link, &engine->buffers);
 	return ring;
 }
 
 void
 intel_ring_free(struct intel_ring *ring)
 {
-	intel_destroy_ringbuffer_obj(ring);
+	i915_gem_object_put(ring->vma->obj);
 	list_del(&ring->link);
 	kfree(ring);
 }
@@ -2169,7 +2150,6 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 		ret = PTR_ERR(ring);
 		goto error;
 	}
-	engine->buffer = ring;
 
 	if (I915_NEED_GFX_HWS(dev_priv)) {
 		ret = init_status_page(engine);
@@ -2184,11 +2164,10 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 
 	ret = intel_ring_pin(ring);
 	if (ret) {
-		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
-				engine->name, ret);
-		intel_destroy_ringbuffer_obj(ring);
+		intel_ring_free(ring);
 		goto error;
 	}
+	engine->buffer = ring;
 
 	return 0;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 66dc93469076..35e2b87ab17a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -26,10 +26,10 @@
  */
 #define I915_RING_FREE_SPACE 64
 
-struct  intel_hw_status_page {
-	u32		*page_addr;
-	unsigned int	gfx_addr;
-	struct		drm_i915_gem_object *obj;
+struct intel_hw_status_page {
+	struct i915_vma *vma;
+	u32 *page_addr;
+	u32 ggtt_offset;
 };
 
 #define I915_READ_TAIL(engine) I915_READ(RING_TAIL((engine)->mmio_base))
@@ -83,9 +83,8 @@ struct intel_engine_hangcheck {
 };
 
 struct intel_ring {
-	struct drm_i915_gem_object *obj;
-	void *vaddr;
 	struct i915_vma *vma;
+	void *vaddr;
 
 	struct intel_engine_cs *engine;
 	struct list_head link;
@@ -97,6 +96,7 @@ struct intel_ring {
 	int space;
 	int size;
 	int effective_size;
+	bool vmap;
 
 	/** We track the position of the requests in the ring buffer, and
 	 * when each is retired we increment last_retired_head as the GPU
@@ -516,7 +516,7 @@ int init_workarounds_ring(struct intel_engine_cs *engine);
 
 static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
 {
-	return engine->status_page.gfx_addr + I915_GEM_HWS_INDEX_ADDR;
+	return engine->status_page.ggtt_offset + I915_GEM_HWS_INDEX_ADDR;
 }
 
 /* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 21/33] drm/i915: Use VMA for scratch page tracking
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (19 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 20/33] drm/i915: Use VMA for ringbuffer tracking Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-08  8:00   ` [PATCH 1/3] " Chris Wilson
  2016-08-11 10:06   ` [PATCH 21/33] drm/i915: Use VMA for scratch page tracking Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 22/33] drm/i915/overlay: Use VMA as the primary tracker for images Chris Wilson
                   ` (17 subsequent siblings)
  38 siblings, 2 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c |  2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c   |  2 +-
 drivers/gpu/drm/i915/intel_display.c    |  2 +-
 drivers/gpu/drm/i915/intel_engine_cs.c  | 50 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 17 +++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.c | 59 +++++----------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h | 10 ++----
 7 files changed, 71 insertions(+), 71 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 5d42fee75464..15eed897b498 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -660,7 +660,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 					MI_STORE_REGISTER_MEM |
 					MI_SRM_LRM_GLOBAL_GTT);
 			intel_ring_emit_reg(ring, last_reg);
-			intel_ring_emit(ring, engine->scratch.gtt_offset);
+			intel_ring_emit(ring, engine->scratch->node.start);
 			intel_ring_emit(ring, MI_NOOP);
 		}
 		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 09c3ae0c282a..2d93af0bb793 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1075,7 +1075,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 			if (HAS_BROKEN_CS_TLB(dev_priv))
 				ee->wa_batchbuffer =
 					i915_error_ggtt_object_create(dev_priv,
-								      engine->scratch.obj);
+								      engine->scratch->obj);
 
 			if (request->ctx->engine[i].state) {
 				ee->ctx = i915_error_ggtt_object_create(dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 9cbf5431c1e3..3deee0306e82 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11325,7 +11325,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 			intel_ring_emit(ring, MI_STORE_REGISTER_MEM |
 					      MI_SRM_LRM_GLOBAL_GTT);
 		intel_ring_emit_reg(ring, DERRMR);
-		intel_ring_emit(ring, req->engine->scratch.gtt_offset + 256);
+		intel_ring_emit(ring, req->engine->scratch->node.start + 256);
 		if (IS_GEN8(dev)) {
 			intel_ring_emit(ring, 0);
 			intel_ring_emit(ring, MI_NOOP);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 0dd3d1de18aa..1dec35441ab5 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -195,6 +195,54 @@ void intel_engine_setup_common(struct intel_engine_cs *engine)
 	i915_gem_batch_pool_init(engine, &engine->batch_pool);
 }
 
+int intel_engine_create_scratch(struct intel_engine_cs *engine, int size)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int ret;
+
+	WARN_ON(engine->scratch);
+
+	obj = i915_gem_object_create_stolen(&engine->i915->drm, size);
+	if (!obj)
+		obj = i915_gem_object_create(&engine->i915->drm, size);
+	if (IS_ERR(obj)) {
+		DRM_ERROR("Failed to allocate scratch page\n");
+		return PTR_ERR(obj);
+	}
+
+	vma = i915_vma_create(obj, &engine->i915->ggtt.base, NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto err_unref;
+	}
+
+	ret = i915_vma_pin(vma, 0, 4096, PIN_GLOBAL | PIN_HIGH);
+	if (ret)
+		goto err_unref;
+
+	engine->scratch = vma;
+	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08llx\n",
+			 engine->name, vma->node.start);
+	return 0;
+
+err_unref:
+	i915_gem_object_put(obj);
+	return ret;
+}
+
+static void intel_engine_cleanup_scratch(struct intel_engine_cs *engine)
+{
+	struct i915_vma *vma;
+
+	vma = nullify(&engine->scratch);
+	if (!vma)
+		return;
+
+	i915_vma_unpin(vma);
+	i915_gem_object_put(vma->obj);
+}
+
 /**
  * intel_engines_init_common - initialize cengine state which might require hw access
  * @engine: Engine to initialize.
@@ -226,6 +274,8 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
  */
 void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
+	intel_engine_cleanup_scratch(engine);
+
 	intel_engine_cleanup_cmd_parser(engine);
 	intel_engine_fini_breadcrumbs(engine);
 	i915_gem_batch_pool_fini(&engine->batch_pool);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 198d59b272b2..096eb8c2da17 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -914,7 +914,7 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
 	wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
 				   MI_SRM_LRM_GLOBAL_GTT));
 	wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
-	wa_ctx_emit(batch, index, engine->scratch.gtt_offset + 256);
+	wa_ctx_emit(batch, index, engine->scratch->node.start + 256);
 	wa_ctx_emit(batch, index, 0);
 
 	wa_ctx_emit(batch, index, MI_LOAD_REGISTER_IMM(1));
@@ -932,7 +932,7 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
 	wa_ctx_emit(batch, index, (MI_LOAD_REGISTER_MEM_GEN8 |
 				   MI_SRM_LRM_GLOBAL_GTT));
 	wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
-	wa_ctx_emit(batch, index, engine->scratch.gtt_offset + 256);
+	wa_ctx_emit(batch, index, engine->scratch->node.start + 256);
 	wa_ctx_emit(batch, index, 0);
 
 	return index;
@@ -993,7 +993,7 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *engine,
 
 	/* WaClearSlmSpaceAtContextSwitch:bdw,chv */
 	/* Actual scratch location is at 128 bytes offset */
-	scratch_addr = engine->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr = engine->scratch->node.start + 2*CACHELINE_BYTES;
 
 	wa_ctx_emit(batch, index, GFX_OP_PIPE_CONTROL(6));
 	wa_ctx_emit(batch, index, (PIPE_CONTROL_FLUSH_L3 |
@@ -1072,8 +1072,8 @@ static int gen9_init_indirectctx_bb(struct intel_engine_cs *engine,
 	/* WaClearSlmSpaceAtContextSwitch:kbl */
 	/* Actual scratch location is at 128 bytes offset */
 	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_A0)) {
-		uint32_t scratch_addr
-			= engine->scratch.gtt_offset + 2*CACHELINE_BYTES;
+		uint32_t scratch_addr =
+			engine->scratch->node.start + 2*CACHELINE_BYTES;
 
 		wa_ctx_emit(batch, index, GFX_OP_PIPE_CONTROL(6));
 		wa_ctx_emit(batch, index, (PIPE_CONTROL_FLUSH_L3 |
@@ -1215,7 +1215,7 @@ static int intel_init_workaround_bb(struct intel_engine_cs *engine)
 	}
 
 	/* some WA perform writes to scratch page, ensure it is valid */
-	if (engine->scratch.obj == NULL) {
+	if (!engine->scratch) {
 		DRM_ERROR("scratch page not allocated for %s\n", engine->name);
 		return -EINVAL;
 	}
@@ -1483,7 +1483,7 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 {
 	struct intel_ring *ring = request->ring;
 	struct intel_engine_cs *engine = request->engine;
-	u32 scratch_addr = engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 scratch_addr = engine->scratch->node.start + 2 * CACHELINE_BYTES;
 	bool vf_flush_wa = false, dc_flush_wa = false;
 	u32 flags = 0;
 	int ret;
@@ -1844,11 +1844,10 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 	else
 		engine->init_hw = gen8_init_render_ring;
 	engine->init_context = gen8_init_rcs_context;
-	engine->cleanup = intel_fini_pipe_control;
 	engine->emit_flush = gen8_emit_flush_render;
 	engine->emit_request = gen8_emit_request_render;
 
-	ret = intel_init_pipe_control(engine, 4096);
+	ret = intel_engine_create_scratch(engine, 4096);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cff9935fe36f..af2d81ae3e7d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -176,7 +176,7 @@ intel_emit_post_sync_nonzero_flush(struct drm_i915_gem_request *req)
 {
 	struct intel_ring *ring = req->ring;
 	u32 scratch_addr =
-		req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+		req->engine->scratch->node.start + 2 * CACHELINE_BYTES;
 	int ret;
 
 	ret = intel_ring_begin(req, 6);
@@ -212,7 +212,7 @@ gen6_render_ring_flush(struct drm_i915_gem_request *req, u32 mode)
 {
 	struct intel_ring *ring = req->ring;
 	u32 scratch_addr =
-		req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+		req->engine->scratch->node.start + 2 * CACHELINE_BYTES;
 	u32 flags = 0;
 	int ret;
 
@@ -286,7 +286,7 @@ gen7_render_ring_flush(struct drm_i915_gem_request *req, u32 mode)
 {
 	struct intel_ring *ring = req->ring;
 	u32 scratch_addr =
-		req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+		req->engine->scratch->node.start + 2 * CACHELINE_BYTES;
 	u32 flags = 0;
 	int ret;
 
@@ -370,7 +370,8 @@ gen8_emit_pipe_control(struct drm_i915_gem_request *req,
 static int
 gen8_render_ring_flush(struct drm_i915_gem_request *req, u32 mode)
 {
-	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 scratch_addr =
+		req->engine->scratch->node.start + 2 * CACHELINE_BYTES;
 	u32 flags = 0;
 	int ret;
 
@@ -612,48 +613,6 @@ out:
 	return ret;
 }
 
-void intel_fini_pipe_control(struct intel_engine_cs *engine)
-{
-	if (engine->scratch.obj == NULL)
-		return;
-
-	i915_gem_object_ggtt_unpin(engine->scratch.obj);
-	i915_gem_object_put(engine->scratch.obj);
-	engine->scratch.obj = NULL;
-}
-
-int intel_init_pipe_control(struct intel_engine_cs *engine, int size)
-{
-	struct drm_i915_gem_object *obj;
-	int ret;
-
-	WARN_ON(engine->scratch.obj);
-
-	obj = i915_gem_object_create_stolen(&engine->i915->drm, size);
-	if (!obj)
-		obj = i915_gem_object_create(&engine->i915->drm, size);
-	if (IS_ERR(obj)) {
-		DRM_ERROR("Failed to allocate scratch page\n");
-		ret = PTR_ERR(obj);
-		goto err;
-	}
-
-	ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 4096, PIN_HIGH);
-	if (ret)
-		goto err_unref;
-
-	engine->scratch.obj = obj;
-	engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
-	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08x\n",
-			 engine->name, engine->scratch.gtt_offset);
-	return 0;
-
-err_unref:
-	i915_gem_object_put(engine->scratch.obj);
-err:
-	return ret;
-}
-
 static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
 	struct intel_ring *ring = req->ring;
@@ -1304,8 +1263,6 @@ static void render_ring_cleanup(struct intel_engine_cs *engine)
 		i915_gem_object_put(dev_priv->semaphore_obj);
 		dev_priv->semaphore_obj = NULL;
 	}
-
-	intel_fini_pipe_control(engine);
 }
 
 static int gen8_rcs_signal(struct drm_i915_gem_request *req)
@@ -1763,7 +1720,7 @@ i830_emit_bb_start(struct drm_i915_gem_request *req,
 		   unsigned int dispatch_flags)
 {
 	struct intel_ring *ring = req->ring;
-	u32 cs_offset = req->engine->scratch.gtt_offset;
+	u32 cs_offset = req->engine->scratch->node.start;
 	int ret;
 
 	ret = intel_ring_begin(req, 6);
@@ -2790,11 +2747,11 @@ int intel_init_render_ring_buffer(struct intel_engine_cs *engine)
 		return ret;
 
 	if (INTEL_GEN(dev_priv) >= 6) {
-		ret = intel_init_pipe_control(engine, 4096);
+		ret = intel_engine_create_scratch(engine, 4096);
 		if (ret)
 			return ret;
 	} else if (HAS_BROKEN_CS_TLB(dev_priv)) {
-		ret = intel_init_pipe_control(engine, I830_WA_SIZE);
+		ret = intel_engine_create_scratch(engine, I830_WA_SIZE);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 35e2b87ab17a..9e3ab8129734 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -198,6 +198,7 @@ struct intel_engine_cs {
 
 	struct intel_hw_status_page status_page;
 	struct i915_ctx_workarounds wa_ctx;
+	struct i915_vma *scratch;
 
 	u32             irq_keep_mask; /* always keep these interrupts */
 	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
@@ -320,11 +321,6 @@ struct intel_engine_cs {
 
 	struct intel_engine_hangcheck hangcheck;
 
-	struct {
-		struct drm_i915_gem_object *obj;
-		u32 gtt_offset;
-	} scratch;
-
 	bool needs_cmd_parser;
 
 	/*
@@ -476,11 +472,9 @@ void intel_ring_update_space(struct intel_ring *ring);
 
 void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno);
 
-int intel_init_pipe_control(struct intel_engine_cs *engine, int size);
-void intel_fini_pipe_control(struct intel_engine_cs *engine);
-
 void intel_engine_setup_common(struct intel_engine_cs *engine);
 int intel_engine_init_common(struct intel_engine_cs *engine);
+int intel_engine_create_scratch(struct intel_engine_cs *engine, int size);
 void intel_engine_cleanup_common(struct intel_engine_cs *engine);
 
 static inline int intel_engine_idle(struct intel_engine_cs *engine,
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 22/33] drm/i915/overlay: Use VMA as the primary tracker for images
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (20 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 21/33] drm/i915: Use VMA for scratch page tracking Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-11 10:17   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 23/33] drm/i915: Use VMA as the primary tracker for semaphore page Chris Wilson
                   ` (16 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_overlay.c | 38 ++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 90f3ab424e01..3f44b77aa0a2 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -171,8 +171,7 @@ struct overlay_registers {
 struct intel_overlay {
 	struct drm_i915_private *i915;
 	struct intel_crtc *crtc;
-	struct drm_i915_gem_object *vid_bo;
-	struct drm_i915_gem_object *old_vid_bo;
+	struct i915_vma *vma, *old_vma;
 	bool active;
 	bool pfit_active;
 	u32 pfit_vscale_ratio; /* shifted-point number, (1<<12) == 1.0 */
@@ -317,15 +316,17 @@ static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active,
 {
 	struct intel_overlay *overlay =
 		container_of(active, typeof(*overlay), last_flip);
-	struct drm_i915_gem_object *obj = overlay->old_vid_bo;
+	struct i915_vma *vma;
 
-	i915_gem_track_fb(obj, NULL,
-			  INTEL_FRONTBUFFER_OVERLAY(overlay->crtc->pipe));
+	vma = nullify(&overlay->old_vma);
+	if (WARN_ON(!vma))
+		return;
 
-	i915_gem_object_ggtt_unpin(obj);
-	i915_gem_object_put(obj);
+	i915_gem_track_fb(vma->obj, NULL,
+			  INTEL_FRONTBUFFER_OVERLAY(overlay->crtc->pipe));
 
-	overlay->old_vid_bo = NULL;
+	i915_gem_object_unpin_from_display_plane(vma->obj, &i915_ggtt_view_normal);
+	i915_gem_object_put(vma->obj);
 }
 
 static void intel_overlay_off_tail(struct i915_gem_active *active,
@@ -333,15 +334,15 @@ static void intel_overlay_off_tail(struct i915_gem_active *active,
 {
 	struct intel_overlay *overlay =
 		container_of(active, typeof(*overlay), last_flip);
-	struct drm_i915_gem_object *obj = overlay->vid_bo;
+	struct i915_vma *vma;
 
 	/* never have the overlay hw on without showing a frame */
-	if (WARN_ON(!obj))
+	vma = nullify(&overlay->vma);
+	if (WARN_ON(!vma))
 		return;
 
-	i915_gem_object_ggtt_unpin(obj);
-	i915_gem_object_put(obj);
-	overlay->vid_bo = NULL;
+	i915_gem_object_unpin_from_display_plane(vma->obj, &i915_ggtt_view_normal);
+	i915_gem_object_put(vma->obj);
 
 	overlay->crtc->overlay = NULL;
 	overlay->crtc = NULL;
@@ -421,7 +422,7 @@ static int intel_overlay_release_old_vid(struct intel_overlay *overlay)
 	/* Only wait if there is actually an old frame to release to
 	 * guarantee forward progress.
 	 */
-	if (!overlay->old_vid_bo)
+	if (!overlay->old_vma)
 		return 0;
 
 	if (I915_READ(ISR) & I915_OVERLAY_PLANE_FLIP_PENDING_INTERRUPT) {
@@ -744,6 +745,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	struct drm_i915_private *dev_priv = overlay->i915;
 	u32 swidth, swidthsw, sheight, ostride;
 	enum pipe pipe = overlay->crtc->pipe;
+	struct i915_vma *vma;
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 	WARN_ON(!drm_modeset_is_locked(&dev_priv->drm.mode_config.connection_mutex));
@@ -757,6 +759,8 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	if (ret != 0)
 		return ret;
 
+	vma = i915_gem_obj_to_ggtt_view(new_bo, &i915_ggtt_view_normal);
+
 	ret = i915_gem_object_put_fence(new_bo);
 	if (ret)
 		goto out_unpin;
@@ -834,11 +838,11 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	if (ret)
 		goto out_unpin;
 
-	i915_gem_track_fb(overlay->vid_bo, new_bo,
+	i915_gem_track_fb(overlay->vma->obj, new_bo,
 			  INTEL_FRONTBUFFER_OVERLAY(pipe));
 
-	overlay->old_vid_bo = overlay->vid_bo;
-	overlay->vid_bo = new_bo;
+	overlay->old_vma = overlay->vma;
+	overlay->vma = vma;
 
 	intel_frontbuffer_flip(dev_priv, INTEL_FRONTBUFFER_OVERLAY(pipe));
 
-- 
2.8.1


* [PATCH 23/33] drm/i915: Use VMA as the primary tracker for semaphore page
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (21 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 22/33] drm/i915/overlay: Use VMA as the primary tracker for images Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-11 10:42   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 24/33] drm/i915: Use VMA for render state page tracking Chris Wilson
                   ` (15 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  2 +-
 drivers/gpu/drm/i915/i915_drv.h         |  4 +--
 drivers/gpu/drm/i915/i915_gpu_error.c   | 14 ++++----
 drivers/gpu/drm/i915/intel_ringbuffer.c | 58 +++++++++++++++++++--------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 +--
 5 files changed, 45 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 09b1d05d003a..cf8a8df07bed 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3142,7 +3142,7 @@ static int i915_semaphore_status(struct seq_file *m, void *unused)
 		struct page *page;
 		uint64_t *seqno;
 
-		page = i915_gem_object_get_page(dev_priv->semaphore_obj, 0);
+		page = i915_gem_object_get_page(dev_priv->semaphore->obj, 0);
 
 		seqno = (uint64_t *)kmap_atomic(page);
 		for_each_engine_id(engine, dev_priv, id) {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 143b42b6545e..4fae0659941f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -510,7 +510,7 @@ struct drm_i915_error_state {
 	u64 fence[I915_MAX_NUM_FENCES];
 	struct intel_overlay_error_state *overlay;
 	struct intel_display_error_state *display;
-	struct drm_i915_error_object *semaphore_obj;
+	struct drm_i915_error_object *semaphore;
 
 	struct drm_i915_error_engine {
 		int engine_id;
@@ -1754,7 +1754,7 @@ struct drm_i915_private {
 	struct pci_dev *bridge_dev;
 	struct i915_gem_context *kernel_context;
 	struct intel_engine_cs engine[I915_NUM_ENGINES];
-	struct drm_i915_gem_object *semaphore_obj;
+	struct i915_vma *semaphore;
 	u32 next_seqno;
 
 	struct drm_dma_handle *status_page_dmah;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 2d93af0bb793..5e9e1cfa110e 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -530,7 +530,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 		}
 	}
 
-	if ((obj = error->semaphore_obj)) {
+	if ((obj = error->semaphore)) {
 		err_printf(m, "Semaphore page = 0x%08x\n",
 			   lower_32_bits(obj->gtt_offset));
 		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
@@ -621,7 +621,7 @@ static void i915_error_state_free(struct kref *error_ref)
 		kfree(ee->waiters);
 	}
 
-	i915_error_object_free(error->semaphore_obj);
+	i915_error_object_free(error->semaphore);
 
 	for (i = 0; i < ARRAY_SIZE(error->active_bo); i++)
 		kfree(error->active_bo[i]);
@@ -850,7 +850,7 @@ static void gen8_record_semaphore_state(struct drm_i915_error_state *error,
 	struct intel_engine_cs *to;
 	enum intel_engine_id id;
 
-	if (!error->semaphore_obj)
+	if (!error->semaphore)
 		return;
 
 	for_each_engine_id(to, dev_priv, id) {
@@ -863,7 +863,7 @@ static void gen8_record_semaphore_state(struct drm_i915_error_state *error,
 
 		signal_offset =
 			(GEN8_SIGNAL_OFFSET(engine, id) & (PAGE_SIZE - 1)) / 4;
-		tmp = error->semaphore_obj->pages[0];
+		tmp = error->semaphore->pages[0];
 		idx = intel_engine_sync_index(engine, to);
 
 		ee->semaphore_mboxes[idx] = tmp[signal_offset];
@@ -1035,10 +1035,10 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 	struct drm_i915_gem_request *request;
 	int i, count;
 
-	if (dev_priv->semaphore_obj) {
-		error->semaphore_obj =
+	if (dev_priv->semaphore) {
+		error->semaphore =
 			i915_error_ggtt_object_create(dev_priv,
-						      dev_priv->semaphore_obj);
+						      dev_priv->semaphore->obj);
 	}
 
 	for (i = 0; i < I915_NUM_ENGINES; i++) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index af2d81ae3e7d..af483a1dca0a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1257,12 +1257,14 @@ static int init_render_ring(struct intel_engine_cs *engine)
 static void render_ring_cleanup(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
+	struct i915_vma *vma;
 
-	if (dev_priv->semaphore_obj) {
-		i915_gem_object_ggtt_unpin(dev_priv->semaphore_obj);
-		i915_gem_object_put(dev_priv->semaphore_obj);
-		dev_priv->semaphore_obj = NULL;
-	}
+	vma = nullify(&dev_priv->semaphore);
+	if (!vma)
+		return;
+
+	i915_vma_unpin(vma);
+	i915_gem_object_put(vma->obj);
 }
 
 static int gen8_rcs_signal(struct drm_i915_gem_request *req)
@@ -2329,12 +2331,14 @@ void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno)
 		if (HAS_VEBOX(dev_priv))
 			I915_WRITE(RING_SYNC_2(engine->mmio_base), 0);
 	}
-	if (dev_priv->semaphore_obj) {
-		struct drm_i915_gem_object *obj = dev_priv->semaphore_obj;
+	if (dev_priv->semaphore) {
+		struct drm_i915_gem_object *obj = dev_priv->semaphore->obj;
 		struct page *page = i915_gem_object_get_dirty_page(obj, 0);
 		void *semaphores = kmap(page);
 		memset(semaphores + GEN8_SEMAPHORE_OFFSET(engine->id, 0),
 		       0, I915_NUM_ENGINES * gen8_semaphore_seqno_size);
+		drm_clflush_virt_range(semaphores + GEN8_SEMAPHORE_OFFSET(engine->id, 0),
+				       I915_NUM_ENGINES * gen8_semaphore_seqno_size);
 		kunmap(page);
 	}
 	memset(engine->semaphore.sync_seqno, 0,
@@ -2556,36 +2560,40 @@ static int gen6_ring_flush(struct drm_i915_gem_request *req, u32 mode)
 static void intel_ring_init_semaphores(struct drm_i915_private *dev_priv,
 				       struct intel_engine_cs *engine)
 {
-	struct drm_i915_gem_object *obj;
 	int ret, i;
 
 	if (!i915.semaphores)
 		return;
 
-	if (INTEL_GEN(dev_priv) >= 8 && !dev_priv->semaphore_obj) {
+	if (INTEL_GEN(dev_priv) >= 8 && !dev_priv->semaphore) {
+		struct drm_i915_gem_object *obj;
+		struct i915_vma *vma;
+
 		obj = i915_gem_object_create(&dev_priv->drm, 4096);
 		if (IS_ERR(obj)) {
-			DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n");
 			i915.semaphores = 0;
-		} else {
-			i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
-			ret = i915_gem_object_ggtt_pin(obj, NULL,
-						       0, 0, PIN_HIGH);
-			if (ret != 0) {
-				i915_gem_object_put(obj);
-				DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n");
-				i915.semaphores = 0;
-			} else {
-				dev_priv->semaphore_obj = obj;
-			}
+			return;
 		}
-	}
 
-	if (!i915.semaphores)
-		return;
+		vma = i915_vma_create(obj, &dev_priv->ggtt.base, NULL);
+		if (IS_ERR(vma)) {
+			i915_gem_object_put(obj);
+			i915.semaphores = 0;
+			return;
+		}
+
+		ret = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
+		if (ret) {
+			i915_gem_object_put(obj);
+			i915.semaphores = 0;
+			return;
+		}
+
+		dev_priv->semaphore = vma;
+	}
 
 	if (INTEL_GEN(dev_priv) >= 8) {
-		u64 offset = i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj);
+		u64 offset = dev_priv->semaphore->node.start;
 
 		engine->semaphore.sync_to = gen8_ring_sync_to;
 		engine->semaphore.signal = gen8_xcs_signal;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 9e3ab8129734..bab3367f8647 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -57,10 +57,10 @@ struct intel_hw_status_page {
 #define GEN8_SEMAPHORE_OFFSET(__from, __to)			     \
 	(((__from) * I915_NUM_ENGINES  + (__to)) * gen8_semaphore_seqno_size)
 #define GEN8_SIGNAL_OFFSET(__ring, to)			     \
-	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
+	(dev_priv->semaphore->node.start + \
 	 GEN8_SEMAPHORE_OFFSET((__ring)->id, (to)))
 #define GEN8_WAIT_OFFSET(__ring, from)			     \
-	(i915_gem_obj_ggtt_offset(dev_priv->semaphore_obj) + \
+	(dev_priv->semaphore->node.start + \
 	 GEN8_SEMAPHORE_OFFSET(from, (__ring)->id))
 
 enum intel_engine_hangcheck_action {
-- 
2.8.1


* [PATCH 24/33] drm/i915: Use VMA for render state page tracking
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (22 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 23/33] drm/i915: Use VMA as the primary tracker for semaphore page Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-11 10:46   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 25/33] drm/i915: Use VMA for wa_ctx tracking Chris Wilson
                   ` (14 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_render_state.c | 40 +++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem_render_state.h |  2 +-
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 57fd767a2d79..95b7e9afd5f8 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -30,8 +30,7 @@
 
 struct render_state {
 	const struct intel_renderstate_rodata *rodata;
-	struct drm_i915_gem_object *obj;
-	u64 ggtt_offset;
+	struct i915_vma *vma;
 	u32 aux_batch_size;
 	u32 aux_batch_offset;
 };
@@ -73,7 +72,7 @@ render_state_get_rodata(const struct drm_i915_gem_request *req)
 
 static int render_state_setup(struct render_state *so)
 {
-	struct drm_device *dev = so->obj->base.dev;
+	struct drm_device *dev = so->vma->vm->dev;
 	const struct intel_renderstate_rodata *rodata = so->rodata;
 	const bool has_64bit_reloc = INTEL_GEN(dev) >= 8;
 	unsigned int i = 0, reloc_index = 0;
@@ -81,18 +80,18 @@ static int render_state_setup(struct render_state *so)
 	u32 *d;
 	int ret;
 
-	ret = i915_gem_object_set_to_cpu_domain(so->obj, true);
+	ret = i915_gem_object_set_to_cpu_domain(so->vma->obj, true);
 	if (ret)
 		return ret;
 
-	page = i915_gem_object_get_dirty_page(so->obj, 0);
+	page = i915_gem_object_get_dirty_page(so->vma->obj, 0);
 	d = kmap(page);
 
 	while (i < rodata->batch_items) {
 		u32 s = rodata->batch[i];
 
 		if (i * 4  == rodata->reloc[reloc_index]) {
-			u64 r = s + so->ggtt_offset;
+			u64 r = s + so->vma->node.start;
 			s = lower_32_bits(r);
 			if (has_64bit_reloc) {
 				if (i + 1 >= rodata->batch_items ||
@@ -154,7 +153,7 @@ static int render_state_setup(struct render_state *so)
 
 	kunmap(page);
 
-	ret = i915_gem_object_set_to_gtt_domain(so->obj, false);
+	ret = i915_gem_object_set_to_gtt_domain(so->vma->obj, false);
 	if (ret)
 		return ret;
 
@@ -175,6 +174,7 @@ err_out:
 int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 {
 	struct render_state so;
+	struct drm_i915_gem_object *obj;
 	int ret;
 
 	if (WARN_ON(req->engine->id != RCS))
@@ -187,21 +187,25 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 	if (so.rodata->batch_items * 4 > 4096)
 		return -EINVAL;
 
-	so.obj = i915_gem_object_create(&req->i915->drm, 4096);
-	if (IS_ERR(so.obj))
-		return PTR_ERR(so.obj);
+	obj = i915_gem_object_create(&req->i915->drm, 4096);
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
 
-	ret = i915_gem_object_ggtt_pin(so.obj, NULL, 0, 0, 0);
-	if (ret)
+	so.vma = i915_vma_create(obj, &req->i915->ggtt.base, NULL);
+	if (IS_ERR(so.vma)) {
+		ret = PTR_ERR(so.vma);
 		goto err_obj;
+	}
 
-	so.ggtt_offset = i915_gem_obj_ggtt_offset(so.obj);
+	ret = i915_vma_pin(so.vma, 0, 0, PIN_GLOBAL);
+	if (ret)
+		goto err_obj;
 
 	ret = render_state_setup(&so);
 	if (ret)
 		goto err_unpin;
 
-	ret = req->engine->emit_bb_start(req, so.ggtt_offset,
+	ret = req->engine->emit_bb_start(req, so.vma->node.start,
 					 so.rodata->batch_items * 4,
 					 I915_DISPATCH_SECURE);
 	if (ret)
@@ -209,7 +213,7 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 
 	if (so.aux_batch_size > 8) {
 		ret = req->engine->emit_bb_start(req,
-						 (so.ggtt_offset +
+						 (so.vma->node.start +
 						  so.aux_batch_offset),
 						 so.aux_batch_size,
 						 I915_DISPATCH_SECURE);
@@ -217,10 +221,10 @@ int i915_gem_render_state_init(struct drm_i915_gem_request *req)
 			goto err_unpin;
 	}
 
-	i915_vma_move_to_active(i915_gem_obj_to_ggtt(so.obj), req, 0);
+	i915_vma_move_to_active(so.vma, req, 0);
 err_unpin:
-	i915_gem_object_ggtt_unpin(so.obj);
+	i915_vma_unpin(so.vma);
 err_obj:
-	i915_gem_object_put(so.obj);
+	i915_gem_object_put(obj);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.h b/drivers/gpu/drm/i915/i915_gem_render_state.h
index c44fca8599bb..18cce3f06e9c 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.h
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.h
@@ -24,7 +24,7 @@
 #ifndef _I915_GEM_RENDER_STATE_H_
 #define _I915_GEM_RENDER_STATE_H_
 
-#include <linux/types.h>
+struct drm_i915_gem_request;
 
 int i915_gem_render_state_init(struct drm_i915_gem_request *req);
 
-- 
2.8.1


* [PATCH 25/33] drm/i915: Use VMA for wa_ctx tracking
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (23 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 24/33] drm/i915: Use VMA for render state page tracking Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-11 10:53   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 26/33] drm/i915: Track pinned VMA Chris Wilson
                   ` (13 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gpu_error.c   |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 54 ++++++++++++++++++---------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 +--
 3 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 5e9e1cfa110e..ad43c26b76a0 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1110,7 +1110,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 						      engine->status_page.vma->obj);
 
 		ee->wa_ctx = i915_error_ggtt_object_create(dev_priv,
-							   engine->wa_ctx.obj);
+							   engine->wa_ctx.vma->obj);
 
 		count = 0;
 		list_for_each_entry(request, &engine->request_list, link)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 096eb8c2da17..4829d6b847d4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1165,45 +1165,49 @@ static int gen9_init_perctx_bb(struct intel_engine_cs *engine,
 
 static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *engine, u32 size)
 {
-	int ret;
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int err;
 
-	engine->wa_ctx.obj = i915_gem_object_create(&engine->i915->drm,
-						    PAGE_ALIGN(size));
-	if (IS_ERR(engine->wa_ctx.obj)) {
-		DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
-		ret = PTR_ERR(engine->wa_ctx.obj);
-		engine->wa_ctx.obj = NULL;
-		return ret;
+	obj = i915_gem_object_create(&engine->i915->drm, PAGE_ALIGN(size));
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
+
+	vma = i915_vma_create(obj, &engine->i915->ggtt.base, NULL);
+	if (IS_ERR(vma)) {
+		i915_gem_object_put(obj);
+		return PTR_ERR(vma);
 	}
 
-	ret = i915_gem_object_ggtt_pin(engine->wa_ctx.obj, NULL,
-				       0, PAGE_SIZE, PIN_HIGH);
-	if (ret) {
-		DRM_DEBUG_DRIVER("pin LRC WA ctx backing obj failed: %d\n",
-				 ret);
-		i915_gem_object_put(engine->wa_ctx.obj);
-		return ret;
+	err = i915_vma_pin(vma, 0, PAGE_SIZE, PIN_GLOBAL | PIN_HIGH);
+	if (err) {
+		i915_gem_object_put(obj);
+		return err;
 	}
 
+	engine->wa_ctx.vma = vma;
 	return 0;
 }
 
 static void lrc_destroy_wa_ctx_obj(struct intel_engine_cs *engine)
 {
-	if (engine->wa_ctx.obj) {
-		i915_gem_object_ggtt_unpin(engine->wa_ctx.obj);
-		i915_gem_object_put(engine->wa_ctx.obj);
-		engine->wa_ctx.obj = NULL;
-	}
+	struct i915_vma *vma;
+
+	vma = nullify(&engine->wa_ctx.vma);
+	if (!vma)
+		return;
+
+	i915_vma_unpin(vma);
+	i915_gem_object_put(vma->obj);
 }
 
 static int intel_init_workaround_bb(struct intel_engine_cs *engine)
 {
-	int ret;
+	struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx;
 	uint32_t *batch;
 	uint32_t offset;
 	struct page *page;
-	struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx;
+	int ret;
 
 	WARN_ON(engine->id != RCS);
 
@@ -1226,7 +1230,7 @@ static int intel_init_workaround_bb(struct intel_engine_cs *engine)
 		return ret;
 	}
 
-	page = i915_gem_object_get_dirty_page(wa_ctx->obj, 0);
+	page = i915_gem_object_get_dirty_page(wa_ctx->vma->obj, 0);
 	batch = kmap_atomic(page);
 	offset = 0;
 
@@ -2019,9 +2023,9 @@ populate_lr_context(struct i915_gem_context *ctx,
 			       RING_INDIRECT_CTX(engine->mmio_base), 0);
 		ASSIGN_CTX_REG(reg_state, CTX_RCS_INDIRECT_CTX_OFFSET,
 			       RING_INDIRECT_CTX_OFFSET(engine->mmio_base), 0);
-		if (engine->wa_ctx.obj) {
+		if (engine->wa_ctx.vma) {
 			struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx;
-			uint32_t ggtt_offset = i915_gem_obj_ggtt_offset(wa_ctx->obj);
+			u32 ggtt_offset = wa_ctx->vma->node.start;
 
 			reg_state[CTX_RCS_INDIRECT_CTX+1] =
 				(ggtt_offset + wa_ctx->indirect_ctx.offset * sizeof(uint32_t)) |
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index bab3367f8647..0a87cfbb7fec 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -123,12 +123,12 @@ struct drm_i915_reg_table;
  *    an option for future use.
  *  size: size of the batch in DWORDS
  */
-struct  i915_ctx_workarounds {
+struct i915_ctx_workarounds {
 	struct i915_wa_ctx_bb {
 		u32 offset;
 		u32 size;
 	} indirect_ctx, per_ctx;
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 };
 
 struct drm_i915_gem_request;
-- 
2.8.1


* [PATCH 26/33] drm/i915: Track pinned VMA
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (24 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 25/33] drm/i915: Use VMA for wa_ctx tracking Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-11 12:18   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 27/33] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
                   ` (12 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Treat the VMA as the primary struct responsible for tracking bindings
into the GPU's VM. That is, we want to treat the VMA returned after we
pin an object into the VM as the cookie we hold and eventually release
when unpinning. Doing so eliminates the ambiguity of pinning the object
and then searching for the relevant pin later.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |   2 +-
 drivers/gpu/drm/i915/i915_drv.h            |  60 ++------
 drivers/gpu/drm/i915/i915_gem.c            | 225 +++++++----------------------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  60 ++++----
 drivers/gpu/drm/i915/i915_gem_fence.c      |  14 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  67 +++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h        |  14 --
 drivers/gpu/drm/i915/i915_gem_request.c    |   2 +-
 drivers/gpu/drm/i915/i915_gem_request.h    |   2 +-
 drivers/gpu/drm/i915/i915_gem_stolen.c     |   2 +-
 drivers/gpu/drm/i915/i915_gem_tiling.c     |   2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c      |  62 ++++----
 drivers/gpu/drm/i915/intel_display.c       |  55 ++++---
 drivers/gpu/drm/i915/intel_drv.h           |   5 +-
 drivers/gpu/drm/i915/intel_fbc.c           |   2 +-
 drivers/gpu/drm/i915/intel_fbdev.c         |  19 +--
 drivers/gpu/drm/i915/intel_guc_loader.c    |  21 +--
 drivers/gpu/drm/i915/intel_overlay.c       |  32 ++--
 drivers/gpu/drm/i915/intel_sprite.c        |   8 +-
 19 files changed, 256 insertions(+), 398 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index cf8a8df07bed..5f00d6347905 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -105,7 +105,7 @@ static char get_tiling_flag(struct drm_i915_gem_object *obj)
 
 static char get_global_flag(struct drm_i915_gem_object *obj)
 {
-	return i915_gem_obj_to_ggtt(obj) ? 'g' : ' ';
+	return i915_gem_object_to_ggtt(obj, NULL) ?  'g' : ' ';
 }
 
 static char get_pin_mapped_flag(struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4fae0659941f..ed9d872859b3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3066,7 +3066,7 @@ struct drm_i915_gem_object *i915_gem_object_create_from_data(
 void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file);
 void i915_gem_free_object(struct drm_gem_object *obj);
 
-int __must_check
+struct i915_vma * __must_check
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 			 const struct i915_ggtt_view *view,
 			 u64 size,
@@ -3262,12 +3262,11 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
 				  bool write);
 int __must_check
 i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write);
-int __must_check
+struct i915_vma * __must_check
 i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 				     u32 alignment,
 				     const struct i915_ggtt_view *view);
-void i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
-					      const struct i915_ggtt_view *view);
+void i915_gem_object_unpin_from_display_plane(struct i915_vma *vma);
 int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
 				int align);
 int i915_gem_open(struct drm_device *dev, struct drm_file *file);
@@ -3287,63 +3286,34 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
 struct dma_buf *i915_gem_prime_export(struct drm_device *dev,
 				struct drm_gem_object *gem_obj, int flags);
 
-u64 i915_gem_obj_ggtt_offset_view(struct drm_i915_gem_object *o,
-				  const struct i915_ggtt_view *view);
-u64 i915_gem_obj_offset(struct drm_i915_gem_object *o,
-			struct i915_address_space *vm);
-static inline u64
-i915_gem_obj_ggtt_offset(struct drm_i915_gem_object *o)
-{
-	return i915_gem_obj_ggtt_offset_view(o, &i915_ggtt_view_normal);
-}
-
-bool i915_gem_obj_ggtt_bound_view(struct drm_i915_gem_object *o,
-				  const struct i915_ggtt_view *view);
-bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
-			struct i915_address_space *vm);
-
 struct i915_vma *
 i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
-		    struct i915_address_space *vm);
-struct i915_vma *
-i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
-			  const struct i915_ggtt_view *view);
+		     struct i915_address_space *vm,
+		     const struct i915_ggtt_view *view);
 
 struct i915_vma *
 i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
-				  struct i915_address_space *vm);
-struct i915_vma *
-i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,
-				       const struct i915_ggtt_view *view);
-
-static inline struct i915_vma *
-i915_gem_obj_to_ggtt(struct drm_i915_gem_object *obj)
-{
-	return i915_gem_obj_to_ggtt_view(obj, &i915_ggtt_view_normal);
-}
-bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj);
+				  struct i915_address_space *vm,
+				  const struct i915_ggtt_view *view);
 
-/* Some GGTT VM helpers */
 static inline struct i915_hw_ppgtt *
 i915_vm_to_ppgtt(struct i915_address_space *vm)
 {
 	return container_of(vm, struct i915_hw_ppgtt, base);
 }
 
-static inline bool i915_gem_obj_ggtt_bound(struct drm_i915_gem_object *obj)
+static inline struct i915_vma *
+i915_gem_object_to_ggtt(struct drm_i915_gem_object *obj,
+			const struct i915_ggtt_view *view)
 {
-	return i915_gem_obj_ggtt_bound_view(obj, &i915_ggtt_view_normal);
+	return i915_gem_obj_to_vma(obj, &to_i915(obj->base.dev)->ggtt.base, view);
 }
 
-unsigned long
-i915_gem_obj_ggtt_size(struct drm_i915_gem_object *obj);
-
-void i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
-				     const struct i915_ggtt_view *view);
-static inline void
-i915_gem_object_ggtt_unpin(struct drm_i915_gem_object *obj)
+static inline unsigned long
+i915_gem_object_ggtt_offset(struct drm_i915_gem_object *o,
+			    const struct i915_ggtt_view *view)
 {
-	i915_gem_object_ggtt_unpin_view(obj, &i915_ggtt_view_normal);
+	return i915_gem_object_to_ggtt(o, view)->node.start;
 }
 
 /* i915_gem_fence.c */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cf94c6ed0ff5..24641e4aea4f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -737,14 +737,15 @@ i915_gem_gtt_pread(struct drm_device *dev,
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
+	struct i915_vma *vma;
 	struct drm_mm_node node;
 	char __user *user_data;
 	uint64_t remain;
 	uint64_t offset;
 	int ret;
 
-	ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
-	if (ret) {
+	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
+	if (IS_ERR(vma)) {
 		ret = insert_mappable_node(dev_priv, &node, PAGE_SIZE);
 		if (ret)
 			goto out;
@@ -757,7 +758,7 @@ i915_gem_gtt_pread(struct drm_device *dev,
 
 		i915_gem_object_pin_pages(obj);
 	} else {
-		node.start = i915_gem_obj_ggtt_offset(obj);
+		node.start = vma->node.start;
 		node.allocated = false;
 		ret = i915_gem_object_put_fence(obj);
 		if (ret)
@@ -838,7 +839,7 @@ out_unpin:
 		i915_gem_object_unpin_pages(obj);
 		remove_mappable_node(&node);
 	} else {
-		i915_gem_object_ggtt_unpin(obj);
+		i915_vma_unpin(vma);
 	}
 out:
 	return ret;
@@ -1036,6 +1037,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
 {
 	struct i915_ggtt *ggtt = &i915->ggtt;
 	struct drm_device *dev = obj->base.dev;
+	struct i915_vma *vma;
 	struct drm_mm_node node;
 	uint64_t remain, offset;
 	char __user *user_data;
@@ -1045,9 +1047,9 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
 	if (i915_gem_object_is_tiled(obj))
 		return -EFAULT;
 
-	ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
+	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
 				       PIN_MAPPABLE | PIN_NONBLOCK);
-	if (ret) {
+	if (IS_ERR(vma)) {
 		ret = insert_mappable_node(i915, &node, PAGE_SIZE);
 		if (ret)
 			goto out;
@@ -1060,7 +1062,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
 
 		i915_gem_object_pin_pages(obj);
 	} else {
-		node.start = i915_gem_obj_ggtt_offset(obj);
+		node.start = vma->node.start;
 		node.allocated = false;
 		ret = i915_gem_object_put_fence(obj);
 		if (ret)
@@ -1148,7 +1150,7 @@ out_unpin:
 		i915_gem_object_unpin_pages(obj);
 		remove_mappable_node(&node);
 	} else {
-		i915_gem_object_ggtt_unpin(obj);
+		i915_vma_unpin(vma);
 	}
 out:
 	return ret;
@@ -1616,7 +1618,7 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 
 /**
  * i915_gem_fault - fault a page into the GTT
- * @vma: VMA in question
+ * @vm: VMA in question
  * @vmf: fault info
  *
  * The fault handler is set up by drm_gem_mmap() when a object is GTT mapped
@@ -1630,20 +1632,21 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
  * suffer if the GTT working set is large or there are few fence registers
  * left.
  */
-int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+int i915_gem_fault(struct vm_area_struct *vm, struct vm_fault *vmf)
 {
-	struct drm_i915_gem_object *obj = to_intel_bo(vma->vm_private_data);
+	struct drm_i915_gem_object *obj = to_intel_bo(vm->vm_private_data);
 	struct drm_device *dev = obj->base.dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
 	struct i915_ggtt_view view = i915_ggtt_view_normal;
 	bool write = !!(vmf->flags & FAULT_FLAG_WRITE);
+	struct i915_vma *vma;
 	pgoff_t page_offset;
 	unsigned long pfn;
 	int ret;
 
 	/* We don't use vmf->pgoff since that has the fake offset */
-	page_offset = ((unsigned long)vmf->virtual_address - vma->vm_start) >>
+	page_offset = ((unsigned long)vmf->virtual_address - vm->vm_start) >>
 		PAGE_SHIFT;
 
 	trace_i915_gem_object_fault(obj, page_offset, true, write);
@@ -1680,14 +1683,16 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		view.params.partial.size =
 			min_t(unsigned int,
 			      chunk_size,
-			      (vma->vm_end - vma->vm_start)/PAGE_SIZE -
+			      (vm->vm_end - vm->vm_start)/PAGE_SIZE -
 			      view.params.partial.offset);
 	}
 
 	/* Now pin it into the GTT if needed */
-	ret = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
-	if (ret)
+	vma = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto err_unlock;
+	}
 
 	ret = i915_gem_object_set_to_gtt_domain(obj, write);
 	if (ret)
@@ -1698,8 +1703,7 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		goto err_unpin;
 
 	/* Finally, remap it using the new GTT offset */
-	pfn = ggtt->mappable_base +
-		i915_gem_obj_ggtt_offset_view(obj, &view);
+	pfn = ggtt->mappable_base + vma->node.start;
 	pfn >>= PAGE_SHIFT;
 
 	if (unlikely(view.type == I915_GGTT_VIEW_PARTIAL)) {
@@ -1708,12 +1712,12 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		 * is due to userspace losing part of the mapping or never
 		 * having accessed it before (at this partials' range).
 		 */
-		unsigned long base = vma->vm_start +
+		unsigned long base = vm->vm_start +
 				     (view.params.partial.offset << PAGE_SHIFT);
 		unsigned int i;
 
 		for (i = 0; i < view.params.partial.size; i++) {
-			ret = vm_insert_pfn(vma, base + i * PAGE_SIZE, pfn + i);
+			ret = vm_insert_pfn(vm, base + i * PAGE_SIZE, pfn + i);
 			if (ret)
 				break;
 		}
@@ -1722,13 +1726,13 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	} else {
 		if (!obj->fault_mappable) {
 			unsigned long size = min_t(unsigned long,
-						   vma->vm_end - vma->vm_start,
+						   vm->vm_end - vm->vm_start,
 						   obj->base.size);
 			int i;
 
 			for (i = 0; i < size >> PAGE_SHIFT; i++) {
-				ret = vm_insert_pfn(vma,
-						    (unsigned long)vma->vm_start + i * PAGE_SIZE,
+				ret = vm_insert_pfn(vm,
+						    (unsigned long)vm->vm_start + i * PAGE_SIZE,
 						    pfn + i);
 				if (ret)
 					break;
@@ -1736,12 +1740,12 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 
 			obj->fault_mappable = true;
 		} else
-			ret = vm_insert_pfn(vma,
+			ret = vm_insert_pfn(vm,
 					    (unsigned long)vmf->virtual_address,
 					    pfn + page_offset);
 	}
 err_unpin:
-	i915_gem_object_ggtt_unpin_view(obj, &view);
+	__i915_vma_unpin(vma);
 err_unlock:
 	mutex_unlock(&dev->struct_mutex);
 err_rpm:
@@ -3190,7 +3194,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 					    old_write_domain);
 
 	/* And bump the LRU for this access */
-	vma = i915_gem_obj_to_ggtt(obj);
+	vma = i915_gem_object_to_ggtt(obj, NULL);
 	if (vma &&
 	    drm_mm_node_allocated(&vma->node) &&
 	    !i915_vma_is_active(vma))
@@ -3414,11 +3418,12 @@ rpm_put:
  * Can be called from an uninterruptible phase (modesetting) and allows
  * any flushes to be pipelined (for pageflips).
  */
-int
+struct i915_vma *
 i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 				     u32 alignment,
 				     const struct i915_ggtt_view *view)
 {
+	struct i915_vma *vma;
 	u32 old_read_domains, old_write_domain;
 	int ret;
 
@@ -3438,19 +3443,23 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 	 */
 	ret = i915_gem_object_set_cache_level(obj,
 					      HAS_WT(obj->base.dev) ? I915_CACHE_WT : I915_CACHE_NONE);
-	if (ret)
+	if (ret) {
+		vma = ERR_PTR(ret);
 		goto err_unpin_display;
+	}
 
 	/* As the user may map the buffer once pinned in the display plane
 	 * (e.g. libkms for the bootup splash), we have to ensure that we
 	 * always use map_and_fenceable for all scanout buffers.
 	 */
-	ret = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
+	vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
 				       view->type == I915_GGTT_VIEW_NORMAL ?
 				       PIN_MAPPABLE : 0);
-	if (ret)
+	if (IS_ERR(vma))
 		goto err_unpin_display;
 
+	WARN_ON(obj->pin_display > i915_vma_pin_count(vma));
+
 	i915_gem_object_flush_cpu_write_domain(obj);
 
 	old_write_domain = obj->base.write_domain;
@@ -3466,23 +3475,23 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
 					    old_read_domains,
 					    old_write_domain);
 
-	return 0;
+	return vma;
 
 err_unpin_display:
 	obj->pin_display--;
-	return ret;
+	return vma;
 }
 
 void
-i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
-					 const struct i915_ggtt_view *view)
+i915_gem_object_unpin_from_display_plane(struct i915_vma *vma)
 {
-	if (WARN_ON(obj->pin_display == 0))
+	if (WARN_ON(vma->obj->pin_display == 0))
 		return;
 
-	i915_gem_object_ggtt_unpin_view(obj, view);
+	vma->obj->pin_display--;
 
-	obj->pin_display--;
+	i915_vma_unpin(vma);
+	WARN_ON(vma->obj->pin_display > i915_vma_pin_count(vma));
 }
 
 /**
@@ -3679,27 +3688,25 @@ err:
 	return ret;
 }
 
-int
+struct i915_vma *
 i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
-			 const struct i915_ggtt_view *view,
+			 const struct i915_ggtt_view *ggtt_view,
 			 u64 size,
 			 u64 alignment,
 			 u64 flags)
 {
+	struct i915_address_space *vm = &to_i915(obj->base.dev)->ggtt.base;
 	struct i915_vma *vma;
 	int ret;
 
-	if (!view)
-		view = &i915_ggtt_view_normal;
-
-	vma = i915_gem_obj_lookup_or_create_ggtt_vma(obj, view);
+	vma = i915_gem_obj_lookup_or_create_vma(obj, vm, ggtt_view);
 	if (IS_ERR(vma))
-		return PTR_ERR(vma);
+		return vma;
 
 	if (i915_vma_misplaced(vma, size, alignment, flags)) {
 		if (flags & PIN_NONBLOCK &&
 		    (i915_vma_is_pinned(vma) || i915_vma_is_active(vma)))
-			return -ENOSPC;
+			return ERR_PTR(-ENOSPC);
 
 		WARN(i915_vma_is_pinned(vma),
 		     "bo is already pinned in ggtt with incorrect alignment:"
@@ -3712,17 +3719,14 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 		     obj->map_and_fenceable);
 		ret = i915_vma_unbind(vma);
 		if (ret)
-			return ret;
+			return ERR_PTR(ret);
 	}
 
-	return i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
-}
+	ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
+	if (ret)
+		return ERR_PTR(ret);
 
-void
-i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
-				const struct i915_ggtt_view *view)
-{
-	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
+	return vma;
 }
 
 static __always_inline unsigned int __busy_read_flag(unsigned int id)
@@ -4087,32 +4091,6 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 	intel_runtime_pm_put(dev_priv);
 }
 
-struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
-				     struct i915_address_space *vm)
-{
-	struct i915_vma *vma;
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
-		if (vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL &&
-		    vma->vm == vm)
-			return vma;
-	}
-	return NULL;
-}
-
-struct i915_vma *i915_gem_obj_to_ggtt_view(struct drm_i915_gem_object *obj,
-					   const struct i915_ggtt_view *view)
-{
-	struct i915_vma *vma;
-
-	GEM_BUG_ON(!view);
-
-	list_for_each_entry(vma, &obj->vma_list, obj_link)
-		if (i915_vma_is_ggtt(vma) &&
-		    i915_ggtt_view_equal(&vma->ggtt_view, view))
-			return vma;
-	return NULL;
-}
-
 int i915_gem_suspend(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
@@ -4580,97 +4558,6 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 	}
 }
 
-/* All the new VM stuff */
-u64 i915_gem_obj_offset(struct drm_i915_gem_object *o,
-			struct i915_address_space *vm)
-{
-	struct drm_i915_private *dev_priv = to_i915(o->base.dev);
-	struct i915_vma *vma;
-
-	WARN_ON(vm == &dev_priv->mm.aliasing_ppgtt->base);
-
-	list_for_each_entry(vma, &o->vma_list, obj_link) {
-		if (i915_vma_is_ggtt(vma) &&
-		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
-			continue;
-		if (vma->vm == vm)
-			return vma->node.start;
-	}
-
-	WARN(1, "%s vma for this object not found.\n",
-	     i915_is_ggtt(vm) ? "global" : "ppgtt");
-	return -1;
-}
-
-u64 i915_gem_obj_ggtt_offset_view(struct drm_i915_gem_object *o,
-				  const struct i915_ggtt_view *view)
-{
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &o->vma_list, obj_link)
-		if (i915_vma_is_ggtt(vma) &&
-		    i915_ggtt_view_equal(&vma->ggtt_view, view))
-			return vma->node.start;
-
-	WARN(1, "global vma for this object not found. (view=%u)\n", view->type);
-	return -1;
-}
-
-bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
-			struct i915_address_space *vm)
-{
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &o->vma_list, obj_link) {
-		if (i915_vma_is_ggtt(vma) &&
-		    vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL)
-			continue;
-		if (vma->vm == vm && drm_mm_node_allocated(&vma->node))
-			return true;
-	}
-
-	return false;
-}
-
-bool i915_gem_obj_ggtt_bound_view(struct drm_i915_gem_object *o,
-				  const struct i915_ggtt_view *view)
-{
-	struct i915_vma *vma;
-
-	list_for_each_entry(vma, &o->vma_list, obj_link)
-		if (i915_vma_is_ggtt(vma) &&
-		    i915_ggtt_view_equal(&vma->ggtt_view, view) &&
-		    drm_mm_node_allocated(&vma->node))
-			return true;
-
-	return false;
-}
-
-unsigned long i915_gem_obj_ggtt_size(struct drm_i915_gem_object *o)
-{
-	struct i915_vma *vma;
-
-	GEM_BUG_ON(list_empty(&o->vma_list));
-
-	list_for_each_entry(vma, &o->vma_list, obj_link) {
-		if (i915_vma_is_ggtt(vma) &&
-		    vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL)
-			return vma->node.size;
-	}
-
-	return 0;
-}
-
-bool i915_gem_obj_is_pinned(struct drm_i915_gem_object *obj)
-{
-	struct i915_vma *vma;
-	list_for_each_entry(vma, &obj->vma_list, obj_link)
-		if (i915_vma_is_pinned(vma))
-			return true;
-
-	return false;
-}
-
 /* Like i915_gem_object_get_page(), but mark the returned page dirty */
 struct page *
 i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, int n)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c8d13fea4b25..160867f48f91 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -180,8 +180,8 @@ eb_lookup_vmas(struct eb_vmas *eb,
 		 * from the (obj, vm) we don't run the risk of creating
 		 * duplicated vmas for the same vm.
 		 */
-		vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
-		if (IS_ERR(vma)) {
+		vma = i915_gem_obj_lookup_or_create_vma(obj, vm, NULL);
+		if (unlikely(IS_ERR(vma))) {
 			DRM_DEBUG("Failed to lookup VMA\n");
 			ret = PTR_ERR(vma);
 			goto err;
@@ -349,30 +349,34 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
 		   struct drm_i915_gem_relocation_entry *reloc,
 		   uint64_t target_offset)
 {
-	struct drm_device *dev = obj->base.dev;
-	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
+	struct i915_vma *vma;
 	uint64_t delta = relocation_target(reloc, target_offset);
 	uint64_t offset;
 	void __iomem *reloc_page;
 	int ret;
 
+	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
+
 	ret = i915_gem_object_set_to_gtt_domain(obj, true);
 	if (ret)
-		return ret;
+		goto unpin;
 
 	ret = i915_gem_object_put_fence(obj);
 	if (ret)
-		return ret;
+		goto unpin;
 
 	/* Map the page containing the relocation we're going to perform.  */
-	offset = i915_gem_obj_ggtt_offset(obj);
+	offset = vma->node.start;
 	offset += reloc->offset;
 	reloc_page = io_mapping_map_atomic_wc(ggtt->mappable,
 					      offset & PAGE_MASK);
 	iowrite32(lower_32_bits(delta), reloc_page + offset_in_page(offset));
 
-	if (INTEL_INFO(dev)->gen >= 8) {
+	if (INTEL_GEN(dev_priv) >= 8) {
 		offset += sizeof(uint32_t);
 
 		if (offset_in_page(offset) == 0) {
@@ -388,7 +392,9 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
 
 	io_mapping_unmap_atomic(reloc_page);
 
-	return 0;
+unpin:
+	i915_vma_unpin(vma);
+	return ret;
 }
 
 static void
@@ -1281,7 +1287,7 @@ i915_reset_gen7_sol_offsets(struct drm_i915_gem_request *req)
 	return 0;
 }
 
-static struct i915_vma*
+static struct i915_vma *
 i915_gem_execbuffer_parse(struct intel_engine_cs *engine,
 			  struct drm_i915_gem_exec_object2 *shadow_exec_entry,
 			  struct drm_i915_gem_object *batch_obj,
@@ -1305,31 +1311,30 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *engine,
 				      batch_start_offset,
 				      batch_len,
 				      is_master);
-	if (ret)
+	if (ret) {
+		if (ret == -EACCES) /* unhandled chained batch */
+			vma = NULL;
+		else
+			vma = ERR_PTR(ret);
 		goto err;
+	}
 
-	ret = i915_gem_object_ggtt_pin(shadow_batch_obj, NULL, 0, 0, 0);
-	if (ret)
+	vma = i915_gem_object_ggtt_pin(shadow_batch_obj, NULL, 0, 0, 0);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto err;
-
-	i915_gem_object_unpin_pages(shadow_batch_obj);
+	}
 
 	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
 
-	vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
 	vma->exec_entry = shadow_exec_entry;
 	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
 	i915_gem_object_get(shadow_batch_obj);
 	list_add_tail(&vma->exec_list, &eb->vmas);
 
-	return vma;
-
 err:
 	i915_gem_object_unpin_pages(shadow_batch_obj);
-	if (ret == -EACCES) /* unhandled chained batch */
-		return NULL;
-	else
-		return ERR_PTR(ret);
+	return vma;
 }
 
 static int
@@ -1677,6 +1682,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 * hsw should have this fixed, but bdw mucks it up again. */
 	if (dispatch_flags & I915_DISPATCH_SECURE) {
 		struct drm_i915_gem_object *obj = params->batch->obj;
+		struct i915_vma *vma;
 
 		/*
 		 * So on first glance it looks freaky that we pin the batch here
@@ -1688,11 +1694,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		 *   fitting due to fragmentation.
 		 * So this is actually safe.
 		 */
-		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
-		if (ret)
+		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
+		if (IS_ERR(vma)) {
+			ret = PTR_ERR(vma);
 			goto err;
+		}
 
-		params->batch = i915_gem_obj_to_ggtt(obj);
+		params->batch = vma;
 	}
 
 	/* Allocate a request for this batch buffer nice and early. */
@@ -1708,7 +1716,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	 * inactive_list and lose its active reference. Hence we do not need
 	 * to explicitly hold another reference here.
 	 */
-	params->request->batch_obj = params->batch->obj;
+	params->request->batch = params->batch;
 
 	ret = i915_gem_request_add_to_client(params->request, file);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
index 60749cd23f20..f979aeaeb78a 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence.c
@@ -85,7 +85,7 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
 	POSTING_READ(fence_reg_lo);
 
 	if (obj) {
-		struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
+		struct i915_vma *vma = i915_gem_object_to_ggtt(obj, NULL);
 		unsigned int tiling = i915_gem_object_get_tiling(obj);
 		unsigned int stride = i915_gem_object_get_stride(obj);
 		u64 size = vma->node.size;
@@ -120,7 +120,7 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
 	u32 val;
 
 	if (obj) {
-		struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
+		struct i915_vma *vma = i915_gem_object_to_ggtt(obj, NULL);
 		unsigned int tiling = i915_gem_object_get_tiling(obj);
 		unsigned int stride = i915_gem_object_get_stride(obj);
 		int pitch_val;
@@ -161,7 +161,7 @@ static void i830_write_fence_reg(struct drm_device *dev, int reg,
 	u32 val;
 
 	if (obj) {
-		struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
+		struct i915_vma *vma = i915_gem_object_to_ggtt(obj, NULL);
 		unsigned int tiling = i915_gem_object_get_tiling(obj);
 		unsigned int stride = i915_gem_object_get_stride(obj);
 		u32 pitch_val;
@@ -432,13 +432,7 @@ bool
 i915_gem_object_pin_fence(struct drm_i915_gem_object *obj)
 {
 	if (obj->fence_reg != I915_FENCE_REG_NONE) {
-		struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
-		struct i915_vma *ggtt_vma = i915_gem_obj_to_ggtt(obj);
-
-		WARN_ON(!ggtt_vma ||
-			dev_priv->fence_regs[obj->fence_reg].pin_count >
-			i915_vma_pin_count(ggtt_vma));
-		dev_priv->fence_regs[obj->fence_reg].pin_count++;
+		to_i915(obj->base.dev)->fence_regs[obj->fence_reg].pin_count++;
 		return true;
 	} else
 		return false;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ce53f08186fa..e85593b3bb85 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3351,14 +3351,10 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 
 	GEM_BUG_ON(vm->closed);
 
-	if (WARN_ON(i915_is_ggtt(vm) != !!view))
-		return ERR_PTR(-EINVAL);
-
 	vma = kmem_cache_zalloc(to_i915(obj->base.dev)->vmas, GFP_KERNEL);
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	INIT_LIST_HEAD(&vma->obj_link);
 	INIT_LIST_HEAD(&vma->exec_list);
 	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
 		init_request_active(&vma->last_read[i], i915_vma_retire);
@@ -3367,8 +3363,7 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 	vma->obj = obj;
 	vma->size = obj->base.size;
 
-	if (i915_is_ggtt(vm)) {
-		vma->flags |= I915_VMA_GGTT;
+	if (view) {
 		vma->ggtt_view = *view;
 		if (view->type == I915_GGTT_VIEW_PARTIAL) {
 			vma->size = view->params.partial.size;
@@ -3378,56 +3373,76 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
 				intel_rotation_info_size(&view->params.rotated);
 			vma->size <<= PAGE_SHIFT;
 		}
+	}
+
+	if (i915_is_ggtt(vm)) {
+		vma->flags |= I915_VMA_GGTT;
 	} else {
 		i915_ppgtt_get(i915_vm_to_ppgtt(vm));
 	}
 
 	list_add_tail(&vma->obj_link, &obj->vma_list);
-
 	return vma;
 }
 
+static inline bool vma_matches(struct i915_vma *vma,
+			       struct i915_address_space *vm,
+			       const struct i915_ggtt_view *view)
+{
+	if (vma->vm != vm)
+		return false;
+
+	if (!i915_vma_is_ggtt(vma))
+		return true;
+
+	if (!view)
+		return vma->ggtt_view.type == I915_GGTT_VIEW_NORMAL;
+
+	if (vma->ggtt_view.type != view->type)
+		return false;
+
+	return memcmp(&vma->ggtt_view.params,
+		      &view->params,
+		      sizeof(view->params)) == 0;
+}
+
 struct i915_vma *
 i915_vma_create(struct drm_i915_gem_object *obj,
 		struct i915_address_space *vm,
 		const struct i915_ggtt_view *view)
 {
-	GEM_BUG_ON(view ? i915_gem_obj_to_ggtt_view(obj, view) : i915_gem_obj_to_vma(obj, vm));
+	GEM_BUG_ON(i915_gem_obj_to_vma(obj, vm, view));
 
-	return __i915_gem_vma_create(obj, vm, view ?: &i915_ggtt_view_normal);
+	return __i915_gem_vma_create(obj, vm, view);
 }
 
 struct i915_vma *
-i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
-				  struct i915_address_space *vm)
+i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
+		    struct i915_address_space *vm,
+		    const struct i915_ggtt_view *view)
 {
 	struct i915_vma *vma;
 
-	vma = i915_gem_obj_to_vma(obj, vm);
-	if (!vma)
-		vma = __i915_gem_vma_create(obj, vm,
-					    i915_is_ggtt(vm) ? &i915_ggtt_view_normal : NULL);
+	list_for_each_entry_reverse(vma, &obj->vma_list, obj_link)
+		if (vma_matches(vma, vm, view))
+			return vma;
 
-	return vma;
+	return NULL;
 }
 
 struct i915_vma *
-i915_gem_obj_lookup_or_create_ggtt_vma(struct drm_i915_gem_object *obj,
-				       const struct i915_ggtt_view *view)
+i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
+				  struct i915_address_space *vm,
+				  const struct i915_ggtt_view *view)
 {
-	struct drm_device *dev = obj->base.dev;
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	struct i915_vma *vma = i915_gem_obj_to_ggtt_view(obj, view);
-
-	GEM_BUG_ON(!view);
+	struct i915_vma *vma;
 
+	vma = i915_gem_obj_to_vma(obj, vm, view);
 	if (!vma)
-		vma = __i915_gem_vma_create(obj, &ggtt->base, view);
+		vma = __i915_gem_vma_create(obj, vm, view);
 
 	GEM_BUG_ON(i915_vma_is_closed(vma));
 	return vma;
-
 }
 
 static struct scatterlist *
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index ac47663a4d32..d3eb910ddb89 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -613,20 +613,6 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev);
 int __must_check i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj);
 void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj);
 
-static inline bool
-i915_ggtt_view_equal(const struct i915_ggtt_view *a,
-                     const struct i915_ggtt_view *b)
-{
-	if (WARN_ON(!a || !b))
-		return false;
-
-	if (a->type != b->type)
-		return false;
-	if (a->type != I915_GGTT_VIEW_NORMAL)
-		return !memcmp(&a->params, &b->params, sizeof(a->params));
-	return true;
-}
-
 /* Flags used by pin/bind&friends. */
 #define PIN_NONBLOCK		BIT(0)
 #define PIN_MAPPABLE		BIT(1)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 0092f5e90cb2..187c4f9ce8d0 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -407,7 +407,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	req->signaling.wait.tsk = NULL;
 	req->previous_context = NULL;
 	req->file_priv = NULL;
-	req->batch_obj = NULL;
+	req->batch = NULL;
 	req->elsp_submitted = 0;
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index d5176f9cc22f..1f396f470a86 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -118,7 +118,7 @@ struct drm_i915_gem_request {
 	/** Batch buffer related to this request if any (used for
 	 * error state dump only).
 	 */
-	struct drm_i915_gem_object *batch_obj;
+	struct i915_vma *batch;
 	struct list_head active_list;
 
 	/** Time at which this request was emitted, in jiffies. */
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 13279610eeec..c3dcfb724966 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -685,7 +685,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	if (gtt_offset == I915_GTT_OFFSET_NONE)
 		return obj;
 
-	vma = i915_gem_obj_lookup_or_create_vma(obj, &ggtt->base);
+	vma = i915_gem_obj_lookup_or_create_vma(obj, &ggtt->base, NULL);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err;
diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
index 2ceaddc959d3..265e4123e678 100644
--- a/drivers/gpu/drm/i915/i915_gem_tiling.c
+++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
@@ -130,7 +130,7 @@ i915_gem_object_fence_ok(struct drm_i915_gem_object *obj, int tiling_mode)
 	if (INTEL_GEN(dev_priv) >= 4)
 		return 0;
 
-	vma = i915_gem_obj_to_ggtt(obj);
+	vma = i915_gem_object_to_ggtt(obj, NULL);
 	if (!vma)
 		return 0;
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index ad43c26b76a0..1bceaf96bc5f 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -634,46 +634,42 @@ static void i915_error_state_free(struct kref *error_ref)
 
 static struct drm_i915_error_object *
 i915_error_object_create(struct drm_i915_private *dev_priv,
-			 struct drm_i915_gem_object *src,
-			 struct i915_address_space *vm)
+			 struct i915_vma *vma)
 {
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
+	struct drm_i915_gem_object *src;
 	struct drm_i915_error_object *dst;
-	struct i915_vma *vma = NULL;
 	int num_pages;
 	bool use_ggtt;
 	int i = 0;
 	u64 reloc_offset;
 
-	if (src == NULL || src->pages == NULL)
+	if (!vma)
+		return NULL;
+
+	src = vma->obj;
+	if (!src->pages)
 		return NULL;
 
 	num_pages = src->base.size >> PAGE_SHIFT;
 
 	dst = kmalloc(sizeof(*dst) + num_pages * sizeof(u32 *), GFP_ATOMIC);
-	if (dst == NULL)
+	if (!dst)
 		return NULL;
 
-	if (i915_gem_obj_bound(src, vm))
-		dst->gtt_offset = i915_gem_obj_offset(src, vm);
-	else
-		dst->gtt_offset = -1;
-
-	reloc_offset = dst->gtt_offset;
-	if (i915_is_ggtt(vm))
-		vma = i915_gem_obj_to_ggtt(src);
+	reloc_offset = dst->gtt_offset = vma->node.start;
 	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
-		   vma && (vma->flags & I915_VMA_GLOBAL_BIND) &&
+		   (vma->flags & I915_VMA_GLOBAL_BIND) &&
 		   reloc_offset + num_pages * PAGE_SIZE <= ggtt->mappable_end);
 
 	/* Cannot access stolen address directly, try to use the aperture */
 	if (src->stolen) {
 		use_ggtt = true;
 
-		if (!(vma && vma->flags & I915_VMA_GLOBAL_BIND))
+		if (!(vma->flags & I915_VMA_GLOBAL_BIND))
 			goto unwind;
 
-		reloc_offset = i915_gem_obj_ggtt_offset(src);
+		reloc_offset = vma->node.start;
 		if (reloc_offset + num_pages * PAGE_SIZE > ggtt->mappable_end)
 			goto unwind;
 	}
@@ -726,8 +722,6 @@ unwind:
 	kfree(dst);
 	return NULL;
 }
-#define i915_error_ggtt_object_create(dev_priv, src) \
-	i915_error_object_create((dev_priv), (src), &(dev_priv)->ggtt.base)
 
 /* The error capture is special as tries to run underneath the normal
  * locking rules - so we use the raw version of the i915_gem_active lookup.
@@ -1035,11 +1029,8 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 	struct drm_i915_gem_request *request;
 	int i, count;
 
-	if (dev_priv->semaphore) {
-		error->semaphore =
-			i915_error_ggtt_object_create(dev_priv,
-						      dev_priv->semaphore->obj);
-	}
+	error->semaphore =
+		i915_error_object_create(dev_priv, dev_priv->semaphore);
 
 	for (i = 0; i < I915_NUM_ENGINES; i++) {
 		struct intel_engine_cs *engine = &dev_priv->engine[i];
@@ -1069,18 +1060,16 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 			 */
 			ee->batchbuffer =
 				i915_error_object_create(dev_priv,
-							 request->batch_obj,
-							 ee->vm);
+							 request->batch);
 
 			if (HAS_BROKEN_CS_TLB(dev_priv))
 				ee->wa_batchbuffer =
-					i915_error_ggtt_object_create(dev_priv,
-								      engine->scratch->obj);
+					i915_error_object_create(dev_priv,
+								 engine->scratch);
 
-			if (request->ctx->engine[i].state) {
-				ee->ctx = i915_error_ggtt_object_create(dev_priv,
-									request->ctx->engine[i].state->obj);
-			}
+			ee->ctx =
+				i915_error_object_create(dev_priv,
+							 request->ctx->engine[i].state);
 
 			if (request->pid) {
 				struct task_struct *task;
@@ -1101,16 +1090,15 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 			ee->cpu_ring_head = ring->head;
 			ee->cpu_ring_tail = ring->tail;
 			ee->ringbuffer =
-				i915_error_ggtt_object_create(dev_priv,
-							      ring->vma->obj);
+				i915_error_object_create(dev_priv, ring->vma);
 		}
 
 		ee->hws_page =
-			i915_error_ggtt_object_create(dev_priv,
-						      engine->status_page.vma->obj);
+			i915_error_object_create(dev_priv,
+						 engine->status_page.vma);
 
-		ee->wa_ctx = i915_error_ggtt_object_create(dev_priv,
-							   engine->wa_ctx.vma->obj);
+		ee->wa_ctx =
+			i915_error_object_create(dev_priv, engine->wa_ctx.vma);
 
 		count = 0;
 		list_for_each_entry(request, &engine->request_list, link)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 3deee0306e82..15129c65d65d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2206,14 +2206,14 @@ static unsigned int intel_surf_alignment(const struct drm_i915_private *dev_priv
 	}
 }
 
-int
-intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb,
-			   unsigned int rotation)
+struct i915_vma *
+intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation)
 {
 	struct drm_device *dev = fb->dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	struct i915_ggtt_view view;
+	struct i915_vma *vma;
 	u32 alignment;
 	int ret;
 
@@ -2240,10 +2240,11 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb,
 	 */
 	intel_runtime_pm_get(dev_priv);
 
-	ret = i915_gem_object_pin_to_display_plane(obj, alignment,
-						   &view);
-	if (ret)
+	vma = i915_gem_object_pin_to_display_plane(obj, alignment, &view);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto err_pm;
+	}
 
 	/* Install a fence for tiled scan-out. Pre-i965 always needs a
 	 * fence, whereas 965+ only requires a fence if using
@@ -2270,19 +2271,20 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb,
 	}
 
 	intel_runtime_pm_put(dev_priv);
-	return 0;
+	return vma;
 
 err_unpin:
-	i915_gem_object_unpin_from_display_plane(obj, &view);
+	i915_gem_object_unpin_from_display_plane(vma);
 err_pm:
 	intel_runtime_pm_put(dev_priv);
-	return ret;
+	return ERR_PTR(ret);
 }
 
 void intel_unpin_fb_obj(struct drm_framebuffer *fb, unsigned int rotation)
 {
 	struct drm_i915_gem_object *obj = intel_fb_obj(fb);
 	struct i915_ggtt_view view;
+	struct i915_vma *vma;
 
 	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
 
@@ -2291,7 +2293,8 @@ void intel_unpin_fb_obj(struct drm_framebuffer *fb, unsigned int rotation)
 	if (view.type == I915_GGTT_VIEW_NORMAL)
 		i915_gem_object_unpin_fence(obj);
 
-	i915_gem_object_unpin_from_display_plane(obj, &view);
+	vma = i915_gem_object_to_ggtt(obj, &view);
+	i915_gem_object_unpin_from_display_plane(vma);
 }
 
 /*
@@ -2552,7 +2555,7 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
 			continue;
 
 		obj = intel_fb_obj(fb);
-		if (i915_gem_obj_ggtt_offset(obj) == plane_config->base) {
+		if (i915_gem_object_ggtt_offset(obj, NULL) == plane_config->base) {
 			drm_framebuffer_reference(fb);
 			goto valid_fb;
 		}
@@ -2709,11 +2712,11 @@ static void i9xx_update_primary_plane(struct drm_plane *primary,
 	I915_WRITE(DSPSTRIDE(plane), fb->pitches[0]);
 	if (INTEL_INFO(dev)->gen >= 4) {
 		I915_WRITE(DSPSURF(plane),
-			   i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset);
+			   i915_gem_object_ggtt_offset(obj, NULL) + intel_crtc->dspaddr_offset);
 		I915_WRITE(DSPTILEOFF(plane), (y << 16) | x);
 		I915_WRITE(DSPLINOFF(plane), linear_offset);
 	} else
-		I915_WRITE(DSPADDR(plane), i915_gem_obj_ggtt_offset(obj) + linear_offset);
+		I915_WRITE(DSPADDR(plane), i915_gem_object_ggtt_offset(obj, NULL) + linear_offset);
 	POSTING_READ(reg);
 }
 
@@ -2813,7 +2816,7 @@ static void ironlake_update_primary_plane(struct drm_plane *primary,
 
 	I915_WRITE(DSPSTRIDE(plane), fb->pitches[0]);
 	I915_WRITE(DSPSURF(plane),
-		   i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset);
+		   i915_gem_object_ggtt_offset(obj, NULL) + intel_crtc->dspaddr_offset);
 	if (IS_HASWELL(dev) || IS_BROADWELL(dev)) {
 		I915_WRITE(DSPOFFSET(plane), (y << 16) | x);
 	} else {
@@ -2846,7 +2849,7 @@ u32 intel_plane_obj_offset(struct intel_plane *intel_plane,
 	intel_fill_fb_ggtt_view(&view, intel_plane->base.state->fb,
 				intel_plane->base.state->rotation);
 
-	vma = i915_gem_obj_to_ggtt_view(obj, &view);
+	vma = i915_gem_object_to_ggtt(obj, &view);
 	if (WARN(!vma, "ggtt vma for display object not found! (view=%u)\n",
 		view.type))
 		return -1;
@@ -11577,6 +11580,7 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 	struct intel_engine_cs *engine;
 	bool mmio_flip;
 	struct drm_i915_gem_request *request;
+	struct i915_vma *vma;
 	int ret;
 
 	/*
@@ -11685,9 +11689,11 @@ static int intel_crtc_page_flip(struct drm_crtc *crtc,
 
 	mmio_flip = use_mmio_flip(engine, obj);
 
-	ret = intel_pin_and_fence_fb_obj(fb, primary->state->rotation);
-	if (ret)
+	vma = intel_pin_and_fence_fb_obj(fb, primary->state->rotation);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto cleanup_pending;
+	}
 
 	work->gtt_offset = intel_plane_obj_offset(to_intel_plane(primary),
 						  obj, 0);
@@ -14040,7 +14046,11 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 		if (ret)
 			DRM_DEBUG_KMS("failed to attach phys object\n");
 	} else {
-		ret = intel_pin_and_fence_fb_obj(fb, new_state->rotation);
+		struct i915_vma *vma;
+
+		vma = intel_pin_and_fence_fb_obj(fb, new_state->rotation);
+		if (IS_ERR(vma))
+			ret = PTR_ERR(vma);
 	}
 
 	if (ret == 0) {
@@ -14396,7 +14406,7 @@ intel_update_cursor_plane(struct drm_plane *plane,
 	if (!obj)
 		addr = 0;
 	else if (!INTEL_INFO(dev)->cursor_needs_physical)
-		addr = i915_gem_obj_ggtt_offset(obj);
+		addr = i915_gem_object_ggtt_offset(obj, NULL);
 	else
 		addr = obj->phys_handle->busaddr;
 
@@ -16228,7 +16238,6 @@ void intel_modeset_gem_init(struct drm_device *dev)
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_crtc *c;
 	struct drm_i915_gem_object *obj;
-	int ret;
 
 	intel_init_gt_powersave(dev_priv);
 
@@ -16242,15 +16251,17 @@ void intel_modeset_gem_init(struct drm_device *dev)
 	 * for this.
 	 */
 	for_each_crtc(dev, c) {
+		struct i915_vma *vma;
+
 		obj = intel_fb_obj(c->primary->fb);
 		if (obj == NULL)
 			continue;
 
 		mutex_lock(&dev->struct_mutex);
-		ret = intel_pin_and_fence_fb_obj(c->primary->fb,
+		vma = intel_pin_and_fence_fb_obj(c->primary->fb,
 						 c->primary->state->rotation);
 		mutex_unlock(&dev->struct_mutex);
-		if (ret) {
+		if (IS_ERR(vma)) {
 			DRM_ERROR("failed to pin boot fb on pipe %d\n",
 				  to_intel_crtc(c)->pipe);
 			drm_framebuffer_unreference(c->primary->fb);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 1ad2e2c5f580..5050cc126915 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -183,6 +183,7 @@ struct intel_framebuffer {
 struct intel_fbdev {
 	struct drm_fb_helper helper;
 	struct intel_framebuffer *fb;
+	struct i915_vma *vma;
 	async_cookie_t cookie;
 	int preferred_bpp;
 };
@@ -1217,8 +1218,8 @@ bool intel_get_load_detect_pipe(struct drm_connector *connector,
 void intel_release_load_detect_pipe(struct drm_connector *connector,
 				    struct intel_load_detect_pipe *old,
 				    struct drm_modeset_acquire_ctx *ctx);
-int intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb,
-			       unsigned int rotation);
+struct i915_vma *
+intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation);
 void intel_unpin_fb_obj(struct drm_framebuffer *fb, unsigned int rotation);
 struct drm_framebuffer *
 __intel_framebuffer_create(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/intel_fbc.c b/drivers/gpu/drm/i915/intel_fbc.c
index 85adc2b92594..c2ef71a1215e 100644
--- a/drivers/gpu/drm/i915/intel_fbc.c
+++ b/drivers/gpu/drm/i915/intel_fbc.c
@@ -737,7 +737,7 @@ static void intel_fbc_update_state_cache(struct intel_crtc *crtc,
 	/* FIXME: We lack the proper locking here, so only run this on the
 	 * platforms that need. */
 	if (IS_GEN(dev_priv, 5, 6))
-		cache->fb.ilk_ggtt_offset = i915_gem_obj_ggtt_offset(obj);
+		cache->fb.ilk_ggtt_offset = i915_gem_object_ggtt_offset(obj, NULL);
 	cache->fb.pixel_format = fb->pixel_format;
 	cache->fb.stride = fb->pitches[0];
 	cache->fb.fence_reg = obj->fence_reg;
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 0436b4869d57..692bf75db3bd 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -188,7 +188,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	struct fb_info *info;
 	struct drm_framebuffer *fb;
 	struct i915_vma *vma;
-	struct drm_i915_gem_object *obj;
 	bool prealloc = false;
 	void __iomem *vaddr;
 	int ret;
@@ -216,17 +215,17 @@ static int intelfb_create(struct drm_fb_helper *helper,
 		sizes->fb_height = intel_fb->base.height;
 	}
 
-	obj = intel_fb->obj;
-
 	mutex_lock(&dev->struct_mutex);
 
 	/* Pin the GGTT vma for our access via info->screen_base.
 	 * This also validates that any existing fb inherited from the
 	 * BIOS is suitable for own access.
 	 */
-	ret = intel_pin_and_fence_fb_obj(&ifbdev->fb->base, BIT(DRM_ROTATE_0));
-	if (ret)
+	vma = intel_pin_and_fence_fb_obj(&ifbdev->fb->base, BIT(DRM_ROTATE_0));
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
 		goto out_unlock;
+	}
 
 	info = drm_fb_helper_alloc_fbi(helper);
 	if (IS_ERR(info)) {
@@ -246,8 +245,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	info->flags = FBINFO_DEFAULT | FBINFO_CAN_FORCE_OUTPUT;
 	info->fbops = &intelfb_ops;
 
-	vma = i915_gem_obj_to_ggtt(obj);
-
 	/* setup aperture base/size for vesafb takeover */
 	info->apertures->ranges[0].base = dev->mode_config.fb_base;
 	info->apertures->ranges[0].size = ggtt->mappable_end;
@@ -274,14 +271,14 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	 * If the object is stolen however, it will be full of whatever
 	 * garbage was left in there.
 	 */
-	if (ifbdev->fb->obj->stolen && !prealloc)
+	if (intel_fb->obj->stolen && !prealloc)
 		memset_io(info->screen_base, 0, info->screen_size);
 
 	/* Use default scratch pixmap (info->pixmap.flags = FB_PIXMAP_SYSTEM) */
 
-	DRM_DEBUG_KMS("allocated %dx%d fb: 0x%08llx, bo %p\n",
-		      fb->width, fb->height,
-		      i915_gem_obj_ggtt_offset(obj), obj);
+	DRM_DEBUG_KMS("allocated %dx%d fb: 0x%08llx\n",
+		      fb->width, fb->height, vma->node.start);
+	ifbdev->vma = vma;
 
 	mutex_unlock(&dev->struct_mutex);
 	vga_switcheroo_client_fb_set(dev->pdev, info);
diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
index 58ef4418a2ef..11ea0ae1285d 100644
--- a/drivers/gpu/drm/i915/intel_guc_loader.c
+++ b/drivers/gpu/drm/i915/intel_guc_loader.c
@@ -237,12 +237,12 @@ static inline bool guc_ucode_response(struct drm_i915_private *dev_priv,
  * Note that GuC needs the CSS header plus uKernel code to be copied by the
  * DMA engine in one operation, whereas the RSA signature is loaded via MMIO.
  */
-static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv)
+static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv,
+			      struct i915_vma *vma)
 {
 	struct intel_guc_fw *guc_fw = &dev_priv->guc.guc_fw;
-	struct drm_i915_gem_object *fw_obj = guc_fw->guc_fw_obj;
 	unsigned long offset;
-	struct sg_table *sg = fw_obj->pages;
+	struct sg_table *sg = vma->obj->pages;
 	u32 status, rsa[UOS_RSA_SCRATCH_MAX_COUNT];
 	int i, ret = 0;
 
@@ -259,7 +259,7 @@ static int guc_ucode_xfer_dma(struct drm_i915_private *dev_priv)
 	I915_WRITE(DMA_COPY_SIZE, guc_fw->header_size + guc_fw->ucode_size);
 
 	/* Set the source address for the new blob */
-	offset = i915_gem_obj_ggtt_offset(fw_obj) + guc_fw->header_offset;
+	offset = vma->node.start + guc_fw->header_offset;
 	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
 	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
 
@@ -314,6 +314,7 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
 {
 	struct intel_guc_fw *guc_fw = &dev_priv->guc.guc_fw;
 	struct drm_device *dev = &dev_priv->drm;
+	struct i915_vma *vma;
 	int ret;
 
 	ret = i915_gem_object_set_to_gtt_domain(guc_fw->guc_fw_obj, false);
@@ -322,10 +323,10 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
 		return ret;
 	}
 
-	ret = i915_gem_object_ggtt_pin(guc_fw->guc_fw_obj, NULL, 0, 0, 0);
-	if (ret) {
-		DRM_DEBUG_DRIVER("pin failed %d\n", ret);
-		return ret;
+	vma = i915_gem_object_ggtt_pin(guc_fw->guc_fw_obj, NULL, 0, 0, 0);
+	if (IS_ERR(vma)) {
+		DRM_DEBUG_DRIVER("pin failed %d\n", (int)PTR_ERR(vma));
+		return PTR_ERR(vma);
 	}
 
 	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
@@ -368,7 +369,7 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
 
 	set_guc_init_params(dev_priv);
 
-	ret = guc_ucode_xfer_dma(dev_priv);
+	ret = guc_ucode_xfer_dma(dev_priv, vma);
 
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
@@ -376,7 +377,7 @@ static int guc_ucode_xfer(struct drm_i915_private *dev_priv)
 	 * We keep the object pages for reuse during resume. But we can unpin it
 	 * now that DMA has completed, so it doesn't continue to take up space.
 	 */
-	i915_gem_object_ggtt_unpin(guc_fw->guc_fw_obj);
+	i915_vma_unpin(vma);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 3f44b77aa0a2..ce0835eee2f5 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -325,7 +325,7 @@ static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active,
 	i915_gem_track_fb(vma->obj, NULL,
 			  INTEL_FRONTBUFFER_OVERLAY(overlay->crtc->pipe));
 
-	i915_gem_object_unpin_from_display_plane(vma->obj, &i915_ggtt_view_normal);
+	i915_gem_object_unpin_from_display_plane(vma);
 	i915_gem_object_put(vma->obj);
 }
 
@@ -341,7 +341,7 @@ static void intel_overlay_off_tail(struct i915_gem_active *active,
 	if (WARN_ON(!vma))
 		return;
 
-	i915_gem_object_unpin_from_display_plane(vma->obj, &i915_ggtt_view_normal);
+	i915_gem_object_unpin_from_display_plane(vma);
 	i915_gem_object_put(vma->obj);
 
 	overlay->crtc->overlay = NULL;
@@ -754,12 +754,10 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	if (ret != 0)
 		return ret;
 
-	ret = i915_gem_object_pin_to_display_plane(new_bo, 0,
+	vma = i915_gem_object_pin_to_display_plane(new_bo, 0,
 						   &i915_ggtt_view_normal);
-	if (ret != 0)
-		return ret;
-
-	vma = i915_gem_obj_to_ggtt_view(new_bo, &i915_ggtt_view_normal);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
 
 	ret = i915_gem_object_put_fence(new_bo);
 	if (ret)
@@ -802,7 +800,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	swidth = params->src_w;
 	swidthsw = calc_swidthsw(dev_priv, params->offset_Y, tmp_width);
 	sheight = params->src_h;
-	iowrite32(i915_gem_obj_ggtt_offset(new_bo) + params->offset_Y, &regs->OBUF_0Y);
+	iowrite32(vma->node.start + params->offset_Y, &regs->OBUF_0Y);
 	ostride = params->stride_Y;
 
 	if (params->format & I915_OVERLAY_YUV_PLANAR) {
@@ -816,8 +814,8 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 				      params->src_w/uv_hscale);
 		swidthsw |= max_t(u32, tmp_U, tmp_V) << 16;
 		sheight |= (params->src_h/uv_vscale) << 16;
-		iowrite32(i915_gem_obj_ggtt_offset(new_bo) + params->offset_U, &regs->OBUF_0U);
-		iowrite32(i915_gem_obj_ggtt_offset(new_bo) + params->offset_V, &regs->OBUF_0V);
+		iowrite32(vma->node.start + params->offset_U, &regs->OBUF_0U);
+		iowrite32(vma->node.start + params->offset_V, &regs->OBUF_0V);
 		ostride |= params->stride_UV << 16;
 	}
 
@@ -849,7 +847,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	return 0;
 
 out_unpin:
-	i915_gem_object_ggtt_unpin(new_bo);
+	i915_gem_object_unpin_from_display_plane(vma);
 	return ret;
 }
 
@@ -1372,6 +1370,7 @@ void intel_setup_overlay(struct drm_i915_private *dev_priv)
 	struct intel_overlay *overlay;
 	struct drm_i915_gem_object *reg_bo;
 	struct overlay_registers __iomem *regs;
+	struct i915_vma *vma = NULL;
 	int ret;
 
 	if (!HAS_OVERLAY(dev_priv))
@@ -1405,13 +1404,14 @@ void intel_setup_overlay(struct drm_i915_private *dev_priv)
 		}
 		overlay->flip_addr = reg_bo->phys_handle->busaddr;
 	} else {
-		ret = i915_gem_object_ggtt_pin(reg_bo, NULL,
+		vma = i915_gem_object_ggtt_pin(reg_bo, NULL,
 					       0, PAGE_SIZE, PIN_MAPPABLE);
-		if (ret) {
+		if (IS_ERR(vma)) {
 			DRM_ERROR("failed to pin overlay register bo\n");
+			ret = PTR_ERR(vma);
 			goto out_free_bo;
 		}
-		overlay->flip_addr = i915_gem_obj_ggtt_offset(reg_bo);
+		overlay->flip_addr = vma->node.start;
 
 		ret = i915_gem_object_set_to_gtt_domain(reg_bo, true);
 		if (ret) {
@@ -1443,8 +1443,8 @@ void intel_setup_overlay(struct drm_i915_private *dev_priv)
 	return;
 
 out_unpin_bo:
-	if (!OVERLAY_NEEDS_PHYSICAL(dev_priv))
-		i915_gem_object_ggtt_unpin(reg_bo);
+	if (vma)
+		i915_vma_unpin(vma);
 out_free_bo:
 	i915_gem_object_put(reg_bo);
 out_free:
diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
index 5beafd4bc1c1..010d69810657 100644
--- a/drivers/gpu/drm/i915/intel_sprite.c
+++ b/drivers/gpu/drm/i915/intel_sprite.c
@@ -477,8 +477,8 @@ vlv_update_plane(struct drm_plane *dplane,
 
 	I915_WRITE(SPSIZE(pipe, plane), (crtc_h << 16) | crtc_w);
 	I915_WRITE(SPCNTR(pipe, plane), sprctl);
-	I915_WRITE(SPSURF(pipe, plane), i915_gem_obj_ggtt_offset(obj) +
-		   sprsurf_offset);
+	I915_WRITE(SPSURF(pipe, plane),
+		   i915_gem_object_ggtt_offset(obj, NULL) + sprsurf_offset);
 	POSTING_READ(SPSURF(pipe, plane));
 }
 
@@ -617,7 +617,7 @@ ivb_update_plane(struct drm_plane *plane,
 		I915_WRITE(SPRSCALE(pipe), sprscale);
 	I915_WRITE(SPRCTL(pipe), sprctl);
 	I915_WRITE(SPRSURF(pipe),
-		   i915_gem_obj_ggtt_offset(obj) + sprsurf_offset);
+		   i915_gem_object_ggtt_offset(obj, NULL) + sprsurf_offset);
 	POSTING_READ(SPRSURF(pipe));
 }
 
@@ -746,7 +746,7 @@ ilk_update_plane(struct drm_plane *plane,
 	I915_WRITE(DVSSCALE(pipe), dvsscale);
 	I915_WRITE(DVSCNTR(pipe), dvscntr);
 	I915_WRITE(DVSSURF(pipe),
-		   i915_gem_obj_ggtt_offset(obj) + dvssurf_offset);
+		   i915_gem_object_ggtt_offset(obj, NULL) + dvssurf_offset);
 	POSTING_READ(DVSSURF(pipe));
 }
 
-- 
2.8.1


* [PATCH 27/33] drm/i915: Print the batchbuffer offset next to BBADDR in error state
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (25 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 26/33] drm/i915: Track pinned VMA Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-11 12:24   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 28/33] drm/i915: Move per-request pid from request to ctx Chris Wilson
                   ` (11 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

It is useful when looking at captured error states to check the recorded
BBADDR register (the address of the last batchbuffer instruction loaded)
against the expected offset of the batch buffer, and so do a quick check
that (a) the capture is true or (b) HEAD hasn't wandered off into the
badlands.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       |  1 +
 drivers/gpu/drm/i915/i915_gpu_error.c | 13 ++++++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ed9d872859b3..4023718017a8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -552,6 +552,7 @@ struct drm_i915_error_state {
 		struct drm_i915_error_object {
 			int page_count;
 			u64 gtt_offset;
+			u64 gtt_size;
 			u32 *pages[0];
 		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 1bceaf96bc5f..9faac19029cd 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -243,6 +243,14 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
 	err_printf(m, "  IPEIR: 0x%08x\n", ee->ipeir);
 	err_printf(m, "  IPEHR: 0x%08x\n", ee->ipehr);
 	err_printf(m, "  INSTDONE: 0x%08x\n", ee->instdone);
+	if (ee->batchbuffer) {
+		u64 start = ee->batchbuffer->gtt_offset;
+		u64 end = start + ee->batchbuffer->gtt_size;
+
+		err_printf(m, "  batch: [0x%08x %08x, 0x%08x %08x]\n",
+			   upper_32_bits(start), lower_32_bits(start),
+			   upper_32_bits(end), lower_32_bits(end));
+	}
 	if (INTEL_GEN(m->i915) >= 4) {
 		err_printf(m, "  BBADDR: 0x%08x %08x\n",
 			   (u32)(ee->bbaddr>>32), (u32)ee->bbaddr);
@@ -657,7 +665,10 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
 	if (!dst)
 		return NULL;
 
-	reloc_offset = dst->gtt_offset = vma->node.start;
+	dst->gtt_offset = vma->node.start;
+	dst->gtt_size = vma->node.size;
+
+	reloc_offset = dst->gtt_offset;
 	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
 		   (vma->flags & I915_VMA_GLOBAL_BIND) &&
 		   reloc_offset + num_pages * PAGE_SIZE <= ggtt->mappable_end);
-- 
2.8.1


* [PATCH 28/33] drm/i915: Move per-request pid from request to ctx
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (26 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 27/33] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-11 12:32   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 29/33] drm/i915: Only record active and pending requests upon a GPU hang Chris Wilson
                   ` (10 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Since contexts are not currently shared between userspace processes, we
have an exact correspondence between context creator and guilty batch
submitter. Therefore we can save some per-batch work by inspecting the
context->pid upon error instead. Note that we take the context's
creator's pid rather than the file's pid in order to better track fds
passed over sockets.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     | 25 ++++++++++++++++---------
 drivers/gpu/drm/i915/i915_drv.h         |  2 ++
 drivers/gpu/drm/i915/i915_gem_context.c |  4 ++++
 drivers/gpu/drm/i915/i915_gem_request.c |  5 -----
 drivers/gpu/drm/i915/i915_gem_request.h |  3 ---
 drivers/gpu/drm/i915/i915_gpu_error.c   | 13 ++++++++++---
 6 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5f00d6347905..963c6d28d332 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -460,6 +460,8 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 	print_context_stats(m, dev_priv);
 	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
 		struct file_stats stats;
+		struct drm_i915_file_private *file_priv = file->driver_priv;
+		struct drm_i915_gem_request *request;
 		struct task_struct *task;
 
 		memset(&stats, 0, sizeof(stats));
@@ -473,10 +475,17 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
 		 * still alive (e.g. get_pid(current) => fork() => exit()).
 		 * Therefore, we need to protect this ->comm access using RCU.
 		 */
+		mutex_lock(&dev->struct_mutex);
+		request = list_first_entry_or_null(&file_priv->mm.request_list,
+						   struct drm_i915_gem_request,
+						   client_list);
 		rcu_read_lock();
-		task = pid_task(file->pid, PIDTYPE_PID);
+		task = pid_task(request && request->ctx->pid ?
+				request->ctx->pid : file->pid,
+				PIDTYPE_PID);
 		print_file_stats(m, task ? task->comm : "<unknown>", stats);
 		rcu_read_unlock();
+		mutex_unlock(&dev->struct_mutex);
 	}
 	mutex_unlock(&dev->filelist_mutex);
 
@@ -657,12 +666,11 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 
 		seq_printf(m, "%s requests: %d\n", engine->name, count);
 		list_for_each_entry(req, &engine->request_list, link) {
+			struct pid *pid = req->ctx->pid;
 			struct task_struct *task;
 
 			rcu_read_lock();
-			task = NULL;
-			if (req->pid)
-				task = pid_task(req->pid, PIDTYPE_PID);
+			task = pid ? pid_task(pid, PIDTYPE_PID) : NULL;
 			seq_printf(m, "    %x @ %d: %s [%d]\n",
 				   req->fence.seqno,
 				   (int) (jiffies - req->emitted_jiffies),
@@ -1951,18 +1959,17 @@ static int i915_context_status(struct seq_file *m, void *unused)
 
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
 		seq_printf(m, "HW context %u ", ctx->hw_id);
-		if (IS_ERR(ctx->file_priv)) {
-			seq_puts(m, "(deleted) ");
-		} else if (ctx->file_priv) {
-			struct pid *pid = ctx->file_priv->file->pid;
+		if (ctx->pid) {
 			struct task_struct *task;
 
-			task = get_pid_task(pid, PIDTYPE_PID);
+			task = get_pid_task(ctx->pid, PIDTYPE_PID);
 			if (task) {
 				seq_printf(m, "(%s [%d]) ",
 					   task->comm, task->pid);
 				put_task_struct(task);
 			}
+		} else if (IS_ERR(ctx->file_priv)) {
+			seq_puts(m, "(deleted) ");
 		} else {
 			seq_puts(m, "(kernel) ");
 		}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4023718017a8..e7357656728e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -560,6 +560,7 @@ struct drm_i915_error_state {
 
 		struct drm_i915_error_request {
 			long jiffies;
+			pid_t pid;
 			u32 seqno;
 			u32 tail;
 		} *requests;
@@ -880,6 +881,7 @@ struct i915_gem_context {
 	struct drm_i915_private *i915;
 	struct drm_i915_file_private *file_priv;
 	struct i915_hw_ppgtt *ppgtt;
+	struct pid *pid;
 
 	struct i915_ctx_hang_stats hang_stats;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 15eed897b498..c026d591d142 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -158,6 +158,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
 		i915_gem_object_put(ce->state->obj);
 	}
 
+	put_pid(ctx->pid);
 	list_del(&ctx->link);
 
 	ida_simple_remove(&ctx->i915->context_hw_ida, ctx->hw_id);
@@ -311,6 +312,9 @@ __create_hw_context(struct drm_device *dev,
 		ret = DEFAULT_CONTEXT_HANDLE;
 
 	ctx->file_priv = file_priv;
+	if (file_priv)
+		ctx->pid = get_task_pid(current, PIDTYPE_PID);
+
 	ctx->user_handle = ret;
 	/* NB: Mark all slices as needing a remap so that when the context first
 	 * loads it will restore whatever remap state already exists. If there
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 187c4f9ce8d0..8fdd313248f9 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -137,8 +137,6 @@ int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 	list_add_tail(&req->client_list, &file_priv->mm.request_list);
 	spin_unlock(&file_priv->mm.lock);
 
-	req->pid = get_pid(task_pid(current));
-
 	return 0;
 }
 
@@ -154,9 +152,6 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 	list_del(&request->client_list);
 	request->file_priv = NULL;
 	spin_unlock(&file_priv->mm.lock);
-
-	put_pid(request->pid);
-	request->pid = NULL;
 }
 
 void i915_gem_retire_noop(struct i915_gem_active *active,
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 1f396f470a86..72a4b73cbb79 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -134,9 +134,6 @@ struct drm_i915_gem_request {
 	/** file_priv list entry for this request */
 	struct list_head client_list;
 
-	/** process identifier submitting this request */
-	struct pid *pid;
-
 	/**
 	 * The ELSP only accepts two elements at a time, so we queue
 	 * context/tail pairs on a given queue (ring->execlist_queue) until the
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 9faac19029cd..52d1564f37c4 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -460,7 +460,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 				   dev_priv->engine[i].name,
 				   ee->num_requests);
 			for (j = 0; j < ee->num_requests; j++) {
-				err_printf(m, "  seqno 0x%08x, emitted %ld, tail 0x%08x\n",
+				err_printf(m, "  pid %d, seqno 0x%08x, emitted %ld, tail 0x%08x\n",
+					   ee->requests[j].pid,
 					   ee->requests[j].seqno,
 					   ee->requests[j].jiffies,
 					   ee->requests[j].tail);
@@ -1061,6 +1062,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 		request = i915_gem_find_active_request(engine);
 		if (request) {
 			struct intel_ring *ring;
+			struct pid *pid;
 
 			ee->vm = request->ctx->ppgtt ?
 				&request->ctx->ppgtt->base : &ggtt->base;
@@ -1082,11 +1084,12 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 				i915_error_object_create(dev_priv,
 							 request->ctx->engine[i].state);
 
-			if (request->pid) {
+			pid = request->ctx->pid;
+			if (pid) {
 				struct task_struct *task;
 
 				rcu_read_lock();
-				task = pid_task(request->pid, PIDTYPE_PID);
+				task = pid_task(pid, PIDTYPE_PID);
 				if (task) {
 					strcpy(ee->comm, task->comm);
 					ee->pid = task->pid;
@@ -1150,6 +1153,10 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 			erq->seqno = request->fence.seqno;
 			erq->jiffies = request->emitted_jiffies;
 			erq->tail = request->postfix;
+
+			rcu_read_lock();
+			erq->pid = request->ctx ? pid_nr(request->ctx->pid) : 0;
+			rcu_read_unlock();
 		}
 	}
 }
-- 
2.8.1


* [PATCH 29/33] drm/i915: Only record active and pending requests upon a GPU hang
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (27 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 28/33] drm/i915: Move per-request pid from request to ctx Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-11 12:36   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 30/33] drm/i915: Record the RING_MODE register for post-mortem debugging Chris Wilson
                   ` (9 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

There is no other state pertaining to the completed requests in the
hang, other than that gleaned through the ringbuffer, so including the
expired requests in the list of outstanding requests simply adds noise.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 105 +++++++++++++++++++---------------
 1 file changed, 58 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 52d1564f37c4..5d8fd0beda2e 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1034,12 +1034,65 @@ static void error_record_engine_registers(struct drm_i915_error_state *error,
 	}
 }
 
+static void engine_record_requests(struct intel_engine_cs *engine,
+				   struct drm_i915_gem_request *first,
+				   struct drm_i915_error_engine *ee)
+{
+	struct drm_i915_gem_request *request;
+	int count;
+
+	count = 0;
+	request = first;
+	list_for_each_entry_from(request, &engine->request_list, link)
+		count++;
+
+	ee->requests = kcalloc(count,
+			       sizeof(*ee->requests),
+			       GFP_ATOMIC);
+	if (!ee->requests)
+		return;
+	ee->num_requests = count;
+
+	count = 0;
+	request = first;
+	list_for_each_entry_from(request, &engine->request_list, link) {
+		struct drm_i915_error_request *erq;
+
+		if (count >= ee->num_requests) {
+			/*
+			 * If the ring request list was changed in
+			 * between the point where the error request
+			 * list was created and dimensioned and this
+			 * point then just exit early to avoid crashes.
+			 *
+			 * We don't need to communicate that the
+			 * request list changed state during error
+			 * state capture and that the error state is
+			 * slightly incorrect as a consequence since we
+			 * are typically only interested in the request
+			 * list state at the point of error state
+			 * capture, not in any changes happening during
+			 * the capture.
+			 */
+			break;
+		}
+
+		erq = &ee->requests[count++];
+		erq->seqno = request->fence.seqno;
+		erq->jiffies = request->emitted_jiffies;
+		erq->tail = request->tail;
+
+		rcu_read_lock();
+		erq->pid = request->ctx ? pid_nr(request->ctx->pid) : 0;
+		rcu_read_unlock();
+	}
+}
+
 static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 				  struct drm_i915_error_state *error)
 {
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	struct drm_i915_gem_request *request;
-	int i, count;
+	int i;
 
 	error->semaphore =
 		i915_error_object_create(dev_priv, dev_priv->semaphore);
@@ -1047,6 +1100,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 	for (i = 0; i < I915_NUM_ENGINES; i++) {
 		struct intel_engine_cs *engine = &dev_priv->engine[i];
 		struct drm_i915_error_engine *ee = &error->engine[i];
+		struct drm_i915_gem_request *request;
 
 		ee->pid = -1;
 		ee->engine_id = -1;
@@ -1105,6 +1159,8 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 			ee->cpu_ring_tail = ring->tail;
 			ee->ringbuffer =
 				i915_error_object_create(dev_priv, ring->vma);
+
+			engine_record_requests(engine, request, ee);
 		}
 
 		ee->hws_page =
@@ -1113,51 +1169,6 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 
 		ee->wa_ctx =
 			i915_error_object_create(dev_priv, engine->wa_ctx.vma);
-
-		count = 0;
-		list_for_each_entry(request, &engine->request_list, link)
-			count++;
-
-		ee->num_requests = count;
-		ee->requests =
-			kcalloc(count, sizeof(*ee->requests), GFP_ATOMIC);
-		if (!ee->requests) {
-			ee->num_requests = 0;
-			continue;
-		}
-
-		count = 0;
-		list_for_each_entry(request, &engine->request_list, link) {
-			struct drm_i915_error_request *erq;
-
-			if (count >= ee->num_requests) {
-				/*
-				 * If the ring request list was changed in
-				 * between the point where the error request
-				 * list was created and dimensioned and this
-				 * point then just exit early to avoid crashes.
-				 *
-				 * We don't need to communicate that the
-				 * request list changed state during error
-				 * state capture and that the error state is
-				 * slightly incorrect as a consequence since we
-				 * are typically only interested in the request
-				 * list state at the point of error state
-				 * capture, not in any changes happening during
-				 * the capture.
-				 */
-				break;
-			}
-
-			erq = &ee->requests[count++];
-			erq->seqno = request->fence.seqno;
-			erq->jiffies = request->emitted_jiffies;
-			erq->tail = request->postfix;
-
-			rcu_read_lock();
-			erq->pid = request->ctx ? pid_nr(request->ctx->pid) : 0;
-			rcu_read_unlock();
-		}
 	}
 }
 
-- 
2.8.1


* [PATCH 30/33] drm/i915: Record the RING_MODE register for post-mortem debugging
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (28 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 29/33] drm/i915: Only record active and pending requests upon a GPU hang Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-08 11:35   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 31/33] drm/i915: Always use the GTT for error capture Chris Wilson
                   ` (8 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Just another useful register to inspect following a GPU hang.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       | 1 +
 drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e7357656728e..b5abf88e908e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -534,6 +534,7 @@ struct drm_i915_error_state {
 		u32 tail;
 		u32 head;
 		u32 ctl;
+		u32 mode;
 		u32 hws;
 		u32 ipeir;
 		u32 ipehr;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 5d8fd0beda2e..c48277fbe6a7 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -237,6 +237,8 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
 	err_printf(m, "  HEAD:  0x%08x\n", ee->head);
 	err_printf(m, "  TAIL:  0x%08x\n", ee->tail);
 	err_printf(m, "  CTL:   0x%08x\n", ee->ctl);
+	err_printf(m, "  MODE:  0x%08x [idle? %d]\n",
+		   ee->mode, !!(ee->mode & MODE_IDLE));
 	err_printf(m, "  HWS:   0x%08x\n", ee->hws);
 	err_printf(m, "  ACTHD: 0x%08x %08x\n",
 		   (u32)(ee->acthd>>32), (u32)ee->acthd);
@@ -979,6 +981,8 @@ static void error_record_engine_registers(struct drm_i915_error_state *error,
 	ee->head = I915_READ_HEAD(engine);
 	ee->tail = I915_READ_TAIL(engine);
 	ee->ctl = I915_READ_CTL(engine);
+	if (INTEL_GEN(dev_priv) > 2)
+		ee->mode = I915_READ_MODE(engine);
 
 	if (I915_NEED_GFX_HWS(dev_priv)) {
 		i915_reg_t mmio;
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 31/33] drm/i915: Always use the GTT for error capture
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (29 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 30/33] drm/i915: Record the RING_MODE register for post-mortem debugging Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-07 14:45 ` [PATCH 32/33] drm/i915: Consolidate error object printing Chris Wilson
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Since the GTT provides universal access to any GPU page, we can use it
to reduce our plethora of read methods to just one. It also has the
important characteristic of being exactly what the GPU sees - if there
are incoherency problems, seeing the batch as executed (rather than as
trapped inside the CPU cache) is important.
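In outline, the capture path rotates every backing page through the one reserved mappable GTT slot (pseudocode paraphrasing the loop in the diff; not literal kernel code):

```
reserve one mappable GTT page (ggtt->gpu_error) at init

for each dma address backing the vma:
        insert_page(ggtt, dma, slot)          /* point the slot at this page */
        s = io_mapping_map_atomic_wc(slot)    /* CPU view through the aperture */
        copy PAGE_SIZE bytes from s into dst  /* exactly what the GPU sees */
        io_mapping_unmap_atomic(s)

clear_range(ggtt, slot, PAGE_SIZE)            /* scrub the slot afterwards */
```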

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c   |  43 ++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h   |   2 +
 drivers/gpu/drm/i915/i915_gpu_error.c | 120 ++++++++++++----------------------
 3 files changed, 74 insertions(+), 91 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e85593b3bb85..157fd6100d6b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2730,6 +2730,7 @@ int i915_gem_init_ggtt(struct drm_i915_private *dev_priv)
 	 */
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
 	unsigned long hole_start, hole_end;
+	struct i915_hw_ppgtt *ppgtt;
 	struct drm_mm_node *entry;
 	int ret;
 
@@ -2737,6 +2738,15 @@ int i915_gem_init_ggtt(struct drm_i915_private *dev_priv)
 	if (ret)
 		return ret;
 
+	/* Reserve a mappable slot for our lockless error capture */
+	ret = drm_mm_insert_node_in_range_generic(&ggtt->base.mm,
+						  &ggtt->gpu_error,
+						  4096, 0, -1,
+						  0, ggtt->mappable_end,
+						  0, 0);
+	if (ret)
+		return ret;
+
 	/* Clear any non-preallocated blocks */
 	drm_mm_for_each_hole(entry, &ggtt->base.mm, hole_start, hole_end) {
 		DRM_DEBUG_KMS("clearing unused GTT space: [%lx, %lx]\n",
@@ -2751,25 +2761,21 @@ int i915_gem_init_ggtt(struct drm_i915_private *dev_priv)
 			       true);
 
 	if (USES_PPGTT(dev_priv) && !USES_FULL_PPGTT(dev_priv)) {
-		struct i915_hw_ppgtt *ppgtt;
-
 		ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL);
-		if (!ppgtt)
-			return -ENOMEM;
+		if (!ppgtt) {
+			ret = -ENOMEM;
+			goto err;
+		}
 
 		ret = __hw_ppgtt_init(ppgtt, dev_priv);
-		if (ret) {
-			kfree(ppgtt);
-			return ret;
-		}
+		if (ret)
+			goto err_ppgtt;
 
-		if (ppgtt->base.allocate_va_range)
+		if (ppgtt->base.allocate_va_range) {
 			ret = ppgtt->base.allocate_va_range(&ppgtt->base, 0,
 							    ppgtt->base.total);
-		if (ret) {
-			ppgtt->base.cleanup(&ppgtt->base);
-			kfree(ppgtt);
-			return ret;
+			if (ret)
+				goto err_ppgtt_cleanup;
 		}
 
 		ppgtt->base.clear_range(&ppgtt->base,
@@ -2783,6 +2789,14 @@ int i915_gem_init_ggtt(struct drm_i915_private *dev_priv)
 	}
 
 	return 0;
+
+err_ppgtt_cleanup:
+	ppgtt->base.cleanup(&ppgtt->base);
+err_ppgtt:
+	kfree(ppgtt);
+err:
+	drm_mm_remove_node(&ggtt->gpu_error);
+	return ret;
 }
 
 /**
@@ -2802,6 +2816,9 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv)
 
 	i915_gem_cleanup_stolen(&dev_priv->drm);
 
+	if (drm_mm_node_allocated(&ggtt->gpu_error))
+		drm_mm_remove_node(&ggtt->gpu_error);
+
 	if (drm_mm_initialized(&ggtt->base.mm)) {
 		intel_vgt_deballoon(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d3eb910ddb89..0aa900d61eb0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -433,6 +433,8 @@ struct i915_ggtt {
 	bool do_idle_maps;
 
 	int mtrr;
+
+	struct drm_mm_node gpu_error;
 };
 
 struct i915_hw_ppgtt {
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index c48277fbe6a7..d9a65cbd1234 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -607,7 +607,7 @@ static void i915_error_object_free(struct drm_i915_error_object *obj)
 		return;
 
 	for (page = 0; page < obj->page_count; page++)
-		kfree(obj->pages[page]);
+		free_page((unsigned long)obj->pages[page]);
 
 	kfree(obj);
 }
@@ -643,98 +643,69 @@ static void i915_error_state_free(struct kref *error_ref)
 	kfree(error);
 }
 
+static int compress_page(void *src, struct drm_i915_error_object *dst)
+{
+	unsigned long page;
+
+	page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);
+	if (!page)
+		return -ENOMEM;
+
+	dst->pages[dst->page_count++] = (void *)page;
+
+	memcpy((void *)page, src, PAGE_SIZE);
+	return 0;
+}
+
 static struct drm_i915_error_object *
-i915_error_object_create(struct drm_i915_private *dev_priv,
+i915_error_object_create(struct drm_i915_private *i915,
 			 struct i915_vma *vma)
 {
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	struct drm_i915_gem_object *src;
+	struct i915_ggtt *ggtt = &i915->ggtt;
+	const u64 slot = ggtt->gpu_error.start;
 	struct drm_i915_error_object *dst;
-	int num_pages;
-	bool use_ggtt;
-	int i = 0;
-	u64 reloc_offset;
+	unsigned long num_pages;
+	struct sgt_iter iter;
+	dma_addr_t dma;
 
 	if (!vma)
 		return NULL;
 
-	src = vma->obj;
-	if (!src->pages)
-		return NULL;
-
-	num_pages = src->base.size >> PAGE_SHIFT;
-
-	dst = kmalloc(sizeof(*dst) + num_pages * sizeof(u32 *), GFP_ATOMIC);
+	num_pages = min_t(u64, vma->size, vma->obj->base.size) >> PAGE_SHIFT;
+	dst = kmalloc(sizeof(*dst) + num_pages * sizeof(u32 *),
+		      GFP_ATOMIC | __GFP_NOWARN);
 	if (!dst)
 		return NULL;
 
 	dst->gtt_offset = vma->node.start;
-	dst->gtt_size = vma->node.size;
+	dst->page_count = 0;
 
-	reloc_offset = dst->gtt_offset;
-	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
-		   (vma->flags & I915_VMA_GLOBAL_BIND) &&
-		   reloc_offset + num_pages * PAGE_SIZE <= ggtt->mappable_end);
+	for_each_sgt_dma(dma, iter,
+			 vma->ggtt_view.pages ?: vma->obj->pages) {
+		int ret;
+		void *s;
 
-	/* Cannot access stolen address directly, try to use the aperture */
-	if (src->stolen) {
-		use_ggtt = true;
+		ggtt->base.insert_page(&ggtt->base, dma, slot,
+				       I915_CACHE_NONE, 0);
 
-		if (!(vma->flags & I915_VMA_GLOBAL_BIND))
-			goto unwind;
+		s = (void *__force)
+			io_mapping_map_atomic_wc(ggtt->mappable, slot);
+		ret = compress_page(s, dst);
+		io_mapping_unmap_atomic(s);
 
-		reloc_offset = vma->node.start;
-		if (reloc_offset + num_pages * PAGE_SIZE > ggtt->mappable_end)
+		if (ret)
 			goto unwind;
 	}
-
-	/* Cannot access snooped pages through the aperture */
-	if (use_ggtt && src->cache_level != I915_CACHE_NONE &&
-	    !HAS_LLC(dev_priv))
-		goto unwind;
-
-	dst->page_count = num_pages;
-	while (num_pages--) {
-		void *d;
-
-		d = kmalloc(PAGE_SIZE, GFP_ATOMIC);
-		if (d == NULL)
-			goto unwind;
-
-		if (use_ggtt) {
-			void __iomem *s;
-
-			/* Simply ignore tiling or any overlapping fence.
-			 * It's part of the error state, and this hopefully
-			 * captures what the GPU read.
-			 */
-
-			s = io_mapping_map_atomic_wc(ggtt->mappable,
-						     reloc_offset);
-			memcpy_fromio(d, s, PAGE_SIZE);
-			io_mapping_unmap_atomic(s);
-		} else {
-			struct page *page;
-			void *s;
-
-			page = i915_gem_object_get_page(src, i);
-
-			s = kmap_atomic(page);
-			memcpy(d, s, PAGE_SIZE);
-			kunmap_atomic(s);
-		}
-
-		dst->pages[i++] = d;
-		reloc_offset += PAGE_SIZE;
-	}
-
+out:
+	ggtt->base.clear_range(&ggtt->base, slot, PAGE_SIZE, true);
 	return dst;
 
 unwind:
-	while (i--)
-		kfree(dst->pages[i]);
+	while (dst->page_count--)
+		free_page((unsigned long)dst->pages[dst->page_count]);
 	kfree(dst);
-	return NULL;
+	dst = NULL;
+	goto out;
 }
 
 /* The error capture is special as tries to run underneath the normal
@@ -1371,9 +1342,6 @@ static int capture(void *data)
 {
 	struct drm_i915_error_state *error = data;
 
-	/* Ensure that what we readback from memory matches what the GPU sees */
-	wbinvd();
-
 	i915_capture_gen_state(error->i915, error);
 	i915_capture_reg_state(error->i915, error);
 	i915_gem_record_fences(error->i915, error);
@@ -1387,9 +1355,6 @@ static int capture(void *data)
 	error->overlay = intel_overlay_capture_error_state(error->i915);
 	error->display = intel_display_capture_error_state(error->i915);
 
-	/* And make sure we don't leave trash in the CPU cache */
-	wbinvd();
-
 	return 0;
 }
 
@@ -1463,7 +1428,6 @@ void i915_error_state_get(struct drm_device *dev,
 	if (error_priv->error)
 		kref_get(&error_priv->error->ref);
 	spin_unlock_irq(&dev_priv->gpu_error.lock);
-
 }
 
 void i915_error_state_put(struct i915_error_state_file_priv *error_priv)
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 32/33] drm/i915: Consolidate error object printing
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (30 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 31/33] drm/i915: Always use the GTT for error capture Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-09 11:44   ` Joonas Lahtinen
  2016-08-07 14:45 ` [PATCH 33/33] drm/i915: Compress GPU objects in error state Chris Wilson
                   ` (6 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Leave all the pretty printing to userspace and simplify the error
capture to only have a single common object printer. It makes the kernel
code more compact, and the refactoring allows us to apply more complex
transformations like compressing the output.
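The shape of the single common printer can be sketched in userspace C (names and the buffer-based output are hypothetical stand-ins; the kernel version above writes through err_printf() and additionally prefixes the engine name and GTT offset):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One generic dump routine replacing the per-object loops: format
 * "count" dwords into "buf" as "offset :  value" lines, using the
 * same "%08x :  %08x" layout as print_error_obj().  Returns the
 * number of bytes written. */
static int dump_object(char *buf, size_t len,
		       const unsigned int *words, int count)
{
	int i, n = 0;

	if (!words)
		return 0; /* tolerate absent objects, like the NULL check */

	for (i = 0; i < count; i++)
		n += snprintf(buf + n, len - n, "%08x :  %08x\n",
			      i * 4, words[i]);
	return n;
}
```

Every caller then passes its object through the one routine instead of open-coding its own page/offset loop, which is what makes later transformations (like the compression in the following patch) a single-site change.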

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 100 +++++++++-------------------------
 1 file changed, 25 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index d9a65cbd1234..a0e906bc64dc 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -310,10 +310,22 @@ void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...)
 }
 
 static void print_error_obj(struct drm_i915_error_state_buf *m,
+			    struct intel_engine_cs *engine,
+			    const char *name,
 			    struct drm_i915_error_object *obj)
 {
 	int page, offset, elt;
 
+	if (!obj)
+		return;
+
+	if (name) {
+		err_printf(m, "%s --- %s gtt_offset = 0x%08x_%08x\n",
+			   engine ? engine->name : "global", name,
+			   upper_32_bits(obj->gtt_offset),
+			   lower_32_bits(obj->gtt_offset));
+	}
+
 	for (page = offset = 0; page < obj->page_count; page++) {
 		for (elt = 0; elt < PAGE_SIZE/4; elt++) {
 			err_printf(m, "%08x :  %08x\n", offset,
@@ -330,8 +342,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct drm_i915_error_state *error = error_priv->error;
 	struct drm_i915_error_object *obj;
-	int i, j, offset, elt;
 	int max_hangcheck_score;
+	int i, j;
 
 	if (!error) {
 		err_printf(m, "no error state collected\n");
@@ -446,15 +458,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 			err_printf(m, " --- gtt_offset = 0x%08x %08x\n",
 				   upper_32_bits(obj->gtt_offset),
 				   lower_32_bits(obj->gtt_offset));
-			print_error_obj(m, obj);
-		}
-
-		obj = ee->wa_batchbuffer;
-		if (obj) {
-			err_printf(m, "%s (w/a) --- gtt_offset = 0x%08x\n",
-				   dev_priv->engine[i].name,
-				   lower_32_bits(obj->gtt_offset));
-			print_error_obj(m, obj);
+			print_error_obj(m, &dev_priv->engine[i], NULL, obj);
 		}
 
 		if (ee->num_requests) {
@@ -482,77 +486,23 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
 			}
 		}
 
-		if ((obj = ee->ringbuffer)) {
-			err_printf(m, "%s --- ringbuffer = 0x%08x\n",
-				   dev_priv->engine[i].name,
-				   lower_32_bits(obj->gtt_offset));
-			print_error_obj(m, obj);
-		}
+		print_error_obj(m, &dev_priv->engine[i],
+				"ringbuffer", ee->ringbuffer);
 
-		if ((obj = ee->hws_page)) {
-			u64 hws_offset = obj->gtt_offset;
-			u32 *hws_page = &obj->pages[0][0];
+		print_error_obj(m, &dev_priv->engine[i],
+				"HW Status", ee->hws_page);
 
-			if (i915.enable_execlists) {
-				hws_offset += LRC_PPHWSP_PN * PAGE_SIZE;
-				hws_page = &obj->pages[LRC_PPHWSP_PN][0];
-			}
-			err_printf(m, "%s --- HW Status = 0x%08llx\n",
-				   dev_priv->engine[i].name, hws_offset);
-			offset = 0;
-			for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
-				err_printf(m, "[%04x] %08x %08x %08x %08x\n",
-					   offset,
-					   hws_page[elt],
-					   hws_page[elt+1],
-					   hws_page[elt+2],
-					   hws_page[elt+3]);
-				offset += 16;
-			}
-		}
+		print_error_obj(m, &dev_priv->engine[i],
+				"HW context", ee->ctx);
 
-		obj = ee->wa_ctx;
-		if (obj) {
-			u64 wa_ctx_offset = obj->gtt_offset;
-			u32 *wa_ctx_page = &obj->pages[0][0];
-			struct intel_engine_cs *engine = &dev_priv->engine[RCS];
-			u32 wa_ctx_size = (engine->wa_ctx.indirect_ctx.size +
-					   engine->wa_ctx.per_ctx.size);
-
-			err_printf(m, "%s --- WA ctx batch buffer = 0x%08llx\n",
-				   dev_priv->engine[i].name, wa_ctx_offset);
-			offset = 0;
-			for (elt = 0; elt < wa_ctx_size; elt += 4) {
-				err_printf(m, "[%04x] %08x %08x %08x %08x\n",
-					   offset,
-					   wa_ctx_page[elt + 0],
-					   wa_ctx_page[elt + 1],
-					   wa_ctx_page[elt + 2],
-					   wa_ctx_page[elt + 3]);
-				offset += 16;
-			}
-		}
+		print_error_obj(m, &dev_priv->engine[i],
+				"WA context", ee->wa_ctx);
 
-		if ((obj = ee->ctx)) {
-			err_printf(m, "%s --- HW Context = 0x%08x\n",
-				   dev_priv->engine[i].name,
-				   lower_32_bits(obj->gtt_offset));
-			print_error_obj(m, obj);
-		}
+		print_error_obj(m, &dev_priv->engine[i],
+				"WA batchbuffer", ee->wa_batchbuffer);
 	}
 
-	if ((obj = error->semaphore)) {
-		err_printf(m, "Semaphore page = 0x%08x\n",
-			   lower_32_bits(obj->gtt_offset));
-		for (elt = 0; elt < PAGE_SIZE/16; elt += 4) {
-			err_printf(m, "[%04x] %08x %08x %08x %08x\n",
-				   elt * 4,
-				   obj->pages[0][elt],
-				   obj->pages[0][elt+1],
-				   obj->pages[0][elt+2],
-				   obj->pages[0][elt+3]);
-		}
-	}
+	print_error_obj(m, NULL, "Semaphores", error->semaphore);
 
 	if (error->overlay)
 		intel_overlay_print_error_state(m, error->overlay);
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 33/33] drm/i915: Compress GPU objects in error state
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (31 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 32/33] drm/i915: Consolidate error object printing Chris Wilson
@ 2016-08-07 14:45 ` Chris Wilson
  2016-08-10 10:32   ` Joonas Lahtinen
  2016-08-07 15:16 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance Patchwork
                   ` (5 subsequent siblings)
  38 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-07 14:45 UTC (permalink / raw)
  To: intel-gfx

Our error states are growing quickly, pinning kernel memory with them.
The majority of the space is taken up by the error objects, which
compress well using zlib and are mostly meaningless without a decoder
anyway, so encoding them does not hinder quickly scanning the error
state for familiar signatures.
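For reference, the ascii85 packing introduced below maps each 32-bit word to five printable digits in '!'..'u', most-significant digit first, with all-zero words shortened to the single character 'z' by the caller (hence the "z" branch in print_error_obj). A userspace sketch of the same encoding (this follows the patch's convention, not the full Adobe ascii85 framing):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Encode one 32-bit word as five base-85 digits ('!'..'u'),
 * most-significant digit first.  Returns false for an all-zero
 * word, which the caller emits as the shorthand 'z' instead. */
static bool ascii85_encode(unsigned int in, char out[6])
{
	int i;

	if (in == 0)
		return false;

	out[5] = '\0';
	for (i = 5; i--; ) {
		out[i] = '!' + in % 85;
		in /= 85;
	}

	return true;
}
```

Since 85^5 > 2^32, five digits always suffice, and the digit range 33..117 keeps the dump purely printable ASCII.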

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Kconfig          |  1 +
 drivers/gpu/drm/i915/i915_drv.h       |  3 +-
 drivers/gpu/drm/i915/i915_gpu_error.c | 96 ++++++++++++++++++++++++++++++-----
 3 files changed, 85 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 7badcee88ebf..c8ea20526aef 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -5,6 +5,7 @@ config DRM_I915
 	select INTEL_GTT
 	select INTERVAL_TREE
 	select STOP_MACHINE
+	select ZLIB_DEFLATE
 	# we need shmfs for the swappable backing store, and in particular
 	# the shmem_readpage() which depends upon tmpfs
 	select SHMEM
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b5abf88e908e..7d4038a2525a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -551,9 +551,10 @@ struct drm_i915_error_state {
 		u32 semaphore_mboxes[I915_NUM_ENGINES - 1];
 
 		struct drm_i915_error_object {
-			int page_count;
 			u64 gtt_offset;
 			u64 gtt_size;
+			int page_count;
+			int unused;
 			u32 *pages[0];
 		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a0e906bc64dc..80c4ca3d4db9 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -29,6 +29,7 @@
 
 #include <generated/utsrelease.h>
 #include <linux/stop_machine.h>
+#include <linux/zlib.h>
 #include "i915_drv.h"
 
 static const char *engine_str(int engine)
@@ -309,12 +310,30 @@ void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...)
 	va_end(args);
 }
 
+static bool
+ascii85_encode(u32 in, char *out)
+{
+	int i;
+
+	if (in == 0)
+		return false;
+
+	out[5] = '\0';
+	for (i = 5; i--; ) {
+		out[i] = '!' + in % 85;
+		in /= 85;
+	}
+
+	return true;
+}
+
 static void print_error_obj(struct drm_i915_error_state_buf *m,
 			    struct intel_engine_cs *engine,
 			    const char *name,
 			    struct drm_i915_error_object *obj)
 {
-	int page, offset, elt;
+	char out[6];
+	int page;
 
 	if (!obj)
 		return;
@@ -326,13 +345,23 @@ static void print_error_obj(struct drm_i915_error_state_buf *m,
 			   lower_32_bits(obj->gtt_offset));
 	}
 
-	for (page = offset = 0; page < obj->page_count; page++) {
-		for (elt = 0; elt < PAGE_SIZE/4; elt++) {
-			err_printf(m, "%08x :  %08x\n", offset,
-				   obj->pages[page][elt]);
-			offset += 4;
+	err_puts(m, ":"); /* indicate compressed data */
+	for (page = 0; page < obj->page_count; page++) {
+		int i, len;
+
+		len = PAGE_SIZE;
+		if (page == obj->page_count - 1)
+			len -= obj->unused;
+		len = (len + 3) / 4;
+
+		for (i = 0; i < len; i++) {
+			if (ascii85_encode(obj->pages[page][i], out))
+				err_puts(m, out);
+			else
+				err_puts(m, "z");
 		}
 	}
+	err_puts(m, "\n");
 }
 
 int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
@@ -593,17 +622,37 @@ static void i915_error_state_free(struct kref *error_ref)
 	kfree(error);
 }
 
-static int compress_page(void *src, struct drm_i915_error_object *dst)
+static int compress_page(struct z_stream_s *zstream,
+			 void *src,
+			 struct drm_i915_error_object *dst)
 {
-	unsigned long page;
+	zstream->next_in = src;
+	zstream->avail_in = PAGE_SIZE;
 
-	page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);
-	if (!page)
-		return -ENOMEM;
+	do {
+		if (zstream->avail_out == 0) {
+			unsigned long page;
+
+			page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);
+			if (!page)
+				return -ENOMEM;
+
+			dst->pages[dst->page_count++] = (void *)page;
+
+			zstream->next_out = (void *)page;
+			zstream->avail_out = PAGE_SIZE;
+		}
 
-	dst->pages[dst->page_count++] = (void *)page;
+		if (zlib_deflate(zstream, Z_SYNC_FLUSH) != Z_OK)
+			return -EIO;
+
+#if 0
+		/* XXX fall back to uncompressed if compression grows the size? */
+		if (zstream->total_out > zstream->total_in)
+			return -E2BIG;
+#endif
+	} while (zstream->avail_in);
 
-	memcpy((void *)page, src, PAGE_SIZE);
 	return 0;
 }
 
@@ -614,6 +663,7 @@ i915_error_object_create(struct drm_i915_private *i915,
 	struct i915_ggtt *ggtt = &i915->ggtt;
 	const u64 slot = ggtt->gpu_error.start;
 	struct drm_i915_error_object *dst;
+	struct z_stream_s zstream;
 	unsigned long num_pages;
 	struct sgt_iter iter;
 	dma_addr_t dma;
@@ -622,6 +672,7 @@ i915_error_object_create(struct drm_i915_private *i915,
 		return NULL;
 
 	num_pages = min_t(u64, vma->size, vma->obj->base.size) >> PAGE_SHIFT;
+	num_pages = DIV_ROUND_UP(10 * num_pages, 8); /* worstcase zlib growth */
 	dst = kmalloc(sizeof(*dst) + num_pages * sizeof(u32 *),
 		      GFP_ATOMIC | __GFP_NOWARN);
 	if (!dst)
@@ -629,6 +680,18 @@ i915_error_object_create(struct drm_i915_private *i915,
 
 	dst->gtt_offset = vma->node.start;
 	dst->page_count = 0;
+	dst->unused = 0;
+
+	memset(&zstream, 0, sizeof(zstream));
+	zstream.workspace = kmalloc(zlib_deflate_workspacesize(MAX_WBITS,
+							       MAX_MEM_LEVEL),
+				    GFP_ATOMIC | __GFP_NOWARN);
+	if (!zstream.workspace ||
+	    zlib_deflateInit(&zstream, Z_DEFAULT_COMPRESSION) != Z_OK) {
+		kfree(zstream.workspace);
+		kfree(dst);
+		return NULL;
+	}
 
 	for_each_sgt_dma(dma, iter,
 			 vma->ggtt_view.pages ?: vma->obj->pages) {
@@ -640,13 +703,18 @@ i915_error_object_create(struct drm_i915_private *i915,
 
 		s = (void *__force)
 			io_mapping_map_atomic_wc(ggtt->mappable, slot);
-		ret = compress_page(s, dst);
+		ret = compress_page(&zstream, s, dst);
 		io_mapping_unmap_atomic(s);
 
 		if (ret)
 			goto unwind;
 	}
+	zlib_deflate(&zstream, Z_FINISH);
+	dst->unused = zstream.avail_out;
 out:
+	zlib_deflateEnd(&zstream);
+	kfree(zstream.workspace);
+
 	ggtt->base.clear_range(&ggtt->base, slot, PAGE_SIZE, true);
 	return dst;
 
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (32 preceding siblings ...)
  2016-08-07 14:45 ` [PATCH 33/33] drm/i915: Compress GPU objects in error state Chris Wilson
@ 2016-08-07 15:16 ` Patchwork
  2016-08-08  9:46 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev4) Patchwork
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 125+ messages in thread
From: Patchwork @ 2016-08-07 15:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
URL   : https://patchwork.freedesktop.org/series/10770/
State : failure

== Summary ==

Series 10770v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/10770/revisions/1/mbox

Test drv_module_reload_basic:
                pass       -> SKIP       (ro-hsw-i3-4010u)
Test gem_exec_suspend:
        Subgroup basic-s3:
                pass       -> DMESG-WARN (ro-bdw-i7-5557U)
Test kms_cursor_legacy:
        Subgroup basic-cursor-vs-flip-varying-size:
                pass       -> FAIL       (ro-ilk1-i5-650)
        Subgroup basic-flip-vs-cursor-legacy:
                pass       -> FAIL       (ro-hsw-i7-4770r)
        Subgroup basic-flip-vs-cursor-varying-size:
                pass       -> FAIL       (ro-snb-i7-2620M)
                fail       -> PASS       (ro-bdw-i5-5250u)
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-a:
                pass       -> SKIP       (ro-bdw-i7-5557U)
        Subgroup suspend-read-crc-pipe-b:
                pass       -> SKIP       (ro-bdw-i7-5557U)
        Subgroup suspend-read-crc-pipe-c:
                pass       -> SKIP       (ro-bdw-i7-5557U)

fi-kbl-qkkr      total:244  pass:185  dwarn:28  dfail:0   fail:3   skip:28 
ro-bdw-i5-5250u  total:240  pass:219  dwarn:4   dfail:0   fail:1   skip:16 
ro-bdw-i7-5557U  total:240  pass:220  dwarn:1   dfail:0   fail:0   skip:19 
ro-bdw-i7-5600u  total:240  pass:207  dwarn:0   dfail:0   fail:1   skip:32 
ro-bsw-n3050     total:240  pass:194  dwarn:0   dfail:0   fail:4   skip:42 
ro-byt-n2820     total:240  pass:197  dwarn:0   dfail:0   fail:3   skip:40 
ro-hsw-i3-4010u  total:240  pass:213  dwarn:0   dfail:0   fail:0   skip:27 
ro-hsw-i7-4770r  total:240  pass:213  dwarn:0   dfail:0   fail:1   skip:26 
ro-ilk-i7-620lm  total:240  pass:172  dwarn:1   dfail:0   fail:2   skip:65 
ro-ilk1-i5-650   total:235  pass:173  dwarn:0   dfail:0   fail:2   skip:60 
ro-ivb-i7-3770   total:240  pass:205  dwarn:0   dfail:0   fail:0   skip:35 
ro-ivb2-i7-3770  total:240  pass:209  dwarn:0   dfail:0   fail:0   skip:31 
ro-skl3-i5-6260u total:240  pass:223  dwarn:0   dfail:0   fail:3   skip:14 
ro-snb-i7-2620M  total:240  pass:197  dwarn:0   dfail:0   fail:2   skip:41 

Results at /archive/results/CI_IGT_test/RO_Patchwork_1756/

b834992 drm-intel-nightly: 2016y-08m-05d-20h-40m-44s UTC integration manifest
da185cb drm/i915: Compress GPU objects in error state
966324f drm/i915: Consolidate error object printing
61a7172 drm/i915: Always use the GTT for error capture
bcd036b drm/i915: Record the RING_MODE register for post-mortem debugging
fc82dc5 drm/i915: Only record active and pending requests upon a GPU hang
e5cc8da drm/i915: Move per-request pid from request to ctx
a33b1e0 drm/i915: Print the batchbuffer offset next to BBADDR in error state
5c90f4c drm/i915: Track pinned VMA
3cba679 drm/i915: Use VMA for wa_ctx tracking
d8f8e18 drm/i915: Use VMA for render state page tracking
b7172b2 drm/i915: Use VMA as the primary tracker for semaphore page
13b9376 drm/i915/overlay: Use VMA as the primary tracker for images
020d87e drm/i915: Use VMA for scratch page tracking
a210ddc drm/i915: Use VMA for ringbuffer tracking
5a8c965 drm/i915: Only clflush the context object when binding
2760d6e drm/i915: Use VMA as the primary object for context state
b9d7f13 drm/i915: Use VMA directly for checking tiling parameters
2aa06f8 drm/i915: Convert fence computations to use vma directly
1c52491 drm/i915: Track pinned vma inside guc
8896fab38 drm/i915: Create a VMA for an object
be410b8d drm/i915: Remove redundant WARN_ON from __i915_add_request()
5660fb6 drm/i915: Reduce i915_gem_objects to only show object information
9e72f39 drm/i915: Focus debugfs/i915_gem_pinned to show only display pins
7104d09 drm/i915: Remove inactive/active list from debugfs
ca52a69 drm/i915: Mark unmappable GGTT entries as PIN_HIGH
6ec00a7 drm/i915: Move setting of request->batch into its single callsite
7a39715 drm/i915: Store the active context object on all engines upon error
331b7c8 drm/i915: Stop the machine whilst capturing the GPU crash dump
a701c16 drm/i915: Reduce amount of duplicate buffer information captured on error
0f70639 drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh
cc71f27 drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs
a505e6e drm/i915: Do not overwrite the request with zero on reallocation
a096aec drm/i915: Add smp_rmb() to busy ioctl's RCU dance


^ permalink raw reply	[flat|nested] 125+ messages in thread

* [PATCH 1/3] drm/i915: Use VMA for scratch page tracking
  2016-08-07 14:45 ` [PATCH 21/33] drm/i915: Use VMA for scratch page tracking Chris Wilson
@ 2016-08-08  8:00   ` Chris Wilson
  2016-08-08  8:00     ` [PATCH 2/3] drm/i915: Move common scratch allocation/destroy to intel_engine_cs.c Chris Wilson
  2016-08-08  8:00     ` [PATCH 3/3] drm/i915: Move common seqno reset " Chris Wilson
  2016-08-11 10:06   ` [PATCH 21/33] drm/i915: Use VMA for scratch page tracking Joonas Lahtinen
  1 sibling, 2 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-08  8:00 UTC (permalink / raw)
  To: intel-gfx

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---

Accidental squashing during rebase.
-Chris

---
 drivers/gpu/drm/i915/i915_gem_context.c |  2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c   |  2 +-
 drivers/gpu/drm/i915/intel_display.c    |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 18 +++++------
 drivers/gpu/drm/i915/intel_ringbuffer.c | 55 +++++++++++++++++++--------------
 drivers/gpu/drm/i915/intel_ringbuffer.h | 10 ++----
 6 files changed, 46 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 5d42fee75464..15eed897b498 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -660,7 +660,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
 					MI_STORE_REGISTER_MEM |
 					MI_SRM_LRM_GLOBAL_GTT);
 			intel_ring_emit_reg(ring, last_reg);
-			intel_ring_emit(ring, engine->scratch.gtt_offset);
+			intel_ring_emit(ring, engine->scratch->node.start);
 			intel_ring_emit(ring, MI_NOOP);
 		}
 		intel_ring_emit(ring, MI_ARB_ON_OFF | MI_ARB_ENABLE);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 09c3ae0c282a..2d93af0bb793 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1075,7 +1075,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
 			if (HAS_BROKEN_CS_TLB(dev_priv))
 				ee->wa_batchbuffer =
 					i915_error_ggtt_object_create(dev_priv,
-								      engine->scratch.obj);
+								      engine->scratch->obj);
 
 			if (request->ctx->engine[i].state) {
 				ee->ctx = i915_error_ggtt_object_create(dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 9cbf5431c1e3..3deee0306e82 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11325,7 +11325,7 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
 			intel_ring_emit(ring, MI_STORE_REGISTER_MEM |
 					      MI_SRM_LRM_GLOBAL_GTT);
 		intel_ring_emit_reg(ring, DERRMR);
-		intel_ring_emit(ring, req->engine->scratch.gtt_offset + 256);
+		intel_ring_emit(ring, req->engine->scratch->node.start + 256);
 		if (IS_GEN8(dev)) {
 			intel_ring_emit(ring, 0);
 			intel_ring_emit(ring, MI_NOOP);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 198d59b272b2..4dc77747911d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -914,7 +914,7 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
 	wa_ctx_emit(batch, index, (MI_STORE_REGISTER_MEM_GEN8 |
 				   MI_SRM_LRM_GLOBAL_GTT));
 	wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
-	wa_ctx_emit(batch, index, engine->scratch.gtt_offset + 256);
+	wa_ctx_emit(batch, index, engine->scratch->node.start + 256);
 	wa_ctx_emit(batch, index, 0);
 
 	wa_ctx_emit(batch, index, MI_LOAD_REGISTER_IMM(1));
@@ -932,7 +932,7 @@ static inline int gen8_emit_flush_coherentl3_wa(struct intel_engine_cs *engine,
 	wa_ctx_emit(batch, index, (MI_LOAD_REGISTER_MEM_GEN8 |
 				   MI_SRM_LRM_GLOBAL_GTT));
 	wa_ctx_emit_reg(batch, index, GEN8_L3SQCREG4);
-	wa_ctx_emit(batch, index, engine->scratch.gtt_offset + 256);
+	wa_ctx_emit(batch, index, engine->scratch->node.start + 256);
 	wa_ctx_emit(batch, index, 0);
 
 	return index;
@@ -993,7 +993,7 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *engine,
 
 	/* WaClearSlmSpaceAtContextSwitch:bdw,chv */
 	/* Actual scratch location is at 128 bytes offset */
-	scratch_addr = engine->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr = engine->scratch->node.start + 2*CACHELINE_BYTES;
 
 	wa_ctx_emit(batch, index, GFX_OP_PIPE_CONTROL(6));
 	wa_ctx_emit(batch, index, (PIPE_CONTROL_FLUSH_L3 |
@@ -1072,8 +1072,8 @@ static int gen9_init_indirectctx_bb(struct intel_engine_cs *engine,
 	/* WaClearSlmSpaceAtContextSwitch:kbl */
 	/* Actual scratch location is at 128 bytes offset */
 	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_A0)) {
-		uint32_t scratch_addr
-			= engine->scratch.gtt_offset + 2*CACHELINE_BYTES;
+		uint32_t scratch_addr =
+			engine->scratch->node.start + 2*CACHELINE_BYTES;
 
 		wa_ctx_emit(batch, index, GFX_OP_PIPE_CONTROL(6));
 		wa_ctx_emit(batch, index, (PIPE_CONTROL_FLUSH_L3 |
@@ -1215,7 +1215,7 @@ static int intel_init_workaround_bb(struct intel_engine_cs *engine)
 	}
 
 	/* some WA perform writes to scratch page, ensure it is valid */
-	if (engine->scratch.obj == NULL) {
+	if (!engine->scratch) {
 		DRM_ERROR("scratch page not allocated for %s\n", engine->name);
 		return -EINVAL;
 	}
@@ -1483,7 +1483,7 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 {
 	struct intel_ring *ring = request->ring;
 	struct intel_engine_cs *engine = request->engine;
-	u32 scratch_addr = engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 scratch_addr = engine->scratch->node.start + 2 * CACHELINE_BYTES;
 	bool vf_flush_wa = false, dc_flush_wa = false;
 	u32 flags = 0;
 	int ret;
@@ -1844,11 +1844,11 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 	else
 		engine->init_hw = gen8_init_render_ring;
 	engine->init_context = gen8_init_rcs_context;
-	engine->cleanup = intel_fini_pipe_control;
+	engine->cleanup = intel_engine_cleanup_scratch;
 	engine->emit_flush = gen8_emit_flush_render;
 	engine->emit_request = gen8_emit_request_render;
 
-	ret = intel_init_pipe_control(engine, 4096);
+	ret = intel_engine_create_scratch(engine, 4096);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cff9935fe36f..f684fef895c1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -176,7 +176,7 @@ intel_emit_post_sync_nonzero_flush(struct drm_i915_gem_request *req)
 {
 	struct intel_ring *ring = req->ring;
 	u32 scratch_addr =
-		req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+		req->engine->scratch->node.start + 2 * CACHELINE_BYTES;
 	int ret;
 
 	ret = intel_ring_begin(req, 6);
@@ -212,7 +212,7 @@ gen6_render_ring_flush(struct drm_i915_gem_request *req, u32 mode)
 {
 	struct intel_ring *ring = req->ring;
 	u32 scratch_addr =
-		req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+		req->engine->scratch->node.start + 2 * CACHELINE_BYTES;
 	u32 flags = 0;
 	int ret;
 
@@ -286,7 +286,7 @@ gen7_render_ring_flush(struct drm_i915_gem_request *req, u32 mode)
 {
 	struct intel_ring *ring = req->ring;
 	u32 scratch_addr =
-		req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+		req->engine->scratch->node.start + 2 * CACHELINE_BYTES;
 	u32 flags = 0;
 	int ret;
 
@@ -370,7 +370,8 @@ gen8_emit_pipe_control(struct drm_i915_gem_request *req,
 static int
 gen8_render_ring_flush(struct drm_i915_gem_request *req, u32 mode)
 {
-	u32 scratch_addr = req->engine->scratch.gtt_offset + 2 * CACHELINE_BYTES;
+	u32 scratch_addr =
+		req->engine->scratch->node.start + 2 * CACHELINE_BYTES;
 	u32 flags = 0;
 	int ret;
 
@@ -612,45 +613,51 @@ out:
 	return ret;
 }
 
-void intel_fini_pipe_control(struct intel_engine_cs *engine)
+void intel_engine_cleanup_scratch(struct intel_engine_cs *engine)
 {
-	if (engine->scratch.obj == NULL)
+	struct i915_vma *vma;
+
+	vma = nullify(&engine->scratch);
+	if (!vma)
 		return;
 
-	i915_gem_object_ggtt_unpin(engine->scratch.obj);
-	i915_gem_object_put(engine->scratch.obj);
-	engine->scratch.obj = NULL;
+	i915_vma_unpin(vma);
+	i915_gem_object_put(vma->obj);
 }
 
-int intel_init_pipe_control(struct intel_engine_cs *engine, int size)
+int intel_engine_create_scratch(struct intel_engine_cs *engine, int size)
 {
 	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	int ret;
 
-	WARN_ON(engine->scratch.obj);
+	WARN_ON(engine->scratch);
 
 	obj = i915_gem_object_create_stolen(&engine->i915->drm, size);
 	if (!obj)
 		obj = i915_gem_object_create(&engine->i915->drm, size);
 	if (IS_ERR(obj)) {
 		DRM_ERROR("Failed to allocate scratch page\n");
-		ret = PTR_ERR(obj);
-		goto err;
+		return PTR_ERR(obj);
 	}
 
-	ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 4096, PIN_HIGH);
+	vma = i915_vma_create(obj, &engine->i915->ggtt.base, NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto err_unref;
+	}
+
+	ret = i915_vma_pin(vma, 0, 4096, PIN_GLOBAL | PIN_HIGH);
 	if (ret)
 		goto err_unref;
 
-	engine->scratch.obj = obj;
-	engine->scratch.gtt_offset = i915_gem_obj_ggtt_offset(obj);
-	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08x\n",
-			 engine->name, engine->scratch.gtt_offset);
+	engine->scratch = vma;
+	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08llx\n",
+			 engine->name, vma->node.start);
 	return 0;
 
 err_unref:
-	i915_gem_object_put(engine->scratch.obj);
-err:
+	i915_gem_object_put(obj);
 	return ret;
 }
 
@@ -1305,7 +1312,7 @@ static void render_ring_cleanup(struct intel_engine_cs *engine)
 		dev_priv->semaphore_obj = NULL;
 	}
 
-	intel_fini_pipe_control(engine);
+	intel_engine_cleanup_scratch(engine);
 }
 
 static int gen8_rcs_signal(struct drm_i915_gem_request *req)
@@ -1763,7 +1770,7 @@ i830_emit_bb_start(struct drm_i915_gem_request *req,
 		   unsigned int dispatch_flags)
 {
 	struct intel_ring *ring = req->ring;
-	u32 cs_offset = req->engine->scratch.gtt_offset;
+	u32 cs_offset = req->engine->scratch->node.start;
 	int ret;
 
 	ret = intel_ring_begin(req, 6);
@@ -2790,11 +2797,11 @@ int intel_init_render_ring_buffer(struct intel_engine_cs *engine)
 		return ret;
 
 	if (INTEL_GEN(dev_priv) >= 6) {
-		ret = intel_init_pipe_control(engine, 4096);
+		ret = intel_engine_create_scratch(engine, 4096);
 		if (ret)
 			return ret;
 	} else if (HAS_BROKEN_CS_TLB(dev_priv)) {
-		ret = intel_init_pipe_control(engine, I830_WA_SIZE);
+		ret = intel_engine_create_scratch(engine, I830_WA_SIZE);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 35e2b87ab17a..6a236c4c3f89 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -198,6 +198,7 @@ struct intel_engine_cs {
 
 	struct intel_hw_status_page status_page;
 	struct i915_ctx_workarounds wa_ctx;
+	struct i915_vma *scratch;
 
 	u32             irq_keep_mask; /* always keep these interrupts */
 	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
@@ -320,11 +321,6 @@ struct intel_engine_cs {
 
 	struct intel_engine_hangcheck hangcheck;
 
-	struct {
-		struct drm_i915_gem_object *obj;
-		u32 gtt_offset;
-	} scratch;
-
 	bool needs_cmd_parser;
 
 	/*
@@ -476,8 +472,8 @@ void intel_ring_update_space(struct intel_ring *ring);
 
 void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno);
 
-int intel_init_pipe_control(struct intel_engine_cs *engine, int size);
-void intel_fini_pipe_control(struct intel_engine_cs *engine);
+int intel_engine_create_scratch(struct intel_engine_cs *engine, int size);
+void intel_engine_cleanup_scratch(struct intel_engine_cs *engine);
 
 void intel_engine_setup_common(struct intel_engine_cs *engine);
 int intel_engine_init_common(struct intel_engine_cs *engine);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 2/3] drm/i915: Move common scratch allocation/destroy to intel_engine_cs.c
  2016-08-08  8:00   ` [PATCH 1/3] " Chris Wilson
@ 2016-08-08  8:00     ` Chris Wilson
  2016-08-08  9:24       ` Matthew Auld
  2016-08-08  8:00     ` [PATCH 3/3] drm/i915: Move common seqno reset " Chris Wilson
  1 sibling, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-08  8:00 UTC (permalink / raw)
  To: intel-gfx

Since the scratch allocation and cleanup are shared by all engine
submission backends, move them out of the legacy intel_ringbuffer.c and
into the new home for common routines, intel_engine_cs.c.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 50 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        |  1 -
 drivers/gpu/drm/i915/intel_ringbuffer.c | 50 ---------------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 +--
 4 files changed, 51 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 0dd3d1de18aa..1dec35441ab5 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -195,6 +195,54 @@ void intel_engine_setup_common(struct intel_engine_cs *engine)
 	i915_gem_batch_pool_init(engine, &engine->batch_pool);
 }
 
+int intel_engine_create_scratch(struct intel_engine_cs *engine, int size)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int ret;
+
+	WARN_ON(engine->scratch);
+
+	obj = i915_gem_object_create_stolen(&engine->i915->drm, size);
+	if (!obj)
+		obj = i915_gem_object_create(&engine->i915->drm, size);
+	if (IS_ERR(obj)) {
+		DRM_ERROR("Failed to allocate scratch page\n");
+		return PTR_ERR(obj);
+	}
+
+	vma = i915_vma_create(obj, &engine->i915->ggtt.base, NULL);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto err_unref;
+	}
+
+	ret = i915_vma_pin(vma, 0, 4096, PIN_GLOBAL | PIN_HIGH);
+	if (ret)
+		goto err_unref;
+
+	engine->scratch = vma;
+	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08llx\n",
+			 engine->name, vma->node.start);
+	return 0;
+
+err_unref:
+	i915_gem_object_put(obj);
+	return ret;
+}
+
+static void intel_engine_cleanup_scratch(struct intel_engine_cs *engine)
+{
+	struct i915_vma *vma;
+
+	vma = nullify(&engine->scratch);
+	if (!vma)
+		return;
+
+	i915_vma_unpin(vma);
+	i915_gem_object_put(vma->obj);
+}
+
 /**
 * intel_engine_init_common - initialize engine state which might require hw access
  * @engine: Engine to initialize.
@@ -226,6 +274,8 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
  */
 void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
+	intel_engine_cleanup_scratch(engine);
+
 	intel_engine_cleanup_cmd_parser(engine);
 	intel_engine_fini_breadcrumbs(engine);
 	i915_gem_batch_pool_fini(&engine->batch_pool);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4dc77747911d..096eb8c2da17 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1844,7 +1844,6 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 	else
 		engine->init_hw = gen8_init_render_ring;
 	engine->init_context = gen8_init_rcs_context;
-	engine->cleanup = intel_engine_cleanup_scratch;
 	engine->emit_flush = gen8_emit_flush_render;
 	engine->emit_request = gen8_emit_request_render;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index f684fef895c1..af2d81ae3e7d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -613,54 +613,6 @@ out:
 	return ret;
 }
 
-void intel_engine_cleanup_scratch(struct intel_engine_cs *engine)
-{
-	struct i915_vma *vma;
-
-	vma = nullify(&engine->scratch);
-	if (!vma)
-		return;
-
-	i915_vma_unpin(vma);
-	i915_gem_object_put(vma->obj);
-}
-
-int intel_engine_create_scratch(struct intel_engine_cs *engine, int size)
-{
-	struct drm_i915_gem_object *obj;
-	struct i915_vma *vma;
-	int ret;
-
-	WARN_ON(engine->scratch);
-
-	obj = i915_gem_object_create_stolen(&engine->i915->drm, size);
-	if (!obj)
-		obj = i915_gem_object_create(&engine->i915->drm, size);
-	if (IS_ERR(obj)) {
-		DRM_ERROR("Failed to allocate scratch page\n");
-		return PTR_ERR(obj);
-	}
-
-	vma = i915_vma_create(obj, &engine->i915->ggtt.base, NULL);
-	if (IS_ERR(vma)) {
-		ret = PTR_ERR(vma);
-		goto err_unref;
-	}
-
-	ret = i915_vma_pin(vma, 0, 4096, PIN_GLOBAL | PIN_HIGH);
-	if (ret)
-		goto err_unref;
-
-	engine->scratch = vma;
-	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08llx\n",
-			 engine->name, vma->node.start);
-	return 0;
-
-err_unref:
-	i915_gem_object_put(obj);
-	return ret;
-}
-
 static int intel_ring_workarounds_emit(struct drm_i915_gem_request *req)
 {
 	struct intel_ring *ring = req->ring;
@@ -1311,8 +1263,6 @@ static void render_ring_cleanup(struct intel_engine_cs *engine)
 		i915_gem_object_put(dev_priv->semaphore_obj);
 		dev_priv->semaphore_obj = NULL;
 	}
-
-	intel_engine_cleanup_scratch(engine);
 }
 
 static int gen8_rcs_signal(struct drm_i915_gem_request *req)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 6a236c4c3f89..9e3ab8129734 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -472,11 +472,9 @@ void intel_ring_update_space(struct intel_ring *ring);
 
 void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno);
 
-int intel_engine_create_scratch(struct intel_engine_cs *engine, int size);
-void intel_engine_cleanup_scratch(struct intel_engine_cs *engine);
-
 void intel_engine_setup_common(struct intel_engine_cs *engine);
 int intel_engine_init_common(struct intel_engine_cs *engine);
+int intel_engine_create_scratch(struct intel_engine_cs *engine, int size);
 void intel_engine_cleanup_common(struct intel_engine_cs *engine);
 
 static inline int intel_engine_idle(struct intel_engine_cs *engine,
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* [PATCH 3/3] drm/i915: Move common seqno reset to intel_engine_cs.c
  2016-08-08  8:00   ` [PATCH 1/3] " Chris Wilson
  2016-08-08  8:00     ` [PATCH 2/3] drm/i915: Move common scratch allocation/destroy to intel_engine_cs.c Chris Wilson
@ 2016-08-08  8:00     ` Chris Wilson
  2016-08-08  9:40       ` Matthew Auld
  1 sibling, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-08  8:00 UTC (permalink / raw)
  To: intel-gfx

Since intel_engine_init_seqno() is shared by all engine submission
backends, move it out of the legacy intel_ringbuffer.c and into the new
home for common routines, intel_engine_cs.c.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 42 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 42 ---------------------------------
 2 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 1dec35441ab5..b82401849cb5 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -161,6 +161,48 @@ cleanup:
 	return ret;
 }
 
+void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+
+	/* Our semaphore implementation is strictly monotonic (i.e. we proceed
+	 * so long as the semaphore value in the register/page is greater
+	 * than the sync value), so whenever we reset the seqno,
+	 * so long as we reset the tracking semaphore value to 0, it will
+	 * always be before the next request's seqno. If we don't reset
+	 * the semaphore value, then when the seqno moves backwards all
+	 * future waits will complete instantly (causing rendering corruption).
+	 */
+	if (IS_GEN6(dev_priv) || IS_GEN7(dev_priv)) {
+		I915_WRITE(RING_SYNC_0(engine->mmio_base), 0);
+		I915_WRITE(RING_SYNC_1(engine->mmio_base), 0);
+		if (HAS_VEBOX(dev_priv))
+			I915_WRITE(RING_SYNC_2(engine->mmio_base), 0);
+	}
+	if (dev_priv->semaphore_obj) {
+		struct drm_i915_gem_object *obj = dev_priv->semaphore_obj;
+		struct page *page = i915_gem_object_get_dirty_page(obj, 0);
+		void *semaphores = kmap(page);
+		memset(semaphores + GEN8_SEMAPHORE_OFFSET(engine->id, 0),
+		       0, I915_NUM_ENGINES * gen8_semaphore_seqno_size);
+		kunmap(page);
+	}
+	memset(engine->semaphore.sync_seqno, 0,
+	       sizeof(engine->semaphore.sync_seqno));
+
+	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
+	engine->last_submitted_seqno = seqno;
+
+	engine->hangcheck.seqno = seqno;
+
+	/* After manually advancing the seqno, fake the interrupt in case
+	 * there are any waiters for that seqno.
+	 */
+	intel_engine_wakeup(engine);
+}
+
 void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
 {
 	memset(&engine->hangcheck, 0, sizeof(engine->hangcheck));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index af2d81ae3e7d..3d4613673f27 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2311,48 +2311,6 @@ int intel_ring_cacheline_align(struct drm_i915_gem_request *req)
 	return 0;
 }
 
-void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno)
-{
-	struct drm_i915_private *dev_priv = engine->i915;
-
-	/* Our semaphore implementation is strictly monotonic (i.e. we proceed
-	 * so long as the semaphore value in the register/page is greater
-	 * than the sync value), so whenever we reset the seqno,
-	 * so long as we reset the tracking semaphore value to 0, it will
-	 * always be before the next request's seqno. If we don't reset
-	 * the semaphore value, then when the seqno moves backwards all
-	 * future waits will complete instantly (causing rendering corruption).
-	 */
-	if (IS_GEN6(dev_priv) || IS_GEN7(dev_priv)) {
-		I915_WRITE(RING_SYNC_0(engine->mmio_base), 0);
-		I915_WRITE(RING_SYNC_1(engine->mmio_base), 0);
-		if (HAS_VEBOX(dev_priv))
-			I915_WRITE(RING_SYNC_2(engine->mmio_base), 0);
-	}
-	if (dev_priv->semaphore_obj) {
-		struct drm_i915_gem_object *obj = dev_priv->semaphore_obj;
-		struct page *page = i915_gem_object_get_dirty_page(obj, 0);
-		void *semaphores = kmap(page);
-		memset(semaphores + GEN8_SEMAPHORE_OFFSET(engine->id, 0),
-		       0, I915_NUM_ENGINES * gen8_semaphore_seqno_size);
-		kunmap(page);
-	}
-	memset(engine->semaphore.sync_seqno, 0,
-	       sizeof(engine->semaphore.sync_seqno));
-
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-	if (engine->irq_seqno_barrier)
-		engine->irq_seqno_barrier(engine);
-	engine->last_submitted_seqno = seqno;
-
-	engine->hangcheck.seqno = seqno;
-
-	/* After manually advancing the seqno, fake the interrupt in case
-	 * there are any waiters for that seqno.
-	 */
-	intel_engine_wakeup(engine);
-}
-
 static void gen6_bsd_submit_request(struct drm_i915_gem_request *request)
 {
 	struct drm_i915_private *dev_priv = request->i915;
-- 
2.8.1


^ permalink raw reply related	[flat|nested] 125+ messages in thread

* Re: [PATCH 14/33] drm/i915: Create a VMA for an object
  2016-08-07 14:45 ` [PATCH 14/33] drm/i915: Create a VMA for an object Chris Wilson
@ 2016-08-08  9:01   ` Joonas Lahtinen
  2016-08-08  9:09     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-08  9:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> In many places, we wish to store the VMA in preference to the object
> itself and so being able to create the persistent VMA is useful.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_drv.h     |  2 ++
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 10 ++++++++++
>  drivers/gpu/drm/i915/i915_gem_gtt.h |  5 +++++
>  3 files changed, 17 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 826486d03e8e..2d8f32cd726d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3903,4 +3903,6 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>  	return false;
>  }
>  
> +#define nullify(ptr) ({typeof(*ptr) T = *(ptr); *(ptr) = NULL; T;})
> +

Random lost hunk here.

>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 18c7c9644761..ce53f08186fa 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -3388,6 +3388,16 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
>  }
>  
>  struct i915_vma *
> +i915_vma_create(struct drm_i915_gem_object *obj,
> +		struct i915_address_space *vm,
> +		const struct i915_ggtt_view *view)
> +{
> +	GEM_BUG_ON(view ? i915_gem_obj_to_ggtt_view(obj, view) : i915_gem_obj_to_vma(obj, vm));

GEM_BUG_ON(view && !i915_is_ggtt(vm)) ?

> +
> +	return __i915_gem_vma_create(obj, vm, view ?: &i915_ggtt_view_normal);
> +}
> +
> +struct i915_vma *
>  i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
>  				  struct i915_address_space *vm)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index cc56206a1600..ac47663a4d32 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -232,6 +232,11 @@ struct i915_vma {
>  	struct drm_i915_gem_exec_object2 *exec_entry;
>  };
>  
> +struct i915_vma *
> +i915_vma_create(struct drm_i915_gem_object *obj,
> +		struct i915_address_space *vm,
> +		const struct i915_ggtt_view *view);
> +
>  static inline bool i915_vma_is_ggtt(const struct i915_vma *vma)
>  {
>  	return vma->flags & I915_VMA_GGTT;
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 13/33] drm/i915: Remove redundant WARN_ON from __i915_add_request()
  2016-08-07 14:45 ` [PATCH 13/33] drm/i915: Remove redundant WARN_ON from __i915_add_request() Chris Wilson
@ 2016-08-08  9:03   ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-08  9:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> It's an outright programming error, so explode if it is ever hit.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_request.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index c6f523e2879c..0092f5e90cb2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -463,18 +463,12 @@ static void i915_gem_mark_busy(const struct intel_engine_cs *engine)
>   */
>  void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
>  {
> -	struct intel_engine_cs *engine;
> -	struct intel_ring *ring;
> +	struct intel_engine_cs *engine = request->engine;
> +	struct intel_ring *ring = request->ring;
>  	u32 request_start;
>  	u32 reserved_tail;
>  	int ret;
>  
> -	if (WARN_ON(!request))
> -		return;
> -
> -	engine = request->engine;
> -	ring = request->ring;
> -
>  	/*
>  	 * To ensure that this call will not fail, space for its emissions
>  	 * should already have been reserved in the ring buffer. Let the ring
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 09/33] drm/i915: Mark unmappable GGTT entries as PIN_HIGH
  2016-08-07 14:45 ` [PATCH 09/33] drm/i915: Mark unmappable GGTT entries as PIN_HIGH Chris Wilson
@ 2016-08-08  9:09   ` Joonas Lahtinen
  2016-08-09 11:05   ` Tvrtko Ursulin
  1 sibling, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-08  9:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> We allocate a few objects into the GGTT that we never need to access via
> the mappable aperture (such as contexts, status pages). We can request
> that these are bound high in the VM to increase the amount of mappable
> aperture available. However, anything that may be frequently pinned
> (such as logical contexts) we want to use the fast search & insert.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/intel_lrc.c        | 2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 309c5d9b1c57..c7f4b64b16f6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1182,7 +1182,7 @@ static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *engine, u32 size)
>  	}
>  
>  	ret = i915_gem_object_ggtt_pin(engine->wa_ctx.obj, NULL,
> -				       0, PAGE_SIZE, 0);
> +				       0, PAGE_SIZE, PIN_HIGH);
>  	if (ret) {
>  		DRM_DEBUG_DRIVER("pin LRC WA ctx backing obj failed: %d\n",
>  				 ret);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 16b726fe33eb..09f01c641c14 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2093,7 +2093,7 @@ static int intel_ring_context_pin(struct i915_gem_context *ctx,
>  
>  	if (ce->state) {
>  		ret = i915_gem_object_ggtt_pin(ce->state, NULL, 0,
> -					       ctx->ggtt_alignment, 0);
> +					       ctx->ggtt_alignment, PIN_HIGH);
>  		if (ret)
>  			goto error;
>  	}
> @@ -2629,7 +2629,8 @@ static void intel_ring_init_semaphores(struct drm_i915_private *dev_priv,
>  			i915.semaphores = 0;
>  		} else {
>  			i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
> -			ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
> +			ret = i915_gem_object_ggtt_pin(obj, NULL,
> +						       0, 0, PIN_HIGH);
>  			if (ret != 0) {
>  				i915_gem_object_put(obj);
>  				DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n");
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 14/33] drm/i915: Create a VMA for an object
  2016-08-08  9:01   ` Joonas Lahtinen
@ 2016-08-08  9:09     ` Chris Wilson
  2016-08-10 10:58       ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-08  9:09 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Mon, Aug 08, 2016 at 12:01:07PM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > In many places, we wish to store the VMA in preference to the object
> > itself and so being able to create the persistent VMA is useful.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h     |  2 ++
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 10 ++++++++++
> >  drivers/gpu/drm/i915/i915_gem_gtt.h |  5 +++++
> >  3 files changed, 17 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 826486d03e8e..2d8f32cd726d 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -3903,4 +3903,6 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
> >  	return false;
> >  }
> >  
> > +#define nullify(ptr) ({typeof(*ptr) T = *(ptr); *(ptr) = NULL; T;})
> > +
> 
> Random lost hunk here.

In the next patches, where I use i915_vma_create(), I also use this
helper. It was just convenience.

> >  #endif
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 18c7c9644761..ce53f08186fa 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -3388,6 +3388,16 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
> >  }
> >  
> >  struct i915_vma *
> > +i915_vma_create(struct drm_i915_gem_object *obj,
> > +		struct i915_address_space *vm,
> > +		const struct i915_ggtt_view *view)
> > +{
> > +	GEM_BUG_ON(view ? i915_gem_obj_to_ggtt_view(obj, view) : i915_gem_obj_to_vma(obj, vm));
> 
> GEM_BUG_ON(view && !i915_is_ggtt(vm)) ?

We have that as a WARN_ON inside create(); I suppose it doesn't hurt
here either and documents the interface.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-07 14:45 ` [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance Chris Wilson
@ 2016-08-08  9:12   ` Daniel Vetter
  2016-08-08  9:30     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Daniel Vetter @ 2016-08-08  9:12 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> In the debate as to whether the second read of active->request is
> ordered after the dependent reads of the first read of active->request,
> just give in and throw a smp_rmb() in there so that ordering of loads is
> assured.
> 
> v2: Explain the manual smp_rmb()
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

r-b confirmed.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c         | 25 ++++++++++++++++++++-----
>  drivers/gpu/drm/i915/i915_gem_request.h |  3 +++
>  2 files changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f4f8eaa90f2a..654f0b015f97 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3735,7 +3735,7 @@ i915_gem_object_ggtt_unpin_view(struct drm_i915_gem_object *obj,
>  	i915_vma_unpin(i915_gem_obj_to_ggtt_view(obj, view));
>  }
>  
> -static __always_inline unsigned __busy_read_flag(unsigned int id)
> +static __always_inline unsigned int __busy_read_flag(unsigned int id)
>  {
>  	/* Note that we could alias engines in the execbuf API, but
>  	 * that would be very unwise as it prevents userspace from
> @@ -3753,7 +3753,7 @@ static __always_inline unsigned int __busy_write_id(unsigned int id)
>  	return id;
>  }
>  
> -static __always_inline unsigned
> +static __always_inline unsigned int
>  __busy_set_if_active(const struct i915_gem_active *active,
>  		     unsigned int (*flag)(unsigned int id))
>  {
> @@ -3770,19 +3770,34 @@ __busy_set_if_active(const struct i915_gem_active *active,
>  
>  		id = request->engine->exec_id;
>  
> -		/* Check that the pointer wasn't reassigned and overwritten. */
> +		/* Check that the pointer wasn't reassigned and overwritten.
> +		 *
> +		 * In __i915_gem_active_get_rcu(), we enforce ordering between
> +		 * the first rcu pointer dereference (imposing a
> +		 * read-dependency only on access through the pointer) and
> +		 * the second lockless access through the memory barrier
> +		 * following a successful atomic_inc_not_zero(). Here there
> +		 * is no such barrier, and so we must manually insert an
> +		 * explicit read barrier to ensure that the following
> +		 * access occurs after all the loads through the first
> +		 * pointer.
> +		 *
> +		 * The corresponding write barrier is part of
> +		 * rcu_assign_pointer().
> +		 */
> +		smp_rmb();
>  		if (request == rcu_access_pointer(active->request))
>  			return flag(id);
>  	} while (1);
>  }
>  
> -static inline unsigned
> +static __always_inline unsigned int
>  busy_check_reader(const struct i915_gem_active *active)
>  {
>  	return __busy_set_if_active(active, __busy_read_flag);
>  }
>  
> -static inline unsigned
> +static __always_inline unsigned int
>  busy_check_writer(const struct i915_gem_active *active)
>  {
>  	return __busy_set_if_active(active, __busy_write_id);
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 3496e28785e7..b2456dede3ad 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -497,6 +497,9 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
>  		 * incremented) then the following read for rcu_access_pointer()
>  		 * must occur after the atomic operation and so confirm
>  		 * that this request is the one currently being tracked.
> +		 *
> +		 * The corresponding write barrier is part of
> +		 * rcu_assign_pointer().
>  		 */
>  		if (!request || request == rcu_access_pointer(active->request))
>  			return rcu_pointer_handoff(request);
> -- 
> 2.8.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 2/3] drm/i915: Move common scratch allocation/destroy to intel_engine_cs.c
  2016-08-08  8:00     ` [PATCH 2/3] drm/i915: Move common scratch allocation/destroy to intel_engine_cs.c Chris Wilson
@ 2016-08-08  9:24       ` Matthew Auld
  0 siblings, 0 replies; 125+ messages in thread
From: Matthew Auld @ 2016-08-08  9:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

I take it that this patch belongs in the previous series where you
introduce the nullify helper?

Assuming that:
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation
  2016-08-07 14:45 ` [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation Chris Wilson
@ 2016-08-08  9:25   ` Daniel Vetter
  2016-08-08  9:56     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Daniel Vetter @ 2016-08-08  9:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Goel, Akash

On Sun, Aug 07, 2016 at 03:45:10PM +0100, Chris Wilson wrote:
> When using RCU lookup for the request, commit 0eafec6d3244 ("drm/i915:
> Enable lockless lookup of request tracking via RCU"), we acknowledge that
> we may race with another thread that could have reallocated the request.
> In order for the first thread not to blow up, the second thread must not
> clear the request completed before overwriting it. In the RCU lookup, we
> allow for the engine/seqno to be replaced but we do not allow for it to
> be zeroed.
> 
> The choice we make is to either add extra checking to the RCU lookup, or
> embrace the inherent races (as intended). It is more complicated as we
> need to manually clear everything we depend upon being zero initialised,
> but we benefit from not emitting the memset() to clear the entire
> frequently allocated structure (that memset turns up in throughput
> profiles). And at the same time, the lookup remains flexible for future
> adjustments.
> 
> v2: Old style LRC requires another variable to be initialized. (The
> danger inherent in not zeroing everything.)
> v3: request->batch also needs to be cleared
> 
> Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: "Goel, Akash" <akash.goel@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_request.c | 37 ++++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_gem_request.h | 11 ++++++++++
>  2 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 6a1661643d3d..b7ffde002a62 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -355,7 +355,35 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
>  	if (req && i915_gem_request_completed(req))
>  		i915_gem_request_retire(req);
>  
> -	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
> +	/* Beware: Dragons be flying overhead.
> +	 *
> +	 * We use RCU to look up requests in flight. The lookups may
> +	 * race with the request being allocated from the slab freelist.
> +	 * That is the request we are writing to here, may be in the process
> +	 * of being read by __i915_gem_active_get_request_rcu(). As such,
> +	 * we have to be very careful when overwriting the contents. During
> > +	 * the RCU lookup, we chase the request->engine pointer,
> +	 * read the request->fence.seqno and increment the reference count.
> +	 *
> +	 * The reference count is incremented atomically. If it is zero,
> +	 * the lookup knows the request is unallocated and complete. Otherwise,
> +	 * it is either still in use, or has been reallocated and reset
> +	 * with fence_init(). This increment is safe for release as we check
> > +	 * that the request we have a reference to matches the active
> +	 * request.
> +	 *
> +	 * Before we increment the refcount, we chase the request->engine
> +	 * pointer. We must not call kmem_cache_zalloc() or else we set
> +	 * that pointer to NULL and cause a crash during the lookup. If
> +	 * we see the request is completed (based on the value of the
> +	 * old engine and seqno), the lookup is complete and reports NULL.
> +	 * If we decide the request is not completed (new engine or seqno),
> +	 * then we grab a reference and double check that it is still the
> > +	 * active request - which it won't be - and restart the lookup.
> +	 *
> +	 * Do not use kmem_cache_zalloc() here!
> +	 */
> +	req = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);
>  	if (!req)
>  		return ERR_PTR(-ENOMEM);
>  
> @@ -375,6 +403,13 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
>  	req->engine = engine;
>  	req->ctx = i915_gem_context_get(ctx);

See my earlier review - if we go with this I think we should fully embrace
it and not clear anything where it's not needed. Otherwise we have a funny
mix of defensive clearing to NULL and needing to be careful.
  
> +	/* No zalloc, must clear what we need by hand */
> +	req->signaling.wait.tsk = NULL;

This shouldn't be non-NULL once the refcount has dropped to 0. Maybe a
WARN_ON instead?

> +	req->previous_context = NULL;

We unconditionally set this in advance_context (together with a bunch of
other ring state tracked in the request). Do we really need to reset this
here?

> +	req->file_priv = NULL;

This is already cleared in either request_retire or _release. Again maybe
just a WARN_ON?.

> +	req->batch_obj = NULL;

Agreed with this one, we might reuse the request for a non-execbuf
request. But I think we also need to reset ->pid here.

> +	req->elsp_submitted = 0;

Needed, but feels misplaced since it's lrc stuff. I think it'd be better
to stuff this into intel_logical_ring_alloc_request_extras.

Aside, while reviewing this I noticed that the /** comments in
i915_gem_request.h aren't really kerneldoc - the metadata is missing. Also
would be great to include all that into a new section in i915.rst.

I didn't spot anything else that could result in harm - but I probably
missed something somewhere ;-)

I'm happy with all the comments & other changes in this patch.
-Daniel
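
A userspace sketch of the lookup this allocation strategy has to survive (invented names; the kernel side is __i915_gem_active_get_rcu() using the fence refcount): a reference may only be taken while the count is non-zero, and is only trusted after re-checking that the slot still points at the same request:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* The request may be freed and reallocated at any point between the
 * two loads of the slot, so the reference is only valid once the
 * re-check succeeds.  This is why reallocation must not zero the
 * fields the lookup chases. */
struct model_request {
	atomic_uint refcount;	/* 0 means free/complete */
	unsigned int seqno;
};

static _Atomic(struct model_request *) active_slot;

/* atomic_inc_not_zero() equivalent. */
static int ref_if_nonzero(struct model_request *rq)
{
	unsigned int old = atomic_load(&rq->refcount);

	do {
		if (old == 0)
			return 0;	/* already released */
	} while (!atomic_compare_exchange_weak(&rq->refcount,
					       &old, old + 1));
	return 1;
}

static struct model_request *get_active(void)
{
	for (;;) {
		struct model_request *rq = atomic_load(&active_slot);

		if (!rq)
			return NULL;
		if (!ref_if_nonzero(rq))
			return NULL;	/* request completed */

		/* The request may have been reallocated between the two
		 * loads; only this re-check makes the reference valid. */
		if (rq == atomic_load(&active_slot))
			return rq;

		/* Lost the race against reallocation: drop and retry. */
		atomic_fetch_sub(&rq->refcount, 1);
	}
}
```

In the kernel the freelist reuse additionally relies on the slab giving back type-stable memory, which the model elides.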

> +
>  	/*
>  	 * Reserve space in the ring buffer for all the commands required to
>  	 * eventually emit this request. This is to guarantee that the
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index b2456dede3ad..721eb8cbce9b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -51,6 +51,13 @@ struct intel_signal_node {
>   * emission time to be associated with the request for tracking how far ahead
>   * of the GPU the submission is.
>   *
> + * When modifying this structure be very aware that we perform a lockless
> + * RCU lookup of it that may race against reallocation of the struct
> + * from the slab freelist. We intentionally do not zero the structure on
> + * allocation so that the lookup can use the dangling pointers (and is
> > + * cognisant that those pointers may be wrong). Instead, everything that
> + * needs to be initialised must be done so explicitly.
> + *
>   * The requests are reference counted.
>   */
>  struct drm_i915_gem_request {
> @@ -465,6 +472,10 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
>  	 * just report the active tracker is idle. If the new request is
>  	 * incomplete, then we acquire a reference on it and check that
>  	 * it remained the active request.
> +	 *
> +	 * It is then imperative that we do not zero the request on
> +	 * reallocation, so that we can chase the dangling pointers!
> +	 * See i915_gem_request_alloc().
>  	 */
>  	do {
>  		struct drm_i915_gem_request *request;
> -- 
> 2.8.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-08  9:12   ` Daniel Vetter
@ 2016-08-08  9:30     ` Chris Wilson
  2016-08-08  9:45       ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-08  9:30 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, intel-gfx

On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > In the debate as to whether the second read of active->request is
> > ordered after the dependent reads of the first read of active->request,
> > just give in and throw a smp_rmb() in there so that ordering of loads is
> > assured.
> > 
> > v2: Explain the manual smp_rmb()
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> r-b confirmed.

It's still fishy that we are implying an SMP effect where we need to
mandate the local processor order (that being the order of evaluation of
request = *active; engine = *request; *active). The two *active are
already ordered across SMP, so we are only concerned about this cpu. :|
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 04/33] drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh
  2016-08-07 14:45 ` [PATCH 04/33] drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh Chris Wilson
@ 2016-08-08  9:33   ` Daniel Vetter
  2016-08-12  9:56   ` Joonas Lahtinen
  1 sibling, 0 replies; 125+ messages in thread
From: Daniel Vetter @ 2016-08-08  9:33 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On Sun, Aug 07, 2016 at 03:45:12PM +0100, Chris Wilson wrote:
> The bottom-half we use for processing the breadcrumb interrupt is a
> task, which is an RCU protected struct. When accessing this struct, we
> need to be holding the RCU read lock to prevent it disappearing beneath
> us. We can use the RCU annotation to mark our irq_seqno_bh pointer as
> being under RCU guard and then use the RCU accessors to both provide
> correct ordering of access through the pointer.
> 
> Most notably, this fixes the access from hard irq context to use the RCU
> read lock, which both Daniel and Tvrtko complained about.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>

I'll leave sparse-checking this to 0day and runtime lockdep checking to CI
;-)
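
As a rough userspace model of the handoff being annotated (invented names; the kernel uses rcu_assign_pointer()/rcu_dereference() under rcu_read_lock(), and wake_up_process() instead of setting a flag), the publish side stores the bottom-half with release semantics and the wakeup side re-loads it inside the read-side critical section:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of the irq_seqno_bh handoff: release-store on publication,
 * acquire-load on consumption, with a cheap unlocked peek first. */
struct waiter {
	bool woken;
};

static _Atomic(struct waiter *) irq_seqno_bh;

/* Publish a new bottom-half (rcu_assign_pointer() in the kernel). */
static void set_bottom_half(struct waiter *w)
{
	atomic_store_explicit(&irq_seqno_bh, w, memory_order_release);
}

/* intel_engine_wakeup(): unlocked peek (rcu_access_pointer()) first,
 * then the real load inside what would be rcu_read_lock(). */
static bool engine_wakeup(void)
{
	bool awoken = false;
	struct waiter *w;

	if (!atomic_load_explicit(&irq_seqno_bh, memory_order_relaxed))
		return false;

	/* rcu_read_lock() would begin here. */
	w = atomic_load_explicit(&irq_seqno_bh, memory_order_acquire);
	if (w) {
		w->woken = true;	/* stands in for wake_up_process() */
		awoken = true;
	}
	/* rcu_read_unlock() would end here. */

	return awoken;
}
```

What the model cannot show is the point of the patch: it is the read-side critical section, not the loads themselves, that keeps the task struct from being freed between the dereference and the wakeup.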

> ---
>  drivers/gpu/drm/i915/i915_drv.h          |  2 +-
>  drivers/gpu/drm/i915/intel_breadcrumbs.c | 22 +++++++++-------------
>  drivers/gpu/drm/i915/intel_ringbuffer.c  |  2 --
>  drivers/gpu/drm/i915/intel_ringbuffer.h  | 21 ++++++++++++++-------
>  4 files changed, 24 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index feec00f769e1..3d546b5c2e4c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3848,7 +3848,7 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
>  	 * is woken.
>  	 */
>  	if (engine->irq_seqno_barrier &&
> -	    READ_ONCE(engine->breadcrumbs.irq_seqno_bh) == current &&
> +	    rcu_access_pointer(engine->breadcrumbs.irq_seqno_bh) == current &&
>  	    cmpxchg_relaxed(&engine->breadcrumbs.irq_posted, 1, 0)) {
>  		struct task_struct *tsk;
>  
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 8ecb3b6538fc..7552bd039565 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -60,10 +60,8 @@ static void intel_breadcrumbs_fake_irq(unsigned long data)
>  	 * every jiffie in order to kick the oldest waiter to do the
>  	 * coherent seqno check.
>  	 */
> -	rcu_read_lock();
>  	if (intel_engine_wakeup(engine))
>  		mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
> -	rcu_read_unlock();
>  }
>  
>  static void irq_enable(struct intel_engine_cs *engine)
> @@ -232,7 +230,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
>  	}
>  	rb_link_node(&wait->node, parent, p);
>  	rb_insert_color(&wait->node, &b->waiters);
> -	GEM_BUG_ON(!first && !b->irq_seqno_bh);
> +	GEM_BUG_ON(!first && !rcu_access_pointer(b->irq_seqno_bh));

Nit: reading through rcu docs I think the suggested accessor here is
rcu_dereference_protected for write-side access. That one allows the
compiler full freedom for reordering. OTOH it's a bit more noisy, and I'm
meh about the optional debug checks anyway. So with or without that change:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

>  
>  	if (completed) {
>  		struct rb_node *next = rb_next(completed);
> @@ -242,7 +240,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
>  			GEM_BUG_ON(first);
>  			b->timeout = wait_timeout();
>  			b->first_wait = to_wait(next);
> -			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
> +			rcu_assign_pointer(b->irq_seqno_bh, b->first_wait->tsk);
>  			/* As there is a delay between reading the current
>  			 * seqno, processing the completed tasks and selecting
>  			 * the next waiter, we may have missed the interrupt
> @@ -269,7 +267,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
>  		GEM_BUG_ON(rb_first(&b->waiters) != &wait->node);
>  		b->timeout = wait_timeout();
>  		b->first_wait = wait;
> -		smp_store_mb(b->irq_seqno_bh, wait->tsk);
> +		rcu_assign_pointer(b->irq_seqno_bh, wait->tsk);
>  		/* After assigning ourselves as the new bottom-half, we must
>  		 * perform a cursory check to prevent a missed interrupt.
>  		 * Either we miss the interrupt whilst programming the hardware,
> @@ -280,7 +278,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
>  		 */
>  		__intel_breadcrumbs_enable_irq(b);
>  	}
> -	GEM_BUG_ON(!b->irq_seqno_bh);
> +	GEM_BUG_ON(!rcu_access_pointer(b->irq_seqno_bh));
>  	GEM_BUG_ON(!b->first_wait);
>  	GEM_BUG_ON(rb_first(&b->waiters) != &b->first_wait->node);
>  
> @@ -335,7 +333,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
>  		const int priority = wakeup_priority(b, wait->tsk);
>  		struct rb_node *next;
>  
> -		GEM_BUG_ON(b->irq_seqno_bh != wait->tsk);
> +		GEM_BUG_ON(rcu_access_pointer(b->irq_seqno_bh) != wait->tsk);
>  
>  		/* We are the current bottom-half. Find the next candidate,
>  		 * the first waiter in the queue on the remaining oldest
> @@ -379,13 +377,13 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
>  			 */
>  			b->timeout = wait_timeout();
>  			b->first_wait = to_wait(next);
> -			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
> +			rcu_assign_pointer(b->irq_seqno_bh, b->first_wait->tsk);
>  			if (b->first_wait->seqno != wait->seqno)
>  				__intel_breadcrumbs_enable_irq(b);
> -			wake_up_process(b->irq_seqno_bh);
> +			wake_up_process(b->first_wait->tsk);
>  		} else {
>  			b->first_wait = NULL;
> -			WRITE_ONCE(b->irq_seqno_bh, NULL);
> +			rcu_assign_pointer(b->irq_seqno_bh, NULL);
>  			__intel_breadcrumbs_disable_irq(b);
>  		}
>  	} else {
> @@ -399,7 +397,7 @@ out_unlock:
>  	GEM_BUG_ON(b->first_wait == wait);
>  	GEM_BUG_ON(rb_first(&b->waiters) !=
>  		   (b->first_wait ? &b->first_wait->node : NULL));
> -	GEM_BUG_ON(!b->irq_seqno_bh ^ RB_EMPTY_ROOT(&b->waiters));
> +	GEM_BUG_ON(!rcu_access_pointer(b->irq_seqno_bh) ^ RB_EMPTY_ROOT(&b->waiters));
>  	spin_unlock(&b->lock);
>  }
>  
> @@ -596,11 +594,9 @@ unsigned int intel_kick_waiters(struct drm_i915_private *i915)
>  	 * RCU lock, i.e. as we call wake_up_process() we must be holding the
>  	 * rcu_read_lock().
>  	 */
> -	rcu_read_lock();
>  	for_each_engine(engine, i915)
>  		if (unlikely(intel_engine_wakeup(engine)))
>  			mask |= intel_engine_flag(engine);
> -	rcu_read_unlock();
>  
>  	return mask;
>  }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index e08a1e1b04e4..16b726fe33eb 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2410,9 +2410,7 @@ void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno)
>  	/* After manually advancing the seqno, fake the interrupt in case
>  	 * there are any waiters for that seqno.
>  	 */
> -	rcu_read_lock();
>  	intel_engine_wakeup(engine);
> -	rcu_read_unlock();
>  }
>  
>  static void gen6_bsd_submit_request(struct drm_i915_gem_request *request)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 4aed4586b0b6..66dc93469076 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -171,7 +171,7 @@ struct intel_engine_cs {
>  	 * the overhead of waking that client is much preferred.
>  	 */
>  	struct intel_breadcrumbs {
> -		struct task_struct *irq_seqno_bh; /* bh for user interrupts */
> +		struct task_struct __rcu *irq_seqno_bh; /* bh for interrupts */
>  		bool irq_posted;
>  
>  		spinlock_t lock; /* protects the lists of requests */
> @@ -541,23 +541,30 @@ void intel_engine_enable_signaling(struct drm_i915_gem_request *request);
>  
>  static inline bool intel_engine_has_waiter(struct intel_engine_cs *engine)
>  {
> -	return READ_ONCE(engine->breadcrumbs.irq_seqno_bh);
> +	return rcu_access_pointer(engine->breadcrumbs.irq_seqno_bh);
>  }
>  
>  static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
>  {
>  	bool wakeup = false;
> -	struct task_struct *tsk = READ_ONCE(engine->breadcrumbs.irq_seqno_bh);
> +
>  	/* Note that for this not to dangerously chase a dangling pointer,
> -	 * the caller is responsible for ensure that the task remain valid for
> -	 * wake_up_process() i.e. that the RCU grace period cannot expire.
> +	 * we must hold the rcu_read_lock here.
>  	 *
>  	 * Also note that tsk is likely to be in !TASK_RUNNING state so an
>  	 * early test for tsk->state != TASK_RUNNING before wake_up_process()
>  	 * is unlikely to be beneficial.
>  	 */
> -	if (tsk)
> -		wakeup = wake_up_process(tsk);
> +	if (rcu_access_pointer(engine->breadcrumbs.irq_seqno_bh)) {
> +		struct task_struct *tsk;
> +
> +		rcu_read_lock();
> +		tsk = rcu_dereference(engine->breadcrumbs.irq_seqno_bh);
> +		if (tsk)
> +			wakeup = wake_up_process(tsk);
> +		rcu_read_unlock();
> +	}
> +
>  	return wakeup;
>  }
>  
> -- 
> 2.8.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 3/3] drm/i915: Move common seqno reset to intel_engine_cs.c
  2016-08-08  8:00     ` [PATCH 3/3] drm/i915: Move common seqno reset " Chris Wilson
@ 2016-08-08  9:40       ` Matthew Auld
  2016-08-08 10:15         ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Matthew Auld @ 2016-08-08  9:40 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

In which header does the prototype for intel_engine_init_seqno now exist?

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-08  9:30     ` Chris Wilson
@ 2016-08-08  9:45       ` Chris Wilson
  2016-08-09  6:36         ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-08  9:45 UTC (permalink / raw)
  To: Daniel Vetter, intel-gfx, joonas.lahtinen, Daniel Vetter

On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > In the debate as to whether the second read of active->request is
> > > ordered after the dependent reads of the first read of active->request,
> > > just give in and throw a smp_rmb() in there so that ordering of loads is
> > > assured.
> > > 
> > > v2: Explain the manual smp_rmb()
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > 
> > r-b confirmed.
> 
> It's still fishy that we are implying an SMP effect where we need to
> mandate the local processor order (that being the order of evaluation of
> request = *active; engine = *request; *active). The two *active are
> already ordered across SMP, so we are only concerned about this cpu. :|

More second thoughts. rcu_assign_pointer(NULL) is not visible to
rcu_access_pointer on another CPU without the smp_rmb. I think I am
overestimating the barriers in place for RCU, and they are weaker than
what I imagined for good reason.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 125+ messages in thread

* ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev4)
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (33 preceding siblings ...)
  2016-08-07 15:16 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance Patchwork
@ 2016-08-08  9:46 ` Patchwork
  2016-08-08 10:34 ` ✗ Fi.CI.BAT: " Patchwork
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 125+ messages in thread
From: Patchwork @ 2016-08-08  9:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev4)
URL   : https://patchwork.freedesktop.org/series/10770/
State : failure

== Summary ==

Applying: drm/i915: Add smp_rmb() to busy ioctl's RCU dance
Applying: drm/i915: Do not overwrite the request with zero on reallocation
Applying: drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs
Applying: drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh
Applying: drm/i915: Reduce amount of duplicate buffer information captured on error
Applying: drm/i915: Stop the machine whilst capturing the GPU crash dump
Applying: drm/i915: Store the active context object on all engines upon error
Applying: drm/i915: Move setting of request->batch into its single callsite
Applying: drm/i915: Mark unmappable GGTT entries as PIN_HIGH
Applying: drm/i915: Remove inactive/active list from debugfs
Applying: drm/i915: Focus debugfs/i915_gem_pinned to show only display pins
Applying: drm/i915: Reduce i915_gem_objects to only show object information
Applying: drm/i915: Remove redundant WARN_ON from __i915_add_request()
Applying: drm/i915: Create a VMA for an object
Applying: drm/i915: Track pinned vma inside guc
Applying: drm/i915: Convert fence computations to use vma directly
Applying: drm/i915: Use VMA directly for checking tiling parameters
Applying: drm/i915: Use VMA as the primary object for context state
Applying: drm/i915: Only clflush the context object when binding
Applying: drm/i915: Use VMA for ringbuffer tracking
Applying: drm/i915: Move common seqno reset to intel_engine_cs.c
Applying: drm/i915/overlay: Use VMA as the primary tracker for images
Applying: drm/i915: Use VMA as the primary tracker for semaphore page
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_drv.h
M	drivers/gpu/drm/i915/i915_gpu_error.c
M	drivers/gpu/drm/i915/intel_ringbuffer.c
M	drivers/gpu/drm/i915/intel_ringbuffer.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/intel_ringbuffer.h
Auto-merging drivers/gpu/drm/i915/intel_ringbuffer.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/intel_ringbuffer.c
Auto-merging drivers/gpu/drm/i915/i915_gpu_error.c
Auto-merging drivers/gpu/drm/i915/i915_drv.h
error: Failed to merge in the changes.
Patch failed at 0023 drm/i915: Use VMA as the primary tracker for semaphore page
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation
  2016-08-08  9:25   ` Daniel Vetter
@ 2016-08-08  9:56     ` Chris Wilson
  2016-08-09  6:32       ` Daniel Vetter
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-08  9:56 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, intel-gfx, Goel, Akash

On Mon, Aug 08, 2016 at 11:25:56AM +0200, Daniel Vetter wrote:
> On Sun, Aug 07, 2016 at 03:45:10PM +0100, Chris Wilson wrote:
> > When using RCU lookup for the request, commit 0eafec6d3244 ("drm/i915:
> > Enable lockless lookup of request tracking via RCU"), we acknowledge that
> > we may race with another thread that could have reallocated the request.
> > In order for the first thread not to blow up, the second thread must not
> > clear the request completed before overwriting it. In the RCU lookup, we
> > allow for the engine/seqno to be replaced but we do not allow for it to
> > be zeroed.
> > 
> > The choice we make is to either add extra checking to the RCU lookup, or
> > embrace the inherent races (as intended). It is more complicated as we
> > need to manually clear everything we depend upon being zero initialised,
> > but we benefit from not emitting the memset() to clear the entire
> > frequently allocated structure (that memset turns up in throughput
> > profiles). And at the same time, the lookup remains flexible for future
> > adjustments.
> > 
> > v2: Old style LRC requires another variable to be initialized. (The
> > danger inherent in not zeroing everything.)
> > v3: request->batch also needs to be cleared
> > 
> > Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: "Goel, Akash" <akash.goel@intel.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_request.c | 37 ++++++++++++++++++++++++++++++++-
> >  drivers/gpu/drm/i915/i915_gem_request.h | 11 ++++++++++
> >  2 files changed, 47 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> > index 6a1661643d3d..b7ffde002a62 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_request.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> > @@ -355,7 +355,35 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> >  	if (req && i915_gem_request_completed(req))
> >  		i915_gem_request_retire(req);
> >  
> > -	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
> > +	/* Beware: Dragons be flying overhead.
> > +	 *
> > +	 * We use RCU to look up requests in flight. The lookups may
> > +	 * race with the request being allocated from the slab freelist.
> > +	 * That is the request we are writing to here, may be in the process
> > +	 * of being read by __i915_gem_active_get_request_rcu(). As such,
> > +	 * we have to be very careful when overwriting the contents. During
> > > +	 * the RCU lookup, we chase the request->engine pointer,
> > +	 * read the request->fence.seqno and increment the reference count.
> > +	 *
> > +	 * The reference count is incremented atomically. If it is zero,
> > +	 * the lookup knows the request is unallocated and complete. Otherwise,
> > +	 * it is either still in use, or has been reallocated and reset
> > +	 * with fence_init(). This increment is safe for release as we check
> > > +	 * that the request we have a reference to matches the active
> > +	 * request.
> > +	 *
> > +	 * Before we increment the refcount, we chase the request->engine
> > +	 * pointer. We must not call kmem_cache_zalloc() or else we set
> > +	 * that pointer to NULL and cause a crash during the lookup. If
> > +	 * we see the request is completed (based on the value of the
> > +	 * old engine and seqno), the lookup is complete and reports NULL.
> > +	 * If we decide the request is not completed (new engine or seqno),
> > +	 * then we grab a reference and double check that it is still the
> > +	 * active request - which it won't be, and restart the lookup.
> > +	 *
> > +	 * Do not use kmem_cache_zalloc() here!
> > +	 */
> > +	req = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);
> >  	if (!req)
> >  		return ERR_PTR(-ENOMEM);
> >  
> > @@ -375,6 +403,13 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> >  	req->engine = engine;
> >  	req->ctx = i915_gem_context_get(ctx);
> 
> See my earlier review - if we go with this I think we should fully embrace
> it and not clear anything where it's not needed. Otherwise we have a funny
> mix of defensive clearing to NULL and needing to be careful.
>   
> > +	/* No zalloc, must clear what we need by hand */
> > +	req->signaling.wait.tsk = NULL;
> 
> This shouldn't be non-NULL once the refcount has dropped to 0. Maybe a
> WARN_ON instead?

This is just from older code where we had the if (wait.tsk != NULL)
skip.

> > +	req->previous_context = NULL;
> 
> We unconditionally set this in advance_context (together with a bunch of
> other ring state tracked in the request). Do we really need to reset this
> here?

Previous_context may be used unset (along a failure path), so requires
initialising.

> > +	req->file_priv = NULL;
> 
> This is already cleared in either request_retire or _release. Again maybe
> just a WARN_ON?.

But we never clear it first, so it may be poisoned.

> > +	req->batch_obj = NULL;
> 
> Agreed with this one, we might reuse the request for a non-execbuf
> request. But I think we also need to reset ->pid here.

What pid? Gah. (Don't have pid here in my tree...)

> > +	req->elsp_submitted = 0;
> 
> Needed, but feels misplaced since it's lrc stuff. I think it'd be better
> to stuff this into intel_logical_ring_alloc_request_extras.

No need for that extra complexity, it is to be removed.
-Chris
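For reference, the reuse race and double-check lookup described in the comment above can be distilled into plain C. This is an illustrative userspace sketch, not the i915 API: all the names (struct sketch_req, sketch_active_lookup, ...) are invented, and the refcount stands in for the atomic kref_get_unless_zero() in the real code.

```c
#include <stddef.h>

/* Userspace distillation of the reuse race: the slot is "type-stable",
 * so it may be recycled for a new request at any time, but it always
 * remains a valid struct sketch_req. */
struct sketch_req {
	unsigned int engine;	/* stands in for request->engine */
	unsigned int seqno;	/* stands in for request->fence.seqno */
	unsigned int refcount;	/* 0 means free/complete */
};

/* The active slot a concurrent allocator may swap under the reader. */
static struct sketch_req *sketch_active;

static int sketch_completed(const struct sketch_req *r, unsigned int retired)
{
	/* Complete once the request's seqno has been retired. */
	return (int)(retired - r->seqno) >= 0;
}

/* Lookup in the spirit of __i915_gem_active_get_request_rcu(): snapshot
 * the request, decide completion from the snapshot, take a reference,
 * then double check that the slot still holds the same request. */
static struct sketch_req *sketch_active_lookup(unsigned int retired)
{
	for (;;) {
		struct sketch_req *r = sketch_active;

		if (!r || sketch_completed(r, retired))
			return NULL;
		r->refcount++;	/* kref_get_unless_zero() in the real code */
		if (sketch_active == r)
			return r;	/* still the active request */
		r->refcount--;	/* reallocated under us: drop and retry */
	}
}

/* Single-threaded walk through the interesting states. */
static int sketch_lookup_demo(void)
{
	static struct sketch_req r = { .engine = 1, .seqno = 5 };

	sketch_active = &r;
	if (sketch_active_lookup(10) != NULL)	/* seqno 5 already retired */
		return 0;
	if (sketch_active_lookup(2) != &r)	/* still busy: ref taken */
		return 0;
	if (r.refcount != 1)
		return 0;
	sketch_active = NULL;
	if (sketch_active_lookup(2) != NULL)	/* no active request */
		return 0;
	return 1;
}
```

This is also why the slab must not be zeroed on reallocation: a concurrent lookup chasing the old pointer must keep seeing a valid (if stale) engine and seqno rather than NULL.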

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 3/3] drm/i915: Move common seqno reset to intel_engine_cs.c
  2016-08-08  9:40       ` Matthew Auld
@ 2016-08-08 10:15         ` Chris Wilson
  2016-08-08 15:34           ` Matthew Auld
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-08 10:15 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

On Mon, Aug 08, 2016 at 10:40:35AM +0100, Matthew Auld wrote:
> In which header does the prototype for intel_engine_init_seqno now exist?

Same as before, intel_ringbuffer.h. We haven't yet decided upon
splitting that between engine submission and actual ring operations.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* ✗ Fi.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev4)
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (34 preceding siblings ...)
  2016-08-08  9:46 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev4) Patchwork
@ 2016-08-08 10:34 ` Patchwork
  2016-08-09 14:10 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev5) Patchwork
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 125+ messages in thread
From: Patchwork @ 2016-08-08 10:34 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev4)
URL   : https://patchwork.freedesktop.org/series/10770/
State : failure

== Summary ==

  CC      drivers/usb/storage/usb.o
  LD [M]  drivers/net/ethernet/intel/e1000/e1000.o
  CC      drivers/usb/storage/initializers.o
  CC      drivers/net/phy/mdio_device.o
  CC      drivers/net/phy/swphy.o
  CC [M]  drivers/net/usb/mcs7830.o
  CC [M]  drivers/net/usb/usbnet.o
  CC [M]  drivers/net/phy/lxt.o
  CC [M]  drivers/net/usb/cdc_ncm.o
  CC      drivers/usb/storage/sierra_ms.o
  CC [M]  drivers/net/phy/smsc.o
  CC      drivers/usb/storage/option_ms.o
  CC [M]  drivers/net/phy/bcm-phy-lib.o
  CC      drivers/usb/storage/usual-tables.o
  CC [M]  drivers/net/phy/broadcom.o
  CC [M]  drivers/net/phy/bcm7xxx.o
  CC [M]  drivers/net/phy/bcm87xx.o
  CC [M]  drivers/net/phy/realtek.o
  CC [M]  drivers/net/phy/fixed_phy.o
  LD      drivers/net/phy/libphy.o
  LD      drivers/net/phy/built-in.o
  LD [M]  drivers/net/ethernet/intel/igb/igb.o
  LD      drivers/usb/storage/usb-storage.o
  LD      drivers/usb/storage/built-in.o
  LD      drivers/usb/built-in.o
  LD [M]  drivers/net/ethernet/intel/e1000e/e1000e.o
  LD      drivers/net/ethernet/built-in.o
  LD      drivers/net/built-in.o
Makefile:975: recipe for target 'drivers' failed
make: *** [drivers] Error 2

Full logs at /archive/deploy/logs/CI_Patchwork_build_2251


* Re: [PATCH 30/33] drm/i915: Record the RING_MODE register for post-mortem debugging
  2016-08-07 14:45 ` [PATCH 30/33] drm/i915: Record the RING_MODE register for post-mortem debugging Chris Wilson
@ 2016-08-08 11:35   ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-08 11:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> Just another useful register to inspect following a GPU hang.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_drv.h       | 1 +
>  drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e7357656728e..b5abf88e908e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -534,6 +534,7 @@ struct drm_i915_error_state {
>  		u32 tail;
>  		u32 head;
>  		u32 ctl;
> +		u32 mode;
>  		u32 hws;
>  		u32 ipeir;
>  		u32 ipehr;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 5d8fd0beda2e..c48277fbe6a7 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -237,6 +237,8 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
>  	err_printf(m, "  HEAD:  0x%08x\n", ee->head);
>  	err_printf(m, "  TAIL:  0x%08x\n", ee->tail);
>  	err_printf(m, "  CTL:   0x%08x\n", ee->ctl);
> +	err_printf(m, "  MODE:  0x%08x [idle? %d]\n",
> +		   ee->mode, !!(ee->mode & MODE_IDLE));

yesno()?
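(For context, the suggestion is to use i915's yesno()-style string helper instead of the "%d" of a masked bit; the definition below is an illustrative sketch of that shape, not necessarily the kernel's exact one.)

```c
#include <string.h>

/* Sketch of a yesno()-style helper: turn a truth value into a
 * human-readable string for error-state dumps. */
static inline const char *yesno(int v)
{
	return v ? "yes" : "no";
}
```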

>  	err_printf(m, "  HWS:   0x%08x\n", ee->hws);
>  	err_printf(m, "  ACTHD: 0x%08x %08x\n",
>  		   (u32)(ee->acthd>>32), (u32)ee->acthd);
> @@ -979,6 +981,8 @@ static void error_record_engine_registers(struct drm_i915_error_state *error,
>  	ee->head = I915_READ_HEAD(engine);
>  	ee->tail = I915_READ_TAIL(engine);
>  	ee->ctl = I915_READ_CTL(engine);
> +	if (INTEL_GEN(dev_priv) > 2)
> +		ee->mode = I915_READ_MODE(engine);

IS_GEN2 is used elsewhere in the code with I915_READ_MODE; one site or
the other should be fixed to match.

Regards, Joonas

>  
>  	if (I915_NEED_GFX_HWS(dev_priv)) {
>  		i915_reg_t mmio;
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 3/3] drm/i915: Move common seqno reset to intel_engine_cs.c
  2016-08-08 10:15         ` Chris Wilson
@ 2016-08-08 15:34           ` Matthew Auld
  0 siblings, 0 replies; 125+ messages in thread
From: Matthew Auld @ 2016-08-08 15:34 UTC (permalink / raw)
  To: Chris Wilson, Intel Graphics Development

On 8 August 2016 at 11:15, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Mon, Aug 08, 2016 at 10:40:35AM +0100, Matthew Auld wrote:
>> In which header does the prototype for intel_engine_init_seqno now exist?
>
> Same as before, intel_ringbuffer.h. We haven't yet decided upon
> splitting that between engine submission and actual ring operations.
Ok.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

* Re: [PATCH 17/33] drm/i915: Use VMA directly for checking tiling parameters
  2016-08-07 14:45 ` [PATCH 17/33] drm/i915: Use VMA directly for checking tiling parameters Chris Wilson
@ 2016-08-09  6:18   ` Joonas Lahtinen
  2016-08-09  8:03     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-09  6:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_tiling.c | 47 ++++++++++++++++++++--------------
>  1 file changed, 28 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_tiling.c b/drivers/gpu/drm/i915/i915_gem_tiling.c
> index f4b984de83b5..2ceaddc959d3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_tiling.c
> +++ b/drivers/gpu/drm/i915/i915_gem_tiling.c
> @@ -117,34 +117,45 @@ i915_tiling_ok(struct drm_device *dev, int stride, int size, int tiling_mode)
>  }
>  
>  /* Is the current GTT allocation valid for the change in tiling? */
> -static bool
> +static int
>  i915_gem_object_fence_ok(struct drm_i915_gem_object *obj, int tiling_mode)
>  {
>  	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> +	struct i915_vma *vma;
>  	u32 size;
>  
>  	if (tiling_mode == I915_TILING_NONE)
> -		return true;
> +		return 0;
>  
>  	if (INTEL_GEN(dev_priv) >= 4)
> -		return true;
> +		return 0;
> +
> +	vma = i915_gem_obj_to_ggtt(obj);
> +	if (!vma)
> +		return 0;
> +
> +	if (!obj->map_and_fenceable)
> +		return 0;
>  
>  	if (IS_GEN3(dev_priv)) {
> -		if (i915_gem_obj_ggtt_offset(obj) & ~I915_FENCE_START_MASK)
> -			return false;
> +		if (vma->node.start & ~I915_FENCE_START_MASK)
> +			goto bad;
>  	} else {
> -		if (i915_gem_obj_ggtt_offset(obj) & ~I830_FENCE_START_MASK)
> -			return false;
> +		if (vma->node.start & ~I830_FENCE_START_MASK)
> +			goto bad;
>  	}
>  
>  	size = i915_gem_get_ggtt_size(dev_priv, obj->base.size, tiling_mode);
> -	if (i915_gem_obj_ggtt_size(obj) != size)
> -		return false;
> +	if (vma->node.size < size)
> +		goto bad;
>  
> -	if (i915_gem_obj_ggtt_offset(obj) & (size - 1))
> -		return false;
> +	if (vma->node.start & (size - 1))
> +		goto bad;
>  
> -	return true;
> +	return 0;
> +
> +bad:
> +	return i915_vma_unbind(vma);

Umm, I do not see matching checks to convert to return the error values
in i915_vma_unbind(). Am I lost or what were you after with this?

Regards, Joonas

>  }
>  
>  /**
> @@ -168,7 +179,7 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
>  	struct drm_i915_gem_set_tiling *args = data;
>  	struct drm_i915_private *dev_priv = to_i915(dev);
>  	struct drm_i915_gem_object *obj;
> -	int ret = 0;
> +	int err = 0;
>  
>  	/* Make sure we don't cross-contaminate obj->tiling_and_stride */
>  	BUILD_BUG_ON(I915_TILING_LAST & STRIDE_MASK);
> @@ -187,7 +198,7 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
>  
>  	mutex_lock(&dev->struct_mutex);
>  	if (obj->pin_display || obj->framebuffer_references) {
> -		ret = -EBUSY;
> +		err = -EBUSY;
>  		goto err;
>  	}
>  
> @@ -234,11 +245,9 @@ i915_gem_set_tiling(struct drm_device *dev, void *data,
>  		 * has to also include the unfenced register the GPU uses
>  		 * whilst executing a fenced command for an untiled object.
>  		 */
> -		if (obj->map_and_fenceable &&
> -		    !i915_gem_object_fence_ok(obj, args->tiling_mode))
> -			ret = i915_vma_unbind(i915_gem_obj_to_ggtt(obj));
>  
> -		if (ret == 0) {
> +		err = i915_gem_object_fence_ok(obj, args->tiling_mode);
> +		if (!err) {
>  			if (obj->pages &&
>  			    obj->madv == I915_MADV_WILLNEED &&
>  			    dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
> @@ -281,7 +290,7 @@ err:
>  
>  	intel_runtime_pm_put(dev_priv);
>  
> -	return ret;
> +	return err;
>  }
>  
>  /**
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation
  2016-08-08  9:56     ` Chris Wilson
@ 2016-08-09  6:32       ` Daniel Vetter
  0 siblings, 0 replies; 125+ messages in thread
From: Daniel Vetter @ 2016-08-09  6:32 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, joonas.lahtinen, Goel,
	Akash, Daniel Vetter

On Mon, Aug 08, 2016 at 10:56:56AM +0100, Chris Wilson wrote:
> On Mon, Aug 08, 2016 at 11:25:56AM +0200, Daniel Vetter wrote:
> > On Sun, Aug 07, 2016 at 03:45:10PM +0100, Chris Wilson wrote:
> > > When using RCU lookup for the request, commit 0eafec6d3244 ("drm/i915:
> > > Enable lockless lookup of request tracking via RCU"), we acknowledge that
> > > we may race with another thread that could have reallocated the request.
> > > In order for the first thread not to blow up, the second thread must not
> > > clear the request completed before overwriting it. In the RCU lookup, we
> > > allow for the engine/seqno to be replaced but we do not allow for it to
> > > be zeroed.
> > > 
> > > The choice we make is to either add extra checking to the RCU lookup, or
> > > embrace the inherent races (as intended). It is more complicated as we
> > > need to manually clear everything we depend upon being zero initialised,
> > > but we benefit from not emitting the memset() to clear the entire
> > > frequently allocated structure (that memset turns up in throughput
> > > profiles). And at the same time, the lookup remains flexible for future
> > > adjustments.
> > > 
> > > v2: Old style LRC requires another variable to be initialize. (The
> > > danger inherent in not zeroing everything.)
> > > v3: request->batch also needs to be cleared
> > > 
> > > Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...")
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: "Goel, Akash" <akash.goel@intel.com>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem_request.c | 37 ++++++++++++++++++++++++++++++++-
> > >  drivers/gpu/drm/i915/i915_gem_request.h | 11 ++++++++++
> > >  2 files changed, 47 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> > > index 6a1661643d3d..b7ffde002a62 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_request.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> > > @@ -355,7 +355,35 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> > >  	if (req && i915_gem_request_completed(req))
> > >  		i915_gem_request_retire(req);
> > >  
> > > -	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
> > > +	/* Beware: Dragons be flying overhead.
> > > +	 *
> > > +	 * We use RCU to look up requests in flight. The lookups may
> > > +	 * race with the request being allocated from the slab freelist.
> > > +	 * That is, the request we are writing to here may be in the process
> > > +	 * of being read by __i915_gem_active_get_request_rcu(). As such,
> > > +	 * we have to be very careful when overwriting the contents. During
> > > +	 * the RCU lookup, we chase the request->engine pointer,
> > > +	 * read the request->fence.seqno and increment the reference count.
> > > +	 *
> > > +	 * The reference count is incremented atomically. If it is zero,
> > > +	 * the lookup knows the request is unallocated and complete. Otherwise,
> > > +	 * it is either still in use, or has been reallocated and reset
> > > +	 * with fence_init(). This increment is safe for release as we check
> > > +	 * that the request we have a reference to matches the active
> > > +	 * request.
> > > +	 *
> > > +	 * Before we increment the refcount, we chase the request->engine
> > > +	 * pointer. We must not call kmem_cache_zalloc() or else we set
> > > +	 * that pointer to NULL and cause a crash during the lookup. If
> > > +	 * we see the request is completed (based on the value of the
> > > +	 * old engine and seqno), the lookup is complete and reports NULL.
> > > +	 * If we decide the request is not completed (new engine or seqno),
> > > +	 * then we grab a reference and double check that it is still the
> > > +	 * active request - which it won't be, and restart the lookup.
> > > +	 *
> > > +	 * Do not use kmem_cache_zalloc() here!
> > > +	 */
> > > +	req = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);
> > >  	if (!req)
> > >  		return ERR_PTR(-ENOMEM);
> > >  
> > > @@ -375,6 +403,13 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
> > >  	req->engine = engine;
> > >  	req->ctx = i915_gem_context_get(ctx);
> > 
> > See my earlier review - if we go with this I think we should fully embrace
> > it and not clear anything where it's not needed. Otherwise we have a funny
> > mix of defensive clearing to NULL and needing to be careful.
> >   
> > > +	/* No zalloc, must clear what we need by hand */
> > > +	req->signaling.wait.tsk = NULL;
> > 
> > This shouldn't be non-NULL once the refcount has dropped to 0. Maybe a
> > WARN_ON instead?
> 
> This is just from older code where we had the if (wait.tsk != NULL)
> skip.
> 
> > > +	req->previous_context = NULL;
> > 
> > We unconditionally set this in advance_context (together with a bunch of
> > other ring state tracked in the request). Do we really need to reset this
> > here?
> 
> Previous_context may be used unset (along a failure path), so requires
> initialising.
> 
> > > +	req->file_priv = NULL;
> > 
> > This is already cleared in either request_retire or _release. Again maybe
> > just a WARN_ON?.
> 
> But we never clear it first, so it may be poisoned.

Argh right, I had forgotten that the struct is not just left un-memset on
realloc, but never memset in general. Then we indeed need to clear all of these.

> 
> > > +	req->batch_obj = NULL;
> > 
> > Agreed with this one, we might reuse the request for a non-execbuf
> > request. But I think we also need to reset ->pid here.
> 
> What pid? Gah. (Don't have pid here in my tree...)

Otoh we shouldn't be printing pid really when file_priv == NULL, so no
need to clear.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> 
> > > +	req->elsp_submitted = 0;
> > 
> > Needed, but feels misplaced since it's lrc stuff. I think it'd be better
> > to stuff this into intel_logical_ring_alloc_request_extras.
> 
> No need for that extra complexity, it is to be removed.
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-08  9:45       ` Chris Wilson
@ 2016-08-09  6:36         ` Joonas Lahtinen
  2016-08-09  7:14           ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-09  6:36 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, intel-gfx, Daniel Vetter

On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > 
> > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > 
> > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > 
> > > > In the debate as to whether the second read of active->request is
> > > > ordered after the dependent reads of the first read of active->request,
> > > > just give in and throw a smp_rmb() in there so that ordering of loads is
> > > > assured.
> > > > 
> > > > v2: Explain the manual smp_rmb()
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > r-b confirmed.
> > It's still fishy that we are implying an SMP effect where we need to
> > mandate the local processor order (that being the order evaluation of
> > request = *active; engine = *request; *active). The two *active are
> > already ordered across SMP, so we are only concerned about this cpu. :|
> More second thoughts. rcu_assign_pointer(NULL) is not visible to
> rcu_access_pointer on another CPU without the smp_rmb. 

Should not a RCU read side lock be involved?

Is it not kind of the point that rcu_assign_pointer() will only be
visible everywhere when all previous read side critical sections have
ended after calling rcu_synchronize()? And will be valid during
rcu_read_lock().

If we do not use read-side critical sections, how do we expect RCU to
provide the synchronization?

Regards, Joonas

> I think I am
> overestimating the barriers in place for RCU, and they are weaker than
> what I imagined for good reason.
> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-09  6:36         ` Joonas Lahtinen
@ 2016-08-09  7:14           ` Chris Wilson
  2016-08-09  8:48             ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-09  7:14 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: Daniel Vetter, intel-gfx

On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > 
> > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > 
> > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > 
> > > > > In the debate as to whether the second read of active->request is
> > > > > ordered after the dependent reads of the first read of active->request,
> > > > > just give in and throw a smp_rmb() in there so that ordering of loads is
> > > > > assured.
> > > > > 
> > > > > v2: Explain the manual smp_rmb()
> > > > > 
> > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > r-b confirmed.
> > > It's still fishy that we are implying an SMP effect where we need to
> > > mandate the local processor order (that being the order evaluation of
> > > request = *active; engine = *request; *active). The two *active are
> > > already ordered across SMP, so we are only concerned about this cpu. :|
> > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > rcu_access_pointer on another CPU without the smp_rmb. 
> 
> Should not a RCU read side lock be involved?

Yes, we use rcu read lock here. The question here is about visibility of
the other processor writes vs the local processor order. Before the
other processor can overwrite the request during reallocation, it will
have updated the active->request and gone through a wmb. During busy
ioctl's read of the request, we want to make sure that the values we
read (request->engine, request->seqno) have not been overwritten as we
do so - and we do that by serialising the second pointer check with the
other cpus.
-Chris
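The ordering Chris describes can be modelled with C11 atomics. This is an illustrative sketch, not i915 code: the names are invented, the acquire fence stands in for smp_rmb(), and the release store stands in for the writer's wmb before reallocation.

```c
#include <stdatomic.h>
#include <stddef.h>

struct ord_req {
	unsigned int engine;
	unsigned int seqno;
};

static _Atomic(struct ord_req *) ord_active;

/* Writer side: publish (or retire, with NULL) the active request. */
static void ord_publish(struct ord_req *r)
{
	atomic_store_explicit(&ord_active, r, memory_order_release);
}

/* Reader side: returns 1 and fills the out-params only if the snapshot
 * was still the active request after the ordered re-read. */
static int ord_busy_snapshot(unsigned int *engine, unsigned int *seqno)
{
	struct ord_req *r =
		atomic_load_explicit(&ord_active, memory_order_acquire);

	if (!r)
		return 0;
	*engine = r->engine;
	*seqno = r->seqno;
	/* smp_rmb() equivalent: the field reads above must complete
	 * before the confirming re-read of the pointer below. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&ord_active, memory_order_relaxed) == r;
}

/* Single-threaded walk through both outcomes. */
static int ord_demo(void)
{
	static struct ord_req r = { .engine = 2, .seqno = 7 };
	unsigned int e = 0, s = 0;

	ord_publish(&r);
	if (!ord_busy_snapshot(&e, &s) || e != 2 || s != 7)
		return 0;
	ord_publish(NULL);
	return ord_busy_snapshot(&e, &s) == 0;
}
```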

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 17/33] drm/i915: Use VMA directly for checking tiling parameters
  2016-08-09  6:18   ` Joonas Lahtinen
@ 2016-08-09  8:03     ` Chris Wilson
  0 siblings, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-09  8:03 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Tue, Aug 09, 2016 at 09:18:28AM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > +	return 0;
> > +
> > +bad:
> > +	return i915_vma_unbind(vma);
> 
> Umm, I do not see matching checks to convert to return the error values
> in i915_vma_unbind(). Am I lost or what were you after with this?

The code here is unchanged, just moved so that all the similar checks
are in the same function rather than mixed between caller and callee.
-Chris
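The checks being consolidated amount to three placement constraints on the GGTT node. A hedged distillation, with an invented mask value (not the real I915_FENCE_START_MASK) and invented names:

```c
#include <stdint.h>

/* Hypothetical distillation of i915_gem_object_fence_ok() for gen2/3:
 * the node must start inside the fenceable range, be large enough for
 * the fence, and be naturally aligned to the (power-of-two) fence
 * size.  The mask below is illustrative only. */
#define SKETCH_FENCE_START_MASK	0x0ffff000u

static int sketch_fence_ok(uint64_t start, uint64_t node_size,
			   uint64_t fence_size)
{
	if (start & ~(uint64_t)SKETCH_FENCE_START_MASK)
		return 0;	/* start outside the fenceable range */
	if (node_size < fence_size)
		return 0;	/* allocation too small for the fence */
	if (start & (fence_size - 1))
		return 0;	/* not naturally aligned to the fence size */
	return 1;
}
```

In the patch, any failed check funnels to the same recovery: unbind the VMA so it can be rebound somewhere that satisfies the fence.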

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-09  7:14           ` Chris Wilson
@ 2016-08-09  8:48             ` Joonas Lahtinen
  2016-08-09  9:05               ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-09  8:48 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx

On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > 
> > On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > > 
> > > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > > 
> > > > 
> > > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > > 
> > > > > 
> > > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > > 
> > > > > > 
> > > > > > In the debate as to whether the second read of active->request is
> > > > > > ordered after the dependent reads of the first read of active->request,
> > > > > > just give in and throw a smp_rmb() in there so that ordering of loads is
> > > > > > assured.
> > > > > > 
> > > > > > v2: Explain the manual smp_rmb()
> > > > > > 
> > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > r-b confirmed.
> > > > It's still fishy that we are implying an SMP effect where we need to
> > > > mandate the local processor order (that being the order evaluation of
> > > > request = *active; engine = *request; *active). The two *active are
> > > > already ordered across SMP, so we are only concerned about this cpu. :|
> > > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > > rcu_access_pointer on another CPU without the smp_rmb. 
> > Should not a RCU read side lock be involved?
> Yes, we use rcu read lock here. The question here is about visibility of
> the other processor writes vs the local processor order. Before the
> other processor can overwrite the request during reallocation, it will
> have updated the active->request and gone through a wmb. During busy
> ioctl's read of the request, we want to make sure that the values we
> read (request->engine, request->seqno) have not been overwritten as we
> do so - and we do that by serialising the second pointer check with the
> other cpus.

As discussed on IRC, a mechanism other than an improvised spinning loop
with some SMP barriers thrown around would be much preferable.

You suggested a seqlock, and it would likely be ok.

Regards, Joonas
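A minimal seqlock in the spirit of that suggestion looks like the sketch below. This is illustrative, not the kernel's seqlock_t, and the fences are kept loose for brevity; the real implementation is stricter about ordering.

```c
#include <stdatomic.h>

/* Writer makes the sequence odd around the update; readers retry while
 * the sequence is odd or has changed under them. */
struct sketch_seq {
	_Atomic unsigned int seq;
	unsigned int engine;
	unsigned int seqno;
};

static void sketch_seq_write(struct sketch_seq *s,
			     unsigned int engine, unsigned int seqno)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_acq_rel); /* odd */
	s->engine = engine;
	s->seqno = seqno;
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_acq_rel); /* even */
}

static void sketch_seq_read(struct sketch_seq *s,
			    unsigned int *engine, unsigned int *seqno)
{
	unsigned int seq;

	do {
		do {
			seq = atomic_load_explicit(&s->seq,
						   memory_order_acquire);
		} while (seq & 1);	/* writer in progress */
		*engine = s->engine;
		*seqno = s->seqno;
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&s->seq, memory_order_relaxed) != seq);
}

static int sketch_seq_demo(void)
{
	static struct sketch_seq s;
	unsigned int e = 0, q = 0;

	sketch_seq_write(&s, 3, 42);
	sketch_seq_read(&s, &e, &q);
	return e == 3 && q == 42;
}
```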

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 07/33] drm/i915: Store the active context object on all engines upon error
  2016-08-07 14:45 ` [PATCH 07/33] drm/i915: Store the active context object on all engines upon error Chris Wilson
@ 2016-08-09  9:02   ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-09  9:02 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> With execlists, we have context objects everywhere, not just RCS. So
> store them for post-mortem debugging. This also has a secondary effect
> of removing one more unsafe list iteration with using preserved state
> from the hanging request. And now we can cross-reference the request's
> context state with that loaded by the GPU.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 28 ++++------------------------
>  1 file changed, 4 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index b94a59733cf8..c621fa23cd28 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1028,28 +1028,6 @@ static void error_record_engine_registers(struct drm_i915_error_state *error,
>  	}
>  }
>  
> -static void i915_gem_record_active_context(struct intel_engine_cs *engine,
> -					   struct drm_i915_error_state *error,
> -					   struct drm_i915_error_engine *ee)
> -{
> -	struct drm_i915_private *dev_priv = engine->i915;
> -	struct drm_i915_gem_object *obj;
> -
> -	/* Currently render ring is the only HW context user */
> -	if (engine->id != RCS || !error->ccid)
> -		return;
> -
> -	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
> -		if (!i915_gem_obj_ggtt_bound(obj))
> -			continue;
> -
> -		if ((error->ccid & PAGE_MASK) == i915_gem_obj_ggtt_offset(obj)) {
> -			ee->ctx = i915_error_ggtt_object_create(dev_priv, obj);
> -			break;
> -		}
> -	}
> -}
> -
>  static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
>  				  struct drm_i915_error_state *error)
>  {
> @@ -1099,6 +1077,10 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
>  					i915_error_ggtt_object_create(dev_priv,
>  								      engine->scratch.obj);
>  
> +			ee->ctx =
> +				i915_error_ggtt_object_create(dev_priv,
> +							      request->ctx->engine[i].state);
> +
>  			if (request->pid) {
>  				struct task_struct *task;
>  
> @@ -1129,8 +1111,6 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
>  		ee->wa_ctx = i915_error_ggtt_object_create(dev_priv,
>  							   engine->wa_ctx.obj);
>  
> -		i915_gem_record_active_context(engine, error, ee);
> -
>  		count = 0;
>  		list_for_each_entry(request, &engine->request_list, link)
>  			count++;
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-09  8:48             ` Joonas Lahtinen
@ 2016-08-09  9:05               ` Chris Wilson
  2016-08-10 10:12                 ` Daniel Vetter
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-09  9:05 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: Daniel Vetter, intel-gfx

On Tue, Aug 09, 2016 at 11:48:56AM +0300, Joonas Lahtinen wrote:
> On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> > On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > > 
> > > On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > > > 
> > > > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > > > 
> > > > > 
> > > > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > > > 
> > > > > > 
> > > > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > In the debate as to whether the second read of active->request is
> > > > > > > ordered after the dependent reads of the first read of active->request,
> > > > > > > just give in and throw a smp_rmb() in there so that ordering of loads is
> > > > > > > assured.
> > > > > > > 
> > > > > > > v2: Explain the manual smp_rmb()
> > > > > > > 
> > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > r-b confirmed.
> > > > > It's still fishy that we are implying an SMP effect where we need to
> > > > > mandate the local processor order (that being the order of evaluation of
> > > > > request = *active; engine = *request; *active). The two *active are
> > > > > already ordered across SMP, so we are only concerned about this cpu. :|
> > > > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > > > rcu_access_pointer on another CPU without the smp_rmb. 
> > > Should not a RCU read side lock be involved?
> > Yes, we use rcu read lock here. The question here is about visibility of
> > the other processor writes vs the local processor order. Before the
> > other processor can overwrite the request during reallocation, it will
> > have updated the active->request and gone through a wmb. During busy
> > ioctl's read of the request, we want to make sure that the values we
> > read (request->engine, request->seqno) have not been overwritten as we
> > do so - and we do that by serialising the second pointer check with the
> > other cpus.
> 
> As discussed in IRC, some other mechanism than an improvised spinning
> loop + some SMP barriers thrown around would be much preferred.
> 
> You suggested a seqlock, and it would likely be ok.

I was comparing the read latching as they are identical. Using a
read/write seqlock around the request modification does not prevent all
dangers such as using kzalloc() and introduces a second sequence counter
to the one we already have. And for good reason seqlock says to use RCU
here. Which puts us in a bit of a catch-22 and having to guard against
SLAB_DESTROY_BY_RCU.
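
For illustration only, the latch-and-recheck pattern under discussion looks
roughly like the userspace sketch below. The names (sample_busy, struct
active) and the smp_rmb() stand-in are hypothetical, not the actual i915
code: the point is that the sampled fields are trusted only if the pointer
is unchanged when re-read after the barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel read barrier; illustration only. */
#define smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)

struct request { unsigned int seqno; };
struct active  { struct request *request; };

/* Sample the busy state: latch the request pointer, perform the
 * dependent reads, then re-check the pointer after smp_rmb().  If the
 * pointer changed, the request may have been freed and reallocated
 * (SLAB_DESTROY_BY_RCU) and the sampled fields cannot be trusted. */
static int sample_busy(const struct active *a, unsigned int *seqno)
{
	struct request *req = a->request;	/* first read */

	if (!req)
		return 0;			/* idle */

	*seqno = req->seqno;			/* dependent read */
	smp_rmb();				/* order the re-read below */
	return a->request == req;		/* valid only if unchanged */
}
```

If the re-check fails, the caller would retry under the RCU read lock
rather than trust the stale seqno.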
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 16/33] drm/i915: Convert fence computations to use vma directly
  2016-08-07 14:45 ` [PATCH 16/33] drm/i915: Convert fence computations to use vma directly Chris Wilson
@ 2016-08-09 10:27   ` Joonas Lahtinen
  2016-08-09 10:33     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-09 10:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> Lookup the GGTT vma once for the object assigned to the fence, and then
> derive everything from that vma.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_fence.c | 55 +++++++++++++++++------------------
>  1 file changed, 26 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
> index 9e8173fe2a09..60749cd23f20 100644
> --- a/drivers/gpu/drm/i915/i915_gem_fence.c
> +++ b/drivers/gpu/drm/i915/i915_gem_fence.c
> @@ -85,22 +85,19 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
>  	POSTING_READ(fence_reg_lo);
>  
>  	if (obj) {
> -		u32 size = i915_gem_obj_ggtt_size(obj);
> +		struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
>  		unsigned int tiling = i915_gem_object_get_tiling(obj);
>  		unsigned int stride = i915_gem_object_get_stride(obj);
> -		uint64_t val;
> +		u64 size = vma->node.size;
> +		u32 row_size = stride * (tiling == I915_TILING_Y ? 32 : 8);
> +		u64 val;
>  
>  		/* Adjust fence size to match tiled area */
> -		if (tiling != I915_TILING_NONE) {
> -			uint32_t row_size = stride *
> -				(tiling == I915_TILING_Y ? 32 : 8);
> -			size = (size / row_size) * row_size;
> -		}
> +		size = size / row_size * row_size;

There's a macro for this, it's called rounddown().
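
For reference, the kernel macro performs the same integer truncation as the
open-coded division in the hunk above (userspace sketch; the definition
below mirrors the kernel's rounddown() at the time of writing):

```c
#include <assert.h>

/* Truncate x down to the nearest multiple of y.  Equivalent to the
 * open-coded size / row_size * row_size in the quoted hunk. */
#define rounddown(x, y) ((x) - ((x) % (y)))
```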

>  
> -		val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
> -				 0xfffff000) << 32;
> -		val |= i915_gem_obj_ggtt_offset(obj) & 0xfffff000;
> -		val |= (uint64_t)((stride / 128) - 1) << fence_pitch_shift;
> +		val = ((vma->node.start + size - 4096) & 0xfffff000) << 32;
> +		val |= vma->node.start & 0xfffff000;
> +		val |= (u64)((stride / 128) - 1) << fence_pitch_shift;

This was rather magicy before, but it could be much better. The rest
are less so. Can be added to TODO.
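
As a rough illustration of what the packing above computes, here is a
hedged userspace sketch; the field layout is taken only from the quoted
expressions (last fenced page in the upper 32 bits, first page in the lower
32 bits, pitch field at a hardware-dependent shift), with the tiling and
valid bits omitted, and the register semantics are not verified here:

```c
#include <assert.h>
#include <stdint.h>

/* Pack an i965-style fence value per the quoted hunk.  pitch_shift is
 * a hardware-dependent parameter; all names here are illustrative. */
static uint64_t pack_fence(uint64_t start, uint64_t size,
			   unsigned int stride, unsigned int pitch_shift)
{
	uint64_t val;

	val  = ((start + size - 4096) & 0xfffff000) << 32;	/* last page */
	val |= start & 0xfffff000;				/* first page */
	val |= (uint64_t)(stride / 128 - 1) << pitch_shift;	/* pitch */
	return val;
}
```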

With above converted to rounddown()

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

>  		if (tiling == I915_TILING_Y)
>  			val |= 1 << I965_FENCE_TILING_Y_SHIFT;
>  		val |= I965_FENCE_REG_VALID;
> @@ -123,17 +120,17 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
>  	u32 val;
>  
>  	if (obj) {
> -		u32 size = i915_gem_obj_ggtt_size(obj);
> +		struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
>  		unsigned int tiling = i915_gem_object_get_tiling(obj);
>  		unsigned int stride = i915_gem_object_get_stride(obj);
>  		int pitch_val;
>  		int tile_width;
>  
> -		WARN((i915_gem_obj_ggtt_offset(obj) & ~I915_FENCE_START_MASK) ||
> -		     (size & -size) != size ||
> -		     (i915_gem_obj_ggtt_offset(obj) & (size - 1)),
> -		     "object 0x%08llx [fenceable? %d] not 1M or pot-size (0x%08x) aligned\n",
> -		     i915_gem_obj_ggtt_offset(obj), obj->map_and_fenceable, size);
> +		WARN((vma->node.start & ~I915_FENCE_START_MASK) ||
> +		     !is_power_of_2(vma->node.size) ||
> +		     (vma->node.start & (vma->node.size - 1)),
> +		     "object 0x%08llx [fenceable? %d] not 1M or pot-size (0x%08llx) aligned\n",
> +		     vma->node.start, obj->map_and_fenceable, vma->node.size);
>  
>  		if (tiling == I915_TILING_Y && HAS_128_BYTE_Y_TILING(dev))
>  			tile_width = 128;
> @@ -144,10 +141,10 @@ static void i915_write_fence_reg(struct drm_device *dev, int reg,
>  		pitch_val = stride / tile_width;
>  		pitch_val = ffs(pitch_val) - 1;
>  
> -		val = i915_gem_obj_ggtt_offset(obj);
> +		val = vma->node.start;
>  		if (tiling == I915_TILING_Y)
>  			val |= 1 << I830_FENCE_TILING_Y_SHIFT;
> -		val |= I915_FENCE_SIZE_BITS(size);
> +		val |= I915_FENCE_SIZE_BITS(vma->node.size);
>  		val |= pitch_val << I830_FENCE_PITCH_SHIFT;
>  		val |= I830_FENCE_REG_VALID;
>  	} else
> @@ -161,27 +158,27 @@ static void i830_write_fence_reg(struct drm_device *dev, int reg,
>  				struct drm_i915_gem_object *obj)
>  {
>  	struct drm_i915_private *dev_priv = to_i915(dev);
> -	uint32_t val;
> +	u32 val;
>  
>  	if (obj) {
> -		u32 size = i915_gem_obj_ggtt_size(obj);
> +		struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
>  		unsigned int tiling = i915_gem_object_get_tiling(obj);
>  		unsigned int stride = i915_gem_object_get_stride(obj);
> -		uint32_t pitch_val;
> +		u32 pitch_val;
>  
> -		WARN((i915_gem_obj_ggtt_offset(obj) & ~I830_FENCE_START_MASK) ||
> -		     (size & -size) != size ||
> -		     (i915_gem_obj_ggtt_offset(obj) & (size - 1)),
> -		     "object 0x%08llx not 512K or pot-size 0x%08x aligned\n",
> -		     i915_gem_obj_ggtt_offset(obj), size);
> +		WARN((vma->node.start & ~I830_FENCE_START_MASK) ||
> +		     !is_power_of_2(vma->node.size) ||
> +		     (vma->node.start & (vma->node.size - 1)),
> +		     "object 0x%08llx not 512K or pot-size 0x%08llx aligned\n",
> +		     vma->node.start, vma->node.size);
>  
>  		pitch_val = stride / 128;
>  		pitch_val = ffs(pitch_val) - 1;
>  
> -		val = i915_gem_obj_ggtt_offset(obj);
> +		val = vma->node.start;
>  		if (tiling == I915_TILING_Y)
>  			val |= 1 << I830_FENCE_TILING_Y_SHIFT;
> -		val |= I830_FENCE_SIZE_BITS(size);
> +		val |= I830_FENCE_SIZE_BITS(vma->node.size);
>  		val |= pitch_val << I830_FENCE_PITCH_SHIFT;
>  		val |= I830_FENCE_REG_VALID;
>  	} else
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 10/33] drm/i915: Remove inactive/active list from debugfs
  2016-08-07 14:45 ` [PATCH 10/33] drm/i915: Remove inactive/active list from debugfs Chris Wilson
@ 2016-08-09 10:29   ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-09 10:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> These two files (i915_gem_active, i915_gem_inactive) no longer give
> pertinent information since active/inactive tracking is per-vm and so we
> need the information per-vm. They are obsolete so remove them.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 49 -------------------------------------
>  1 file changed, 49 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 0627e170ea25..8de458dcffaa 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -210,53 +210,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>  		seq_printf(m, " (frontbuffer: 0x%03x)", frontbuffer_bits);
>  }
>  
> -static int i915_gem_object_list_info(struct seq_file *m, void *data)
> -{
> -	struct drm_info_node *node = m->private;
> -	uintptr_t list = (uintptr_t) node->info_ent->data;
> -	struct list_head *head;
> -	struct drm_device *dev = node->minor->dev;
> -	struct drm_i915_private *dev_priv = to_i915(dev);
> -	struct i915_ggtt *ggtt = &dev_priv->ggtt;
> -	struct i915_vma *vma;
> -	u64 total_obj_size, total_gtt_size;
> -	int count, ret;
> -
> -	ret = mutex_lock_interruptible(&dev->struct_mutex);
> -	if (ret)
> -		return ret;
> -
> -	/* FIXME: the user of this interface might want more than just GGTT */
> -	switch (list) {
> -	case ACTIVE_LIST:
> -		seq_puts(m, "Active:\n");
> -		head = &ggtt->base.active_list;
> -		break;
> -	case INACTIVE_LIST:
> -		seq_puts(m, "Inactive:\n");
> -		head = &ggtt->base.inactive_list;
> -		break;
> -	default:
> -		mutex_unlock(&dev->struct_mutex);
> -		return -EINVAL;
> -	}
> -
> -	total_obj_size = total_gtt_size = count = 0;
> -	list_for_each_entry(vma, head, vm_link) {
> -		seq_printf(m, "   ");
> -		describe_obj(m, vma->obj);
> -		seq_printf(m, "\n");
> -		total_obj_size += vma->obj->base.size;
> -		total_gtt_size += vma->node.size;
> -		count++;
> -	}
> -	mutex_unlock(&dev->struct_mutex);
> -
> -	seq_printf(m, "Total %d objects, %llu bytes, %llu GTT size\n",
> -		   count, total_obj_size, total_gtt_size);
> -	return 0;
> -}
> -
>  static int obj_rank_by_stolen(void *priv,
>  			      struct list_head *A, struct list_head *B)
>  {
> @@ -5375,8 +5328,6 @@ static const struct drm_info_list i915_debugfs_list[] = {
>  	{"i915_gem_objects", i915_gem_object_info, 0},
>  	{"i915_gem_gtt", i915_gem_gtt_info, 0},
>  	{"i915_gem_pinned", i915_gem_gtt_info, 0, (void *) PINNED_LIST},
> -	{"i915_gem_active", i915_gem_object_list_info, 0, (void *) ACTIVE_LIST},
> -	{"i915_gem_inactive", i915_gem_object_list_info, 0, (void *) INACTIVE_LIST},
>  	{"i915_gem_stolen", i915_gem_stolen_list_info },
>  	{"i915_gem_pageflip", i915_gem_pageflip_info, 0},
>  	{"i915_gem_request", i915_gem_request_info, 0},
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 16/33] drm/i915: Convert fence computations to use vma directly
  2016-08-09 10:27   ` Joonas Lahtinen
@ 2016-08-09 10:33     ` Chris Wilson
  0 siblings, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-09 10:33 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Tue, Aug 09, 2016 at 01:27:31PM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > Lookup the GGTT vma once for the object assigned to the fence, and then
> > derive everything from that vma.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_fence.c | 55 +++++++++++++++++------------------
> >  1 file changed, 26 insertions(+), 29 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c
> > index 9e8173fe2a09..60749cd23f20 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_fence.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_fence.c
> > @@ -85,22 +85,19 @@ static void i965_write_fence_reg(struct drm_device *dev, int reg,
> >  	POSTING_READ(fence_reg_lo);
> >  
> >  	if (obj) {
> > -		u32 size = i915_gem_obj_ggtt_size(obj);
> > +		struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
> >  		unsigned int tiling = i915_gem_object_get_tiling(obj);
> >  		unsigned int stride = i915_gem_object_get_stride(obj);
> > -		uint64_t val;
> > +		u64 size = vma->node.size;
> > +		u32 row_size = stride * (tiling == I915_TILING_Y ? 32 : 8);
> > +		u64 val;
> >  
> >  		/* Adjust fence size to match tiled area */
> > -		if (tiling != I915_TILING_NONE) {
> > -			uint32_t row_size = stride *
> > -				(tiling == I915_TILING_Y ? 32 : 8);
> > -			size = (size / row_size) * row_size;
> > -		}
> > +		size = size / row_size * row_size;
> 
> There's a macro for this, it's called rounddown().
> 
> >  
> > -		val = (uint64_t)((i915_gem_obj_ggtt_offset(obj) + size - 4096) &
> > -				 0xfffff000) << 32;
> > -		val |= i915_gem_obj_ggtt_offset(obj) & 0xfffff000;
> > -		val |= (uint64_t)((stride / 128) - 1) << fence_pitch_shift;
> > +		val = ((vma->node.start + size - 4096) & 0xfffff000) << 32;
> > +		val |= vma->node.start & 0xfffff000;
> > +		val |= (u64)((stride / 128) - 1) << fence_pitch_shift;
> 
> This was rather magicy before, but it could be much better. The rest
> are less so. Can be added to TODO.

> With above converted to rounddown()

The code is just silly. The correct fix is to expand the GTT vma to fit
the fence and not reduce the fence. rounddown() is fine until that is
fixed.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 11/33] drm/i915: Focus debugfs/i915_gem_pinned to show only display pins
  2016-08-07 14:45 ` [PATCH 11/33] drm/i915: Focus debugfs/i915_gem_pinned to show only display pins Chris Wilson
@ 2016-08-09 10:39   ` Joonas Lahtinen
  2016-08-09 10:46     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-09 10:39 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> Only those objects pinned to the display have semi-permanent pins of a
> global nature (other pins are transient within their local vm). Simplify
> i915_gem_pinned to only show the pertinent information about the pinned
> objects within the GGTT.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 11 ++---------
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 8de458dcffaa..9911594acbc9 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -40,12 +40,6 @@
>  #include 
>  #include "i915_drv.h"
>  
> -enum {
> -	ACTIVE_LIST,
> -	INACTIVE_LIST,
> -	PINNED_LIST,
> -};
> -
>  /* As the drm_debugfs_init() routines are called before dev->dev_private is
>   * allocated we need to hook into the minor for release. */
>  static int
> @@ -537,7 +531,6 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
>  {
>  	struct drm_info_node *node = m->private;
>  	struct drm_device *dev = node->minor->dev;
> -	uintptr_t list = (uintptr_t) node->info_ent->data;
>  	struct drm_i915_private *dev_priv = to_i915(dev);
>  	struct drm_i915_gem_object *obj;
>  	u64 total_obj_size, total_gtt_size;
> @@ -549,7 +542,7 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
>  
>  	total_obj_size = total_gtt_size = count = 0;
>  	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
> -		if (list == PINNED_LIST && !i915_gem_obj_is_pinned(obj))
> +		if (!obj->pin_display)
>  			continue;
>  
>  		seq_puts(m, "   ");
> @@ -5327,7 +5320,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
>  	{"i915_capabilities", i915_capabilities, 0},
>  	{"i915_gem_objects", i915_gem_object_info, 0},
>  	{"i915_gem_gtt", i915_gem_gtt_info, 0},
> -	{"i915_gem_pinned", i915_gem_gtt_info, 0, (void *) PINNED_LIST},
> +	{"i915_gem_pinned", i915_gem_gtt_info, 0, 0},

"i915_gem_pin_display" then? Otherwise it's a fragile change.

Regards, Joonas

>  	{"i915_gem_stolen", i915_gem_stolen_list_info },
>  	{"i915_gem_pageflip", i915_gem_pageflip_info, 0},
>  	{"i915_gem_request", i915_gem_request_info, 0},
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 11/33] drm/i915: Focus debugfs/i915_gem_pinned to show only display pins
  2016-08-09 10:39   ` Joonas Lahtinen
@ 2016-08-09 10:46     ` Chris Wilson
  2016-08-09 11:32       ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-09 10:46 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Tue, Aug 09, 2016 at 01:39:02PM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > Only those objects pinned to the display have semi-permanent pins of a
> > global nature (other pins are transient within their local vm). Simplify
> > i915_gem_pinned to only show the pertinent information about the pinned
> > objects within the GGTT.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c | 11 ++---------
> >  1 file changed, 2 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 8de458dcffaa..9911594acbc9 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -40,12 +40,6 @@
> >  #include 
> >  #include "i915_drv.h"
> >  
> > -enum {
> > -	ACTIVE_LIST,
> > -	INACTIVE_LIST,
> > -	PINNED_LIST,
> > -};
> > -
> >  /* As the drm_debugfs_init() routines are called before dev->dev_private is
> >   * allocated we need to hook into the minor for release. */
> >  static int
> > @@ -537,7 +531,6 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
> >  {
> >  	struct drm_info_node *node = m->private;
> >  	struct drm_device *dev = node->minor->dev;
> > -	uintptr_t list = (uintptr_t) node->info_ent->data;
> >  	struct drm_i915_private *dev_priv = to_i915(dev);
> >  	struct drm_i915_gem_object *obj;
> >  	u64 total_obj_size, total_gtt_size;
> > @@ -549,7 +542,7 @@ static int i915_gem_gtt_info(struct seq_file *m, void *data)
> >  
> >  	total_obj_size = total_gtt_size = count = 0;
> >  	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
> > -		if (list == PINNED_LIST && !i915_gem_obj_is_pinned(obj))
> > +		if (!obj->pin_display)
> >  			continue;
> >  
> >  		seq_puts(m, "   ");
> > @@ -5327,7 +5320,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
> >  	{"i915_capabilities", i915_capabilities, 0},
> >  	{"i915_gem_objects", i915_gem_object_info, 0},
> >  	{"i915_gem_gtt", i915_gem_gtt_info, 0},
> > -	{"i915_gem_pinned", i915_gem_gtt_info, 0, (void *) PINNED_LIST},
> > +	{"i915_gem_pinned", i915_gem_gtt_info, 0, 0},
> 
> "i915_gem_pin_display" then? Otherwise it's a fragile change.

Sure. Sold with that change?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 09/33] drm/i915: Mark unmappable GGTT entries as PIN_HIGH
  2016-08-07 14:45 ` [PATCH 09/33] drm/i915: Mark unmappable GGTT entries as PIN_HIGH Chris Wilson
  2016-08-08  9:09   ` Joonas Lahtinen
@ 2016-08-09 11:05   ` Tvrtko Ursulin
  2016-08-09 11:13     ` Chris Wilson
  1 sibling, 1 reply; 125+ messages in thread
From: Tvrtko Ursulin @ 2016-08-09 11:05 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/08/16 15:45, Chris Wilson wrote:
> We allocate a few objects into the GGTT that we never need to access via
> the mappable aperture (such as contexts, status pages). We can request
> that these are bound high in the VM to increase the amount of mappable
> aperture available. However, anything that may be frequently pinned
> (such as logical contexts) we want to use the fast search & insert.

Is the last bit still true after you merged:

   commit 202b52b7fbf70858609ec20829c7d69a13ffa351
   Author: Chris Wilson <chris@chris-wilson.co.uk>
   Date:   Wed Aug 3 16:04:09 2016 +0100

       drm: Track drm_mm nodes with an interval tree

Or we could PIN_HIGH the LRCs as well now?

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c        | 2 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 5 +++--
>   2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 309c5d9b1c57..c7f4b64b16f6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1182,7 +1182,7 @@ static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *engine, u32 size)
>   	}
>
>   	ret = i915_gem_object_ggtt_pin(engine->wa_ctx.obj, NULL,
> -				       0, PAGE_SIZE, 0);
> +				       0, PAGE_SIZE, PIN_HIGH);
>   	if (ret) {
>   		DRM_DEBUG_DRIVER("pin LRC WA ctx backing obj failed: %d\n",
>   				 ret);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 16b726fe33eb..09f01c641c14 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2093,7 +2093,7 @@ static int intel_ring_context_pin(struct i915_gem_context *ctx,
>
>   	if (ce->state) {
>   		ret = i915_gem_object_ggtt_pin(ce->state, NULL, 0,
> -					       ctx->ggtt_alignment, 0);
> +					       ctx->ggtt_alignment, PIN_HIGH);
>   		if (ret)
>   			goto error;
>   	}
> @@ -2629,7 +2629,8 @@ static void intel_ring_init_semaphores(struct drm_i915_private *dev_priv,
>   			i915.semaphores = 0;
>   		} else {
>   			i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
> -			ret = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
> +			ret = i915_gem_object_ggtt_pin(obj, NULL,
> +						       0, 0, PIN_HIGH);
>   			if (ret != 0) {
>   				i915_gem_object_put(obj);
>   				DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n");
>

* Re: [PATCH 09/33] drm/i915: Mark unmappable GGTT entries as PIN_HIGH
  2016-08-09 11:05   ` Tvrtko Ursulin
@ 2016-08-09 11:13     ` Chris Wilson
  2016-08-09 11:20       ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-09 11:13 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Tue, Aug 09, 2016 at 12:05:30PM +0100, Tvrtko Ursulin wrote:
> 
> On 07/08/16 15:45, Chris Wilson wrote:
> >We allocate a few objects into the GGTT that we never need to access via
> >the mappable aperture (such as contexts, status pages). We can request
> >that these are bound high in the VM to increase the amount of mappable
> >aperture available. However, anything that may be frequently pinned
> >(such as logical contexts) we want to use the fast search & insert.
> 
> Is the last bit still true after you merged:
> 
>   commit 202b52b7fbf70858609ec20829c7d69a13ffa351
>   Author: Chris Wilson <chris@chris-wilson.co.uk>
>   Date:   Wed Aug 3 16:04:09 2016 +0100
> 
>       drm: Track drm_mm nodes with an interval tree
> 
> Or we could PIN_HIGH the LRCs as well now?

It is still true. I've another set of patches for drm_mm to speed up the
actual PIN_HIGH searching, but that is still slower than using the
eviction stack. It's a tradeoff of causing fragmentation (and maybe
hitting a slow cleanup path) or faster allocation when creating
contexts. I think the tradeoff is still towards causing fragmentation.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 09/33] drm/i915: Mark unmappable GGTT entries as PIN_HIGH
  2016-08-09 11:13     ` Chris Wilson
@ 2016-08-09 11:20       ` Chris Wilson
  0 siblings, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-09 11:20 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

On Tue, Aug 09, 2016 at 12:13:32PM +0100, Chris Wilson wrote:
> On Tue, Aug 09, 2016 at 12:05:30PM +0100, Tvrtko Ursulin wrote:
> > 
> > On 07/08/16 15:45, Chris Wilson wrote:
> > >We allocate a few objects into the GGTT that we never need to access via
> > >the mappable aperture (such as contexts, status pages). We can request
> > >that these are bound high in the VM to increase the amount of mappable
> > >aperture available. However, anything that may be frequently pinned
> > >(such as logical contexts) we want to use the fast search & insert.
> > 
> > Is the last bit still true after you merged:
> > 
> >   commit 202b52b7fbf70858609ec20829c7d69a13ffa351
> >   Author: Chris Wilson <chris@chris-wilson.co.uk>
> >   Date:   Wed Aug 3 16:04:09 2016 +0100
> > 
> >       drm: Track drm_mm nodes with an interval tree
> > 
> > Or we could PIN_HIGH the LRCs as well now?
> 
> It is still true. I've another set of patches for drm_mm to speed up the
> actual PIN_HIGH searching, but that is still slower than using the
> eviction stack. It's a tradeoff of causing fragmentation (and maybe
> hitting a slow cleanup path) or faster allocation when creating
> contexts. I think the tradeoff is still towards causing fragmentation.

I should also mention that the DRM_MM_SEARCH_HIGH is broken right now as
it just allocates from the bottom of the eviction stack, not the first
hole available whilst scanning top-down.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 11/33] drm/i915: Focus debugfs/i915_gem_pinned to show only display pins
  2016-08-09 10:46     ` Chris Wilson
@ 2016-08-09 11:32       ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-09 11:32 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ti, 2016-08-09 at 11:46 +0100, Chris Wilson wrote:
> On Tue, Aug 09, 2016 at 01:39:02PM +0300, Joonas Lahtinen wrote:
> > 
> > On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > > @@ -5327,7 +5320,7 @@ static const struct drm_info_list i915_debugfs_list[] = {
> > >  	{"i915_capabilities", i915_capabilities, 0},
> > >  	{"i915_gem_objects", i915_gem_object_info, 0},
> > >  	{"i915_gem_gtt", i915_gem_gtt_info, 0},
> > > -	{"i915_gem_pinned", i915_gem_gtt_info, 0, (void *) PINNED_LIST},
> > > +	{"i915_gem_pinned", i915_gem_gtt_info, 0, 0},
> > "i915_gem_pin_display" then? Otherwise it's a fragile change.
> Sure. Sold with that change?

Also remove the "i915_gem_gtt" alias and rename the function
appropriately; with those,

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 32/33] drm/i915: Consolidate error object printing
  2016-08-07 14:45 ` [PATCH 32/33] drm/i915: Consolidate error object printing Chris Wilson
@ 2016-08-09 11:44   ` Joonas Lahtinen
  2016-08-09 11:53     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-09 11:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
>  static void print_error_obj(struct drm_i915_error_state_buf *m,
> +			    struct intel_engine_cs *engine,
> +			    const char *name,
>  			    struct drm_i915_error_object *obj)
>  {
>  	int page, offset, elt;
>  
> +	if (!obj)
> +		return;
> +
> +	if (name) {
> +		err_printf(m, "%s --- %s gtt_offset = 0x%08x_%08x\n",
> +			   engine ? engine->name : "global", name,
> +			   upper_32_bits(obj->gtt_offset),
> +			   lower_32_bits(obj->gtt_offset));
> +	}
> +
>  	for (page = offset = 0; page < obj->page_count; page++) {
>  		for (elt = 0; elt < PAGE_SIZE/4; elt++) {
>  			err_printf(m, "%08x :  %08x\n", offset,
> @@ -330,8 +342,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  	struct drm_i915_private *dev_priv = to_i915(dev);
>  	struct drm_i915_error_state *error = error_priv->error;
>  	struct drm_i915_error_object *obj;
> -	int i, j, offset, elt;
>  	int max_hangcheck_score;
> +	int i, j;
>  
>  	if (!error) {
>  		err_printf(m, "no error state collected\n");
> @@ -446,15 +458,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  			err_printf(m, " --- gtt_offset = 0x%08x %08x\n",

If these are intended for userspace parsing, "0x%08x %08x" vs.
"0x%08x_%08x" would be good to make consistent. And to reduce such
errors in future, I'd also print this line with the above function
(let there be an extra space).
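
The consistency concern can be illustrated with the kernel's 64-bit split
helpers (the macros below are userspace approximations of
upper_32_bits()/lower_32_bits(); the format string is the "0x%08x_%08x"
shape from the hunk above, and format_offset is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Userspace approximations of the kernel helpers. */
#define upper_32_bits(n) ((uint32_t)((uint64_t)(n) >> 32))
#define lower_32_bits(n) ((uint32_t)(n))

/* Format a 64-bit GGTT offset the same way everywhere, so that
 * userspace parsers see a single consistent shape. */
static void format_offset(char *buf, size_t len, uint64_t offset)
{
	snprintf(buf, len, "0x%08x_%08x",
		 upper_32_bits(offset), lower_32_bits(offset));
}
```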

With that;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 32/33] drm/i915: Consolidate error object printing
  2016-08-09 11:44   ` Joonas Lahtinen
@ 2016-08-09 11:53     ` Chris Wilson
  2016-08-10 10:55       ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-09 11:53 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Tue, Aug 09, 2016 at 02:44:41PM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> >  static void print_error_obj(struct drm_i915_error_state_buf *m,
> > +			    struct intel_engine_cs *engine,
> > +			    const char *name,
> >  			    struct drm_i915_error_object *obj)
> >  {
> >  	int page, offset, elt;
> >  
> > +	if (!obj)
> > +		return;
> > +
> > +	if (name) {
> > +		err_printf(m, "%s --- %s gtt_offset = 0x%08x_%08x\n",
> > +			   engine ? engine->name : "global", name,
> > +			   upper_32_bits(obj->gtt_offset),
> > +			   lower_32_bits(obj->gtt_offset));
> > +	}
> > +
> >  	for (page = offset = 0; page < obj->page_count; page++) {
> >  		for (elt = 0; elt < PAGE_SIZE/4; elt++) {
> >  			err_printf(m, "%08x :  %08x\n", offset,
> > @@ -330,8 +342,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> >  	struct drm_i915_private *dev_priv = to_i915(dev);
> >  	struct drm_i915_error_state *error = error_priv->error;
> >  	struct drm_i915_error_object *obj;
> > -	int i, j, offset, elt;
> >  	int max_hangcheck_score;
> > +	int i, j;
> >  
> >  	if (!error) {
> >  		err_printf(m, "no error state collected\n");
> > @@ -446,15 +458,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> >  			err_printf(m, " --- gtt_offset = 0x%08x %08x\n",
> 
> If this is intended for userspace parsing, "0x%08x %08x" vs. "0x%08x_%08x"
> should be made consistent. And to avoid such errors in the future, I'd also
> print this line with the above function (even if that leaves an extra
> space).

Yes, I remembered to fix that mistake only after sending the patches. :|

Combining this one is a bit trickier as it doesn't conform to the others.
For simplicity I left the custom header in the caller.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* [PATCH v2] drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs
  2016-08-07 14:45 ` [PATCH 03/33] drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs Chris Wilson
@ 2016-08-09 14:08   ` Chris Wilson
  2016-08-09 14:10   ` [PATCH v3] " Chris Wilson
  1 sibling, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-09 14:08 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

In commit 2529d57050af ("drm/i915: Drop racy markup of missed-irqs from
idle-worker") the racy detection of missed interrupts was removed when
we went idle. This however opened up the issue that the stuck waiters
were not being reported, causing a test case failure. If we move the
stuck waiter detection out of hangcheck and into the breadcrumb
mechanism (i.e. the waiter) itself, we can avoid this issue entirely.
This leaves hangcheck looking for a stuck GPU (inspecting for request
advancement and HEAD motion), and breadcrumbs looking for a stuck
waiter - hopefully making both easier to understand by their segregation.

v2: Reduce the error message as we now run independently of hangcheck,
and the hanging batch used by igt also counts as a stuck waiter causing
extra warnings in dmesg.
v3: Move the breadcrumb's hangcheck kickstart to the first missed wait.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104
Fixes: 2529d57050af ("drm/i915: Drop racy markup of missed-irqs...")
Testcase: igt/drv_missed_irq
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c      | 11 ++---
 drivers/gpu/drm/i915/i915_gem.c          | 10 -----
 drivers/gpu/drm/i915/i915_irq.c          | 26 +-----------
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 69 ++++++++++++++++++++++----------
 drivers/gpu/drm/i915/intel_engine_cs.c   |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  6 +--
 6 files changed, 56 insertions(+), 67 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f62285c1ed7f..96bfc745a820 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -787,8 +787,6 @@ static void i915_ring_seqno_info(struct seq_file *m,
 
 	seq_printf(m, "Current sequence (%s): %x\n",
 		   engine->name, intel_engine_get_seqno(engine));
-	seq_printf(m, "Current user interrupts (%s): %lx\n",
-		   engine->name, READ_ONCE(engine->breadcrumbs.irq_wakeups));
 
 	spin_lock(&b->lock);
 	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
@@ -1434,11 +1432,10 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 			   engine->hangcheck.seqno,
 			   seqno[id],
 			   engine->last_submitted_seqno);
-		seq_printf(m, "\twaiters? %d\n",
-			   intel_engine_has_waiter(engine));
-		seq_printf(m, "\tuser interrupts = %lx [current %lx]\n",
-			   engine->hangcheck.user_interrupts,
-			   READ_ONCE(engine->breadcrumbs.irq_wakeups));
+		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
+			   yesno(intel_engine_has_waiter(engine)),
+			   yesno(test_bit(engine->id,
+					  &dev_priv->gpu_error.missed_irq_rings)));
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)engine->hangcheck.acthd,
 			   (long long)acthd[id]);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d71fa9a93afa..2bb9ef91a243 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2524,7 +2524,6 @@ i915_gem_idle_work_handler(struct work_struct *work)
 		container_of(work, typeof(*dev_priv), gt.idle_work.work);
 	struct drm_device *dev = &dev_priv->drm;
 	struct intel_engine_cs *engine;
-	unsigned int stuck_engines;
 	bool rearm_hangcheck;
 
 	if (!READ_ONCE(dev_priv->gt.awake))
@@ -2554,15 +2553,6 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	dev_priv->gt.awake = false;
 	rearm_hangcheck = false;
 
-	/* As we have disabled hangcheck, we need to unstick any waiters still
-	 * hanging around. However, as we may be racing against the interrupt
-	 * handler or the waiters themselves, we skip enabling the fake-irq.
-	 */
-	stuck_engines = intel_kick_waiters(dev_priv);
-	if (unlikely(stuck_engines))
-		DRM_DEBUG_DRIVER("kicked stuck waiters (%x)...missed irq?\n",
-				 stuck_engines);
-
 	if (INTEL_GEN(dev_priv) >= 6)
 		gen6_rps_idle(dev_priv);
 	intel_runtime_pm_put(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 591f452ece68..ebb83d5a448b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -972,10 +972,8 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 static void notify_ring(struct intel_engine_cs *engine)
 {
 	smp_store_mb(engine->breadcrumbs.irq_posted, true);
-	if (intel_engine_wakeup(engine)) {
+	if (intel_engine_wakeup(engine))
 		trace_i915_gem_request_notify(engine);
-		engine->breadcrumbs.irq_wakeups++;
-	}
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
@@ -3044,22 +3042,6 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 	return HANGCHECK_HUNG;
 }
 
-static unsigned long kick_waiters(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *i915 = engine->i915;
-	unsigned long irq_count = READ_ONCE(engine->breadcrumbs.irq_wakeups);
-
-	if (engine->hangcheck.user_interrupts == irq_count &&
-	    !test_and_set_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
-		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
-			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
-				  engine->name);
-
-		intel_engine_enable_fake_irq(engine);
-	}
-
-	return irq_count;
-}
 /*
  * This is called when the chip hasn't reported back with completed
  * batchbuffers in a long time. We keep track per ring seqno progress and
@@ -3097,7 +3079,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		bool busy = intel_engine_has_waiter(engine);
 		u64 acthd;
 		u32 seqno;
-		unsigned user_interrupts;
 
 		semaphore_clear_deadlocks(dev_priv);
 
@@ -3114,15 +3095,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		acthd = intel_engine_get_active_head(engine);
 		seqno = intel_engine_get_seqno(engine);
 
-		/* Reset stuck interrupts between batch advances */
-		user_interrupts = 0;
-
 		if (engine->hangcheck.seqno == seqno) {
 			if (!intel_engine_is_active(engine)) {
 				engine->hangcheck.action = HANGCHECK_IDLE;
 				if (busy) {
 					/* Safeguard against driver failure */
-					user_interrupts = kick_waiters(engine);
 					engine->hangcheck.score += BUSY;
 				}
 			} else {
@@ -3185,7 +3162,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
 		engine->hangcheck.seqno = seqno;
 		engine->hangcheck.acthd = acthd;
-		engine->hangcheck.user_interrupts = user_interrupts;
 		busy_count += busy;
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 90867446f1a5..c83ae3cb51df 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -26,6 +26,40 @@
 
 #include "i915_drv.h"
 
+static void intel_breadcrumbs_hangcheck(unsigned long data)
+{
+	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	if (!b->irq_enabled)
+		return;
+
+	if (time_before(jiffies, b->timeout)) {
+		mod_timer(&b->hangcheck, b->timeout);
+		return;
+	}
+
+	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
+	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
+	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+
+	/* Ensure that even if the GPU hangs, we get woken up.
+	 *
+	 * However, note that if no one is waiting, we never notice
+	 * a gpu hang. Eventually, we will have to wait for a resource
+	 * held by the GPU and so trigger a hangcheck. In the most
+	 * pathological case, this will be upon memory starvation! To
+	 * prevent this, we also queue the hangcheck from the retire
+	 * worker.
+	 */
+	i915_queue_hangcheck(engine->i915);
+}
+
+static unsigned long wait_timeout(void)
+{
+	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
+}
+
 static void intel_breadcrumbs_fake_irq(unsigned long data)
 {
 	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
@@ -51,13 +85,6 @@ static void irq_enable(struct intel_engine_cs *engine)
 	 */
 	engine->breadcrumbs.irq_posted = true;
 
-	/* Make sure the current hangcheck doesn't falsely accuse a just
-	 * started irq handler from missing an interrupt (because the
-	 * interrupt count still matches the stale value from when
-	 * the irq handler was disabled, many hangchecks ago).
-	 */
-	engine->breadcrumbs.irq_wakeups++;
-
 	spin_lock_irq(&engine->i915->irq_lock);
 	engine->irq_enable(engine);
 	spin_unlock_irq(&engine->i915->irq_lock);
@@ -98,17 +125,13 @@ static void __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
 	}
 
 	if (!b->irq_enabled ||
-	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings))
+	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
 		mod_timer(&b->fake_irq, jiffies + 1);
-
-	/* Ensure that even if the GPU hangs, we get woken up.
-	 *
-	 * However, note that if no one is waiting, we never notice
-	 * a gpu hang. Eventually, we will have to wait for a resource
-	 * held by the GPU and so trigger a hangcheck. In the most
-	 * pathological case, this will be upon memory starvation!
-	 */
-	i915_queue_hangcheck(i915);
+	} else {
+		/* Ensure we never sleep indefinitely */
+		GEM_BUG_ON(!time_after(b->timeout, jiffies));
+		mod_timer(&b->hangcheck, b->timeout);
+	}
 }
 
 static void __intel_breadcrumbs_disable_irq(struct intel_breadcrumbs *b)
@@ -219,6 +242,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 		GEM_BUG_ON(!next && !first);
 		if (next && next != &wait->node) {
 			GEM_BUG_ON(first);
+			b->timeout = wait_timeout();
 			b->first_wait = to_wait(next);
 			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
 			/* As there is a delay between reading the current
@@ -245,6 +269,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 
 	if (first) {
 		GEM_BUG_ON(rb_first(&b->waiters) != &wait->node);
+		b->timeout = wait_timeout();
 		b->first_wait = wait;
 		smp_store_mb(b->irq_seqno_bh, wait->tsk);
 		/* After assigning ourselves as the new bottom-half, we must
@@ -277,11 +302,6 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 	return first;
 }
 
-void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
-{
-	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
-}
-
 static inline bool chain_wakeup(struct rb_node *rb, int priority)
 {
 	return rb && to_wait(rb)->tsk->prio <= priority;
@@ -359,6 +379,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			 * the interrupt, or if we have to handle an
 			 * exception rather than a seqno completion.
 			 */
+			b->timeout = wait_timeout();
 			b->first_wait = to_wait(next);
 			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
 			if (b->first_wait->seqno != wait->seqno)
@@ -533,6 +554,9 @@ int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 	struct task_struct *tsk;
 
 	spin_lock_init(&b->lock);
+	setup_timer(&b->hangcheck,
+		    intel_breadcrumbs_hangcheck,
+		    (unsigned long)engine);
 	setup_timer(&b->fake_irq,
 		    intel_breadcrumbs_fake_irq,
 		    (unsigned long)engine);
@@ -561,6 +585,7 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 		kthread_stop(b->signaler);
 
 	del_timer_sync(&b->fake_irq);
+	del_timer_sync(&b->hangcheck);
 }
 
 unsigned int intel_kick_waiters(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index e9b301ae2d0c..0dd3d1de18aa 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -164,6 +164,7 @@ cleanup:
 void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
 {
 	memset(&engine->hangcheck, 0, sizeof(engine->hangcheck));
+	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 }
 
 static void intel_engine_init_requests(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 43e545e44352..4aed4586b0b6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -75,7 +75,6 @@ enum intel_engine_hangcheck_action {
 
 struct intel_engine_hangcheck {
 	u64 acthd;
-	unsigned long user_interrupts;
 	u32 seqno;
 	int score;
 	enum intel_engine_hangcheck_action action;
@@ -173,7 +172,6 @@ struct intel_engine_cs {
 	 */
 	struct intel_breadcrumbs {
 		struct task_struct *irq_seqno_bh; /* bh for user interrupts */
-		unsigned long irq_wakeups;
 		bool irq_posted;
 
 		spinlock_t lock; /* protects the lists of requests */
@@ -183,6 +181,9 @@ struct intel_engine_cs {
 		struct task_struct *signaler; /* used for fence signalling */
 		struct drm_i915_gem_request *first_signal;
 		struct timer_list fake_irq; /* used after a missed interrupt */
+		struct timer_list hangcheck; /* detect missed interrupts */
+
+		unsigned long timeout;
 
 		bool irq_enabled : 1;
 		bool rpm_wakelock : 1;
@@ -560,7 +561,6 @@ static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
 	return wakeup;
 }
 
-void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 unsigned int intel_kick_waiters(struct drm_i915_private *i915);
 unsigned int intel_kick_signalers(struct drm_i915_private *i915);
-- 
2.8.1


* ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev5)
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (35 preceding siblings ...)
  2016-08-08 10:34 ` ✗ Fi.CI.BAT: " Patchwork
@ 2016-08-09 14:10 ` Patchwork
  2016-08-09 14:20 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev6) Patchwork
  2016-08-10  6:43 ` Patchwork
  38 siblings, 0 replies; 125+ messages in thread
From: Patchwork @ 2016-08-09 14:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev5)
URL   : https://patchwork.freedesktop.org/series/10770/
State : failure

== Summary ==

Applying: drm/i915: Add smp_rmb() to busy ioctl's RCU dance
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_gem.c
M	drivers/gpu/drm/i915/i915_gem_request.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/i915_gem.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/i915_gem.c
error: Failed to merge in the changes.
Patch failed at 0001 drm/i915: Add smp_rmb() to busy ioctl's RCU dance
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


* [PATCH v3] drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs
  2016-08-07 14:45 ` [PATCH 03/33] drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs Chris Wilson
  2016-08-09 14:08   ` [PATCH v2] " Chris Wilson
@ 2016-08-09 14:10   ` Chris Wilson
  2016-08-09 15:24     ` Mika Kuoppala
  1 sibling, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-09 14:10 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

In commit 2529d57050af ("drm/i915: Drop racy markup of missed-irqs from
idle-worker") the racy detection of missed interrupts was removed when
we went idle. This however opened up the issue that the stuck waiters
were not being reported, causing a test case failure. If we move the
stuck waiter detection out of hangcheck and into the breadcrumb
mechanism (i.e. the waiter) itself, we can avoid this issue entirely.
This leaves hangcheck looking for a stuck GPU (inspecting for request
advancement and HEAD motion), and breadcrumbs looking for a stuck
waiter - hopefully making both easier to understand by their segregation.

v2: Reduce the error message as we now run independently of hangcheck,
and the hanging batch used by igt also counts as a stuck waiter causing
extra warnings in dmesg.
v3: Move the breadcrumb's hangcheck kickstart to the first missed wait.
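As a rough illustration, the decision the per-engine breadcrumbs hangcheck timer makes can be modelled in standalone C. This is a toy sketch with made-up names, not the driver's code, and "jiffies" is reduced to a plain counter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the new breadcrumbs hangcheck: it only acts while a
 * waiter holds the irq enabled, re-arms itself until the deadline set
 * when the wait began has passed, and only then assumes the interrupt
 * was missed. */
struct toy_breadcrumbs {
	bool irq_enabled;	/* a waiter is outstanding */
	uint64_t timeout;	/* deadline, in "jiffies" */
};

enum toy_action {
	ACTION_NONE,		/* no waiter: let the timer lapse */
	ACTION_REARM,		/* deadline not reached: mod_timer() again */
	ACTION_MISSED_IRQ,	/* deadline passed: start the fake irq */
};

static enum toy_action
toy_hangcheck(const struct toy_breadcrumbs *b, uint64_t jiffies)
{
	if (!b->irq_enabled)
		return ACTION_NONE;
	if (jiffies < b->timeout)	/* cf. time_before(jiffies, b->timeout) */
		return ACTION_REARM;
	return ACTION_MISSED_IRQ;
}
```

This keeps the stuck-waiter detection entirely inside the waiter path, leaving the global hangcheck worker to watch only for a stuck GPU.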

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104
Fixes: 2529d57050af ("drm/i915: Drop racy markup of missed-irqs...")
Testcase: igt/drv_missed_irq
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c      | 11 ++---
 drivers/gpu/drm/i915/i915_gem.c          | 10 -----
 drivers/gpu/drm/i915/i915_irq.c          | 26 +-----------
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 69 ++++++++++++++++++++++----------
 drivers/gpu/drm/i915/intel_engine_cs.c   |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  6 +--
 6 files changed, 56 insertions(+), 67 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f62285c1ed7f..96bfc745a820 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -787,8 +787,6 @@ static void i915_ring_seqno_info(struct seq_file *m,
 
 	seq_printf(m, "Current sequence (%s): %x\n",
 		   engine->name, intel_engine_get_seqno(engine));
-	seq_printf(m, "Current user interrupts (%s): %lx\n",
-		   engine->name, READ_ONCE(engine->breadcrumbs.irq_wakeups));
 
 	spin_lock(&b->lock);
 	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
@@ -1434,11 +1432,10 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 			   engine->hangcheck.seqno,
 			   seqno[id],
 			   engine->last_submitted_seqno);
-		seq_printf(m, "\twaiters? %d\n",
-			   intel_engine_has_waiter(engine));
-		seq_printf(m, "\tuser interrupts = %lx [current %lx]\n",
-			   engine->hangcheck.user_interrupts,
-			   READ_ONCE(engine->breadcrumbs.irq_wakeups));
+		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
+			   yesno(intel_engine_has_waiter(engine)),
+			   yesno(test_bit(engine->id,
+					  &dev_priv->gpu_error.missed_irq_rings)));
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)engine->hangcheck.acthd,
 			   (long long)acthd[id]);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d71fa9a93afa..2bb9ef91a243 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2524,7 +2524,6 @@ i915_gem_idle_work_handler(struct work_struct *work)
 		container_of(work, typeof(*dev_priv), gt.idle_work.work);
 	struct drm_device *dev = &dev_priv->drm;
 	struct intel_engine_cs *engine;
-	unsigned int stuck_engines;
 	bool rearm_hangcheck;
 
 	if (!READ_ONCE(dev_priv->gt.awake))
@@ -2554,15 +2553,6 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	dev_priv->gt.awake = false;
 	rearm_hangcheck = false;
 
-	/* As we have disabled hangcheck, we need to unstick any waiters still
-	 * hanging around. However, as we may be racing against the interrupt
-	 * handler or the waiters themselves, we skip enabling the fake-irq.
-	 */
-	stuck_engines = intel_kick_waiters(dev_priv);
-	if (unlikely(stuck_engines))
-		DRM_DEBUG_DRIVER("kicked stuck waiters (%x)...missed irq?\n",
-				 stuck_engines);
-
 	if (INTEL_GEN(dev_priv) >= 6)
 		gen6_rps_idle(dev_priv);
 	intel_runtime_pm_put(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 591f452ece68..ebb83d5a448b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -972,10 +972,8 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 static void notify_ring(struct intel_engine_cs *engine)
 {
 	smp_store_mb(engine->breadcrumbs.irq_posted, true);
-	if (intel_engine_wakeup(engine)) {
+	if (intel_engine_wakeup(engine))
 		trace_i915_gem_request_notify(engine);
-		engine->breadcrumbs.irq_wakeups++;
-	}
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
@@ -3044,22 +3042,6 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 	return HANGCHECK_HUNG;
 }
 
-static unsigned long kick_waiters(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *i915 = engine->i915;
-	unsigned long irq_count = READ_ONCE(engine->breadcrumbs.irq_wakeups);
-
-	if (engine->hangcheck.user_interrupts == irq_count &&
-	    !test_and_set_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
-		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
-			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
-				  engine->name);
-
-		intel_engine_enable_fake_irq(engine);
-	}
-
-	return irq_count;
-}
 /*
  * This is called when the chip hasn't reported back with completed
  * batchbuffers in a long time. We keep track per ring seqno progress and
@@ -3097,7 +3079,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		bool busy = intel_engine_has_waiter(engine);
 		u64 acthd;
 		u32 seqno;
-		unsigned user_interrupts;
 
 		semaphore_clear_deadlocks(dev_priv);
 
@@ -3114,15 +3095,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		acthd = intel_engine_get_active_head(engine);
 		seqno = intel_engine_get_seqno(engine);
 
-		/* Reset stuck interrupts between batch advances */
-		user_interrupts = 0;
-
 		if (engine->hangcheck.seqno == seqno) {
 			if (!intel_engine_is_active(engine)) {
 				engine->hangcheck.action = HANGCHECK_IDLE;
 				if (busy) {
 					/* Safeguard against driver failure */
-					user_interrupts = kick_waiters(engine);
 					engine->hangcheck.score += BUSY;
 				}
 			} else {
@@ -3185,7 +3162,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 
 		engine->hangcheck.seqno = seqno;
 		engine->hangcheck.acthd = acthd;
-		engine->hangcheck.user_interrupts = user_interrupts;
 		busy_count += busy;
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 90867446f1a5..7be9af1d5424 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -26,6 +26,40 @@
 
 #include "i915_drv.h"
 
+static void intel_breadcrumbs_hangcheck(unsigned long data)
+{
+	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	if (!b->irq_enabled)
+		return;
+
+	if (time_before(jiffies, b->timeout)) {
+		mod_timer(&b->hangcheck, b->timeout);
+		return;
+	}
+
+	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
+	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
+	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+
+	/* Ensure that even if the GPU hangs, we get woken up.
+	 *
+	 * However, note that if no one is waiting, we never notice
+	 * a gpu hang. Eventually, we will have to wait for a resource
+	 * held by the GPU and so trigger a hangcheck. In the most
+	 * pathological case, this will be upon memory starvation! To
+	 * prevent this, we also queue the hangcheck from the retire
+	 * worker.
+	 */
+	i915_queue_hangcheck(engine->i915);
+}
+
+static unsigned long wait_timeout(void)
+{
+	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
+}
+
 static void intel_breadcrumbs_fake_irq(unsigned long data)
 {
 	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
@@ -51,13 +85,6 @@ static void irq_enable(struct intel_engine_cs *engine)
 	 */
 	engine->breadcrumbs.irq_posted = true;
 
-	/* Make sure the current hangcheck doesn't falsely accuse a just
-	 * started irq handler from missing an interrupt (because the
-	 * interrupt count still matches the stale value from when
-	 * the irq handler was disabled, many hangchecks ago).
-	 */
-	engine->breadcrumbs.irq_wakeups++;
-
 	spin_lock_irq(&engine->i915->irq_lock);
 	engine->irq_enable(engine);
 	spin_unlock_irq(&engine->i915->irq_lock);
@@ -98,17 +125,13 @@ static void __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
 	}
 
 	if (!b->irq_enabled ||
-	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings))
+	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
 		mod_timer(&b->fake_irq, jiffies + 1);
-
-	/* Ensure that even if the GPU hangs, we get woken up.
-	 *
-	 * However, note that if no one is waiting, we never notice
-	 * a gpu hang. Eventually, we will have to wait for a resource
-	 * held by the GPU and so trigger a hangcheck. In the most
-	 * pathological case, this will be upon memory starvation!
-	 */
-	i915_queue_hangcheck(i915);
+	} else {
+		/* Ensure we never sleep indefinitely */
+		GEM_BUG_ON(!time_after(b->timeout, jiffies));
+		mod_timer(&b->hangcheck, b->timeout);
+	}
 }
 
 static void __intel_breadcrumbs_disable_irq(struct intel_breadcrumbs *b)
@@ -219,6 +242,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 		GEM_BUG_ON(!next && !first);
 		if (next && next != &wait->node) {
 			GEM_BUG_ON(first);
+			b->timeout = wait_timeout();
 			b->first_wait = to_wait(next);
 			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
 			/* As there is a delay between reading the current
@@ -245,6 +269,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 
 	if (first) {
 		GEM_BUG_ON(rb_first(&b->waiters) != &wait->node);
+		b->timeout = wait_timeout();
 		b->first_wait = wait;
 		smp_store_mb(b->irq_seqno_bh, wait->tsk);
 		/* After assigning ourselves as the new bottom-half, we must
@@ -277,11 +302,6 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
 	return first;
 }
 
-void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
-{
-	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
-}
-
 static inline bool chain_wakeup(struct rb_node *rb, int priority)
 {
 	return rb && to_wait(rb)->tsk->prio <= priority;
@@ -359,6 +379,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
 			 * the interrupt, or if we have to handle an
 			 * exception rather than a seqno completion.
 			 */
+			b->timeout = wait_timeout();
 			b->first_wait = to_wait(next);
 			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
 			if (b->first_wait->seqno != wait->seqno)
@@ -536,6 +557,9 @@ int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 	setup_timer(&b->fake_irq,
 		    intel_breadcrumbs_fake_irq,
 		    (unsigned long)engine);
+	setup_timer(&b->hangcheck,
+		    intel_breadcrumbs_hangcheck,
+		    (unsigned long)engine);
 
 	/* Spawn a thread to provide a common bottom-half for all signals.
 	 * As this is an asynchronous interface we cannot steal the current
@@ -560,6 +584,7 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 	if (!IS_ERR_OR_NULL(b->signaler))
 		kthread_stop(b->signaler);
 
+	del_timer_sync(&b->hangcheck);
 	del_timer_sync(&b->fake_irq);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index e9b301ae2d0c..0dd3d1de18aa 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -164,6 +164,7 @@ cleanup:
 void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
 {
 	memset(&engine->hangcheck, 0, sizeof(engine->hangcheck));
+	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 }
 
 static void intel_engine_init_requests(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 43e545e44352..4aed4586b0b6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -75,7 +75,6 @@ enum intel_engine_hangcheck_action {
 
 struct intel_engine_hangcheck {
 	u64 acthd;
-	unsigned long user_interrupts;
 	u32 seqno;
 	int score;
 	enum intel_engine_hangcheck_action action;
@@ -173,7 +172,6 @@ struct intel_engine_cs {
 	 */
 	struct intel_breadcrumbs {
 		struct task_struct *irq_seqno_bh; /* bh for user interrupts */
-		unsigned long irq_wakeups;
 		bool irq_posted;
 
 		spinlock_t lock; /* protects the lists of requests */
@@ -183,6 +181,9 @@ struct intel_engine_cs {
 		struct task_struct *signaler; /* used for fence signalling */
 		struct drm_i915_gem_request *first_signal;
 		struct timer_list fake_irq; /* used after a missed interrupt */
+		struct timer_list hangcheck; /* detect missed interrupts */
+
+		unsigned long timeout;
 
 		bool irq_enabled : 1;
 		bool rpm_wakelock : 1;
@@ -560,7 +561,6 @@ static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
 	return wakeup;
 }
 
-void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 unsigned int intel_kick_waiters(struct drm_i915_private *i915);
 unsigned int intel_kick_signalers(struct drm_i915_private *i915);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev6)
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (36 preceding siblings ...)
  2016-08-09 14:10 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev5) Patchwork
@ 2016-08-09 14:20 ` Patchwork
  2016-08-10  6:43 ` Patchwork
  38 siblings, 0 replies; 125+ messages in thread
From: Patchwork @ 2016-08-09 14:20 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev6)
URL   : https://patchwork.freedesktop.org/series/10770/
State : failure

== Summary ==

Applying: drm/i915: Add smp_rmb() to busy ioctl's RCU dance
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_gem.c
M	drivers/gpu/drm/i915/i915_gem_request.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/i915_gem.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/i915_gem.c
error: Failed to merge in the changes.
Patch failed at 0001 drm/i915: Add smp_rmb() to busy ioctl's RCU dance
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
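The recovery commands quoted above are git's standard `git am` conflict flow. As a hedged illustration (the repository layout, file names, and commit messages below are invented, not taken from the series), a clean `git am` round-trip in a throwaway repository looks like this:

```shell
#!/bin/sh
# Sketch of the git am flow referenced above, in a throwaway repo.
# All paths and names here are illustrative.
set -e
tmp=$(mktemp -d)
cd "$tmp"

git init -q base && cd base
git config user.email ci@example.com
git config user.name  CI
echo v1 > driver.c && git add driver.c && git commit -qm "initial"

echo v2 > driver.c && git commit -qam "update driver"
git format-patch -q -1 -o ../patches   # emit 0001-update-driver.patch

git checkout -q HEAD~1 -b apply-here   # a branch without the change
git am ../patches/0001-update-driver.patch
# On a conflict, git am pauses here; after resolving, run one of:
#   git am --continue | --skip | --abort
grep -q v2 driver.c && echo applied
```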

* Re: [PATCH v3] drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs
  2016-08-09 14:10   ` [PATCH v3] " Chris Wilson
@ 2016-08-09 15:24     ` Mika Kuoppala
  0 siblings, 0 replies; 125+ messages in thread
From: Mika Kuoppala @ 2016-08-09 15:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> In commit 2529d57050af ("drm/i915: Drop racy markup of missed-irqs from
> idle-worker") the racy detection of missed interrupts was removed when
> we went idle. This however opened up the issue that the stuck waiters
> were not being reported, causing a test case failure. If we move the
> stuck waiter detection out of hangcheck and into the breadcrumb
> mechanism (i.e. the waiter) itself, we can avoid this issue entirely.
> This leaves hangcheck looking for a stuck GPU (inspecting for request
> advancement and HEAD motion), and breadcrumbs looking for a stuck
> waiter - hopefully making both easier to understand by their segregation.
>
> v2: Reduce the error message as we now run independently of hangcheck,
> and the hanging batch used by igt also counts as a stuck waiter causing
> extra warnings in dmesg.
> v3: Move the breadcrumb's hangcheck kickstart to the first missed wait.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104
> Fixes: 2529d57050af ("drm/i915: Drop racy markup of missed-irqs...")
> Testcase: igt/drv_missed_irq
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c      | 11 ++---
>  drivers/gpu/drm/i915/i915_gem.c          | 10 -----
>  drivers/gpu/drm/i915/i915_irq.c          | 26 +-----------
>  drivers/gpu/drm/i915/intel_breadcrumbs.c | 69 ++++++++++++++++++++++----------
>  drivers/gpu/drm/i915/intel_engine_cs.c   |  1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h  |  6 +--
>  6 files changed, 56 insertions(+), 67 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index f62285c1ed7f..96bfc745a820 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -787,8 +787,6 @@ static void i915_ring_seqno_info(struct seq_file *m,
>  
>  	seq_printf(m, "Current sequence (%s): %x\n",
>  		   engine->name, intel_engine_get_seqno(engine));
> -	seq_printf(m, "Current user interrupts (%s): %lx\n",
> -		   engine->name, READ_ONCE(engine->breadcrumbs.irq_wakeups));
>  
>  	spin_lock(&b->lock);
>  	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> @@ -1434,11 +1432,10 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>  			   engine->hangcheck.seqno,
>  			   seqno[id],
>  			   engine->last_submitted_seqno);
> -		seq_printf(m, "\twaiters? %d\n",
> -			   intel_engine_has_waiter(engine));
> -		seq_printf(m, "\tuser interrupts = %lx [current %lx]\n",
> -			   engine->hangcheck.user_interrupts,
> -			   READ_ONCE(engine->breadcrumbs.irq_wakeups));
> +		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
> +			   yesno(intel_engine_has_waiter(engine)),
> +			   yesno(test_bit(engine->id,
> +					  &dev_priv->gpu_error.missed_irq_rings)));
>  		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
>  			   (long long)engine->hangcheck.acthd,
>  			   (long long)acthd[id]);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d71fa9a93afa..2bb9ef91a243 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2524,7 +2524,6 @@ i915_gem_idle_work_handler(struct work_struct *work)
>  		container_of(work, typeof(*dev_priv), gt.idle_work.work);
>  	struct drm_device *dev = &dev_priv->drm;
>  	struct intel_engine_cs *engine;
> -	unsigned int stuck_engines;
>  	bool rearm_hangcheck;
>  
>  	if (!READ_ONCE(dev_priv->gt.awake))
> @@ -2554,15 +2553,6 @@ i915_gem_idle_work_handler(struct work_struct *work)
>  	dev_priv->gt.awake = false;
>  	rearm_hangcheck = false;
>  
> -	/* As we have disabled hangcheck, we need to unstick any waiters still
> -	 * hanging around. However, as we may be racing against the interrupt
> -	 * handler or the waiters themselves, we skip enabling the fake-irq.
> -	 */
> -	stuck_engines = intel_kick_waiters(dev_priv);
> -	if (unlikely(stuck_engines))
> -		DRM_DEBUG_DRIVER("kicked stuck waiters (%x)...missed irq?\n",
> -				 stuck_engines);
> -
>  	if (INTEL_GEN(dev_priv) >= 6)
>  		gen6_rps_idle(dev_priv);
>  	intel_runtime_pm_put(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 591f452ece68..ebb83d5a448b 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -972,10 +972,8 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
>  static void notify_ring(struct intel_engine_cs *engine)
>  {
>  	smp_store_mb(engine->breadcrumbs.irq_posted, true);
> -	if (intel_engine_wakeup(engine)) {
> +	if (intel_engine_wakeup(engine))
>  		trace_i915_gem_request_notify(engine);
> -		engine->breadcrumbs.irq_wakeups++;
> -	}
>  }
>  
>  static void vlv_c0_read(struct drm_i915_private *dev_priv,
> @@ -3044,22 +3042,6 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
>  	return HANGCHECK_HUNG;
>  }
>  
> -static unsigned long kick_waiters(struct intel_engine_cs *engine)
> -{
> -	struct drm_i915_private *i915 = engine->i915;
> -	unsigned long irq_count = READ_ONCE(engine->breadcrumbs.irq_wakeups);
> -
> -	if (engine->hangcheck.user_interrupts == irq_count &&
> -	    !test_and_set_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
> -		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
> -			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
> -				  engine->name);
> -
> -		intel_engine_enable_fake_irq(engine);
> -	}
> -
> -	return irq_count;
> -}
>  /*
>   * This is called when the chip hasn't reported back with completed
>   * batchbuffers in a long time. We keep track per ring seqno progress and
> @@ -3097,7 +3079,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  		bool busy = intel_engine_has_waiter(engine);
>  		u64 acthd;
>  		u32 seqno;
> -		unsigned user_interrupts;
>  
>  		semaphore_clear_deadlocks(dev_priv);
>  
> @@ -3114,15 +3095,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  		acthd = intel_engine_get_active_head(engine);
>  		seqno = intel_engine_get_seqno(engine);
>  
> -		/* Reset stuck interrupts between batch advances */
> -		user_interrupts = 0;
> -
>  		if (engine->hangcheck.seqno == seqno) {
>  			if (!intel_engine_is_active(engine)) {
>  				engine->hangcheck.action = HANGCHECK_IDLE;
>  				if (busy) {
>  					/* Safeguard against driver failure */
> -					user_interrupts = kick_waiters(engine);
>  					engine->hangcheck.score += BUSY;
>  				}
>  			} else {
> @@ -3185,7 +3162,6 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  
>  		engine->hangcheck.seqno = seqno;
>  		engine->hangcheck.acthd = acthd;
> -		engine->hangcheck.user_interrupts = user_interrupts;
>  		busy_count += busy;
>  	}
>  
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 90867446f1a5..7be9af1d5424 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -26,6 +26,40 @@
>  
>  #include "i915_drv.h"
>  
> +static void intel_breadcrumbs_hangcheck(unsigned long data)
> +{
> +	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +
> +	if (!b->irq_enabled)
> +		return;
> +
> +	if (time_before(jiffies, b->timeout)) {
> +		mod_timer(&b->hangcheck, b->timeout);
> +		return;
> +	}
> +
> +	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
> +	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> +	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
> +
> +	/* Ensure that even if the GPU hangs, we get woken up.
> +	 *
> +	 * However, note that if no one is waiting, we never notice
> +	 * a gpu hang. Eventually, we will have to wait for a resource
> +	 * held by the GPU and so trigger a hangcheck. In the most
> +	 * pathological case, this will be upon memory starvation! To
> +	 * prevent this, we also queue the hangcheck from the retire
> +	 * worker.
> +	 */
> +	i915_queue_hangcheck(engine->i915);
> +}
> +
> +static unsigned long wait_timeout(void)
> +{
> +	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
> +}
> +
>  static void intel_breadcrumbs_fake_irq(unsigned long data)
>  {
>  	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> @@ -51,13 +85,6 @@ static void irq_enable(struct intel_engine_cs *engine)
>  	 */
>  	engine->breadcrumbs.irq_posted = true;
>  
> -	/* Make sure the current hangcheck doesn't falsely accuse a just
> -	 * started irq handler from missing an interrupt (because the
> -	 * interrupt count still matches the stale value from when
> -	 * the irq handler was disabled, many hangchecks ago).
> -	 */
> -	engine->breadcrumbs.irq_wakeups++;
> -
>  	spin_lock_irq(&engine->i915->irq_lock);
>  	engine->irq_enable(engine);
>  	spin_unlock_irq(&engine->i915->irq_lock);
> @@ -98,17 +125,13 @@ static void __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
>  	}
>  
>  	if (!b->irq_enabled ||
> -	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings))
> +	    test_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
>  		mod_timer(&b->fake_irq, jiffies + 1);
> -
> -	/* Ensure that even if the GPU hangs, we get woken up.
> -	 *
> -	 * However, note that if no one is waiting, we never notice
> -	 * a gpu hang. Eventually, we will have to wait for a resource
> -	 * held by the GPU and so trigger a hangcheck. In the most
> -	 * pathological case, this will be upon memory starvation!
> -	 */
> -	i915_queue_hangcheck(i915);
> +	} else {
> +		/* Ensure we never sleep indefinitely */
> +		GEM_BUG_ON(!time_after(b->timeout, jiffies));
> +		mod_timer(&b->hangcheck, b->timeout);
> +	}
>  }
>  
>  static void __intel_breadcrumbs_disable_irq(struct intel_breadcrumbs *b)
> @@ -219,6 +242,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
>  		GEM_BUG_ON(!next && !first);
>  		if (next && next != &wait->node) {
>  			GEM_BUG_ON(first);
> +			b->timeout = wait_timeout();
>  			b->first_wait = to_wait(next);
>  			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
>  			/* As there is a delay between reading the current
> @@ -245,6 +269,7 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
>  
>  	if (first) {
>  		GEM_BUG_ON(rb_first(&b->waiters) != &wait->node);
> +		b->timeout = wait_timeout();
>  		b->first_wait = wait;
>  		smp_store_mb(b->irq_seqno_bh, wait->tsk);
>  		/* After assigning ourselves as the new bottom-half, we must
> @@ -277,11 +302,6 @@ bool intel_engine_add_wait(struct intel_engine_cs *engine,
>  	return first;
>  }
>  
> -void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
> -{
> -	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
> -}
> -
>  static inline bool chain_wakeup(struct rb_node *rb, int priority)
>  {
>  	return rb && to_wait(rb)->tsk->prio <= priority;
> @@ -359,6 +379,7 @@ void intel_engine_remove_wait(struct intel_engine_cs *engine,
>  			 * the interrupt, or if we have to handle an
>  			 * exception rather than a seqno completion.
>  			 */
> +			b->timeout = wait_timeout();
>  			b->first_wait = to_wait(next);
>  			smp_store_mb(b->irq_seqno_bh, b->first_wait->tsk);
>  			if (b->first_wait->seqno != wait->seqno)
> @@ -536,6 +557,9 @@ int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
>  	setup_timer(&b->fake_irq,
>  		    intel_breadcrumbs_fake_irq,
>  		    (unsigned long)engine);
> +	setup_timer(&b->hangcheck,
> +		    intel_breadcrumbs_hangcheck,
> +		    (unsigned long)engine);
>  
>  	/* Spawn a thread to provide a common bottom-half for all signals.
>  	 * As this is an asynchronous interface we cannot steal the current
> @@ -560,6 +584,7 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
>  	if (!IS_ERR_OR_NULL(b->signaler))
>  		kthread_stop(b->signaler);
>  
> +	del_timer_sync(&b->hangcheck);
>  	del_timer_sync(&b->fake_irq);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index e9b301ae2d0c..0dd3d1de18aa 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -164,6 +164,7 @@ cleanup:
>  void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
>  {
>  	memset(&engine->hangcheck, 0, sizeof(engine->hangcheck));
> +	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>  }
>  
>  static void intel_engine_init_requests(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 43e545e44352..4aed4586b0b6 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -75,7 +75,6 @@ enum intel_engine_hangcheck_action {
>  
>  struct intel_engine_hangcheck {
>  	u64 acthd;
> -	unsigned long user_interrupts;
>  	u32 seqno;
>  	int score;
>  	enum intel_engine_hangcheck_action action;
> @@ -173,7 +172,6 @@ struct intel_engine_cs {
>  	 */
>  	struct intel_breadcrumbs {
>  		struct task_struct *irq_seqno_bh; /* bh for user interrupts */
> -		unsigned long irq_wakeups;
>  		bool irq_posted;
>  
>  		spinlock_t lock; /* protects the lists of requests */
> @@ -183,6 +181,9 @@ struct intel_engine_cs {
>  		struct task_struct *signaler; /* used for fence signalling */
>  		struct drm_i915_gem_request *first_signal;
>  		struct timer_list fake_irq; /* used after a missed interrupt */
> +		struct timer_list hangcheck; /* detect missed interrupts */
> +
> +		unsigned long timeout;
>  
>  		bool irq_enabled : 1;
>  		bool rpm_wakelock : 1;
> @@ -560,7 +561,6 @@ static inline bool intel_engine_wakeup(struct intel_engine_cs *engine)
>  	return wakeup;
>  }
>  
> -void intel_engine_enable_fake_irq(struct intel_engine_cs *engine);
>  void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
>  unsigned int intel_kick_waiters(struct drm_i915_private *i915);
>  unsigned int intel_kick_signalers(struct drm_i915_private *i915);
> -- 
> 2.8.1

* Re: [PATCH 08/33] drm/i915: Move setting of request->batch into its single callsite
  2016-08-07 14:45 ` [PATCH 08/33] drm/i915: Move setting of request->batch into its single callsite Chris Wilson
@ 2016-08-09 15:53   ` Mika Kuoppala
  2016-08-09 16:04     ` Chris Wilson
  2016-08-10  7:19   ` Joonas Lahtinen
  1 sibling, 1 reply; 125+ messages in thread
From: Mika Kuoppala @ 2016-08-09 15:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> request->batch_obj is only set by execbuffer for the convenience of
> debugging hangs. By moving that operation to the callsite, we can
> simplify all other callers and future patches. We also move the
> complications of reference handling of the request->batch_obj next to
> where the active tracking is set up for the request.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +++++++++-
>  drivers/gpu/drm/i915/i915_gem_request.c    | 12 +-----------
>  drivers/gpu/drm/i915/i915_gem_request.h    |  8 +++-----
>  3 files changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index c494b79ded20..c8d13fea4b25 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1702,6 +1702,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  		goto err_batch_unpin;
>  	}
>  
> +	/* Whilst this request exists, batch_obj will be on the
> +	 * active_list, and so will hold the active reference. Only when this
> +	 * request is retired will the the batch_obj be moved onto the
> +	 * inactive_list and lose its active reference. Hence we do not need
> +	 * to explicitly hold another reference here.
> +	 */

The comment here might or might not need revisiting. I can't say yet.

But when I tried to learn how the current code works, I found
that there are comments referencing __i915_gem_active_get_request_rcu()
which does not exist.

-Mika

> +	params->request->batch_obj = params->batch->obj;
> +
>  	ret = i915_gem_request_add_to_client(params->request, file);
>  	if (ret)
>  		goto err_request;
> @@ -1720,7 +1728,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  
>  	ret = execbuf_submit(params, args, &eb->vmas);
>  err_request:
> -	__i915_add_request(params->request, params->batch->obj, ret == 0);
> +	__i915_add_request(params->request, ret == 0);
>  
>  err_batch_unpin:
>  	/*
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b7ffde002a62..c6f523e2879c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -461,9 +461,7 @@ static void i915_gem_mark_busy(const struct intel_engine_cs *engine)
>   * request is not being tracked for completion but the work itself is
>   * going to happen on the hardware. This would be a Bad Thing(tm).
>   */
> -void __i915_add_request(struct drm_i915_gem_request *request,
> -			struct drm_i915_gem_object *obj,
> -			bool flush_caches)
> +void __i915_add_request(struct drm_i915_gem_request *request, bool flush_caches)
>  {
>  	struct intel_engine_cs *engine;
>  	struct intel_ring *ring;
> @@ -504,14 +502,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>  
>  	request->head = request_start;
>  
> -	/* Whilst this request exists, batch_obj will be on the
> -	 * active_list, and so will hold the active reference. Only when this
> -	 * request is retired will the the batch_obj be moved onto the
> -	 * inactive_list and lose its active reference. Hence we do not need
> -	 * to explicitly hold another reference here.
> -	 */
> -	request->batch_obj = obj;
> -
>  	/* Seal the request and mark it as pending execution. Note that
>  	 * we may inspect this state, without holding any locks, during
>  	 * hangcheck. Hence we apply the barrier to ensure that we do not
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 721eb8cbce9b..d5176f9cc22f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -225,13 +225,11 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>  	*pdst = src;
>  }
>  
> -void __i915_add_request(struct drm_i915_gem_request *req,
> -			struct drm_i915_gem_object *batch_obj,
> -			bool flush_caches);
> +void __i915_add_request(struct drm_i915_gem_request *req, bool flush_caches);
>  #define i915_add_request(req) \
> -	__i915_add_request(req, NULL, true)
> +	__i915_add_request(req, true)
>  #define i915_add_request_no_flush(req) \
> -	__i915_add_request(req, NULL, false)
> +	__i915_add_request(req, false)
>  
>  struct intel_rps_client;
>  #define NO_WAITBOOST ERR_PTR(-1)
> -- 
> 2.8.1
>

* Re: [PATCH 08/33] drm/i915: Move setting of request->batch into its single callsite
  2016-08-09 15:53   ` Mika Kuoppala
@ 2016-08-09 16:04     ` Chris Wilson
  0 siblings, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-09 16:04 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

On Tue, Aug 09, 2016 at 06:53:16PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > request->batch_obj is only set by execbuffer for the convenience of
> > debugging hangs. By moving that operation to the callsite, we can
> > simplify all other callers and future patches. We also move the
> > complications of reference handling of the request->batch_obj next to
> > where the active tracking is set up for the request.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +++++++++-
> >  drivers/gpu/drm/i915/i915_gem_request.c    | 12 +-----------
> >  drivers/gpu/drm/i915/i915_gem_request.h    |  8 +++-----
> >  3 files changed, 13 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index c494b79ded20..c8d13fea4b25 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -1702,6 +1702,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >  		goto err_batch_unpin;
> >  	}
> >  
> > +	/* Whilst this request exists, batch_obj will be on the
> > +	 * active_list, and so will hold the active reference. Only when this
> > +	 * request is retired will the the batch_obj be moved onto the
> > +	 * inactive_list and lose its active reference. Hence we do not need
> > +	 * to explicitly hold another reference here.
> > +	 */
> 
> The comment here might or might not need revisiting. I can't say yet.

That's still true. Active objects have a reference that prevents them
from being freed whilst in use by the GPU - currently managed by
i915_gem_object_retire__read() iirc.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev6)
  2016-08-07 14:45 First class VMA, take 2 Chris Wilson
                   ` (37 preceding siblings ...)
  2016-08-09 14:20 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev6) Patchwork
@ 2016-08-10  6:43 ` Patchwork
  38 siblings, 0 replies; 125+ messages in thread
From: Patchwork @ 2016-08-10  6:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev6)
URL   : https://patchwork.freedesktop.org/series/10770/
State : failure

== Summary ==

Applying: drm/i915: Add smp_rmb() to busy ioctl's RCU dance
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_gem.c
M	drivers/gpu/drm/i915/i915_gem_request.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/i915_gem.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/i915_gem.c
error: Failed to merge in the changes.
Patch failed at 0001 drm/i915: Add smp_rmb() to busy ioctl's RCU dance
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


* Re: [PATCH 05/33] drm/i915: Reduce amount of duplicate buffer information captured on error
  2016-08-07 14:45 ` [PATCH 05/33] drm/i915: Reduce amount of duplicate buffer information captured on error Chris Wilson
@ 2016-08-10  7:04   ` Joonas Lahtinen
  2016-08-10  7:15     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10  7:04 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> @@ -414,18 +403,25 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  			error_print_engine(m, &error->engine[i]);
>  	}
>  
> -	for (i = 0; i < error->vm_count; i++) {
> -		err_printf(m, "vm[%d]\n", i);
> +	for (i = 0; i < I915_NUM_ENGINES; i++) {
> +		if (!error->active_vm[i])
> +			break;
>  
> -		print_error_buffers(m, "Active",
> +		err_printf(m, "Active vm[%d]\n", i);
> +		for (j = 0; j < I915_NUM_ENGINES; j++) {
> +			if (error->engine[j].vm == error->active_vm[i])

break here and then print outside loop?

> +				err_printf(m, "    %s\n",
> +					   dev_priv->engine[j].name);
> +		}
> +

<SNIP>

>  static void i915_gem_capture_vm(struct drm_i915_private *dev_priv,
>  				struct drm_i915_error_state *error,
>  				struct i915_address_space *vm,
>  				const int ndx)
>  {
> -	struct drm_i915_error_buffer *active_bo = NULL, *pinned_bo = NULL;
> -	struct drm_i915_gem_object *obj;
> +	struct drm_i915_error_buffer *active_bo;
>  	struct i915_vma *vma;
>  	int i;

'count';

>  
>  	i = 0;
>  	list_for_each_entry(vma, &vm->active_list, vm_link)
>  		i++;
> -	error->active_bo_count[ndx] = i;
> -
> -	list_for_each_entry(obj, &dev_priv->mm.bound_list, global_list) {
> -		list_for_each_entry(vma, &obj->vma_list, obj_link)
> -			if (vma->vm == vm && i915_vma_is_pinned(vma))
> -				i++;
> -	}
> -	error->pinned_bo_count[ndx] = i - error->active_bo_count[ndx];
>  
> -	if (i) {
> +	active_bo = NULL;

Could be initialized at declaration for better readability.

> +	if (i)
>  		active_bo = kcalloc(i, sizeof(*active_bo), GFP_ATOMIC);
> -		if (active_bo)
> -			pinned_bo = active_bo + error->active_bo_count[ndx];
> -	}
> -
>  	if (active_bo)
> -		error->active_bo_count[ndx] =
> -			capture_active_bo(active_bo,
> -					  error->active_bo_count[ndx],
> -					  &vm->active_list);
> -
> -	if (pinned_bo)
> -		error->pinned_bo_count[ndx] =
> -			capture_pinned_bo(pinned_bo,
> -					  error->pinned_bo_count[ndx],
> -					  &dev_priv->mm.bound_list, vm);
> +		i = capture_error_bo(active_bo, i, &vm->active_list, false);
> +	else
> +		i = 0;
> +
>  	error->active_bo[ndx] = active_bo;
> -	error->pinned_bo[ndx] = pinned_bo;
> +	error->active_bo_count[ndx] = i;

While at it, make the i variable 'count' and initialize it at the
declaration too. And maybe make the ndx variable something reasonable
like 'engine' or 'eid'.

> +	error->active_vm[ndx] = vm;
>  }
> 

<SNIP>

>  
> +	for (i = 0; i < I915_NUM_ENGINES; i++) {
> +		struct drm_i915_error_engine *ee = &error->engine[i];
> +
> +		if (!ee->vm)
> +			continue;
> +
> +		for (j = 0; j < i; j++)
> +			if (error->engine[j].vm == ee->vm)
> +				break;
> +		if (j != i)
> +			continue;

Maybe add a comment that we want to avoid capturing the same vm twice.

> +
> +		i915_gem_capture_vm(dev_priv, error, ee->vm, cnt++);
>  	}
>  }
>  
> +static void i915_capture_pinned_buffers(struct drm_i915_private *dev_priv,
> +					struct drm_i915_error_state *error)
> +{
> +	struct i915_address_space *vm = &dev_priv->ggtt.base;
> +	struct drm_i915_error_buffer *bo;
> +	struct i915_vma *vma;
> +	int i, j;

count_active, count_inactive? i and j are iteration variables.

> +
> +	i = 0;
> +	list_for_each_entry(vma, &vm->active_list, vm_link)
> +		i++;
> +
> +	j = 0;
> +	list_for_each_entry(vma, &vm->inactive_list, vm_link)
> +		j++;
> +
> +	bo = NULL;

Initialize at declaration as this is one-shot.

>  /* Capture all registers which don't fit into another category. */
>  static void i915_capture_reg_state(struct drm_i915_private *dev_priv,
>  				   struct drm_i915_error_state *error)
> @@ -1436,10 +1402,12 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
>  
>  	i915_capture_gen_state(dev_priv, error);
>  	i915_capture_reg_state(dev_priv, error);
> -	i915_gem_capture_buffers(dev_priv, error);
>  	i915_gem_record_fences(dev_priv, error);
>  	i915_gem_record_rings(dev_priv, error);
>  
> +	i915_capture_active_buffers(dev_priv, error);
> +	i915_capture_pinned_buffers(dev_priv, error);
> +

Any specific reason for reordering here?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 05/33] drm/i915: Reduce amount of duplicate buffer information captured on error
  2016-08-10  7:04   ` Joonas Lahtinen
@ 2016-08-10  7:15     ` Chris Wilson
  2016-08-10  8:07       ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-10  7:15 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Aug 10, 2016 at 10:04:16AM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > -	if (i) {
> > +	active_bo = NULL;
> 
> Could be initialized at declaration for better readability.

No. I disagree strongly. I dislike having to go back to the beginning of
the block to check to see if was initialised before the if-chain that
otherwise sets the value.

> >  /* Capture all registers which don't fit into another category. */
> >  static void i915_capture_reg_state(struct drm_i915_private *dev_priv,
> >  				   struct drm_i915_error_state *error)
> > @@ -1436,10 +1402,12 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
> >  
> >  	i915_capture_gen_state(dev_priv, error);
> >  	i915_capture_reg_state(dev_priv, error);
> > -	i915_gem_capture_buffers(dev_priv, error);
> >  	i915_gem_record_fences(dev_priv, error);
> >  	i915_gem_record_rings(dev_priv, error);
> >  
> > +	i915_capture_active_buffers(dev_priv, error);
> > +	i915_capture_pinned_buffers(dev_priv, error);
> > +
> 
> Any specific reason for reordering here?

Different varieties of state capture, trying to use whitespace for
grouping.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 08/33] drm/i915: Move setting of request->batch into its single callsite
  2016-08-07 14:45 ` [PATCH 08/33] drm/i915: Move setting of request->batch into its single callsite Chris Wilson
  2016-08-09 15:53   ` Mika Kuoppala
@ 2016-08-10  7:19   ` Joonas Lahtinen
  1 sibling, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10  7:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> request->batch_obj is only set by execbuffer for the convenience of
> debugging hangs. By moving that operation to the callsite, we can
> simplify all other callers and future patches. We also move the
> complications of reference handling of the request->batch_obj next to
> where the active tracking is set up for the request.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 12/33] drm/i915: Reduce i915_gem_objects to only show object information
  2016-08-07 14:45 ` [PATCH 12/33] drm/i915: Reduce i915_gem_objects to only show object information Chris Wilson
@ 2016-08-10  7:29   ` Joonas Lahtinen
  2016-08-10  7:38     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10  7:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
>  		if (obj->pin_display) {
> -			mappable_size += i915_gem_obj_ggtt_size(obj);
> -			++mappable_count;
> +			pin_size += obj->base.size;
> +			++pin_count;

variable names to form pin_display_*

> +	seq_printf(m, "%u mapped objects, %llu bytes\n",
> +		   mapped_count, mapped_size);
> +	seq_printf(m, "%u pinned objects, %llu bytes\n",

"display pinned objects"

With those;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 12/33] drm/i915: Reduce i915_gem_objects to only show object information
  2016-08-10  7:29   ` Joonas Lahtinen
@ 2016-08-10  7:38     ` Chris Wilson
  2016-08-10  8:10       ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-10  7:38 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Aug 10, 2016 at 10:29:59AM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> >  		if (obj->pin_display) {
> > -			mappable_size += i915_gem_obj_ggtt_size(obj);
> > -			++mappable_count;
> > +			pin_size += obj->base.size;
> > +			++pin_count;
> 
> variable names to form pin_display_*

No.

> > +	seq_printf(m, "%u mapped objects, %llu bytes\n",
> > +		   mapped_count, mapped_size);
> > +	seq_printf(m, "%u pinned objects, %llu bytes\n",
> 
> "display pinned objects"

Disagree as well.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 18/33] drm/i915: Use VMA as the primary object for context state
  2016-08-07 14:45 ` [PATCH 18/33] drm/i915: Use VMA as the primary object for context state Chris Wilson
@ 2016-08-10  8:03   ` Joonas Lahtinen
  2016-08-10  8:25     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10  8:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> -	if (!i915_gem_obj_ggtt_bound(ctx_obj))
> -		seq_puts(m, "\tNot bound in GGTT\n");
> -	else
> -		ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
> +	if (vma->flags & I915_VMA_GLOBAL_BIND)
> +		seq_printf(m, "\tBound in GGTT at 0x%x\n",

0x%04x?

> +			   lower_32_bits(vma->node.start));
>  
> -	if (i915_gem_object_get_pages(ctx_obj)) {
> -		seq_puts(m, "\tFailed to get pages for context object\n");
> +	if (i915_gem_object_get_pages(vma->obj)) {
> +		seq_puts(m, "\tFailed to get pages for context object\n\n");
>  		return;
>  	}
>  
> -	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
> -	if (!WARN_ON(page == NULL)) {
> -		reg_state = kmap_atomic(page);
> +	page = i915_gem_object_get_page(vma->obj, LRC_STATE_PN);
> +	if (page) {

Dropped the WARN_ON? No mention in the commit message.

> @@ -620,9 +631,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
>  
>  	intel_ring_emit(ring, MI_NOOP);
>  	intel_ring_emit(ring, MI_SET_CONTEXT);
> -	intel_ring_emit(ring,
> -			i915_gem_obj_ggtt_offset(req->ctx->engine[RCS].state) |
> -			flags);
> +	intel_ring_emit(ring, req->ctx->engine[RCS].state->node.start | flags);

Do we somewhere make sure flags do not collide with address? Not
related to this patch, though.

> @@ -778,16 +788,12 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
>  	from = engine->last_context;
>  
>  	/*
> -	 * Clear this page out of any CPU caches for coherent swap-in/out. Note
> -	 * that thanks to write = false in this call and us not setting any gpu
> -	 * write domains when putting a context object onto the active list
> -	 * (when switching away from it), this won't block.
> -	 *
> -	 * XXX: We need a real interface to do this instead of trickery.

> What has changed to make this comment obsolete, or should it have been
> removed earlier?
 
>  
>  	return 0;
>  
> -unpin_out:
> -	i915_gem_object_ggtt_unpin(to->engine[RCS].state);
> +unpin_vma:

sole error path; "err"

> +	i915_vma_unpin(to->engine[RCS].state);
>  	return ret;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index c621fa23cd28..21a4d0220c17 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1077,9 +1077,10 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
>  					i915_error_ggtt_object_create(dev_priv,
>  								      engine->scratch.obj);
>  
> -			ee->ctx =
> -				i915_error_ggtt_object_create(dev_priv,
> -							      request->ctx->engine[i].state);
> +			if (request->ctx->engine[i].state) {
> +				ee->ctx = i915_error_ggtt_object_create(dev_priv,
> +									request->ctx->engine[i].state->obj);
> +			}

Why conditional now?

>  
>  	i915_gem_context_get(ctx);
>  	return 0;
>  
>  unpin_map:
> -	i915_gem_object_unpin_map(ce->state);
> -unpin_ctx_obj:
> -	i915_gem_object_ggtt_unpin(ce->state);
> +	i915_gem_object_unpin_map(ce->state->obj);
> +unpin_vma:
> +	__i915_vma_unpin(ce->state);

err_vma while at it?

> @@ -2161,7 +2162,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>  	}
>  
>  	ce->ring = ring;
> -	ce->state = ctx_obj;
> +	ce->state = vma;

Maybe the member name could be just ce->vma too?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 05/33] drm/i915: Reduce amount of duplicate buffer information captured on error
  2016-08-10  7:15     ` Chris Wilson
@ 2016-08-10  8:07       ` Joonas Lahtinen
  2016-08-10  8:36         ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10  8:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2016-08-10 at 08:15 +0100, Chris Wilson wrote:
> On Wed, Aug 10, 2016 at 10:04:16AM +0300, Joonas Lahtinen wrote:
> > 
> > On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > > 
> > > -	if (i) {
> > > +	active_bo = NULL;
> > Could be initialized at declaration for better readability.
> No. I disagree strongly. I dislike having to go back to the beginning of
> the block to check to see if was initialised before the if-chain that
> otherwise sets the value.

GCC has caught such uninitialized-variable scenarios for quite a
while. Just increases noise.

> 
> > 
> > > 
> > >  /* Capture all registers which don't fit into another category. */
> > >  static void i915_capture_reg_state(struct drm_i915_private *dev_priv,
> > >  				   struct drm_i915_error_state *error)
> > > @@ -1436,10 +1402,12 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
> > >  
> > >  	i915_capture_gen_state(dev_priv, error);
> > >  	i915_capture_reg_state(dev_priv, error);
> > > -	i915_gem_capture_buffers(dev_priv, error);
> > >  	i915_gem_record_fences(dev_priv, error);
> > >  	i915_gem_record_rings(dev_priv, error);
> > >  
> > > +	i915_capture_active_buffers(dev_priv, error);
> > > +	i915_capture_pinned_buffers(dev_priv, error);
> > > +
> > Any specific reason for reordering here?
> Different varieties of state capture, trying to use whitespace for
> grouping.

Maybe keep it at current place and add whitespace before and after,
making it three blocks?

Regards, Joonas

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 12/33] drm/i915: Reduce i915_gem_objects to only show object information
  2016-08-10  7:38     ` Chris Wilson
@ 2016-08-10  8:10       ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10  8:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2016-08-10 at 08:38 +0100, Chris Wilson wrote:
> On Wed, Aug 10, 2016 at 10:29:59AM +0300, Joonas Lahtinen wrote:
> > 
> > On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > > 
> > >  		if (obj->pin_display) {
> > > -			mappable_size += i915_gem_obj_ggtt_size(obj);
> > > -			++mappable_count;
> > > +			pin_size += obj->base.size;
> > > +			++pin_count;
> > variable names to form pin_display_*
> No.
> 
> > 
> > > 
> > > +	seq_printf(m, "%u mapped objects, %llu bytes\n",
> > > +		   mapped_count, mapped_size);
> > > +	seq_printf(m, "%u pinned objects, %llu bytes\n",
> > "display pinned objects"
> Disagree as well.

Would be in line with the "i915_gem_pin_display" in sysfs. If not, then
keep the previous test condition?

Regards, Joonas

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 18/33] drm/i915: Use VMA as the primary object for context state
  2016-08-10  8:03   ` Joonas Lahtinen
@ 2016-08-10  8:25     ` Chris Wilson
  2016-08-10 10:54       ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-10  8:25 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Aug 10, 2016 at 11:03:39AM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > -	if (!i915_gem_obj_ggtt_bound(ctx_obj))
> > -		seq_puts(m, "\tNot bound in GGTT\n");
> > -	else
> > -		ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
> > +	if (vma->flags & I915_VMA_GLOBAL_BIND)
> > +		seq_printf(m, "\tBound in GGTT at 0x%x\n",
> 
> 0x%04x?

You mean 0x08.

> > -	if (i915_gem_object_get_pages(ctx_obj)) {
> > -		seq_puts(m, "\tFailed to get pages for context object\n");
> > +	if (i915_gem_object_get_pages(vma->obj)) {
> > +		seq_puts(m, "\tFailed to get pages for context object\n\n");
> >  		return;
> >  	}
> >  
> > -	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
> > -	if (!WARN_ON(page == NULL)) {
> > -		reg_state = kmap_atomic(page);
> > +	page = i915_gem_object_get_page(vma->obj, LRC_STATE_PN);
> > +	if (page) {
> 
> Dropped the WARN_ON? No mention in the commit message.

It's a redundant warn that should have been thrown out before. It
doesn't even deserve mentioning.

> > @@ -620,9 +631,7 @@ mi_set_context(struct drm_i915_gem_request *req, u32 hw_flags)
> >  
> >  	intel_ring_emit(ring, MI_NOOP);
> >  	intel_ring_emit(ring, MI_SET_CONTEXT);
> > -	intel_ring_emit(ring,
> > -			i915_gem_obj_ggtt_offset(req->ctx->engine[RCS].state) |
> > -			flags);
> > +	intel_ring_emit(ring, req->ctx->engine[RCS].state->node.start | flags);
> 
> Do we somewhere make sure flags do not collide with address? Not
> related to this patch, though.

Bspec vs alignment request, with WARNs that the allocation meets the
request.

> > @@ -778,16 +788,12 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
> >  	from = engine->last_context;
> >  
> >  	/*
> > -	 * Clear this page out of any CPU caches for coherent swap-in/out. Note
> > -	 * that thanks to write = false in this call and us not setting any gpu
> > -	 * write domains when putting a context object onto the active list
> > -	 * (when switching away from it), this won't block.
> > -	 *
> > -	 * XXX: We need a real interface to do this instead of trickery.
> 
> > What has changed to make this comment obsolete, or should it have been
> > removed earlier?

I had written a custom routine to do it, and then removed it to keep
this patch concise. In the next patch it is obsolete.

> >  	return 0;
> >  
> > -unpin_out:
> > -	i915_gem_object_ggtt_unpin(to->engine[RCS].state);
> > +unpin_vma:
> 
> sole error path; "err"
> 
> > +	i915_vma_unpin(to->engine[RCS].state);
> >  	return ret;
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index c621fa23cd28..21a4d0220c17 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1077,9 +1077,10 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
> >  					i915_error_ggtt_object_create(dev_priv,
> >  								      engine->scratch.obj);
> >  
> > -			ee->ctx =
> > -				i915_error_ggtt_object_create(dev_priv,
> > -							      request->ctx->engine[i].state);
> > +			if (request->ctx->engine[i].state) {
> > +				ee->ctx = i915_error_ggtt_object_create(dev_priv,
> > +									request->ctx->engine[i].state->obj);
> > +			}
> 
> Why conditional now?

Because the code would otherwise dereference a NULL pointer.

It gets removed again in the next patches when we pass vma to error
object capture.

> >  	i915_gem_context_get(ctx);
> >  	return 0;
> >  
> >  unpin_map:
> > -	i915_gem_object_unpin_map(ce->state);
> > -unpin_ctx_obj:
> > -	i915_gem_object_ggtt_unpin(ce->state);
> > +	i915_gem_object_unpin_map(ce->state->obj);
> > +unpin_vma:
> > +	__i915_vma_unpin(ce->state);
> 
> err_vma while at it?
> 
> > @@ -2161,7 +2162,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
> >  	}
> >  
> >  	ce->ring = ring;
> > -	ce->state = ctx_obj;
> > +	ce->state = vma;
> 
> Maybe the member name could be just ce->vma too?

No, it still contains the logical GPU state as opposed to the ring which
also has its own vma, and so potentially confusing.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 05/33] drm/i915: Reduce amount of duplicate buffer information captured on error
  2016-08-10  8:07       ` Joonas Lahtinen
@ 2016-08-10  8:36         ` Chris Wilson
  2016-08-10 10:51           ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-10  8:36 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Aug 10, 2016 at 11:07:46AM +0300, Joonas Lahtinen wrote:
> On ke, 2016-08-10 at 08:15 +0100, Chris Wilson wrote:
> > On Wed, Aug 10, 2016 at 10:04:16AM +0300, Joonas Lahtinen wrote:
> > > 
> > > On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > > > 
> > > > -	if (i) {
> > > > +	active_bo = NULL;
> > > Could be initialized at declaration for better readability.
> > No. I disagree strongly. I dislike having to go back to the beginning of
> > the block to check to see if was initialised before the if-chain that
> > otherwise sets the value.
> 
> GCC has captured such an uninitialized variable scenario for quite a
> while. Just increases noise.

Imo batching the use together as in this patch improves the signal as
the reader can see everything in a single block.

> > > >  /* Capture all registers which don't fit into another category. */
> > > >  static void i915_capture_reg_state(struct drm_i915_private *dev_priv,
> > > >  				   struct drm_i915_error_state *error)
> > > > @@ -1436,10 +1402,12 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
> > > >  
> > > >  	i915_capture_gen_state(dev_priv, error);
> > > >  	i915_capture_reg_state(dev_priv, error);
> > > > -	i915_gem_capture_buffers(dev_priv, error);
> > > >  	i915_gem_record_fences(dev_priv, error);
> > > >  	i915_gem_record_rings(dev_priv, error);
> > > >  
> > > > +	i915_capture_active_buffers(dev_priv, error);
> > > > +	i915_capture_pinned_buffers(dev_priv, error);
> > > > +
> > > Any specific reason for reordering here?
> > Different varieties of state capture, trying to use whitespace for
> > grouping.
> 
> Maybe keep it at current place and add whitespace before and after,
> making it three blocks?

gen_state, reg_state, record_fences are register state.

record_rings is a mix of register and associated buffers.

capture_*_buffers are the list of user buffers in the GTTs.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 19/33] drm/i915: Only clflush the context object when binding
  2016-08-07 14:45 ` [PATCH 19/33] drm/i915: Only clflush the context object when binding Chris Wilson
@ 2016-08-10  8:41   ` Joonas Lahtinen
  2016-08-10  9:02     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10  8:41 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> @@ -771,6 +771,13 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
>  	if (skip_rcs_switch(ppgtt, engine, to))
>  		return 0;
>  
> +	if (!(to->engine[RCS].state->flags & I915_VMA_GLOBAL_BIND)) {
> +		ret = i915_gem_object_set_to_gtt_domain(to->engine[RCS].state->obj,
> +							false);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	/* Trying to pin first makes error handling easier. */
>  	ret = i915_vma_pin(to->engine[RCS].state,
>  			   0, to->ggtt_alignment,

This could be lifted inside the if?

> @@ -790,11 +797,6 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
>  	/*
>  	 * Clear this page out of any CPU caches for coherent swap-in/out.
>  	 */

Move/update the comment too?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 19/33] drm/i915: Only clflush the context object when binding
  2016-08-10  8:41   ` Joonas Lahtinen
@ 2016-08-10  9:02     ` Chris Wilson
  2016-08-10 10:50       ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-10  9:02 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Aug 10, 2016 at 11:41:39AM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > @@ -771,6 +771,13 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
> >  	if (skip_rcs_switch(ppgtt, engine, to))
> >  		return 0;
> >  
> > +	if (!(to->engine[RCS].state->flags & I915_VMA_GLOBAL_BIND)) {
> > +		ret = i915_gem_object_set_to_gtt_domain(to->engine[RCS].state->obj,
> > +							false);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> >  	/* Trying to pin first makes error handling easier. */
> >  	ret = i915_vma_pin(to->engine[RCS].state,
> >  			   0, to->ggtt_alignment,
> 
> This could be lifted inside the if?

I'd rather not as that would be more unusual and we would then need a
complicated error path.

If the compiler is feeling intelligent it could combine the two blocks
into one, i.e.

if (unlikely((++vma->flags ^ flags) & I915_VMA_BIND_MASK)) {
	ret = i915_gem_object_set_to_gtt_domain(vma->obj, false);
	if (ret)
		return ret;

	ret = __i915_vma_do_pin(vma, size, alignment, flags);
	if (ret)
		return ret;
}

I don't suggest we do the expansion ourselves, and I don't have a bit to
spare in the PIN_FLAGS.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-09  9:05               ` Chris Wilson
@ 2016-08-10 10:12                 ` Daniel Vetter
  2016-08-10 10:13                   ` Daniel Vetter
  0 siblings, 1 reply; 125+ messages in thread
From: Daniel Vetter @ 2016-08-10 10:12 UTC (permalink / raw)
  To: Chris Wilson, Joonas Lahtinen, Daniel Vetter, intel-gfx, Daniel Vetter

On Tue, Aug 09, 2016 at 10:05:30AM +0100, Chris Wilson wrote:
> On Tue, Aug 09, 2016 at 11:48:56AM +0300, Joonas Lahtinen wrote:
> > On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> > > On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > > > 
> > > > On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > > > > 
> > > > > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > > > > 
> > > > > > 
> > > > > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > In the debate as to whether the second read of active->request is
> > > > > > > > ordered after the dependent reads of the first read of active->request,
> > > > > > > > just give in and throw a smp_rmb() in there so that ordering of loads is
> > > > > > > > assured.
> > > > > > > > 
> > > > > > > > v2: Explain the manual smp_rmb()
> > > > > > > > 
> > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > r-b confirmed.
> > > > > > It's still fishy that we are implying an SMP effect where we need to
> > > > > > mandate the local processor order (that being the order evaluation of
> > > > > > request = *active; engine = *request; *active). The two *active are
> > > > > > already ordered across SMP, so we are only concerned about this cpu. :|
> > > > > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > > > > rcu_access_pointer on another CPU without the smp_rmb. 
> > > > Should not a RCU read side lock be involved?
> > > Yes, we use rcu read lock here. The question here is about visibility of
> > > the other processor writes vs the local processor order. Before the
> > > other processor can overwrite the request during reallocation, it will
> > > have updated the active->request and gone through a wmb. During busy
> > > ioctl's read of the request, we want to make sure that the values we
> > > read (request->engine, request->seqno) have not been overwritten as we
> > > do so - and we do that by serialising the second pointer check with the
> > > other cpus.
> > 
> > As discussed in IRC, some other mechanism than an improvised spinning
> > loop + some SMP barriers thrown around would be much preferred.
> > 
> > You suggested a seqlock, and it would likely be ok.
> 
> I was comparing the read latching as they are identical. Using a
> read/write seqlock around the request modification does not prevent all
> dangers such as using kzalloc() and introduces a second sequence counter
> to the one we already have. And for good reason seqlock says to use RCU
> here. Which puts us in a bit of a catch-22 and having to guard against
> SLAB_DESTROY_BY_RCU.

Since I was part of that irc party too: I think this is perfectly fine
as-is. Essentially what we do here is plain rcu+kref_get_unless_zero to
protect against zombies. This is exactly the design fence_get_rcu is meant
to be used in. But for this fastpath we can avoid even the zombie
protection if we're careful enough. Trying to make this less scary by
using a seqlock instead of plain rcu would run counter to the overall
fence_get_rcu design. And since the goal is to share fences widely and
far, we don't want to build up slightly different locking scheme here
where fence pointers are not really protected by rcu.

Given all that I think the open-coded scary dance (with lots of comments)
is acceptable, also since this clearly is a fastpath that userspace loves
to beat on.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-10 10:12                 ` Daniel Vetter
@ 2016-08-10 10:13                   ` Daniel Vetter
  2016-08-10 11:00                     ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Daniel Vetter @ 2016-08-10 10:13 UTC (permalink / raw)
  To: Chris Wilson, Joonas Lahtinen, Daniel Vetter, intel-gfx, Daniel Vetter

On Wed, Aug 10, 2016 at 12:12:37PM +0200, Daniel Vetter wrote:
> On Tue, Aug 09, 2016 at 10:05:30AM +0100, Chris Wilson wrote:
> > On Tue, Aug 09, 2016 at 11:48:56AM +0300, Joonas Lahtinen wrote:
> > > On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> > > > On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > > > > 
> > > > > On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > > > > > 
> > > > > > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > In the debate as to whether the second read of active->request is
> > > > > > > > > ordered after the dependent reads of the first read of active->request,
> > > > > > > > > just give in and throw a smp_rmb() in there so that ordering of loads is
> > > > > > > > > assured.
> > > > > > > > > 
> > > > > > > > > v2: Explain the manual smp_rmb()
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > > r-b confirmed.
> > > > > > > It's still fishy that we are implying an SMP effect where we need to
> > > > > > > mandate the local processor order (that being the order evaluation of
> > > > > > > request = *active; engine = *request; *active). The two *active are
> > > > > > > already ordered across SMP, so we are only concerned about this cpu. :|
> > > > > > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > > > > > rcu_access_pointer on another CPU without the smp_rmb. 
> > > > > Should not a RCU read side lock be involved?
> > > > Yes, we use rcu read lock here. The question here is about visibility of
> > > > the other processor writes vs the local processor order. Before the
> > > > other processor can overwrite the request during reallocation, it will
> > > > have updated the active->request and gone through a wmb. During busy
> > > > ioctl's read of the request, we want to make sure that the values we
> > > > read (request->engine, request->seqno) have not been overwritten as we
> > > > do so - and we do that by serialising the second pointer check with the
> > > > other cpus.
> > > 
> > > As discussed in IRC, some other mechanism than an improvised spinning
> > > loop + some SMP barriers thrown around would be much preferred.
> > > 
> > > You suggested a seqlock, and it would likely be ok.
> > 
> > I was comparing the read latching as they are identical. Using a
> > read/write seqlock around the request modification does not prevent all
> > dangers such as using kzalloc() and introduces a second sequence counter
> > to the one we already have. And for good reason seqlock says to use RCU
> > here. Which puts us in a bit of a catch-22 and having to guard against
> > SLAB_DESTROY_BY_RCU.
> 
> Since I was part of that irc party too: I think this is perfectly fine
> as-is. Essentially what we do here is plain rcu+kref_get_unless_zero to
> protect against zombies. This is exactly the design fence_get_rcu is meant
> to be used in. But for this fastpath we can avoid even the zombie
> protection if we're careful enough. Trying to make this less scary by
> using a seqlock instead of plain rcu would run counter to the overall
> fence_get_rcu design. And since the goal is to share fences widely and
> far, we don't want to build up slightly different locking scheme here
> where fence pointers are not really protected by rcu.
> 
> Given all that I think the open-coded scary dance (with lots of comments)
> is acceptable, also since this clearly is a fastpath that userspace loves
> to beat on.

Still would like to see Joonas' r-b on the first few patches ofc.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH 33/33] drm/i915: Compress GPU objects in error state
  2016-08-07 14:45 ` [PATCH 33/33] drm/i915: Compress GPU objects in error state Chris Wilson
@ 2016-08-10 10:32   ` Joonas Lahtinen
  2016-08-10 10:52     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10 10:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> @@ -309,12 +310,30 @@ void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...)
>  	va_end(args);
>  }
>  
> +static bool
> +ascii85_encode(u32 in, char *out)


base64 is more of a de facto standard and I bet userland "expects" it too.

I'd also throw the routines under lib/
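For reference, the scheme under discussion encodes each 32-bit word as five base-85 digits starting at '!', with "z" as shorthand for an all-zero word. A minimal sketch of such an encoder (illustrative only, not the routine from the patch) could look like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Encode one 32-bit word as five ascii85 digits, most significant first.
 * Returns false for an all-zero word, which the caller emits as "z". */
static bool ascii85_encode(uint32_t in, char out[6])
{
	int i;

	if (in == 0)
		return false;	/* caller prints the "z" shorthand */

	out[5] = '\0';
	for (i = 4; i >= 0; i--) {
		out[i] = '!' + in % 85;
		in /= 85;
	}
	return true;
}
```

A decoder simply reverses the loop: multiply an accumulator by 85 and add `c - '!'` for each digit.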

> @@ -326,13 +345,23 @@ static void print_error_obj(struct drm_i915_error_state_buf *m,
>  			   lower_32_bits(obj->gtt_offset));
>  	}
>  
> -	for (page = offset = 0; page < obj->page_count; page++) {
> -		for (elt = 0; elt < PAGE_SIZE/4; elt++) {
> -			err_printf(m, "%08x :  %08x\n", offset,
> -				   obj->pages[page][elt]);
> -			offset += 4;
> +	err_puts(m, ":"); /* indicate compressed data */

I'd also keep the uncompressed option, because somebody might be
trying to make a micro-kernel without extra algorithm options. A
config setting could be justified.

> +	for (page = 0; page < obj->page_count; page++) {
> +		int i, len;
> +
> +		len = PAGE_SIZE;
> +		if (page == obj->page_count - 1)
> +			len -= obj->unused;
> +		len = (len + 3) / 4;
> +
> +		for (i = 0; i < len; i++) {
> +			if (ascii85_encode(obj->pages[page][i], out))
> +				err_puts(m, out);
> +			else
> +				err_puts(m, "z");

I think the encode function could take ranges; you do not very often
care about encoding/decoding a single character.

>  		}
>  	}
> +	err_puts(m, "\n");
>  }
>  
>  int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> @@ -593,17 +622,37 @@ static void i915_error_state_free(struct kref *error_ref)
>  	kfree(error);
>  }
>  
> -static int compress_page(void *src, struct drm_i915_error_object *dst)
> +static int compress_page(struct z_stream_s *zstream,
> +			 void *src,
> +			 struct drm_i915_error_object *dst)
>  {
> -	unsigned long page;
> +	zstream->next_in = src;
> +	zstream->avail_in = PAGE_SIZE;
>  
> -	page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);
> -	if (!page)
> -		return -ENOMEM;
> +	do {
> +		if (zstream->avail_out == 0) {
> +			unsigned long page;
> +
> +			page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);
> +			if (!page)
> +				return -ENOMEM;
> +
> +			dst->pages[dst->page_count++] = (void *)page;

Why is not dst->pages of different type?

> +
> +			zstream->next_out = (void *)page;
> +			zstream->avail_out = PAGE_SIZE;
> +		}
>  
> -	dst->pages[dst->page_count++] = (void *)page;
> +		if (zlib_deflate(zstream, Z_SYNC_FLUSH) != Z_OK)
> +			return -EIO;
> +
> +#if 0
> +		/* XXX fallback to uncompressed if we increases size? */
> +		if (zstream->total_out > zstream->total_in)
> +			return -E2BIG;
> +#endif

Not something we would merge. A FIXME: or TODO: comment should be enough,
or make it DRM_INFO and we can act if we get reports?

> +	} while (zstream->avail_in);
>  
> -	memcpy((void *)page, src, PAGE_SIZE);

// The function name has been so descriptive previously :P

> @@ -622,6 +672,7 @@ i915_error_object_create(struct drm_i915_private *i915,
>  		return NULL;
>  
>  	num_pages = min_t(u64, vma->size, vma->obj->base.size) >> PAGE_SHIFT;
> +	num_pages = DIV_ROUND_UP(10 * num_pages, 8); /* worstcase zlib growth */

This kind of calculation could be made into a zlib function?
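For illustration, the 10/8 budget above over-reserves output pages so that even incompressible data (which zlib falls back to emitting as stored blocks, with a small per-block and stream header overhead) fits without reallocating under GFP_ATOMIC. A sketch of the helper being suggested (hypothetical name):

```c
#include <assert.h>
#include <stddef.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical zlib_worst_case_pages(): budget 10 output pages for every
 * 8 input pages, comfortably covering zlib's worst-case growth for
 * incompressible input (stored-block headers plus stream framing). */
static size_t zlib_worst_case_pages(size_t num_pages)
{
	return DIV_ROUND_UP(10 * num_pages, 8);
}
```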

> @@ -629,6 +680,18 @@ i915_error_object_create(struct drm_i915_private *i915,
>  
>  	dst->gtt_offset = vma->node.start;
>  	dst->page_count = 0;
> +	dst->unused = 0;
> +
> +	memset(&zstream, 0, sizeof(zstream));
> +	zstream.workspace = kmalloc(zlib_deflate_workspacesize(MAX_WBITS,
> +							       MAX_MEM_LEVEL),
> +				    GFP_ATOMIC | __GFP_NOWARN);

> Wouldn't it look better with an intermediate variable?

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 19/33] drm/i915: Only clflush the context object when binding
  2016-08-10  9:02     ` Chris Wilson
@ 2016-08-10 10:50       ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10 10:50 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2016-08-10 at 10:02 +0100, Chris Wilson wrote:
> On Wed, Aug 10, 2016 at 11:41:39AM +0300, Joonas Lahtinen wrote:
> > 
> > On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > > 
> > > @@ -771,6 +771,13 @@ static int do_rcs_switch(struct drm_i915_gem_request *req)
> > >  	if (skip_rcs_switch(ppgtt, engine, to))
> > >  		return 0;
> > >  
> > > +	if (!(to->engine[RCS].state->flags & I915_VMA_GLOBAL_BIND)) {
> > > +		ret = i915_gem_object_set_to_gtt_domain(to->engine[RCS].state->obj,
> > > +							false);
> > > +		if (ret)
> > > +			return ret;
> > > +	}
> > > +
> > >  	/* Trying to pin first makes error handling easier. */
> > >  	ret = i915_vma_pin(to->engine[RCS].state,
> > >  			   0, to->ggtt_alignment,
> > This could be lifted inside the if?
> I'd rather not, as that would be more unusual and we would then need a
> complicated error path.
> 
> If the compiler is feeling intelligent it could combine the two blocks
> into one, i.e.
> 
> if (unlikely((++vma->flags ^ flags) & I915_VMA_BIND_MASK)) {
> 	ret = i915_gem_object_set_to_gtt_domain(vma->obj, false);
> 	if (ret)
> 		return ret;
> 
> 	ret = __i915_vma_do_pin(vma, size, alignment, flags);
> 	if (ret)
> 		return ret;
> }
> 
> I don't suggest we do the expansion ourselves, and I don't have a bit to
> spare in the PIN_FLAGS.

With the comment moved/updated (mentioned in previous patch);

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 05/33] drm/i915: Reduce amount of duplicate buffer information captured on error
  2016-08-10  8:36         ` Chris Wilson
@ 2016-08-10 10:51           ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10 10:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2016-08-10 at 09:36 +0100, Chris Wilson wrote:
> On Wed, Aug 10, 2016 at 11:07:46AM +0300, Joonas Lahtinen wrote:
> > 
> > On ke, 2016-08-10 at 08:15 +0100, Chris Wilson wrote:
> > > 
> > > On Wed, Aug 10, 2016 at 10:04:16AM +0300, Joonas Lahtinen wrote:
> > > > 
> > > > 
> > > > On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > > > > 
> > > > > 
> > > > > -	if (i) {
> > > > > +	active_bo = NULL;
> > > > Could be initialized at declaration for better readability.
> > > No. I disagree strongly. I dislike having to go back to the beginning of
> > > the block to check to see if was initialised before the if-chain that
> > > otherwise sets the value.
> > GCC has caught such uninitialized variable scenarios for quite a
> > while. This just increases noise.
> Imo batching the use together as in this patch improves the signal as
> the reader can see everything in a single block.

Meh, let's agree to disagree.

> 
> > 
> > > 
> > > > 
> > > > > 
> > > > >  /* Capture all registers which don't fit into another category. */
> > > > >  static void i915_capture_reg_state(struct drm_i915_private *dev_priv,
> > > > >  				   struct drm_i915_error_state *error)
> > > > > @@ -1436,10 +1402,12 @@ void i915_capture_error_state(struct drm_i915_private *dev_priv,
> > > > >  
> > > > >  	i915_capture_gen_state(dev_priv, error);
> > > > >  	i915_capture_reg_state(dev_priv, error);
> > > > > -	i915_gem_capture_buffers(dev_priv, error);
> > > > >  	i915_gem_record_fences(dev_priv, error);
> > > > >  	i915_gem_record_rings(dev_priv, error);
> > > > >  
> > > > > +	i915_capture_active_buffers(dev_priv, error);
> > > > > +	i915_capture_pinned_buffers(dev_priv, error);
> > > > > +
> > > > Any specific reason for reordering here?
> > > Different varieties of state capture, trying to use whitespace for
> > > grouping.
> > Maybe keep it at current place and add whitespace before and after,
> > making it three blocks?
> gen_state, reg_state, record_fences are register state.
> 
> record_rings is a mix of register and associated buffers.
> 
> capture_*_buffers are the list of user buffers in the GTTs.

Was just wondering if the order could affect something;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 33/33] drm/i915: Compress GPU objects in error state
  2016-08-10 10:32   ` Joonas Lahtinen
@ 2016-08-10 10:52     ` Chris Wilson
  2016-08-10 11:26       ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-10 10:52 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Wed, Aug 10, 2016 at 01:32:29PM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > @@ -309,12 +310,30 @@ void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...)
> >  	va_end(args);
> >  }
> >  
> > +static bool
> > +ascii85_encode(u32 in, char *out)
> 
> 
> base64 is more of a de facto standard and I bet userland "expects" it too.

No. It expects a standard zlib compressed ascii85 stream.

> > @@ -326,13 +345,23 @@ static void print_error_obj(struct drm_i915_error_state_buf *m,
> >  			   lower_32_bits(obj->gtt_offset));
> >  	}
> >  
> > -	for (page = offset = 0; page < obj->page_count; page++) {
> > -		for (elt = 0; elt < PAGE_SIZE/4; elt++) {
> > -			err_printf(m, "%08x :  %08x\n", offset,
> > -				   obj->pages[page][elt]);
> > -			offset += 4;
> > +	err_puts(m, ":"); /* indicate compressed data */
> 
> I'd also keep the uncompressed option, because somebody might be
> trying to make a micro-kernel without extra algorithm options. A
> config setting could be justified.
> 
> > +	for (page = 0; page < obj->page_count; page++) {
> > +		int i, len;
> > +
> > +		len = PAGE_SIZE;
> > +		if (page == obj->page_count - 1)
> > +			len -= obj->unused;
> > +		len = (len + 3) / 4;
> > +
> > +		for (i = 0; i < len; i++) {
> > +			if (ascii85_encode(obj->pages[page][i], out))
> > +				err_puts(m, out);
> > +			else
> > +				err_puts(m, "z");
> 
> I think the encode function could take ranges; you do not very often
> care about encoding/decoding a single character.

> >  		}
> >  	}
> > +	err_puts(m, "\n");
> >  }
> >  
> >  int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> > @@ -593,17 +622,37 @@ static void i915_error_state_free(struct kref *error_ref)
> >  	kfree(error);
> >  }
> >  
> > -static int compress_page(void *src, struct drm_i915_error_object *dst)
> > +static int compress_page(struct z_stream_s *zstream,
> > +			 void *src,
> > +			 struct drm_i915_error_object *dst)
> >  {
> > -	unsigned long page;
> > +	zstream->next_in = src;
> > +	zstream->avail_in = PAGE_SIZE;
> >  
> > -	page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);
> > -	if (!page)
> > -		return -ENOMEM;
> > +	do {
> > +		if (zstream->avail_out == 0) {
> > +			unsigned long page;
> > +
> > +			page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);
> > +			if (!page)
> > +				return -ENOMEM;
> > +
> > +			dst->pages[dst->page_count++] = (void *)page;
> 
> Why is not dst->pages of different type?

You want dst->pages[] as an array of unsigned long?

> 
> > +
> > +			zstream->next_out = (void *)page;
> > +			zstream->avail_out = PAGE_SIZE;
> > +		}
> >  
> > -	dst->pages[dst->page_count++] = (void *)page;
> > +		if (zlib_deflate(zstream, Z_SYNC_FLUSH) != Z_OK)
> > +			return -EIO;
> > +
> > +#if 0
> > +		/* XXX fallback to uncompressed if we increases size? */
> > +		if (zstream->total_out > zstream->total_in)
> > +			return -E2BIG;
> > +#endif
> 
> > Not something we would merge. A FIXME: or TODO: comment should be enough,
> or make it DRM_INFO and we can act if we get reports?
> 
> > +	} while (zstream->avail_in);
> >  
> > -	memcpy((void *)page, src, PAGE_SIZE);
> 
> // The function name has been so descriptive previously :P
> 
> > @@ -622,6 +672,7 @@ i915_error_object_create(struct drm_i915_private *i915,
> >  		return NULL;
> >  
> >  	num_pages = min_t(u64, vma->size, vma->obj->base.size) >> PAGE_SHIFT;
> > +	num_pages = DIV_ROUND_UP(10 * num_pages, 8); /* worstcase zlib growth */
> 
> This kind of calculation could be made into a zlib function?
> 
> > @@ -629,6 +680,18 @@ i915_error_object_create(struct drm_i915_private *i915,
> >  
> >  	dst->gtt_offset = vma->node.start;
> >  	dst->page_count = 0;
> > +	dst->unused = 0;
> > +
> > +	memset(&zstream, 0, sizeof(zstream));
> > +	zstream.workspace = kmalloc(zlib_deflate_workspacesize(MAX_WBITS,
> > +							       MAX_MEM_LEVEL),
> > +				    GFP_ATOMIC | __GFP_NOWARN);
> 
> > Wouldn't it look better with an intermediate variable?

No.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 18/33] drm/i915: Use VMA as the primary object for context state
  2016-08-10  8:25     ` Chris Wilson
@ 2016-08-10 10:54       ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10 10:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2016-08-10 at 09:25 +0100, Chris Wilson wrote:
> On Wed, Aug 10, 2016 at 11:03:39AM +0300, Joonas Lahtinen wrote:
> > 
> > On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > > 
> > > -	if (!i915_gem_obj_ggtt_bound(ctx_obj))
> > > -		seq_puts(m, "\tNot bound in GGTT\n");
> > > -	else
> > > -		ggtt_offset = i915_gem_obj_ggtt_offset(ctx_obj);
> > > +	if (vma->flags & I915_VMA_GLOBAL_BIND)
> > > +		seq_printf(m, "\tBound in GGTT at 0x%x\n",
> > 0x%04x?
> You mean 0x08.
> 

Yep.

This was again rather complex to review, with different things being mixed;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 32/33] drm/i915: Consolidate error object printing
  2016-08-09 11:53     ` Chris Wilson
@ 2016-08-10 10:55       ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10 10:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ti, 2016-08-09 at 12:53 +0100, Chris Wilson wrote:
> On Tue, Aug 09, 2016 at 02:44:41PM +0300, Joonas Lahtinen wrote:
> > 
> > On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > > @@ -446,15 +458,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
> > >  			err_printf(m, " --- gtt_offset = 0x%08x %08x\n",
> > If intended for userspace parsing "0x%08x %08x" vs. "0x%08x_%08x" would
> > be good to be consistent. And to reduce such error in future, I'd also
> > make this line be printed with above function (let there be extra
> > space).
> Yes, I remembered to fix that mistake only after sending the patches. :|
> 
> Combining this one is a bit trickier as it doesn't conform to the others.
> For simplicity I left the custom header in the caller.

Ack.
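The consistency point above is that a 64-bit offset split into two 32-bit prints should use one separator everywhere, so a parser can match a single pattern. A sketch of centralising it (hypothetical helper name, not from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a 64-bit GTT offset as "0x%08x_%08x" in
 * one place so every caller prints it identically. */
static int format_gtt_offset(char *buf, size_t len, uint64_t offset)
{
	return snprintf(buf, len, "0x%08x_%08x",
			(uint32_t)(offset >> 32), (uint32_t)offset);
}
```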

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 14/33] drm/i915: Create a VMA for an object
  2016-08-08  9:09     ` Chris Wilson
@ 2016-08-10 10:58       ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10 10:58 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ma, 2016-08-08 at 10:09 +0100, Chris Wilson wrote:
> On Mon, Aug 08, 2016 at 12:01:07PM +0300, Joonas Lahtinen wrote:
> > 
> > On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -3903,4 +3903,6 @@ static inline bool __i915_request_irq_complete(struct drm_i915_gem_request *req)
> > >  	return false;
> > >  }
> > >  
> > > +#define nullify(ptr) ({typeof(*ptr) T = *(ptr); *(ptr) = NULL; T;})
> > > +
> > Random lost hunk here.
> In the next patches where I use i915_vma_create() I also use this
> helper. It was just convenience.
> 

As discussed in IRC, with proper name and to its own patch.

> > >  struct i915_vma *
> > > +i915_vma_create(struct drm_i915_gem_object *obj,
> > > +		struct i915_address_space *vm,
> > > +		const struct i915_ggtt_view *view)
> > > +{
> > > +	GEM_BUG_ON(view ? i915_gem_obj_to_ggtt_view(obj, view) : i915_gem_obj_to_vma(obj, vm));
> > GEM_BUG_ON(view && !i915_is_ggtt(vm)) ?
> > We have that as a WARN_ON inside create(); I suppose it doesn't hurt
> here either and documents the interface.

That added;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-10 10:13                   ` Daniel Vetter
@ 2016-08-10 11:00                     ` Joonas Lahtinen
  2016-08-12  9:50                       ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10 11:00 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson, intel-gfx, Daniel Vetter

On ke, 2016-08-10 at 12:13 +0200, Daniel Vetter wrote:
> On Wed, Aug 10, 2016 at 12:12:37PM +0200, Daniel Vetter wrote:
> > 
> > On Tue, Aug 09, 2016 at 10:05:30AM +0100, Chris Wilson wrote:
> > > 
> > > On Tue, Aug 09, 2016 at 11:48:56AM +0300, Joonas Lahtinen wrote:
> > > > 
> > > > On ti, 2016-08-09 at 08:14 +0100, Chris Wilson wrote:
> > > > > 
> > > > > On Tue, Aug 09, 2016 at 09:36:48AM +0300, Joonas Lahtinen wrote:
> > > > > > 
> > > > > > 
> > > > > > On ma, 2016-08-08 at 10:45 +0100, Chris Wilson wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > On Mon, Aug 08, 2016 at 10:30:25AM +0100, Chris Wilson wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Mon, Aug 08, 2016 at 11:12:59AM +0200, Daniel Vetter wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On Sun, Aug 07, 2016 at 03:45:09PM +0100, Chris Wilson wrote:
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > In the debate as to whether the second read of active->request is
> > > > > > > > > > ordered after the dependent reads of the first read of active->request,
> > > > > > > > > > just give in and throw a smp_rmb() in there so that ordering of loads is
> > > > > > > > > > assured.
> > > > > > > > > > 
> > > > > > > > > > v2: Explain the manual smp_rmb()
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Chris Wilson 
> > > > > > > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > > > > > > r-b confirmed.
> > > > > > > > It's still fishy that we are implying an SMP effect where we need to
> > > > > > > > mandate the local processor order (that being the order evaluation of
> > > > > > > > request = *active; engine = *request; *active). The two *active are
> > > > > > > > already ordered across SMP, so we are only concered about this cpu. :|
> > > > > > > More second thoughts. rcu_assign_pointer(NULL) is not visible to
> > > > > > > rcu_access_pointer on another CPU without the smp_rmb. 
> > > > > > Should not a RCU read side lock be involved?
> > > > > Yes, we use rcu read lock here. The question here is about visibility of
> > > > > the other processor writes vs the local processor order. Before the
> > > > > other processor can overwrite the request during reallocation, it will
> > > > > have updated the active->request and gone through a wmb. During busy
> > > > > ioctl's read of the request, we want to make sure that the values we
> > > > > read (request->engine, request->seqno) have not been overwritten as we
> > > > > do so - and we do that by serialising the second pointer check with the
> > > > > other cpus.
> > > > As discussed in IRC, some mechanism other than an improvised spinning
> > > > loop + some SMP barriers thrown around would be much preferred.
> > > > 
> > > > You suggested a seqlock, and it would likely be ok.
> > > I was comparing the read latching as they are identical. Using a
> > > read/write seqlock around the request modification does not prevent all
> > > dangers such as using kzalloc() and introduces a second sequence counter
> > > to the one we already have. And for good reason seqlock says to use RCU
> > > here. Which puts us in a bit of a catch-22 and having to guard against
> > > SLAB_DESTROY_BY_RCU.
> > Since I was part of that irc party too: I think this is perfectly fine
> > as-is. Essentially what we do here is plain rcu+kref_get_unless_zero to
> > protect against zombies. This is exactly the design fence_get_rcu is meant
> > to be used in. But for this fastpath we can avoid even the zombie
> > protection if we're careful enough. Trying to make this less scary by
> > using a seqlock instead of plain rcu would run counter to the overall
> > fence_get_rcu design. And since the goal is to share fences widely and
> > far, we don't want to build up slightly different locking scheme here
> > where fence pointers are not really protected by rcu.
> > 
> > Given all that I think the open-coded scary dance (with lots of comments)
> > is acceptable, also since this clearly is a fastpath that userspace loves
> > to beat on.
> Still would like to see Joonas' r-b on the first few patches ofc.

With the extra comments;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

I still think it's fragile, though. But lets see once the dust settles
if we can make improvements.

Regards, Joonas

> -Daniel
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 33/33] drm/i915: Compress GPU objects in error state
  2016-08-10 10:52     ` Chris Wilson
@ 2016-08-10 11:26       ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-10 11:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On ke, 2016-08-10 at 11:52 +0100, Chris Wilson wrote:
> On Wed, Aug 10, 2016 at 01:32:29PM +0300, Joonas Lahtinen wrote:
> > 
> > On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > > 
> > > @@ -309,12 +310,30 @@ void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...)
> > >  	va_end(args);
> > >  }
> > >  
> > > +static bool
> > > +ascii85_encode(u32 in, char *out)
> > 
> > base64 is more of a de facto standard and I bet userland "expects" it too.
> No. It expects a standard zlib compressed ascii85 stream.

Right, seems there is some standard being followed. Other comments
about making the function more general purpose apply though.

> > > -static int compress_page(void *src, struct drm_i915_error_object *dst)
> > > +static int compress_page(struct z_stream_s *zstream,
> > > +			 void *src,
> > > +			 struct drm_i915_error_object *dst)
> > >  {
> > > -	unsigned long page;
> > > +	zstream->next_in = src;
> > > +	zstream->avail_in = PAGE_SIZE;
> > >  
> > > -	page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);
> > > -	if (!page)
> > > -		return -ENOMEM;
> > > +	do {
> > > +		if (zstream->avail_out == 0) {
> > > +			unsigned long page;
> > > +
> > > +			page = __get_free_page(GFP_ATOMIC | __GFP_NOWARN);
> > > +			if (!page)
> > > +				return -ENOMEM;
> > > +
> > > +			dst->pages[dst->page_count++] = (void *)page;
> > Why is not dst->pages of different type?
> You want dst->pages[] as an array of unsigned long?

Wouldn't that be more convenient?

> > > @@ -629,6 +680,18 @@ i915_error_object_create(struct drm_i915_private *i915,
> > >  
> > >  	dst->gtt_offset = vma->node.start;
> > >  	dst->page_count = 0;
> > > +	dst->unused = 0;
> > > +
> > > +	memset(&zstream, 0, sizeof(zstream));
> > > +	zstream.workspace = kmalloc(zlib_deflate_workspacesize(MAX_WBITS,
> > > +							       MAX_MEM_LEVEL),
> > > +				    GFP_ATOMIC | __GFP_NOWARN);
> > Wouldn't it look better with an intermediate variable?

Right. It's not exactly wrong, so go ahead...

This could be split into its own series, too... These mega series are
horrible to try to get reviewed.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 20/33] drm/i915: Use VMA for ringbuffer tracking
  2016-08-07 14:45 ` [PATCH 20/33] drm/i915: Use VMA for ringbuffer tracking Chris Wilson
@ 2016-08-11  9:32   ` Joonas Lahtinen
  2016-08-11  9:58     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-11  9:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 03a4d2ae71db..761201ff6b34 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -343,7 +343,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
>  	for_each_engine(engine, dev_priv) {
>  		struct intel_context *ce = &ctx->engine[engine->id];
>  		struct guc_execlist_context *lrc = &desc.lrc[engine->guc_id];
> -		struct drm_i915_gem_object *obj;
> +		struct i915_vma *vma;
>  
>  		/* TODO: We have a design issue to be solved here. Only when we
>  		 * receive the first batch, we know which engine is used by the
> @@ -358,17 +358,15 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
>  		lrc->context_desc = lower_32_bits(ce->lrc_desc);
>  
>  		/* The state page is after PPHWSP */
> -		gfx_addr = ce->state->node.start;
> -		lrc->ring_lcra = gfx_addr + LRC_STATE_PN * PAGE_SIZE;
> +		vma = ce->state;
> +		lrc->ring_lcra = vma->node.start + LRC_STATE_PN * PAGE_SIZE;

An alias just for this line? Maybe not.

>  		lrc->context_id = (client->ctx_index << GUC_ELC_CTXID_OFFSET) |
>  				(engine->guc_id << GUC_ELC_ENGINE_OFFSET);
>  
> -		obj = ce->ring->obj;
> -		gfx_addr = i915_gem_obj_ggtt_offset(obj);
> -
> -		lrc->ring_begin = gfx_addr;
> -		lrc->ring_end = gfx_addr + obj->base.size - 1;
> -		lrc->ring_next_free_location = gfx_addr;
> +		vma = ce->ring->vma;
> +		lrc->ring_begin = vma->node.start;
> +		lrc->ring_end = vma->node.start + vma->node.size - 1;
> +		lrc->ring_next_free_location = lrc->ring_begin;

Again, an alias for three lines? And it's a multipurpose alias too, so
double nope.

> @@ -1744,16 +1744,17 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
>  static int
>  lrc_setup_hws(struct intel_engine_cs *engine, struct i915_vma *vma)
>  {
> +#define HWS_OFFSET (LRC_PPHWSP_PN * PAGE_SIZE)

Wouldn't this go next to LRC_PPHWSP_PN?

> @@ -1853,79 +1852,78 @@ static void cleanup_phys_status_page(struct intel_engine_cs *engine)
>  
>  static void cleanup_status_page(struct intel_engine_cs *engine)
>  {
> -	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
>  
> -	obj = engine->status_page.obj;
> -	if (obj == NULL)
> +	vma = nullify(&engine->status_page.vma);
> +	if (!vma)
>  		return;
>  
> -	kunmap(sg_page(obj->pages->sgl));
> -	i915_gem_object_ggtt_unpin(obj);
> -	i915_gem_object_put(obj);
> -	engine->status_page.obj = NULL;
> +	i915_vma_unpin(vma);
> +	i915_gem_object_unpin_map(vma->obj);
> +	i915_gem_object_put(vma->obj);

This looks a tad strange, because usually one first does all 'foo->bar'
releases and then 'foo'. Just commenting here.

<SNIP>

> -	engine->status_page.gfx_addr = i915_gem_obj_ggtt_offset(obj);
> -	engine->status_page.page_addr = kmap(sg_page(obj->pages->sgl));
> -	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
> +	flags = PIN_GLOBAL;
> +	if (!HAS_LLC(engine->i915))
> +		/* On g33, we cannot place HWS above 256MiB, so
> +		 * restrict its pinning to the low mappable arena.
> +		 * Though this restriction is not documented for
> +		 * gen4, gen5, or byt, they also behave similarly
> +		 * and hang if the HWS is placed at the top of the
> +		 * GTT. To generalise, it appears that all !llc
> +		 * platforms have issues with us placing the HWS
> +		 * above the mappable region (even though we never
> +		 * actualy map it).
> +		 */
> +		flags |= PIN_MAPPABLE;

For readability, I'd move the comment one level up and before the if.
 
> +	DRM_DEBUG_DRIVER("%s hws offset: 0x%08llx\n",
> +			 engine->name, vma->node.start);
>  	return 0;
> +
> +err_unref:

Sole error label, could be err.

>  
>  int intel_ring_pin(struct intel_ring *ring)
>  {
> -	struct drm_i915_private *dev_priv = ring->engine->i915;
> -	struct drm_i915_gem_object *obj = ring->obj;
>  	/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
> -	unsigned flags = PIN_OFFSET_BIAS | 4096;
> +	unsigned int flags = PIN_GLOBAL | PIN_OFFSET_BIAS | 4096;
>  	void *addr;
>  	int ret;
>  
> -	if (HAS_LLC(dev_priv) && !obj->stolen) {
> -		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE, flags);
> -		if (ret)
> -			return ret;
> -
> -		ret = i915_gem_object_set_to_cpu_domain(obj, true);
> -		if (ret)
> -			goto err_unpin;
> -
> -		addr = i915_gem_object_pin_map(obj);
> -		if (IS_ERR(addr)) {
> -			ret = PTR_ERR(addr);
> -			goto err_unpin;
> -		}
> -	} else {
> -		ret = i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE,
> -					       flags | PIN_MAPPABLE);
> -		if (ret)
> -			return ret;
> +	GEM_BUG_ON(ring->vaddr);
>  
> -		ret = i915_gem_object_set_to_gtt_domain(obj, true);
> -		if (ret)
> -			goto err_unpin;
> +	if (ring->vmap)
> +		flags |= PIN_MAPPABLE;
>  
> -		/* Access through the GTT requires the device to be awake. */
> -		assert_rpm_wakelock_held(dev_priv);

This wakelock disappears in this patch.

> +	ret = i915_vma_pin(ring->vma, 0, PAGE_SIZE, flags);
> +	if (unlikely(ret))
> +		return ret;
>  
> -		addr = (void __force *)
> -			i915_vma_pin_iomap(i915_gem_obj_to_ggtt(obj));
> -		if (IS_ERR(addr)) {
> -			ret = PTR_ERR(addr);
> -			goto err_unpin;
> -		}
> +	if (ring->vmap)
> +		addr = i915_gem_object_pin_map(ring->vma->obj);
> +	else
> +		addr = (void __force *)i915_vma_pin_iomap(ring->vma);

Wakelock needed in this path?

> +	if (IS_ERR(addr)) {
> +		i915_vma_unpin(ring->vma);
> +		return PTR_ERR(addr);

Keep the good ol' teardown path.


>  	}
>  
>  	ring->vaddr = addr;
> -	ring->vma = i915_gem_obj_to_ggtt(obj);
>  	return 0;
> -
> -err_unpin:
> -	i915_gem_object_ggtt_unpin(obj);
> -	return ret;
>  }
>  
> -static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
> -				      struct intel_ring *ring)
> +static struct i915_vma *
> +intel_ring_create_vma(struct drm_i915_private *dev_priv, int size)
>  {
>  	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
> +	int ret;
>  
> -	obj = NULL;
> -	if (!HAS_LLC(dev))
> -		obj = i915_gem_object_create_stolen(dev, ring->size);
> -	if (obj == NULL)
> -		obj = i915_gem_object_create(dev, ring->size);
> +	obj = ERR_PTR(-ENODEV);
> +	if (!HAS_LLC(dev_priv))
> +		obj = i915_gem_object_create_stolen(&dev_priv->drm, size);
>  	if (IS_ERR(obj))
> -		return PTR_ERR(obj);
> +		obj = i915_gem_object_create(&dev_priv->drm, size);
> +	if (IS_ERR(obj))
> +		return ERR_CAST(obj);
>  
>  	/* mark ring buffers as read-only from GPU side by default */
>  	obj->gt_ro = 1;
>  
> -	ring->obj = obj;
> +	if (HAS_LLC(dev_priv) && !obj->stolen)
> +		ret = i915_gem_object_set_to_cpu_domain(obj, true);
> +	else
> +		ret = i915_gem_object_set_to_gtt_domain(obj, true);
> +	if (ret) {
> +		vma = ERR_PTR(ret);
> +		goto err;
> +	}

Might be worth mentioning that the ring objects are now moved to their
domain at the time of creation, not at pinning. Any specific reason for
the change?

Also mention that you're silencing quite a few debugs and one
DRM_ERROR.

> @@ -2060,22 +2040,23 @@ intel_engine_create_ring(struct intel_engine_cs *engine, int size)
>  	ring->last_retired_head = -1;
>  	intel_ring_update_space(ring);
>  
> -	ret = intel_alloc_ringbuffer_obj(&engine->i915->drm, ring);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("Failed to allocate ringbuffer %s: %d\n",
> -				 engine->name, ret);
> -		list_del(&ring->link);
> +	vma = intel_ring_create_vma(engine->i915, size);
> +	if (IS_ERR(vma)) {
>  		kfree(ring);
> -		return ERR_PTR(ret);
> +		return ERR_CAST(vma);
>  	}
> +	ring->vma = vma;
> +	if (HAS_LLC(engine->i915) && !vma->obj->stolen)
> +		ring->vmap = true;

use_vmap/need_vmap or something? 'vmap' sounds like the actual mapping.

>  		ret = init_status_page(engine);
> @@ -2184,11 +2164,10 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
>  
>  	ret = intel_ring_pin(ring);
>  	if (ret) {
> -		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
> -				engine->name, ret);
> -		intel_destroy_ringbuffer_obj(ring);
> +		intel_ring_free(ring);

Shouldn't this be like goto err_ring?

>  		goto error;
>  	}
> +	engine->buffer = ring;
>  
>  	return 0;
>   
>  	struct intel_engine_cs *engine;
>  	struct list_head link;
> @@ -97,6 +96,7 @@ struct intel_ring {
>  	int space;
>  	int size;
>  	int effective_size;
> +	bool vmap;

Renaming suggested above.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 20/33] drm/i915: Use VMA for ringbuffer tracking
  2016-08-11  9:32   ` Joonas Lahtinen
@ 2016-08-11  9:58     ` Chris Wilson
  0 siblings, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-11  9:58 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 11, 2016 at 12:32:50PM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> > index 03a4d2ae71db..761201ff6b34 100644
> > --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> > @@ -343,7 +343,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
> >  	for_each_engine(engine, dev_priv) {
> >  		struct intel_context *ce = &ctx->engine[engine->id];
> >  		struct guc_execlist_context *lrc = &desc.lrc[engine->guc_id];
> > -		struct drm_i915_gem_object *obj;
> > +		struct i915_vma *vma;
> >  
> >  		/* TODO: We have a design issue to be solved here. Only when we
> >  		 * receive the first batch, we know which engine is used by the
> > @@ -358,17 +358,15 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
> >  		lrc->context_desc = lower_32_bits(ce->lrc_desc);
> >  
> >  		/* The state page is after PPHWSP */
> > -		gfx_addr = ce->state->node.start;
> > -		lrc->ring_lcra = gfx_addr + LRC_STATE_PN * PAGE_SIZE;
> > +		vma = ce->state;
> > +		lrc->ring_lcra = vma->node.start + LRC_STATE_PN * PAGE_SIZE;
> 
> An alias just for this line? Maybe not.

I was just trying to follow the conventions of the existing code.

> > @@ -1744,16 +1744,17 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
> >  static int
> >  lrc_setup_hws(struct intel_engine_cs *engine, struct i915_vma *vma)
> >  {
> > +#define HWS_OFFSET (LRC_PPHWSP_PN * PAGE_SIZE)
> 
> Wouldn't this go next to LRC_PPHWSP_PN?

I was going to undef it. Perhaps just make it a const local variable
instead.

> > @@ -1853,79 +1852,78 @@ static void cleanup_phys_status_page(struct intel_engine_cs *engine)
> >  
> >  static void cleanup_status_page(struct intel_engine_cs *engine)
> >  {
> > -	struct drm_i915_gem_object *obj;
> > +	struct i915_vma *vma;
> >  
> > -	obj = engine->status_page.obj;
> > -	if (obj == NULL)
> > +	vma = nullify(&engine->status_page.vma);
> > +	if (!vma)
> >  		return;
> >  
> > -	kunmap(sg_page(obj->pages->sgl));
> > -	i915_gem_object_ggtt_unpin(obj);
> > -	i915_gem_object_put(obj);
> > -	engine->status_page.obj = NULL;
> > +	i915_vma_unpin(vma);
> > +	i915_gem_object_unpin_map(vma->obj);
> > +	i915_gem_object_put(vma->obj);
> 
> This looks a tad strange, because usually one first does all the 'foo->bar'
> releases and then 'foo'. Just commenting here.

Next revision has both i915_vma_put() to hide the oddity, and
i915_vma_put_and_release for the common cases.

> > -	engine->status_page.gfx_addr = i915_gem_obj_ggtt_offset(obj);
> > -	engine->status_page.page_addr = kmap(sg_page(obj->pages->sgl));
> > -	memset(engine->status_page.page_addr, 0, PAGE_SIZE);
> > +	flags = PIN_GLOBAL;
> > +	if (!HAS_LLC(engine->i915))
> > +		/* On g33, we cannot place HWS above 256MiB, so
> > +		 * restrict its pinning to the low mappable arena.
> > +		 * Though this restriction is not documented for
> > +		 * gen4, gen5, or byt, they also behave similarly
> > +		 * and hang if the HWS is placed at the top of the
> > +		 * GTT. To generalise, it appears that all !llc
> > +		 * platforms have issues with us placing the HWS
> > +		 * above the mappable region (even though we never
> > +		 * actually map it).
> > +		 */
> > +		flags |= PIN_MAPPABLE;
> 
> For readability, I'd move the comment one level up and before the if.

Just moving the comment, so I'd rather keep the stanza intact.

> > -		/* Access through the GTT requires the device to be awake. */
> > -		assert_rpm_wakelock_held(dev_priv);
> 
> This wakelock disappears in this patch.

Hmm, it was in the pin_iomap. Apparently not at this point in time.

> > +	if (HAS_LLC(dev_priv) && !obj->stolen)
> > +		ret = i915_gem_object_set_to_cpu_domain(obj, true);
> > +	else
> > +		ret = i915_gem_object_set_to_gtt_domain(obj, true);
> > +	if (ret) {
> > +		vma = ERR_PTR(ret);
> > +		goto err;
> > +	}
> 
> Might be worth mentioning that the ring objects are now moved to their
> domain at the time creation, not pinning. Any specific reason for the
> change?

Just saving a bit of complexity at pinning (and taking a risk doing so).
But since we have the is-bound? trick, we can use that instead.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 21/33] drm/i915: Use VMA for scratch page tracking
  2016-08-07 14:45 ` [PATCH 21/33] drm/i915: Use VMA for scratch page tracking Chris Wilson
  2016-08-08  8:00   ` [PATCH 1/3] " Chris Wilson
@ 2016-08-11 10:06   ` Joonas Lahtinen
  2016-08-11 10:22     ` Chris Wilson
  1 sibling, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-11 10:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> +
> +	engine->scratch = vma;
> +	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08llx\n",
> +			 engine->name, vma->node.start);

Not related to this patch; we do seem to have confusion on scratch vs.
pipe control terms.

> +	return 0;
> +
> +err_unref:
> +	i915_gem_object_put(obj);
> +	return ret;
> +}
>  	return index;
> @@ -993,7 +993,7 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *engine,
>  
>  	/* WaClearSlmSpaceAtContextSwitch:bdw,chv */
>  	/* Actual scratch location is at 128 bytes offset */
> -	scratch_addr = engine->scratch.gtt_offset + 2*CACHELINE_BYTES;
> +	scratch_addr = engine->scratch->node.start + 2*CACHELINE_BYTES;

Add spaces around *

>  
>  	wa_ctx_emit(batch, index, GFX_OP_PIPE_CONTROL(6));
>  	wa_ctx_emit(batch, index, (PIPE_CONTROL_FLUSH_L3 |
> @@ -1072,8 +1072,8 @@ static int gen9_init_indirectctx_bb(struct intel_engine_cs *engine,
>  	/* WaClearSlmSpaceAtContextSwitch:kbl */
>  	/* Actual scratch location is at 128 bytes offset */
>  	if (IS_KBL_REVID(dev_priv, 0, KBL_REVID_A0)) {
> -		uint32_t scratch_addr
> -			= engine->scratch.gtt_offset + 2*CACHELINE_BYTES;
> +		uint32_t scratch_addr =
> +			engine->scratch->node.start + 2*CACHELINE_BYTES;

While correcting formatting, add spaces around *

With those two tweaks;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 22/33] drm/i915/overlay: Use VMA as the primary tracker for images
  2016-08-07 14:45 ` [PATCH 22/33] drm/i915/overlay: Use VMA as the primary tracker for images Chris Wilson
@ 2016-08-11 10:17   ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-11 10:17 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> @@ -171,8 +171,7 @@ struct overlay_registers {
>  struct intel_overlay {
>  	struct drm_i915_private *i915;
>  	struct intel_crtc *crtc;
> -	struct drm_i915_gem_object *vid_bo;
> -	struct drm_i915_gem_object *old_vid_bo;
> +	struct i915_vma *vma, *old_vma;

Only nitpick here; I'd keep two line form.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 21/33] drm/i915: Use VMA for scratch page tracking
  2016-08-11 10:06   ` [PATCH 21/33] drm/i915: Use VMA for scratch page tracking Joonas Lahtinen
@ 2016-08-11 10:22     ` Chris Wilson
  0 siblings, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-11 10:22 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 11, 2016 at 01:06:23PM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > +
> > +	engine->scratch = vma;
> > +	DRM_DEBUG_DRIVER("%s pipe control offset: 0x%08llx\n",
> > +			 engine->name, vma->node.start);
> 
> Not related to this patch; we do seem to have confusion on scratch vs.
> pipe control terms.

I think after this it is just scratch space that is used by pipecontrol,
wa, cs flips, etc., and references to it being only for pipecontrol are
gone. (We could still use the HWS for these scratch writes, iirc.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 23/33] drm/i915: Use VMA as the primary tracker for semaphore page
  2016-08-07 14:45 ` [PATCH 23/33] drm/i915: Use VMA as the primary tracker for semaphore page Chris Wilson
@ 2016-08-11 10:42   ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-11 10:42 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> @@ -530,7 +530,7 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  		}
>  	}
>  
> -	if ((obj = error->semaphore_obj)) {
> +	if ((obj = error->semaphore)) {

Hate this kind of code, which is a direct result of copy-paste... Adding
it to the TODO list.

>  static int gen8_rcs_signal(struct drm_i915_gem_request *req)
> @@ -2329,12 +2331,14 @@ void intel_engine_init_seqno(struct intel_engine_cs *engine, u32 seqno)
>  		if (HAS_VEBOX(dev_priv))
>  			I915_WRITE(RING_SYNC_2(engine->mmio_base), 0);
>  	}
> -	if (dev_priv->semaphore_obj) {
> -		struct drm_i915_gem_object *obj = dev_priv->semaphore_obj;
> +	if (dev_priv->semaphore) {
> +		struct drm_i915_gem_object *obj = dev_priv->semaphore->obj;
>  		struct page *page = i915_gem_object_get_dirty_page(obj, 0);
>  		void *semaphores = kmap(page);
>  		memset(semaphores + GEN8_SEMAPHORE_OFFSET(engine->id, 0),
>  		       0, I915_NUM_ENGINES * gen8_semaphore_seqno_size);
> +		drm_clflush_virt_range(semaphores + GEN8_SEMAPHORE_OFFSET(engine->id, 0),
> +				       I915_NUM_ENGINES * gen8_semaphore_seqno_size);

Where did this hunk appear from? I did not expect it based on the commit
message, as there was no such thing there :P

>  		kunmap(page);
>  	}
>  	memset(engine->semaphore.sync_seqno, 0,
> @@ -2556,36 +2560,40 @@ static int gen6_ring_flush(struct drm_i915_gem_request *req, u32 mode)
>  static void intel_ring_init_semaphores(struct drm_i915_private *dev_priv,
>  				       struct intel_engine_cs *engine)
>  {
> -	struct drm_i915_gem_object *obj;
>  	int ret, i;
>  
>  	if (!i915.semaphores)
>  		return;
>  
> -	if (INTEL_GEN(dev_priv) >= 8 && !dev_priv->semaphore_obj) {
> +	if (INTEL_GEN(dev_priv) >= 8 && !dev_priv->semaphore) {
> +		struct drm_i915_gem_object *obj;
> +		struct i915_vma *vma;
> +
>  		obj = i915_gem_object_create(&dev_priv->drm, 4096);
>  		if (IS_ERR(obj)) {
> -			DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n");

Silencing a DRM_ERROR, maybe into commit message too.

>  			i915.semaphores = 0;
> -		} else {
> -			i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);

Right, this is traded for the drm_clflush_virt_range()? I'd add a
comment on top of the new location.

> -			ret = i915_gem_object_ggtt_pin(obj, NULL,
> -						       0, 0, PIN_HIGH);
> -			if (ret != 0) {
> -				i915_gem_object_put(obj);
> -				DRM_ERROR("Failed to pin semaphore bo. Disabling semaphores\n");
> -				i915.semaphores = 0;
> -			} else {
> -				dev_priv->semaphore_obj = obj;
> -			}
> +			return;

Goto teardown;

>  		}
> -	}
>  
> -	if (!i915.semaphores)
> -		return;
> +		vma = i915_vma_create(obj, &dev_priv->ggtt.base, NULL);
> +		if (IS_ERR(vma)) {
> +			i915_gem_object_put(obj);
> +			i915.semaphores = 0;
> +			return;

Goto teardown.

> +		}
> +
> +		ret = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
> +		if (ret) {
> +			i915_gem_object_put(obj);
> +			i915.semaphores = 0;
> +			return;

Goto teardown.

> +		}
> +
> +		dev_priv->semaphore = vma;
> +	}
>  

Above addressed;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 24/33] drm/i915: Use VMA for render state page tracking
  2016-08-07 14:45 ` [PATCH 24/33] drm/i915: Use VMA for render state page tracking Chris Wilson
@ 2016-08-11 10:46   ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-11 10:46 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> @@ -24,7 +24,7 @@
>  #ifndef _I915_GEM_RENDER_STATE_H_
>  #define _I915_GEM_RENDER_STATE_H_
>  
> -#include 
> +struct drm_i915_gem_request;
>  

This patch almost only did what the title said...

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 25/33] drm/i915: Use VMA for wa_ctx tracking
  2016-08-07 14:45 ` [PATCH 25/33] drm/i915: Use VMA for wa_ctx tracking Chris Wilson
@ 2016-08-11 10:53   ` Joonas Lahtinen
  2016-08-11 11:02     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-11 10:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
>  static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *engine, u32 size)
>  {
> -	int ret;
> +	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
> +	int err;
>  
> -	engine->wa_ctx.obj = i915_gem_object_create(&engine->i915->drm,
> -						    PAGE_ALIGN(size));
> -	if (IS_ERR(engine->wa_ctx.obj)) {
> -		DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
> -		ret = PTR_ERR(engine->wa_ctx.obj);
> -		engine->wa_ctx.obj = NULL;
> -		return ret;
> +	obj = i915_gem_object_create(&engine->i915->drm, PAGE_ALIGN(size));
> +	if (IS_ERR(obj))
> +		return PTR_ERR(obj);
> +
> +	vma = i915_vma_create(obj, &engine->i915->ggtt.base, NULL);
> +	if (IS_ERR(vma)) {
> +		i915_gem_object_put(obj);
> +		return PTR_ERR(vma);

Goto teardown; err = PTR_ERR(vma);

>  	}
>  
> -	ret = i915_gem_object_ggtt_pin(engine->wa_ctx.obj, NULL,
> -				       0, PAGE_SIZE, PIN_HIGH);
> -	if (ret) {
> -		DRM_DEBUG_DRIVER("pin LRC WA ctx backing obj failed: %d\n",
> -				 ret);
> -		i915_gem_object_put(engine->wa_ctx.obj);
> -		return ret;
> +	err = i915_vma_pin(vma, 0, PAGE_SIZE, PIN_GLOBAL | PIN_HIGH);
> +	if (err) {
> +		i915_gem_object_put(obj);

Goto teardown.

> +		return err;
>  	}
>  
> +	engine->wa_ctx.vma = vma;
>  	return 0;
>  }
>  
> @@ -2019,9 +2023,9 @@ populate_lr_context(struct i915_gem_context *ctx,
>  			       RING_INDIRECT_CTX(engine->mmio_base), 0);
>  		ASSIGN_CTX_REG(reg_state, CTX_RCS_INDIRECT_CTX_OFFSET,
>  			       RING_INDIRECT_CTX_OFFSET(engine->mmio_base), 0);
> -		if (engine->wa_ctx.obj) {
> +		if (engine->wa_ctx.vma) {
>  			struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx;
> -			uint32_t ggtt_offset = i915_gem_obj_ggtt_offset(wa_ctx->obj);
> +			u32 ggtt_offset = wa_ctx->vma->node.start;

lower_32_bits()?

>  
>  			reg_state[CTX_RCS_INDIRECT_CTX+1] =
>  				(ggtt_offset + wa_ctx->indirect_ctx.offset * sizeof(uint32_t)) |

With above addressed;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 25/33] drm/i915: Use VMA for wa_ctx tracking
  2016-08-11 10:53   ` Joonas Lahtinen
@ 2016-08-11 11:02     ` Chris Wilson
  2016-08-11 12:41       ` Joonas Lahtinen
  0 siblings, 1 reply; 125+ messages in thread
From: Chris Wilson @ 2016-08-11 11:02 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 11, 2016 at 01:53:40PM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > @@ -2019,9 +2023,9 @@ populate_lr_context(struct i915_gem_context *ctx,
> >  			       RING_INDIRECT_CTX(engine->mmio_base), 0);
> >  		ASSIGN_CTX_REG(reg_state, CTX_RCS_INDIRECT_CTX_OFFSET,
> >  			       RING_INDIRECT_CTX_OFFSET(engine->mmio_base), 0);
> > -		if (engine->wa_ctx.obj) {
> > +		if (engine->wa_ctx.vma) {
> >  			struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx;
> > -			uint32_t ggtt_offset = i915_gem_obj_ggtt_offset(wa_ctx->obj);
> > +			u32 ggtt_offset = wa_ctx->vma->node.start;
> 
> lower_32_bits()?

I considered it, but didn't, to keep the changes to a minimum; plus I have
a slight unease about making it seem like we don't care about the upper 32
bits.

static inline u32 i915_ggtt_offset(vma)
{
	GEM_BUG_ON(upper_32_bits(vma->node.start));
	return lower_32_bits(vma->node.start);
}

is possibly overkill but stops me feeling uneasy about the
seeming truncation. Is this something that UBSAN detects?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 26/33] drm/i915: Track pinned VMA
  2016-08-07 14:45 ` [PATCH 26/33] drm/i915: Track pinned VMA Chris Wilson
@ 2016-08-11 12:18   ` Joonas Lahtinen
  2016-08-11 12:37     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-11 12:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> @@ -1616,7 +1618,7 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
>  
>  /**
>   * i915_gem_fault - fault a page into the GTT
> - * @vma: VMA in question
> + * @mm: VMA in question

should be @vm or whatever the correct name.

>   * @vmf: fault info
>   *
>   * The fault handler is set up by drm_gem_mmap() when a object is GTT mapped
> @@ -1630,20 +1632,21 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
>   * suffer if the GTT working set is large or there are few fence registers
>   * left.
>   */
> -int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> +int i915_gem_fault(struct vm_area_struct *vm, struct vm_fault *vmf)

'vm' is used elsewhere for the address space; maybe 'kvma'? Or 'area'
(used occasionally in linux/mm.h too).

> @@ -1722,13 +1726,13 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	} else {
>  		if (!obj->fault_mappable) {
>  			unsigned long size = min_t(unsigned long,
> -						   vma->vm_end - vma->vm_start,
> +						   vm->vm_end - vm->vm_start,
>  						   obj->base.size);
>  			int i;
>  
>  			for (i = 0; i < size >> PAGE_SHIFT; i++) {
> -				ret = vm_insert_pfn(vma,
> -						    (unsigned long)vma->vm_start + i * PAGE_SIZE,
> +				ret = vm_insert_pfn(vm,
> +						    (unsigned long)vm->vm_start + i * PAGE_SIZE,

Hmm, vm->vm_start is already unsigned long, so the cast could be
eliminated.

>  						    pfn + i);
>  				if (ret)
>  					break;
> @@ -1736,12 +1740,12 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  
>  			obj->fault_mappable = true;
>  		} else
> -			ret = vm_insert_pfn(vma,
> +			ret = vm_insert_pfn(vm,
>  					    (unsigned long)vmf->virtual_address,
>  					    pfn + page_offset);
>  	}
>  err_unpin:
> -	i915_gem_object_ggtt_unpin_view(obj, &view);
> +	__i915_vma_unpin(vma);
>  err_unlock:
>  	mutex_unlock(&dev->struct_mutex);
>  err_rpm:
> @@ -3190,7 +3194,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>  					    old_write_domain);
>  
>  	/* And bump the LRU for this access */
> -	vma = i915_gem_obj_to_ggtt(obj);
> +	vma = i915_gem_object_to_ggtt(obj, NULL);
>  	if (vma &&
>  	    drm_mm_node_allocated(&vma->node) &&
>  	    !i915_vma_is_active(vma))
> @@ -3414,11 +3418,12 @@ rpm_put:
>   * Can be called from an uninterruptible phase (modesetting) and allows
>   * any flushes to be pipelined (for pageflips).
>   */
> -int
> +struct i915_vma *
>  i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  				     u32 alignment,
>  				     const struct i915_ggtt_view *view)
>  {
> +	struct i915_vma *vma;
>  	u32 old_read_domains, old_write_domain;
>  	int ret;
>  
> @@ -3438,19 +3443,23 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  	 */
>  	ret = i915_gem_object_set_cache_level(obj,
>  					      HAS_WT(obj->base.dev) ? I915_CACHE_WT : I915_CACHE_NONE);
> -	if (ret)
> +	if (ret) {
> +		vma = ERR_PTR(ret);
>  		goto err_unpin_display;
> +	}
>  
>  	/* As the user may map the buffer once pinned in the display plane
>  	 * (e.g. libkms for the bootup splash), we have to ensure that we
>  	 * always use map_and_fenceable for all scanout buffers.
>  	 */
> -	ret = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
> +	vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
>  				       view->type == I915_GGTT_VIEW_NORMAL ?
>  				       PIN_MAPPABLE : 0);
> -	if (ret)
> +	if (IS_ERR(vma))
>  		goto err_unpin_display;
>  
> +	WARN_ON(obj->pin_display > i915_vma_pin_count(vma));
> +
>  	i915_gem_object_flush_cpu_write_domain(obj);
>  
>  	old_write_domain = obj->base.write_domain;
> @@ -3466,23 +3475,23 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  					    old_read_domains,
>  					    old_write_domain);
>  
> -	return 0;
> +	return vma;
>  
>  err_unpin_display:
>  	obj->pin_display--;
> -	return ret;
> +	return vma;
>  }
>  
>  void
> -i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
> -					 const struct i915_ggtt_view *view)
> +i915_gem_object_unpin_from_display_plane(struct i915_vma *vma)
>  {
> -	if (WARN_ON(obj->pin_display == 0))
> +	if (WARN_ON(vma->obj->pin_display == 0))
>  		return;
>  
> -	i915_gem_object_ggtt_unpin_view(obj, view);
> +	vma->obj->pin_display--;
>  
> -	obj->pin_display--;
> +	i915_vma_unpin(vma);
> +	WARN_ON(vma->obj->pin_display > i915_vma_pin_count(vma));
>  }
>  
>  /**
> @@ -3679,27 +3688,25 @@ err:
>  	return ret;
>  }
>  
> -int
> +struct i915_vma *
>  i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
> -			 const struct i915_ggtt_view *view,
> +			 const struct i915_ggtt_view *ggtt_view,

Hmm, why a distinctive name compared to other places? This function has
_ggtt_ in the name, so it should be implicit.

>  			 u64 size,
>  			 u64 alignment,
>  			 u64 flags)
>  {

<SNIP>

> -/* All the new VM stuff */

Oh noes, we destroy all the new stuff :P

> @@ -349,30 +349,34 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
>  		   struct drm_i915_gem_relocation_entry *reloc,
>  		   uint64_t target_offset)
>  {
> -	struct drm_device *dev = obj->base.dev;
> -	struct drm_i915_private *dev_priv = to_i915(dev);
> +	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>  	struct i915_ggtt *ggtt = &dev_priv->ggtt;
> +	struct i915_vma *vma;
>  	uint64_t delta = relocation_target(reloc, target_offset);
>  	uint64_t offset;
>  	void __iomem *reloc_page;
>  	int ret;
>  
> +	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
> +	if (IS_ERR(vma))
> +		return PTR_ERR(vma);
> +
>  	ret = i915_gem_object_set_to_gtt_domain(obj, true);
>  	if (ret)
> -		return ret;
> +		goto unpin;
>  
>  	ret = i915_gem_object_put_fence(obj);
>  	if (ret)
> -		return ret;
> +		goto unpin;
>  
>  	/* Map the page containing the relocation we're going to perform.  */
> -	offset = i915_gem_obj_ggtt_offset(obj);
> +	offset = vma->node.start;
>  	offset += reloc->offset;

Could be concatenated with the previous line now:

offset = vma->node.start + reloc->offset;

> -static struct i915_vma*
> +static struct i915_vma *
>  i915_gem_execbuffer_parse(struct intel_engine_cs *engine,
>  			  struct drm_i915_gem_exec_object2 *shadow_exec_entry,
>  			  struct drm_i915_gem_object *batch_obj,
> @@ -1305,31 +1311,30 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *engine,
>  				      batch_start_offset,
>  				      batch_len,
>  				      is_master);
> -	if (ret)
> +	if (ret) {
> +		if (ret == -EACCES) /* unhandled chained batch */
> +			vma = NULL;
> +		else
> +			vma = ERR_PTR(ret);
>  		goto err;
> +	}
>  
> -	ret = i915_gem_object_ggtt_pin(shadow_batch_obj, NULL, 0, 0, 0);
> -	if (ret)
> +	vma = i915_gem_object_ggtt_pin(shadow_batch_obj, NULL, 0, 0, 0);
> +	if (IS_ERR(vma)) {
> +		ret = PTR_ERR(vma);

Hmm, the 'err' label no longer cares about ret, so this is redundant. Or
should the ret-to-vma translation have been kept at the end?

>  		goto err;
> -
> -	i915_gem_object_unpin_pages(shadow_batch_obj);
> +	}
>  
>  	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
>  
> -	vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
>  	vma->exec_entry = shadow_exec_entry;
>  	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
>  	i915_gem_object_get(shadow_batch_obj);
>  	list_add_tail(&vma->exec_list, &eb->vmas);
>  
> -	return vma;
> -
>  err:
>  	i915_gem_object_unpin_pages(shadow_batch_obj);
> -	if (ret == -EACCES) /* unhandled chained batch */
> -		return NULL;
> -	else
> -		return ERR_PTR(ret);
> +	return vma;
>  }
>  

<SNIP>

> @@ -432,13 +432,7 @@ bool
>  i915_gem_object_pin_fence(struct drm_i915_gem_object *obj)
>  {
>  	if (obj->fence_reg != I915_FENCE_REG_NONE) {
> -		struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> -		struct i915_vma *ggtt_vma = i915_gem_obj_to_ggtt(obj);
> -
> -		WARN_ON(!ggtt_vma ||
> -			dev_priv->fence_regs[obj->fence_reg].pin_count >
> -			i915_vma_pin_count(ggtt_vma));

Is this WARN_ON deliberately removed, not even worthy of a GEM_BUG_ON?

> -		dev_priv->fence_regs[obj->fence_reg].pin_count++;
> +		to_i915(obj->base.dev)->fence_regs[obj->fence_reg].pin_count++;

This is not the most readable line of code one can imagine. dev_priv
alias does make readability better occasionally.

> @@ -3351,14 +3351,10 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
>  
>  	GEM_BUG_ON(vm->closed);
>  
> -	if (WARN_ON(i915_is_ggtt(vm) != !!view))
> -		return ERR_PTR(-EINVAL);
> -
>  	vma = kmem_cache_zalloc(to_i915(obj->base.dev)->vmas, GFP_KERNEL);
>  	if (vma == NULL)
>  		return ERR_PTR(-ENOMEM);
>  
> -	INIT_LIST_HEAD(&vma->obj_link);

This disappears completely?

> @@ -3378,56 +3373,76 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
> +static inline bool vma_matches(struct i915_vma *vma,
> +			       struct i915_address_space *vm,
> +			       const struct i915_ggtt_view *view)

The function name is not the clearest, but I cannot suggest a better one
off the top of my head.
 
>  static struct drm_i915_error_object *
>  i915_error_object_create(struct drm_i915_private *dev_priv,
> -			 struct drm_i915_gem_object *src,
> -			 struct i915_address_space *vm)
> +			 struct i915_vma *vma)
>  {
>  	struct i915_ggtt *ggtt = &dev_priv->ggtt;
> +	struct drm_i915_gem_object *src;
>  	struct drm_i915_error_object *dst;
> -	struct i915_vma *vma = NULL;
>  	int num_pages;
>  	bool use_ggtt;
>  	int i = 0;
>  	u64 reloc_offset;
>  
> -	if (src == NULL || src->pages == NULL)
> +	if (!vma)
> +		return NULL;
> +
> +	src = vma->obj;
> +	if (!src->pages)
>  		return NULL;
>  
>  	num_pages = src->base.size >> PAGE_SHIFT;
>  
>  	dst = kmalloc(sizeof(*dst) + num_pages * sizeof(u32 *), GFP_ATOMIC);
> -	if (dst == NULL)
> +	if (!dst)
>  		return NULL;
>  
> -	if (i915_gem_obj_bound(src, vm))
> -		dst->gtt_offset = i915_gem_obj_offset(src, vm);
> -	else
> -		dst->gtt_offset = -1;

What was the purpose of this line previously? The calculations would
get majestically messed up.

> @@ -1035,11 +1029,8 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
>  	struct drm_i915_gem_request *request;
>  	int i, count;
>  
> -	if (dev_priv->semaphore) {
> -		error->semaphore =
> -			i915_error_ggtt_object_create(dev_priv,
> -						      dev_priv->semaphore->obj);
> -	}
> +	error->semaphore =
> +		i915_error_object_create(dev_priv, dev_priv->semaphore);

Not sure if I like hiding the NULL tolerance inside the function.

> @@ -2240,10 +2240,11 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb,
>  	 */
>  	intel_runtime_pm_get(dev_priv);
>  
> -	ret = i915_gem_object_pin_to_display_plane(obj, alignment,
> -						   &view);
> -	if (ret)
> +	vma = i915_gem_object_pin_to_display_plane(obj, alignment, &view);
> +	if (IS_ERR(vma)) {
> +		ret = PTR_ERR(vma);

In other places there's 'return vma' in the error path too; it might be
cleaner here as well, since there's already a translation to -EBUSY in
the ret error path.

>  		goto err_pm;
> +	}
>  
>  	/* Install a fence for tiled scan-out. Pre-i965 always needs a
>  	 * fence, whereas 965+ only requires a fence if using
> @@ -2270,19 +2271,20 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb,
>  	}
>  
>  	intel_runtime_pm_put(dev_priv);
> -	return 0;
> +	return vma;
>  
>  err_unpin:
> -	i915_gem_object_unpin_from_display_plane(obj, &view);
> +	i915_gem_object_unpin_from_display_plane(vma);
>  err_pm:
>  	intel_runtime_pm_put(dev_priv);
> -	return ret;
> +	return ERR_PTR(ret);
>  }
>  

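For reference, these hunks lean on the kernel's ERR_PTR() convention to return either a valid `struct i915_vma *` or a negative errno through the same pointer. A minimal userspace sketch of that convention (a simplified model, not the kernel's actual include/linux/err.h):

```c
#include <stdint.h>

/* A negative errno is packed into the topmost page of the address
 * space, so one pointer-sized return value can carry either a valid
 * pointer or an error code. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

With this encoding, callers of `intel_pin_and_fence_fb_obj()` test `IS_ERR(vma)` once and recover the errno with `PTR_ERR(vma)`, which is exactly the shape of the error paths above.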
<SNIP>

> @@ -2291,7 +2293,8 @@ void intel_unpin_fb_obj(struct drm_framebuffer *fb, unsigned int rotation)
>  	if (view.type == I915_GGTT_VIEW_NORMAL)
>  		i915_gem_object_unpin_fence(obj);
>  
> -	i915_gem_object_unpin_from_display_plane(obj, &view);
> +	vma = i915_gem_object_to_ggtt(obj, &view);

GEM_BUG_ON(!vma)?

> +	i915_gem_object_unpin_from_display_plane(vma);
>  }
>  
>  /*
> @@ -2552,7 +2555,7 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc,
>  			continue;
>  
>  		obj = intel_fb_obj(fb);
> -		if (i915_gem_obj_ggtt_offset(obj) == plane_config->base) {
> +		if (i915_gem_object_ggtt_offset(obj, NULL) == plane_config->base) {
>  			drm_framebuffer_reference(fb);
>  			goto valid_fb;
>  		}
> @@ -2709,11 +2712,11 @@ static void i9xx_update_primary_plane(struct drm_plane *primary,
>  	I915_WRITE(DSPSTRIDE(plane), fb->pitches[0]);
>  	if (INTEL_INFO(dev)->gen >= 4) {
>  		I915_WRITE(DSPSURF(plane),
> -			   i915_gem_obj_ggtt_offset(obj) + intel_crtc->dspaddr_offset);
> +			   i915_gem_object_ggtt_offset(obj, NULL) + intel_crtc->dspaddr_offset);

As discussed on IRC, this function is to be removed further down the series.

> @@ -216,17 +215,17 @@ static int intelfb_create(struct drm_fb_helper *helper,
>  		sizes->fb_height = intel_fb->base.height;
>  	}
>  
> -	obj = intel_fb->obj;
> -
>  	mutex_lock(&dev->struct_mutex);
>  
>  	/* Pin the GGTT vma for our access via info->screen_base.
>  	 * This also validates that any existing fb inherited from the
>  	 * BIOS is suitable for own access.
>  	 */
> -	ret = intel_pin_and_fence_fb_obj(&ifbdev->fb->base, BIT(DRM_ROTATE_0));
> -	if (ret)
> +	vma = intel_pin_and_fence_fb_obj(&ifbdev->fb->base, BIT(DRM_ROTATE_0));

Needs rebasing, BIT(DRM_ROTATE_0) is now just DRM_ROTATE_0.

> @@ -1443,8 +1443,8 @@ void intel_setup_overlay(struct drm_i915_private *dev_priv)
>  	return;
>  
>  out_unpin_bo:
> -	if (!OVERLAY_NEEDS_PHYSICAL(dev_priv))
> -		i915_gem_object_ggtt_unpin(reg_bo);
> +	if (vma)
> +		i915_vma_unpin(vma);

This might be the only acceptable use of if () in a teardown path.

I hope there is an excellent revision listing in the next iteration.
This was a pain to go through.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 125+ messages in thread

* Re: [PATCH 27/33] drm/i915: Print the batchbuffer offset next to BBADDR in error state
  2016-08-07 14:45 ` [PATCH 27/33] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
@ 2016-08-11 12:24   ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-11 12:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> It is useful when looking at captured error states to check the recorded
> BBADDR register (the address of the last batchbuffer instruction loaded)
> against the expected offset of the batch buffer, and so do a quick check
> that (a) the capture is true or (b) HEAD hasn't wandered off into the
> badlands.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_drv.h       |  1 +
>  drivers/gpu/drm/i915/i915_gpu_error.c | 13 ++++++++++++-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ed9d872859b3..4023718017a8 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -552,6 +552,7 @@ struct drm_i915_error_state {
>  		struct drm_i915_error_object {
>  			int page_count;
>  			u64 gtt_offset;
> +			u64 gtt_size;
>  			u32 *pages[0];
>  		} *ringbuffer, *batchbuffer, *wa_batchbuffer, *ctx, *hws_page;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 1bceaf96bc5f..9faac19029cd 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -243,6 +243,14 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
>  	err_printf(m, "  IPEIR: 0x%08x\n", ee->ipeir);
>  	err_printf(m, "  IPEHR: 0x%08x\n", ee->ipehr);
>  	err_printf(m, "  INSTDONE: 0x%08x\n", ee->instdone);
> +	if (ee->batchbuffer) {
> +		u64 start = ee->batchbuffer->gtt_offset;
> +		u64 end = start + ee->batchbuffer->gtt_size;
> +
> +		err_printf(m, "  batch: [0x%08x %08x, 0x%08x %08x]\n",

Underscores to join the numbers would be more consistent; I think one
way or the other should be kept as a standard.

> +			   upper_32_bits(start), lower_32_bits(start),
> +			   upper_32_bits(end), lower_32_bits(end));
> +	}
>  	if (INTEL_GEN(m->i915) >= 4) {
>  		err_printf(m, "  BBADDR: 0x%08x %08x\n",
>  			   (u32)(ee->bbaddr>>32), (u32)ee->bbaddr);
> @@ -657,7 +665,10 @@ i915_error_object_create(struct drm_i915_private *dev_priv,
>  	if (!dst)
>  		return NULL;
>  
> -	reloc_offset = dst->gtt_offset = vma->node.start;
> +	dst->gtt_offset = vma->node.start;
> +	dst->gtt_size = vma->node.size;
> +
> +	reloc_offset = dst->gtt_offset;
>  	use_ggtt = (src->cache_level == I915_CACHE_NONE &&
>  		   (vma->flags & I915_VMA_GLOBAL_BIND) &&
>  		   reloc_offset + num_pages * PAGE_SIZE <= ggtt->mappable_end);
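To make the two candidate formats concrete, here is a small userspace sketch (the helper name and the reimplemented upper_32_bits()/lower_32_bits() macros are illustrative stand-ins for the kernel's) that renders a 64-bit offset with either separator:

```c
#include <stdio.h>
#include <stdint.h>

#define upper_32_bits(n) ((uint32_t)((n) >> 32))
#define lower_32_bits(n) ((uint32_t)(n))

/* Render a 64-bit offset as two 32-bit halves; 'join' is the separator
 * under discussion: ' ' as in the patch, '_' as suggested in review. */
static int format_u64(char *buf, size_t len, uint64_t v, char join)
{
	return snprintf(buf, len, "0x%08x%c%08x",
			upper_32_bits(v), join, lower_32_bits(v));
}
```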
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 28/33] drm/i915: Move per-request pid from request to ctx
  2016-08-07 14:45 ` [PATCH 28/33] drm/i915: Move per-request pid from request to ctx Chris Wilson
@ 2016-08-11 12:32   ` Joonas Lahtinen
  2016-08-11 12:41     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-11 12:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> Since contexts are not currently shared between userspace processes, we

How about in the future?

Patch itself seems fine. Title could be more like "Move PID tracking
from request to context".

Regards, Joonas

> have an exact correspondence between context creator and guilty batch
> submitter. Therefore we can save some per-batch work by inspecting the
> context->pid upon error instead. Note that we take the context's
> creator's pid rather than the file's pid in order to better track fd
> passed over sockets.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c     | 25 ++++++++++++++++---------
>  drivers/gpu/drm/i915/i915_drv.h         |  2 ++
>  drivers/gpu/drm/i915/i915_gem_context.c |  4 ++++
>  drivers/gpu/drm/i915/i915_gem_request.c |  5 -----
>  drivers/gpu/drm/i915/i915_gem_request.h |  3 ---
>  drivers/gpu/drm/i915/i915_gpu_error.c   | 13 ++++++++++---
>  6 files changed, 32 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 5f00d6347905..963c6d28d332 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -460,6 +460,8 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>  	print_context_stats(m, dev_priv);
>  	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
>  		struct file_stats stats;
> +		struct drm_i915_file_private *file_priv = file->driver_priv;
> +		struct drm_i915_gem_request *request;
>  		struct task_struct *task;
>  
>  		memset(&stats, 0, sizeof(stats));
> @@ -473,10 +475,17 @@ static int i915_gem_object_info(struct seq_file *m, void* data)
>  		 * still alive (e.g. get_pid(current) => fork() => exit()).
>  		 * Therefore, we need to protect this ->comm access using RCU.
>  		 */
> +		mutex_lock(&dev->struct_mutex);
> +		request = list_first_entry_or_null(&file_priv->mm.request_list,
> +						   struct drm_i915_gem_request,
> +						   client_list);
>  		rcu_read_lock();
> -		task = pid_task(file->pid, PIDTYPE_PID);
> +		task = pid_task(request && request->ctx->pid ?
> +				request->ctx->pid : file->pid,
> +				PIDTYPE_PID);
>  		print_file_stats(m, task ? task->comm : "", stats);
>  		rcu_read_unlock();
> +		mutex_unlock(&dev->struct_mutex);
>  	}
>  	mutex_unlock(&dev->filelist_mutex);
>  
> @@ -657,12 +666,11 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>  
>  		seq_printf(m, "%s requests: %d\n", engine->name, count);
>  		list_for_each_entry(req, &engine->request_list, link) {
> +			struct pid *pid = req->ctx->pid;
>  			struct task_struct *task;
>  
>  			rcu_read_lock();
> -			task = NULL;
> -			if (req->pid)
> -				task = pid_task(req->pid, PIDTYPE_PID);
> +			task = pid ? pid_task(pid, PIDTYPE_PID) : NULL;
>  			seq_printf(m, "    %x @ %d: %s [%d]\n",
>  				   req->fence.seqno,
>  				   (int) (jiffies - req->emitted_jiffies),
> @@ -1951,18 +1959,17 @@ static int i915_context_status(struct seq_file *m, void *unused)
>  
>  	list_for_each_entry(ctx, &dev_priv->context_list, link) {
>  		seq_printf(m, "HW context %u ", ctx->hw_id);
> -		if (IS_ERR(ctx->file_priv)) {
> -			seq_puts(m, "(deleted) ");
> -		} else if (ctx->file_priv) {
> -			struct pid *pid = ctx->file_priv->file->pid;
> +		if (ctx->pid) {
>  			struct task_struct *task;
>  
> -			task = get_pid_task(pid, PIDTYPE_PID);
> +			task = get_pid_task(ctx->pid, PIDTYPE_PID);
>  			if (task) {
>  				seq_printf(m, "(%s [%d]) ",
>  					   task->comm, task->pid);
>  				put_task_struct(task);
>  			}
> +		} else if (IS_ERR(ctx->file_priv)) {
> +			seq_puts(m, "(deleted) ");
>  		} else {
>  			seq_puts(m, "(kernel) ");
>  		}
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4023718017a8..e7357656728e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -560,6 +560,7 @@ struct drm_i915_error_state {
>  
>  		struct drm_i915_error_request {
>  			long jiffies;
> +			pid_t pid;
>  			u32 seqno;
>  			u32 tail;
>  		} *requests;
> @@ -880,6 +881,7 @@ struct i915_gem_context {
>  	struct drm_i915_private *i915;
>  	struct drm_i915_file_private *file_priv;
>  	struct i915_hw_ppgtt *ppgtt;
> +	struct pid *pid;
>  
>  	struct i915_ctx_hang_stats hang_stats;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 15eed897b498..c026d591d142 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -158,6 +158,7 @@ void i915_gem_context_free(struct kref *ctx_ref)
>  		i915_gem_object_put(ce->state->obj);
>  	}
>  
> +	put_pid(ctx->pid);
>  	list_del(&ctx->link);
>  
>  	ida_simple_remove(&ctx->i915->context_hw_ida, ctx->hw_id);
> @@ -311,6 +312,9 @@ __create_hw_context(struct drm_device *dev,
>  		ret = DEFAULT_CONTEXT_HANDLE;
>  
>  	ctx->file_priv = file_priv;
> +	if (file_priv)
> +		ctx->pid = get_task_pid(current, PIDTYPE_PID);
> +
>  	ctx->user_handle = ret;
>  	/* NB: Mark all slices as needing a remap so that when the context first
>  	 * loads it will restore whatever remap state already exists. If there
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index 187c4f9ce8d0..8fdd313248f9 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -137,8 +137,6 @@ int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>  	list_add_tail(&req->client_list, &file_priv->mm.request_list);
>  	spin_unlock(&file_priv->mm.lock);
>  
> -	req->pid = get_pid(task_pid(current));
> -
>  	return 0;
>  }
>  
> @@ -154,9 +152,6 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
>  	list_del(&request->client_list);
>  	request->file_priv = NULL;
>  	spin_unlock(&file_priv->mm.lock);
> -
> -	put_pid(request->pid);
> -	request->pid = NULL;
>  }
>  
>  void i915_gem_retire_noop(struct i915_gem_active *active,
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> index 1f396f470a86..72a4b73cbb79 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.h
> +++ b/drivers/gpu/drm/i915/i915_gem_request.h
> @@ -134,9 +134,6 @@ struct drm_i915_gem_request {
>  	/** file_priv list entry for this request */
>  	struct list_head client_list;
>  
> -	/** process identifier submitting this request */
> -	struct pid *pid;
> -
>  	/**
>  	 * The ELSP only accepts two elements at a time, so we queue
>  	 * context/tail pairs on a given queue (ring->execlist_queue) until the
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 9faac19029cd..52d1564f37c4 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -460,7 +460,8 @@ int i915_error_state_to_str(struct drm_i915_error_state_buf *m,
>  				   dev_priv->engine[i].name,
>  				   ee->num_requests);
>  			for (j = 0; j < ee->num_requests; j++) {
> -				err_printf(m, "  seqno 0x%08x, emitted %ld, tail 0x%08x\n",
> +				err_printf(m, "  pid %d, seqno 0x%08x, emitted %ld, tail 0x%08x\n",
> +					   ee->requests[j].pid,
>  					   ee->requests[j].seqno,
>  					   ee->requests[j].jiffies,
>  					   ee->requests[j].tail);
> @@ -1061,6 +1062,7 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
>  		request = i915_gem_find_active_request(engine);
>  		if (request) {
>  			struct intel_ring *ring;
> +			struct pid *pid;
>  
>  			ee->vm = request->ctx->ppgtt ?
>  				&request->ctx->ppgtt->base : &ggtt->base;
> @@ -1082,11 +1084,12 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
>  				i915_error_object_create(dev_priv,
>  							 request->ctx->engine[i].state);
>  
> -			if (request->pid) {
> +			pid = request->ctx->pid;
> +			if (pid) {
>  				struct task_struct *task;
>  
>  				rcu_read_lock();
> -				task = pid_task(request->pid, PIDTYPE_PID);
> +				task = pid_task(pid, PIDTYPE_PID);
>  				if (task) {
>  					strcpy(ee->comm, task->comm);
>  					ee->pid = task->pid;
> @@ -1150,6 +1153,10 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
>  			erq->seqno = request->fence.seqno;
>  			erq->jiffies = request->emitted_jiffies;
>  			erq->tail = request->postfix;
> +
> +			rcu_read_lock();
> +			erq->pid = request->ctx ? pid_nr(request->ctx->pid) : 0;
> +			rcu_read_unlock();
>  		}
>  	}
>  }
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 29/33] drm/i915: Only record active and pending requests upon a GPU hang
  2016-08-07 14:45 ` [PATCH 29/33] drm/i915: Only record active and pending requests upon a GPU hang Chris Wilson
@ 2016-08-11 12:36   ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-11 12:36 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> There is no other state pertaining to the completed requests in the
> hang, other than gleamed through the ringbuffer, so including the
> expired requests in the list of outstanding requests simply adds noise.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 26/33] drm/i915: Track pinned VMA
  2016-08-11 12:18   ` Joonas Lahtinen
@ 2016-08-11 12:37     ` Chris Wilson
  0 siblings, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-11 12:37 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 11, 2016 at 03:18:44PM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > @@ -1616,7 +1618,7 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
> >  
> >  /**
> >   * i915_gem_fault - fault a page into the GTT
> > - * @vma: VMA in question
> > + * @mm: VMA in question
> 
> should be @vm or whatever the correct name is.
> 
> >   * @vmf: fault info
> >   *
> >   * The fault handler is set up by drm_gem_mmap() when a object is GTT mapped
> > @@ -1630,20 +1632,21 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
> >   * suffer if the GTT working set is large or there are few fence registers
> >   * left.
> >   */
> > -int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> > +int i915_gem_fault(struct vm_area_struct *vm, struct vm_fault *vmf)
> 
> 'vm' is used elsewhere for the address space, maybe 'kvma'? Or 'area'
> (used in linux/mm.h too occasionally)

I also was tempted by kvma. But area is better.

> 
> > @@ -1722,13 +1726,13 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> >  	} else {
> >  		if (!obj->fault_mappable) {
> >  			unsigned long size = min_t(unsigned long,
> > -						   vma->vm_end - vma->vm_start,
> > +						   vm->vm_end - vm->vm_start,
> >  						   obj->base.size);
> >  			int i;
> >  
> >  			for (i = 0; i < size >> PAGE_SHIFT; i++) {
> > -				ret = vm_insert_pfn(vma,
> > -						    (unsigned long)vma->vm_start + i * PAGE_SIZE,
> > +				ret = vm_insert_pfn(vm,
> > +						    (unsigned long)vm->vm_start + i * PAGE_SIZE,
> 
> > Hmm, vm->vm_start is already unsigned long, so the cast could be
> eliminated.
> 
> >  						    pfn + i);
> >  				if (ret)
> >  					break;
> > @@ -1736,12 +1740,12 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> >  
> >  			obj->fault_mappable = true;
> >  		} else
> > -			ret = vm_insert_pfn(vma,
> > +			ret = vm_insert_pfn(vm,
> >  					    (unsigned long)vmf->virtual_address,
> >  					    pfn + page_offset);
> >  	}
> >  err_unpin:
> > -	i915_gem_object_ggtt_unpin_view(obj, &view);
> > +	__i915_vma_unpin(vma);
> >  err_unlock:
> >  	mutex_unlock(&dev->struct_mutex);
> >  err_rpm:
> > @@ -3190,7 +3194,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
> >  					    old_write_domain);
> >  
> >  	/* And bump the LRU for this access */
> > -	vma = i915_gem_obj_to_ggtt(obj);
> > +	vma = i915_gem_object_to_ggtt(obj, NULL);
> >  	if (vma &&
> >  	    drm_mm_node_allocated(&vma->node) &&
> >  	    !i915_vma_is_active(vma))
> > @@ -3414,11 +3418,12 @@ rpm_put:
> >   * Can be called from an uninterruptible phase (modesetting) and allows
> >   * any flushes to be pipelined (for pageflips).
> >   */
> > -int
> > +struct i915_vma *
> >  i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
> >  				     u32 alignment,
> >  				     const struct i915_ggtt_view *view)
> >  {
> > +	struct i915_vma *vma;
> >  	u32 old_read_domains, old_write_domain;
> >  	int ret;
> >  
> > @@ -3438,19 +3443,23 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
> >  	 */
> >  	ret = i915_gem_object_set_cache_level(obj,
> >  					      HAS_WT(obj->base.dev) ? I915_CACHE_WT : I915_CACHE_NONE);
> > -	if (ret)
> > +	if (ret) {
> > +		vma = ERR_PTR(ret);
> >  		goto err_unpin_display;
> > +	}
> >  
> >  	/* As the user may map the buffer once pinned in the display plane
> >  	 * (e.g. libkms for the bootup splash), we have to ensure that we
> >  	 * always use map_and_fenceable for all scanout buffers.
> >  	 */
> > -	ret = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
> > +	vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
> >  				       view->type == I915_GGTT_VIEW_NORMAL ?
> >  				       PIN_MAPPABLE : 0);
> > -	if (ret)
> > +	if (IS_ERR(vma))
> >  		goto err_unpin_display;
> >  
> > +	WARN_ON(obj->pin_display > i915_vma_pin_count(vma));
> > +
> >  	i915_gem_object_flush_cpu_write_domain(obj);
> >  
> >  	old_write_domain = obj->base.write_domain;
> > @@ -3466,23 +3475,23 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
> >  					    old_read_domains,
> >  					    old_write_domain);
> >  
> > -	return 0;
> > +	return vma;
> >  
> >  err_unpin_display:
> >  	obj->pin_display--;
> > -	return ret;
> > +	return vma;
> >  }
> >  
> >  void
> > -i915_gem_object_unpin_from_display_plane(struct drm_i915_gem_object *obj,
> > -					 const struct i915_ggtt_view *view)
> > +i915_gem_object_unpin_from_display_plane(struct i915_vma *vma)
> >  {
> > -	if (WARN_ON(obj->pin_display == 0))
> > +	if (WARN_ON(vma->obj->pin_display == 0))
> >  		return;
> >  
> > -	i915_gem_object_ggtt_unpin_view(obj, view);
> > +	vma->obj->pin_display--;
> >  
> > -	obj->pin_display--;
> > +	i915_vma_unpin(vma);
> > +	WARN_ON(vma->obj->pin_display > i915_vma_pin_count(vma));
> >  }
> >  
> >  /**
> > @@ -3679,27 +3688,25 @@ err:
> >  	return ret;
> >  }
> >  
> > -int
> > +struct i915_vma *
> >  i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
> > -			 const struct i915_ggtt_view *view,
> > +			 const struct i915_ggtt_view *ggtt_view,
> 
> Hmm, why the distinctive name compared to other places? The function
> has _ggtt_ in its name, so it should be implicit.
> 
> >  			 u64 size,
> >  			 u64 alignment,
> >  			 u64 flags)
> >  {
> 
> <SNIP>
> 
> > -/* All the new VM stuff */
> 
> Oh noes, we destroy all the new stuff :P
> 
> > @@ -349,30 +349,34 @@ relocate_entry_gtt(struct drm_i915_gem_object *obj,
> >  		   struct drm_i915_gem_relocation_entry *reloc,
> >  		   uint64_t target_offset)
> >  {
> > -	struct drm_device *dev = obj->base.dev;
> > -	struct drm_i915_private *dev_priv = to_i915(dev);
> > +	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> >  	struct i915_ggtt *ggtt = &dev_priv->ggtt;
> > +	struct i915_vma *vma;
> >  	uint64_t delta = relocation_target(reloc, target_offset);
> >  	uint64_t offset;
> >  	void __iomem *reloc_page;
> >  	int ret;
> >  
> > +	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
> > +	if (IS_ERR(vma))
> > +		return PTR_ERR(vma);
> > +
> >  	ret = i915_gem_object_set_to_gtt_domain(obj, true);
> >  	if (ret)
> > -		return ret;
> > +		goto unpin;
> >  
> >  	ret = i915_gem_object_put_fence(obj);
> >  	if (ret)
> > -		return ret;
> > +		goto unpin;
> >  
> >  	/* Map the page containing the relocation we're going to perform.  */
> > -	offset = i915_gem_obj_ggtt_offset(obj);
> > +	offset = vma->node.start;
> >  	offset += reloc->offset;
> 
> Could concatenate to previous line now;
> 
> offset = vma->node.start + reloc->offset;
> 
> > -static struct i915_vma*
> > +static struct i915_vma *
> >  i915_gem_execbuffer_parse(struct intel_engine_cs *engine,
> >  			  struct drm_i915_gem_exec_object2 *shadow_exec_entry,
> >  			  struct drm_i915_gem_object *batch_obj,
> > @@ -1305,31 +1311,30 @@ i915_gem_execbuffer_parse(struct intel_engine_cs *engine,
> >  				      batch_start_offset,
> >  				      batch_len,
> >  				      is_master);
> > -	if (ret)
> > +	if (ret) {
> > +		if (ret == -EACCES) /* unhandled chained batch */
> > +			vma = NULL;
> > +		else
> > +			vma = ERR_PTR(ret);
> >  		goto err;
> > +	}
> >  
> > -	ret = i915_gem_object_ggtt_pin(shadow_batch_obj, NULL, 0, 0, 0);
> > -	if (ret)
> > +	vma = i915_gem_object_ggtt_pin(shadow_batch_obj, NULL, 0, 0, 0);
> > +	if (IS_ERR(vma)) {
> > +		ret = PTR_ERR(vma);
> 
> Hmm, 'err' label no longer cares about ret, so this is redundant. Or
> should the ret-to-vma translation have been kept at the end?
> 
> >  		goto err;
> > -
> > -	i915_gem_object_unpin_pages(shadow_batch_obj);
> > +	}
> >  
> >  	memset(shadow_exec_entry, 0, sizeof(*shadow_exec_entry));
> >  
> > -	vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
> >  	vma->exec_entry = shadow_exec_entry;
> >  	vma->exec_entry->flags = __EXEC_OBJECT_HAS_PIN;
> >  	i915_gem_object_get(shadow_batch_obj);
> >  	list_add_tail(&vma->exec_list, &eb->vmas);
> >  
> > -	return vma;
> > -
> >  err:
> >  	i915_gem_object_unpin_pages(shadow_batch_obj);
> > -	if (ret == -EACCES) /* unhandled chained batch */
> > -		return NULL;
> > -	else
> > -		return ERR_PTR(ret);
> > +	return vma;
> >  }
> >  
> 
> <SNIP>
> 
> > @@ -432,13 +432,7 @@ bool
> >  i915_gem_object_pin_fence(struct drm_i915_gem_object *obj)
> >  {
> >  	if (obj->fence_reg != I915_FENCE_REG_NONE) {
> > -		struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
> > -		struct i915_vma *ggtt_vma = i915_gem_obj_to_ggtt(obj);
> > -
> > -		WARN_ON(!ggtt_vma ||
> > -			dev_priv->fence_regs[obj->fence_reg].pin_count >
> > -			i915_vma_pin_count(ggtt_vma));
> 
> Is this WARN_ON deliberately removed, not even worthy of a GEM_BUG_ON?

The pin_count check is not strictly true for all futures, and the
ggtt_vma can just explode if NULL until it too is replaced in a few
patches time.

> > -		dev_priv->fence_regs[obj->fence_reg].pin_count++;
> > +		to_i915(obj->base.dev)->fence_regs[obj->fence_reg].pin_count++;
> 
> This is not the most readable line of code one can imagine. dev_priv
> alias does make readability better occasionally.

This just makes another patch smaller. I've no qualms about this line
;)

> > @@ -3351,14 +3351,10 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
> >  
> >  	GEM_BUG_ON(vm->closed);
> >  
> > -	if (WARN_ON(i915_is_ggtt(vm) != !!view))
> > -		return ERR_PTR(-EINVAL);
> > -
> >  	vma = kmem_cache_zalloc(to_i915(obj->base.dev)->vmas, GFP_KERNEL);
> >  	if (vma == NULL)
> >  		return ERR_PTR(-ENOMEM);
> >  
> > -	INIT_LIST_HEAD(&vma->obj_link);
> 
> This disappears completely?

It was never required.
 
> > @@ -3378,56 +3373,76 @@ __i915_gem_vma_create(struct drm_i915_gem_object *obj,
> > +static inline bool vma_matches(struct i915_vma *vma,
> > +			       struct i915_address_space *vm,
> > +			       const struct i915_ggtt_view *view)
> 
> The function name is not the clearest, but I cannot suggest a better
> one off the top of my head.
>  
> >  static struct drm_i915_error_object *
> >  i915_error_object_create(struct drm_i915_private *dev_priv,
> > -			 struct drm_i915_gem_object *src,
> > -			 struct i915_address_space *vm)
> > +			 struct i915_vma *vma)
> >  {
> >  	struct i915_ggtt *ggtt = &dev_priv->ggtt;
> > +	struct drm_i915_gem_object *src;
> >  	struct drm_i915_error_object *dst;
> > -	struct i915_vma *vma = NULL;
> >  	int num_pages;
> >  	bool use_ggtt;
> >  	int i = 0;
> >  	u64 reloc_offset;
> >  
> > -	if (src == NULL || src->pages == NULL)
> > +	if (!vma)
> > +		return NULL;
> > +
> > +	src = vma->obj;
> > +	if (!src->pages)
> >  		return NULL;
> >  
> >  	num_pages = src->base.size >> PAGE_SHIFT;
> >  
> >  	dst = kmalloc(sizeof(*dst) + num_pages * sizeof(u32 *), GFP_ATOMIC);
> > -	if (dst == NULL)
> > +	if (!dst)
> >  		return NULL;
> >  
> > -	if (i915_gem_obj_bound(src, vm))
> > -		dst->gtt_offset = i915_gem_obj_offset(src, vm);
> > -	else
> > -		dst->gtt_offset = -1;
> 
> What was the purpose of this line previously? Wouldn't the
> calculations get majestically messed up?

No purpose. It would have taken quite a bit of abuse to be able to
trigger it, and certainly nothing relevant to the hang.
 
> > @@ -1035,11 +1029,8 @@ static void i915_gem_record_rings(struct drm_i915_private *dev_priv,
> >  	struct drm_i915_gem_request *request;
> >  	int i, count;
> >  
> > -	if (dev_priv->semaphore) {
> > -		error->semaphore =
> > -			i915_error_ggtt_object_create(dev_priv,
> > -						      dev_priv->semaphore->obj);
> > -	}
> > +	error->semaphore =
> > +		i915_error_object_create(dev_priv, dev_priv->semaphore);
> 
> Not sure I like hiding the NULL tolerance inside the function.

Otoh, it makes it a lot cleaner.
 
> > @@ -2291,7 +2293,8 @@ void intel_unpin_fb_obj(struct drm_framebuffer *fb, unsigned int rotation)
> >  	if (view.type == I915_GGTT_VIEW_NORMAL)
> >  		i915_gem_object_unpin_fence(obj);
> >  
> > -	i915_gem_object_unpin_from_display_plane(obj, &view);
> > +	vma = i915_gem_object_to_ggtt(obj, &view);
> 
> GEM_BUG_ON(!vma)?

The goal, and the original patch, was to pass in the vma to free. I gave
up tracking that patch through the ongoing atomic transition.

> This might be the only acceptable use of if () in a teardown path.
> 
> I hope there is an excellent revision listing in the next iteration.
> This was a pain to go through.

vN: Address some of Joonas's stylistic nitpicks.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 25/33] drm/i915: Use VMA for wa_ctx tracking
  2016-08-11 11:02     ` Chris Wilson
@ 2016-08-11 12:41       ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-11 12:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On to, 2016-08-11 at 12:02 +0100, Chris Wilson wrote:
> On Thu, Aug 11, 2016 at 01:53:40PM +0300, Joonas Lahtinen wrote:
> > 
> > On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > > 
> > > @@ -2019,9 +2023,9 @@ populate_lr_context(struct i915_gem_context *ctx,
> > >  			       RING_INDIRECT_CTX(engine->mmio_base), 0);
> > >  		ASSIGN_CTX_REG(reg_state, CTX_RCS_INDIRECT_CTX_OFFSET,
> > >  			       RING_INDIRECT_CTX_OFFSET(engine->mmio_base), 0);
> > > -		if (engine->wa_ctx.obj) {
> > > +		if (engine->wa_ctx.vma) {
> > >  			struct i915_ctx_workarounds *wa_ctx = &engine->wa_ctx;
> > > -			uint32_t ggtt_offset = i915_gem_obj_ggtt_offset(wa_ctx->obj);
> > > +			u32 ggtt_offset = wa_ctx->vma->node.start;
> > lower_32_bits()?
> I considered it, but didn't, to keep the changes to a minimum; plus I've a
> slight unease about making it seem like we don't care about the upper 32
> bits.
> 
> static inline u32 i915_ggtt_offset(vma)
> {
> 	GEM_BUG_ON(upper_32_bits(vma->node.start));
> 	return lower_32_bits(vma->node.start);
> }
> 

I was about to suggest this; it could maybe even take a u64, depending on
how the respective locations look after the VMA overhaul.
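Fleshing the proposal out slightly, a userspace sketch of such a helper (assert() standing in for GEM_BUG_ON(), a plain u64 standing in for vma->node.start, and the helper name chosen for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define upper_32_bits(n) ((uint32_t)((n) >> 32))
#define lower_32_bits(n) ((uint32_t)(n))

/* Sketch of the proposed i915_ggtt_offset(): return the GGTT offset as
 * a u32, but trip loudly if the address does not actually fit in 32
 * bits, instead of silently truncating. */
static inline uint32_t ggtt_offset_sketch(uint64_t node_start)
{
	assert(upper_32_bits(node_start) == 0);	/* GEM_BUG_ON stand-in */
	return lower_32_bits(node_start);
}
```

Whether it should take the vma or a raw u64 is the open question above; taking the vma would keep the truncation check next to the producer of these offsets.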

Regards, Joonas

> is possibly overkill but stops me feeling uneasy about the
> seeming truncation. Is this something that UBSAN detects?
> -Chris
> 
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 28/33] drm/i915: Move per-request pid from request to ctx
  2016-08-11 12:32   ` Joonas Lahtinen
@ 2016-08-11 12:41     ` Chris Wilson
  0 siblings, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-11 12:41 UTC (permalink / raw)
  To: Joonas Lahtinen; +Cc: intel-gfx

On Thu, Aug 11, 2016 at 03:32:18PM +0300, Joonas Lahtinen wrote:
> On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> > Since contexts are not currently shared between userspace processes, we
> 
> How about in the future?

I don't care too much (not at the cost of tracking a pid on every request)
if one process gives another process the entirety of its GPU memory and
state and then proceeds to go bang. The only situation I do care about
is passing an open render fd over AF_UNIX - and fortunately those always
use a context.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 15/33] drm/i915: Track pinned vma inside guc
  2016-08-07 14:45 ` [PATCH 15/33] drm/i915: Track pinned vma inside guc Chris Wilson
@ 2016-08-11 16:19   ` Dave Gordon
  2016-08-11 16:41     ` Chris Wilson
  0 siblings, 1 reply; 125+ messages in thread
From: Dave Gordon @ 2016-08-11 16:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 07/08/16 15:45, Chris Wilson wrote:
> Since the guc allocates and pins an object into the GGTT for its usage,
> it is more natural to use that pinned VMA as our resource cookie.

Well it isn't really any more natural, as we hardly ever care about the 
mapping, whereas we more frequently work with the object. So it just 
seems to introduce an unnecessary extra level of indirection as we go 
from vma to object to whatever we really want.

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c        |  10 +--
>  drivers/gpu/drm/i915/i915_guc_submission.c | 131 ++++++++++++++---------------
>  drivers/gpu/drm/i915/intel_guc.h           |   9 +-
>  drivers/gpu/drm/i915/intel_guc_loader.c    |   7 +-
>  4 files changed, 77 insertions(+), 80 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index b41c05767def..e2a9fc353ef3 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2524,15 +2524,15 @@ static int i915_guc_log_dump(struct seq_file *m, void *data)
>  	struct drm_info_node *node = m->private;
>  	struct drm_device *dev = node->minor->dev;
>  	struct drm_i915_private *dev_priv = to_i915(dev);
> -	struct drm_i915_gem_object *log_obj = dev_priv->guc.log_obj;
> -	u32 *log;
> +	struct drm_i915_gem_object *obj;

It is completely unnecessary (and undesirable) to rename this local. A 
variable called 'obj' could be any sort of object, but what we are 
dealing with *here* is a *specific* object that holds the pages of GuC 
log data, so it should have a name that tells us so.

>  	int i = 0, pg;
>
> -	if (!log_obj)
> +	if (!dev_priv->guc.log)
>  		return 0;
>
> -	for (pg = 0; pg < log_obj->base.size / PAGE_SIZE; pg++) {
> -		log = kmap_atomic(i915_gem_object_get_page(log_obj, pg));
> +	obj = dev_priv->guc.log->obj;
> +	for (pg = 0; pg < obj->base.size / PAGE_SIZE; pg++) {
> +		u32 *log = kmap_atomic(i915_gem_object_get_page(obj, pg));
>
>  		for (i = 0; i < PAGE_SIZE / sizeof(u32); i += 4)
>  			seq_printf(m, "0x%08x 0x%08x 0x%08x 0x%08x\n",
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 03a5cef353eb..f56d68173ae6 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -183,7 +183,7 @@ static int guc_update_doorbell_id(struct intel_guc *guc,
>  				  struct i915_guc_client *client,
>  				  u16 new_id)
>  {
> -	struct sg_table *sg = guc->ctx_pool_obj->pages;
> +	struct sg_table *sg = guc->ctx_pool->obj->pages;

Hi-ho, hi-ho, it's off to RAM we go.
Notice the extra '->'

>  	void *doorbell_bitmap = guc->doorbell_bitmap;
>  	struct guc_doorbell_info *doorbell;
>  	struct guc_context_desc desc;
> @@ -325,8 +325,8 @@ static void guc_init_proc_desc(struct intel_guc *guc,
>  static void guc_init_ctx_desc(struct intel_guc *guc,
>  			      struct i915_guc_client *client)
>  {
> -	struct drm_i915_gem_object *client_obj = client->client_obj;
>  	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> +	struct drm_i915_gem_object *client_obj = client->client->obj;

*Ugh*

>  	struct intel_engine_cs *engine;
>  	struct i915_gem_context *ctx = client->owner;
>  	struct guc_context_desc desc;
> @@ -380,7 +380,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
>  	 * The doorbell, process descriptor, and workqueue are all parts
>  	 * of the client object, which the GuC will reference via the GGTT
>  	 */
> -	gfx_addr = i915_gem_obj_ggtt_offset(client_obj);
> +	gfx_addr = client->client->node.start;

Insufficient abstraction.

If you want VMAs to be a primary sort of thing for code that isn't 
primarily about mappings to nonetheless work with, there should be an 
abstraction layer (macros or trivial inline accessors) to retrieve the 
things that code cares about from the 'VMA'.

	gfx_addr = i915_vma_ggtt_addr(vma);	// Or something

GuC code shouldn't have to mention 'node' or any other of the internals 
of a VMA or the underlying DRM memory-manager structure.
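A trivial accessor along those lines might look like the following. This is a sketch with stubbed-down structs; `i915_vma_ggtt_addr` is the hypothetical name suggested above, not an existing function:

```c
#include <stdint.h>

/* Stubbed-down stand-ins for struct drm_mm_node / struct i915_vma. */
struct drm_mm_node {
	uint64_t start;		/* offset of the allocation in the address space */
};

struct i915_vma {
	struct drm_mm_node node;
};

/*
 * Hide the drm_mm internals behind a one-line accessor so callers
 * such as the GuC code never mention vma->node directly.
 */
static inline uint64_t i915_vma_ggtt_addr(const struct i915_vma *vma)
{
	return vma->node.start;
}
```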

>  	desc.db_trigger_phy = sg_dma_address(client_obj->pages->sgl) +
>  				client->doorbell_offset;
>  	desc.db_trigger_cpu = (uintptr_t)client->client_base +
> @@ -397,7 +397,7 @@ static void guc_init_ctx_desc(struct intel_guc *guc,
>  	desc.desc_private = (uintptr_t)client;
>
>  	/* Pool context is pinned already */
> -	sg = guc->ctx_pool_obj->pages;
> +	sg = guc->ctx_pool->obj->pages;
>  	sg_pcopy_from_buffer(sg->sgl, sg->nents, &desc, sizeof(desc),
>  			     sizeof(desc) * client->ctx_index);
>  }
> @@ -410,7 +410,7 @@ static void guc_fini_ctx_desc(struct intel_guc *guc,
>
>  	memset(&desc, 0, sizeof(desc));
>
> -	sg = guc->ctx_pool_obj->pages;
> +	sg = guc->ctx_pool->obj->pages;
>  	sg_pcopy_from_buffer(sg->sgl, sg->nents, &desc, sizeof(desc),
>  			     sizeof(desc) * client->ctx_index);
>  }
> @@ -492,7 +492,7 @@ static void guc_add_workqueue_item(struct i915_guc_client *gc,
>  	/* WQ starts from the page after doorbell / process_desc */
>  	wq_page = (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT;
>  	wq_off &= PAGE_SIZE - 1;
> -	base = kmap_atomic(i915_gem_object_get_page(gc->client_obj, wq_page));
> +	base = kmap_atomic(i915_gem_object_get_page(gc->client->obj, wq_page));
>  	wqi = (struct guc_wq_item *)((char *)base + wq_off);
>
>  	/* Now fill in the 4-word work queue item */
> @@ -611,8 +611,8 @@ static void i915_guc_submit(struct drm_i915_gem_request *rq)
>   */
>
>  /**
> - * gem_allocate_guc_obj() - Allocate gem object for GuC usage
> - * @dev_priv:	driver private data structure
> + * guc_allocate_vma() - Allocate a GGTT VMA for GuC usage
> + * @guc:	the guc
>   * @size:	size of object
>   *
>   * This is a wrapper to create a gem obj. In order to use it inside GuC, the
> @@ -621,45 +621,49 @@ static void i915_guc_submit(struct drm_i915_gem_request *rq)
>   *
>   * Return:	A drm_i915_gem_object if successful, otherwise NULL.

This comment is no longer correct.

>   */
> -static struct drm_i915_gem_object *
> -gem_allocate_guc_obj(struct drm_i915_private *dev_priv, u32 size)
> +static struct i915_vma *guc_allocate_vma(struct intel_guc *guc, u32 size)
>  {
> +	struct drm_i915_private *dev_priv = guc_to_i915(guc);
>  	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
> +	int ret;
>
>  	obj = i915_gem_object_create(&dev_priv->drm, size);
>  	if (IS_ERR(obj))
> -		return NULL;
> +		return ERR_CAST(obj);
>
> -	if (i915_gem_object_get_pages(obj)) {
> -		i915_gem_object_put(obj);
> -		return NULL;
> -	}
> +	vma = i915_vma_create(obj, &dev_priv->ggtt.base, NULL);
> +	if (IS_ERR(vma))
> +		goto err;
>
> -	if (i915_gem_object_ggtt_pin(obj, NULL, 0, PAGE_SIZE,
> -				     PIN_OFFSET_BIAS | GUC_WOPCM_TOP)) {
> -		i915_gem_object_put(obj);
> -		return NULL;
> +	ret = i915_vma_pin(vma, 0, PAGE_SIZE,
> +			   PIN_GLOBAL | PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
> +	if (ret) {
> +		vma = ERR_PTR(ret);
> +		goto err;
>  	}
>
>  	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
>  	I915_WRITE(GEN8_GTCR, GEN8_GTCR_INVALIDATE);
>
> -	return obj;
> +	return vma;
> +
> +err:
> +	i915_gem_object_put(obj);
> +	return vma;
>  }
>
>  /**
> - * gem_release_guc_obj() - Release gem object allocated for GuC usage
> - * @obj:	gem obj to be released
> + * guc_release_vma() - Release gem object allocated for GuC usage
> + * @vma:	gem obj to be released
>   */
> -static void gem_release_guc_obj(struct drm_i915_gem_object *obj)
> +static void guc_release_vma(struct i915_vma *vma)
>  {
> -	if (!obj)
> +	if (!vma)
>  		return;
>
> -	if (i915_gem_obj_is_pinned(obj))
> -		i915_gem_object_ggtt_unpin(obj);
> -
> -	i915_gem_object_put(obj);
> +	i915_vma_unpin(vma);
> +	i915_gem_object_put(vma->obj);
>  }
>
>  static void
> @@ -686,7 +690,7 @@ guc_client_free(struct drm_i915_private *dev_priv,
>  		kunmap(kmap_to_page(client->client_base));
>  	}
>
> -	gem_release_guc_obj(client->client_obj);
> +	guc_release_vma(client->client);
>
>  	if (client->ctx_index != GUC_INVALID_CTX_ID) {
>  		guc_fini_ctx_desc(guc, client);
> @@ -757,7 +761,7 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
>  {
>  	struct i915_guc_client *client;
>  	struct intel_guc *guc = &dev_priv->guc;
> -	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
>  	uint16_t db_id;
>
>  	client = kzalloc(sizeof(*client), GFP_KERNEL);
> @@ -777,13 +781,13 @@ guc_client_alloc(struct drm_i915_private *dev_priv,
>  	}
>
>  	/* The first page is doorbell/proc_desc. Two followed pages are wq. */
> -	obj = gem_allocate_guc_obj(dev_priv, GUC_DB_SIZE + GUC_WQ_SIZE);
> -	if (!obj)
> +	vma = guc_allocate_vma(guc, GUC_DB_SIZE + GUC_WQ_SIZE);
> +	if (IS_ERR(vma))
>  		goto err;
>
>  	/* We'll keep just the first (doorbell/proc) page permanently kmap'd. */
> -	client->client_obj = obj;
> -	client->client_base = kmap(i915_gem_object_get_page(obj, 0));
> +	client->client = vma;
> +	client->client_base = kmap(i915_gem_object_get_page(vma->obj, 0));
>  	client->wq_offset = GUC_DB_SIZE;
>  	client->wq_size = GUC_WQ_SIZE;
>
> @@ -825,8 +829,7 @@ err:
>
>  static void guc_create_log(struct intel_guc *guc)
>  {
> -	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> -	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
>  	unsigned long offset;
>  	uint32_t size, flags;
>
> @@ -842,16 +845,16 @@ static void guc_create_log(struct intel_guc *guc)
>  		GUC_LOG_ISR_PAGES + 1 +
>  		GUC_LOG_CRASH_PAGES + 1) << PAGE_SHIFT;
>
> -	obj = guc->log_obj;
> -	if (!obj) {
> -		obj = gem_allocate_guc_obj(dev_priv, size);
> -		if (!obj) {
> +	vma = guc->log;
> +	if (!vma) {
> +		vma = guc_allocate_vma(guc, size);
> +		if (IS_ERR(vma)) {
>  			/* logging will be off */
>  			i915.guc_log_level = -1;
>  			return;
>  		}
>
> -		guc->log_obj = obj;
> +		guc->log = vma;
>  	}
>
>  	/* each allocated unit is a page */
> @@ -860,7 +863,7 @@ static void guc_create_log(struct intel_guc *guc)
>  		(GUC_LOG_ISR_PAGES << GUC_LOG_ISR_SHIFT) |
>  		(GUC_LOG_CRASH_PAGES << GUC_LOG_CRASH_SHIFT);
>
> -	offset = i915_gem_obj_ggtt_offset(obj) >> PAGE_SHIFT; /* in pages */
> +	offset = vma->node.start >> PAGE_SHIFT; /* in pages */
>  	guc->log_flags = (offset << GUC_LOG_BUF_ADDR_SHIFT) | flags;
>  }
>
> @@ -889,7 +892,7 @@ static void init_guc_policies(struct guc_policies *policies)
>  static void guc_create_ads(struct intel_guc *guc)
>  {
>  	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> -	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
>  	struct guc_ads *ads;
>  	struct guc_policies *policies;
>  	struct guc_mmio_reg_state *reg_state;
> @@ -902,16 +905,16 @@ static void guc_create_ads(struct intel_guc *guc)
>  			sizeof(struct guc_mmio_reg_state) +
>  			GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE;
>
> -	obj = guc->ads_obj;
> -	if (!obj) {
> -		obj = gem_allocate_guc_obj(dev_priv, PAGE_ALIGN(size));
> -		if (!obj)
> +	vma = guc->ads;
> +	if (!vma) {
> +		vma = guc_allocate_vma(guc, PAGE_ALIGN(size));
> +		if (IS_ERR(vma))
>  			return;
>
> -		guc->ads_obj = obj;
> +		guc->ads = vma;
>  	}
>
> -	page = i915_gem_object_get_page(obj, 0);
> +	page = i915_gem_object_get_page(vma->obj, 0);
>  	ads = kmap(page);

Changing the names & types in the top-level structure leads to confusion 
here, as the member 'guc->ads' and the existing local 'ads' now have the 
same name but quite different types.

>  	/*
> @@ -931,8 +934,7 @@ static void guc_create_ads(struct intel_guc *guc)
>  	policies = (void *)ads + sizeof(struct guc_ads);
>  	init_guc_policies(policies);
>
> -	ads->scheduler_policies = i915_gem_obj_ggtt_offset(obj) +
> -			sizeof(struct guc_ads);
> +	ads->scheduler_policies = vma->node.start + sizeof(struct guc_ads);
>
>  	/* MMIO reg state */
>  	reg_state = (void *)policies + sizeof(struct guc_policies);
> @@ -960,10 +962,9 @@ static void guc_create_ads(struct intel_guc *guc)
>   */
>  int i915_guc_submission_init(struct drm_i915_private *dev_priv)
>  {
> -	const size_t ctxsize = sizeof(struct guc_context_desc);
> -	const size_t poolsize = GUC_MAX_GPU_CONTEXTS * ctxsize;
> -	const size_t gemsize = round_up(poolsize, PAGE_SIZE);
>  	struct intel_guc *guc = &dev_priv->guc;
> +	struct i915_vma *vma;
> +	u32 size;
>
>  	/* Wipe bitmap & delete client in case of reinitialisation */
>  	bitmap_clear(guc->doorbell_bitmap, 0, GUC_MAX_DOORBELLS);
> @@ -972,13 +973,15 @@ int i915_guc_submission_init(struct drm_i915_private *dev_priv)
>  	if (!i915.enable_guc_submission)
>  		return 0; /* not enabled  */
>
> -	if (guc->ctx_pool_obj)
> +	if (guc->ctx_pool)
>  		return 0; /* already allocated */
>
> -	guc->ctx_pool_obj = gem_allocate_guc_obj(dev_priv, gemsize);
> -	if (!guc->ctx_pool_obj)
> -		return -ENOMEM;
> +	size = PAGE_ALIGN(GUC_MAX_GPU_CONTEXTS*sizeof(struct guc_context_desc));

What a long ugly line :(

Breaking it into the 'const's at the top of the function made it easier 
to follow the stages of the calculation AND was at least as efficient, 
as the compiler folded the whole calculation into a single constant in 
the [deleted] call to gem_allocate_guc_obj() above.

> +	vma = guc_allocate_vma(guc, size);
> +	if (IS_ERR(vma))
> +		return PTR_ERR(vma);
>
> +	guc->ctx_pool = vma;
>  	ida_init(&guc->ctx_ids);
>  	guc_create_log(guc);
>  	guc_create_ads(guc);
> @@ -1030,16 +1033,12 @@ void i915_guc_submission_fini(struct drm_i915_private *dev_priv)
>  {
>  	struct intel_guc *guc = &dev_priv->guc;
>
> -	gem_release_guc_obj(dev_priv->guc.ads_obj);
> -	guc->ads_obj = NULL;
> -
> -	gem_release_guc_obj(dev_priv->guc.log_obj);
> -	guc->log_obj = NULL;
> +	guc_release_vma(nullify(&guc->ads));
> +	guc_release_vma(nullify(&guc->log));

I think this is a very ugly way of hiding the clearing of the pointers.
If you want to manage references like this, it could *possibly* be a macro:

	guc_release_vma(ZAP_AFTER_USE(guc->log));

*without* the '&' so the argument has to be an lvalue; or it could more 
clearly be done by having the releasing function take a pointer to the 
pointer-to-object, which it would clear after releasing the object and 
before returning to the caller.

	guc_release_vma_ref(&guc->ads);	// also clears guc->ads

But I think even that is not as good as explicitly clearing the pointer 
immediately after the call to release(). If K&R had thought having a way 
to implicitly clear a pointer after using it was a good idea, they'd 
have put it into the language:

	struct foo *new = *saved!;	// Dereference & clear 'saved'

But they didn't, so we probably shouldn't invent one.
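The pointer-to-pointer variant described above can be sketched as self-contained C. The i915 types are stubbed out here, and `guc_release_vma_ref` is the name proposed in this review, not an existing function:

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for struct i915_vma. */
struct i915_vma {
	int pinned;
};

/* Stand-in for the existing release helper: unpin and drop the object. */
static void guc_release_vma(struct i915_vma *vma)
{
	if (!vma)
		return;
	vma->pinned = 0;
	free(vma);
}

/*
 * Release the vma *and* clear the caller's pointer, making the
 * "use once, then NULL" intent explicit in the function contract
 * instead of hiding it inside a nullify() helper at the call site.
 */
static void guc_release_vma_ref(struct i915_vma **slot)
{
	struct i915_vma *vma = *slot;

	*slot = NULL;
	guc_release_vma(vma);
}
```

Called as `guc_release_vma_ref(&guc->ads);` the slot is guaranteed NULL on return, so a repeated release of the same slot degrades to a harmless no-op.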

> -	if (guc->ctx_pool_obj)
> +	if (guc->ctx_pool)
>  		ida_destroy(&guc->ctx_ids);
> -	gem_release_guc_obj(guc->ctx_pool_obj);
> -	guc->ctx_pool_obj = NULL;
> +	guc_release_vma(nullify(&guc->ctx_pool));
>  }
>
>  /**
> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> index 623cf26cd784..a8da563cadb7 100644
> --- a/drivers/gpu/drm/i915/intel_guc.h
> +++ b/drivers/gpu/drm/i915/intel_guc.h
> @@ -63,7 +63,7 @@ struct drm_i915_gem_request;
>   *   retcode: errno from last guc_submit()
>   */
>  struct i915_guc_client {
> -	struct drm_i915_gem_object *client_obj;
> +	struct i915_vma *client;

We can't call this vma 'client' because that is the name commonly used 
for an instance of the i915_guc_client class. x->client->client->y is 
just horrible. You could call it 'client_vma', I suppose.

>  	void *client_base;		/* first page (only) of above	*/
>  	struct i915_gem_context *owner;
>  	struct intel_guc *guc;
> @@ -125,11 +125,10 @@ struct intel_guc_fw {
>  struct intel_guc {
>  	struct intel_guc_fw guc_fw;
>  	uint32_t log_flags;
> -	struct drm_i915_gem_object *log_obj;
> +	struct i915_vma *log;

Changing the name to 'log_vma' would be better, since I'd expect 
something called just 'log' to actually BE a log -- or at most a pointer 
to a log -- not just a pointer to something containing a pointer to 
another thing that contains a pointer to a list of pages that eventually 
hold the log data.

Ditto for the other names below.

> -	struct drm_i915_gem_object *ads_obj;
> -
> -	struct drm_i915_gem_object *ctx_pool_obj;
> +	struct i915_vma *ads;
> +	struct i915_vma *ctx_pool;
>  	struct ida ctx_ids;
>
>  	struct i915_guc_client *execbuf_client;
> diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
> index 3763e30cc165..58ef4418a2ef 100644
> --- a/drivers/gpu/drm/i915/intel_guc_loader.c
> +++ b/drivers/gpu/drm/i915/intel_guc_loader.c
> @@ -181,16 +181,15 @@ static void set_guc_init_params(struct drm_i915_private *dev_priv)
>  			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
>  	}
>
> -	if (guc->ads_obj) {
> -		u32 ads = (u32)i915_gem_obj_ggtt_offset(guc->ads_obj)
> -				>> PAGE_SHIFT;
> +	if (guc->ads) {
> +		u32 ads = (u32)guc->ads->node.start >> PAGE_SHIFT;
>  		params[GUC_CTL_DEBUG] |= ads << GUC_ADS_ADDR_SHIFT;
>  		params[GUC_CTL_DEBUG] |= GUC_ADS_ENABLED;
>  	}
>
>  	/* If GuC submission is enabled, set up additional parameters here */
>  	if (i915.enable_guc_submission) {
> -		u32 pgs = i915_gem_obj_ggtt_offset(dev_priv->guc.ctx_pool_obj);
> +		u32 pgs = dev_priv->guc.ctx_pool->node.start;
>  		u32 ctx_in_16 = GUC_MAX_GPU_CONTEXTS / 16;
>
>  		pgs >>= PAGE_SHIFT;

Summary: I'm not totally opposed to using VMAs more generally, but here 
there just seem to be extra costs with no offsetting advantages; and the 
details of some of the above changes are just plain ugly.

If the naming and abstraction issues were resolved, the remaining 
conversions would not in themselves be too objectionable, because either
* the extra cycles don't matter (in rarely executed code), or
* we can add extra direct pointers or other cached values in the 
top-level data structures to avoid deep memory chains where necessary.

.Dave.

* Re: [PATCH 15/33] drm/i915: Track pinned vma inside guc
  2016-08-11 16:19   ` Dave Gordon
@ 2016-08-11 16:41     ` Chris Wilson
  0 siblings, 0 replies; 125+ messages in thread
From: Chris Wilson @ 2016-08-11 16:41 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Thu, Aug 11, 2016 at 05:19:43PM +0100, Dave Gordon wrote:
> Summary: I'm not totally opposed to using VMAs more generally, but
> here there just seem to be extra costs with no offsetting
> advantages; and the details of some of the above changes are just
> plain ugly.

Shrug. Within i915_guc_submission you are primarily allocating GGTT space.
If you didn't mean to be, then address the code.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance
  2016-08-10 11:00                     ` Joonas Lahtinen
@ 2016-08-12  9:50                       ` Joonas Lahtinen
  0 siblings, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-12  9:50 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson, intel-gfx, Daniel Vetter

On ke, 2016-08-10 at 14:00 +0300, Joonas Lahtinen wrote:
> 
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> 
> I still think it's fragile, though. But lets see once the dust settles
> if we can make improvements.
> 

Daniel pointed out that the engine_id could still be different during
the middle section where the engine_id is captured, if the request is
briefly reused.

So I'm backing off on the Reviewed-by: either we handle the possibly
wrong engine_id (no extra tests, so we might actually hit it in
testing) or we avoid it completely (with locking).

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

* Re: [PATCH 04/33] drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh
  2016-08-07 14:45 ` [PATCH 04/33] drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh Chris Wilson
  2016-08-08  9:33   ` Daniel Vetter
@ 2016-08-12  9:56   ` Joonas Lahtinen
  1 sibling, 0 replies; 125+ messages in thread
From: Joonas Lahtinen @ 2016-08-12  9:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter

On su, 2016-08-07 at 15:45 +0100, Chris Wilson wrote:
> @@ -596,11 +594,9 @@ unsigned int intel_kick_waiters(struct drm_i915_private *i915)
>  	 * RCU lock, i.e. as we call wake_up_process() we must be holding the
>  	 * rcu_read_lock().
>  	 */

Comment is not relevant to this piece of code anymore?

Looks like more proper RCU code;

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

> -	rcu_read_lock();
>  	for_each_engine(engine, i915)
>  		if (unlikely(intel_engine_wakeup(engine)))
>  			mask |= intel_engine_flag(engine);
> -	rcu_read_unlock();
>  
>  	return mask;
>  }
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation

end of thread, other threads:[~2016-08-12  9:56 UTC | newest]

Thread overview: 125+ messages
2016-08-07 14:45 First class VMA, take 2 Chris Wilson
2016-08-07 14:45 ` [PATCH 01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance Chris Wilson
2016-08-08  9:12   ` Daniel Vetter
2016-08-08  9:30     ` Chris Wilson
2016-08-08  9:45       ` Chris Wilson
2016-08-09  6:36         ` Joonas Lahtinen
2016-08-09  7:14           ` Chris Wilson
2016-08-09  8:48             ` Joonas Lahtinen
2016-08-09  9:05               ` Chris Wilson
2016-08-10 10:12                 ` Daniel Vetter
2016-08-10 10:13                   ` Daniel Vetter
2016-08-10 11:00                     ` Joonas Lahtinen
2016-08-12  9:50                       ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 02/33] drm/i915: Do not overwrite the request with zero on reallocation Chris Wilson
2016-08-08  9:25   ` Daniel Vetter
2016-08-08  9:56     ` Chris Wilson
2016-08-09  6:32       ` Daniel Vetter
2016-08-07 14:45 ` [PATCH 03/33] drm/i915: Move missed interrupt detection from hangcheck to breadcrumbs Chris Wilson
2016-08-09 14:08   ` [PATCH v2] " Chris Wilson
2016-08-09 14:10   ` [PATCH v3] " Chris Wilson
2016-08-09 15:24     ` Mika Kuoppala
2016-08-07 14:45 ` [PATCH 04/33] drm/i915: Use RCU to annotate and enforce protection for breadcrumb's bh Chris Wilson
2016-08-08  9:33   ` Daniel Vetter
2016-08-12  9:56   ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 05/33] drm/i915: Reduce amount of duplicate buffer information captured on error Chris Wilson
2016-08-10  7:04   ` Joonas Lahtinen
2016-08-10  7:15     ` Chris Wilson
2016-08-10  8:07       ` Joonas Lahtinen
2016-08-10  8:36         ` Chris Wilson
2016-08-10 10:51           ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 06/33] drm/i915: Stop the machine whilst capturing the GPU crash dump Chris Wilson
2016-08-07 14:45 ` [PATCH 07/33] drm/i915: Store the active context object on all engines upon error Chris Wilson
2016-08-09  9:02   ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 08/33] drm/i915: Move setting of request->batch into its single callsite Chris Wilson
2016-08-09 15:53   ` Mika Kuoppala
2016-08-09 16:04     ` Chris Wilson
2016-08-10  7:19   ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 09/33] drm/i915: Mark unmappable GGTT entries as PIN_HIGH Chris Wilson
2016-08-08  9:09   ` Joonas Lahtinen
2016-08-09 11:05   ` Tvrtko Ursulin
2016-08-09 11:13     ` Chris Wilson
2016-08-09 11:20       ` Chris Wilson
2016-08-07 14:45 ` [PATCH 10/33] drm/i915: Remove inactive/active list from debugfs Chris Wilson
2016-08-09 10:29   ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 11/33] drm/i915: Focus debugfs/i915_gem_pinned to show only display pins Chris Wilson
2016-08-09 10:39   ` Joonas Lahtinen
2016-08-09 10:46     ` Chris Wilson
2016-08-09 11:32       ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 12/33] drm/i915: Reduce i915_gem_objects to only show object information Chris Wilson
2016-08-10  7:29   ` Joonas Lahtinen
2016-08-10  7:38     ` Chris Wilson
2016-08-10  8:10       ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 13/33] drm/i915: Remove redundant WARN_ON from __i915_add_request() Chris Wilson
2016-08-08  9:03   ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 14/33] drm/i915: Create a VMA for an object Chris Wilson
2016-08-08  9:01   ` Joonas Lahtinen
2016-08-08  9:09     ` Chris Wilson
2016-08-10 10:58       ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 15/33] drm/i915: Track pinned vma inside guc Chris Wilson
2016-08-11 16:19   ` Dave Gordon
2016-08-11 16:41     ` Chris Wilson
2016-08-07 14:45 ` [PATCH 16/33] drm/i915: Convert fence computations to use vma directly Chris Wilson
2016-08-09 10:27   ` Joonas Lahtinen
2016-08-09 10:33     ` Chris Wilson
2016-08-07 14:45 ` [PATCH 17/33] drm/i915: Use VMA directly for checking tiling parameters Chris Wilson
2016-08-09  6:18   ` Joonas Lahtinen
2016-08-09  8:03     ` Chris Wilson
2016-08-07 14:45 ` [PATCH 18/33] drm/i915: Use VMA as the primary object for context state Chris Wilson
2016-08-10  8:03   ` Joonas Lahtinen
2016-08-10  8:25     ` Chris Wilson
2016-08-10 10:54       ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 19/33] drm/i915: Only clflush the context object when binding Chris Wilson
2016-08-10  8:41   ` Joonas Lahtinen
2016-08-10  9:02     ` Chris Wilson
2016-08-10 10:50       ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 20/33] drm/i915: Use VMA for ringbuffer tracking Chris Wilson
2016-08-11  9:32   ` Joonas Lahtinen
2016-08-11  9:58     ` Chris Wilson
2016-08-07 14:45 ` [PATCH 21/33] drm/i915: Use VMA for scratch page tracking Chris Wilson
2016-08-08  8:00   ` [PATCH 1/3] " Chris Wilson
2016-08-08  8:00     ` [PATCH 2/3] drm/i915: Move common scratch allocation/destroy to intel_engine_cs.c Chris Wilson
2016-08-08  9:24       ` Matthew Auld
2016-08-08  8:00     ` [PATCH 3/3] drm/i915: Move common seqno reset " Chris Wilson
2016-08-08  9:40       ` Matthew Auld
2016-08-08 10:15         ` Chris Wilson
2016-08-08 15:34           ` Matthew Auld
2016-08-11 10:06   ` [PATCH 21/33] drm/i915: Use VMA for scratch page tracking Joonas Lahtinen
2016-08-11 10:22     ` Chris Wilson
2016-08-07 14:45 ` [PATCH 22/33] drm/i915/overlay: Use VMA as the primary tracker for images Chris Wilson
2016-08-11 10:17   ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 23/33] drm/i915: Use VMA as the primary tracker for semaphore page Chris Wilson
2016-08-11 10:42   ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 24/33] drm/i915: Use VMA for render state page tracking Chris Wilson
2016-08-11 10:46   ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 25/33] drm/i915: Use VMA for wa_ctx tracking Chris Wilson
2016-08-11 10:53   ` Joonas Lahtinen
2016-08-11 11:02     ` Chris Wilson
2016-08-11 12:41       ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 26/33] drm/i915: Track pinned VMA Chris Wilson
2016-08-11 12:18   ` Joonas Lahtinen
2016-08-11 12:37     ` Chris Wilson
2016-08-07 14:45 ` [PATCH 27/33] drm/i915: Print the batchbuffer offset next to BBADDR in error state Chris Wilson
2016-08-11 12:24   ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 28/33] drm/i915: Move per-request pid from request to ctx Chris Wilson
2016-08-11 12:32   ` Joonas Lahtinen
2016-08-11 12:41     ` Chris Wilson
2016-08-07 14:45 ` [PATCH 29/33] drm/i915: Only record active and pending requests upon a GPU hang Chris Wilson
2016-08-11 12:36   ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 30/33] drm/i915: Record the RING_MODE register for post-mortem debugging Chris Wilson
2016-08-08 11:35   ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 31/33] drm/i915: Always use the GTT for error capture Chris Wilson
2016-08-07 14:45 ` [PATCH 32/33] drm/i915: Consolidate error object printing Chris Wilson
2016-08-09 11:44   ` Joonas Lahtinen
2016-08-09 11:53     ` Chris Wilson
2016-08-10 10:55       ` Joonas Lahtinen
2016-08-07 14:45 ` [PATCH 33/33] drm/i915: Compress GPU objects in error state Chris Wilson
2016-08-10 10:32   ` Joonas Lahtinen
2016-08-10 10:52     ` Chris Wilson
2016-08-10 11:26       ` Joonas Lahtinen
2016-08-07 15:16 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance Patchwork
2016-08-08  9:46 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev4) Patchwork
2016-08-08 10:34 ` ✗ Fi.CI.BAT: " Patchwork
2016-08-09 14:10 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev5) Patchwork
2016-08-09 14:20 ` ✗ Ro.CI.BAT: failure for series starting with [01/33] drm/i915: Add smp_rmb() to busy ioctl's RCU dance (rev6) Patchwork
2016-08-10  6:43 ` Patchwork
