* [Intel-gfx] Breadcrumbs fixes and stall avoidance
@ 2020-07-30  9:37 Chris Wilson
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom

All the breadcrumb fixes and low-hanging fruit from staring at perf and
lockstat for too long, aimed at minimising the delays during submission.
-Chris



* [Intel-gfx] [PATCH 01/21] drm/i915: Add a couple of missing i915_active_fini()
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

We use i915_active_fini() as a debug check on the i915_active state
before freeing. If we forget to call it, we may end up angering the
debugobjects contained within.
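
As a sketch of the idiom (the "foo" structure and its release callback
here are hypothetical, purely for illustration), the fini call belongs
in the final release path, immediately before the memory is returned:

	struct foo {
		struct kref kref;
		struct i915_active active;
	};

	static void foo_release(struct kref *kref)
	{
		struct foo *f = container_of(kref, typeof(*f), kref);

		i915_active_fini(&f->active); /* runs the debugobjects checks */
		kfree(f);
	}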

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
---
 drivers/gpu/drm/i915/display/intel_frontbuffer.c    | 2 ++
 drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/display/intel_frontbuffer.c b/drivers/gpu/drm/i915/display/intel_frontbuffer.c
index 2979ed2588eb..d898b370d7a4 100644
--- a/drivers/gpu/drm/i915/display/intel_frontbuffer.c
+++ b/drivers/gpu/drm/i915/display/intel_frontbuffer.c
@@ -232,6 +232,8 @@ static void frontbuffer_release(struct kref *ref)
 	RCU_INIT_POINTER(obj->frontbuffer, NULL);
 	spin_unlock(&to_i915(obj->base.dev)->fb_tracking.lock);
 
+	i915_active_fini(&front->write);
+
 	i915_gem_object_put(obj);
 	kfree_rcu(front, rcu);
 }
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index 73243ba59c7d..e73854dd2fe0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -47,7 +47,10 @@ static int pulse_active(struct i915_active *active)
 
 static void pulse_free(struct kref *kref)
 {
-	kfree(container_of(kref, struct pulse, kref));
+	struct pulse *p = container_of(kref, typeof(*p), kref);
+
+	i915_active_fini(&p->active);
+	kfree(p);
 }
 
 static void pulse_put(struct pulse *p)
-- 
2.20.1


* [Intel-gfx] [PATCH 02/21] drm/i915: Skip taking acquire mutex for no ref->active callback
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

If no active callback is defined for an i915_active, we do not need to
serialise its enabling with the mutex. We still want to perform the
debug activation only once, and must still serialise with a concurrent
retire.
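
For readability, the acquire path after this patch reads roughly as
follows (reconstructed from the diff below; not verbatim source):

	int i915_active_acquire(struct i915_active *ref)
	{
		int err;

		if (i915_active_acquire_if_busy(ref))
			return 0; /* already active: atomic fast path */

		if (!ref->active) { /* no callback, no mutex required */
			__i915_active_activate(ref);
			return 0;
		}

		err = mutex_lock_interruptible(&ref->mutex);
		if (err)
			return err;

		if (likely(!i915_active_acquire_if_busy(ref))) {
			err = ref->active(ref);
			if (!err)
				__i915_active_activate(ref);
		}

		mutex_unlock(&ref->mutex);
		return err;
	}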

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
---
 drivers/gpu/drm/i915/i915_active.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index d960d0be5bd2..500537889e66 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -416,6 +416,14 @@ bool i915_active_acquire_if_busy(struct i915_active *ref)
 	return atomic_add_unless(&ref->count, 1, 0);
 }
 
+static void __i915_active_activate(struct i915_active *ref)
+{
+	spin_lock_irq(&ref->tree_lock); /* __active_retire() */
+	if (!atomic_fetch_inc(&ref->count))
+		debug_active_activate(ref);
+	spin_unlock_irq(&ref->tree_lock);
+}
+
 int i915_active_acquire(struct i915_active *ref)
 {
 	int err;
@@ -423,19 +431,19 @@ int i915_active_acquire(struct i915_active *ref)
 	if (i915_active_acquire_if_busy(ref))
 		return 0;
 
+	if (!ref->active) {
+		__i915_active_activate(ref);
+		return 0;
+	}
+
 	err = mutex_lock_interruptible(&ref->mutex);
 	if (err)
 		return err;
 
 	if (likely(!i915_active_acquire_if_busy(ref))) {
-		if (ref->active)
-			err = ref->active(ref);
-		if (!err) {
-			spin_lock_irq(&ref->tree_lock); /* __active_retire() */
-			debug_active_activate(ref);
-			atomic_inc(&ref->count);
-			spin_unlock_irq(&ref->tree_lock);
-		}
+		err = ref->active(ref);
+		if (!err)
+			__i915_active_activate(ref);
 	}
 
 	mutex_unlock(&ref->mutex);
-- 
2.20.1


* [Intel-gfx] [PATCH 03/21] drm/i915: Export a preallocate variant of i915_active_acquire()
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Sometimes we have to be very careful not to allocate underneath a mutex
(or spinlock) and yet still want to track activity. Enter
i915_active_acquire_for_context(). This raises the activity counter on
i915_active prior to use and ensures that the fence-tree contains a slot
for the context.

v2: Refactor active_lookup() so it can be called again before/after
locking to resolve contention. Since we protect the rbtree until we
idle, we can do a lock-free lookup, with the caveat that if another
thread performs a concurrent insertion, the rotations from the insert
may cause us to miss our target. A second pass holding the treelock
will find the target if it exists, or the place to perform our
insertion.
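
In code, that two-pass scheme is (condensed from __active_fence() in
the diff below):

	it = __active_lookup(ref, idx);	/* lock-free rbtree descent */
	if (unlikely(!it)) {
		/* A concurrent insert may have rotated our target away */
		spin_lock_irq(&ref->tree_lock);
		it = __active_lookup(ref, idx);	/* stable under the lock */
		spin_unlock_irq(&ref->tree_lock);
	}
	GEM_BUG_ON(!it); /* the slot must have been preallocated */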

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
 drivers/gpu/drm/i915/gt/intel_timeline.c      |   4 +-
 drivers/gpu/drm/i915/i915_active.c            | 150 ++++++++++++++----
 drivers/gpu/drm/i915/i915_active.h            |  12 +-
 4 files changed, 130 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index b7a86cdec9b5..07cb2dd0f795 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1729,7 +1729,7 @@ __parser_mark_active(struct i915_vma *vma,
 {
 	struct intel_gt_buffer_pool_node *node = vma->private;
 
-	return i915_active_ref(&node->active, tl, fence);
+	return i915_active_ref(&node->active, tl->fence_context, fence);
 }
 
 static int
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 46d20f5f3ddc..acb43aebd669 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -484,7 +484,9 @@ __intel_timeline_get_seqno(struct intel_timeline *tl,
 	 * free it after the current request is retired, which ensures that
 	 * all writes into the cacheline from previous requests are complete.
 	 */
-	err = i915_active_ref(&tl->hwsp_cacheline->active, tl, &rq->fence);
+	err = i915_active_ref(&tl->hwsp_cacheline->active,
+			      tl->fence_context,
+			      &rq->fence);
 	if (err)
 		goto err_cacheline;
 
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 500537889e66..3a728401c09c 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -28,12 +28,14 @@ static struct i915_global_active {
 } global;
 
 struct active_node {
+	struct rb_node node;
 	struct i915_active_fence base;
 	struct i915_active *ref;
-	struct rb_node node;
 	u64 timeline;
 };
 
+#define fetch_node(x) rb_entry(READ_ONCE(x), typeof(struct active_node), node)
+
 static inline struct active_node *
 node_from_active(struct i915_active_fence *active)
 {
@@ -216,12 +218,9 @@ excl_retire(struct dma_fence *fence, struct dma_fence_cb *cb)
 		active_retire(container_of(cb, struct i915_active, excl.cb));
 }
 
-static struct i915_active_fence *
-active_instance(struct i915_active *ref, struct intel_timeline *tl)
+static struct active_node *__active_lookup(struct i915_active *ref, u64 idx)
 {
-	struct active_node *node, *prealloc;
-	struct rb_node **p, *parent;
-	u64 idx = tl->fence_context;
+	struct active_node *it;
 
 	/*
 	 * We track the most recently used timeline to skip a rbtree search
@@ -230,8 +229,39 @@ active_instance(struct i915_active *ref, struct intel_timeline *tl)
 	 * after the previous activity has been retired, or if it matches the
 	 * current timeline.
 	 */
-	node = READ_ONCE(ref->cache);
-	if (node && node->timeline == idx)
+	it = READ_ONCE(ref->cache);
+	if (it && it->timeline == idx)
+		return it;
+
+	BUILD_BUG_ON(offsetof(typeof(*it), node));
+
+	/* While active, the tree can only be built; not destroyed */
+	GEM_BUG_ON(i915_active_is_idle(ref));
+
+	it = fetch_node(ref->tree.rb_node);
+	while (it) {
+		if (it->timeline < idx) {
+			it = fetch_node(it->node.rb_right);
+		} else if (it->timeline > idx) {
+			it = fetch_node(it->node.rb_left);
+		} else {
+			WRITE_ONCE(ref->cache, it);
+			break;
+		}
+	}
+
+	/* NB: If the tree rotated beneath us, we may miss our target. */
+	return it;
+}
+
+static struct i915_active_fence *
+active_instance(struct i915_active *ref, u64 idx)
+{
+	struct active_node *node, *prealloc;
+	struct rb_node **p, *parent;
+
+	node = __active_lookup(ref, idx);
+	if (likely(node))
 		return &node->base;
 
 	/* Preallocate a replacement, just in case */
@@ -268,10 +298,9 @@ active_instance(struct i915_active *ref, struct intel_timeline *tl)
 	rb_insert_color(&node->node, &ref->tree);
 
 out:
-	ref->cache = node;
+	WRITE_ONCE(ref->cache, node);
 	spin_unlock_irq(&ref->tree_lock);
 
-	BUILD_BUG_ON(offsetof(typeof(*node), base));
 	return &node->base;
 }
 
@@ -353,63 +382,102 @@ __active_del_barrier(struct i915_active *ref, struct active_node *node)
 	return ____active_del_barrier(ref, node, barrier_to_engine(node));
 }
 
-int i915_active_ref(struct i915_active *ref,
-		    struct intel_timeline *tl,
-		    struct dma_fence *fence)
+static bool
+replace_barrier(struct i915_active *ref, struct i915_active_fence *active)
+{
+	if (!is_barrier(active)) /* proto-node used by our idle barrier? */
+		return false;
+
+	/*
+	 * This request is on the kernel_context timeline, and so
+	 * we can use it to substitute for the pending idle-barrier
+	 * request that we want to emit on the kernel_context.
+	 */
+	__active_del_barrier(ref, node_from_active(active));
+	return true;
+}
+
+int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence)
 {
 	struct i915_active_fence *active;
 	int err;
 
-	lockdep_assert_held(&tl->mutex);
-
 	/* Prevent reaping in case we malloc/wait while building the tree */
 	err = i915_active_acquire(ref);
 	if (err)
 		return err;
 
-	active = active_instance(ref, tl);
+	active = active_instance(ref, idx);
 	if (!active) {
 		err = -ENOMEM;
 		goto out;
 	}
 
-	if (is_barrier(active)) { /* proto-node used by our idle barrier */
-		/*
-		 * This request is on the kernel_context timeline, and so
-		 * we can use it to substitute for the pending idle-barrer
-		 * request that we want to emit on the kernel_context.
-		 */
-		__active_del_barrier(ref, node_from_active(active));
+	if (replace_barrier(ref, active)) {
 		RCU_INIT_POINTER(active->fence, NULL);
 		atomic_dec(&ref->count);
 	}
 	if (!__i915_active_fence_set(active, fence))
-		atomic_inc(&ref->count);
+		__i915_active_acquire(ref);
 
 out:
 	i915_active_release(ref);
 	return err;
 }
 
-struct dma_fence *
-i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
+static struct dma_fence *
+__i915_active_set_fence(struct i915_active *ref,
+			struct i915_active_fence *active,
+			struct dma_fence *fence)
 {
 	struct dma_fence *prev;
 
-	/* We expect the caller to manage the exclusive timeline ordering */
-	GEM_BUG_ON(i915_active_is_idle(ref));
+	if (replace_barrier(ref, active)) {
+		RCU_INIT_POINTER(active->fence, fence);
+		return NULL;
+	}
 
 	rcu_read_lock();
-	prev = __i915_active_fence_set(&ref->excl, f);
+	prev = __i915_active_fence_set(active, fence);
 	if (prev)
 		prev = dma_fence_get_rcu(prev);
 	else
-		atomic_inc(&ref->count);
+		__i915_active_acquire(ref);
 	rcu_read_unlock();
 
 	return prev;
 }
 
+static struct i915_active_fence *
+__active_fence(struct i915_active *ref, u64 idx)
+{
+	struct active_node *it;
+
+	it = __active_lookup(ref, idx);
+	if (unlikely(!it)) { /* Contention with parallel tree builders! */
+		spin_lock_irq(&ref->tree_lock);
+		it = __active_lookup(ref, idx);
+		spin_unlock_irq(&ref->tree_lock);
+	}
+	GEM_BUG_ON(!it); /* slot must be preallocated */
+
+	return &it->base;
+}
+
+struct dma_fence *
+__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence)
+{
+	/* Only valid while active, see i915_active_acquire_for_context() */
+	return __i915_active_set_fence(ref, __active_fence(ref, idx), fence);
+}
+
+struct dma_fence *
+i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
+{
+	/* We expect the caller to manage the exclusive timeline ordering */
+	return __i915_active_set_fence(ref, &ref->excl, f);
+}
+
 bool i915_active_acquire_if_busy(struct i915_active *ref)
 {
 	debug_active_assert(ref);
@@ -451,6 +519,24 @@ int i915_active_acquire(struct i915_active *ref)
 	return err;
 }
 
+int i915_active_acquire_for_context(struct i915_active *ref, u64 idx)
+{
+	struct i915_active_fence *active;
+	int err;
+
+	err = i915_active_acquire(ref);
+	if (err)
+		return err;
+
+	active = active_instance(ref, idx);
+	if (!active) {
+		i915_active_release(ref);
+		return -ENOMEM;
+	}
+
+	return 0; /* return with active ref */
+}
+
 void i915_active_release(struct i915_active *ref)
 {
 	debug_active_assert(ref);
@@ -754,7 +840,7 @@ static struct active_node *reuse_idle_barrier(struct i915_active *ref, u64 idx)
 match:
 	rb_erase(p, &ref->tree); /* Hide from waits and sibling allocations */
 	if (p == &ref->cache->node)
-		ref->cache = NULL;
+		WRITE_ONCE(ref->cache, NULL);
 	spin_unlock_irq(&ref->tree_lock);
 
 	return rb_entry(p, struct active_node, node);
@@ -812,7 +898,7 @@ int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
 			 */
 			RCU_INIT_POINTER(node->base.fence, ERR_PTR(-EAGAIN));
 			node->base.cb.node.prev = (void *)engine;
-			atomic_inc(&ref->count);
+			__i915_active_acquire(ref);
 		}
 		GEM_BUG_ON(rcu_access_pointer(node->base.fence) != ERR_PTR(-EAGAIN));
 
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index cf4058150966..73ded3c52a04 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -163,14 +163,16 @@ void __i915_active_init(struct i915_active *ref,
 	__i915_active_init(ref, active, retire, &__mkey, &__wkey);	\
 } while (0)
 
-int i915_active_ref(struct i915_active *ref,
-		    struct intel_timeline *tl,
-		    struct dma_fence *fence);
+struct dma_fence *
+__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence);
+int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence);
 
 static inline int
 i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
 {
-	return i915_active_ref(ref, i915_request_timeline(rq), &rq->fence);
+	return i915_active_ref(ref,
+			       i915_request_timeline(rq)->fence_context,
+			       &rq->fence);
 }
 
 struct dma_fence *
@@ -198,7 +200,9 @@ int i915_request_await_active(struct i915_request *rq,
 #define I915_ACTIVE_AWAIT_BARRIER BIT(2)
 
 int i915_active_acquire(struct i915_active *ref);
+int i915_active_acquire_for_context(struct i915_active *ref, u64 idx);
 bool i915_active_acquire_if_busy(struct i915_active *ref);
+
 void i915_active_release(struct i915_active *ref);
 
 static inline void __i915_active_acquire(struct i915_active *ref)
-- 
2.20.1


* [Intel-gfx] [PATCH 04/21] drm/i915: Keep the most recently used active-fence upon discard
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Whenever an i915_active idles, we prune its tree of old fence slots to
prevent a gradual leak should it be used to track many, many timelines.
The downside is that we then have to frequently reallocate the rbtree.
A compromise is that we keep the most recently used fence slot, and
reuse that for the next active reference as that is the most likely
timeline to be reused.
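
Condensed from __active_retire() in the diff below, the retire path now
detaches the cached node, hands the remainder of the tree over for
freeing, and rebuilds the tree containing only that node:

	if (ref->cache) {
		/* Discard all other nodes in the tree */
		rb_erase(&ref->cache->node, &ref->tree);
		root = ref->tree; /* local copy, freed after unlock */

		/* Rebuild the tree with only the cached node */
		rb_link_node(&ref->cache->node, NULL, &ref->tree.rb_node);
		rb_insert_color(&ref->cache->node, &ref->tree);
	}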

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
---
 drivers/gpu/drm/i915/i915_active.c | 27 ++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_active.h |  4 ----
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 3a728401c09c..b9bd5578ff54 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -130,8 +130,8 @@ static inline void debug_active_assert(struct i915_active *ref) { }
 static void
 __active_retire(struct i915_active *ref)
 {
+	struct rb_root root = RB_ROOT;
 	struct active_node *it, *n;
-	struct rb_root root;
 	unsigned long flags;
 
 	GEM_BUG_ON(i915_active_is_idle(ref));
@@ -143,9 +143,21 @@ __active_retire(struct i915_active *ref)
 	GEM_BUG_ON(rcu_access_pointer(ref->excl.fence));
 	debug_active_deactivate(ref);
 
-	root = ref->tree;
-	ref->tree = RB_ROOT;
-	ref->cache = NULL;
+	/* Even if we have not used the cache, we may still have a barrier */
+	if (!ref->cache)
+		ref->cache = fetch_node(ref->tree.rb_node);
+
+	/* Keep the MRU cached node for reuse */
+	if (ref->cache) {
+		/* Discard all other nodes in the tree */
+		rb_erase(&ref->cache->node, &ref->tree);
+		root = ref->tree;
+
+		/* Rebuild the tree with only the cached node */
+		rb_link_node(&ref->cache->node, NULL, &ref->tree.rb_node);
+		rb_insert_color(&ref->cache->node, &ref->tree);
+		GEM_BUG_ON(ref->tree.rb_node != &ref->cache->node);
+	}
 
 	spin_unlock_irqrestore(&ref->tree_lock, flags);
 
@@ -156,6 +168,7 @@ __active_retire(struct i915_active *ref)
 	/* ... except if you wait on it, you must manage your own references! */
 	wake_up_var(ref);
 
+	/* Finally free the discarded timeline tree  */
 	rbtree_postorder_for_each_entry_safe(it, n, &root, node) {
 		GEM_BUG_ON(i915_active_fence_isset(&it->base));
 		kmem_cache_free(global.slab_cache, it);
@@ -745,16 +758,16 @@ int i915_sw_fence_await_active(struct i915_sw_fence *fence,
 	return await_active(ref, flags, sw_await_fence, fence, fence);
 }
 
-#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 void i915_active_fini(struct i915_active *ref)
 {
 	debug_active_fini(ref);
 	GEM_BUG_ON(atomic_read(&ref->count));
 	GEM_BUG_ON(work_pending(&ref->work));
-	GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree));
 	mutex_destroy(&ref->mutex);
+
+	if (ref->cache)
+		kmem_cache_free(global.slab_cache, ref->cache);
 }
-#endif
 
 static inline bool is_idle_barrier(struct active_node *node, u64 idx)
 {
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 73ded3c52a04..b9e0394e2975 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -217,11 +217,7 @@ i915_active_is_idle(const struct i915_active *ref)
 	return !atomic_read(&ref->count);
 }
 
-#if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 void i915_active_fini(struct i915_active *ref);
-#else
-static inline void i915_active_fini(struct i915_active *ref) { }
-#endif
 
 int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
 					    struct intel_engine_cs *engine);
-- 
2.20.1


* [Intel-gfx] [PATCH 05/21] drm/i915: Make the stale cached active node available for any timeline
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Rather than require the next timeline after idling to match the MRU
before idling, reset the index on the node and allow it to match the
first request. However, this requires cmpxchg(u64) and so is not trivial
on 32b, so for compatibility we simply fall back to keeping the cached
node pointing to the MRU timeline.
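
The claim itself is a single cmpxchg() on the 64-bit index, with zero
reserved to mean "unclaimed" (condensed from the diff below):

	u64 cached = READ_ONCE(it->timeline);

	if (cached == idx) /* once claimed, the slot belongs to idx */
		return it;

#ifdef CONFIG_64BIT /* cmpxchg(u64) is not available on all 32b */
	/* Only one thread can win the 0 -> idx transition */
	if (!cached && !cmpxchg(&it->timeline, 0, idx))
		return it;
#endif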

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_active.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index b9bd5578ff54..7b51045c8461 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -157,6 +157,10 @@ __active_retire(struct i915_active *ref)
 		rb_link_node(&ref->cache->node, NULL, &ref->tree.rb_node);
 		rb_insert_color(&ref->cache->node, &ref->tree);
 		GEM_BUG_ON(ref->tree.rb_node != &ref->cache->node);
+
+		/* Make the cached node available for reuse with any timeline */
+		if (IS_ENABLED(CONFIG_64BIT))
+			ref->cache->timeline = 0; /* needs cmpxchg(u64) */
 	}
 
 	spin_unlock_irqrestore(&ref->tree_lock, flags);
@@ -235,6 +239,8 @@ static struct active_node *__active_lookup(struct i915_active *ref, u64 idx)
 {
 	struct active_node *it;
 
+	GEM_BUG_ON(idx == 0); /* 0 is the unordered timeline, rsvd for cache */
+
 	/*
 	 * We track the most recently used timeline to skip a rbtree search
 	 * for the common case, under typical loads we never need the rbtree
@@ -243,8 +249,28 @@ static struct active_node *__active_lookup(struct i915_active *ref, u64 idx)
 	 * current timeline.
 	 */
 	it = READ_ONCE(ref->cache);
-	if (it && it->timeline == idx)
-		return it;
+	if (it) {
+		u64 cached = READ_ONCE(it->timeline);
+
+		/* Once claimed, this slot will only belong to this idx */
+		if (cached == idx)
+			return it;
+
+#ifdef CONFIG_64BIT /* for cmpxchg(u64) */
+		/*
+		 * An unclaimed cache [.timeline=0] can only be claimed once.
+		 *
+		 * If the value is already non-zero, some other thread has
+		 * claimed the cache and we know that it does not match our
+		 * idx. If, and only if, the timeline is currently zero is it
+		 * worth competing to claim it atomically for ourselves (for
+		 * only the winner of that race will cmpxchg return the old
+		 * value of 0).
+		 */
+		if (!cached && !cmpxchg(&it->timeline, 0, idx))
+			return it;
+#endif
+	}
 
 	BUILD_BUG_ON(offsetof(typeof(*it), node));
 
-- 
2.20.1


* [Intel-gfx] [PATCH 06/21] drm/i915: Reduce locking around i915_active_acquire_preallocate_barrier()
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

As the conversion between idle-barrier and full i915_active_fence is
already serialised by explicit memory barriers, we can replace the
spinlock in i915_active_acquire_preallocate_barrier(), used for finding
an idle-barrier to reuse, with an RCU read lock that ensures the fence
remains valid, taking the spinlock only for the update of the rbtree
itself.
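
Condensed, the search is now pinned only by RCU, and the spinlock
shrinks to the rbtree update on a successful match (from the diff
below):

	rcu_read_lock();
	node = reuse_idle_barrier(ref, idx); /* lock-free search */
	rcu_read_unlock();

	/* ... while inside reuse_idle_barrier(), on a match: */
	spin_lock_irq(&ref->tree_lock);
	rb_erase(p, &ref->tree); /* hide from waits and siblings */
	spin_unlock_irq(&ref->tree_lock);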

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_active.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 7b51045c8461..5dd52bb6d38c 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -807,7 +807,6 @@ static struct active_node *reuse_idle_barrier(struct i915_active *ref, u64 idx)
 	if (RB_EMPTY_ROOT(&ref->tree))
 		return NULL;
 
-	spin_lock_irq(&ref->tree_lock);
 	GEM_BUG_ON(i915_active_is_idle(ref));
 
 	/*
@@ -872,11 +871,10 @@ static struct active_node *reuse_idle_barrier(struct i915_active *ref, u64 idx)
 			goto match;
 	}
 
-	spin_unlock_irq(&ref->tree_lock);
-
 	return NULL;
 
 match:
+	spin_lock_irq(&ref->tree_lock);
 	rb_erase(p, &ref->tree); /* Hide from waits and sibling allocations */
 	if (p == &ref->cache->node)
 		WRITE_ONCE(ref->cache, NULL);
@@ -911,7 +909,9 @@ int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
 		struct llist_node *prev = first;
 		struct active_node *node;
 
+		rcu_read_lock();
 		node = reuse_idle_barrier(ref, idx);
+		rcu_read_unlock();
 		if (!node) {
 			node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL);
 			if (!node) {
-- 
2.20.1


* [Intel-gfx] [PATCH 07/21] drm/i915: Provide a fastpath for waiting on vma bindings
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Before we can execute a request, we must wait for all of its vma to be
bound. This is a frequent operation for which we can optimise away a
few atomic operations (notably a cmpxchg) by relying on the existing
RCU protection.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_active.h | 15 +++++++++++++++
 drivers/gpu/drm/i915/i915_vma.c    |  9 +++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index b9e0394e2975..fb165d3f01cf 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -231,4 +231,19 @@ struct i915_active *i915_active_create(void);
 struct i915_active *i915_active_get(struct i915_active *ref);
 void i915_active_put(struct i915_active *ref);
 
+static inline int __i915_request_await_exclusive(struct i915_request *rq,
+						 struct i915_active *active)
+{
+	struct dma_fence *fence;
+	int err = 0;
+
+	fence = i915_active_fence_get(&active->excl);
+	if (fence) {
+		err = i915_request_await_dma_fence(rq, fence);
+		dma_fence_put(fence);
+	}
+
+	return err;
+}
+
 #endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index bc64f773dcdb..cd12047c7791 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1167,6 +1167,12 @@ void i915_vma_revoke_mmap(struct i915_vma *vma)
 		list_del(&vma->obj->userfault_link);
 }
 
+static int
+__i915_request_await_bind(struct i915_request *rq, struct i915_vma *vma)
+{
+	return __i915_request_await_exclusive(rq, &vma->active);
+}
+
 int __i915_vma_move_to_active(struct i915_vma *vma, struct i915_request *rq)
 {
 	int err;
@@ -1174,8 +1180,7 @@ int __i915_vma_move_to_active(struct i915_vma *vma, struct i915_request *rq)
 	GEM_BUG_ON(!i915_vma_is_pinned(vma));
 
 	/* Wait for the vma to be bound before we start! */
-	err = i915_request_await_active(rq, &vma->active,
-					I915_ACTIVE_AWAIT_EXCL);
+	err = __i915_request_await_bind(rq, vma);
 	if (err)
 		return err;
 
-- 
2.20.1


* [Intel-gfx] [PATCH 08/21] drm/i915/gem: Reduce ctx->engines_mutex for reading the clone source
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

When cloning the engines from the source context, we need to ensure that
the engines are not freed as we copy them, and that the flags we clone
from the source correspond with the engines we copy across. To do this
we need only take a reference to the src->engines, rather than hold the
src->engines_mutex, so long as we verify that nothing changed under the
read.
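
The shape of the new clone path is then: await the engines (taking a
reference via the sw_fence and sampling the user-engines flag under
that same reference), copy, and complete the fence. A rough sketch,
with copy_engine() standing in (hypothetically) for the per-engine copy
in the real patch:

	e = __context_engines_await(src, &user_engines);
	if (!e)
		return -ENOENT;

	clone = alloc_engines(e->num_engines);
	if (!clone) {
		i915_sw_fence_complete(&e->fence);
		return -ENOMEM;
	}

	/* e->engines[] cannot be freed while the await is held */
	for (n = 0; n < e->num_engines; n++)
		copy_engine(clone, e, n); /* hypothetical helper */

	i915_sw_fence_complete(&e->fence); /* drop our reference */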

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 24 +++++++++++++--------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index d0bdb6d447ed..b5b179f96d77 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -752,7 +752,8 @@ __create_context(struct drm_i915_private *i915)
 }
 
 static inline struct i915_gem_engines *
-__context_engines_await(const struct i915_gem_context *ctx)
+__context_engines_await(const struct i915_gem_context *ctx,
+			bool *user_engines)
 {
 	struct i915_gem_engines *engines;
 
@@ -761,6 +762,10 @@ __context_engines_await(const struct i915_gem_context *ctx)
 		engines = rcu_dereference(ctx->engines);
 		GEM_BUG_ON(!engines);
 
+		if (user_engines)
+			*user_engines = i915_gem_context_user_engines(ctx);
+
+		/* successful await => strong mb */
 		if (unlikely(!i915_sw_fence_await(&engines->fence)))
 			continue;
 
@@ -784,7 +789,7 @@ context_apply_all(struct i915_gem_context *ctx,
 	struct intel_context *ce;
 	int err = 0;
 
-	e = __context_engines_await(ctx);
+	e = __context_engines_await(ctx, NULL);
 	for_each_gem_engine(ce, e, it) {
 		err = fn(ce, data);
 		if (err)
@@ -1117,7 +1122,7 @@ static int context_barrier_task(struct i915_gem_context *ctx,
 		return err;
 	}
 
-	e = __context_engines_await(ctx);
+	e = __context_engines_await(ctx, NULL);
 	if (!e) {
 		i915_active_release(&cb->base);
 		return -ENOENT;
@@ -2114,11 +2119,14 @@ static int copy_ring_size(struct intel_context *dst,
 static int clone_engines(struct i915_gem_context *dst,
 			 struct i915_gem_context *src)
 {
-	struct i915_gem_engines *e = i915_gem_context_lock_engines(src);
-	struct i915_gem_engines *clone;
+	struct i915_gem_engines *clone, *e;
 	bool user_engines;
 	unsigned long n;
 
+	e = __context_engines_await(src, &user_engines);
+	if (!e)
+		return -ENOENT;
+
 	clone = alloc_engines(e->num_engines);
 	if (!clone)
 		goto err_unlock;
@@ -2160,9 +2168,7 @@ static int clone_engines(struct i915_gem_context *dst,
 		}
 	}
 	clone->num_engines = n;
-
-	user_engines = i915_gem_context_user_engines(src);
-	i915_gem_context_unlock_engines(src);
+	i915_sw_fence_complete(&e->fence);
 
 	/* Serialised by constructor */
 	engines_idle_release(dst, rcu_replace_pointer(dst->engines, clone, 1));
@@ -2173,7 +2179,7 @@ static int clone_engines(struct i915_gem_context *dst,
 	return 0;
 
 err_unlock:
-	i915_gem_context_unlock_engines(src);
+	i915_sw_fence_complete(&e->fence);
 	return -ENOMEM;
 }
 
-- 
2.20.1


* [Intel-gfx] [PATCH 09/21] drm/i915/gem: Reduce ctx->engines_mutex for get_engines()
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Take a snapshot of the ctx->engines, so we can avoid taking the
ctx->engines_mutex for a mere read in get_engines().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 39 +++++----------------
 1 file changed, 8 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index b5b179f96d77..f43f0ca4eec9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1862,27 +1862,6 @@ set_engines(struct i915_gem_context *ctx,
 	return 0;
 }
 
-static struct i915_gem_engines *
-__copy_engines(struct i915_gem_engines *e)
-{
-	struct i915_gem_engines *copy;
-	unsigned int n;
-
-	copy = alloc_engines(e->num_engines);
-	if (!copy)
-		return ERR_PTR(-ENOMEM);
-
-	for (n = 0; n < e->num_engines; n++) {
-		if (e->engines[n])
-			copy->engines[n] = intel_context_get(e->engines[n]);
-		else
-			copy->engines[n] = NULL;
-	}
-	copy->num_engines = n;
-
-	return copy;
-}
-
 static int
 get_engines(struct i915_gem_context *ctx,
 	    struct drm_i915_gem_context_param *args)
@@ -1890,19 +1869,17 @@ get_engines(struct i915_gem_context *ctx,
 	struct i915_context_param_engines __user *user;
 	struct i915_gem_engines *e;
 	size_t n, count, size;
+	bool user_engines;
 	int err = 0;
 
-	err = mutex_lock_interruptible(&ctx->engines_mutex);
-	if (err)
-		return err;
+	e = __context_engines_await(ctx, &user_engines);
+	if (!e)
+		return -ENOENT;
 
-	e = NULL;
-	if (i915_gem_context_user_engines(ctx))
-		e = __copy_engines(i915_gem_context_engines(ctx));
-	mutex_unlock(&ctx->engines_mutex);
-	if (IS_ERR_OR_NULL(e)) {
+	if (!user_engines) {
+		i915_sw_fence_complete(&e->fence);
 		args->size = 0;
-		return PTR_ERR_OR_ZERO(e);
+		return 0;
 	}
 
 	count = e->num_engines;
@@ -1953,7 +1930,7 @@ get_engines(struct i915_gem_context *ctx,
 	args->size = size;
 
 err_free:
-	free_engines(e);
+	i915_sw_fence_complete(&e->fence);
 	return err;
 }
 
-- 
2.20.1


* [Intel-gfx] [PATCH 10/21] drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Since the breadcrumb enabling/cancelling itself is serialised by the
breadcrumbs.irq_lock, with a bit of care we can remove the outer
serialisation with i915_request.lock for concurrent
dma_fence_enable_signaling(). This has the important side-effect of
eliminating the nested i915_request.lock within request submission.

The challenge in serialisation is around the unsubmission where we take
an active request that wants a breadcrumb on the signaling engine and
put it to sleep. We do not want a concurrent
dma_fence_enable_signaling() to attach a breadcrumb as we unsubmit, so
we must mark the request as no longer active before serialising with the
concurrent enable-signaling.

On retire, we serialise with the concurrent enable-signaling, but
instead of clearing ACTIVE, we mark it as SIGNALED.
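
The trickiest piece is taking a lock that is itself reached through the
pointer it protects; the diff below handles that with the usual
load/lock/re-validate loop:

	b = &READ_ONCE(rq->engine)->breadcrumbs;
	spin_lock(&b->irq_lock);
	while (unlikely(b != &READ_ONCE(rq->engine)->breadcrumbs)) {
		/* rq moved engines before we acquired the lock; retry */
		spin_unlock(&b->irq_lock);
		b = &READ_ONCE(rq->engine)->breadcrumbs;
		spin_lock(&b->irq_lock);
	}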

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 130 +++++++++++++-------
 drivers/gpu/drm/i915/gt/intel_lrc.c         |  14 ---
 drivers/gpu/drm/i915/i915_request.c         |  39 +++---
 3 files changed, 100 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 91786310c114..3d211a0c2b5a 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -220,17 +220,17 @@ static void signal_irq_work(struct irq_work *work)
 	}
 }
 
-static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
+static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 {
 	struct intel_engine_cs *engine =
 		container_of(b, struct intel_engine_cs, breadcrumbs);
 
 	lockdep_assert_held(&b->irq_lock);
 	if (b->irq_armed)
-		return true;
+		return;
 
 	if (!intel_gt_pm_get_if_awake(engine->gt))
-		return false;
+		return;
 
 	/*
 	 * The breadcrumb irq will be disarmed on the interrupt after the
@@ -250,8 +250,6 @@ static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 
 	if (!b->irq_enabled++)
 		irq_enable(engine);
-
-	return true;
 }
 
 void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
@@ -310,57 +308,99 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 {
 }
 
-bool i915_request_enable_breadcrumb(struct i915_request *rq)
+static void insert_breadcrumb(struct i915_request *rq,
+			      struct intel_breadcrumbs *b)
 {
-	lockdep_assert_held(&rq->lock);
+	struct intel_context *ce = rq->context;
+	struct list_head *pos;
 
-	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
-		return true;
+	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
+		return;
 
-	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) {
-		struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
-		struct intel_context *ce = rq->context;
-		struct list_head *pos;
+	__intel_breadcrumbs_arm_irq(b);
 
-		spin_lock(&b->irq_lock);
+	/*
+	 * We keep the seqno in retirement order, so we can break
+	 * inside intel_engine_signal_breadcrumbs as soon as we've
+	 * passed the last completed request (or seen a request that
+	 * hasn't even started). We could walk the timeline->requests,
+	 * but keeping a separate signalers_list has the advantage of
+	 * hopefully being much smaller than the full list and so
+	 * provides faster iteration and detection when there are no
+	 * more interrupts required for this context.
+	 *
+	 * We typically expect to add new signalers in order, so we
+	 * start looking for our insertion point from the tail of
+	 * the list.
+	 */
+	list_for_each_prev(pos, &ce->signals) {
+		struct i915_request *it =
+			list_entry(pos, typeof(*it), signal_link);
 
-		if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
-			goto unlock;
+		if (i915_seqno_passed(rq->fence.seqno, it->fence.seqno))
+			break;
+	}
+	list_add(&rq->signal_link, pos);
+	if (pos == &ce->signals) /* catch transitions from empty list */
+		list_move_tail(&ce->signal_link, &b->signalers);
+	GEM_BUG_ON(!check_signal_order(ce, rq));
 
-		if (!__intel_breadcrumbs_arm_irq(b))
-			goto unlock;
+	set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+}
 
-		/*
-		 * We keep the seqno in retirement order, so we can break
-		 * inside intel_engine_signal_breadcrumbs as soon as we've
-		 * passed the last completed request (or seen a request that
-		 * hasn't event started). We could walk the timeline->requests,
-		 * but keeping a separate signalers_list has the advantage of
-		 * hopefully being much smaller than the full list and so
-		 * provides faster iteration and detection when there are no
-		 * more interrupts required for this context.
-		 *
-		 * We typically expect to add new signalers in order, so we
-		 * start looking for our insertion point from the tail of
-		 * the list.
-		 */
-		list_for_each_prev(pos, &ce->signals) {
-			struct i915_request *it =
-				list_entry(pos, typeof(*it), signal_link);
+bool i915_request_enable_breadcrumb(struct i915_request *rq)
+{
+	struct intel_breadcrumbs *b;
 
-			if (i915_seqno_passed(rq->fence.seqno, it->fence.seqno))
-				break;
-		}
-		list_add(&rq->signal_link, pos);
-		if (pos == &ce->signals) /* catch transitions from empty list */
-			list_move_tail(&ce->signal_link, &b->signalers);
-		GEM_BUG_ON(!check_signal_order(ce, rq));
+	/* Serialises with i915_request_retire() using rq->lock */
+	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
+		return true;
 
-		set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
-unlock:
+	/*
+	 * Peek at i915_request_submit()/i915_request_unsubmit() status.
+	 *
+	 * If the request is not yet active (and not signaled), we will
+	 * attach the breadcrumb later.
+	 */
+	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
+		return true;
+
+	/*
+	 * rq->engine is locked by rq->engine->active.lock. That however
+	 * is not known until after rq->engine has been dereferenced and
+	 * the lock acquired. Hence we acquire the lock and then validate
+	 * that rq->engine still matches the lock we hold for it.
+	 *
+	 * Here, we are using the breadcrumb lock as a proxy for the
+	 * rq->engine->active.lock, and we know that since the breadcrumb
+	 * will be serialised within i915_request_submit/i915_request_unsubmit,
+	 * the engine cannot change while active as long as we hold the
+	 * breadcrumb lock on that engine.
+	 *
+	 * From the dma_fence_enable_signaling() path, we are outside of the
+	 * request submit/unsubmit path, and so we must be more careful to
+	 * acquire the right lock.
+	 */
+	b = &READ_ONCE(rq->engine)->breadcrumbs;
+	spin_lock(&b->irq_lock);
+	while (unlikely(b != &READ_ONCE(rq->engine)->breadcrumbs)) {
 		spin_unlock(&b->irq_lock);
+		b = &READ_ONCE(rq->engine)->breadcrumbs;
+		spin_lock(&b->irq_lock);
 	}
 
+	/*
+	 * Now that we are finally serialised with request submit/unsubmit,
+	 * [with b->irq_lock] and with i915_request_retire() [via checking
+	 * SIGNALED with rq->lock] confirm the request is indeed active. If
+	 * it is no longer active, the breadcrumb will be attached upon
+	 * i915_request_submit().
+	 */
+	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
+		insert_breadcrumb(rq, b);
+
+	spin_unlock(&b->irq_lock);
+
 	return !__request_completed(rq);
 }
 
@@ -368,8 +408,6 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq)
 {
 	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
 
-	lockdep_assert_held(&rq->lock);
-
 	/*
 	 * We must wait for b->irq_lock so that we know the interrupt handler
 	 * has released its reference to the intel_context and has completed
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 353b1717fe84..ae886081a431 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1148,20 +1148,6 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 		} else {
 			struct intel_engine_cs *owner = rq->context->engine;
 
-			/*
-			 * Decouple the virtual breadcrumb before moving it
-			 * back to the virtual engine -- we don't want the
-			 * request to complete in the background and try
-			 * and cancel the breadcrumb on the virtual engine
-			 * (instead of the old engine where it is linked)!
-			 */
-			if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
-				     &rq->fence.flags)) {
-				spin_lock_nested(&rq->lock,
-						 SINGLE_DEPTH_NESTING);
-				i915_request_cancel_breadcrumb(rq);
-				spin_unlock(&rq->lock);
-			}
 			WRITE_ONCE(rq->engine, owner);
 			owner->submit_request(rq);
 			active = NULL;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index a83641b7aa39..350bf6f158cb 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -304,11 +304,12 @@ bool i915_request_retire(struct i915_request *rq)
 		dma_fence_signal_locked(&rq->fence);
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
 		i915_request_cancel_breadcrumb(rq);
+	spin_unlock_irq(&rq->lock);
+
 	if (i915_request_has_waitboost(rq)) {
 		GEM_BUG_ON(!atomic_read(&rq->engine->gt->rps.num_waiters));
 		atomic_dec(&rq->engine->gt->rps.num_waiters);
 	}
-	spin_unlock_irq(&rq->lock);
 
 	/*
 	 * We only loosely track inflight requests across preemption,
@@ -591,17 +592,9 @@ bool __i915_request_submit(struct i915_request *request)
 	 */
 	__notify_execute_cb_irq(request);
 
-	/* We may be recursing from the signal callback of another i915 fence */
-	if (!i915_request_signaled(request)) {
-		spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
-
-		if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
-			     &request->fence.flags) &&
-		    !i915_request_enable_breadcrumb(request))
-			intel_engine_signal_breadcrumbs(engine);
-
-		spin_unlock(&request->lock);
-	}
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
+	    !i915_request_enable_breadcrumb(request))
+		intel_engine_signal_breadcrumbs(engine);
 
 	return result;
 }
@@ -623,27 +616,27 @@ void __i915_request_unsubmit(struct i915_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
 
+	/*
+	 * Only unwind in reverse order, required so that the per-context list
+	 * is kept in seqno/ring order.
+	 */
 	RQ_TRACE(request, "\n");
 
 	GEM_BUG_ON(!irqs_disabled());
 	lockdep_assert_held(&engine->active.lock);
 
 	/*
-	 * Only unwind in reverse order, required so that the per-context list
-	 * is kept in seqno/ring order.
+	 * Before we remove this breadcrumb from the signal list, we have
+	 * to ensure that a concurrent dma_fence_enable_signaling() does not
+	 * attach itself. We first mark the request as no longer active and
+	 * make sure that is visible to other cores, and then remove the
+	 * breadcrumb if attached.
 	 */
-
-	/* We may be recursing from the signal callback of another i915 fence */
-	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
-
+	GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
+	clear_bit_unlock(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
 		i915_request_cancel_breadcrumb(request);
 
-	GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	clear_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
-
-	spin_unlock(&request->lock);
-
 	/* We've already spun, don't charge on resubmitting. */
 	if (request->sched.semaphores && i915_request_started(request))
 		request->sched.semaphores = 0;
-- 
2.20.1


* [Intel-gfx] [PATCH 11/21] drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs
From: Chris Wilson @ 2020-07-30  9:37 UTC
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

After staring at the breadcrumb enabling/cancellation and coming to the
conclusion that the cause of the mysterious stale breadcrumbs must be
the act of submitting already-completed requests, we can instead
redirect those completed requests onto a dedicated signaled list at the
time of insertion and so eliminate
intel_engine_transfer_stale_breadcrumbs().
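
With this, insert_breadcrumb() catches the completed case up front
(condensed from the diff below):

	if (__request_completed(rq)) {
		/* Straight onto the signaled list; no interrupt needed */
		if (__signal_request(rq, &b->signaled_requests))
			irq_work_queue(&b->irq_work);
		return;
	}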

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 50 ++++++++-------------
 drivers/gpu/drm/i915/gt/intel_engine.h      |  3 --
 drivers/gpu/drm/i915/gt/intel_lrc.c         | 15 -------
 drivers/gpu/drm/i915/i915_request.c         |  5 +--
 4 files changed, 21 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 3d211a0c2b5a..fbdc465a5870 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -142,16 +142,16 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
 	intel_engine_add_retire(engine, tl);
 }
 
-static void __signal_request(struct i915_request *rq, struct list_head *signals)
+static bool __signal_request(struct i915_request *rq, struct list_head *signals)
 {
-	GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
 	clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
 
 	if (!__dma_fence_signal(&rq->fence))
-		return;
+		return false;
 
 	i915_request_get(rq);
 	list_add_tail(&rq->signal_link, signals);
+	return true;
 }
 
 static void signal_irq_work(struct irq_work *work)
@@ -278,32 +278,6 @@ void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
 	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
 
-void intel_engine_transfer_stale_breadcrumbs(struct intel_engine_cs *engine,
-					     struct intel_context *ce)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	unsigned long flags;
-
-	spin_lock_irqsave(&b->irq_lock, flags);
-	if (!list_empty(&ce->signals)) {
-		struct i915_request *rq, *next;
-
-		/* Queue for executing the signal callbacks in the irq_work */
-		list_for_each_entry_safe(rq, next, &ce->signals, signal_link) {
-			GEM_BUG_ON(rq->engine != engine);
-			GEM_BUG_ON(!__request_completed(rq));
-
-			__signal_request(rq, &b->signaled_requests);
-		}
-
-		INIT_LIST_HEAD(&ce->signals);
-		list_del_init(&ce->signal_link);
-
-		irq_work_queue(&b->irq_work);
-	}
-	spin_unlock_irqrestore(&b->irq_lock, flags);
-}
-
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 {
 }
@@ -317,6 +291,17 @@ static void insert_breadcrumb(struct i915_request *rq,
 	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
 		return;
 
+	/*
+	 * If the request is already completed, we can transfer it
+	 * straight onto a signaled list, and queue the irq worker for
+	 * its signal completion.
+	 */
+	if (__request_completed(rq)) {
+		if (__signal_request(rq, &b->signaled_requests))
+			irq_work_queue(&b->irq_work);
+		return;
+	}
+
 	__intel_breadcrumbs_arm_irq(b);
 
 	/*
@@ -344,8 +329,11 @@ static void insert_breadcrumb(struct i915_request *rq,
 	if (pos == &ce->signals) /* catch transitions from empty list */
 		list_move_tail(&ce->signal_link, &b->signalers);
 	GEM_BUG_ON(!check_signal_order(ce, rq));
-
 	set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+
+	/* Check after attaching to irq, interrupt may have already fired. */
+	if (__request_completed(rq))
+		irq_work_queue(&b->irq_work);
 }
 
 bool i915_request_enable_breadcrumb(struct i915_request *rq)
@@ -401,7 +389,7 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq)
 
 	spin_unlock(&b->irq_lock);
 
-	return !__request_completed(rq);
+	return true;
 }
 
 void i915_request_cancel_breadcrumb(struct i915_request *rq)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index a9249a23903a..faf00a353e25 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -237,9 +237,6 @@ intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
 void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine);
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 
-void intel_engine_transfer_stale_breadcrumbs(struct intel_engine_cs *engine,
-					     struct intel_context *ce);
-
 void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
 				    struct drm_printer *p);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index ae886081a431..a4959e8229ac 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1805,18 +1805,6 @@ static bool virtual_matches(const struct virtual_engine *ve,
 	return true;
 }
 
-static void virtual_xfer_breadcrumbs(struct virtual_engine *ve)
-{
-	/*
-	 * All the outstanding signals on ve->siblings[0] must have
-	 * been completed, just pending the interrupt handler. As those
-	 * signals still refer to the old sibling (via rq->engine), we must
-	 * transfer those to the old irq_worker to keep our locking
-	 * consistent.
-	 */
-	intel_engine_transfer_stale_breadcrumbs(ve->siblings[0], &ve->context);
-}
-
 #define for_each_waiter(p__, rq__) \
 	list_for_each_entry_lockless(p__, \
 				     &(rq__)->sched.waiters_list, \
@@ -2275,9 +2263,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 					virtual_update_register_offsets(regs,
 									engine);
 
-				if (!list_empty(&ve->context.signals))
-					virtual_xfer_breadcrumbs(ve);
-
 				/*
 				 * Move the bound engine to the top of the list
 				 * for future execution. We then kick this
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 350bf6f158cb..0cf2a10a24f1 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -592,9 +592,8 @@ bool __i915_request_submit(struct i915_request *request)
 	 */
 	__notify_execute_cb_irq(request);
 
-	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
-	    !i915_request_enable_breadcrumb(request))
-		intel_engine_signal_breadcrumbs(engine);
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
+		i915_request_enable_breadcrumb(request);
 
 	return result;
 }
-- 
2.20.1


* [Intel-gfx] [PATCH 12/21] drm/i915/gt: Only transfer the virtual context to the new engine if active
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (10 preceding siblings ...)
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 11/21] drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs Chris Wilson
@ 2020-07-30  9:37 ` Chris Wilson
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 13/21] drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs Chris Wilson
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2020-07-30  9:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: Nayana, Venkata Ramana, thomas.hellstrom, Chris Wilson

One more complication of preempt-to-busy with respect to the virtual
engine is that we may have retired the last request along the virtual
engine at the same time as preparing to submit the completed request to
a new engine. That submit will be short-circuited, but not before we have
updated the context with the new register offsets and marked the virtual
engine as bound to the new engine (by calling swap on ve->siblings[]).
As we may have just retired the completed request, we may also be in the
middle of calling virtual_context_exit() to turn off the power management
associated with the virtual engine, and that in turn walks the
ve->siblings[]. If we happen to call swap() on the array as we walk, we
will call intel_engine_pm_put() twice on the same engine.

In this patch, we prevent this by only updating the bound engine after a
successful submission, which weeds out the already completed requests.

Alternatively, we could walk a non-volatile array for the pm, such as
using the engine->mask. The small advantage to performing the update
after the submit is that we then only have to do a swap for active
requests.
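
For illustration, a simplified sketch of the exit path that the swap()
can race with (roughly virtual_context_exit(); the pm_put is the double
release we are guarding against):

static void virtual_context_exit(struct intel_context *ce)
{
	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
	unsigned int n;

	/*
	 * If execlists_dequeue() swaps entries in ve->siblings[] while we
	 * walk the array here, the same engine may be visited twice and
	 * receive two intel_engine_pm_put() calls for a single _get().
	 */
	for (n = 0; n < ve->num_siblings; n++)
		intel_engine_pm_put(ve->siblings[n]);
}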

Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Nayana, Venkata Ramana" <venkata.ramana.nayana@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 65 ++++++++++++++++++-----------
 1 file changed, 40 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index a4959e8229ac..904e9b8bcbf6 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1805,6 +1805,33 @@ static bool virtual_matches(const struct virtual_engine *ve,
 	return true;
 }
 
+static void virtual_xfer_context(struct virtual_engine *ve,
+				 struct intel_engine_cs *engine)
+{
+	unsigned int n;
+
+	if (likely(engine == ve->siblings[0]))
+		return;
+
+	GEM_BUG_ON(READ_ONCE(ve->context.inflight));
+	if (!intel_engine_has_relative_mmio(engine))
+		virtual_update_register_offsets(ve->context.lrc_reg_state,
+						engine);
+
+	/*
+	 * Move the bound engine to the top of the list for
+	 * future execution. We then kick this tasklet first
+	 * before checking others, so that we preferentially
+	 * reuse this set of bound registers.
+	 */
+	for (n = 1; n < ve->num_siblings; n++) {
+		if (ve->siblings[n] == engine) {
+			swap(ve->siblings[n], ve->siblings[0]);
+			break;
+		}
+	}
+}
+
 #define for_each_waiter(p__, rq__) \
 	list_for_each_entry_lockless(p__, \
 				     &(rq__)->sched.waiters_list, \
@@ -2253,35 +2280,23 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			GEM_BUG_ON(!(rq->execution_mask & engine->mask));
 			WRITE_ONCE(rq->engine, engine);
 
-			if (engine != ve->siblings[0]) {
-				u32 *regs = ve->context.lrc_reg_state;
-				unsigned int n;
-
-				GEM_BUG_ON(READ_ONCE(ve->context.inflight));
-
-				if (!intel_engine_has_relative_mmio(engine))
-					virtual_update_register_offsets(regs,
-									engine);
-
+			if (__i915_request_submit(rq)) {
 				/*
-				 * Move the bound engine to the top of the list
-				 * for future execution. We then kick this
-				 * tasklet first before checking others, so that
-				 * we preferentially reuse this set of bound
-				 * registers.
+				 * Only after we confirm that we will submit
+				 * this request (i.e. it has not already
+				 * completed), do we want to update the context.
+				 *
+				 * This serves two purposes. It avoids
+				 * unnecessary work if we are resubmitting an
+				 * already completed request after timeslicing.
+				 * But more importantly, it prevents us altering
+				 * ve->siblings[] on an idle context, where
+				 * we may be using ve->siblings[] in
+				 * virtual_context_enter / virtual_context_exit.
 				 */
-				for (n = 1; n < ve->num_siblings; n++) {
-					if (ve->siblings[n] == engine) {
-						swap(ve->siblings[n],
-						     ve->siblings[0]);
-						break;
-					}
-				}
-
+				virtual_xfer_context(ve, engine);
 				GEM_BUG_ON(ve->siblings[0] != engine);
-			}
 
-			if (__i915_request_submit(rq)) {
 				submit = true;
 				last = rq;
 			}
-- 
2.20.1


* [Intel-gfx] [PATCH 13/21] drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (11 preceding siblings ...)
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 12/21] drm/i915/gt: Only transfer the virtual context to the new engine if active Chris Wilson
@ 2020-07-30  9:37 ` Chris Wilson
  2020-07-31 14:53   ` Tvrtko Ursulin
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 14/21] drm/i915/gt: Move intel_breadcrumbs_arm_irq earlier Chris Wilson
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2020-07-30  9:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

On the virtual engines, we only use the intel_breadcrumbs for tracking
signaling of stale breadcrumbs from the irq_workers. They do not have
any associated interrupt handling; active requests are passed to a
physical engine and its associated breadcrumb interrupt handler. This
causes issues for us, as we need to ensure that we never try to enable
the interrupts, nor the power management required for them, on the
virtual engine, as they would never be disabled. Instead, let's specify
the physical engine used for the interrupt handling of a particular
breadcrumb.

v2: Drop b->irq_armed = true mocking for no interrupt HW
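
In outline (taken from the hunks below): physical engines name
themselves as the interrupt provider, while the virtual engine passes
NULL so that arming the irq and its power management become no-ops:

	/* Physical engine: breadcrumbs backed by this engine's user interrupt. */
	engine->breadcrumbs = intel_breadcrumbs_create(engine);

	/* Virtual engine: no HW interrupt of its own, so nothing to arm. */
	ve->base.breadcrumbs = intel_breadcrumbs_create(NULL);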

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 76 ++++++++++---------
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 36 +++++++++
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h | 47 ++++++++++++
 drivers/gpu/drm/i915/gt/intel_engine.h        | 17 -----
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 14 +++-
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |  3 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 31 +-------
 drivers/gpu/drm/i915/gt/intel_gt_irq.c        |  1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 11 ++-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  1 +
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  3 +-
 drivers/gpu/drm/i915/gt/intel_rps.c           |  1 +
 drivers/gpu/drm/i915/gt/mock_engine.c         | 10 ++-
 drivers/gpu/drm/i915/i915_irq.c               |  1 +
 drivers/gpu/drm/i915/i915_request.c           |  1 +
 drivers/gpu/drm/i915/i915_request.h           |  4 -
 16 files changed, 162 insertions(+), 95 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
 create mode 100644 drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index fbdc465a5870..2ffd47a86656 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -28,6 +28,7 @@
 
 #include "i915_drv.h"
 #include "i915_trace.h"
+#include "intel_breadcrumbs.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 
@@ -55,30 +56,28 @@ static void irq_disable(struct intel_engine_cs *engine)
 
 static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
 {
-	struct intel_engine_cs *engine =
-		container_of(b, struct intel_engine_cs, breadcrumbs);
-
 	lockdep_assert_held(&b->irq_lock);
 
+	if (!b->irq_engine || !b->irq_armed)
+		return;
+
 	GEM_BUG_ON(!b->irq_enabled);
 	if (!--b->irq_enabled)
-		irq_disable(engine);
+		irq_disable(b->irq_engine);
 
 	WRITE_ONCE(b->irq_armed, false);
-	intel_gt_pm_put_async(engine->gt);
+	intel_gt_pm_put_async(b->irq_engine->gt);
 }
 
-void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
+void intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 {
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 	unsigned long flags;
 
 	if (!READ_ONCE(b->irq_armed))
 		return;
 
 	spin_lock_irqsave(&b->irq_lock, flags);
-	if (b->irq_armed)
-		__intel_breadcrumbs_disarm_irq(b);
+	__intel_breadcrumbs_disarm_irq(b);
 	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
 
@@ -133,13 +132,8 @@ __dma_fence_signal__notify(struct dma_fence *fence,
 
 static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
 {
-	struct intel_engine_cs *engine =
-		container_of(b, struct intel_engine_cs, breadcrumbs);
-
-	if (unlikely(intel_engine_is_virtual(engine)))
-		engine = intel_virtual_engine_get_sibling(engine, 0);
-
-	intel_engine_add_retire(engine, tl);
+	if (b->irq_engine)
+		intel_engine_add_retire(b->irq_engine, tl);
 }
 
 static bool __signal_request(struct i915_request *rq, struct list_head *signals)
@@ -164,7 +158,7 @@ static void signal_irq_work(struct irq_work *work)
 
 	spin_lock(&b->irq_lock);
 
-	if (b->irq_armed && list_empty(&b->signalers))
+	if (list_empty(&b->signalers))
 		__intel_breadcrumbs_disarm_irq(b);
 
 	list_splice_init(&b->signaled_requests, &signal);
@@ -222,14 +216,12 @@ static void signal_irq_work(struct irq_work *work)
 
 static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 {
-	struct intel_engine_cs *engine =
-		container_of(b, struct intel_engine_cs, breadcrumbs);
-
 	lockdep_assert_held(&b->irq_lock);
-	if (b->irq_armed)
+
+	if (!b->irq_engine || b->irq_armed)
 		return;
 
-	if (!intel_gt_pm_get_if_awake(engine->gt))
+	if (!intel_gt_pm_get_if_awake(b->irq_engine->gt))
 		return;
 
 	/*
@@ -249,37 +241,49 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 	 */
 
 	if (!b->irq_enabled++)
-		irq_enable(engine);
+		irq_enable(b->irq_engine);
 }
 
-void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
+struct intel_breadcrumbs *
+intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 {
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct intel_breadcrumbs *b;
+
+	b = kzalloc(sizeof(*b), GFP_KERNEL);
+	if (!b)
+		return NULL;
 
 	spin_lock_init(&b->irq_lock);
 	INIT_LIST_HEAD(&b->signalers);
 	INIT_LIST_HEAD(&b->signaled_requests);
 
 	init_irq_work(&b->irq_work, signal_irq_work);
+
+	b->irq_engine = irq_engine;
+
+	return b;
 }
 
-void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
+void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
 {
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 	unsigned long flags;
 
+	if (!b->irq_engine)
+		return;
+
 	spin_lock_irqsave(&b->irq_lock, flags);
 
 	if (b->irq_enabled)
-		irq_enable(engine);
+		irq_enable(b->irq_engine);
 	else
-		irq_disable(engine);
+		irq_disable(b->irq_engine);
 
 	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
 
-void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
+void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
 {
+	kfree(b);
 }
 
 static void insert_breadcrumb(struct i915_request *rq,
@@ -369,11 +373,11 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq)
 	 * request submit/unsubmit path, and so we must be more careful to
 	 * acquire the right lock.
 	 */
-	b = &READ_ONCE(rq->engine)->breadcrumbs;
+	b = READ_ONCE(rq->engine)->breadcrumbs;
 	spin_lock(&b->irq_lock);
-	while (unlikely(b != &READ_ONCE(rq->engine)->breadcrumbs)) {
+	while (unlikely(b != READ_ONCE(rq->engine)->breadcrumbs)) {
 		spin_unlock(&b->irq_lock);
-		b = &READ_ONCE(rq->engine)->breadcrumbs;
+		b = READ_ONCE(rq->engine)->breadcrumbs;
 		spin_lock(&b->irq_lock);
 	}
 
@@ -394,7 +398,7 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq)
 
 void i915_request_cancel_breadcrumb(struct i915_request *rq)
 {
-	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
+	struct intel_breadcrumbs *b = rq->engine->breadcrumbs;
 
 	/*
 	 * We must wait for b->irq_lock so that we know the interrupt handler
@@ -418,11 +422,11 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq)
 void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
 				    struct drm_printer *p)
 {
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct intel_breadcrumbs *b = engine->breadcrumbs;
 	struct intel_context *ce;
 	struct i915_request *rq;
 
-	if (list_empty(&b->signalers))
+	if (!b || list_empty(&b->signalers))
 		return;
 
 	drm_printf(p, "Signals:\n");
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
new file mode 100644
index 000000000000..ed3d1deabfbd
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __INTEL_BREADCRUMBS__
+#define __INTEL_BREADCRUMBS__
+
+#include <linux/irq_work.h>
+
+#include "intel_engine_types.h"
+
+struct drm_printer;
+struct i915_request;
+struct intel_breadcrumbs;
+
+struct intel_breadcrumbs *
+intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
+void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
+
+void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
+void intel_breadcrumbs_park(struct intel_breadcrumbs *b);
+
+static inline void
+intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
+{
+	irq_work_queue(&engine->breadcrumbs->irq_work);
+}
+
+void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
+				    struct drm_printer *p);
+
+bool i915_request_enable_breadcrumb(struct i915_request *request);
+void i915_request_cancel_breadcrumb(struct i915_request *request);
+
+#endif /* __INTEL_BREADCRUMBS__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
new file mode 100644
index 000000000000..8e53b9942695
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __INTEL_BREADCRUMBS_TYPES__
+#define __INTEL_BREADCRUMBS_TYPES__
+
+#include <linux/irq_work.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+/*
+ * Rather than have every client wait upon all user interrupts,
+ * with the herd waking after every interrupt and each doing the
+ * heavyweight seqno dance, we delegate the task (of being the
+ * bottom-half of the user interrupt) to the first client. After
+ * every interrupt, we wake up one client, who does the heavyweight
+ * coherent seqno read and either goes back to sleep (if incomplete),
+ * or wakes up all the completed clients in parallel, before then
+ * transferring the bottom-half status to the next client in the queue.
+ *
+ * Compared to walking the entire list of waiters in a single dedicated
+ * bottom-half, we reduce the latency of the first waiter by avoiding
+ * a context switch, but incur additional coherent seqno reads when
+ * following the chain of request breadcrumbs. Since it is most likely
+ * that we have a single client waiting on each seqno, then reducing
+ * the overhead of waking that client is much preferred.
+ */
+struct intel_breadcrumbs {
+	spinlock_t irq_lock; /* protects the lists used in hardirq context */
+
+	/* Not all breadcrumbs are attached to physical HW */
+	struct intel_engine_cs *irq_engine;
+
+	struct list_head signalers;
+	struct list_head signaled_requests;
+
+	struct irq_work irq_work; /* for use from inside irq_lock */
+
+	unsigned int irq_enabled;
+
+	bool irq_armed;
+};
+
+#endif /* __INTEL_BREADCRUMBS_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index faf00a353e25..08e2c000dcc3 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -223,23 +223,6 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
 
 void intel_engine_init_execlists(struct intel_engine_cs *engine);
 
-void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
-void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
-
-void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
-
-static inline void
-intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
-{
-	irq_work_queue(&engine->breadcrumbs.irq_work);
-}
-
-void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine);
-void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
-
-void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
-				    struct drm_printer *p);
-
 static inline u32 *__gen8_emit_pipe_control(u32 *batch, u32 flags0, u32 flags1, u32 offset)
 {
 	memset(batch, 0, 6 * sizeof(u32));
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index dd1a42c4d344..c20a91c1318f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -28,6 +28,7 @@
 
 #include "i915_drv.h"
 
+#include "intel_breadcrumbs.h"
 #include "intel_context.h"
 #include "intel_engine.h"
 #include "intel_engine_pm.h"
@@ -700,8 +701,13 @@ static int engine_setup_common(struct intel_engine_cs *engine)
 	if (err)
 		return err;
 
+	engine->breadcrumbs = intel_breadcrumbs_create(engine);
+	if (!engine->breadcrumbs) {
+		err = -ENOMEM;
+		goto err_status;
+	}
+
 	intel_engine_init_active(engine, ENGINE_PHYSICAL);
-	intel_engine_init_breadcrumbs(engine);
 	intel_engine_init_execlists(engine);
 	intel_engine_init_cmd_parser(engine);
 	intel_engine_init__pm(engine);
@@ -716,6 +722,10 @@ static int engine_setup_common(struct intel_engine_cs *engine)
 	intel_engine_init_ctx_wa(engine);
 
 	return 0;
+
+err_status:
+	cleanup_status_page(engine);
+	return err;
 }
 
 struct measure_breadcrumb {
@@ -902,9 +912,9 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 	tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
 
 	cleanup_status_page(engine);
+	intel_breadcrumbs_free(engine->breadcrumbs);
 
 	intel_engine_fini_retire(engine);
-	intel_engine_fini_breadcrumbs(engine);
 	intel_engine_cleanup_cmd_parser(engine);
 
 	if (engine->default_state)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 8ec3eecf3e39..f7b2e07e2229 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -6,6 +6,7 @@
 
 #include "i915_drv.h"
 
+#include "intel_breadcrumbs.h"
 #include "intel_context.h"
 #include "intel_engine.h"
 #include "intel_engine_heartbeat.h"
@@ -247,7 +248,7 @@ static int __engine_park(struct intel_wakeref *wf)
 	call_idle_barriers(engine); /* cleanup after wedging */
 
 	intel_engine_park_heartbeat(engine);
-	intel_engine_disarm_breadcrumbs(engine);
+	intel_breadcrumbs_park(engine->breadcrumbs);
 
 	/* Must be reset upon idling, or we may miss the busy wakeup. */
 	GEM_BUG_ON(engine->execlists.queue_priority_hint != INT_MIN);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 8de92fd7d392..c400aaa2287b 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -22,6 +22,7 @@
 #include "i915_pmu.h"
 #include "i915_priolist_types.h"
 #include "i915_selftest.h"
+#include "intel_breadcrumbs_types.h"
 #include "intel_sseu.h"
 #include "intel_timeline_types.h"
 #include "intel_uncore.h"
@@ -373,34 +374,8 @@ struct intel_engine_cs {
 	 */
 	struct ewma__engine_latency latency;
 
-	/* Rather than have every client wait upon all user interrupts,
-	 * with the herd waking after every interrupt and each doing the
-	 * heavyweight seqno dance, we delegate the task (of being the
-	 * bottom-half of the user interrupt) to the first client. After
-	 * every interrupt, we wake up one client, who does the heavyweight
-	 * coherent seqno read and either goes back to sleep (if incomplete),
-	 * or wakes up all the completed clients in parallel, before then
-	 * transferring the bottom-half status to the next client in the queue.
-	 *
-	 * Compared to walking the entire list of waiters in a single dedicated
-	 * bottom-half, we reduce the latency of the first waiter by avoiding
-	 * a context switch, but incur additional coherent seqno reads when
-	 * following the chain of request breadcrumbs. Since it is most likely
-	 * that we have a single client waiting on each seqno, then reducing
-	 * the overhead of waking that client is much preferred.
-	 */
-	struct intel_breadcrumbs {
-		spinlock_t irq_lock;
-		struct list_head signalers;
-
-		struct list_head signaled_requests;
-
-		struct irq_work irq_work; /* for use from inside irq_lock */
-
-		unsigned int irq_enabled;
-
-		bool irq_armed;
-	} breadcrumbs;
+	/* Keep track of all the seqno used, a trail of breadcrumbs */
+	struct intel_breadcrumbs *breadcrumbs;
 
 	struct intel_engine_pmu {
 		/**
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
index b05da68e52f4..257063a57101 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
@@ -8,6 +8,7 @@
 
 #include "i915_drv.h"
 #include "i915_irq.h"
+#include "intel_breadcrumbs.h"
 #include "intel_gt.h"
 #include "intel_gt_irq.h"
 #include "intel_uncore.h"
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 904e9b8bcbf6..03cac9303dfc 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -137,6 +137,7 @@
 #include "i915_perf.h"
 #include "i915_trace.h"
 #include "i915_vgpu.h"
+#include "intel_breadcrumbs.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
 #include "intel_gt.h"
@@ -4119,7 +4120,7 @@ static int execlists_resume(struct intel_engine_cs *engine)
 {
 	intel_mocs_init_engine(engine);
 
-	intel_engine_reset_breadcrumbs(engine);
+	intel_breadcrumbs_reset(engine->breadcrumbs);
 
 	if (GEM_SHOW_DEBUG() && unexpected_starting_state(engine)) {
 		struct drm_printer p = drm_debug_printer(__func__);
@@ -5715,9 +5716,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
 
 	intel_engine_init_active(&ve->base, ENGINE_VIRTUAL);
-	intel_engine_init_breadcrumbs(&ve->base);
 	intel_engine_init_execlists(&ve->base);
-	ve->base.breadcrumbs.irq_armed = true; /* fake HW, used for irq_work */
 
 	ve->base.cops = &virtual_context_ops;
 	ve->base.request_alloc = execlists_request_alloc;
@@ -5734,6 +5733,12 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 
 	intel_context_init(&ve->context, &ve->base);
 
+	ve->base.breadcrumbs = intel_breadcrumbs_create(NULL);
+	if (!ve->base.breadcrumbs) {
+		err = -ENOMEM;
+		goto err_put;
+	}
+
 	for (n = 0; n < count; n++) {
 		struct intel_engine_cs *sibling = siblings[n];
 
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 46a5ceffc22f..ac36b67fb46b 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -15,6 +15,7 @@
 #include "i915_drv.h"
 #include "i915_gpu_error.h"
 #include "i915_irq.h"
+#include "intel_breadcrumbs.h"
 #include "intel_engine_pm.h"
 #include "intel_gt.h"
 #include "intel_gt_pm.h"
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 94915f668715..186aa2d3a83e 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -32,6 +32,7 @@
 #include "gen6_ppgtt.h"
 #include "gen7_renderclear.h"
 #include "i915_drv.h"
+#include "intel_breadcrumbs.h"
 #include "intel_context.h"
 #include "intel_gt.h"
 #include "intel_reset.h"
@@ -255,7 +256,7 @@ static int xcs_resume(struct intel_engine_cs *engine)
 	else
 		ring_setup_status_page(engine);
 
-	intel_engine_reset_breadcrumbs(engine);
+	intel_breadcrumbs_reset(engine->breadcrumbs);
 
 	/* Enforce ordering by reading HEAD register back */
 	ENGINE_POSTING_READ(engine, RING_HEAD);
diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index 97ba14ad52e4..e6a00eea0631 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -7,6 +7,7 @@
 #include <drm/i915_drm.h>
 
 #include "i915_drv.h"
+#include "intel_breadcrumbs.h"
 #include "intel_gt.h"
 #include "intel_gt_clock_utils.h"
 #include "intel_gt_irq.h"
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 06303ba98c19..d1971ffdca42 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -261,11 +261,12 @@ static void mock_engine_release(struct intel_engine_cs *engine)
 
 	GEM_BUG_ON(timer_pending(&mock->hw_delay));
 
+	intel_breadcrumbs_free(engine->breadcrumbs);
+
 	intel_context_unpin(engine->kernel_context);
 	intel_context_put(engine->kernel_context);
 
 	intel_engine_fini_retire(engine);
-	intel_engine_fini_breadcrumbs(engine);
 }
 
 struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
@@ -323,11 +324,14 @@ int mock_engine_init(struct intel_engine_cs *engine)
 	struct intel_context *ce;
 
 	intel_engine_init_active(engine, ENGINE_MOCK);
-	intel_engine_init_breadcrumbs(engine);
 	intel_engine_init_execlists(engine);
 	intel_engine_init__pm(engine);
 	intel_engine_init_retire(engine);
 
+	engine->breadcrumbs = intel_breadcrumbs_create(NULL);
+	if (!engine->breadcrumbs)
+		return -ENOMEM;
+
 	ce = create_kernel_context(engine);
 	if (IS_ERR(ce))
 		goto err_breadcrumbs;
@@ -339,7 +343,7 @@ int mock_engine_init(struct intel_engine_cs *engine)
 	return 0;
 
 err_breadcrumbs:
-	intel_engine_fini_breadcrumbs(engine);
+	intel_breadcrumbs_free(engine->breadcrumbs);
 	return -ENOMEM;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 1fa67700d8f4..f113fe44572b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -41,6 +41,7 @@
 #include "display/intel_lpe_audio.h"
 #include "display/intel_psr.h"
 
+#include "gt/intel_breadcrumbs.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm_irq.h"
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0cf2a10a24f1..3b121fa38c60 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -31,6 +31,7 @@
 #include <linux/sched/signal.h>
 
 #include "gem/i915_gem_context.h"
+#include "gt/intel_breadcrumbs.h"
 #include "gt/intel_context.h"
 #include "gt/intel_ring.h"
 #include "gt/intel_rps.h"
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index fc18378c685d..16b721080195 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -361,10 +361,6 @@ void i915_request_submit(struct i915_request *request);
 void __i915_request_unsubmit(struct i915_request *request);
 void i915_request_unsubmit(struct i915_request *request);
 
-/* Note: part of the intel_breadcrumbs family */
-bool i915_request_enable_breadcrumb(struct i915_request *request);
-void i915_request_cancel_breadcrumb(struct i915_request *request);
-
 long i915_request_wait(struct i915_request *rq,
 		       unsigned int flags,
 		       long timeout)
-- 
2.20.1


* [Intel-gfx] [PATCH 14/21] drm/i915/gt: Move intel_breadcrumbs_arm_irq earlier
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (12 preceding siblings ...)
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 13/21] drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs Chris Wilson
@ 2020-07-30  9:37 ` Chris Wilson
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 15/21] drm/i915/gt: Hold context/request reference while breadcrumbs are active Chris Wilson
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2020-07-30  9:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Move __intel_breadcrumbs_arm_irq() earlier, next to
__intel_breadcrumbs_disarm_irq(), so that we can make use of it in the
following patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 84 ++++++++++-----------
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 2ffd47a86656..9dd99969fd07 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -54,6 +54,36 @@ static void irq_disable(struct intel_engine_cs *engine)
 	spin_unlock(&engine->gt->irq_lock);
 }
 
+static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
+{
+	lockdep_assert_held(&b->irq_lock);
+
+	if (!b->irq_engine || b->irq_armed)
+		return;
+
+	if (!intel_gt_pm_get_if_awake(b->irq_engine->gt))
+		return;
+
+	/*
+	 * The breadcrumb irq will be disarmed on the interrupt after the
+	 * waiters are signaled. This gives us a single interrupt window in
+	 * which we can add a new waiter and avoid the cost of re-enabling
+	 * the irq.
+	 */
+	WRITE_ONCE(b->irq_armed, true);
+
+	/*
+	 * Since we are waiting on a request, the GPU should be busy
+	 * and should have its own rpm reference. This is tracked
+	 * by i915->gt.awake, we can forgo holding our own wakref
+	 * for the interrupt as before i915->gt.awake is released (when
+	 * the driver is idle) we disarm the breadcrumbs.
+	 */
+
+	if (!b->irq_enabled++)
+		irq_enable(b->irq_engine);
+}
+
 static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
 {
 	lockdep_assert_held(&b->irq_lock);
@@ -69,18 +99,6 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
 	intel_gt_pm_put_async(b->irq_engine->gt);
 }
 
-void intel_breadcrumbs_park(struct intel_breadcrumbs *b)
-{
-	unsigned long flags;
-
-	if (!READ_ONCE(b->irq_armed))
-		return;
-
-	spin_lock_irqsave(&b->irq_lock, flags);
-	__intel_breadcrumbs_disarm_irq(b);
-	spin_unlock_irqrestore(&b->irq_lock, flags);
-}
-
 static inline bool __request_completed(const struct i915_request *rq)
 {
 	return i915_seqno_passed(__hwsp_seqno(rq), rq->fence.seqno);
@@ -214,36 +232,6 @@ static void signal_irq_work(struct irq_work *work)
 	}
 }
 
-static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
-{
-	lockdep_assert_held(&b->irq_lock);
-
-	if (!b->irq_engine || b->irq_armed)
-		return;
-
-	if (!intel_gt_pm_get_if_awake(b->irq_engine->gt))
-		return;
-
-	/*
-	 * The breadcrumb irq will be disarmed on the interrupt after the
-	 * waiters are signaled. This gives us a single interrupt window in
-	 * which we can add a new waiter and avoid the cost of re-enabling
-	 * the irq.
-	 */
-	WRITE_ONCE(b->irq_armed, true);
-
-	/*
-	 * Since we are waiting on a request, the GPU should be busy
-	 * and should have its own rpm reference. This is tracked
-	 * by i915->gt.awake, we can forgo holding our own wakref
-	 * for the interrupt as before i915->gt.awake is released (when
-	 * the driver is idle) we disarm the breadcrumbs.
-	 */
-
-	if (!b->irq_enabled++)
-		irq_enable(b->irq_engine);
-}
-
 struct intel_breadcrumbs *
 intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 {
@@ -281,6 +269,18 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
 	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
 
+void intel_breadcrumbs_park(struct intel_breadcrumbs *b)
+{
+	unsigned long flags;
+
+	if (!READ_ONCE(b->irq_armed))
+		return;
+
+	spin_lock_irqsave(&b->irq_lock, flags);
+	__intel_breadcrumbs_disarm_irq(b);
+	spin_unlock_irqrestore(&b->irq_lock, flags);
+}
+
 void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
 {
 	kfree(b);
-- 
2.20.1


* [Intel-gfx] [PATCH 15/21] drm/i915/gt: Hold context/request reference while breadcrumbs are active
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (13 preceding siblings ...)
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 14/21] drm/i915/gt: Move intel_breadcrumbs_arm_irq earlier Chris Wilson
@ 2020-07-30  9:37 ` Chris Wilson
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 16/21] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock Chris Wilson
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2020-07-30  9:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Currently we hold no actual reference to either the request or the
context while they are attached to a breadcrumb. To avoid freeing the
request/context too early, we serialise with cancel-breadcrumbs by
taking the irq spinlock in i915_request_retire(). The alternative is to
take a reference when attaching a new breadcrumb and release it upon
signaling, removing the more frequently hit contention point in
i915_request_retire().
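
In outline, the reference now travels with the breadcrumb itself
(duplicate-signal paths elided; see the hunks below):

	/* insert_breadcrumb(): pin the request for the breadcrumb's lifetime */
	i915_request_get(rq);

	/*
	 * The matching put is issued once the fence has been signaled (or
	 * was found already signaled), or on cancellation, so retiring no
	 * longer needs to serialise against b->irq_lock.
	 */

	/* i915_request_cancel_breadcrumb(): drop the pin if never signaled */
	i915_request_put(rq);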

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 57 ++++++++++++++++-----
 drivers/gpu/drm/i915/i915_request.c         |  9 ++--
 2 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 9dd99969fd07..fc6f0223d2c8 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -29,6 +29,7 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_breadcrumbs.h"
+#include "intel_context.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 
@@ -99,6 +100,22 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
 	intel_gt_pm_put_async(b->irq_engine->gt);
 }
 
+static void add_signaling_context(struct intel_breadcrumbs *b,
+				  struct intel_context *ce)
+{
+	intel_context_get(ce);
+	list_add_tail(&ce->signal_link, &b->signalers);
+	if (list_is_first(&ce->signal_link, &b->signalers))
+		__intel_breadcrumbs_arm_irq(b);
+}
+
+static void remove_signaling_context(struct intel_breadcrumbs *b,
+				     struct intel_context *ce)
+{
+	list_del(&ce->signal_link);
+	intel_context_put(ce);
+}
+
 static inline bool __request_completed(const struct i915_request *rq)
 {
 	return i915_seqno_passed(__hwsp_seqno(rq), rq->fence.seqno);
@@ -107,6 +124,9 @@ static inline bool __request_completed(const struct i915_request *rq)
 __maybe_unused static bool
 check_signal_order(struct intel_context *ce, struct i915_request *rq)
 {
+	if (rq->context != ce)
+		return false;
+
 	if (!list_is_last(&rq->signal_link, &ce->signals) &&
 	    i915_seqno_passed(rq->fence.seqno,
 			      list_next_entry(rq, signal_link)->fence.seqno))
@@ -158,10 +178,11 @@ static bool __signal_request(struct i915_request *rq, struct list_head *signals)
 {
 	clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
 
-	if (!__dma_fence_signal(&rq->fence))
+	if (!__dma_fence_signal(&rq->fence)) {
+		i915_request_put(rq);
 		return false;
+	}
 
-	i915_request_get(rq);
 	list_add_tail(&rq->signal_link, signals);
 	return true;
 }
@@ -209,8 +230,8 @@ static void signal_irq_work(struct irq_work *work)
 			/* Advance the list to the first incomplete request */
 			__list_del_many(&ce->signals, pos);
 			if (&ce->signals == pos) { /* now empty */
-				list_del_init(&ce->signal_link);
 				add_retire(b, ce->timeline);
+				remove_signaling_context(b, ce);
 			}
 		}
 	}
@@ -279,6 +300,9 @@ void intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 	spin_lock_irqsave(&b->irq_lock, flags);
 	__intel_breadcrumbs_disarm_irq(b);
 	spin_unlock_irqrestore(&b->irq_lock, flags);
+
+	if (!list_empty(&b->signalers))
+		irq_work_queue(&b->irq_work);
 }
 
 void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
@@ -295,6 +319,8 @@ static void insert_breadcrumb(struct i915_request *rq,
 	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
 		return;
 
+	i915_request_get(rq);
+
 	/*
 	 * If the request is already completed, we can transfer it
 	 * straight onto a signaled list, and queue the irq worker for
@@ -306,8 +332,6 @@ static void insert_breadcrumb(struct i915_request *rq,
 		return;
 	}
 
-	__intel_breadcrumbs_arm_irq(b);
-
 	/*
 	 * We keep the seqno in retirement order, so we can break
 	 * inside intel_engine_signal_breadcrumbs as soon as we've
@@ -322,16 +346,19 @@ static void insert_breadcrumb(struct i915_request *rq,
 	 * start looking for our insertion point from the tail of
 	 * the list.
 	 */
-	list_for_each_prev(pos, &ce->signals) {
-		struct i915_request *it =
-			list_entry(pos, typeof(*it), signal_link);
-
-		if (i915_seqno_passed(rq->fence.seqno, it->fence.seqno))
-			break;
+	if (list_empty(&ce->signals)) {
+		add_signaling_context(b, ce);
+		pos = &ce->signals;
+	} else {
+		list_for_each_prev(pos, &ce->signals) {
+			struct i915_request *it =
+				list_entry(pos, typeof(*it), signal_link);
+
+			if (i915_seqno_passed(rq->fence.seqno, it->fence.seqno))
+				break;
+		}
 	}
 	list_add(&rq->signal_link, pos);
-	if (pos == &ce->signals) /* catch transitions from empty list */
-		list_move_tail(&ce->signal_link, &b->signalers);
 	GEM_BUG_ON(!check_signal_order(ce, rq));
 	set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
 
@@ -412,9 +439,10 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq)
 
 		list_del(&rq->signal_link);
 		if (list_empty(&ce->signals))
-			list_del_init(&ce->signal_link);
+			remove_signaling_context(b, ce);
 
 		clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+		i915_request_put(rq);
 	}
 	spin_unlock(&b->irq_lock);
 }
@@ -429,6 +457,7 @@ void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
 	if (!b || list_empty(&b->signalers))
 		return;
 
+	drm_printf(p, "IRQ: %s\n", enableddisabled(b->irq_armed));
 	drm_printf(p, "Signals:\n");
 
 	spin_lock_irq(&b->irq_lock);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 3b121fa38c60..e36253bb6124 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -300,12 +300,11 @@ bool i915_request_retire(struct i915_request *rq)
 		__i915_request_fill(rq, POISON_FREE);
 	rq->ring->head = rq->postfix;
 
-	spin_lock_irq(&rq->lock);
-	if (!i915_request_signaled(rq))
+	if (!i915_request_signaled(rq)) {
+		spin_lock_irq(&rq->lock);
 		dma_fence_signal_locked(&rq->fence);
-	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
-		i915_request_cancel_breadcrumb(rq);
-	spin_unlock_irq(&rq->lock);
+		spin_unlock_irq(&rq->lock);
+	}
 
 	if (i915_request_has_waitboost(rq)) {
 		GEM_BUG_ON(!atomic_read(&rq->engine->gt->rps.num_waiters));
-- 
2.20.1


* [Intel-gfx] [PATCH 16/21] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (14 preceding siblings ...)
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 15/21] drm/i915/gt: Hold context/request reference while breadcrumbs are active Chris Wilson
@ 2020-07-30  9:37 ` Chris Wilson
  2020-07-31 15:06   ` Tvrtko Ursulin
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 17/21] drm/i915/gt: Protect context lifetime with RCU Chris Wilson
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2020-07-30  9:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Make b->signaled_requests a lockless list (llist) so that we can
manipulate it outside of the b->irq_lock.
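
This is the standard llist producer/consumer pairing (as in the hunks
below): llist_add() reports whether the list was previously empty, i.e.
whether the worker needs a kick, and llist_del_all() detaches the whole
pending list atomically:

	/* Producer (insert_breadcrumb), safe without taking b->irq_lock: */
	if (__signal_request(rq) &&
	    llist_add(&rq->signal_node, &b->signaled_requests))
		irq_work_queue(&b->irq_work);

	/* Consumer (signal_irq_work): atomically claim every pending node. */
	signal = llist_del_all(&b->signaled_requests);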

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 30 +++++++++++--------
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  2 +-
 drivers/gpu/drm/i915/i915_request.h           |  6 +++-
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index fc6f0223d2c8..6a278bf0fc6b 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -174,16 +174,13 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
 		intel_engine_add_retire(b->irq_engine, tl);
 }
 
-static bool __signal_request(struct i915_request *rq, struct list_head *signals)
+static bool __signal_request(struct i915_request *rq)
 {
-	clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
-
 	if (!__dma_fence_signal(&rq->fence)) {
 		i915_request_put(rq);
 		return false;
 	}
 
-	list_add_tail(&rq->signal_link, signals);
 	return true;
 }
 
@@ -191,17 +188,19 @@ static void signal_irq_work(struct irq_work *work)
 {
 	struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work);
 	const ktime_t timestamp = ktime_get();
+	struct llist_node *signal, *sn;
 	struct intel_context *ce, *cn;
 	struct list_head *pos, *next;
-	LIST_HEAD(signal);
+
+	signal = NULL;
+	if (unlikely(!llist_empty(&b->signaled_requests)))
+		signal = llist_del_all(&b->signaled_requests);
 
 	spin_lock(&b->irq_lock);
 
-	if (list_empty(&b->signalers))
+	if (!signal && list_empty(&b->signalers))
 		__intel_breadcrumbs_disarm_irq(b);
 
-	list_splice_init(&b->signaled_requests, &signal);
-
 	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
 		GEM_BUG_ON(list_empty(&ce->signals));
 
@@ -218,7 +217,11 @@ static void signal_irq_work(struct irq_work *work)
 			 * spinlock as the callback chain may end up adding
 			 * more signalers to the same context or engine.
 			 */
-			__signal_request(rq, &signal);
+			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+			if (__signal_request(rq)) {
+				rq->signal_node.next = signal;
+				signal = &rq->signal_node;
+			}
 		}
 
 		/*
@@ -238,9 +241,9 @@ static void signal_irq_work(struct irq_work *work)
 
 	spin_unlock(&b->irq_lock);
 
-	list_for_each_safe(pos, next, &signal) {
+	llist_for_each_safe(signal, sn, signal) {
 		struct i915_request *rq =
-			list_entry(pos, typeof(*rq), signal_link);
+			llist_entry(signal, typeof(*rq), signal_node);
 		struct list_head cb_list;
 
 		spin_lock(&rq->lock);
@@ -264,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 
 	spin_lock_init(&b->irq_lock);
 	INIT_LIST_HEAD(&b->signalers);
-	INIT_LIST_HEAD(&b->signaled_requests);
+	init_llist_head(&b->signaled_requests);
 
 	init_irq_work(&b->irq_work, signal_irq_work);
 
@@ -327,7 +330,8 @@ static void insert_breadcrumb(struct i915_request *rq,
 	 * its signal completion.
 	 */
 	if (__request_completed(rq)) {
-		if (__signal_request(rq, &b->signaled_requests))
+		if (__signal_request(rq) &&
+		    llist_add(&rq->signal_node, &b->signaled_requests))
 			irq_work_queue(&b->irq_work);
 		return;
 	}
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
index 8e53b9942695..3fa19820b37a 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -35,7 +35,7 @@ struct intel_breadcrumbs {
 	struct intel_engine_cs *irq_engine;
 
 	struct list_head signalers;
-	struct list_head signaled_requests;
+	struct llist_head signaled_requests;
 
 	struct irq_work irq_work; /* for use from inside irq_lock */
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 16b721080195..874af6db6103 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -176,7 +176,11 @@ struct i915_request {
 	struct intel_context *context;
 	struct intel_ring *ring;
 	struct intel_timeline __rcu *timeline;
-	struct list_head signal_link;
+
+	union {
+		struct list_head signal_link;
+		struct llist_node signal_node;
+	};
 
 	/*
 	 * The rcu epoch of when this request was allocated. Used to judiciously
-- 
2.20.1


* [Intel-gfx] [PATCH 17/21] drm/i915/gt: Protect context lifetime with RCU
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (15 preceding siblings ...)
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 16/21] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock Chris Wilson
@ 2020-07-30  9:37 ` Chris Wilson
  2020-07-31 15:15   ` Tvrtko Ursulin
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 18/21] drm/i915/gt: Split the breadcrumb spinlock between global and contexts Chris Wilson
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2020-07-30  9:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Allow a brief period for continued access to a dead intel_context by
deferring the release of the struct until after an RCU grace period.
As we are using a dedicated slab cache for the contexts, we can defer
the release of the slab pages via RCU, with the caveat that individual
structs may be reused from the freelist within an RCU grace period. To
handle that, we have to avoid clearing members of the zombie struct.
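
For reference, the shape of the SLAB_TYPESAFE_BY_RCU pattern being
adopted (a sketch; the cache creation matches the hunk below, while the
reader-side rule is the general one for such caches):

	slab = kmem_cache_create("intel_context",
				 sizeof(struct intel_context),
				 __alignof__(struct intel_context),
				 SLAB_HWCACHE_ALIGN | SLAB_TYPESAFE_BY_RCU,
				 ctor); /* one-time init of members that survive reuse */

	/*
	 * Readers under rcu_read_lock() may observe a recycled struct, so
	 * they must revalidate after acquiring it, e.g. with
	 * kref_get_unless_zero(), before trusting per-allocation state.
	 */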

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_context.c | 330 +++++++++++++-----------
 drivers/gpu/drm/i915/i915_active.c      |  10 +
 drivers/gpu/drm/i915/i915_active.h      |   2 +
 drivers/gpu/drm/i915/i915_utils.h       |   7 +
 4 files changed, 202 insertions(+), 147 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 52db2bde44a3..4e7924640ffa 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -22,7 +22,7 @@ static struct i915_global_context {
 
 static struct intel_context *intel_context_alloc(void)
 {
-	return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
+	return kmem_cache_alloc(global.slab_ce, GFP_KERNEL);
 }
 
 void intel_context_free(struct intel_context *ce)
@@ -30,6 +30,177 @@ void intel_context_free(struct intel_context *ce)
 	kmem_cache_free(global.slab_ce, ce);
 }
 
+static int __context_pin_state(struct i915_vma *vma)
+{
+	unsigned int bias = i915_ggtt_pin_bias(vma) | PIN_OFFSET_BIAS;
+	int err;
+
+	err = i915_ggtt_pin(vma, 0, bias | PIN_HIGH);
+	if (err)
+		return err;
+
+	err = i915_active_acquire(&vma->active);
+	if (err)
+		goto err_unpin;
+
+	/*
+	 * And mark it as a globally pinned object to let the shrinker know
+	 * it cannot reclaim the object until we release it.
+	 */
+	i915_vma_make_unshrinkable(vma);
+	vma->obj->mm.dirty = true;
+
+	return 0;
+
+err_unpin:
+	i915_vma_unpin(vma);
+	return err;
+}
+
+static void __context_unpin_state(struct i915_vma *vma)
+{
+	i915_vma_make_shrinkable(vma);
+	i915_active_release(&vma->active);
+	__i915_vma_unpin(vma);
+}
+
+static int __ring_active(struct intel_ring *ring)
+{
+	int err;
+
+	err = intel_ring_pin(ring);
+	if (err)
+		return err;
+
+	err = i915_active_acquire(&ring->vma->active);
+	if (err)
+		goto err_pin;
+
+	return 0;
+
+err_pin:
+	intel_ring_unpin(ring);
+	return err;
+}
+
+static void __ring_retire(struct intel_ring *ring)
+{
+	i915_active_release(&ring->vma->active);
+	intel_ring_unpin(ring);
+}
+
+__i915_active_call
+static void __intel_context_retire(struct i915_active *active)
+{
+	struct intel_context *ce = container_of(active, typeof(*ce), active);
+
+	CE_TRACE(ce, "retire runtime: { total:%lluns, avg:%lluns }\n",
+		 intel_context_get_total_runtime_ns(ce),
+		 intel_context_get_avg_runtime_ns(ce));
+
+	set_bit(CONTEXT_VALID_BIT, &ce->flags);
+	if (ce->state)
+		__context_unpin_state(ce->state);
+
+	intel_timeline_unpin(ce->timeline);
+	__ring_retire(ce->ring);
+
+	intel_context_put(ce);
+}
+
+static int __intel_context_active(struct i915_active *active)
+{
+	struct intel_context *ce = container_of(active, typeof(*ce), active);
+	int err;
+
+	CE_TRACE(ce, "active\n");
+
+	intel_context_get(ce);
+
+	err = __ring_active(ce->ring);
+	if (err)
+		goto err_put;
+
+	err = intel_timeline_pin(ce->timeline);
+	if (err)
+		goto err_ring;
+
+	if (!ce->state)
+		return 0;
+
+	err = __context_pin_state(ce->state);
+	if (err)
+		goto err_timeline;
+
+	return 0;
+
+err_timeline:
+	intel_timeline_unpin(ce->timeline);
+err_ring:
+	__ring_retire(ce->ring);
+err_put:
+	intel_context_put(ce);
+	return err;
+}
+
+static void __intel_context_ctor(void *arg)
+{
+	struct intel_context *ce = arg;
+
+	INIT_LIST_HEAD(&ce->signal_link);
+	INIT_LIST_HEAD(&ce->signals);
+
+	atomic_set(&ce->pin_count, 0);
+	mutex_init(&ce->pin_mutex);
+
+	ce->active_count = 0;
+	i915_active_init(&ce->active,
+			 __intel_context_active, __intel_context_retire);
+
+	ce->inflight = NULL;
+	ce->lrc_reg_state = NULL;
+	ce->lrc.desc = 0;
+}
+
+static void
+__intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
+{
+	GEM_BUG_ON(!engine->cops);
+	GEM_BUG_ON(!engine->gt->vm);
+
+	kref_init(&ce->ref);
+	i915_active_reinit(&ce->active);
+	mutex_reinit(&ce->pin_mutex);
+
+	ce->engine = engine;
+	ce->ops = engine->cops;
+	ce->sseu = engine->sseu;
+
+	ce->wa_bb_page = 0;
+	ce->flags = 0;
+	ce->tag = 0;
+
+	memset(&ce->runtime, 0, sizeof(ce->runtime));
+
+	ce->vm = i915_vm_get(engine->gt->vm);
+	ce->gem_context = NULL;
+
+	ce->ring = __intel_context_ring_size(SZ_4K);
+	ce->timeline = NULL;
+	ce->state = NULL;
+
+	GEM_BUG_ON(atomic_read(&ce->pin_count));
+	GEM_BUG_ON(ce->active_count);
+	GEM_BUG_ON(ce->inflight);
+}
+
+void
+intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
+{
+	__intel_context_ctor(ce);
+	__intel_context_init(ce, engine);
+}
+
 struct intel_context *
 intel_context_create(struct intel_engine_cs *engine)
 {
@@ -39,7 +210,7 @@ intel_context_create(struct intel_engine_cs *engine)
 	if (!ce)
 		return ERR_PTR(-ENOMEM);
 
-	intel_context_init(ce, engine);
+	__intel_context_init(ce, engine);
 	return ce;
 }
 
@@ -158,154 +329,13 @@ void intel_context_unpin(struct intel_context *ce)
 	/*
 	 * Once released, we may asynchronously drop the active reference.
 	 * As that may be the only reference keeping the context alive,
-	 * take an extra now so that it is not freed before we finish
+	 * hold onto RCU so that it is not freed before we finish
 	 * dereferencing it.
 	 */
-	intel_context_get(ce);
+	rcu_read_lock();
 	intel_context_active_release(ce);
-	intel_context_put(ce);
-}
-
-static int __context_pin_state(struct i915_vma *vma)
-{
-	unsigned int bias = i915_ggtt_pin_bias(vma) | PIN_OFFSET_BIAS;
-	int err;
-
-	err = i915_ggtt_pin(vma, 0, bias | PIN_HIGH);
-	if (err)
-		return err;
-
-	err = i915_active_acquire(&vma->active);
-	if (err)
-		goto err_unpin;
-
-	/*
-	 * And mark it as a globally pinned object to let the shrinker know
-	 * it cannot reclaim the object until we release it.
-	 */
-	i915_vma_make_unshrinkable(vma);
-	vma->obj->mm.dirty = true;
-
-	return 0;
-
-err_unpin:
-	i915_vma_unpin(vma);
-	return err;
-}
-
-static void __context_unpin_state(struct i915_vma *vma)
-{
-	i915_vma_make_shrinkable(vma);
-	i915_active_release(&vma->active);
-	__i915_vma_unpin(vma);
-}
-
-static int __ring_active(struct intel_ring *ring)
-{
-	int err;
-
-	err = intel_ring_pin(ring);
-	if (err)
-		return err;
-
-	err = i915_active_acquire(&ring->vma->active);
-	if (err)
-		goto err_pin;
-
-	return 0;
-
-err_pin:
-	intel_ring_unpin(ring);
-	return err;
-}
-
-static void __ring_retire(struct intel_ring *ring)
-{
-	i915_active_release(&ring->vma->active);
-	intel_ring_unpin(ring);
+	rcu_read_unlock();
 }
-
-__i915_active_call
-static void __intel_context_retire(struct i915_active *active)
-{
-	struct intel_context *ce = container_of(active, typeof(*ce), active);
-
-	CE_TRACE(ce, "retire runtime: { total:%lluns, avg:%lluns }\n",
-		 intel_context_get_total_runtime_ns(ce),
-		 intel_context_get_avg_runtime_ns(ce));
-
-	set_bit(CONTEXT_VALID_BIT, &ce->flags);
-	if (ce->state)
-		__context_unpin_state(ce->state);
-
-	intel_timeline_unpin(ce->timeline);
-	__ring_retire(ce->ring);
-
-	intel_context_put(ce);
-}
-
-static int __intel_context_active(struct i915_active *active)
-{
-	struct intel_context *ce = container_of(active, typeof(*ce), active);
-	int err;
-
-	CE_TRACE(ce, "active\n");
-
-	intel_context_get(ce);
-
-	err = __ring_active(ce->ring);
-	if (err)
-		goto err_put;
-
-	err = intel_timeline_pin(ce->timeline);
-	if (err)
-		goto err_ring;
-
-	if (!ce->state)
-		return 0;
-
-	err = __context_pin_state(ce->state);
-	if (err)
-		goto err_timeline;
-
-	return 0;
-
-err_timeline:
-	intel_timeline_unpin(ce->timeline);
-err_ring:
-	__ring_retire(ce->ring);
-err_put:
-	intel_context_put(ce);
-	return err;
-}
-
-void
-intel_context_init(struct intel_context *ce,
-		   struct intel_engine_cs *engine)
-{
-	GEM_BUG_ON(!engine->cops);
-	GEM_BUG_ON(!engine->gt->vm);
-
-	kref_init(&ce->ref);
-
-	ce->engine = engine;
-	ce->ops = engine->cops;
-	ce->sseu = engine->sseu;
-	ce->ring = __intel_context_ring_size(SZ_4K);
-
-	ewma_runtime_init(&ce->runtime.avg);
-
-	ce->vm = i915_vm_get(engine->gt->vm);
-
-	INIT_LIST_HEAD(&ce->signal_link);
-	INIT_LIST_HEAD(&ce->signals);
-
-	mutex_init(&ce->pin_mutex);
-
-	i915_active_init(&ce->active,
-			 __intel_context_active, __intel_context_retire);
-}
-
 void intel_context_fini(struct intel_context *ce)
 {
 	if (ce->timeline)
@@ -333,7 +363,13 @@ static struct i915_global_context global = { {
 
 int __init i915_global_context_init(void)
 {
-	global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
+	global.slab_ce =
+		kmem_cache_create("intel_context",
+				  sizeof(struct intel_context),
+				  __alignof__(struct intel_context),
+				  SLAB_HWCACHE_ALIGN |
+				  SLAB_TYPESAFE_BY_RCU,
+				  __intel_context_ctor);
 	if (!global.slab_ce)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 5dd52bb6d38c..878fe6664f19 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -795,6 +795,16 @@ void i915_active_fini(struct i915_active *ref)
 		kmem_cache_free(global.slab_cache, ref->cache);
 }
 
+void i915_active_reinit(struct i915_active *ref)
+{
+	GEM_BUG_ON(!i915_active_is_idle(ref));
+	debug_active_init(ref);
+	mutex_reinit(&ref->mutex);
+
+	ref->cache = NULL;
+	ref->tree = RB_ROOT;
+}
+
 static inline bool is_idle_barrier(struct active_node *node, u64 idx)
 {
 	return node->timeline == idx && !i915_active_fence_isset(&node->base);
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index fb165d3f01cf..6df7e721616d 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -219,6 +219,8 @@ i915_active_is_idle(const struct i915_active *ref)
 
 void i915_active_fini(struct i915_active *ref);
 
+void i915_active_reinit(struct i915_active *ref);
+
 int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
 					    struct intel_engine_cs *engine);
 void i915_active_acquire_barrier(struct i915_active *ref);
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 54773371e6bd..ef8db3aa75c7 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -443,6 +443,13 @@ static inline bool timer_expired(const struct timer_list *t)
 	return READ_ONCE(t->expires) && !timer_pending(t);
 }
 
+static inline void mutex_reinit(struct mutex *lock)
+{
+#if IS_ENABLED(CONFIG_DEBUG_MUTEXES)
+	lock->magic = lock;
+#endif
+}
+
 /*
  * This is a lookalike for IS_ENABLED() that takes a kconfig value,
  * e.g. CONFIG_DRM_I915_SPIN_REQUEST, and evaluates whether it is non-zero
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Intel-gfx] [PATCH 18/21] drm/i915/gt: Split the breadcrumb spinlock between global and contexts
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (16 preceding siblings ...)
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 17/21] drm/i915/gt: Protect context lifetime with RCU Chris Wilson
@ 2020-07-30  9:37 ` Chris Wilson
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 19/21] drm/i915: Drop i915_request.lock serialisation around await_start Chris Wilson
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2020-07-30  9:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

As we funnel more and more contexts into the breadcrumbs on an engine,
the hold time of b->irq_lock grows. As we may then contend with the
b->irq_lock during request submission, this increases the burden upon
the engine->active.lock and so directly impacts both our execution
latency and client latency. If we split the b->irq_lock by introducing a
per-context spinlock to manage the signalers within a context, we then
only need the b->irq_lock for enabling/disabling the interrupt and can
avoid taking the lock for walking the list of contexts within the signal
worker. Even with the current setup, this greatly reduces the number of
times we have to take and fight for b->irq_lock.

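As a minimal sketch (illustrative only; the names follow the hunks
below), the locking hierarchy after this patch becomes:

	/* b->irq_lock: only list membership and interrupt arming */
	spin_lock(&b->irq_lock);
	list_add_rcu(&ce->signal_link, &b->signalers);
	spin_unlock(&b->irq_lock);

	/* ce->signal_lock: the requests queued on a single context */
	spin_lock(&ce->signal_lock);
	list_add(&rq->signal_link, &ce->signals);
	spin_unlock(&ce->signal_lock);

	/* signal worker: no global lock while walking the contexts */
	rcu_read_lock();
	list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
		if (!spin_trylock(&ce->signal_lock))
			continue; /* contended; caught on the next pass */
		/* ... signal any completed requests on ce->signals ... */
		spin_unlock(&ce->signal_lock);
	}
	rcu_read_unlock();
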
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 149 ++++++++++--------
 drivers/gpu/drm/i915/gt/intel_context.c       |   1 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   1 +
 3 files changed, 84 insertions(+), 67 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 6a278bf0fc6b..23fe647b50b3 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -103,17 +103,19 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
 static void add_signaling_context(struct intel_breadcrumbs *b,
 				  struct intel_context *ce)
 {
-	intel_context_get(ce);
-	list_add_tail(&ce->signal_link, &b->signalers);
-	if (list_is_first(&ce->signal_link, &b->signalers))
+	lockdep_assert_held(&b->irq_lock);
+
+	list_add_rcu(&ce->signal_link, &b->signalers);
+	if (list_is_last(&ce->signal_link, &b->signalers))
 		__intel_breadcrumbs_arm_irq(b);
 }
 
 static void remove_signaling_context(struct intel_breadcrumbs *b,
 				     struct intel_context *ce)
 {
-	list_del(&ce->signal_link);
-	intel_context_put(ce);
+	spin_lock(&b->irq_lock);
+	list_del_rcu(&ce->signal_link);
+	spin_unlock(&b->irq_lock);
 }
 
 static inline bool __request_completed(const struct i915_request *rq)
@@ -189,20 +191,30 @@ static void signal_irq_work(struct irq_work *work)
 	struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work);
 	const ktime_t timestamp = ktime_get();
 	struct llist_node *signal, *sn;
-	struct intel_context *ce, *cn;
-	struct list_head *pos, *next;
+	struct intel_context *ce;
 
 	signal = NULL;
 	if (unlikely(!llist_empty(&b->signaled_requests)))
 		signal = llist_del_all(&b->signaled_requests);
 
-	spin_lock(&b->irq_lock);
+	if (!signal && READ_ONCE(b->irq_armed) && list_empty(&b->signalers)) {
+		if (spin_trylock(&b->irq_lock)) {
+			if (list_empty(&b->signalers))
+				__intel_breadcrumbs_disarm_irq(b);
+			spin_unlock(&b->irq_lock);
+		}
+	}
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
+		struct list_head *pos, *next;
+		bool release = false;
 
-	if (!signal && list_empty(&b->signalers))
-		__intel_breadcrumbs_disarm_irq(b);
+		if (!spin_trylock(&ce->signal_lock))
+			continue;
 
-	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
-		GEM_BUG_ON(list_empty(&ce->signals));
+		if (list_empty(&ce->signals))
+			goto unlock;
 
 		list_for_each_safe(pos, next, &ce->signals) {
 			struct i915_request *rq =
@@ -235,11 +247,16 @@ static void signal_irq_work(struct irq_work *work)
 			if (&ce->signals == pos) { /* now empty */
 				add_retire(b, ce->timeline);
 				remove_signaling_context(b, ce);
+				release = true;
 			}
 		}
-	}
 
-	spin_unlock(&b->irq_lock);
+unlock:
+		spin_unlock(&ce->signal_lock);
+		if (release)
+			intel_context_put(ce);
+	}
+	rcu_read_unlock();
 
 	llist_for_each_safe(signal, sn, signal) {
 		struct i915_request *rq =
@@ -313,9 +330,9 @@ void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
 	kfree(b);
 }
 
-static void insert_breadcrumb(struct i915_request *rq,
-			      struct intel_breadcrumbs *b)
+static void insert_breadcrumb(struct i915_request *rq)
 {
+	struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs;
 	struct intel_context *ce = rq->context;
 	struct list_head *pos;
 
@@ -351,7 +368,33 @@ static void insert_breadcrumb(struct i915_request *rq,
 	 * the list.
 	 */
 	if (list_empty(&ce->signals)) {
+		/*
+		 * rq->engine is locked by rq->engine->active.lock. That
+		 * however is not known until after rq->engine has been
+		 * dereferenced and the lock acquired. Hence we acquire the
+		 * lock and then validate that rq->engine still matches the
+		 * lock we hold for it.
+		 *
+		 * Here, we are using the breadcrumb lock as a proxy for the
+		 * rq->engine->active.lock, and we know that since the
+		 * breadcrumb will be serialised within i915_request_submit
+		 * the engine cannot change while active as long as we hold
+		 * the breadcrumb lock on that engine.
+		 *
+		 * From the dma_fence_enable_signaling() path, we are outside
+		 * of the request submit/unsubmit path, and so we must be more
+		 * careful to acquire the right lock.
+		 */
+		intel_context_get(ce);
+		spin_lock(&b->irq_lock);
+		while (unlikely(b != READ_ONCE(rq->engine)->breadcrumbs)) {
+			spin_unlock(&b->irq_lock);
+			b = READ_ONCE(rq->engine)->breadcrumbs;
+			spin_lock(&b->irq_lock);
+		}
 		add_signaling_context(b, ce);
+		spin_unlock(&b->irq_lock);
+
 		pos = &ce->signals;
 	} else {
 		list_for_each_prev(pos, &ce->signals) {
@@ -373,7 +416,7 @@ static void insert_breadcrumb(struct i915_request *rq,
 
 bool i915_request_enable_breadcrumb(struct i915_request *rq)
 {
-	struct intel_breadcrumbs *b;
+	struct intel_context *ce = rq->context;
 
 	/* Serialises with i915_request_retire() using rq->lock */
 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
@@ -388,67 +431,37 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq)
 	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
 		return true;
 
-	/*
-	 * rq->engine is locked by rq->engine->active.lock. That however
-	 * is not known until after rq->engine has been dereferenced and
-	 * the lock acquired. Hence we acquire the lock and then validate
-	 * that rq->engine still matches the lock we hold for it.
-	 *
-	 * Here, we are using the breadcrumb lock as a proxy for the
-	 * rq->engine->active.lock, and we know that since the breadcrumb
-	 * will be serialised within i915_request_submit/i915_request_unsubmit,
-	 * the engine cannot change while active as long as we hold the
-	 * breadcrumb lock on that engine.
-	 *
-	 * From the dma_fence_enable_signaling() path, we are outside of the
-	 * request submit/unsubmit path, and so we must be more careful to
-	 * acquire the right lock.
-	 */
-	b = READ_ONCE(rq->engine)->breadcrumbs;
-	spin_lock(&b->irq_lock);
-	while (unlikely(b != READ_ONCE(rq->engine)->breadcrumbs)) {
-		spin_unlock(&b->irq_lock);
-		b = READ_ONCE(rq->engine)->breadcrumbs;
-		spin_lock(&b->irq_lock);
-	}
-
-	/*
-	 * Now that we are finally serialised with request submit/unsubmit,
-	 * [with b->irq_lock] and with i915_request_retire() [via checking
-	 * SIGNALED with rq->lock] confirm the request is indeed active. If
-	 * it is no longer active, the breadcrumb will be attached upon
-	 * i915_request_submit().
-	 */
+	spin_lock(&ce->signal_lock);
 	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
-		insert_breadcrumb(rq, b);
-
-	spin_unlock(&b->irq_lock);
+		insert_breadcrumb(rq);
+	spin_unlock(&ce->signal_lock);
 
 	return true;
 }
 
 void i915_request_cancel_breadcrumb(struct i915_request *rq)
 {
-	struct intel_breadcrumbs *b = rq->engine->breadcrumbs;
+	struct intel_context *ce = rq->context;
+	bool release = false;
 
-	/*
-	 * We must wait for b->irq_lock so that we know the interrupt handler
-	 * has released its reference to the intel_context and has completed
-	 * the DMA_FENCE_FLAG_SIGNALED_BIT/I915_FENCE_FLAG_SIGNAL dance (if
-	 * required).
-	 */
-	spin_lock(&b->irq_lock);
+	if (!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
+		return;
+
+	spin_lock(&ce->signal_lock);
 	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
-		struct intel_context *ce = rq->context;
+		clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
 
 		list_del(&rq->signal_link);
-		if (list_empty(&ce->signals))
-			remove_signaling_context(b, ce);
+		if (list_empty(&ce->signals)) {
+			remove_signaling_context(rq->engine->breadcrumbs, ce);
+			release = true;
+		}
 
-		clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
 		i915_request_put(rq);
 	}
-	spin_unlock(&b->irq_lock);
+	spin_unlock(&ce->signal_lock);
+	if (release)
+		intel_context_put(ce);
 }
 
 void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
@@ -464,8 +477,9 @@ void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
 	drm_printf(p, "IRQ: %s\n", enableddisabled(b->irq_armed));
 	drm_printf(p, "Signals:\n");
 
-	spin_lock_irq(&b->irq_lock);
-	list_for_each_entry(ce, &b->signalers, signal_link) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
+		spin_lock_irq(&ce->signal_lock);
 		list_for_each_entry(rq, &ce->signals, signal_link) {
 			drm_printf(p, "\t[%llx:%llx%s] @ %dms\n",
 				   rq->fence.context, rq->fence.seqno,
@@ -474,6 +488,7 @@ void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
 				   "",
 				   jiffies_to_msecs(jiffies - rq->emitted_jiffies));
 		}
+		spin_unlock_irq(&ce->signal_lock);
 	}
-	spin_unlock_irq(&b->irq_lock);
+	rcu_read_unlock();
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 4e7924640ffa..cde356c7754d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -147,6 +147,7 @@ static void __intel_context_ctor(void *arg)
 {
 	struct intel_context *ce = arg;
 
+	spin_lock_init(&ce->signal_lock);
 	INIT_LIST_HEAD(&ce->signal_link);
 	INIT_LIST_HEAD(&ce->signals);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 4954b0df4864..a78c1c225ce3 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -51,6 +51,7 @@ struct intel_context {
 	struct i915_address_space *vm;
 	struct i915_gem_context __rcu *gem_context;
 
+	spinlock_t signal_lock;
 	struct list_head signal_link;
 	struct list_head signals;
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Intel-gfx] [PATCH 19/21] drm/i915: Drop i915_request.lock serialisation around await_start
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (17 preceding siblings ...)
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 18/21] drm/i915/gt: Split the breadcrumb spinlock between global and contexts Chris Wilson
@ 2020-07-30  9:37 ` Chris Wilson
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 20/21] drm/i915: Drop i915_request.lock requirement for intel_rps_boost() Chris Wilson
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2020-07-30  9:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Originally, we used the signal->lock as a means of following the
previous link in its timeline and peeking at the previous fence.
However, we have since replaced that explicit serialisation with a
series of very careful probes that anticipate the links being deleted
and the fences recycled before we are able to acquire a strong
reference to the previous fence. We no longer need the signal->lock
crutch, nor do we want the contention.

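As a minimal sketch (simplified from the hunk below; the real code also
validates that the neighbour still belongs to the same timeline), the
careful-probe pattern is:

	rcu_read_lock();
	pos = READ_ONCE(signal->link.prev);	/* may already be stale */
	prev = list_entry(pos, typeof(*prev), link);
	/*
	 * The request may be recycled at any time; take a reference
	 * before trusting it, then recheck that the link we followed
	 * has not been rewritten underneath us.
	 */
	fence = dma_fence_get_rcu(&prev->fence);
	if (fence && READ_ONCE(signal->link.prev) != pos) {
		dma_fence_put(fence);	/* lost the race; no dependency */
		fence = NULL;
	}
	rcu_read_unlock();
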
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index e36253bb6124..247e021203c2 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -955,9 +955,16 @@ i915_request_await_start(struct i915_request *rq, struct i915_request *signal)
 	if (i915_request_started(signal))
 		return 0;
 
+	/*
+	 * The caller holds a reference on @signal, but we do not serialise
+	 * against it being retired and removed from the lists.
+	 *
+	 * We do not hold a reference to the request before @signal, and
+	 * so must be very careful to ensure that it is not _recycled_ as
+	 * we follow the link backwards.
+	 */
 	fence = NULL;
 	rcu_read_lock();
-	spin_lock_irq(&signal->lock);
 	do {
 		struct list_head *pos = READ_ONCE(signal->link.prev);
 		struct i915_request *prev;
@@ -988,7 +995,6 @@ i915_request_await_start(struct i915_request *rq, struct i915_request *signal)
 
 		fence = &prev->fence;
 	} while (0);
-	spin_unlock_irq(&signal->lock);
 	rcu_read_unlock();
 	if (!fence)
 		return 0;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Intel-gfx] [PATCH 20/21] drm/i915: Drop i915_request.lock requirement for intel_rps_boost()
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (18 preceding siblings ...)
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 19/21] drm/i915: Drop i915_request.lock serialisation around await_start Chris Wilson
@ 2020-07-30  9:37 ` Chris Wilson
  2020-07-30  9:37   ` [Intel-gfx] " Chris Wilson
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2020-07-30  9:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: thomas.hellstrom, Chris Wilson

Since we use a flag within i915_request.flags to indicate when we have
boosted the request (so that we only apply the boost once), this can be
used as the serialisation with i915_request_retire() to avoid having to
explicitly take the i915_request.lock, which is more heavily contended.

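As a minimal sketch, both paths now race on the same bit, and whichever
sets it first claims the accounting (mirroring the hunks below):

	/* boost: only the first caller to set the bit applies the boost */
	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
		/* ... raise the GPU frequency, account the waiter ... */
	}

	/* retire: set the bit to forbid any future boost; if it was
	 * already set, the boost was applied and must be unaccounted */
	if (test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags))
		atomic_dec(&rq->engine->gt->rps.num_waiters);
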
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_rps.c | 15 ++++++---------
 drivers/gpu/drm/i915/i915_request.c |  4 +---
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index e6a00eea0631..2a43e216e0d4 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -889,17 +889,15 @@ void intel_rps_park(struct intel_rps *rps)
 
 void intel_rps_boost(struct i915_request *rq)
 {
-	struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
-	unsigned long flags;
-
-	if (i915_request_signaled(rq) || !intel_rps_is_active(rps))
+	if (i915_request_signaled(rq) || i915_request_has_waitboost(rq))
 		return;
 
 	/* Serializes with i915_request_retire() */
-	spin_lock_irqsave(&rq->lock, flags);
-	if (!i915_request_has_waitboost(rq) &&
-	    !dma_fence_is_signaled_locked(&rq->fence)) {
-		set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags);
+	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
+		struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
+
+		if (!intel_rps_is_active(rps))
+			return;
 
 		GT_TRACE(rps_to_gt(rps), "boost fence:%llx:%llx\n",
 			 rq->fence.context, rq->fence.seqno);
@@ -910,7 +908,6 @@ void intel_rps_boost(struct i915_request *rq)
 
 		atomic_inc(&rps->boosts);
 	}
-	spin_unlock_irqrestore(&rq->lock, flags);
 }
 
 int intel_rps_set(struct intel_rps *rps, u8 val)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 247e021203c2..43614d8fa18d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -306,10 +306,8 @@ bool i915_request_retire(struct i915_request *rq)
 		spin_unlock_irq(&rq->lock);
 	}
 
-	if (i915_request_has_waitboost(rq)) {
-		GEM_BUG_ON(!atomic_read(&rq->engine->gt->rps.num_waiters));
+	if (test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags))
 		atomic_dec(&rq->engine->gt->rps.num_waiters);
-	}
 
 	/*
 	 * We only loosely track inflight requests across preemption,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 21/21] drm/i915/gem: Delay tracking the GEM context until it is registered
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
@ 2020-07-30  9:37   ` Chris Wilson
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 02/21] drm/i915: Skip taking acquire mutex for no ref->active callback Chris Wilson
                     ` (23 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2020-07-30  9:37 UTC (permalink / raw)
  To: intel-gfx
  Cc: tvrtko.ursulin, thomas.hellstrom, Chris Wilson, CQ Tang, stable,
	Mika Kuoppala

Avoid exposing a partially constructed context by deferring the
list_add() from the initial construction to the end of registration.
Otherwise, if we peek into the list of contexts from inside debugfs, we
may see the partially constructed context and chase down some dangling
incomplete pointers.

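In short, construction is now fully completed before publication, so a
debugfs walker can never observe a half-built context. A condensed view
of the flow (matching the hunks below):

	ctx = __create_context(i915);	/* initialised, not yet visible */

	/* ... register with the file: pid, vm, xarray id ... */

	/* publish last, once every pointer is valid */
	spin_lock(&i915->gem.contexts.lock);
	list_add_tail(&ctx->link, &i915->gem.contexts.list);
	spin_unlock(&i915->gem.contexts.lock);
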
Reported-by: CQ Tang <cq.tang@intel.com>
Fixes: 3aa9945a528e ("drm/i915: Separate GEM context construction and registration to userspace")
References: f6e8aa387171 ("drm/i915: Report the number of closed vma held by each context in debugfs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: CQ Tang <cq.tang@intel.com>
Cc: <stable@vger.kernel.org> # v5.2+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index f43f0ca4eec9..d308cefb2b34 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -713,6 +713,7 @@ __create_context(struct drm_i915_private *i915)
 	ctx->i915 = i915;
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
 	mutex_init(&ctx->mutex);
+	INIT_LIST_HEAD(&ctx->link);
 
 	spin_lock_init(&ctx->stale.lock);
 	INIT_LIST_HEAD(&ctx->stale.engines);
@@ -740,10 +741,6 @@ __create_context(struct drm_i915_private *i915)
 	for (i = 0; i < ARRAY_SIZE(ctx->hang_timestamp); i++)
 		ctx->hang_timestamp[i] = jiffies - CONTEXT_FAST_HANG_JIFFIES;
 
-	spin_lock(&i915->gem.contexts.lock);
-	list_add_tail(&ctx->link, &i915->gem.contexts.list);
-	spin_unlock(&i915->gem.contexts.lock);
-
 	return ctx;
 
 err_free:
@@ -936,6 +933,7 @@ static int gem_context_register(struct i915_gem_context *ctx,
 				struct drm_i915_file_private *fpriv,
 				u32 *id)
 {
+	struct drm_i915_private *i915 = ctx->i915;
 	struct i915_address_space *vm;
 	int ret;
 
@@ -954,8 +952,16 @@ static int gem_context_register(struct i915_gem_context *ctx,
 	/* And finally expose ourselves to userspace via the idr */
 	ret = xa_alloc(&fpriv->context_xa, id, ctx, xa_limit_32b, GFP_KERNEL);
 	if (ret)
-		put_pid(fetch_and_zero(&ctx->pid));
+		goto err_pid;
+
+	spin_lock(&i915->gem.contexts.lock);
+	list_add_tail(&ctx->link, &i915->gem.contexts.list);
+	spin_unlock(&i915->gem.contexts.lock);
+
+	return 0;
 
+err_pid:
+	put_pid(fetch_and_zero(&ctx->pid));
 	return ret;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/21] drm/i915: Add a couple of missing i915_active_fini()
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (20 preceding siblings ...)
  2020-07-30  9:37   ` [Intel-gfx] " Chris Wilson
@ 2020-07-30 13:45 ` Patchwork
  2020-07-30 13:46 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2020-07-30 13:45 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/21] drm/i915: Add a couple of missing i915_active_fini()
URL   : https://patchwork.freedesktop.org/series/80083/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
a0fa9abdd53b drm/i915: Add a couple of missing i915_active_fini()
67cc7c03075d drm/i915: Skip taking acquire mutex for no ref->active callback
a0e8fa99a854 drm/i915: Export a preallocate variant of i915_active_acquire()
262dde54a973 drm/i915: Keep the most recently used active-fence upon discard
924da40b21b0 drm/i915: Make the stale cached active node available for any timeline
84873e9f3274 drm/i915: Reduce locking around i915_active_acquire_preallocate_barrier()
bc7bca4d9864 drm/i915: Provide a fastpath for waiting on vma bindings
8a3376d81ddb drm/i915/gem: Reduce ctx->engine_mutex for reading the clone source
e5dd3b861541 drm/i915/gem: Reduce ctx->engines_mutex for get_engines()
0fcbf16a7b94 drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs
0620516e4506 drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs
cb5ecba76529 drm/i915/gt: Only transfer the virtual context to the new engine if active
-:28: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#28: 
References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine"

-:28: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")'
#28: 
References: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine"

total: 1 errors, 1 warnings, 0 checks, 81 lines checked
be5a292c6ab9 drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs
-:212: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#212: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 513 lines checked
28e0222e4292 drm/i915/gt: Move intel_breadcrumbs_arm_irq earlier
10e0ecedd3dc drm/i915/gt: Hold context/request reference while breadcrumbs are active
ec7fa5b7b139 drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
77ca5e185eed drm/i915/gt: Protect context lifetime with RCU
e7cea35077a1 drm/i915/gt: Split the breadcrumb spinlock between global and contexts
-:287: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#287: FILE: drivers/gpu/drm/i915/gt/intel_context_types.h:54:
+	spinlock_t signal_lock;

total: 0 errors, 0 warnings, 1 checks, 248 lines checked
9497a93cc244 drm/i915: Drop i915_request.lock serialisation around await_start
73380f416eca drm/i915: Drop i915_request.lock requirement for intel_rps_boost()


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [01/21] drm/i915: Add a couple of missing i915_active_fini()
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (21 preceding siblings ...)
  2020-07-30 13:45 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/21] drm/i915: Add a couple of missing i915_active_fini() Patchwork
@ 2020-07-30 13:46 ` Patchwork
  2020-07-30 14:04 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
  2020-07-30 19:20 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  24 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2020-07-30 13:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/21] drm/i915: Add a couple of missing i915_active_fini()
URL   : https://patchwork.freedesktop.org/series/80083/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [01/21] drm/i915: Add a couple of missing i915_active_fini()
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (22 preceding siblings ...)
  2020-07-30 13:46 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2020-07-30 14:04 ` Patchwork
  2020-07-30 19:20 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
  24 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2020-07-30 14:04 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


== Series Details ==

Series: series starting with [01/21] drm/i915: Add a couple of missing i915_active_fini()
URL   : https://patchwork.freedesktop.org/series/80083/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8817 -> Patchwork_18271
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/index.html

Known issues
------------

  Here are the changes found in Patchwork_18271 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_cursor_legacy@basic-flip-before-cursor-atomic:
    - fi-icl-u2:          [PASS][1] -> [DMESG-WARN][2] ([i915#1982])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/fi-icl-u2/igt@kms_cursor_legacy@basic-flip-before-cursor-atomic.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/fi-icl-u2/igt@kms_cursor_legacy@basic-flip-before-cursor-atomic.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s0:
    - fi-tgl-u2:          [FAIL][3] ([i915#1888]) -> [PASS][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/fi-tgl-u2/igt@gem_exec_suspend@basic-s0.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/fi-tgl-u2/igt@gem_exec_suspend@basic-s0.html

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - fi-byt-j1900:       [DMESG-WARN][5] ([i915#1982]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/fi-byt-j1900/igt@i915_pm_rpm@basic-pci-d3-state.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/fi-byt-j1900/igt@i915_pm_rpm@basic-pci-d3-state.html
    - {fi-tgl-dsi}:       [DMESG-WARN][7] ([i915#1982]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/fi-tgl-dsi/igt@i915_pm_rpm@basic-pci-d3-state.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/fi-tgl-dsi/igt@i915_pm_rpm@basic-pci-d3-state.html

  * igt@i915_pm_rpm@module-reload:
    - fi-bsw-kefka:       [DMESG-WARN][9] ([i915#1982]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/fi-bsw-kefka/igt@i915_pm_rpm@module-reload.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/fi-bsw-kefka/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live@execlists:
    - fi-icl-u2:          [INCOMPLETE][11] ([i915#2089]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/fi-icl-u2/igt@i915_selftest@live@execlists.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/fi-icl-u2/igt@i915_selftest@live@execlists.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-bsw-n3050:       [DMESG-WARN][13] ([i915#1982]) -> [PASS][14] +1 similar issue
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  
#### Warnings ####

  * igt@i915_pm_rpm@module-reload:
    - fi-kbl-guc:         [DMESG-WARN][15] ([i915#2203]) -> [DMESG-FAIL][16] ([i915#2203])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/fi-kbl-guc/igt@i915_pm_rpm@module-reload.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/fi-kbl-guc/igt@i915_pm_rpm@module-reload.html

  * igt@kms_force_connector_basic@force-connector-state:
    - fi-kbl-x1275:       [DMESG-WARN][17] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][18] ([i915#62] / [i915#92]) +1 similar issue
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/fi-kbl-x1275/igt@kms_force_connector_basic@force-connector-state.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/fi-kbl-x1275/igt@kms_force_connector_basic@force-connector-state.html

  * igt@kms_force_connector_basic@force-edid:
    - fi-kbl-x1275:       [DMESG-WARN][19] ([i915#62] / [i915#92]) -> [DMESG-WARN][20] ([i915#62] / [i915#92] / [i915#95]) +5 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/fi-kbl-x1275/igt@kms_force_connector_basic@force-edid.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/fi-kbl-x1275/igt@kms_force_connector_basic@force-edid.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2089]: https://gitlab.freedesktop.org/drm/intel/issues/2089
  [i915#2203]: https://gitlab.freedesktop.org/drm/intel/issues/2203
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (42 -> 36)
------------------------------

  Missing    (6): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_8817 -> Patchwork_18271

  CI-20190529: 20190529
  CI_DRM_8817: 9694a4caf26c3c4f3d50f335415218c709029450 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5752: 3ecf9d88803a686354394ea60164551646235273 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18271: 73380f416eca3b4dc5d554b730eeeb142ac7c9c9 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

73380f416eca drm/i915: Drop i915_request.lock requirement for intel_rps_boost()
9497a93cc244 drm/i915: Drop i915_request.lock serialisation around await_start
e7cea35077a1 drm/i915/gt: Split the breadcrumb spinlock between global and contexts
77ca5e185eed drm/i915/gt: Protect context lifetime with RCU
ec7fa5b7b139 drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
10e0ecedd3dc drm/i915/gt: Hold context/request reference while breadcrumbs are active
28e0222e4292 drm/i915/gt: Move intel_breadcrumbs_arm_irq earlier
be5a292c6ab9 drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs
cb5ecba76529 drm/i915/gt: Only transfer the virtual context to the new engine if active
0620516e4506 drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs
0fcbf16a7b94 drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs
e5dd3b861541 drm/i915/gem: Reduce ctx->engines_mutex for get_engines()
8a3376d81ddb drm/i915/gem: Reduce ctx->engine_mutex for reading the clone source
bc7bca4d9864 drm/i915: Provide a fastpath for waiting on vma bindings
84873e9f3274 drm/i915: Reduce locking around i915_active_acquire_preallocate_barrier()
924da40b21b0 drm/i915: Make the stale cached active node available for any timeline
262dde54a973 drm/i915: Keep the most recently used active-fence upon discard
a0e8fa99a854 drm/i915: Export a preallocate variant of i915_active_acquire()
67cc7c03075d drm/i915: Skip taking acquire mutex for no ref->active callback
a0fa9abdd53b drm/i915: Add a couple of missing i915_active_fini()

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/index.html

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for series starting with [01/21] drm/i915: Add a couple of missing i915_active_fini()
  2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
                   ` (23 preceding siblings ...)
  2020-07-30 14:04 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2020-07-30 19:20 ` Patchwork
  24 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2020-07-30 19:20 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


== Series Details ==

Series: series starting with [01/21] drm/i915: Add a couple of missing i915_active_fini()
URL   : https://patchwork.freedesktop.org/series/80083/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8817_full -> Patchwork_18271_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_18271_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_whisper@basic-queues-priority-all:
    - shard-glk:          [PASS][1] -> [DMESG-WARN][2] ([i915#118] / [i915#95])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-glk4/igt@gem_exec_whisper@basic-queues-priority-all.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-glk6/igt@gem_exec_whisper@basic-queues-priority-all.html

  * igt@gem_huc_copy@huc-copy:
    - shard-tglb:         [PASS][3] -> [SKIP][4] ([i915#2190])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-tglb2/igt@gem_huc_copy@huc-copy.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-tglb6/igt@gem_huc_copy@huc-copy.html

  * igt@gen9_exec_parse@allowed-all:
    - shard-apl:          [PASS][5] -> [DMESG-WARN][6] ([i915#1436] / [i915#1635] / [i915#716])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-apl3/igt@gen9_exec_parse@allowed-all.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-apl2/igt@gen9_exec_parse@allowed-all.html

  * igt@i915_selftest@mock@requests:
    - shard-skl:          [PASS][7] -> [INCOMPLETE][8] ([i915#198] / [i915#2119])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl8/igt@i915_selftest@mock@requests.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl7/igt@i915_selftest@mock@requests.html

  * igt@kms_big_fb@y-tiled-64bpp-rotate-180:
    - shard-glk:          [PASS][9] -> [DMESG-FAIL][10] ([i915#118] / [i915#95])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-glk9/igt@kms_big_fb@y-tiled-64bpp-rotate-180.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-glk8/igt@kms_big_fb@y-tiled-64bpp-rotate-180.html

  * igt@kms_color@pipe-b-ctm-negative:
    - shard-skl:          [PASS][11] -> [DMESG-WARN][12] ([i915#1982]) +11 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl5/igt@kms_color@pipe-b-ctm-negative.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl1/igt@kms_color@pipe-b-ctm-negative.html

  * igt@kms_cursor_legacy@short-flip-after-cursor-atomic-transitions-varying-size:
    - shard-apl:          [PASS][13] -> [DMESG-WARN][14] ([i915#1635] / [i915#1982]) +1 similar issue
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-apl6/igt@kms_cursor_legacy@short-flip-after-cursor-atomic-transitions-varying-size.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-apl6/igt@kms_cursor_legacy@short-flip-after-cursor-atomic-transitions-varying-size.html

  * igt@kms_flip@plain-flip-fb-recreate@c-edp1:
    - shard-skl:          [PASS][15] -> [FAIL][16] ([i915#2122])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl3/igt@kms_flip@plain-flip-fb-recreate@c-edp1.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl2/igt@kms_flip@plain-flip-fb-recreate@c-edp1.html

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-mmap-gtt:
    - shard-iclb:         [PASS][17] -> [DMESG-WARN][18] ([i915#1982])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-iclb5/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-mmap-gtt.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-iclb2/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-mmap-wc:
    - shard-kbl:          [PASS][19] -> [DMESG-WARN][20] ([i915#1982])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-kbl6/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-mmap-wc.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-kbl7/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-kbl:          [PASS][21] -> [DMESG-WARN][22] ([i915#180]) +3 similar issues
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-kbl2/igt@kms_frontbuffer_tracking@fbc-suspend.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-kbl4/igt@kms_frontbuffer_tracking@fbc-suspend.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite:
    - shard-tglb:         [PASS][23] -> [DMESG-WARN][24] ([i915#1982]) +2 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-tglb6/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-tglb3/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - shard-kbl:          [PASS][25] -> [INCOMPLETE][26] ([i915#155])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-kbl7/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-kbl1/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

  * igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min:
    - shard-skl:          [PASS][27] -> [FAIL][28] ([fdo#108145] / [i915#265]) +1 similar issue
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl9/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl2/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-min.html

  * igt@kms_psr@psr2_cursor_plane_move:
    - shard-iclb:         [PASS][29] -> [SKIP][30] ([fdo#109441]) +1 similar issue
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-iclb2/igt@kms_psr@psr2_cursor_plane_move.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-iclb3/igt@kms_psr@psr2_cursor_plane_move.html

  * igt@kms_setmode@basic:
    - shard-kbl:          [PASS][31] -> [FAIL][32] ([i915#31])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-kbl6/igt@kms_setmode@basic.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-kbl7/igt@kms_setmode@basic.html

  * igt@perf_pmu@semaphore-busy@rcs0:
    - shard-kbl:          [PASS][33] -> [FAIL][34] ([i915#1820])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-kbl7/igt@perf_pmu@semaphore-busy@rcs0.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-kbl1/igt@perf_pmu@semaphore-busy@rcs0.html

  * igt@sysfs_heartbeat_interval@mixed@bcs0:
    - shard-skl:          [PASS][35] -> [FAIL][36] ([i915#1731])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl8/igt@sysfs_heartbeat_interval@mixed@bcs0.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl7/igt@sysfs_heartbeat_interval@mixed@bcs0.html

  
#### Possible fixes ####

  * igt@gem_ctx_persistence@legacy-engines-mixed@render:
    - shard-skl:          [FAIL][37] -> [PASS][38]
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl10/igt@gem_ctx_persistence@legacy-engines-mixed@render.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl10/igt@gem_ctx_persistence@legacy-engines-mixed@render.html

  * igt@gem_ctx_persistence@processes:
    - shard-apl:          [FAIL][39] ([i915#1528] / [i915#1635]) -> [PASS][40]
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-apl7/igt@gem_ctx_persistence@processes.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-apl3/igt@gem_ctx_persistence@processes.html

  * igt@gem_exec_balancer@bonded-early:
    - shard-tglb:         [FAIL][41] ([i915#2079]) -> [PASS][42]
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-tglb1/igt@gem_exec_balancer@bonded-early.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-tglb3/igt@gem_exec_balancer@bonded-early.html

  * igt@gem_exec_reloc@basic-cpu-gtt-noreloc:
    - shard-skl:          [DMESG-WARN][43] ([i915#1982]) -> [PASS][44] +36 similar issues
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl4/igt@gem_exec_reloc@basic-cpu-gtt-noreloc.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl2/igt@gem_exec_reloc@basic-cpu-gtt-noreloc.html

  * igt@gem_exec_whisper@basic-forked:
    - shard-glk:          [DMESG-WARN][45] ([i915#118] / [i915#95]) -> [PASS][46]
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-glk2/igt@gem_exec_whisper@basic-forked.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-glk9/igt@gem_exec_whisper@basic-forked.html

  * igt@i915_pm_dc@dc6-psr:
    - shard-skl:          [FAIL][47] ([i915#454]) -> [PASS][48]
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl9/igt@i915_pm_dc@dc6-psr.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl4/igt@i915_pm_dc@dc6-psr.html

  * igt@kms_big_fb@x-tiled-64bpp-rotate-180:
    - shard-glk:          [DMESG-FAIL][49] ([i915#118] / [i915#95]) -> [PASS][50]
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-glk8/igt@kms_big_fb@x-tiled-64bpp-rotate-180.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-glk2/igt@kms_big_fb@x-tiled-64bpp-rotate-180.html

  * igt@kms_cursor_crc@pipe-a-cursor-128x128-random:
    - shard-apl:          [FAIL][51] ([i915#1635] / [i915#54]) -> [PASS][52]
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-apl4/igt@kms_cursor_crc@pipe-a-cursor-128x128-random.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-apl2/igt@kms_cursor_crc@pipe-a-cursor-128x128-random.html

  * igt@kms_cursor_edge_walk@pipe-b-64x64-right-edge:
    - shard-glk:          [DMESG-WARN][53] ([i915#1982]) -> [PASS][54] +1 similar issue
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-glk9/igt@kms_cursor_edge_walk@pipe-b-64x64-right-edge.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-glk8/igt@kms_cursor_edge_walk@pipe-b-64x64-right-edge.html
    - shard-apl:          [DMESG-WARN][55] ([i915#1635] / [i915#1982]) -> [PASS][56]
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-apl4/igt@kms_cursor_edge_walk@pipe-b-64x64-right-edge.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-apl6/igt@kms_cursor_edge_walk@pipe-b-64x64-right-edge.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-kbl:          [DMESG-WARN][57] ([i915#180]) -> [PASS][58] +4 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-kbl3/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-kbl2/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-cur-indfb-draw-mmap-wc:
    - shard-tglb:         [DMESG-WARN][59] ([i915#1982]) -> [PASS][60] +1 similar issue
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-tglb3/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-cur-indfb-draw-mmap-wc.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-tglb7/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-cur-indfb-draw-mmap-wc.html

  * igt@kms_hdr@bpc-switch-dpms:
    - shard-skl:          [FAIL][61] ([i915#1188]) -> [PASS][62]
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl8/igt@kms_hdr@bpc-switch-dpms.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl5/igt@kms_hdr@bpc-switch-dpms.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes:
    - shard-iclb:         [DMESG-WARN][63] ([i915#1982]) -> [PASS][64]
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-iclb4/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-iclb1/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html

  * igt@kms_plane_alpha_blend@pipe-a-coverage-7efc:
    - shard-skl:          [FAIL][65] ([fdo#108145] / [i915#265]) -> [PASS][66]
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl4/igt@kms_plane_alpha_blend@pipe-a-coverage-7efc.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl2/igt@kms_plane_alpha_blend@pipe-a-coverage-7efc.html

  * igt@kms_psr2_su@frontbuffer:
    - shard-iclb:         [SKIP][67] ([fdo#109642] / [fdo#111068]) -> [PASS][68] +1 similar issue
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-iclb5/igt@kms_psr2_su@frontbuffer.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-iclb2/igt@kms_psr2_su@frontbuffer.html

  * igt@kms_psr@psr2_no_drrs:
    - shard-iclb:         [SKIP][69] ([fdo#109441]) -> [PASS][70] +1 similar issue
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-iclb7/igt@kms_psr@psr2_no_drrs.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-iclb2/igt@kms_psr@psr2_no_drrs.html

  * igt@kms_psr@suspend:
    - shard-skl:          [INCOMPLETE][71] ([i915#198]) -> [PASS][72]
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl9/igt@kms_psr@suspend.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl4/igt@kms_psr@suspend.html

  
#### Warnings ####

  * igt@kms_content_protection@atomic:
    - shard-kbl:          [TIMEOUT][73] ([i915#1319] / [i915#2119]) -> [TIMEOUT][74] ([i915#1319] / [i915#1958] / [i915#2119])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-kbl2/igt@kms_content_protection@atomic.html
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-kbl6/igt@kms_content_protection@atomic.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [DMESG-FAIL][75] ([fdo#108145] / [i915#1982]) -> [FAIL][76] ([fdo#108145] / [i915#265]) +2 similar issues
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8817/shard-skl6/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/shard-skl10/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html

  
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [i915#118]: https://gitlab.freedesktop.org/drm/intel/issues/118
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#1319]: https://gitlab.freedesktop.org/drm/intel/issues/1319
  [i915#1436]: https://gitlab.freedesktop.org/drm/intel/issues/1436
  [i915#1528]: https://gitlab.freedesktop.org/drm/intel/issues/1528
  [i915#155]: https://gitlab.freedesktop.org/drm/intel/issues/155
  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#1731]: https://gitlab.freedesktop.org/drm/intel/issues/1731
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1820]: https://gitlab.freedesktop.org/drm/intel/issues/1820
  [i915#1958]: https://gitlab.freedesktop.org/drm/intel/issues/1958
  [i915#198]: https://gitlab.freedesktop.org/drm/intel/issues/198
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2079]: https://gitlab.freedesktop.org/drm/intel/issues/2079
  [i915#2119]: https://gitlab.freedesktop.org/drm/intel/issues/2119
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#31]: https://gitlab.freedesktop.org/drm/intel/issues/31
  [i915#454]: https://gitlab.freedesktop.org/drm/intel/issues/454
  [i915#54]: https://gitlab.freedesktop.org/drm/intel/issues/54
  [i915#716]: https://gitlab.freedesktop.org/drm/intel/issues/716
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (11 -> 11)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * Linux: CI_DRM_8817 -> Patchwork_18271

  CI-20190529: 20190529
  CI_DRM_8817: 9694a4caf26c3c4f3d50f335415218c709029450 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5752: 3ecf9d88803a686354394ea60164551646235273 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18271: 73380f416eca3b4dc5d554b730eeeb142ac7c9c9 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18271/index.html

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Intel-gfx] [PATCH 03/21] drm/i915: Export a preallocate variant of i915_active_acquire()
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 03/21] drm/i915: Export a preallocate variant of i915_active_acquire() Chris Wilson
@ 2020-07-31  7:33   ` Thomas Hellström (Intel)
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Hellström (Intel) @ 2020-07-31  7:33 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: thomas.hellstrom


On 7/30/20 11:37 AM, Chris Wilson wrote:
> Sometimes we have to be very careful not to allocate underneath a mutex
> (or spinlock) and yet still want to track activity. Enter
> i915_active_acquire_for_context(). This raises the activity counter on
> i915_active prior to use and ensures that the fence-tree contains a slot
> for the context.
>
> v2: Refactor active_lookup() so it can be called again before/after
> locking to resolve contention. Since we protect the rbtree until we
> idle, we can do a lock-free lookup, with the caveat that if another
> thread performs a concurrent insertion, the rotations from the insert
> may cause us to miss our target. A second pass holding the treelock
> will find the target if it exists, or the place to perform our
> insertion.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
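
For readers following along, the intended calling convention is roughly
the sketch below ("obj", "tl" and the error handling are placeholders,
not the exact driver code):

	/* outside any mutex: take the active ref, preallocate the slot */
	err = i915_active_acquire_for_context(&obj->active, tl->fence_context);
	if (err)
		return err;

	/* under the lock, where we must not allocate: the slot is there */
	prev = __i915_active_ref(&obj->active, tl->fence_context, fence);
	if (prev)
		dma_fence_put(prev); /* the previous fence is returned with a ref */

	i915_active_release(&obj->active);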


> ---
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
>   drivers/gpu/drm/i915/gt/intel_timeline.c      |   4 +-
>   drivers/gpu/drm/i915/i915_active.c            | 150 ++++++++++++++----
>   drivers/gpu/drm/i915/i915_active.h            |  12 +-
>   4 files changed, 130 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index b7a86cdec9b5..07cb2dd0f795 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -1729,7 +1729,7 @@ __parser_mark_active(struct i915_vma *vma,
>   {
>   	struct intel_gt_buffer_pool_node *node = vma->private;
>   
> -	return i915_active_ref(&node->active, tl, fence);
> +	return i915_active_ref(&node->active, tl->fence_context, fence);
>   }
>   
>   static int
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index 46d20f5f3ddc..acb43aebd669 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -484,7 +484,9 @@ __intel_timeline_get_seqno(struct intel_timeline *tl,
>   	 * free it after the current request is retired, which ensures that
>   	 * all writes into the cacheline from previous requests are complete.
>   	 */
> -	err = i915_active_ref(&tl->hwsp_cacheline->active, tl, &rq->fence);
> +	err = i915_active_ref(&tl->hwsp_cacheline->active,
> +			      tl->fence_context,
> +			      &rq->fence);
>   	if (err)
>   		goto err_cacheline;
>   
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index 500537889e66..3a728401c09c 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -28,12 +28,14 @@ static struct i915_global_active {
>   } global;
>   
>   struct active_node {
> +	struct rb_node node;
>   	struct i915_active_fence base;
>   	struct i915_active *ref;
> -	struct rb_node node;
>   	u64 timeline;
>   };
>   
> +#define fetch_node(x) rb_entry(READ_ONCE(x), typeof(struct active_node), node)
> +
>   static inline struct active_node *
>   node_from_active(struct i915_active_fence *active)
>   {
> @@ -216,12 +218,9 @@ excl_retire(struct dma_fence *fence, struct dma_fence_cb *cb)
>   		active_retire(container_of(cb, struct i915_active, excl.cb));
>   }
>   
> -static struct i915_active_fence *
> -active_instance(struct i915_active *ref, struct intel_timeline *tl)
> +static struct active_node *__active_lookup(struct i915_active *ref, u64 idx)
>   {
> -	struct active_node *node, *prealloc;
> -	struct rb_node **p, *parent;
> -	u64 idx = tl->fence_context;
> +	struct active_node *it;
>   
>   	/*
>   	 * We track the most recently used timeline to skip a rbtree search
> @@ -230,8 +229,39 @@ active_instance(struct i915_active *ref, struct intel_timeline *tl)
>   	 * after the previous activity has been retired, or if it matches the
>   	 * current timeline.
>   	 */
> -	node = READ_ONCE(ref->cache);
> -	if (node && node->timeline == idx)
> +	it = READ_ONCE(ref->cache);
> +	if (it && it->timeline == idx)
> +		return it;
> +
> +	BUILD_BUG_ON(offsetof(typeof(*it), node));
> +
> +	/* While active, the tree can only be built; not destroyed */
> +	GEM_BUG_ON(i915_active_is_idle(ref));
> +
> +	it = fetch_node(ref->tree.rb_node);
> +	while (it) {
> +		if (it->timeline < idx) {
> +			it = fetch_node(it->node.rb_right);
> +		} else if (it->timeline > idx) {
> +			it = fetch_node(it->node.rb_left);
> +		} else {
> +			WRITE_ONCE(ref->cache, it);
> +			break;
> +		}
> +	}
> +
> +	/* NB: If the tree rotated beneath us, we may miss our target. */
> +	return it;
> +}
> +
> +static struct i915_active_fence *
> +active_instance(struct i915_active *ref, u64 idx)
> +{
> +	struct active_node *node, *prealloc;
> +	struct rb_node **p, *parent;
> +
> +	node = __active_lookup(ref, idx);
> +	if (likely(node))
>   		return &node->base;
>   
>   	/* Preallocate a replacement, just in case */
> @@ -268,10 +298,9 @@ active_instance(struct i915_active *ref, struct intel_timeline *tl)
>   	rb_insert_color(&node->node, &ref->tree);
>   
>   out:
> -	ref->cache = node;
> +	WRITE_ONCE(ref->cache, node);
>   	spin_unlock_irq(&ref->tree_lock);
>   
> -	BUILD_BUG_ON(offsetof(typeof(*node), base));
>   	return &node->base;
>   }
>   
> @@ -353,63 +382,102 @@ __active_del_barrier(struct i915_active *ref, struct active_node *node)
>   	return ____active_del_barrier(ref, node, barrier_to_engine(node));
>   }
>   
> -int i915_active_ref(struct i915_active *ref,
> -		    struct intel_timeline *tl,
> -		    struct dma_fence *fence)
> +static bool
> +replace_barrier(struct i915_active *ref, struct i915_active_fence *active)
> +{
> +	if (!is_barrier(active)) /* proto-node used by our idle barrier? */
> +		return false;
> +
> +	/*
> +	 * This request is on the kernel_context timeline, and so
> +	 * we can use it to substitute for the pending idle-barrier
> +	 * request that we want to emit on the kernel_context.
> +	 */
> +	__active_del_barrier(ref, node_from_active(active));
> +	return true;
> +}
> +
> +int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence)
>   {
>   	struct i915_active_fence *active;
>   	int err;
>   
> -	lockdep_assert_held(&tl->mutex);
> -
>   	/* Prevent reaping in case we malloc/wait while building the tree */
>   	err = i915_active_acquire(ref);
>   	if (err)
>   		return err;
>   
> -	active = active_instance(ref, tl);
> +	active = active_instance(ref, idx);
>   	if (!active) {
>   		err = -ENOMEM;
>   		goto out;
>   	}
>   
> -	if (is_barrier(active)) { /* proto-node used by our idle barrier */
> -		/*
> -		 * This request is on the kernel_context timeline, and so
> -		 * we can use it to substitute for the pending idle-barrier
> -		 * request that we want to emit on the kernel_context.
> -		 */
> -		__active_del_barrier(ref, node_from_active(active));
> +	if (replace_barrier(ref, active)) {
>   		RCU_INIT_POINTER(active->fence, NULL);
>   		atomic_dec(&ref->count);
>   	}
>   	if (!__i915_active_fence_set(active, fence))
> -		atomic_inc(&ref->count);
> +		__i915_active_acquire(ref);
>   
>   out:
>   	i915_active_release(ref);
>   	return err;
>   }
>   
> -struct dma_fence *
> -i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
> +static struct dma_fence *
> +__i915_active_set_fence(struct i915_active *ref,
> +			struct i915_active_fence *active,
> +			struct dma_fence *fence)
>   {
>   	struct dma_fence *prev;
>   
> -	/* We expect the caller to manage the exclusive timeline ordering */
> -	GEM_BUG_ON(i915_active_is_idle(ref));
> +	if (replace_barrier(ref, active)) {
> +		RCU_INIT_POINTER(active->fence, fence);
> +		return NULL;
> +	}
>   
>   	rcu_read_lock();
> -	prev = __i915_active_fence_set(&ref->excl, f);
> +	prev = __i915_active_fence_set(active, fence);
>   	if (prev)
>   		prev = dma_fence_get_rcu(prev);
>   	else
> -		atomic_inc(&ref->count);
> +		__i915_active_acquire(ref);
>   	rcu_read_unlock();
>   
>   	return prev;
>   }
>   
> +static struct i915_active_fence *
> +__active_fence(struct i915_active *ref, u64 idx)
> +{
> +	struct active_node *it;
> +
> +	it = __active_lookup(ref, idx);
> +	if (unlikely(!it)) { /* Contention with parallel tree builders! */
> +		spin_lock_irq(&ref->tree_lock);
> +		it = __active_lookup(ref, idx);
> +		spin_unlock_irq(&ref->tree_lock);
> +	}
> +	GEM_BUG_ON(!it); /* slot must be preallocated */
> +
> +	return &it->base;
> +}
> +
> +struct dma_fence *
> +__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence)
> +{
> +	/* Only valid while active, see i915_active_acquire_for_context() */
> +	return __i915_active_set_fence(ref, __active_fence(ref, idx), fence);
> +}
> +
> +struct dma_fence *
> +i915_active_set_exclusive(struct i915_active *ref, struct dma_fence *f)
> +{
> +	/* We expect the caller to manage the exclusive timeline ordering */
> +	return __i915_active_set_fence(ref, &ref->excl, f);
> +}
> +
>   bool i915_active_acquire_if_busy(struct i915_active *ref)
>   {
>   	debug_active_assert(ref);
> @@ -451,6 +519,24 @@ int i915_active_acquire(struct i915_active *ref)
>   	return err;
>   }
>   
> +int i915_active_acquire_for_context(struct i915_active *ref, u64 idx)
> +{
> +	struct i915_active_fence *active;
> +	int err;
> +
> +	err = i915_active_acquire(ref);
> +	if (err)
> +		return err;
> +
> +	active = active_instance(ref, idx);
> +	if (!active) {
> +		i915_active_release(ref);
> +		return -ENOMEM;
> +	}
> +
> +	return 0; /* return with active ref */
> +}
> +
>   void i915_active_release(struct i915_active *ref)
>   {
>   	debug_active_assert(ref);
> @@ -754,7 +840,7 @@ static struct active_node *reuse_idle_barrier(struct i915_active *ref, u64 idx)
>   match:
>   	rb_erase(p, &ref->tree); /* Hide from waits and sibling allocations */
>   	if (p == &ref->cache->node)
> -		ref->cache = NULL;
> +		WRITE_ONCE(ref->cache, NULL);
>   	spin_unlock_irq(&ref->tree_lock);
>   
>   	return rb_entry(p, struct active_node, node);
> @@ -812,7 +898,7 @@ int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
>   			 */
>   			RCU_INIT_POINTER(node->base.fence, ERR_PTR(-EAGAIN));
>   			node->base.cb.node.prev = (void *)engine;
> -			atomic_inc(&ref->count);
> +			__i915_active_acquire(ref);
>   		}
>   		GEM_BUG_ON(rcu_access_pointer(node->base.fence) != ERR_PTR(-EAGAIN));
>   
> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> index cf4058150966..73ded3c52a04 100644
> --- a/drivers/gpu/drm/i915/i915_active.h
> +++ b/drivers/gpu/drm/i915/i915_active.h
> @@ -163,14 +163,16 @@ void __i915_active_init(struct i915_active *ref,
>   	__i915_active_init(ref, active, retire, &__mkey, &__wkey);	\
>   } while (0)
>   
> -int i915_active_ref(struct i915_active *ref,
> -		    struct intel_timeline *tl,
> -		    struct dma_fence *fence);
> +struct dma_fence *
> +__i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence);
> +int i915_active_ref(struct i915_active *ref, u64 idx, struct dma_fence *fence);
>   
>   static inline int
>   i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
>   {
> -	return i915_active_ref(ref, i915_request_timeline(rq), &rq->fence);
> +	return i915_active_ref(ref,
> +			       i915_request_timeline(rq)->fence_context,
> +			       &rq->fence);
>   }
>   
>   struct dma_fence *
> @@ -198,7 +200,9 @@ int i915_request_await_active(struct i915_request *rq,
>   #define I915_ACTIVE_AWAIT_BARRIER BIT(2)
>   
>   int i915_active_acquire(struct i915_active *ref);
> +int i915_active_acquire_for_context(struct i915_active *ref, u64 idx);
>   bool i915_active_acquire_if_busy(struct i915_active *ref);
> +
>   void i915_active_release(struct i915_active *ref);
>   
>   static inline void __i915_active_acquire(struct i915_active *ref)
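
The miss-then-lock fallback in __active_fence() above is the heart of
the scheme: the unlocked walk may legitimately fail under a concurrent
rebalance, and the second pass under tree_lock is authoritative.
Generically:

	it = __active_lookup(ref, idx);		/* unlocked, may race a rotation */
	if (unlikely(!it)) {
		spin_lock_irq(&ref->tree_lock);
		it = __active_lookup(ref, idx);	/* stable under the lock */
		spin_unlock_irq(&ref->tree_lock);
	}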

* Re: [Intel-gfx] [PATCH 06/21] drm/i915: Reduce locking around i915_active_acquire_preallocate_barrier()
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 06/21] drm/i915: Reduce locking around i915_active_acquire_preallocate_barrier() Chris Wilson
@ 2020-07-31  7:39   ` Thomas Hellström (Intel)
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Hellström (Intel) @ 2020-07-31  7:39 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: thomas.hellstrom


On 7/30/20 11:37 AM, Chris Wilson wrote:
> As the conversion between idle-barrier and full i915_active_fence is
> already serialised by explicit memory barriers, we can reduce the
> spinlock in i915_active_acquire_preallocate_barrier(), used for finding
> an idle-barrier to reuse, to an RCU read lock that ensures the fence
> remains valid, taking the spinlock only for the update of the rbtree
> itself.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Thomas Hellström <thomas.hellstrom@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_active.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index 7b51045c8461..5dd52bb6d38c 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -807,7 +807,6 @@ static struct active_node *reuse_idle_barrier(struct i915_active *ref, u64 idx)
>   	if (RB_EMPTY_ROOT(&ref->tree))
>   		return NULL;
>   
> -	spin_lock_irq(&ref->tree_lock);
>   	GEM_BUG_ON(i915_active_is_idle(ref));
>   
>   	/*
> @@ -872,11 +871,10 @@ static struct active_node *reuse_idle_barrier(struct i915_active *ref, u64 idx)
>   			goto match;
>   	}
>   
> -	spin_unlock_irq(&ref->tree_lock);
> -
>   	return NULL;
>   
>   match:
> +	spin_lock_irq(&ref->tree_lock);
>   	rb_erase(p, &ref->tree); /* Hide from waits and sibling allocations */
>   	if (p == &ref->cache->node)
>   		WRITE_ONCE(ref->cache, NULL);
> @@ -911,7 +909,9 @@ int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
>   		struct llist_node *prev = first;
>   		struct active_node *node;
>   
> +		rcu_read_lock();
>   		node = reuse_idle_barrier(ref, idx);
> +		rcu_read_unlock();
>   		if (!node) {
>   			node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL);
>   			if (!node) {
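
Condensed, the reuse path becomes: the walk runs under the RCU read
lock, and only the rbtree erase on a match still takes the spinlock
(a paraphrase of the hunks above, not the literal code):

	rcu_read_lock();			/* keeps the barrier fences valid */
	node = reuse_idle_barrier(ref, idx);	/* lockless rbtree walk */
	rcu_read_unlock();

	/* and inside reuse_idle_barrier(), only on a hit: */
	spin_lock_irq(&ref->tree_lock);
	rb_erase(p, &ref->tree);	/* hide from waits and sibling allocations */
	spin_unlock_irq(&ref->tree_lock);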

* Re: [Intel-gfx] [PATCH 13/21] drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 13/21] drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs Chris Wilson
@ 2020-07-31 14:53   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2020-07-31 14:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: thomas.hellstrom


On 30/07/2020 10:37, Chris Wilson wrote:
> On the virtual engines, we only use the intel_breadcrumbs for tracking
> signaling of stale breadcrumbs from the irq_workers. They do not have
> any associated interrupt handling; active requests are passed to a
> physical engine and its associated breadcrumb interrupt handler. This
> causes issues for us, as we need to ensure that we do not actually try
> to enable interrupts and the power management required for them on the
> virtual engine, as they will never be disabled. Instead, let's
> specify the physical engine used for the interrupt handling of a
> particular breadcrumb.
> 
> v2: Drop b->irq_armed = true mocking for no interrupt HW
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 76 ++++++++++---------
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 36 +++++++++
>   .../gpu/drm/i915/gt/intel_breadcrumbs_types.h | 47 ++++++++++++
>   drivers/gpu/drm/i915/gt/intel_engine.h        | 17 -----
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 14 +++-
>   drivers/gpu/drm/i915/gt/intel_engine_pm.c     |  3 +-
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  | 31 +-------
>   drivers/gpu/drm/i915/gt/intel_gt_irq.c        |  1 +
>   drivers/gpu/drm/i915/gt/intel_lrc.c           | 11 ++-
>   drivers/gpu/drm/i915/gt/intel_reset.c         |  1 +
>   .../gpu/drm/i915/gt/intel_ring_submission.c   |  3 +-
>   drivers/gpu/drm/i915/gt/intel_rps.c           |  1 +
>   drivers/gpu/drm/i915/gt/mock_engine.c         | 10 ++-
>   drivers/gpu/drm/i915/i915_irq.c               |  1 +
>   drivers/gpu/drm/i915/i915_request.c           |  1 +
>   drivers/gpu/drm/i915/i915_request.h           |  4 -
>   16 files changed, 162 insertions(+), 95 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
>   create mode 100644 drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index fbdc465a5870..2ffd47a86656 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -28,6 +28,7 @@
>   
>   #include "i915_drv.h"
>   #include "i915_trace.h"
> +#include "intel_breadcrumbs.h"
>   #include "intel_gt_pm.h"
>   #include "intel_gt_requests.h"
>   
> @@ -55,30 +56,28 @@ static void irq_disable(struct intel_engine_cs *engine)
>   
>   static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
>   {
> -	struct intel_engine_cs *engine =
> -		container_of(b, struct intel_engine_cs, breadcrumbs);
> -
>   	lockdep_assert_held(&b->irq_lock);
>   
> +	if (!b->irq_engine || !b->irq_armed)
> +		return;
> +
>   	GEM_BUG_ON(!b->irq_enabled);
>   	if (!--b->irq_enabled)
> -		irq_disable(engine);
> +		irq_disable(b->irq_engine);
>   
>   	WRITE_ONCE(b->irq_armed, false);
> -	intel_gt_pm_put_async(engine->gt);
> +	intel_gt_pm_put_async(b->irq_engine->gt);
>   }
>   
> -void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
> +void intel_breadcrumbs_park(struct intel_breadcrumbs *b)
>   {
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>   	unsigned long flags;
>   
>   	if (!READ_ONCE(b->irq_armed))
>   		return;
>   
>   	spin_lock_irqsave(&b->irq_lock, flags);
> -	if (b->irq_armed)
> -		__intel_breadcrumbs_disarm_irq(b);
> +	__intel_breadcrumbs_disarm_irq(b);
>   	spin_unlock_irqrestore(&b->irq_lock, flags);
>   }
>   
> @@ -133,13 +132,8 @@ __dma_fence_signal__notify(struct dma_fence *fence,
>   
>   static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
>   {
> -	struct intel_engine_cs *engine =
> -		container_of(b, struct intel_engine_cs, breadcrumbs);
> -
> -	if (unlikely(intel_engine_is_virtual(engine)))
> -		engine = intel_virtual_engine_get_sibling(engine, 0);
> -
> -	intel_engine_add_retire(engine, tl);
> +	if (b->irq_engine)
> +		intel_engine_add_retire(b->irq_engine, tl);
>   }
>   
>   static bool __signal_request(struct i915_request *rq, struct list_head *signals)
> @@ -164,7 +158,7 @@ static void signal_irq_work(struct irq_work *work)
>   
>   	spin_lock(&b->irq_lock);
>   
> -	if (b->irq_armed && list_empty(&b->signalers))
> +	if (list_empty(&b->signalers))
>   		__intel_breadcrumbs_disarm_irq(b);
>   
>   	list_splice_init(&b->signaled_requests, &signal);
> @@ -222,14 +216,12 @@ static void signal_irq_work(struct irq_work *work)
>   
>   static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
>   {
> -	struct intel_engine_cs *engine =
> -		container_of(b, struct intel_engine_cs, breadcrumbs);
> -
>   	lockdep_assert_held(&b->irq_lock);
> -	if (b->irq_armed)
> +
> +	if (!b->irq_engine || b->irq_armed)
>   		return;
>   
> -	if (!intel_gt_pm_get_if_awake(engine->gt))
> +	if (!intel_gt_pm_get_if_awake(b->irq_engine->gt))
>   		return;
>   
>   	/*
> @@ -249,37 +241,49 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
>   	 */
>   
>   	if (!b->irq_enabled++)
> -		irq_enable(engine);
> +		irq_enable(b->irq_engine);
>   }
>   
> -void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
> +struct intel_breadcrumbs *
> +intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
>   {
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct intel_breadcrumbs *b;
> +
> +	b = kzalloc(sizeof(*b), GFP_KERNEL);
> +	if (!b)
> +		return NULL;
>   
>   	spin_lock_init(&b->irq_lock);
>   	INIT_LIST_HEAD(&b->signalers);
>   	INIT_LIST_HEAD(&b->signaled_requests);
>   
>   	init_irq_work(&b->irq_work, signal_irq_work);
> +
> +	b->irq_engine = irq_engine;
> +
> +	return b;
>   }
>   
> -void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
> +void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
>   {
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>   	unsigned long flags;
>   
> +	if (!b->irq_engine)
> +		return;
> +
>   	spin_lock_irqsave(&b->irq_lock, flags);
>   
>   	if (b->irq_enabled)
> -		irq_enable(engine);
> +		irq_enable(b->irq_engine);
>   	else
> -		irq_disable(engine);
> +		irq_disable(b->irq_engine);
>   
>   	spin_unlock_irqrestore(&b->irq_lock, flags);
>   }
>   
> -void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
> +void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
>   {
> +	kfree(b);
>   }
>   
>   static void insert_breadcrumb(struct i915_request *rq,
> @@ -369,11 +373,11 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq)
>   	 * request submit/unsubmit path, and so we must be more careful to
>   	 * acquire the right lock.
>   	 */
> -	b = &READ_ONCE(rq->engine)->breadcrumbs;
> +	b = READ_ONCE(rq->engine)->breadcrumbs;
>   	spin_lock(&b->irq_lock);
> -	while (unlikely(b != &READ_ONCE(rq->engine)->breadcrumbs)) {
> +	while (unlikely(b != READ_ONCE(rq->engine)->breadcrumbs)) {
>   		spin_unlock(&b->irq_lock);
> -		b = &READ_ONCE(rq->engine)->breadcrumbs;
> +		b = READ_ONCE(rq->engine)->breadcrumbs;
>   		spin_lock(&b->irq_lock);
>   	}
>   
> @@ -394,7 +398,7 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq)
>   
>   void i915_request_cancel_breadcrumb(struct i915_request *rq)
>   {
> -	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
> +	struct intel_breadcrumbs *b = rq->engine->breadcrumbs;
>   
>   	/*
>   	 * We must wait for b->irq_lock so that we know the interrupt handler
> @@ -418,11 +422,11 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq)
>   void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
>   				    struct drm_printer *p)
>   {
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct intel_breadcrumbs *b = engine->breadcrumbs;
>   	struct intel_context *ce;
>   	struct i915_request *rq;
>   
> -	if (list_empty(&b->signalers))
> +	if (!b || list_empty(&b->signalers))
>   		return;
>   
>   	drm_printf(p, "Signals:\n");
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> new file mode 100644
> index 000000000000..ed3d1deabfbd
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef __INTEL_BREADCRUMBS__
> +#define __INTEL_BREADCRUMBS__
> +
> +#include <linux/irq_work.h>
> +
> +#include "intel_engine_types.h"
> +
> +struct drm_printer;
> +struct i915_request;
> +struct intel_breadcrumbs;
> +
> +struct intel_breadcrumbs *
> +intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
> +void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
> +
> +void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
> +void intel_breadcrumbs_park(struct intel_breadcrumbs *b);
> +
> +static inline void
> +intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
> +{
> +	irq_work_queue(&engine->breadcrumbs->irq_work);
> +}
> +
> +void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
> +				    struct drm_printer *p);
> +
> +bool i915_request_enable_breadcrumb(struct i915_request *request);
> +void i915_request_cancel_breadcrumb(struct i915_request *request);
> +
> +#endif /* __INTEL_BREADCRUMBS__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> new file mode 100644
> index 000000000000..8e53b9942695
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef __INTEL_BREADCRUMBS_TYPES__
> +#define __INTEL_BREADCRUMBS_TYPES__
> +
> +#include <linux/irq_work.h>
> +#include <linux/list.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +/*
> + * Rather than have every client wait upon all user interrupts,
> + * with the herd waking after every interrupt and each doing the
> + * heavyweight seqno dance, we delegate the task (of being the
> + * bottom-half of the user interrupt) to the first client. After
> + * every interrupt, we wake up one client, who does the heavyweight
> + * coherent seqno read and either goes back to sleep (if incomplete),
> + * or wakes up all the completed clients in parallel, before then
> + * transferring the bottom-half status to the next client in the queue.
> + *
> + * Compared to walking the entire list of waiters in a single dedicated
> + * bottom-half, we reduce the latency of the first waiter by avoiding
> + * a context switch, but incur additional coherent seqno reads when
> + * following the chain of request breadcrumbs. Since it is most likely
> + * that we have a single client waiting on each seqno, then reducing
> + * the overhead of waking that client is much preferred.
> + */
> +struct intel_breadcrumbs {
> +	spinlock_t irq_lock; /* protects the lists used in hardirq context */
> +
> +	/* Not all breadcrumbs are attached to physical HW */
> +	struct intel_engine_cs *irq_engine;
> +
> +	struct list_head signalers;
> +	struct list_head signaled_requests;
> +
> +	struct irq_work irq_work; /* for use from inside irq_lock */
> +
> +	unsigned int irq_enabled;
> +
> +	bool irq_armed;
> +};
> +
> +#endif /* __INTEL_BREADCRUMBS_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index faf00a353e25..08e2c000dcc3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -223,23 +223,6 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
>   
>   void intel_engine_init_execlists(struct intel_engine_cs *engine);
>   
> -void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
> -void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
> -
> -void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
> -
> -static inline void
> -intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
> -{
> -	irq_work_queue(&engine->breadcrumbs.irq_work);
> -}
> -
> -void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine);
> -void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
> -
> -void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
> -				    struct drm_printer *p);
> -
>   static inline u32 *__gen8_emit_pipe_control(u32 *batch, u32 flags0, u32 flags1, u32 offset)
>   {
>   	memset(batch, 0, 6 * sizeof(u32));
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index dd1a42c4d344..c20a91c1318f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -28,6 +28,7 @@
>   
>   #include "i915_drv.h"
>   
> +#include "intel_breadcrumbs.h"
>   #include "intel_context.h"
>   #include "intel_engine.h"
>   #include "intel_engine_pm.h"
> @@ -700,8 +701,13 @@ static int engine_setup_common(struct intel_engine_cs *engine)
>   	if (err)
>   		return err;
>   
> +	engine->breadcrumbs = intel_breadcrumbs_create(engine);
> +	if (!engine->breadcrumbs) {
> +		err = -ENOMEM;
> +		goto err_status;
> +	}
> +
>   	intel_engine_init_active(engine, ENGINE_PHYSICAL);
> -	intel_engine_init_breadcrumbs(engine);
>   	intel_engine_init_execlists(engine);
>   	intel_engine_init_cmd_parser(engine);
>   	intel_engine_init__pm(engine);
> @@ -716,6 +722,10 @@ static int engine_setup_common(struct intel_engine_cs *engine)
>   	intel_engine_init_ctx_wa(engine);
>   
>   	return 0;
> +
> +err_status:
> +	cleanup_status_page(engine);
> +	return err;
>   }
>   
>   struct measure_breadcrumb {
> @@ -902,9 +912,9 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
>   	tasklet_kill(&engine->execlists.tasklet); /* flush the callback */
>   
>   	cleanup_status_page(engine);
> +	intel_breadcrumbs_free(engine->breadcrumbs);
>   
>   	intel_engine_fini_retire(engine);
> -	intel_engine_fini_breadcrumbs(engine);
>   	intel_engine_cleanup_cmd_parser(engine);
>   
>   	if (engine->default_state)
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> index 8ec3eecf3e39..f7b2e07e2229 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> @@ -6,6 +6,7 @@
>   
>   #include "i915_drv.h"
>   
> +#include "intel_breadcrumbs.h"
>   #include "intel_context.h"
>   #include "intel_engine.h"
>   #include "intel_engine_heartbeat.h"
> @@ -247,7 +248,7 @@ static int __engine_park(struct intel_wakeref *wf)
>   	call_idle_barriers(engine); /* cleanup after wedging */
>   
>   	intel_engine_park_heartbeat(engine);
> -	intel_engine_disarm_breadcrumbs(engine);
> +	intel_breadcrumbs_park(engine->breadcrumbs);
>   
>   	/* Must be reset upon idling, or we may miss the busy wakeup. */
>   	GEM_BUG_ON(engine->execlists.queue_priority_hint != INT_MIN);
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 8de92fd7d392..c400aaa2287b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -22,6 +22,7 @@
>   #include "i915_pmu.h"
>   #include "i915_priolist_types.h"
>   #include "i915_selftest.h"
> +#include "intel_breadcrumbs_types.h"
>   #include "intel_sseu.h"
>   #include "intel_timeline_types.h"
>   #include "intel_uncore.h"
> @@ -373,34 +374,8 @@ struct intel_engine_cs {
>   	 */
>   	struct ewma__engine_latency latency;
>   
> -	/* Rather than have every client wait upon all user interrupts,
> -	 * with the herd waking after every interrupt and each doing the
> -	 * heavyweight seqno dance, we delegate the task (of being the
> -	 * bottom-half of the user interrupt) to the first client. After
> -	 * every interrupt, we wake up one client, who does the heavyweight
> -	 * coherent seqno read and either goes back to sleep (if incomplete),
> -	 * or wakes up all the completed clients in parallel, before then
> -	 * transferring the bottom-half status to the next client in the queue.
> -	 *
> -	 * Compared to walking the entire list of waiters in a single dedicated
> -	 * bottom-half, we reduce the latency of the first waiter by avoiding
> -	 * a context switch, but incur additional coherent seqno reads when
> -	 * following the chain of request breadcrumbs. Since it is most likely
> -	 * that we have a single client waiting on each seqno, then reducing
> -	 * the overhead of waking that client is much preferred.
> -	 */
> -	struct intel_breadcrumbs {
> -		spinlock_t irq_lock;
> -		struct list_head signalers;
> -
> -		struct list_head signaled_requests;
> -
> -		struct irq_work irq_work; /* for use from inside irq_lock */
> -
> -		unsigned int irq_enabled;
> -
> -		bool irq_armed;
> -	} breadcrumbs;
> +	/* Keep track of all the seqno used, a trail of breadcrumbs */
> +	struct intel_breadcrumbs *breadcrumbs;
>   
>   	struct intel_engine_pmu {
>   		/**
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> index b05da68e52f4..257063a57101 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
> @@ -8,6 +8,7 @@
>   
>   #include "i915_drv.h"
>   #include "i915_irq.h"
> +#include "intel_breadcrumbs.h"
>   #include "intel_gt.h"
>   #include "intel_gt_irq.h"
>   #include "intel_uncore.h"
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 904e9b8bcbf6..03cac9303dfc 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -137,6 +137,7 @@
>   #include "i915_perf.h"
>   #include "i915_trace.h"
>   #include "i915_vgpu.h"
> +#include "intel_breadcrumbs.h"
>   #include "intel_context.h"
>   #include "intel_engine_pm.h"
>   #include "intel_gt.h"
> @@ -4119,7 +4120,7 @@ static int execlists_resume(struct intel_engine_cs *engine)
>   {
>   	intel_mocs_init_engine(engine);
>   
> -	intel_engine_reset_breadcrumbs(engine);
> +	intel_breadcrumbs_reset(engine->breadcrumbs);
>   
>   	if (GEM_SHOW_DEBUG() && unexpected_starting_state(engine)) {
>   		struct drm_printer p = drm_debug_printer(__func__);
> @@ -5715,9 +5716,7 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>   	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
>   
>   	intel_engine_init_active(&ve->base, ENGINE_VIRTUAL);
> -	intel_engine_init_breadcrumbs(&ve->base);
>   	intel_engine_init_execlists(&ve->base);
> -	ve->base.breadcrumbs.irq_armed = true; /* fake HW, used for irq_work */
>   
>   	ve->base.cops = &virtual_context_ops;
>   	ve->base.request_alloc = execlists_request_alloc;
> @@ -5734,6 +5733,12 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>   
>   	intel_context_init(&ve->context, &ve->base);
>   
> +	ve->base.breadcrumbs = intel_breadcrumbs_create(NULL);
> +	if (!ve->base.breadcrumbs) {
> +		err = -ENOMEM;
> +		goto err_put;
> +	}
> +
>   	for (n = 0; n < count; n++) {
>   		struct intel_engine_cs *sibling = siblings[n];
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 46a5ceffc22f..ac36b67fb46b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -15,6 +15,7 @@
>   #include "i915_drv.h"
>   #include "i915_gpu_error.h"
>   #include "i915_irq.h"
> +#include "intel_breadcrumbs.h"
>   #include "intel_engine_pm.h"
>   #include "intel_gt.h"
>   #include "intel_gt_pm.h"
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 94915f668715..186aa2d3a83e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -32,6 +32,7 @@
>   #include "gen6_ppgtt.h"
>   #include "gen7_renderclear.h"
>   #include "i915_drv.h"
> +#include "intel_breadcrumbs.h"
>   #include "intel_context.h"
>   #include "intel_gt.h"
>   #include "intel_reset.h"
> @@ -255,7 +256,7 @@ static int xcs_resume(struct intel_engine_cs *engine)
>   	else
>   		ring_setup_status_page(engine);
>   
> -	intel_engine_reset_breadcrumbs(engine);
> +	intel_breadcrumbs_reset(engine->breadcrumbs);
>   
>   	/* Enforce ordering by reading HEAD register back */
>   	ENGINE_POSTING_READ(engine, RING_HEAD);
> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> index 97ba14ad52e4..e6a00eea0631 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -7,6 +7,7 @@
>   #include <drm/i915_drm.h>
>   
>   #include "i915_drv.h"
> +#include "intel_breadcrumbs.h"
>   #include "intel_gt.h"
>   #include "intel_gt_clock_utils.h"
>   #include "intel_gt_irq.h"
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index 06303ba98c19..d1971ffdca42 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -261,11 +261,12 @@ static void mock_engine_release(struct intel_engine_cs *engine)
>   
>   	GEM_BUG_ON(timer_pending(&mock->hw_delay));
>   
> +	intel_breadcrumbs_free(engine->breadcrumbs);
> +
>   	intel_context_unpin(engine->kernel_context);
>   	intel_context_put(engine->kernel_context);
>   
>   	intel_engine_fini_retire(engine);
> -	intel_engine_fini_breadcrumbs(engine);
>   }
>   
>   struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> @@ -323,11 +324,14 @@ int mock_engine_init(struct intel_engine_cs *engine)
>   	struct intel_context *ce;
>   
>   	intel_engine_init_active(engine, ENGINE_MOCK);
> -	intel_engine_init_breadcrumbs(engine);
>   	intel_engine_init_execlists(engine);
>   	intel_engine_init__pm(engine);
>   	intel_engine_init_retire(engine);
>   
> +	engine->breadcrumbs = intel_breadcrumbs_create(NULL);
> +	if (!engine->breadcrumbs)
> +		return -ENOMEM;
> +
>   	ce = create_kernel_context(engine);
>   	if (IS_ERR(ce))
>   		goto err_breadcrumbs;
> @@ -339,7 +343,7 @@ int mock_engine_init(struct intel_engine_cs *engine)
>   	return 0;
>   
>   err_breadcrumbs:
> -	intel_engine_fini_breadcrumbs(engine);
> +	intel_breadcrumbs_free(engine->breadcrumbs);
>   	return -ENOMEM;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 1fa67700d8f4..f113fe44572b 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -41,6 +41,7 @@
>   #include "display/intel_lpe_audio.h"
>   #include "display/intel_psr.h"
>   
> +#include "gt/intel_breadcrumbs.h"
>   #include "gt/intel_gt.h"
>   #include "gt/intel_gt_irq.h"
>   #include "gt/intel_gt_pm_irq.h"
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0cf2a10a24f1..3b121fa38c60 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -31,6 +31,7 @@
>   #include <linux/sched/signal.h>
>   
>   #include "gem/i915_gem_context.h"
> +#include "gt/intel_breadcrumbs.h"
>   #include "gt/intel_context.h"
>   #include "gt/intel_ring.h"
>   #include "gt/intel_rps.h"
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index fc18378c685d..16b721080195 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -361,10 +361,6 @@ void i915_request_submit(struct i915_request *request);
>   void __i915_request_unsubmit(struct i915_request *request);
>   void i915_request_unsubmit(struct i915_request *request);
>   
> -/* Note: part of the intel_breadcrumbs family */
> -bool i915_request_enable_breadcrumb(struct i915_request *request);
> -void i915_request_cancel_breadcrumb(struct i915_request *request);
> -
>   long i915_request_wait(struct i915_request *rq,
>   		       unsigned int flags,
>   		       long timeout)
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
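
The construction rule the patch settles on, condensed from the hunks
above; with a NULL irq_engine the arm/park/reset paths simply return
early:

	/* physical engine: it owns the user interrupt and its pm */
	engine->breadcrumbs = intel_breadcrumbs_create(engine);

	/* virtual (and mock) engines: no interrupt handling of their own */
	ve->base.breadcrumbs = intel_breadcrumbs_create(NULL);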

* Re: [Intel-gfx] [PATCH 16/21] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 16/21] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock Chris Wilson
@ 2020-07-31 15:06   ` Tvrtko Ursulin
  2020-07-31 15:12     ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2020-07-31 15:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: thomas.hellstrom


On 30/07/2020 10:37, Chris Wilson wrote:
> Make b->signaled_requests a lockless list so that we can manipulate it
> outside of the b->irq_lock.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 30 +++++++++++--------
>   .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  2 +-
>   drivers/gpu/drm/i915/i915_request.h           |  6 +++-
>   3 files changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index fc6f0223d2c8..6a278bf0fc6b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -174,16 +174,13 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
>   		intel_engine_add_retire(b->irq_engine, tl);
>   }
>   
> -static bool __signal_request(struct i915_request *rq, struct list_head *signals)
> +static bool __signal_request(struct i915_request *rq)
>   {
> -	clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> -
>   	if (!__dma_fence_signal(&rq->fence)) {
>   		i915_request_put(rq);
>   		return false;
>   	}
>   
> -	list_add_tail(&rq->signal_link, signals);
>   	return true;
>   }
>   
> @@ -191,17 +188,19 @@ static void signal_irq_work(struct irq_work *work)
>   {
>   	struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work);
>   	const ktime_t timestamp = ktime_get();
> +	struct llist_node *signal, *sn;
>   	struct intel_context *ce, *cn;
>   	struct list_head *pos, *next;
> -	LIST_HEAD(signal);
> +
> +	signal = NULL;
> +	if (unlikely(!llist_empty(&b->signaled_requests)))
> +		signal = llist_del_all(&b->signaled_requests);
>   
>   	spin_lock(&b->irq_lock);
>   
> -	if (list_empty(&b->signalers))
> +	if (!signal && list_empty(&b->signalers))

The only open issue from the previous round was this change. If I
understood your previous reply correctly, checking this or not simply
controls the disarm point and is not related to this patch. With the
check added, we would now disarm later, because even already-signaled
requests would keep the irq armed. I would prefer this were a separate
patch, if you could possibly be convinced.
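
For reference, the two disarm conditions side by side (paraphrased from
the hunk above):

	/* before: disarm as soon as no contexts are waiting to signal */
	if (list_empty(&b->signalers))
		__intel_breadcrumbs_disarm_irq(b);

	/* after: requests pulled off the signaled llist also keep it
	 * armed, so the disarm slips to a later run of signal_irq_work()
	 */
	if (!signal && list_empty(&b->signalers))
		__intel_breadcrumbs_disarm_irq(b);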

Regards,

Tvrtko

>   		__intel_breadcrumbs_disarm_irq(b);
>   
> -	list_splice_init(&b->signaled_requests, &signal);
> -
>   	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
>   		GEM_BUG_ON(list_empty(&ce->signals));
>   
> @@ -218,7 +217,11 @@ static void signal_irq_work(struct irq_work *work)
>   			 * spinlock as the callback chain may end up adding
>   			 * more signalers to the same context or engine.
>   			 */
> -			__signal_request(rq, &signal);
> +			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> +			if (__signal_request(rq)) {
> +				rq->signal_node.next = signal;
> +				signal = &rq->signal_node;
> +			}
>   		}
>   
>   		/*
> @@ -238,9 +241,9 @@ static void signal_irq_work(struct irq_work *work)
>   
>   	spin_unlock(&b->irq_lock);
>   
> -	list_for_each_safe(pos, next, &signal) {
> +	llist_for_each_safe(signal, sn, signal) {
>   		struct i915_request *rq =
> -			list_entry(pos, typeof(*rq), signal_link);
> +			llist_entry(signal, typeof(*rq), signal_node);
>   		struct list_head cb_list;
>   
>   		spin_lock(&rq->lock);
> @@ -264,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
>   
>   	spin_lock_init(&b->irq_lock);
>   	INIT_LIST_HEAD(&b->signalers);
> -	INIT_LIST_HEAD(&b->signaled_requests);
> +	init_llist_head(&b->signaled_requests);
>   
>   	init_irq_work(&b->irq_work, signal_irq_work);
>   
> @@ -327,7 +330,8 @@ static void insert_breadcrumb(struct i915_request *rq,
>   	 * its signal completion.
>   	 */
>   	if (__request_completed(rq)) {
> -		if (__signal_request(rq, &b->signaled_requests))
> +		if (__signal_request(rq) &&
> +		    llist_add(&rq->signal_node, &b->signaled_requests))
>   			irq_work_queue(&b->irq_work);
>   		return;
>   	}
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> index 8e53b9942695..3fa19820b37a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> @@ -35,7 +35,7 @@ struct intel_breadcrumbs {
>   	struct intel_engine_cs *irq_engine;
>   
>   	struct list_head signalers;
> -	struct list_head signaled_requests;
> +	struct llist_head signaled_requests;
>   
>   	struct irq_work irq_work; /* for use from inside irq_lock */
>   
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 16b721080195..874af6db6103 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -176,7 +176,11 @@ struct i915_request {
>   	struct intel_context *context;
>   	struct intel_ring *ring;
>   	struct intel_timeline __rcu *timeline;
> -	struct list_head signal_link;
> +
> +	union {
> +		struct list_head signal_link;
> +		struct llist_node signal_node;
> +	};
>   
>   	/*
>   	 * The rcu epoch of when this request was allocated. Used to judiciously
> 

* Re: [Intel-gfx] [PATCH 16/21] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
  2020-07-31 15:06   ` Tvrtko Ursulin
@ 2020-07-31 15:12     ` Chris Wilson
  2020-07-31 15:21       ` Chris Wilson
  2020-07-31 15:32       ` Tvrtko Ursulin
  0 siblings, 2 replies; 39+ messages in thread
From: Chris Wilson @ 2020-07-31 15:12 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: thomas.hellstrom

Quoting Tvrtko Ursulin (2020-07-31 16:06:55)
> 
> On 30/07/2020 10:37, Chris Wilson wrote:
> > @@ -191,17 +188,19 @@ static void signal_irq_work(struct irq_work *work)
> >   {
> >       struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work);
> >       const ktime_t timestamp = ktime_get();
> > +     struct llist_node *signal, *sn;
> >       struct intel_context *ce, *cn;
> >       struct list_head *pos, *next;
> > -     LIST_HEAD(signal);
> > +
> > +     signal = NULL;
> > +     if (unlikely(!llist_empty(&b->signaled_requests)))
> > +             signal = llist_del_all(&b->signaled_requests);
> >   
> >       spin_lock(&b->irq_lock);
> >   
> > -     if (list_empty(&b->signalers))
> > +     if (!signal && list_empty(&b->signalers))
> 
> The only open issue from the previous round was this change. If I
> understood your previous reply correctly, checking this or not simply
> controls the disarm point and is not related to this patch. With the
> check added, we would now disarm later, because even already-signaled
> requests would keep the irq armed. I would prefer this were a separate
> patch, if you could possibly be convinced.

I considered that since we add to the lockless list and then queue the
irq work, that path is not driven by the interrupt, which conflicts
with the idea of the interrupt shadow. Having a simple test, I thought,
was a positive side-effect to filter out the early irq_work.
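
Concretely, the path I mean is insert_breadcrumb() on an
already-completed request (condensed from the patch):

	if (__request_completed(rq)) {
		if (__signal_request(rq) &&
		    llist_add(&rq->signal_node, &b->signaled_requests))
			irq_work_queue(&b->irq_work); /* not from the user irq */
		return;
	}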
-Chris

* Re: [Intel-gfx] [PATCH 17/21] drm/i915/gt: Protect context lifetime with RCU
  2020-07-30  9:37 ` [Intel-gfx] [PATCH 17/21] drm/i915/gt: Protect context lifetime with RCU Chris Wilson
@ 2020-07-31 15:15   ` Tvrtko Ursulin
  2020-07-31 15:24     ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2020-07-31 15:15 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: thomas.hellstrom


On 30/07/2020 10:37, Chris Wilson wrote:
> Allow a brief period for continued access to a dead intel_context by
> deferring the release of the struct until after an RCU grace period.
> As we are using a dedicated slab cache for the contexts, we can defer
> the release of the slab pages via RCU, with the caveat that individual
> structs may be reused from the freelist within an RCU grace period. To
> handle that, we have to avoid clearing members of the zombie struct.

What was the motivation?

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c | 330 +++++++++++++-----------
>   drivers/gpu/drm/i915/i915_active.c      |  10 +
>   drivers/gpu/drm/i915/i915_active.h      |   2 +
>   drivers/gpu/drm/i915/i915_utils.h       |   7 +
>   4 files changed, 202 insertions(+), 147 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 52db2bde44a3..4e7924640ffa 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -22,7 +22,7 @@ static struct i915_global_context {
>   
>   static struct intel_context *intel_context_alloc(void)
>   {
> -	return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
> +	return kmem_cache_alloc(global.slab_ce, GFP_KERNEL);
>   }
>   
>   void intel_context_free(struct intel_context *ce)
> @@ -30,6 +30,177 @@ void intel_context_free(struct intel_context *ce)
>   	kmem_cache_free(global.slab_ce, ce);
>   }
>   
> +static int __context_pin_state(struct i915_vma *vma)
> +{
> +	unsigned int bias = i915_ggtt_pin_bias(vma) | PIN_OFFSET_BIAS;
> +	int err;
> +
> +	err = i915_ggtt_pin(vma, 0, bias | PIN_HIGH);
> +	if (err)
> +		return err;
> +
> +	err = i915_active_acquire(&vma->active);
> +	if (err)
> +		goto err_unpin;
> +
> +	/*
> +	 * And mark it as a globally pinned object to let the shrinker know
> +	 * it cannot reclaim the object until we release it.
> +	 */
> +	i915_vma_make_unshrinkable(vma);
> +	vma->obj->mm.dirty = true;
> +
> +	return 0;
> +
> +err_unpin:
> +	i915_vma_unpin(vma);
> +	return err;
> +}
> +
> +static void __context_unpin_state(struct i915_vma *vma)
> +{
> +	i915_vma_make_shrinkable(vma);
> +	i915_active_release(&vma->active);
> +	__i915_vma_unpin(vma);
> +}
> +
> +static int __ring_active(struct intel_ring *ring)
> +{
> +	int err;
> +
> +	err = intel_ring_pin(ring);
> +	if (err)
> +		return err;
> +
> +	err = i915_active_acquire(&ring->vma->active);
> +	if (err)
> +		goto err_pin;
> +
> +	return 0;
> +
> +err_pin:
> +	intel_ring_unpin(ring);
> +	return err;
> +}
> +
> +static void __ring_retire(struct intel_ring *ring)
> +{
> +	i915_active_release(&ring->vma->active);
> +	intel_ring_unpin(ring);
> +}
> +
> +__i915_active_call
> +static void __intel_context_retire(struct i915_active *active)
> +{
> +	struct intel_context *ce = container_of(active, typeof(*ce), active);
> +
> +	CE_TRACE(ce, "retire runtime: { total:%lluns, avg:%lluns }\n",
> +		 intel_context_get_total_runtime_ns(ce),
> +		 intel_context_get_avg_runtime_ns(ce));
> +
> +	set_bit(CONTEXT_VALID_BIT, &ce->flags);
> +	if (ce->state)
> +		__context_unpin_state(ce->state);
> +
> +	intel_timeline_unpin(ce->timeline);
> +	__ring_retire(ce->ring);
> +
> +	intel_context_put(ce);
> +}
> +
> +static int __intel_context_active(struct i915_active *active)
> +{
> +	struct intel_context *ce = container_of(active, typeof(*ce), active);
> +	int err;
> +
> +	CE_TRACE(ce, "active\n");
> +
> +	intel_context_get(ce);
> +
> +	err = __ring_active(ce->ring);
> +	if (err)
> +		goto err_put;
> +
> +	err = intel_timeline_pin(ce->timeline);
> +	if (err)
> +		goto err_ring;
> +
> +	if (!ce->state)
> +		return 0;
> +
> +	err = __context_pin_state(ce->state);
> +	if (err)
> +		goto err_timeline;
> +
> +	return 0;
> +
> +err_timeline:
> +	intel_timeline_unpin(ce->timeline);
> +err_ring:
> +	__ring_retire(ce->ring);
> +err_put:
> +	intel_context_put(ce);
> +	return err;
> +}
> +
> +static void __intel_context_ctor(void *arg)
> +{
> +	struct intel_context *ce = arg;
> +
> +	INIT_LIST_HEAD(&ce->signal_link);
> +	INIT_LIST_HEAD(&ce->signals);
> +
> +	atomic_set(&ce->pin_count, 0);
> +	mutex_init(&ce->pin_mutex);
> +
> +	ce->active_count = 0;
> +	i915_active_init(&ce->active,
> +			 __intel_context_active, __intel_context_retire);
> +
> +	ce->inflight = NULL;
> +	ce->lrc_reg_state = NULL;
> +	ce->lrc.desc = 0;
> +}
> +
> +static void
> +__intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
> +{
> +	GEM_BUG_ON(!engine->cops);
> +	GEM_BUG_ON(!engine->gt->vm);
> +
> +	kref_init(&ce->ref);
> +	i915_active_reinit(&ce->active);
> +	mutex_reinit(&ce->pin_mutex);
> +
> +	ce->engine = engine;
> +	ce->ops = engine->cops;
> +	ce->sseu = engine->sseu;
> +
> +	ce->wa_bb_page = 0;
> +	ce->flags = 0;
> +	ce->tag = 0;
> +
> +	memset(&ce->runtime, 0, sizeof(ce->runtime));
> +
> +	ce->vm = i915_vm_get(engine->gt->vm);
> +	ce->gem_context = NULL;
> +
> +	ce->ring = __intel_context_ring_size(SZ_4K);
> +	ce->timeline = NULL;
> +	ce->state = NULL;
> +
> +	GEM_BUG_ON(atomic_read(&ce->pin_count));
> +	GEM_BUG_ON(ce->active_count);
> +	GEM_BUG_ON(ce->inflight);
> +}
> +
> +void
> +intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
> +{
> +	__intel_context_ctor(ce);
> +	__intel_context_init(ce, engine);
> +}
> +
>   struct intel_context *
>   intel_context_create(struct intel_engine_cs *engine)
>   {
> @@ -39,7 +210,7 @@ intel_context_create(struct intel_engine_cs *engine)
>   	if (!ce)
>   		return ERR_PTR(-ENOMEM);
>   
> -	intel_context_init(ce, engine);
> +	__intel_context_init(ce, engine);
>   	return ce;
>   }
>   
> @@ -158,154 +329,13 @@ void intel_context_unpin(struct intel_context *ce)
>   	/*
>   	 * Once released, we may asynchronously drop the active reference.
>   	 * As that may be the only reference keeping the context alive,
> -	 * take an extra now so that it is not freed before we finish
> +	 * hold onto RCU so that it is not freed before we finish
>   	 * dereferencing it.
>   	 */
> -	intel_context_get(ce);
> +	rcu_read_lock();
>   	intel_context_active_release(ce);
> -	intel_context_put(ce);
> -}
> -
> -static int __context_pin_state(struct i915_vma *vma)
> -{
> -	unsigned int bias = i915_ggtt_pin_bias(vma) | PIN_OFFSET_BIAS;
> -	int err;
> -
> -	err = i915_ggtt_pin(vma, 0, bias | PIN_HIGH);
> -	if (err)
> -		return err;
> -
> -	err = i915_active_acquire(&vma->active);
> -	if (err)
> -		goto err_unpin;
> -
> -	/*
> -	 * And mark it as a globally pinned object to let the shrinker know
> -	 * it cannot reclaim the object until we release it.
> -	 */
> -	i915_vma_make_unshrinkable(vma);
> -	vma->obj->mm.dirty = true;
> -
> -	return 0;
> -
> -err_unpin:
> -	i915_vma_unpin(vma);
> -	return err;
> -}
> -
> -static void __context_unpin_state(struct i915_vma *vma)
> -{
> -	i915_vma_make_shrinkable(vma);
> -	i915_active_release(&vma->active);
> -	__i915_vma_unpin(vma);
> -}
> -
> -static int __ring_active(struct intel_ring *ring)
> -{
> -	int err;
> -
> -	err = intel_ring_pin(ring);
> -	if (err)
> -		return err;
> -
> -	err = i915_active_acquire(&ring->vma->active);
> -	if (err)
> -		goto err_pin;
> -
> -	return 0;
> -
> -err_pin:
> -	intel_ring_unpin(ring);
> -	return err;
> -}
> -
> -static void __ring_retire(struct intel_ring *ring)
> -{
> -	i915_active_release(&ring->vma->active);
> -	intel_ring_unpin(ring);
> +	rcu_read_unlock();
>   }
> -
> -__i915_active_call
> -static void __intel_context_retire(struct i915_active *active)
> -{
> -	struct intel_context *ce = container_of(active, typeof(*ce), active);
> -
> -	CE_TRACE(ce, "retire runtime: { total:%lluns, avg:%lluns }\n",
> -		 intel_context_get_total_runtime_ns(ce),
> -		 intel_context_get_avg_runtime_ns(ce));
> -
> -	set_bit(CONTEXT_VALID_BIT, &ce->flags);
> -	if (ce->state)
> -		__context_unpin_state(ce->state);
> -
> -	intel_timeline_unpin(ce->timeline);
> -	__ring_retire(ce->ring);
> -
> -	intel_context_put(ce);
> -}
> -
> -static int __intel_context_active(struct i915_active *active)
> -{
> -	struct intel_context *ce = container_of(active, typeof(*ce), active);
> -	int err;
> -
> -	CE_TRACE(ce, "active\n");
> -
> -	intel_context_get(ce);
> -
> -	err = __ring_active(ce->ring);
> -	if (err)
> -		goto err_put;
> -
> -	err = intel_timeline_pin(ce->timeline);
> -	if (err)
> -		goto err_ring;
> -
> -	if (!ce->state)
> -		return 0;
> -
> -	err = __context_pin_state(ce->state);
> -	if (err)
> -		goto err_timeline;
> -
> -	return 0;
> -
> -err_timeline:
> -	intel_timeline_unpin(ce->timeline);
> -err_ring:
> -	__ring_retire(ce->ring);
> -err_put:
> -	intel_context_put(ce);
> -	return err;
> -}
> -
> -void
> -intel_context_init(struct intel_context *ce,
> -		   struct intel_engine_cs *engine)
> -{
> -	GEM_BUG_ON(!engine->cops);
> -	GEM_BUG_ON(!engine->gt->vm);
> -
> -	kref_init(&ce->ref);
> -
> -	ce->engine = engine;
> -	ce->ops = engine->cops;
> -	ce->sseu = engine->sseu;
> -	ce->ring = __intel_context_ring_size(SZ_4K);
> -
> -	ewma_runtime_init(&ce->runtime.avg);
> -
> -	ce->vm = i915_vm_get(engine->gt->vm);
> -
> -	INIT_LIST_HEAD(&ce->signal_link);
> -	INIT_LIST_HEAD(&ce->signals);
> -
> -	mutex_init(&ce->pin_mutex);
> -
> -	i915_active_init(&ce->active,
> -			 __intel_context_active, __intel_context_retire);
> -}
> -
>   void intel_context_fini(struct intel_context *ce)
>   {
>   	if (ce->timeline)
> @@ -333,7 +363,13 @@ static struct i915_global_context global = { {
>   
>   int __init i915_global_context_init(void)
>   {
> -	global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> +	global.slab_ce =
> +		kmem_cache_create("intel_context",
> +				  sizeof(struct intel_context),
> +				  __alignof__(struct intel_context),
> +				  SLAB_HWCACHE_ALIGN |
> +				  SLAB_TYPESAFE_BY_RCU,
> +				  __intel_context_ctor);
>   	if (!global.slab_ce)
>   		return -ENOMEM;
>   
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index 5dd52bb6d38c..878fe6664f19 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -795,6 +795,16 @@ void i915_active_fini(struct i915_active *ref)
>   		kmem_cache_free(global.slab_cache, ref->cache);
>   }
>   
> +void i915_active_reinit(struct i915_active *ref)
> +{
> +	GEM_BUG_ON(!i915_active_is_idle(ref));
> +	debug_active_init(ref);
> +	mutex_reinit(&ref->mutex);
> +
> +	ref->cache = NULL;
> +	ref->tree = RB_ROOT;
> +}
> +
>   static inline bool is_idle_barrier(struct active_node *node, u64 idx)
>   {
>   	return node->timeline == idx && !i915_active_fence_isset(&node->base);
> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> index fb165d3f01cf..6df7e721616d 100644
> --- a/drivers/gpu/drm/i915/i915_active.h
> +++ b/drivers/gpu/drm/i915/i915_active.h
> @@ -219,6 +219,8 @@ i915_active_is_idle(const struct i915_active *ref)
>   
>   void i915_active_fini(struct i915_active *ref);
>   
> +void i915_active_reinit(struct i915_active *ref);
> +
>   int i915_active_acquire_preallocate_barrier(struct i915_active *ref,
>   					    struct intel_engine_cs *engine);
>   void i915_active_acquire_barrier(struct i915_active *ref);
> diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
> index 54773371e6bd..ef8db3aa75c7 100644
> --- a/drivers/gpu/drm/i915/i915_utils.h
> +++ b/drivers/gpu/drm/i915/i915_utils.h
> @@ -443,6 +443,13 @@ static inline bool timer_expired(const struct timer_list *t)
>   	return READ_ONCE(t->expires) && !timer_pending(t);
>   }
>   
> +static inline void mutex_reinit(struct mutex *lock)
> +{
> +#if IS_ENABLED(CONFIG_DEBUG_MUTEXES)
> +	lock->magic = lock;
> +#endif
> +}
> +
>   /*
>    * This is a lookalike for IS_ENABLED() that takes a kconfig value,
>    * e.g. CONFIG_DRM_I915_SPIN_REQUEST, and evaluates whether it is non-zero
> 
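
To see how the two reinit helpers above fit together: a minimal sketch
of the reuse path (the wrapper and its call site are assumptions, not
part of the hunks; slab constructors run only when a slab page is first
populated, so objects recycled from a SLAB_TYPESAFE_BY_RCU cache skip
__intel_context_ctor() and must restore their debug state by hand):

        /* Hypothetical reuse path for a recycled intel_context. */
        static void intel_context_reinit(struct intel_context *ce)
        {
                i915_active_reinit(&ce->active);
                mutex_reinit(&ce->pin_mutex);
        }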

* Re: [Intel-gfx] [PATCH 16/21] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
  2020-07-31 15:12     ` Chris Wilson
@ 2020-07-31 15:21       ` Chris Wilson
  2020-07-31 16:06         ` Tvrtko Ursulin
  2020-07-31 15:32       ` Tvrtko Ursulin
  1 sibling, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2020-07-31 15:21 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: thomas.hellstrom

Quoting Chris Wilson (2020-07-31 16:12:32)
> Quoting Tvrtko Ursulin (2020-07-31 16:06:55)
> > 
> > On 30/07/2020 10:37, Chris Wilson wrote:
> > > @@ -191,17 +188,19 @@ static void signal_irq_work(struct irq_work *work)
> > >   {
> > >       struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work);
> > >       const ktime_t timestamp = ktime_get();
> > > +     struct llist_node *signal, *sn;
> > >       struct intel_context *ce, *cn;
> > >       struct list_head *pos, *next;
> > > -     LIST_HEAD(signal);
> > > +
> > > +     signal = NULL;
> > > +     if (unlikely(!llist_empty(&b->signaled_requests)))
> > > +             signal = llist_del_all(&b->signaled_requests);
> > >   
> > >       spin_lock(&b->irq_lock);
> > >   
> > > -     if (list_empty(&b->signalers))
> > > +     if (!signal && list_empty(&b->signalers))
> > 
> > The only open issue from the previous round was on this change. If I understood
> > your previous reply correctly, checking this or not simply controls the 
> > disarm point and is not related to this patch. With the check added now 
> > we would disarm later, because even already signaled requests would keep 
> > it armed. I would prefer this was a separate patch if you could possibly 
> > be convinced.
> 
> I considered that since we add to the lockless list and then queue the
> irq work, that is a path that is not driven by the interrupt and so
> causes an issue with the idea of the interrupt shadow. I thought having
> a simple test was a positive side-effect that filters out the early
> irq_work.
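
For illustration, a minimal sketch of that non-interrupt path
(b->signaled_requests and b->irq_work match the quoted diff; the
request-side llist node name is an assumption):

        /* Process-context completion: push the already-signaled request
         * onto the lockless list; only the producer that finds the list
         * empty needs to kick the worker. No user interrupt involved. */
        if (llist_add(&rq->signal_node, &b->signaled_requests))
                irq_work_queue(&b->irq_work);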

How about a compromise and I sell the patch with a comment:
        /*
         * Keep the irq armed until the interrupt after all listeners are gone.
         *
         * Enabling/disabling the interrupt is rather costly, roughly a couple
         * of hundred microseconds. If we are proactive and enable/disable
         * the interrupt around every request that wants a breadcrumb, we
         * quickly drown in the extra orders of magnitude of latency imposed
         * on request submission.
         *
         * So we try to be lazy, and keep the interrupts enabled until no
         * more listeners appear within a breadcrumb interrupt interval (that
         * is until a request completes that no one cares about). The
         * observation is that listeners come in batches, and will often
         * listen to a bunch of requests in succession.
         *
         * We also try to avoid raising too many interrupts, as they may
         * be generated by userspace batches and it is unfortunately rather
         * too easy to drown the CPU under a flood of GPU interrupts. Thus
         * whenever no one appears to be listening, we turn off the interrupts.
         * Fewer interrupts should conserve power -- at the very least, fewer
         * interrupts draw less ire from other users of the system and tools
         * like powertop.
         */
-Chris

* Re: [Intel-gfx] [PATCH 17/21] drm/i915/gt: Protect context lifetime with RCU
  2020-07-31 15:15   ` Tvrtko Ursulin
@ 2020-07-31 15:24     ` Chris Wilson
  2020-07-31 15:45       ` Tvrtko Ursulin
  0 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2020-07-31 15:24 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: thomas.hellstrom

Quoting Tvrtko Ursulin (2020-07-31 16:15:43)
> 
> On 30/07/2020 10:37, Chris Wilson wrote:
> > Allow a brief period for continued access to a dead intel_context by
> > deferring the release of the struct until after an RCU grace period.
> > As we are using a dedicated slab cache for the contexts, we can defer
> > the release of the slab pages via RCU, with the caveat that individual
> > structs may be reused from the freelist within an RCU grace period. To
> > handle that, we have to avoid clearing members of the zombie struct.
> 
> What was the motivation?

I wanted a window where the pointer was kept alive by RCU after we
called intel_context_put() so that I could keep using the spinlock for
a bit.

If you look closely, you might spot that it isn't used any more, but I
liked the ctor so I kept it around.
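
As a rough illustration of that window, a sketch under assumptions (the
lookup helper and slot are hypothetical; only ce->ref and the
RCU-typesafe slab come from the patch):

        static struct intel_context *get_context_rcu(struct intel_context **slot)
        {
                struct intel_context *ce;

                rcu_read_lock();
                ce = READ_ONCE(*slot);
                /* SLAB_TYPESAFE_BY_RCU keeps the memory valid inside the
                 * read-side section, but the object may have been recycled,
                 * so revalidate before trusting its contents. */
                if (ce && !kref_get_unless_zero(&ce->ref))
                        ce = NULL;
                rcu_read_unlock();

                return ce; /* caller releases with intel_context_put() */
        }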
-Chris

* Re: [Intel-gfx] [PATCH 16/21] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
  2020-07-31 15:12     ` Chris Wilson
  2020-07-31 15:21       ` Chris Wilson
@ 2020-07-31 15:32       ` Tvrtko Ursulin
  1 sibling, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2020-07-31 15:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: thomas.hellstrom


On 31/07/2020 16:12, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-07-31 16:06:55)
>>
>> On 30/07/2020 10:37, Chris Wilson wrote:
>>> @@ -191,17 +188,19 @@ static void signal_irq_work(struct irq_work *work)
>>>    {
>>>        struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work);
>>>        const ktime_t timestamp = ktime_get();
>>> +     struct llist_node *signal, *sn;
>>>        struct intel_context *ce, *cn;
>>>        struct list_head *pos, *next;
>>> -     LIST_HEAD(signal);
>>> +
>>> +     signal = NULL;
>>> +     if (unlikely(!llist_empty(&b->signaled_requests)))
>>> +             signal = llist_del_all(&b->signaled_requests);
>>>    
>>>        spin_lock(&b->irq_lock);
>>>    
>>> -     if (list_empty(&b->signalers))
>>> +     if (!signal && list_empty(&b->signalers))
>>
>> The only open issue from the previous round was on this change. If I understood
>> your previous reply correctly, checking this or not simply controls the
>> disarm point and is not related to this patch. With the check added now
>> we would disarm later, because even already signaled requests would keep
>> it armed. I would prefer this was a separate patch if you could possibly
>> be convinced.
> 
> I considered that since we add to the lockless list and then queue the
> irq work, that is a path that is not driven by the interrupt and so
> causing an issue with the idea of the interrupt shadow. Having a simple
> test I thought was a positive side-effect to filter out the early
> irq_work.

I don't really follow. I look at it like this: no active signalers means
we can disarm. What is the purpose of keeping the interrupt enabled if
all that is on the list are already-completed requests? They will get
signaled in the very same run of signal_irq_work, so I don't see the
connection between the lockless list and keeping the interrupts on for
longer.
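
For concreteness, a compact sketch of the run in question (the shape
follows the quoted diff; the signaling call is simplified from what the
driver actually does):

        /* One pass of signal_irq_work(): everything already on the
         * lockless list is signaled in this very run, whether or not
         * the interrupt stays armed afterwards. */
        struct llist_node *pos, *sn;
        struct llist_node *signal = llist_del_all(&b->signaled_requests);

        llist_for_each_safe(pos, sn, signal) {
                struct i915_request *rq =
                        llist_entry(pos, typeof(*rq), signal_node);

                dma_fence_signal(&rq->fence);
        }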

Regards,

Tvrtko


* Re: [Intel-gfx] [PATCH 17/21] drm/i915/gt: Protect context lifetime with RCU
  2020-07-31 15:24     ` Chris Wilson
@ 2020-07-31 15:45       ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2020-07-31 15:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: thomas.hellstrom


On 31/07/2020 16:24, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-07-31 16:15:43)
>>
>> On 30/07/2020 10:37, Chris Wilson wrote:
>>> Allow a brief period for continued access to a dead intel_context by
>>> deferring the release of the struct until after an RCU grace period.
>>> As we are using a dedicated slab cache for the contexts, we can defer
>>> the release of the slab pages via RCU, with the caveat that individual
>>> structs may be reused from the freelist within an RCU grace period. To
>>> handle that, we have to avoid clearing members of the zombie struct.
>>
>> What was the motivation?
> 
> I wanted a window where the pointer was kept alive by RCU after we
> called intel_context_put() so that I could keep using the spinlock for
> a bit.
> 
> If you look closely, you might spot that it isn't used any more, but I
> liked the ctor so I kept it around.

In this case I think it is better to avoid changes which can introduce 
potential bugs. (And as an added bonus, time is saved on review. :)

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH 16/21] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
  2020-07-31 15:21       ` Chris Wilson
@ 2020-07-31 16:06         ` Tvrtko Ursulin
  2020-07-31 17:59           ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2020-07-31 16:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: thomas.hellstrom


On 31/07/2020 16:21, Chris Wilson wrote:
> Quoting Chris Wilson (2020-07-31 16:12:32)
>> Quoting Tvrtko Ursulin (2020-07-31 16:06:55)
>>>
>>> On 30/07/2020 10:37, Chris Wilson wrote:
>>>> @@ -191,17 +188,19 @@ static void signal_irq_work(struct irq_work *work)
>>>>    {
>>>>        struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work);
>>>>        const ktime_t timestamp = ktime_get();
>>>> +     struct llist_node *signal, *sn;
>>>>        struct intel_context *ce, *cn;
>>>>        struct list_head *pos, *next;
>>>> -     LIST_HEAD(signal);
>>>> +
>>>> +     signal = NULL;
>>>> +     if (unlikely(!llist_empty(&b->signaled_requests)))
>>>> +             signal = llist_del_all(&b->signaled_requests);
>>>>    
>>>>        spin_lock(&b->irq_lock);
>>>>    
>>>> -     if (list_empty(&b->signalers))
>>>> +     if (!signal && list_empty(&b->signalers))
>>>
>>> The only open issue from the previous round was on this change. If I understood
>>> your previous reply correctly, checking this or not simply controls the
>>> disarm point and is not related to this patch. With the check added now
>>> we would disarm later, because even already signaled requests would keep
>>> it armed. I would prefer this was a separate patch if you could possibly
>>> be convinced.
>>
>> I considered that since we add to the lockless list and then queue the
>> irq work, that is a path that is not driven by the interrupt and so
>> causes an issue with the idea of the interrupt shadow. I thought having
>> a simple test was a positive side-effect that filters out the early
>> irq_work.
> 
> How about a compromise and I sell the patch with a comment:
>          /*
>           * Keep the irq armed until the interrupt after all listeners are gone.
>           *
>           * Enabling/disabling the interrupt is rather costly, roughly a couple
>           * of hundred microseconds. If we are proactive and enable/disable
>           * the interrupt around every request that wants a breadcrumb, we
>           * quickly drown in the extra orders of magnitude of latency imposed
>           * on request submission.
>           *
>           * So we try to be lazy, and keep the interrupts enabled until no
>           * more listeners appear within a breadcrumb interrupt interval (that
>           * is until a request completes that no one cares about). The
>           * observation is that listeners come in batches, and will often
>           * listen to a bunch of requests in succession.
>           *
>           * We also try to avoid raising too many interrupts, as they may
>           * be generated by userspace batches and it is unfortunately rather
>           * too easy to drown the CPU under a flood of GPU interrupts. Thus
>           * whenever no one appears to be listening, we turn off the interrupts.
>           * Fewer interrupts should conserve power -- at the very least, fewer
>           * interrupts draw less ire from other users of the system and tools
>           * like powertop.
> 	 */

It really feels like it should be a separate patch.

I am not sure at the moment what exactly the hysteresis this would 
apply would look like. The end point is driven by the requests on the 
signaled list, but that is also driven by the timing of listeners adding 
themselves versus the request completion. For instance, if we want a 
hysteresis, maybe we want something more deterministic and explicit, 
perhaps tied directly to the user interrupt following no more listeners: 
say, simply disarm on the second irq work after all b->signalers have 
gone. I just can't picture the state of b->signaled_requests in relation 
to all the dynamic actions.
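
A minimal sketch of that alternative (the pass counter is invented for
illustration, and while the disarm helper follows the existing naming it
is an assumption here; the caller holds b->irq_lock):

        /* Hypothetical: tie the disarm point to interrupt delivery by
         * only disarming on the second consecutive pass of
         * signal_irq_work() that finds no listeners, regardless of what
         * is sitting on b->signaled_requests. */
        if (list_empty(&b->signalers)) {
                if (b->quiet_passes++)
                        __intel_breadcrumbs_disarm_irq(b);
        } else {
                b->quiet_passes = 0;
        }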

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH 16/21] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
  2020-07-31 16:06         ` Tvrtko Ursulin
@ 2020-07-31 17:59           ` Chris Wilson
  0 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2020-07-31 17:59 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx; +Cc: thomas.hellstrom

Quoting Tvrtko Ursulin (2020-07-31 17:06:08)
> 
> On 31/07/2020 16:21, Chris Wilson wrote:
> > Quoting Chris Wilson (2020-07-31 16:12:32)
> >> Quoting Tvrtko Ursulin (2020-07-31 16:06:55)
> >>>
> >>> On 30/07/2020 10:37, Chris Wilson wrote:
> >>>> @@ -191,17 +188,19 @@ static void signal_irq_work(struct irq_work *work)
> >>>>    {
> >>>>        struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work);
> >>>>        const ktime_t timestamp = ktime_get();
> >>>> +     struct llist_node *signal, *sn;
> >>>>        struct intel_context *ce, *cn;
> >>>>        struct list_head *pos, *next;
> >>>> -     LIST_HEAD(signal);
> >>>> +
> >>>> +     signal = NULL;
> >>>> +     if (unlikely(!llist_empty(&b->signaled_requests)))
> >>>> +             signal = llist_del_all(&b->signaled_requests);
> >>>>    
> >>>>        spin_lock(&b->irq_lock);
> >>>>    
> >>>> -     if (list_empty(&b->signalers))
> >>>> +     if (!signal && list_empty(&b->signalers))
> >>>
> >>> The only open from previous round was on this change. If I understood
> >>> your previous reply correctly, checking this or not simply controls the
> >>> disarm point and is not related to this patch. With the check added now
> >>> we would disarm later, because even already signaled requests would keep
> >>> it armed. I would prefer this was a separate patch if you could possibly
> >>> be convinced.
> >>
> >> I considered that since we add to the lockless list and then queue the
> >> irq work, that is a path that is not driven by the interrupt and so
> >> causes an issue with the idea of the interrupt shadow. I thought having
> >> a simple test was a positive side-effect that filters out the early
> >> irq_work.
> > 
> > How about a compromise and I sell the patch with a comment:
> >          /*
> >           * Keep the irq armed until the interrupt after all listeners are gone.
> >           *
> >           * Enabling/disabling the interrupt is rather costly, roughly a couple
> >           * of hundred microseconds. If we are proactive and enable/disable
> >           * the interrupt around every request that wants a breadcrumb, we
> >           * quickly drown in the extra orders of magnitude of latency imposed
> >           * on request submission.
> >           *
> >           * So we try to be lazy, and keep the interrupts enabled until no
> >           * more listeners appear within a breadcrumb interrupt interval (that
> >           * is until a request completes that no one cares about). The
> >           * observation is that listeners come in batches, and will often
> >           * listen to a bunch of requests in succession.
> >           *
> >           * We also try to avoid raising too many interrupts, as they may
> >           * be generated by userspace batches and it is unfortunately rather
> >           * too easy to drown the CPU under a flood of GPU interrupts. Thus
> >           * whenever no one appears to be listening, we turn off the interrupts.
> >           * Fewer interrupts should conserve power -- at the very least, fewer
> >           * interrupts draw less ire from other users of the system and tools
> >           * like powertop.
> >        */
> 
> It really feels like it should be a separate patch.
> 
> I am not sure at the moment what exactly the hysteresis this would 
> apply would look like. The end point is driven by the requests on the 
> signaled list, but that is also driven by the timing of listeners adding 
> themselves versus the request completion. For instance, if we want a 
> hysteresis, maybe we want something more deterministic and explicit, 
> perhaps tied directly to the user interrupt following no more listeners: 
> say, simply disarm on the second irq work after all b->signalers have 
> gone. I just can't picture the state of b->signaled_requests in relation 
> to all the dynamic actions.

Which is kind of the point. b->signaled_requests doesn't have a
relationship with the interrupt delivery. So we are changing the
behaviour by kicking the irq_work after adding to b->signaled_requests
independently of the interrupts, which is why we shouldn't be making
that change incidentally in this patch.

It is about conserving existing behaviour.
-Chris

Thread overview: 39+ messages
2020-07-30  9:37 [Intel-gfx] Breadcrumbs fixes and stall avoidance Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 01/21] drm/i915: Add a couple of missing i915_active_fini() Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 02/21] drm/i915: Skip taking acquire mutex for no ref->active callback Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 03/21] drm/i915: Export a preallocate variant of i915_active_acquire() Chris Wilson
2020-07-31  7:33   ` Thomas Hellström (Intel)
2020-07-30  9:37 ` [Intel-gfx] [PATCH 04/21] drm/i915: Keep the most recently used active-fence upon discard Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 05/21] drm/i915: Make the stale cached active node available for any timeline Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 06/21] drm/i915: Reduce locking around i915_active_acquire_preallocate_barrier() Chris Wilson
2020-07-31  7:39   ` Thomas Hellström (Intel)
2020-07-30  9:37 ` [Intel-gfx] [PATCH 07/21] drm/i915: Provide a fastpath for waiting on vma bindings Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 08/21] drm/i915/gem: Reduce ctx->engine_mutex for reading the clone source Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 09/21] drm/i915/gem: Reduce ctx->engines_mutex for get_engines() Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 10/21] drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 11/21] drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 12/21] drm/i915/gt: Only transfer the virtual context to the new engine if active Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 13/21] drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs Chris Wilson
2020-07-31 14:53   ` Tvrtko Ursulin
2020-07-30  9:37 ` [Intel-gfx] [PATCH 14/21] drm/i915/gt: Move intel_breadcrumbs_arm_irq earlier Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 15/21] drm/i915/gt: Hold context/request reference while breadcrumbs are active Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 16/21] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock Chris Wilson
2020-07-31 15:06   ` Tvrtko Ursulin
2020-07-31 15:12     ` Chris Wilson
2020-07-31 15:21       ` Chris Wilson
2020-07-31 16:06         ` Tvrtko Ursulin
2020-07-31 17:59           ` Chris Wilson
2020-07-31 15:32       ` Tvrtko Ursulin
2020-07-30  9:37 ` [Intel-gfx] [PATCH 17/21] drm/i915/gt: Protect context lifetime with RCU Chris Wilson
2020-07-31 15:15   ` Tvrtko Ursulin
2020-07-31 15:24     ` Chris Wilson
2020-07-31 15:45       ` Tvrtko Ursulin
2020-07-30  9:37 ` [Intel-gfx] [PATCH 18/21] drm/i915/gt: Split the breadcrumb spinlock between global and contexts Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 19/21] drm/i915: Drop i915_request.lock serialisation around await_start Chris Wilson
2020-07-30  9:37 ` [Intel-gfx] [PATCH 20/21] drm/i915: Drop i915_request.lock requirement for intel_rps_boost() Chris Wilson
2020-07-30  9:37 ` [PATCH 21/21] drm/i915/gem: Delay tracking the GEM context until it is registered Chris Wilson
2020-07-30  9:37   ` [Intel-gfx] " Chris Wilson
2020-07-30 13:45 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/21] drm/i915: Add a couple of missing i915_active_fini() Patchwork
2020-07-30 13:46 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-07-30 14:04 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-07-30 19:20 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
