* [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning
@ 2020-03-20 13:01 Chris Wilson
  2020-03-20 13:01 ` [Intel-gfx] [PATCH 2/4] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Chris Wilson @ 2020-03-20 13:01 UTC (permalink / raw)
  To: intel-gfx

Our assertion caught that we do try to pin a closed context if userspace
is viciously racing context-closure with execbuf, so make it fail
gracefully.
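
A minimal sketch of the racing pattern, for illustration only (fd and
ctx_id setup, plus a valid batch in the execbuf, are assumed and elided;
this is not the reproducer from the bug report):

#include <sys/ioctl.h>
#include <drm/i915_drm.h>

extern int fd;		/* assumed: open i915 render node */
extern __u32 ctx_id;	/* assumed: context from CONTEXT_CREATE */

/* Thread B: close the context mid-submission. */
static void *close_ctx(void *arg)
{
	struct drm_i915_gem_context_destroy destroy = { .ctx_id = ctx_id };

	ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_DESTROY, &destroy);
	return NULL;
}

/*
 * Thread A: keep submitting against the same context. With this patch,
 * losing the race is reported as -ENOENT instead of tripping the
 * GEM_BUG_ON(intel_context_is_closed(ce)) assertion.
 */
static void *submit(void *arg)
{
	struct drm_i915_gem_execbuffer2 execbuf = { .rsvd1 = ctx_id };

	while (ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf) == 0)
		;
	return NULL;
}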

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1492
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index aea992e46c42..7132bf616cc4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -97,8 +97,6 @@ int __intel_context_do_pin(struct intel_context *ce)
 {
 	int err;
 
-	GEM_BUG_ON(intel_context_is_closed(ce));
-
 	if (unlikely(!test_bit(CONTEXT_ALLOC_BIT, &ce->flags))) {
 		err = intel_context_alloc_state(ce);
 		if (err)
@@ -114,6 +112,11 @@ int __intel_context_do_pin(struct intel_context *ce)
 		goto out_release;
 	}
 
+	if (unlikely(intel_context_is_closed(ce))) {
+		err = -ENOENT;
+		goto out_release;
+	}
+
 	if (likely(!atomic_add_unless(&ce->pin_count, 1, 0))) {
 		err = intel_context_active_acquire(ce);
 		if (unlikely(err))
-- 
2.20.1

* [Intel-gfx] [PATCH 2/4] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
  2020-03-20 13:01 [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Chris Wilson
@ 2020-03-20 13:01 ` Chris Wilson
  2020-03-20 17:47   ` [Intel-gfx] [PATCH] " Chris Wilson
  2020-03-20 13:01 ` [Intel-gfx] [PATCH 3/4] drm/i915: Immediately execute the fenced work Chris Wilson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2020-03-20 13:01 UTC (permalink / raw)
  To: intel-gfx

We dropped calling process_csb prior to handling direct submission in
order to avoid the nesting of spinlocks and lift process_csb() and the
majority of the tasklet out of irq-off. However, we do want to avoid
ksoftirqd latency in the fast path, so try and pull the interrupt-bh
local to direct submission if we can acquire the tasklet's lock.
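
A minimal sketch of the trick (the helper name is hypothetical;
process_csb() and the tasklet are as in the diff below):
tasklet_trylock() takes the same RUN state the softirq core holds while
executing the tasklet, so winning it means we may safely drain the CSB
ring inline, and losing it means another CPU is already doing that work
for us.

static void pull_csb_local(struct intel_engine_cs *engine)
{
	struct tasklet_struct * const t = &engine->execlists.tasklet;

	if (!tasklet_trylock(t))
		return; /* already running elsewhere; it will drain the CSB */

	process_csb(engine); /* consume pending context-switch events */
	tasklet_unlock(t);
}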

v2: Tweak the balance to avoid over submitting lite-restores

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 40 +++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index f09dd87324b9..0d9c6ea4adaa 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2884,29 +2884,47 @@ static void queue_request(struct intel_engine_cs *engine,
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
-static void __submit_queue_imm(struct intel_engine_cs *engine)
+static bool pending_csb(const struct intel_engine_execlists *el)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-
-	if (reset_in_progress(execlists))
-		return; /* defer until we restart the engine following reset */
-
-	if (execlists->tasklet.func == execlists_submission_tasklet)
-		__execlists_submission_tasklet(engine);
-	else
-		tasklet_hi_schedule(&execlists->tasklet);
+	return READ_ONCE(*el->csb_write) != READ_ONCE(el->csb_head);
 }
 
 static void submit_queue(struct intel_engine_cs *engine,
 			 const struct i915_request *rq)
 {
 	struct intel_engine_execlists *execlists = &engine->execlists;
+	struct i915_request *inflight;
 
 	if (rq_prio(rq) <= execlists->queue_priority_hint)
 		return;
 
+	if (reset_in_progress(execlists))
+		return; /* defer until we restart the engine following reset */
+
+	/* Hopefully we clear execlists->pending[] to let us through */
+	if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) {
+		process_csb(engine);
+		tasklet_unlock(&execlists->tasklet);
+	}
+
+	/*
+	 * Suppress immediate lite-restores, leave that to the tasklet.
+	 *
+	 * However, we leave the queue_priority_hint unset so that if we do
+	 * submit a second context, we push that into ELSP[1] immediately.
+	 */
+	inflight = execlists_active(&engine->execlists);
+	if (inflight && inflight->context == rq->context)
+		return;
+
 	execlists->queue_priority_hint = rq_prio(rq);
-	__submit_queue_imm(engine);
+	__execlists_submission_tasklet(engine);
+
+	/* Try and pull an interrupt-bh queued on another CPU to here */
+	if (pending_csb(execlists) && tasklet_trylock(&execlists->tasklet)) {
+		process_csb(engine);
+		tasklet_unlock(&execlists->tasklet);
+	}
 }
 
 static bool ancestor_on_hold(const struct intel_engine_cs *engine,
-- 
2.20.1

* [Intel-gfx] [PATCH 3/4] drm/i915: Immediately execute the fenced work
  2020-03-20 13:01 [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Chris Wilson
  2020-03-20 13:01 ` [Intel-gfx] [PATCH 2/4] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson
@ 2020-03-20 13:01 ` Chris Wilson
  2020-03-20 13:01 ` [Intel-gfx] [PATCH 4/4] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup Chris Wilson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2020-03-20 13:01 UTC (permalink / raw)
  To: intel-gfx

If the caller allows and we do not have to wait for any signals,
immediately execute the work within the caller's process. By doing so we
avoid the overhead of scheduling a new task, and the latency in
executing it, at the cost of pulling that work back into the immediate
context. (Sometimes we still prefer to offload the task to another cpu,
especially if we plan on executing many such tasks in parallel for this
client.)
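
A hedged caller sketch (the condition and the work item "w" are
hypothetical; only the two commit helpers come from this series):

/*
 * dma_fence_work_commit_imm(), added below, only runs the work
 * synchronously when chain.pending <= 1, i.e. when nothing remains to
 * signal but the commit itself; otherwise it queues the work to
 * system_unbound_wq exactly like dma_fence_work_commit().
 */
if (latency_sensitive)				/* hypothetical condition */
	dma_fence_work_commit_imm(&w->base);	/* may execute inline */
else
	dma_fence_work_commit(&w->base);	/* always offload to the wq */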

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/i915_sw_fence_work.c      |  5 ++++-
 drivers/gpu/drm/i915/i915_sw_fence_work.h      | 12 ++++++++++++
 drivers/gpu/drm/i915/i915_vma.c                |  2 +-
 4 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 36d069504836..c1179c00bc61 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1784,7 +1784,7 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 	dma_resv_add_excl_fence(shadow->resv, &pw->base.dma);
 	dma_resv_unlock(shadow->resv);
 
-	dma_fence_work_commit(&pw->base);
+	dma_fence_work_commit_imm(&pw->base);
 	return 0;
 
 err_batch_unlock:
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index 997b2998f1f2..a3a81bb8f2c3 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -38,7 +38,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 
 		if (!f->dma.error) {
 			dma_fence_get(&f->dma);
-			queue_work(system_unbound_wq, &f->work);
+			if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
+				fence_work(&f->work);
+			else
+				queue_work(system_unbound_wq, &f->work);
 		} else {
 			fence_complete(f);
 		}
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
index 3a22b287e201..0719d661dc9c 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
@@ -32,6 +32,10 @@ struct dma_fence_work {
 	const struct dma_fence_work_ops *ops;
 };
 
+enum {
+	DMA_FENCE_WORK_IMM = DMA_FENCE_FLAG_USER_BITS,
+};
+
 void dma_fence_work_init(struct dma_fence_work *f,
 			 const struct dma_fence_work_ops *ops);
 int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal);
@@ -41,4 +45,12 @@ static inline void dma_fence_work_commit(struct dma_fence_work *f)
 	i915_sw_fence_commit(&f->chain);
 }
 
+static inline void dma_fence_work_commit_imm(struct dma_fence_work *f)
+{
+	if (atomic_read(&f->chain.pending) <= 1)
+		__set_bit(DMA_FENCE_WORK_IMM, &f->dma.flags);
+
+	dma_fence_work_commit(f);
+}
+
 #endif /* I915_SW_FENCE_WORK_H */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 5b3efb43a8ef..6dd242c09daf 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -980,7 +980,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	mutex_unlock(&vma->vm->mutex);
 err_fence:
 	if (work)
-		dma_fence_work_commit(&work->base);
+		dma_fence_work_commit_imm(&work->base);
 	if (wakeref)
 		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
 err_pages:
-- 
2.20.1

* [Intel-gfx] [PATCH 4/4] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
  2020-03-20 13:01 [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Chris Wilson
  2020-03-20 13:01 ` [Intel-gfx] [PATCH 2/4] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson
  2020-03-20 13:01 ` [Intel-gfx] [PATCH 3/4] drm/i915: Immediately execute the fenced work Chris Wilson
@ 2020-03-20 13:01 ` Chris Wilson
  2020-03-20 13:47   ` Tvrtko Ursulin
  2020-03-20 13:15 ` [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Tvrtko Ursulin
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2020-03-20 13:01 UTC (permalink / raw)
  To: intel-gfx

As we store the handle lookup inside a radix tree, we do not need the
gem_context->mutex except when we need to insert our lookup into the
common radix tree. This takes a small bit of rearranging to ensure that
the lut we insert into the tree is ready prior to actually inserting it
(as soon as it is exposed via the radixtree, it is visible to any other
submission).
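
The resulting fast path, condensed from the diff below
(i915_vma_tryget() is a kref_get_unless_zero()-style helper):

rcu_read_lock();
vma = radix_tree_lookup(&eb->gem_context->handles_vma, handle);
if (vma)
	vma = i915_vma_tryget(vma); /* NULL if the last ref just dropped */
rcu_read_unlock();
if (vma)
	return vma; /* common case: no gem_context->mutex taken at all */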

v2: For brownie points, remove the goto spaghetti.
v3: Tighten up the closed-handle checks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 136 +++++++++++-------
 1 file changed, 87 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c1179c00bc61..876fc2e124b9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -481,7 +481,7 @@ eb_add_vma(struct i915_execbuffer *eb,
 
 	GEM_BUG_ON(i915_vma_is_closed(vma));
 
-	ev->vma = i915_vma_get(vma);
+	ev->vma = vma;
 	ev->exec = entry;
 	ev->flags = entry->flags;
 
@@ -728,77 +728,117 @@ static int eb_select_context(struct i915_execbuffer *eb)
 	return 0;
 }
 
-static int eb_lookup_vmas(struct i915_execbuffer *eb)
+static int __eb_add_lut(struct i915_execbuffer *eb,
+			u32 handle, struct i915_vma *vma)
 {
-	struct radix_tree_root *handles_vma = &eb->gem_context->handles_vma;
-	struct drm_i915_gem_object *obj;
-	unsigned int i, batch;
+	struct i915_gem_context *ctx = eb->gem_context;
+	struct i915_lut_handle *lut;
 	int err;
 
-	if (unlikely(i915_gem_context_is_closed(eb->gem_context)))
-		return -ENOENT;
+	lut = i915_lut_handle_alloc();
+	if (unlikely(!lut))
+		return -ENOMEM;
 
-	INIT_LIST_HEAD(&eb->relocs);
-	INIT_LIST_HEAD(&eb->unbound);
+	i915_vma_get(vma);
+	if (!atomic_fetch_inc(&vma->open_count))
+		i915_vma_reopen(vma);
+	lut->handle = handle;
+	lut->ctx = ctx;
+
+	/* Check that the context hasn't been closed in the meantime */
+	err = -EINTR;
+	if (!mutex_lock_interruptible(&ctx->mutex)) {
+		err = -ENOENT;
+		if (likely(!i915_gem_context_is_closed(ctx)))
+			err = radix_tree_insert(&ctx->handles_vma, handle, vma);
+		if (err == 0) { /* And nor has this handle */
+			struct drm_i915_gem_object *obj = vma->obj;
+
+			i915_gem_object_lock(obj);
+			if (idr_find(&eb->file->object_idr, handle) == obj) {
+				list_add(&lut->obj_link, &obj->lut_list);
+			} else {
+				radix_tree_delete(&ctx->handles_vma, handle);
+				err = -ENOENT;
+			}
+			i915_gem_object_unlock(obj);
+		}
+		mutex_unlock(&ctx->mutex);
+	}
+	if (unlikely(err))
+		goto err;
 
-	batch = eb_batch_index(eb);
+	return 0;
 
-	for (i = 0; i < eb->buffer_count; i++) {
-		u32 handle = eb->exec[i].handle;
-		struct i915_lut_handle *lut;
+err:
+	atomic_dec(&vma->open_count);
+	i915_vma_put(vma);
+	i915_lut_handle_free(lut);
+	return err;
+}
+
+static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
+{
+	do {
+		struct drm_i915_gem_object *obj;
 		struct i915_vma *vma;
+		int err;
 
-		vma = radix_tree_lookup(handles_vma, handle);
+		rcu_read_lock();
+		vma = radix_tree_lookup(&eb->gem_context->handles_vma, handle);
+		if (likely(vma))
+			vma = i915_vma_tryget(vma);
+		rcu_read_unlock();
 		if (likely(vma))
-			goto add_vma;
+			return vma;
 
 		obj = i915_gem_object_lookup(eb->file, handle);
-		if (unlikely(!obj)) {
-			err = -ENOENT;
-			goto err_vma;
-		}
+		if (unlikely(!obj))
+			return ERR_PTR(-ENOENT);
 
 		vma = i915_vma_instance(obj, eb->context->vm, NULL);
 		if (IS_ERR(vma)) {
-			err = PTR_ERR(vma);
-			goto err_obj;
+			i915_gem_object_put(obj);
+			return vma;
 		}
 
-		lut = i915_lut_handle_alloc();
-		if (unlikely(!lut)) {
-			err = -ENOMEM;
-			goto err_obj;
-		}
+		err = __eb_add_lut(eb, handle, vma);
+		if (likely(!err))
+			return vma;
 
-		err = radix_tree_insert(handles_vma, handle, vma);
-		if (unlikely(err)) {
-			i915_lut_handle_free(lut);
-			goto err_obj;
-		}
+		i915_gem_object_put(obj);
+		if (err != -EEXIST)
+			return ERR_PTR(err);
+	} while (1);
+}
 
-		/* transfer ref to lut */
-		if (!atomic_fetch_inc(&vma->open_count))
-			i915_vma_reopen(vma);
-		lut->handle = handle;
-		lut->ctx = eb->gem_context;
+static int eb_lookup_vmas(struct i915_execbuffer *eb)
+{
+	unsigned int batch = eb_batch_index(eb);
+	unsigned int i;
+	int err = 0;
 
-		i915_gem_object_lock(obj);
-		list_add(&lut->obj_link, &obj->lut_list);
-		i915_gem_object_unlock(obj);
+	INIT_LIST_HEAD(&eb->relocs);
+	INIT_LIST_HEAD(&eb->unbound);
+
+	for (i = 0; i < eb->buffer_count; i++) {
+		struct i915_vma *vma;
+
+		vma = eb_lookup_vma(eb, eb->exec[i].handle);
+		if (IS_ERR(vma)) {
+			err = PTR_ERR(vma);
+			break;
+		}
 
-add_vma:
 		err = eb_validate_vma(eb, &eb->exec[i], vma);
-		if (unlikely(err))
-			goto err_vma;
+		if (unlikely(err)) {
+			i915_vma_put(vma);
+			break;
+		}
 
 		eb_add_vma(eb, i, batch, vma);
 	}
 
-	return 0;
-
-err_obj:
-	i915_gem_object_put(obj);
-err_vma:
 	eb->vma[i].vma = NULL;
 	return err;
 }
@@ -1494,9 +1534,7 @@ static int eb_relocate(struct i915_execbuffer *eb)
 {
 	int err;
 
-	mutex_lock(&eb->gem_context->mutex);
 	err = eb_lookup_vmas(eb);
-	mutex_unlock(&eb->gem_context->mutex);
 	if (err)
 		return err;
 
-- 
2.20.1

* Re: [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning
  2020-03-20 13:01 [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Chris Wilson
                   ` (2 preceding siblings ...)
  2020-03-20 13:01 ` [Intel-gfx] [PATCH 4/4] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup Chris Wilson
@ 2020-03-20 13:15 ` Tvrtko Ursulin
  2020-03-20 18:08 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/4] " Patchwork
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Tvrtko Ursulin @ 2020-03-20 13:15 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 20/03/2020 13:01, Chris Wilson wrote:
> Our assertion caught that we do try to pin a closed context if userspace
> is viciously racing context-closure with execbuf, so make it fail
> gracefully.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/1492
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index aea992e46c42..7132bf616cc4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -97,8 +97,6 @@ int __intel_context_do_pin(struct intel_context *ce)
>   {
>   	int err;
>   
> -	GEM_BUG_ON(intel_context_is_closed(ce));
> -
>   	if (unlikely(!test_bit(CONTEXT_ALLOC_BIT, &ce->flags))) {
>   		err = intel_context_alloc_state(ce);
>   		if (err)
> @@ -114,6 +112,11 @@ int __intel_context_do_pin(struct intel_context *ce)
>   		goto out_release;
>   	}
>   
> +	if (unlikely(intel_context_is_closed(ce))) {
> +		err = -ENOENT;
> +		goto out_release;
> +	}
> +
>   	if (likely(!atomic_add_unless(&ce->pin_count, 1, 0))) {
>   		err = intel_context_active_acquire(ce);
>   		if (unlikely(err))
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH 4/4] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
  2020-03-20 13:01 ` [Intel-gfx] [PATCH 4/4] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup Chris Wilson
@ 2020-03-20 13:47   ` Tvrtko Ursulin
  2020-03-20 13:56     ` Chris Wilson
  0 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2020-03-20 13:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 20/03/2020 13:01, Chris Wilson wrote:
> As we store the handle lookup inside a radix tree, we do not need the
> gem_context->mutex except when we need to insert our lookup into the
> common radix tree. This takes a small bit of rearranging to ensure that
> the lut we insert into the tree is ready prior to actually inserting it
> (as soon as it is exposed via the radixtree, it is visible to any other
> submission).
> 
> v2: For brownie points, remove the goto spaghetti.
> v3: Tighten up the closed-handle checks.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 136 +++++++++++-------
>   1 file changed, 87 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index c1179c00bc61..876fc2e124b9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -481,7 +481,7 @@ eb_add_vma(struct i915_execbuffer *eb,
>   
>   	GEM_BUG_ON(i915_vma_is_closed(vma));
>   
> -	ev->vma = i915_vma_get(vma);
> +	ev->vma = vma;
>   	ev->exec = entry;
>   	ev->flags = entry->flags;
>   
> @@ -728,77 +728,117 @@ static int eb_select_context(struct i915_execbuffer *eb)
>   	return 0;
>   }
>   
> -static int eb_lookup_vmas(struct i915_execbuffer *eb)
> +static int __eb_add_lut(struct i915_execbuffer *eb,
> +			u32 handle, struct i915_vma *vma)
>   {
> -	struct radix_tree_root *handles_vma = &eb->gem_context->handles_vma;
> -	struct drm_i915_gem_object *obj;
> -	unsigned int i, batch;
> +	struct i915_gem_context *ctx = eb->gem_context;
> +	struct i915_lut_handle *lut;
>   	int err;
>   
> -	if (unlikely(i915_gem_context_is_closed(eb->gem_context)))
> -		return -ENOENT;
> +	lut = i915_lut_handle_alloc();
> +	if (unlikely(!lut))
> +		return -ENOMEM;
>   
> -	INIT_LIST_HEAD(&eb->relocs);
> -	INIT_LIST_HEAD(&eb->unbound);
> +	i915_vma_get(vma);
> +	if (!atomic_fetch_inc(&vma->open_count))
> +		i915_vma_reopen(vma);
> +	lut->handle = handle;
> +	lut->ctx = ctx;
> +
> +	/* Check that the context hasn't been closed in the meantime */
> +	err = -EINTR;
> +	if (!mutex_lock_interruptible(&ctx->mutex)) {
> +		err = -ENOENT;
> +		if (likely(!i915_gem_context_is_closed(ctx)))
> +			err = radix_tree_insert(&ctx->handles_vma, handle, vma);
> +		if (err == 0) { /* And nor has this handle */
> +			struct drm_i915_gem_object *obj = vma->obj;
> +
> +			i915_gem_object_lock(obj);

Does this fit into Maarten's rework - nesting of object_lock under 
ctx->mutex I mean?

Other than this question it looks clean.

Regards,

Tvrtko

> +			if (idr_find(&eb->file->object_idr, handle) == obj) {
> +				list_add(&lut->obj_link, &obj->lut_list);
> +			} else {
> +				radix_tree_delete(&ctx->handles_vma, handle);
> +				err = -ENOENT;
> +			}
> +			i915_gem_object_unlock(obj);
> +		}
> +		mutex_unlock(&ctx->mutex);
> +	}
> +	if (unlikely(err))
> +		goto err;
>   
> -	batch = eb_batch_index(eb);
> +	return 0;
>   
> -	for (i = 0; i < eb->buffer_count; i++) {
> -		u32 handle = eb->exec[i].handle;
> -		struct i915_lut_handle *lut;
> +err:
> +	atomic_dec(&vma->open_count);
> +	i915_vma_put(vma);
> +	i915_lut_handle_free(lut);
> +	return err;
> +}
> +
> +static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
> +{
> +	do {
> +		struct drm_i915_gem_object *obj;
>   		struct i915_vma *vma;
> +		int err;
>   
> -		vma = radix_tree_lookup(handles_vma, handle);
> +		rcu_read_lock();
> +		vma = radix_tree_lookup(&eb->gem_context->handles_vma, handle);
> +		if (likely(vma))
> +			vma = i915_vma_tryget(vma);
> +		rcu_read_unlock();
>   		if (likely(vma))
> -			goto add_vma;
> +			return vma;
>   
>   		obj = i915_gem_object_lookup(eb->file, handle);
> -		if (unlikely(!obj)) {
> -			err = -ENOENT;
> -			goto err_vma;
> -		}
> +		if (unlikely(!obj))
> +			return ERR_PTR(-ENOENT);
>   
>   		vma = i915_vma_instance(obj, eb->context->vm, NULL);
>   		if (IS_ERR(vma)) {
> -			err = PTR_ERR(vma);
> -			goto err_obj;
> +			i915_gem_object_put(obj);
> +			return vma;
>   		}
>   
> -		lut = i915_lut_handle_alloc();
> -		if (unlikely(!lut)) {
> -			err = -ENOMEM;
> -			goto err_obj;
> -		}
> +		err = __eb_add_lut(eb, handle, vma);
> +		if (likely(!err))
> +			return vma;
>   
> -		err = radix_tree_insert(handles_vma, handle, vma);
> -		if (unlikely(err)) {
> -			i915_lut_handle_free(lut);
> -			goto err_obj;
> -		}
> +		i915_gem_object_put(obj);
> +		if (err != -EEXIST)
> +			return ERR_PTR(err);
> +	} while (1);
> +}
>   
> -		/* transfer ref to lut */
> -		if (!atomic_fetch_inc(&vma->open_count))
> -			i915_vma_reopen(vma);
> -		lut->handle = handle;
> -		lut->ctx = eb->gem_context;
> +static int eb_lookup_vmas(struct i915_execbuffer *eb)
> +{
> +	unsigned int batch = eb_batch_index(eb);
> +	unsigned int i;
> +	int err = 0;
>   
> -		i915_gem_object_lock(obj);
> -		list_add(&lut->obj_link, &obj->lut_list);
> -		i915_gem_object_unlock(obj);
> +	INIT_LIST_HEAD(&eb->relocs);
> +	INIT_LIST_HEAD(&eb->unbound);
> +
> +	for (i = 0; i < eb->buffer_count; i++) {
> +		struct i915_vma *vma;
> +
> +		vma = eb_lookup_vma(eb, eb->exec[i].handle);
> +		if (IS_ERR(vma)) {
> +			err = PTR_ERR(vma);
> +			break;
> +		}
>   
> -add_vma:
>   		err = eb_validate_vma(eb, &eb->exec[i], vma);
> -		if (unlikely(err))
> -			goto err_vma;
> +		if (unlikely(err)) {
> +			i915_vma_put(vma);
> +			break;
> +		}
>   
>   		eb_add_vma(eb, i, batch, vma);
>   	}
>   
> -	return 0;
> -
> -err_obj:
> -	i915_gem_object_put(obj);
> -err_vma:
>   	eb->vma[i].vma = NULL;
>   	return err;
>   }
> @@ -1494,9 +1534,7 @@ static int eb_relocate(struct i915_execbuffer *eb)
>   {
>   	int err;
>   
> -	mutex_lock(&eb->gem_context->mutex);
>   	err = eb_lookup_vmas(eb);
> -	mutex_unlock(&eb->gem_context->mutex);
>   	if (err)
>   		return err;
>   
> 

* Re: [Intel-gfx] [PATCH 4/4] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
  2020-03-20 13:47   ` Tvrtko Ursulin
@ 2020-03-20 13:56     ` Chris Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2020-03-20 13:56 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-03-20 13:47:39)
> 
> On 20/03/2020 13:01, Chris Wilson wrote:
> > As we store the handle lookup inside a radix tree, we do not need the
> > gem_context->mutex except when we need to insert our lookup into the
> > common radix tree. This takes a small bit of rearranging to ensure that
> > the lut we insert into the tree is ready prior to actually inserting it
> > (as soon as it is exposed via the radixtree, it is visible to any other
> > submission).
> > 
> > v2: For brownie points, remove the goto spaghetti.
> > v3: Tighten up the closed-handle checks.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 136 +++++++++++-------
> >   1 file changed, 87 insertions(+), 49 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index c1179c00bc61..876fc2e124b9 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -481,7 +481,7 @@ eb_add_vma(struct i915_execbuffer *eb,
> >   
> >       GEM_BUG_ON(i915_vma_is_closed(vma));
> >   
> > -     ev->vma = i915_vma_get(vma);
> > +     ev->vma = vma;
> >       ev->exec = entry;
> >       ev->flags = entry->flags;
> >   
> > @@ -728,77 +728,117 @@ static int eb_select_context(struct i915_execbuffer *eb)
> >       return 0;
> >   }
> >   
> > -static int eb_lookup_vmas(struct i915_execbuffer *eb)
> > +static int __eb_add_lut(struct i915_execbuffer *eb,
> > +                     u32 handle, struct i915_vma *vma)
> >   {
> > -     struct radix_tree_root *handles_vma = &eb->gem_context->handles_vma;
> > -     struct drm_i915_gem_object *obj;
> > -     unsigned int i, batch;
> > +     struct i915_gem_context *ctx = eb->gem_context;
> > +     struct i915_lut_handle *lut;
> >       int err;
> >   
> > -     if (unlikely(i915_gem_context_is_closed(eb->gem_context)))
> > -             return -ENOENT;
> > +     lut = i915_lut_handle_alloc();
> > +     if (unlikely(!lut))
> > +             return -ENOMEM;
> >   
> > -     INIT_LIST_HEAD(&eb->relocs);
> > -     INIT_LIST_HEAD(&eb->unbound);
> > +     i915_vma_get(vma);
> > +     if (!atomic_fetch_inc(&vma->open_count))
> > +             i915_vma_reopen(vma);
> > +     lut->handle = handle;
> > +     lut->ctx = ctx;
> > +
> > +     /* Check that the context hasn't been closed in the meantime */
> > +     err = -EINTR;
> > +     if (!mutex_lock_interruptible(&ctx->mutex)) {
> > +             err = -ENOENT;
> > +             if (likely(!i915_gem_context_is_closed(ctx)))
> > +                     err = radix_tree_insert(&ctx->handles_vma, handle, vma);
> > +             if (err == 0) { /* And nor has this handle */
> > +                     struct drm_i915_gem_object *obj = vma->obj;
> > +
> > +                     i915_gem_object_lock(obj);
> 
> Does this fit into Maarten's rework - nesting of object_lock under 
> ctx->mutex I mean?

This is the current lock nesting, and should generally remain so. One
context will peek at multiple objects and we should be holding an object
lock to peek at a context.
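
For illustration, the order in which the patch takes the locks (names
from the diff; a sketch, not a lockdep annotation):

mutex_lock(&ctx->mutex);	/* outer: one context touches many objects */
i915_gem_object_lock(obj);	/* inner: per-object */
/* ... handles_vma and lut_list manipulation ... */
i915_gem_object_unlock(obj);
mutex_unlock(&ctx->mutex);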

As for the rework, it holds a bunch of exclusive locks far beyond their
scope causing an even worse BKL, and that will have to be reworked.
-Chris

* [Intel-gfx] [PATCH] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
  2020-03-20 13:01 ` [Intel-gfx] [PATCH 2/4] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson
@ 2020-03-20 17:47   ` Chris Wilson
  2020-03-20 20:28     ` Francisco Jerez
  2020-03-21 13:12     ` Chris Wilson
  0 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2020-03-20 17:47 UTC (permalink / raw)
  To: intel-gfx

We dropped calling process_csb prior to handling direct submission in
order to avoid the nesting of spinlocks and lift process_csb() and the
majority of the tasklet out of irq-off. However, we do want to avoid
ksoftirqd latency in the fast path, so try and pull the interrupt-bh
local to direct submission if we can acquire the tasklet's lock.

v2: Tweak the balance to avoid over submitting lite-restores

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c    | 44 ++++++++++++++++++++------
 drivers/gpu/drm/i915/gt/selftest_lrc.c |  2 +-
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index f09dd87324b9..dceb65a0088f 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2884,17 +2884,17 @@ static void queue_request(struct intel_engine_cs *engine,
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
-static void __submit_queue_imm(struct intel_engine_cs *engine)
+static bool pending_csb(const struct intel_engine_execlists *el)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
+	return READ_ONCE(*el->csb_write) != READ_ONCE(el->csb_head);
+}
 
-	if (reset_in_progress(execlists))
-		return; /* defer until we restart the engine following reset */
+static bool skip_lite_restore(struct intel_engine_execlists *el,
+			      const struct i915_request *rq)
+{
+	struct i915_request *inflight = execlists_active(el);
 
-	if (execlists->tasklet.func == execlists_submission_tasklet)
-		__execlists_submission_tasklet(engine);
-	else
-		tasklet_hi_schedule(&execlists->tasklet);
+	return inflight && inflight->context == rq->context;
 }
 
 static void submit_queue(struct intel_engine_cs *engine,
@@ -2905,8 +2905,34 @@ static void submit_queue(struct intel_engine_cs *engine,
 	if (rq_prio(rq) <= execlists->queue_priority_hint)
 		return;
 
+	if (reset_in_progress(execlists))
+		return; /* defer until we restart the engine following reset */
+
+	/*
+	 * Suppress immediate lite-restores, leave that to the tasklet.
+	 *
+	 * However, we leave the queue_priority_hint unset so that if we do
+	 * submit a second context, we push that into ELSP[1] immediately.
+	 */
+	if (skip_lite_restore(execlists, rq))
+		return;
+
+	/* Hopefully we clear execlists->pending[] to let us through */
+	if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) {
+		process_csb(engine);
+		tasklet_unlock(&execlists->tasklet);
+		if (skip_lite_restore(execlists, rq))
+			return;
+	}
+
 	execlists->queue_priority_hint = rq_prio(rq);
-	__submit_queue_imm(engine);
+	__execlists_submission_tasklet(engine);
+
+	/* Try and pull an interrupt-bh queued on another CPU to here */
+	if (pending_csb(execlists) && tasklet_trylock(&execlists->tasklet)) {
+		process_csb(engine);
+		tasklet_unlock(&execlists->tasklet);
+	}
 }
 
 static bool ancestor_on_hold(const struct intel_engine_cs *engine,
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 6f06ba750a0a..c5c4b07a7d5f 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -1028,7 +1028,7 @@ static int live_timeslice_rewind(void *arg)
 		if (IS_ERR(rq[1]))
 			goto err;
 
-		err = wait_for_submit(engine, rq[1], HZ / 2);
+		err = wait_for_submit(engine, rq[0], HZ / 2);
 		if (err) {
 			pr_err("%s: failed to submit first context\n",
 			       engine->name);
-- 
2.20.1

* [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning
  2020-03-20 13:01 [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Chris Wilson
                   ` (3 preceding siblings ...)
  2020-03-20 13:15 ` [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Tvrtko Ursulin
@ 2020-03-20 18:08 ` Patchwork
  2020-03-20 20:06 ` [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev2) Patchwork
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-03-20 18:08 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning
URL   : https://patchwork.freedesktop.org/series/74918/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8167 -> Patchwork_17037
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_17037 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17037, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_17037:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@execlists:
    - fi-kbl-r:           [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-kbl-r/igt@i915_selftest@live@execlists.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-kbl-r/igt@i915_selftest@live@execlists.html
    - fi-cfl-8109u:       [PASS][3] -> [DMESG-FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-cfl-8109u/igt@i915_selftest@live@execlists.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-cfl-8109u/igt@i915_selftest@live@execlists.html
    - fi-apl-guc:         [PASS][5] -> [DMESG-FAIL][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-apl-guc/igt@i915_selftest@live@execlists.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-apl-guc/igt@i915_selftest@live@execlists.html
    - fi-kbl-x1275:       [PASS][7] -> [DMESG-FAIL][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-kbl-x1275/igt@i915_selftest@live@execlists.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-kbl-x1275/igt@i915_selftest@live@execlists.html
    - fi-skl-6600u:       [PASS][9] -> [DMESG-FAIL][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-skl-6600u/igt@i915_selftest@live@execlists.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-skl-6600u/igt@i915_selftest@live@execlists.html
    - fi-bsw-kefka:       [PASS][11] -> [DMESG-FAIL][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-bsw-kefka/igt@i915_selftest@live@execlists.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-bsw-kefka/igt@i915_selftest@live@execlists.html
    - fi-bsw-n3050:       [PASS][13] -> [DMESG-FAIL][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-bsw-n3050/igt@i915_selftest@live@execlists.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-bsw-n3050/igt@i915_selftest@live@execlists.html
    - fi-skl-guc:         [PASS][15] -> [DMESG-FAIL][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-skl-guc/igt@i915_selftest@live@execlists.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-skl-guc/igt@i915_selftest@live@execlists.html
    - fi-bxt-dsi:         [PASS][17] -> [DMESG-FAIL][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-bxt-dsi/igt@i915_selftest@live@execlists.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-bxt-dsi/igt@i915_selftest@live@execlists.html
    - fi-bdw-5557u:       [PASS][19] -> [DMESG-FAIL][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-bdw-5557u/igt@i915_selftest@live@execlists.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-bdw-5557u/igt@i915_selftest@live@execlists.html
    - fi-kbl-7500u:       [PASS][21] -> [DMESG-FAIL][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-kbl-7500u/igt@i915_selftest@live@execlists.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-kbl-7500u/igt@i915_selftest@live@execlists.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live@execlists:
    - {fi-ehl-1}:         [PASS][23] -> [DMESG-FAIL][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-ehl-1/igt@i915_selftest@live@execlists.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-ehl-1/igt@i915_selftest@live@execlists.html

  
Known issues
------------

  Here are the changes found in Patchwork_17037 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@execlists:
    - fi-cfl-guc:         [PASS][25] -> [INCOMPLETE][26] ([i915#656])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-cfl-guc/igt@i915_selftest@live@execlists.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-cfl-guc/igt@i915_selftest@live@execlists.html
    - fi-cml-u2:          [PASS][27] -> [INCOMPLETE][28] ([i915#283] / [i915#656])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-cml-u2/igt@i915_selftest@live@execlists.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-cml-u2/igt@i915_selftest@live@execlists.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s4-devices:
    - fi-tgl-y:           [FAIL][29] ([CI#94]) -> [PASS][30]
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-tgl-y/igt@gem_exec_suspend@basic-s4-devices.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-tgl-y/igt@gem_exec_suspend@basic-s4-devices.html

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-icl-u2:          [FAIL][31] ([fdo#109635] / [i915#217]) -> [PASS][32]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-icl-u2/igt@kms_chamelium@hdmi-crc-fast.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-icl-u2/igt@kms_chamelium@hdmi-crc-fast.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-7500u:       [FAIL][33] ([fdo#111407]) -> [PASS][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [CI#94]: https://gitlab.freedesktop.org/gfx-ci/i915-infra/issues/94
  [fdo#109635]: https://bugs.freedesktop.org/show_bug.cgi?id=109635
  [fdo#111407]: https://bugs.freedesktop.org/show_bug.cgi?id=111407
  [i915#217]: https://gitlab.freedesktop.org/drm/intel/issues/217
  [i915#283]: https://gitlab.freedesktop.org/drm/intel/issues/283
  [i915#656]: https://gitlab.freedesktop.org/drm/intel/issues/656


Participating hosts (49 -> 36)
------------------------------

  Additional (1): fi-skl-6770hq 
  Missing    (14): fi-ilk-m540 fi-hsw-4200u fi-glk-dsi fi-byt-squawks fi-bsw-cyan fi-ilk-650 fi-snb-2520m fi-ctg-p8600 fi-gdg-551 fi-skl-lmem fi-bdw-samus fi-byt-clapper fi-skl-6700k2 fi-snb-2600 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8167 -> Patchwork_17037

  CI-20190529: 20190529
  CI_DRM_8167: b51a7e7f4f72cf780661a1e4b479e2b27ddbafc8 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5526: f49ebeee9f54d6f23c60a842f75f65561d452ab0 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17037: eb0ddcd6e52bdcf434f717011b645c7ce683f191 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

eb0ddcd6e52b drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
cf6cc566d0f7 drm/i915: Immediately execute the fenced work
610326d39898 drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17037/index.html

* [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev2)
  2020-03-20 13:01 [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Chris Wilson
                   ` (4 preceding siblings ...)
  2020-03-20 18:08 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/4] " Patchwork
@ 2020-03-20 20:06 ` Patchwork
  2020-03-21  2:24 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
  2020-03-24  0:11 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev3) Patchwork
  7 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-03-20 20:06 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev2)
URL   : https://patchwork.freedesktop.org/series/74918/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8167 -> Patchwork_17041
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/index.html

Known issues
------------

  Here are the changes found in Patchwork_17041 that come from known issues:

### IGT changes ###

#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s4-devices:
    - fi-tgl-y:           [FAIL][1] ([CI#94]) -> [PASS][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-tgl-y/igt@gem_exec_suspend@basic-s4-devices.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/fi-tgl-y/igt@gem_exec_suspend@basic-s4-devices.html

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-icl-u2:          [FAIL][3] ([fdo#109635] / [i915#217]) -> [PASS][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/fi-icl-u2/igt@kms_chamelium@hdmi-crc-fast.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/fi-icl-u2/igt@kms_chamelium@hdmi-crc-fast.html

  
  [CI#94]: https://gitlab.freedesktop.org/gfx-ci/i915-infra/issues/94
  [fdo#109635]: https://bugs.freedesktop.org/show_bug.cgi?id=109635
  [i915#217]: https://gitlab.freedesktop.org/drm/intel/issues/217


Participating hosts (49 -> 38)
------------------------------

  Additional (1): fi-skl-6770hq 
  Missing    (12): fi-ilk-m540 fi-bsw-n3050 fi-hsw-4200u fi-hsw-peppy fi-byt-squawks fi-bsw-cyan fi-kbl-7500u fi-ctg-p8600 fi-ivb-3770 fi-skl-lmem fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8167 -> Patchwork_17041

  CI-20190529: 20190529
  CI_DRM_8167: b51a7e7f4f72cf780661a1e4b479e2b27ddbafc8 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5526: f49ebeee9f54d6f23c60a842f75f65561d452ab0 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17041: e1571c1ed506ad288f8064f5bc432aaceb66ad95 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

e1571c1ed506 drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
1492bcdd449a drm/i915: Immediately execute the fenced work
bee757d6cbda drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/index.html

* Re: [Intel-gfx] [PATCH] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
  2020-03-20 17:47   ` [Intel-gfx] [PATCH] " Chris Wilson
@ 2020-03-20 20:28     ` Francisco Jerez
  2020-03-20 22:14       ` Francisco Jerez
  2020-03-21 13:12     ` Chris Wilson
  1 sibling, 1 reply; 19+ messages in thread
From: Francisco Jerez @ 2020-03-20 20:28 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


Chris Wilson <chris@chris-wilson.co.uk> writes:

> We dropped calling process_csb prior to handling direct submission in
> order to avoid the nesting of spinlocks and lift process_csb() and the
> majority of the tasklet out of irq-off. However, we do want to avoid
> ksoftirqd latency in the fast path, so try and pull the interrupt-bh
> local to direct submission if we can acquire the tasklet's lock.
>
> v2: Tweak the balance to avoid over submitting lite-restores
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Francisco Jerez <currojerez@riseup.net>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c    | 44 ++++++++++++++++++++------
>  drivers/gpu/drm/i915/gt/selftest_lrc.c |  2 +-
>  2 files changed, 36 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index f09dd87324b9..dceb65a0088f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2884,17 +2884,17 @@ static void queue_request(struct intel_engine_cs *engine,
>  	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>  }
>  
> -static void __submit_queue_imm(struct intel_engine_cs *engine)
> +static bool pending_csb(const struct intel_engine_execlists *el)
>  {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	return READ_ONCE(*el->csb_write) != READ_ONCE(el->csb_head);
> +}
>  
> -	if (reset_in_progress(execlists))
> -		return; /* defer until we restart the engine following reset */
> +static bool skip_lite_restore(struct intel_engine_execlists *el,
> +			      const struct i915_request *rq)
> +{
> +	struct i915_request *inflight = execlists_active(el);
>  
> -	if (execlists->tasklet.func == execlists_submission_tasklet)
> -		__execlists_submission_tasklet(engine);
> -	else
> -		tasklet_hi_schedule(&execlists->tasklet);
> +	return inflight && inflight->context == rq->context;
>  }
>  
>  static void submit_queue(struct intel_engine_cs *engine,
> @@ -2905,8 +2905,34 @@ static void submit_queue(struct intel_engine_cs *engine,
>  	if (rq_prio(rq) <= execlists->queue_priority_hint)
>  		return;
>  
> +	if (reset_in_progress(execlists))
> +		return; /* defer until we restart the engine following reset */
> +
> +	/*
> +	 * Suppress immediate lite-restores, leave that to the tasklet.
> +	 *
> +	 * However, we leave the queue_priority_hint unset so that if we do
> +	 * submit a second context, we push that into ELSP[1] immediately.
> +	 */
> +	if (skip_lite_restore(execlists, rq))
> +		return;
> +
Why do you need to treat lite-restore specially here?

Anyway, trying this out in combination with my patches now.

> +	/* Hopefully we clear execlists->pending[] to let us through */
> +	if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) {
> +		process_csb(engine);
> +		tasklet_unlock(&execlists->tasklet);
> +		if (skip_lite_restore(execlists, rq))
> +			return;
> +	}
> +
>  	execlists->queue_priority_hint = rq_prio(rq);
> -	__submit_queue_imm(engine);
> +	__execlists_submission_tasklet(engine);
> +
> +	/* Try and pull an interrupt-bh queued on another CPU to here */
> +	if (pending_csb(execlists) && tasklet_trylock(&execlists->tasklet)) {
> +		process_csb(engine);
> +		tasklet_unlock(&execlists->tasklet);
> +	}
>  }
>  
>  static bool ancestor_on_hold(const struct intel_engine_cs *engine,
> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> index 6f06ba750a0a..c5c4b07a7d5f 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> @@ -1028,7 +1028,7 @@ static int live_timeslice_rewind(void *arg)
>  		if (IS_ERR(rq[1]))
>  			goto err;
>  
> -		err = wait_for_submit(engine, rq[1], HZ / 2);
> +		err = wait_for_submit(engine, rq[0], HZ / 2);
>  		if (err) {
>  			pr_err("%s: failed to submit first context\n",
>  			       engine->name);
> -- 
> 2.20.1


* Re: [Intel-gfx] [PATCH] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
  2020-03-20 20:28     ` Francisco Jerez
@ 2020-03-20 22:14       ` Francisco Jerez
  2020-03-23  9:45         ` Chris Wilson
  0 siblings, 1 reply; 19+ messages in thread
From: Francisco Jerez @ 2020-03-20 22:14 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


Francisco Jerez <currojerez@riseup.net> writes:

> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
>> We dropped calling process_csb prior to handling direct submission in
>> order to avoid the nesting of spinlocks and lift process_csb() and the
>> majority of the tasklet out of irq-off. However, we do want to avoid
>> ksoftirqd latency in the fast path, so try and pull the interrupt-bh
>> local to direct submission if we can acquire the tasklet's lock.
>>
>> v2: Tweak the balance to avoid over submitting lite-restores
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Francisco Jerez <currojerez@riseup.net>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/intel_lrc.c    | 44 ++++++++++++++++++++------
>>  drivers/gpu/drm/i915/gt/selftest_lrc.c |  2 +-
>>  2 files changed, 36 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> index f09dd87324b9..dceb65a0088f 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> @@ -2884,17 +2884,17 @@ static void queue_request(struct intel_engine_cs *engine,
>>  	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>>  }
>>  
>> -static void __submit_queue_imm(struct intel_engine_cs *engine)
>> +static bool pending_csb(const struct intel_engine_execlists *el)
>>  {
>> -	struct intel_engine_execlists * const execlists = &engine->execlists;
>> +	return READ_ONCE(*el->csb_write) != READ_ONCE(el->csb_head);
>> +}
>>  
>> -	if (reset_in_progress(execlists))
>> -		return; /* defer until we restart the engine following reset */
>> +static bool skip_lite_restore(struct intel_engine_execlists *el,
>> +			      const struct i915_request *rq)
>> +{
>> +	struct i915_request *inflight = execlists_active(el);
>>  
>> -	if (execlists->tasklet.func == execlists_submission_tasklet)
>> -		__execlists_submission_tasklet(engine);
>> -	else
>> -		tasklet_hi_schedule(&execlists->tasklet);
>> +	return inflight && inflight->context == rq->context;
>>  }
>>  
>>  static void submit_queue(struct intel_engine_cs *engine,
>> @@ -2905,8 +2905,34 @@ static void submit_queue(struct intel_engine_cs *engine,
>>  	if (rq_prio(rq) <= execlists->queue_priority_hint)
>>  		return;
>>  
>> +	if (reset_in_progress(execlists))
>> +		return; /* defer until we restart the engine following reset */
>> +
>> +	/*
>> +	 * Suppress immediate lite-restores, leave that to the tasklet.
>> +	 *
>> +	 * However, we leave the queue_priority_hint unset so that if we do
>> +	 * submit a second context, we push that into ELSP[1] immediately.
>> +	 */
>> +	if (skip_lite_restore(execlists, rq))
>> +		return;
>> +
> Why do you need to treat lite-restore specially here?
>
> Anyway, trying this out in combination with my patches now.
>

This didn't seem to help (together with your other suggestion to move
the overload accounting to __execlists_schedule_in/out).  And it makes
the current -5% SynMark OglMultithread regression with my series worsen
to -10%.  My previous suggestion of moving the
intel_gt_pm_active_begin() call to process_csb() when the submission is
ACK'ed by the hardware does seem to help (and it roughly halves the
OglMultithread regression), possibly because that way we're able to
determine whether the first context was actually overlapping at the
point that the second was received by the hardware -- I haven't tested
it extensively yet though.

>> +	/* Hopefully we clear execlists->pending[] to let us through */
>> +	if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) {
>> +		process_csb(engine);
>> +		tasklet_unlock(&execlists->tasklet);
>> +		if (skip_lite_restore(execlists, rq))
>> +			return;
>> +	}
>> +
>>  	execlists->queue_priority_hint = rq_prio(rq);
>> -	__submit_queue_imm(engine);
>> +	__execlists_submission_tasklet(engine);
>> +
>> +	/* Try and pull an interrupt-bh queued on another CPU to here */
>> +	if (pending_csb(execlists) && tasklet_trylock(&execlists->tasklet)) {
>> +		process_csb(engine);
>> +		tasklet_unlock(&execlists->tasklet);
>> +	}
>>  }
>>  
>>  static bool ancestor_on_hold(const struct intel_engine_cs *engine,
>> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
>> index 6f06ba750a0a..c5c4b07a7d5f 100644
>> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
>> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
>> @@ -1028,7 +1028,7 @@ static int live_timeslice_rewind(void *arg)
>>  		if (IS_ERR(rq[1]))
>>  			goto err;
>>  
>> -		err = wait_for_submit(engine, rq[1], HZ / 2);
>> +		err = wait_for_submit(engine, rq[0], HZ / 2);
>>  		if (err) {
>>  			pr_err("%s: failed to submit first context\n",
>>  			       engine->name);
>> -- 
>> 2.20.1

* [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev2)
  2020-03-20 13:01 [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Chris Wilson
                   ` (5 preceding siblings ...)
  2020-03-20 20:06 ` [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev2) Patchwork
@ 2020-03-21  2:24 ` Patchwork
  2020-03-24  0:11 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev3) Patchwork
  7 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-03-21  2:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev2)
URL   : https://patchwork.freedesktop.org/series/74918/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8167_full -> Patchwork_17041_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes introduced by Patchwork_17041_full need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17041_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_17041_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_ctx_persistence@replace@rcs0:
    - shard-skl:          [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl6/igt@gem_ctx_persistence@replace@rcs0.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl6/igt@gem_ctx_persistence@replace@rcs0.html

  * igt@gem_ctx_persistence@smoketest:
    - shard-tglb:         [PASS][3] -> [INCOMPLETE][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-tglb1/igt@gem_ctx_persistence@smoketest.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-tglb7/igt@gem_ctx_persistence@smoketest.html
    - shard-kbl:          [PASS][5] -> [INCOMPLETE][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-kbl3/igt@gem_ctx_persistence@smoketest.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-kbl7/igt@gem_ctx_persistence@smoketest.html
    - shard-iclb:         [PASS][7] -> [INCOMPLETE][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb8/igt@gem_ctx_persistence@smoketest.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb4/igt@gem_ctx_persistence@smoketest.html

  * igt@gem_exec_async@concurrent-writes-bsd1:
    - shard-tglb:         [PASS][9] -> [FAIL][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-tglb7/igt@gem_exec_async@concurrent-writes-bsd1.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-tglb6/igt@gem_exec_async@concurrent-writes-bsd1.html

  
#### Warnings ####

  * igt@gem_ctx_persistence@close-replace-race:
    - shard-tglb:         [TIMEOUT][11] ([i915#1340]) -> [INCOMPLETE][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-tglb1/igt@gem_ctx_persistence@close-replace-race.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-tglb7/igt@gem_ctx_persistence@close-replace-race.html
    - shard-kbl:          [DMESG-WARN][13] -> [INCOMPLETE][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-kbl1/igt@gem_ctx_persistence@close-replace-race.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-kbl7/igt@gem_ctx_persistence@close-replace-race.html

  
Known issues
------------

  Here are the changes found in Patchwork_17041_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_isolation@vecs0-s3:
    - shard-apl:          [PASS][15] -> [DMESG-WARN][16] ([i915#180]) +1 similar issue
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-apl1/igt@gem_ctx_isolation@vecs0-s3.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-apl1/igt@gem_ctx_isolation@vecs0-s3.html

  * igt@gem_ctx_persistence@smoketest:
    - shard-apl:          [PASS][17] -> [INCOMPLETE][18] ([fdo#103927])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-apl6/igt@gem_ctx_persistence@smoketest.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-apl1/igt@gem_ctx_persistence@smoketest.html

  * igt@gem_exec_schedule@independent-bsd2:
    - shard-iclb:         [PASS][19] -> [SKIP][20] ([fdo#109276]) +11 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb4/igt@gem_exec_schedule@independent-bsd2.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb3/igt@gem_exec_schedule@independent-bsd2.html

  * igt@gem_exec_schedule@pi-ringfull-bsd:
    - shard-iclb:         [PASS][21] -> [SKIP][22] ([fdo#112146]) +4 similar issues
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb6/igt@gem_exec_schedule@pi-ringfull-bsd.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb1/igt@gem_exec_schedule@pi-ringfull-bsd.html

  * igt@gem_exec_schedule@pi-shared-iova-bsd:
    - shard-iclb:         [PASS][23] -> [SKIP][24] ([i915#677])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb7/igt@gem_exec_schedule@pi-shared-iova-bsd.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb4/igt@gem_exec_schedule@pi-shared-iova-bsd.html

  * igt@gem_exec_suspend@basic-s3:
    - shard-kbl:          [PASS][25] -> [DMESG-WARN][26] ([i915#180]) +4 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-kbl7/igt@gem_exec_suspend@basic-s3.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-kbl3/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_ppgtt@flink-and-close-vma-leak:
    - shard-apl:          [PASS][27] -> [FAIL][28] ([i915#644])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-apl3/igt@gem_ppgtt@flink-and-close-vma-leak.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-apl6/igt@gem_ppgtt@flink-and-close-vma-leak.html

  * igt@i915_pm_dc@dc6-psr:
    - shard-skl:          [PASS][29] -> [FAIL][30] ([i915#454])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl6/igt@i915_pm_dc@dc6-psr.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl8/igt@i915_pm_dc@dc6-psr.html

  * igt@i915_pm_rpm@system-suspend-modeset:
    - shard-skl:          [PASS][31] -> [INCOMPLETE][32] ([i915#151] / [i915#69])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl9/igt@i915_pm_rpm@system-suspend-modeset.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl7/igt@i915_pm_rpm@system-suspend-modeset.html

  * igt@kms_cursor_crc@pipe-c-cursor-256x256-onscreen:
    - shard-skl:          [PASS][33] -> [FAIL][34] ([i915#54])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl9/igt@kms_cursor_crc@pipe-c-cursor-256x256-onscreen.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl1/igt@kms_cursor_crc@pipe-c-cursor-256x256-onscreen.html

  * igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-untiled:
    - shard-skl:          [PASS][35] -> [FAIL][36] ([i915#52] / [i915#54])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl5/igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-untiled.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl7/igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-untiled.html

  * igt@kms_flip@flip-vs-blocking-wf-vblank:
    - shard-skl:          [PASS][37] -> [FAIL][38] ([i915#34])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl4/igt@kms_flip@flip-vs-blocking-wf-vblank.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl10/igt@kms_flip@flip-vs-blocking-wf-vblank.html

  * igt@kms_flip@plain-flip-fb-recreate-interruptible:
    - shard-kbl:          [PASS][39] -> [FAIL][40] ([i915#34])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-kbl4/igt@kms_flip@plain-flip-fb-recreate-interruptible.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-kbl6/igt@kms_flip@plain-flip-fb-recreate-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-kbl:          [PASS][41] -> [INCOMPLETE][42] ([i915#155])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-kbl6/igt@kms_frontbuffer_tracking@fbc-suspend.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-kbl3/igt@kms_frontbuffer_tracking@fbc-suspend.html

  * igt@kms_pipe_crc_basic@hang-read-crc-pipe-c:
    - shard-skl:          [PASS][43] -> [FAIL][44] ([i915#53])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl9/igt@kms_pipe_crc_basic@hang-read-crc-pipe-c.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl1/igt@kms_pipe_crc_basic@hang-read-crc-pipe-c.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [PASS][45] -> [FAIL][46] ([fdo#108145] / [i915#265])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl6/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl8/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html

  * igt@kms_psr@psr2_primary_mmap_gtt:
    - shard-iclb:         [PASS][47] -> [SKIP][48] ([fdo#109441])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb2/igt@kms_psr@psr2_primary_mmap_gtt.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb8/igt@kms_psr@psr2_primary_mmap_gtt.html

  * igt@kms_setmode@basic:
    - shard-skl:          [PASS][49] -> [FAIL][50] ([i915#31])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl4/igt@kms_setmode@basic.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl9/igt@kms_setmode@basic.html

  * igt@perf_pmu@busy-check-all-vcs1:
    - shard-iclb:         [PASS][51] -> [SKIP][52] ([fdo#112080]) +5 similar issues
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb4/igt@perf_pmu@busy-check-all-vcs1.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb3/igt@perf_pmu@busy-check-all-vcs1.html

  
#### Possible fixes ####

  * igt@gem_exec_balancer@smoke:
    - shard-iclb:         [SKIP][53] ([fdo#110854]) -> [PASS][54]
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb3/igt@gem_exec_balancer@smoke.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb1/igt@gem_exec_balancer@smoke.html

  * igt@gem_exec_parallel@vcs1-fds:
    - shard-iclb:         [SKIP][55] ([fdo#112080]) -> [PASS][56] +3 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb6/igt@gem_exec_parallel@vcs1-fds.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb1/igt@gem_exec_parallel@vcs1-fds.html

  * igt@gem_exec_schedule@out-order-bsd2:
    - shard-iclb:         [SKIP][57] ([fdo#109276]) -> [PASS][58] +14 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb6/igt@gem_exec_schedule@out-order-bsd2.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb1/igt@gem_exec_schedule@out-order-bsd2.html

  * igt@gem_exec_schedule@preempt-self-bsd:
    - shard-iclb:         [SKIP][59] ([fdo#112146]) -> [PASS][60] +1 similar issue
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb4/igt@gem_exec_schedule@preempt-self-bsd.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb3/igt@gem_exec_schedule@preempt-self-bsd.html

  * igt@kms_dp_dsc@basic-dsc-enable-edp:
    - shard-iclb:         [SKIP][61] ([fdo#109349]) -> [PASS][62]
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb1/igt@kms_dp_dsc@basic-dsc-enable-edp.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb2/igt@kms_dp_dsc@basic-dsc-enable-edp.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-skl:          [FAIL][63] ([i915#79]) -> [PASS][64]
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl7/igt@kms_flip@flip-vs-expired-vblank-interruptible.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl5/igt@kms_flip@flip-vs-expired-vblank-interruptible.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-kbl:          [DMESG-WARN][65] ([i915#180]) -> [PASS][66] +3 similar issues
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-kbl4/igt@kms_flip@flip-vs-suspend-interruptible.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-kbl4/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_flip@plain-flip-ts-check:
    - shard-skl:          [FAIL][67] ([i915#34]) -> [PASS][68]
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl9/igt@kms_flip@plain-flip-ts-check.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl7/igt@kms_flip@plain-flip-ts-check.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes:
    - shard-apl:          [DMESG-WARN][69] ([i915#180]) -> [PASS][70] +1 similar issue
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-apl4/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-apl7/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes.html

  * igt@kms_plane_lowres@pipe-a-tiling-x:
    - shard-glk:          [FAIL][71] ([i915#899]) -> [PASS][72]
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-glk8/igt@kms_plane_lowres@pipe-a-tiling-x.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-glk6/igt@kms_plane_lowres@pipe-a-tiling-x.html

  * igt@kms_psr@psr2_sprite_plane_move:
    - shard-iclb:         [SKIP][73] ([fdo#109441]) -> [PASS][74] +4 similar issues
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb1/igt@kms_psr@psr2_sprite_plane_move.html
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb2/igt@kms_psr@psr2_sprite_plane_move.html

  * igt@kms_setmode@basic:
    - shard-hsw:          [FAIL][75] ([i915#31]) -> [PASS][76]
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-hsw8/igt@kms_setmode@basic.html
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-hsw1/igt@kms_setmode@basic.html

  * {igt@sysfs_preempt_timeout@timeout@vecs0}:
    - shard-glk:          [FAIL][77] -> [PASS][78]
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-glk6/igt@sysfs_preempt_timeout@timeout@vecs0.html
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-glk4/igt@sysfs_preempt_timeout@timeout@vecs0.html

  
#### Warnings ####

  * igt@gem_ctx_persistence@close-replace-race:
    - shard-apl:          [INCOMPLETE][79] ([fdo#103927] / [i915#1402]) -> [TIMEOUT][80] ([i915#1340])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-apl6/igt@gem_ctx_persistence@close-replace-race.html
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-apl7/igt@gem_ctx_persistence@close-replace-race.html
    - shard-glk:          [INCOMPLETE][81] ([i915#1402] / [i915#58] / [k.org#198133]) -> [INCOMPLETE][82] ([i915#58] / [k.org#198133])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-glk8/igt@gem_ctx_persistence@close-replace-race.html
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-glk8/igt@gem_ctx_persistence@close-replace-race.html

  * igt@i915_pm_dc@dc3co-vpb-simulation:
    - shard-iclb:         [SKIP][83] ([i915#588]) -> [SKIP][84] ([i915#658])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-iclb2/igt@i915_pm_dc@dc3co-vpb-simulation.html
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-iclb8/igt@i915_pm_dc@dc3co-vpb-simulation.html

  * igt@i915_pm_dc@dc6-psr:
    - shard-tglb:         [SKIP][85] ([i915#468]) -> [FAIL][86] ([i915#454])
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-tglb2/igt@i915_pm_dc@dc6-psr.html
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-tglb7/igt@i915_pm_dc@dc6-psr.html

  * igt@kms_hdr@bpc-switch-suspend:
    - shard-skl:          [INCOMPLETE][87] ([i915#198]) -> [FAIL][88] ([i915#1188])
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-skl9/igt@kms_hdr@bpc-switch-suspend.html
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-skl7/igt@kms_hdr@bpc-switch-suspend.html

  * igt@runner@aborted:
    - shard-kbl:          ([FAIL][89], [FAIL][90]) ([i915#1389] / [i915#1402] / [i915#1485] / [i915#92]) -> [FAIL][91] ([i915#92])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-kbl1/igt@runner@aborted.html
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8167/shard-kbl4/igt@runner@aborted.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/shard-kbl4/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109349]: https://bugs.freedesktop.org/show_bug.cgi?id=109349
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#110854]: https://bugs.freedesktop.org/show_bug.cgi?id=110854
  [fdo#112080]: https://bugs.freedesktop.org/show_bug.cgi?id=112080
  [fdo#112146]: https://bugs.freedesktop.org/show_bug.cgi?id=112146
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#1340]: https://gitlab.freedesktop.org/drm/intel/issues/1340
  [i915#1389]: https://gitlab.freedesktop.org/drm/intel/issues/1389
  [i915#1402]: https://gitlab.freedesktop.org/drm/intel/issues/1402
  [i915#1485]: https://gitlab.freedesktop.org/drm/intel/issues/1485
  [i915#151]: https://gitlab.freedesktop.org/drm/intel/issues/151
  [i915#155]: https://gitlab.freedesktop.org/drm/intel/issues/155
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#198]: https://gitlab.freedesktop.org/drm/intel/issues/198
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#31]: https://gitlab.freedesktop.org/drm/intel/issues/31
  [i915#34]: https://gitlab.freedesktop.org/drm/intel/issues/34
  [i915#454]: https://gitlab.freedesktop.org/drm/intel/issues/454
  [i915#468]: https://gitlab.freedesktop.org/drm/intel/issues/468
  [i915#52]: https://gitlab.freedesktop.org/drm/intel/issues/52
  [i915#53]: https://gitlab.freedesktop.org/drm/intel/issues/53
  [i915#54]: https://gitlab.freedesktop.org/drm/intel/issues/54
  [i915#58]: https://gitlab.freedesktop.org/drm/intel/issues/58
  [i915#588]: https://gitlab.freedesktop.org/drm/intel/issues/588
  [i915#644]: https://gitlab.freedesktop.org/drm/intel/issues/644
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#677]: https://gitlab.freedesktop.org/drm/intel/issues/677
  [i915#69]: https://gitlab.freedesktop.org/drm/intel/issues/69
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#899]: https://gitlab.freedesktop.org/drm/intel/issues/899
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_8167 -> Patchwork_17041

  CI-20190529: 20190529
  CI_DRM_8167: b51a7e7f4f72cf780661a1e4b479e2b27ddbafc8 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5526: f49ebeee9f54d6f23c60a842f75f65561d452ab0 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17041: e1571c1ed506ad288f8064f5bc432aaceb66ad95 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17041/index.html

* Re: [Intel-gfx] [PATCH] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
  2020-03-20 17:47   ` [Intel-gfx] [PATCH] " Chris Wilson
  2020-03-20 20:28     ` Francisco Jerez
@ 2020-03-21 13:12     ` Chris Wilson
  1 sibling, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2020-03-21 13:12 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2020-03-20 17:47:45)
> We dropped calling process_csb prior to handling direct submission in
> order to avoid the nesting of spinlocks and lift process_csb() and the
> majority of the tasklet out of irq-off. However, we do want to avoid
> ksoftirqd latency in the fast path, so try and pull the interrupt-bh
> local to direct submission if we can acquire the tasklet's lock.
> 
> v2: Tweak the balance to avoid over submitting lite-restores

Sigh. It's a driver problem with lock contention, mostly from aggressive
idling.
-Chris

* Re: [Intel-gfx] [PATCH] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
  2020-03-20 22:14       ` Francisco Jerez
@ 2020-03-23  9:45         ` Chris Wilson
  2020-03-23 22:30           ` Francisco Jerez
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2020-03-23  9:45 UTC (permalink / raw)
  To: Francisco Jerez, intel-gfx

Quoting Francisco Jerez (2020-03-20 22:14:51)
> Francisco Jerez <currojerez@riseup.net> writes:
> 
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> >
> >> We dropped calling process_csb prior to handling direct submission in
> >> order to avoid the nesting of spinlocks and lift process_csb() and the
> >> majority of the tasklet out of irq-off. However, we do want to avoid
> >> ksoftirqd latency in the fast path, so try and pull the interrupt-bh
> >> local to direct submission if we can acquire the tasklet's lock.
> >>
> >> v2: Tweak the balance to avoid over submitting lite-restores
> >>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Francisco Jerez <currojerez@riseup.net>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >> ---
> >>  drivers/gpu/drm/i915/gt/intel_lrc.c    | 44 ++++++++++++++++++++------
> >>  drivers/gpu/drm/i915/gt/selftest_lrc.c |  2 +-
> >>  2 files changed, 36 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> index f09dd87324b9..dceb65a0088f 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> @@ -2884,17 +2884,17 @@ static void queue_request(struct intel_engine_cs *engine,
> >>      set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >>  }
> >>  
> >> -static void __submit_queue_imm(struct intel_engine_cs *engine)
> >> +static bool pending_csb(const struct intel_engine_execlists *el)
> >>  {
> >> -    struct intel_engine_execlists * const execlists = &engine->execlists;
> >> +    return READ_ONCE(*el->csb_write) != READ_ONCE(el->csb_head);
> >> +}
> >>  
> >> -    if (reset_in_progress(execlists))
> >> -            return; /* defer until we restart the engine following reset */
> >> +static bool skip_lite_restore(struct intel_engine_execlists *el,
> >> +                          const struct i915_request *rq)
> >> +{
> >> +    struct i915_request *inflight = execlists_active(el);
> >>  
> >> -    if (execlists->tasklet.func == execlists_submission_tasklet)
> >> -            __execlists_submission_tasklet(engine);
> >> -    else
> >> -            tasklet_hi_schedule(&execlists->tasklet);
> >> +    return inflight && inflight->context == rq->context;
> >>  }
> >>  
> >>  static void submit_queue(struct intel_engine_cs *engine,
> >> @@ -2905,8 +2905,34 @@ static void submit_queue(struct intel_engine_cs *engine,
> >>      if (rq_prio(rq) <= execlists->queue_priority_hint)
> >>              return;
> >>  
> >> +    if (reset_in_progress(execlists))
> >> +            return; /* defer until we restart the engine following reset */
> >> +
> >> +    /*
> >> +     * Suppress immediate lite-restores, leave that to the tasklet.
> >> +     *
> >> +     * However, we leave the queue_priority_hint unset so that if we do
> >> +     * submit a second context, we push that into ELSP[1] immediately.
> >> +     */
> >> +    if (skip_lite_restore(execlists, rq))
> >> +            return;
> >> +
> > Why do you need to treat lite-restore specially here?

Lite-restores have a noticeable impact on no-op loads. Part of that is
that a lite-restore is about 1us, and the other part is that the driver
has a lot more work to do. There's a balance point around here between
not needlessly interrupting ourselves and ensuring that there is no bubble.

> >
> > Anyway, trying this out now in combination with my patches now.
> >
> 
> This didn't seem to help (together with your other suggestion to move
> the overload accounting to __execlists_schedule_in/out).  And it makes
> the current -5% SynMark OglMultithread regression with my series go down
> to -10%.  My previous suggestion of moving the
> intel_gt_pm_active_begin() call to process_csb() when the submission is
> ACK'ed by the hardware does seem to help (and it roughly halves the
> OglMultithread regression), possibly because that way we're able to
> determine whether the first context was actually overlapping at the
> point that the second was received by the hardware -- I haven't tested
> it extensively yet though.

Grumble, it just seems like we are setting and clearing the flag on
completely unrelated events -- which I still think boils down to working
around latency in the driver. Or at least I hope there's an explanation
and a bug to fix that improves responsiveness for all.
-Chris

* Re: [Intel-gfx] [PATCH] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
  2020-03-23  9:45         ` Chris Wilson
@ 2020-03-23 22:30           ` Francisco Jerez
  2020-03-23 23:50             ` Chris Wilson
  0 siblings, 1 reply; 19+ messages in thread
From: Francisco Jerez @ 2020-03-23 22:30 UTC (permalink / raw)
  To: chris.p.wilson, intel-gfx



Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Francisco Jerez (2020-03-20 22:14:51)
>> Francisco Jerez <currojerez@riseup.net> writes:
>> 
>> > Chris Wilson <chris@chris-wilson.co.uk> writes:
>> >
>> >> We dropped calling process_csb prior to handling direct submission in
>> >> order to avoid the nesting of spinlocks and lift process_csb() and the
>> >> majority of the tasklet out of irq-off. However, we do want to avoid
>> >> ksoftirqd latency in the fast path, so try and pull the interrupt-bh
>> >> local to direct submission if we can acquire the tasklet's lock.
>> >>
>> >> v2: Tweak the balance to avoid over submitting lite-restores
>> >>
>> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> >> Cc: Francisco Jerez <currojerez@riseup.net>
>> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> >> ---
>> >>  drivers/gpu/drm/i915/gt/intel_lrc.c    | 44 ++++++++++++++++++++------
>> >>  drivers/gpu/drm/i915/gt/selftest_lrc.c |  2 +-
>> >>  2 files changed, 36 insertions(+), 10 deletions(-)
>> >>
>> >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> >> index f09dd87324b9..dceb65a0088f 100644
>> >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>> >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> >> @@ -2884,17 +2884,17 @@ static void queue_request(struct intel_engine_cs *engine,
>> >>      set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>> >>  }
>> >>  
>> >> -static void __submit_queue_imm(struct intel_engine_cs *engine)
>> >> +static bool pending_csb(const struct intel_engine_execlists *el)
>> >>  {
>> >> -    struct intel_engine_execlists * const execlists = &engine->execlists;
>> >> +    return READ_ONCE(*el->csb_write) != READ_ONCE(el->csb_head);
>> >> +}
>> >>  
>> >> -    if (reset_in_progress(execlists))
>> >> -            return; /* defer until we restart the engine following reset */
>> >> +static bool skip_lite_restore(struct intel_engine_execlists *el,
>> >> +                          const struct i915_request *rq)
>> >> +{
>> >> +    struct i915_request *inflight = execlists_active(el);
>> >>  
>> >> -    if (execlists->tasklet.func == execlists_submission_tasklet)
>> >> -            __execlists_submission_tasklet(engine);
>> >> -    else
>> >> -            tasklet_hi_schedule(&execlists->tasklet);
>> >> +    return inflight && inflight->context == rq->context;
>> >>  }
>> >>  
>> >>  static void submit_queue(struct intel_engine_cs *engine,
>> >> @@ -2905,8 +2905,34 @@ static void submit_queue(struct intel_engine_cs *engine,
>> >>      if (rq_prio(rq) <= execlists->queue_priority_hint)
>> >>              return;
>> >>  
>> >> +    if (reset_in_progress(execlists))
>> >> +            return; /* defer until we restart the engine following reset */
>> >> +
>> >> +    /*
>> >> +     * Suppress immediate lite-restores, leave that to the tasklet.
>> >> +     *
>> >> +     * However, we leave the queue_priority_hint unset so that if we do
>> >> +     * submit a second context, we push that into ELSP[1] immediately.
>> >> +     */
>> >> +    if (skip_lite_restore(execlists, rq))
>> >> +            return;
>> >> +
>> > Why do you need to treat lite-restore specially here?
>
> Lite-restore have a noticeable impact on no-op loads. A part of that is
> that a lite-restore is about 1us, and the other part is that the driver
> has a lot more work to do. There's a balance point around here for not
> needlessly interrupting ourselves and ensuring that there is no bubble.
>

Oh, I see.  But isn't inhibiting the lite-restore likely to be fairly
costly in some cases as well, if it causes the GPU to go idle after the
current context completes for as long as it takes the CPU to wake up,
process the IRQ and dequeue the next request?  Would it make sense to
inhibit lite-restores under roughly the same conditions in which I set
the overload flag?  (since that indicates we'll get an IRQ at least one
request *before* the GPU actually goes idle, so there shouldn't be any
penalty from inhibiting the lite-restore).

>> >
>> > Anyway, trying this out now in combination with my patches now.
>> >
>> 
>> This didn't seem to help (together with your other suggestion to move
>> the overload accounting to __execlists_schedule_in/out).  And it makes
>> the current -5% SynMark OglMultithread regression with my series go down
>> to -10%.  My previous suggestion of moving the
>> intel_gt_pm_active_begin() call to process_csb() when the submission is
>> ACK'ed by the hardware does seem to help (and it roughly halves the
>> OglMultithread regression), possibly because that way we're able to
>> determine whether the first context was actually overlapping at the
>> point that the second was received by the hardware -- I haven't tested
>> it extensively yet though.
>
> Grumble, it just seems like we are setting and clearing the flag on
> completely unrelated events -- which I still think boils down to working
> around latency in the driver. Or at least I hope there's an explanation
> and bug to fix that improves responsiveness for all.
> -Chris

There *might* be a performance issue somewhere introducing latency that
the instrumentation I added happens to mitigate, but isn't that a sign
that it's fulfilling its purpose of determining when the workload could
be sensitive to CPU latency?

Maybe I didn't explain the idea properly: Given that command submission
is asynchronous with the CS processing the previous context, there is no
way for us to tell whether a request we just submitted was actually
overlapping with the previous one until we read the CSB and see whether
it led to an idle-to-active transition.  Only then can we assume that
the CPU is sending commands to the GPU quickly enough to keep it busy
without interruption.
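
(Schematically, the same argument drawn as a timeline:)

        CPU: ELSP write submitting rq2      <- rq1 may or may not still be running
        GPU: processes the ELSP write, emits a CSB event
        CPU: process_csb():
             - idle-to-active event  => the GPU had drained, rq2 did not overlap
             - promotion with rq1 still active => the submissions overlapped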

You might argue that this will introduce a delay in the signalling of
overload roughly equal to the latency it takes for the CPU to receive
the execlists IRQ with the hardware ACK.  However that seems beneficial
since the clearing of overload suffers from the same latency, so the
fraction of time that overload is signalled will otherwise be biased as
a result of the latency difference, causing overload to be overreported
on the average.  Delaying the signalling of overload to the CSB handler
means that any systematic latency in our interrupt processing is
self-correcting.
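
(To put numbers on it: if overload is set with latency Ls after the
triggering event and cleared with latency Lc, the measured interval is
(t_clear + Lc) - (t_set + Ls) = true interval + (Lc - Ls).  Setting it
synchronously at submission (Ls ~ 0) while clearing it from the CSB
handler (Lc > 0) biases the measurement upwards by Lc; doing both from
the CSB handler lets the systematic latencies cancel.)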

Anyway, I'm open to alternatives if you have other ideas that at least
don't worsen the pre-existing regression from my series.


* Re: [Intel-gfx] [PATCH] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
  2020-03-23 22:30           ` Francisco Jerez
@ 2020-03-23 23:50             ` Chris Wilson
  2020-03-24 22:55               ` Francisco Jerez
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2020-03-23 23:50 UTC (permalink / raw)
  To: Francisco Jerez, intel-gfx

Quoting Francisco Jerez (2020-03-23 22:30:13)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Quoting Francisco Jerez (2020-03-20 22:14:51)
> >> Francisco Jerez <currojerez@riseup.net> writes:
> >> 
> >> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> >
> >> >> We dropped calling process_csb prior to handling direct submission in
> >> >> order to avoid the nesting of spinlocks and lift process_csb() and the
> >> >> majority of the tasklet out of irq-off. However, we do want to avoid
> >> >> ksoftirqd latency in the fast path, so try and pull the interrupt-bh
> >> >> local to direct submission if we can acquire the tasklet's lock.
> >> >>
> >> >> v2: Tweak the balance to avoid over submitting lite-restores
> >> >>
> >> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> >> Cc: Francisco Jerez <currojerez@riseup.net>
> >> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >> >> ---
> >> >>  drivers/gpu/drm/i915/gt/intel_lrc.c    | 44 ++++++++++++++++++++------
> >> >>  drivers/gpu/drm/i915/gt/selftest_lrc.c |  2 +-
> >> >>  2 files changed, 36 insertions(+), 10 deletions(-)
> >> >>
> >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> >> index f09dd87324b9..dceb65a0088f 100644
> >> >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >> >> @@ -2884,17 +2884,17 @@ static void queue_request(struct intel_engine_cs *engine,
> >> >>      set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >> >>  }
> >> >>  
> >> >> -static void __submit_queue_imm(struct intel_engine_cs *engine)
> >> >> +static bool pending_csb(const struct intel_engine_execlists *el)
> >> >>  {
> >> >> -    struct intel_engine_execlists * const execlists = &engine->execlists;
> >> >> +    return READ_ONCE(*el->csb_write) != READ_ONCE(el->csb_head);
> >> >> +}
> >> >>  
> >> >> -    if (reset_in_progress(execlists))
> >> >> -            return; /* defer until we restart the engine following reset */
> >> >> +static bool skip_lite_restore(struct intel_engine_execlists *el,
> >> >> +                          const struct i915_request *rq)
> >> >> +{
> >> >> +    struct i915_request *inflight = execlists_active(el);
> >> >>  
> >> >> -    if (execlists->tasklet.func == execlists_submission_tasklet)
> >> >> -            __execlists_submission_tasklet(engine);
> >> >> -    else
> >> >> -            tasklet_hi_schedule(&execlists->tasklet);
> >> >> +    return inflight && inflight->context == rq->context;
> >> >>  }
> >> >>  
> >> >>  static void submit_queue(struct intel_engine_cs *engine,
> >> >> @@ -2905,8 +2905,34 @@ static void submit_queue(struct intel_engine_cs *engine,
> >> >>      if (rq_prio(rq) <= execlists->queue_priority_hint)
> >> >>              return;
> >> >>  
> >> >> +    if (reset_in_progress(execlists))
> >> >> +            return; /* defer until we restart the engine following reset */
> >> >> +
> >> >> +    /*
> >> >> +     * Suppress immediate lite-restores, leave that to the tasklet.
> >> >> +     *
> >> >> +     * However, we leave the queue_priority_hint unset so that if we do
> >> >> +     * submit a second context, we push that into ELSP[1] immediately.
> >> >> +     */
> >> >> +    if (skip_lite_restore(execlists, rq))
> >> >> +            return;
> >> >> +
> >> > Why do you need to treat lite-restore specially here?
> >
> > Lite-restore have a noticeable impact on no-op loads. A part of that is
> > that a lite-restore is about 1us, and the other part is that the driver
> > has a lot more work to do. There's a balance point around here for not
> > needlessly interrupting ourselves and ensuring that there is no bubble.
> >
> 
> Oh, I see.  But isn't inhibiting the lite restore likely to be fairly
> costly in some cases as well if it causes the GPU to go idle after the
> current context completes for as long as it takes the CPU to wake up,
> process the IRQ and dequeue the next request?

Yes. It's an amusing diversion to try and think how we can do a single
context submission (well, 2) for a sequence of requests, for those
clients that like to submit N batches at once. From an idle GPU,
assuming each batch is non-trivial, we want to do something like submit
batch 1, accumulate, then submit batches 1-N, i.e. skip the intervening
lite-restores but complete the submission with all the work queued.
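
In hand-wavy pseudocode (all function names here are invented, not the
driver's):

        /* first request while idle: submit immediately to start the GPU */
        submit(rq[0]);

        /* while rq[0] is in flight, only queue; skip the intervening
         * lite-restores
         */
        for (i = 1; i < N; i++)
                queue(rq[i]);

        /* at the next CS event (or some other flush point), push the
         * accumulated backlog in a single ELSP write
         */
        dequeue_all();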

> Would it make sense to
> inhibit lite-restore in roughly the same conditions I set the overload
> flag?  (since that indicates we'll get an IRQ at least one request
> *before* the GPU actually goes idle, so there shouldn't be any penalty
> from inhibiting lite restore).

Yes/no. Once we have multiple contexts scheduled, we won't be doing lite
restores.

The key problem is that the irq/tasklet-bh latency is unpredictable. Only
submitting one context at a time costs about 1% in [multi-context]
transcode throughput. And the cost of lite-restore on that scale is
barely noticeable.

So it annoys me that we can measure the impact of lite-restore on
nop-throughput, but in reality we have a self-inflicted regression on
top of the lite-restore that caught my eye.

Since we don't resubmit more contexts until we receive an ack from the
HW, the latency in processing that ack actually allowed us to accumulate
more nops into a single submission. Process that ack earlier and we
start submitting each nop individually. But we do want to process that
ack earlier as we know we are handling a request that should be pushed
to the GPU immediately.

[The age old adage, batching submissions is good for throughput, bad for
latency. And I have to pinch myself that throughput is easier to
measure, but latency is the crucial metric.]

> >> > Anyway, trying this out now in combination with my patches now.
> >> >
> >> 
> >> This didn't seem to help (together with your other suggestion to move
> >> the overload accounting to __execlists_schedule_in/out).  And it makes
> >> the current -5% SynMark OglMultithread regression with my series go down
> >> to -10%.  My previous suggestion of moving the
> >> intel_gt_pm_active_begin() call to process_csb() when the submission is
> >> ACK'ed by the hardware does seem to help (and it roughly halves the
> >> OglMultithread regression), possibly because that way we're able to
> >> determine whether the first context was actually overlapping at the
> >> point that the second was received by the hardware -- I haven't tested
> >> it extensively yet though.
> >
> > Grumble, it just seems like we are setting and clearing the flag on
> > completely unrelated events -- which I still think boils down to working
> > around latency in the driver. Or at least I hope there's an explanation
> > and bug to fix that improves responsiveness for all.
> > -Chris
> 
> There *might* be a performance issue somewhere introducing latency that
> the instrumentation I added happens to mitigate, but isn't that a sign
> that it's fulfilling its purpose of determining when the workload could
> be sensitive to CPU latency?

We have latency issues in the tasklet submission. The irq-off lock
contention along the submission path [execlists_dequeue] is perhaps the
biggest issue in the driver (at least from a lockstats perspective). My
expectation is that the delay in processing the CSB is affecting the
'overload' heuristic.
 
> Maybe I didn't explain the idea properly: Given that command submission
> is asynchronous with the CS processing the previous context, there is no
> way for us to tell whether a request we just submitted was actually
> overlapping with the previous one until we read the CSB and see whether
> it led to an idle-to-active transition.  Only then can we assume that
> the CPU is sending commands to the GPU quickly enough to keep it busy
> without interruption.

That sounds like you just want to use the C0 counters.

But if you want to use the active/idle state as your heuristic, then you
want something like

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 3102159d2b3b..3e08ecd53ecb 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2443,6 +2443,8 @@ static void process_csb(struct intel_engine_cs *engine)

                        GEM_BUG_ON(execlists->active - execlists->inflight >
                                   execlists_num_ports(execlists));
+
+                       WRITE_ONCE(execlists->overload, !!*execlists->active);
                }
        } while (head != tail);


That will be set to true when we have enough work to keep the GPU busy
into the second context, and only be set to false when the GPU idles.

However, just because we switched contexts does not necessarily imply
that the previous context did substantial work. And there may be lots of
fun with timeslicing preemptions that do not let a single context run to
completion. So perhaps,

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 3102159d2b3b..05bb533d556c 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2340,6 +2340,7 @@ static void process_csb(struct intel_engine_cs *engine)
        rmb();

        ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
+       bool overload = *execlists->active;
        do {
                bool promote;

@@ -2443,8 +2444,11 @@ static void process_csb(struct intel_engine_cs *engine)

                        GEM_BUG_ON(execlists->active - execlists->inflight >
                                   execlists_num_ports(execlists));
+
+                       overload &= !!*execlists->active; /* active->idle? */
                }
        } while (head != tail);
+       WRITE_ONCE(execlists->overload, overload && *execlists->active);


which sets the overload flag if we enter with an active context and
leave with another active context, without going through an idle state.
But that will set overload for timeslicing and for high priority
preemption.

(Now I have to ask whether that's what you had before :)
 
> You might argue that this will introduce a delay in the signalling of
> overload roughly equal to the latency it takes for the CPU to receive
> the execlists IRQ with the hardware ACK.  However that seems beneficial
> since the clearing of overload suffers from the same latency, so the
> fraction of time that overload is signalled will otherwise be biased as
> a result of the latency difference, causing overload to be overreported
> on the average.  Delaying the signalling of overload to the CSB handler
> means that any systematic latency in our interrupt processing is
> self-correcting.

That only worries me if we are trying to balance decisions made on
submission with the async ack. If we can rely solely on the CSB events, that
worry is moot.
 
> Anyway, I'm open to other suggestions if you have other ideas that at
> least don't worsen the pre-existing regression from my series.

Likewise, if the current overload semantics turn out to be the best, then so be it.
-Chris

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev3)
  2020-03-20 13:01 [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Chris Wilson
                   ` (6 preceding siblings ...)
  2020-03-21  2:24 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
@ 2020-03-24  0:11 ` Patchwork
  7 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2020-03-24  0:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev3)
URL   : https://patchwork.freedesktop.org/series/74918/
State : failure

== Summary ==

Applying: drm/i915/gt: Report context-is-closed prior to pinning
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/gt/intel_context.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/gt/intel_context.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/gt/intel_context.c
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 drm/i915/gt: Report context-is-closed prior to pinning
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


* Re: [Intel-gfx] [PATCH] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
  2020-03-23 23:50             ` Chris Wilson
@ 2020-03-24 22:55               ` Francisco Jerez
  0 siblings, 0 replies; 19+ messages in thread
From: Francisco Jerez @ 2020-03-24 22:55 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



Chris Wilson <chris.p.wilson@intel.com> writes:

> Quoting Francisco Jerez (2020-03-23 22:30:13)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > Quoting Francisco Jerez (2020-03-20 22:14:51)
>> >> Francisco Jerez <currojerez@riseup.net> writes:
>> >> 
>> >> > Chris Wilson <chris@chris-wilson.co.uk> writes:
>> >> >
>> >> >> We dropped calling process_csb prior to handling direct submission in
>> >> >> order to avoid the nesting of spinlocks and lift process_csb() and the
>> >> >> majority of the tasklet out of irq-off. However, we do want to avoid
>> >> >> ksoftirqd latency in the fast path, so try and pull the interrupt-bh
>> >> >> local to direct submission if we can acquire the tasklet's lock.
>> >> >>
>> >> >> v2: Tweak the balance to avoid over submitting lite-restores
>> >> >>
>> >> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> >> >> Cc: Francisco Jerez <currojerez@riseup.net>
>> >> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> >> >> ---
>> >> >>  drivers/gpu/drm/i915/gt/intel_lrc.c    | 44 ++++++++++++++++++++------
>> >> >>  drivers/gpu/drm/i915/gt/selftest_lrc.c |  2 +-
>> >> >>  2 files changed, 36 insertions(+), 10 deletions(-)
>> >> >>
>> >> >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> >> >> index f09dd87324b9..dceb65a0088f 100644
>> >> >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>> >> >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> >> >> @@ -2884,17 +2884,17 @@ static void queue_request(struct intel_engine_cs *engine,
>> >> >>      set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>> >> >>  }
>> >> >>  
>> >> >> -static void __submit_queue_imm(struct intel_engine_cs *engine)
>> >> >> +static bool pending_csb(const struct intel_engine_execlists *el)
>> >> >>  {
>> >> >> -    struct intel_engine_execlists * const execlists = &engine->execlists;
>> >> >> +    return READ_ONCE(*el->csb_write) != READ_ONCE(el->csb_head);
>> >> >> +}
>> >> >>  
>> >> >> -    if (reset_in_progress(execlists))
>> >> >> -            return; /* defer until we restart the engine following reset */
>> >> >> +static bool skip_lite_restore(struct intel_engine_execlists *el,
>> >> >> +                          const struct i915_request *rq)
>> >> >> +{
>> >> >> +    struct i915_request *inflight = execlists_active(el);
>> >> >>  
>> >> >> -    if (execlists->tasklet.func == execlists_submission_tasklet)
>> >> >> -            __execlists_submission_tasklet(engine);
>> >> >> -    else
>> >> >> -            tasklet_hi_schedule(&execlists->tasklet);
>> >> >> +    return inflight && inflight->context == rq->context;
>> >> >>  }
>> >> >>  
>> >> >>  static void submit_queue(struct intel_engine_cs *engine,
>> >> >> @@ -2905,8 +2905,34 @@ static void submit_queue(struct intel_engine_cs *engine,
>> >> >>      if (rq_prio(rq) <= execlists->queue_priority_hint)
>> >> >>              return;
>> >> >>  
>> >> >> +    if (reset_in_progress(execlists))
>> >> >> +            return; /* defer until we restart the engine following reset */
>> >> >> +
>> >> >> +    /*
>> >> >> +     * Suppress immediate lite-restores, leave that to the tasklet.
>> >> >> +     *
>> >> >> +     * However, we leave the queue_priority_hint unset so that if we do
>> >> >> +     * submit a second context, we push that into ELSP[1] immediately.
>> >> >> +     */
>> >> >> +    if (skip_lite_restore(execlists, rq))
>> >> >> +            return;
>> >> >> +
>> >> > Why do you need to treat lite-restore specially here?
>> >
>> > Lite-restore have a noticeable impact on no-op loads. A part of that is
>> > that a lite-restore is about 1us, and the other part is that the driver
>> > has a lot more work to do. There's a balance point around here for not
>> > needlessly interrupting ourselves and ensuring that there is no bubble.
>> >
>> 
>> Oh, I see.  But isn't inhibiting the lite restore likely to be fairly
>> costly in some cases as well if it causes the GPU to go idle after the
>> current context completes for as long as it takes the CPU to wake up,
>> process the IRQ and dequeue the next request?
>
> Yes. It's an amusing diversion to try and think how can we do a single
> context submission (well 2) for a sequence of requests, for those
> clients that like to submit N batches at once. From an idle GPU,
> assuming each batch is non-trivial, we want to do something like submit
> batch 1, accumulate, then submit batches 1-N, i.e. skip the intervening
> lite-restores but complete the submission with all the work queued.
>

I guess ideally you would submit batch 1, then batch 2 (so you don't
have to rush to feed the GPU once batch 1 terminates), then accumulate
any further requests until batch 1 completes.  That would reduce the
number of lite-restores to the minimum you need to keep the GPU at least
one request ahead.  Though I'm guessing that this would involve using
the breadcrumbs IRQ to aid command submission so we can tell when batch
1 terminated, since AFAIUI we won't get any execlists interrupt at that
point if batch 1 and 2 happened to belong to the same context.

Sigh.  Submitting commands efficiently would be easier if the hardware
didn't have the annoying restriction that the contexts in different
ELSP ports must be distinct -- or has that restriction been lifted
in recent hardware?  At least they gave us 8 ELSP ports on ICL, if only
we could take advantage of them without having 8 contexts running in
parallel...

>> Would it make sense to
>> inhibit lite-restore in roughly the same conditions I set the overload
>> flag?  (since that indicates we'll get an IRQ at least one request
>> *before* the GPU actually goes idle, so there shouldn't be any penalty
>> from inhibiting lite restore).
>
> Yes/no. Once we have multiple contexts scheduled, we won't be doing lite
> restores.
>
> The key problem is that the irq/tasklet-bh latency is unpredictable. Only
> submitting one context at a time costs about 1% in [multi-context]
> transcode throughput. And the cost of lite-restore on that scale is
> barely noticeable.
>
> So it annoys me that we can measure the impact of lite-restore on
> nop-throughput, but in reality we have a self-inflicted regression on
> top of the lite-restore that caught my eye.
>
> Since we don't resubmit more contexts until we receive an ack from the
> HW, the latency in processing that ack actually allowed us to accumulate
> more nops into a single submission. Process that ack earlier and we
> start submitting each nop individually. But we do want to process that
> ack earlier as we know we are handling a request that should be pushed
> to the GPU immediately.
>

Uhm, that's awful.

> [The age old adage, batching submissions is good for throughput, bad for
> latency. And I have to pinch myself that throughput is easier to
> measure, but latency is the crucial metric.]
>
>> >> > Anyway, trying this out now in combination with my patches now.
>> >> >
>> >> 
>> >> This didn't seem to help (together with your other suggestion to move
>> >> the overload accounting to __execlists_schedule_in/out).  And it makes
>> >> the current -5% SynMark OglMultithread regression with my series go down
>> >> to -10%.  My previous suggestion of moving the
>> >> intel_gt_pm_active_begin() call to process_csb() when the submission is
>> >> ACK'ed by the hardware does seem to help (and it roughly halves the
>> >> OglMultithread regression), possibly because that way we're able to
>> >> determine whether the first context was actually overlapping at the
>> >> point that the second was received by the hardware -- I haven't tested
>> >> it extensively yet though.
>> >
>> > Grumble, it just seems like we are setting and clearing the flag on
>> > completely unrelated events -- which I still think boils down to working
>> > around latency in the driver. Or at least I hope there's an explanation
>> > and bug to fix that improves responsiveness for all.
>> > -Chris
>> 
>> There *might* be a performance issue somewhere introducing latency that
>> the instrumentation I added happens to mitigate, but isn't that a sign
>> that it's fulfilling its purpose of determining when the workload could
>> be sensitive to CPU latency?
>
> We have latency issues in the tasklet submission. The irq-off lock
> contention along the submission path [execlists_dequeue] is perhaps the
> biggest issue in the driver (at least from lockstats perspective). My
> expectation is that the delay in processing the CSB is affecting the
> 'overload' heuristic.
>  
>> Maybe I didn't explain the idea properly: Given that command submission
>> is asynchronous with the CS processing the previous context, there is no
>> way for us to tell whether a request we just submitted was actually
>> overlapping with the previous one until we read the CSB and see whether
>> it led to an idle-to-active transition.  Only then can we assume that
>> the CPU is sending commands to the GPU quickly enough to keep it busy
>> without interruption.
>
> That sounds like you just want to use the C0 counters.
>

I tried to do something like that at some point and got mixed results.
It would behave differently from the patch I submitted, since it would
count all the GPU time during which any engine is active (rather than
the time during which at least two contexts are queued for execution),
so it would have greater potential for regressions when the GPU is only
one request ahead of the CPU, though the performance regression would
be limited by the residency threshold in the steady state (e.g. if you
only allow the heuristic to kick in at 99% or higher GPU utilization,
you may still have to pay up to a 1% regression in some pessimal
cases).  But the greatest downside I noticed from using any sort of
residency counter is that there is a substantial delay between the GPU
going idle and it entering a sleep state, so the metric can be fairly
biased upwards for intermittent workloads (which are frequently
latency-sensitive too).
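
For reference, the residency-based variant I experimented with looked
roughly like the sketch below; the sampled counter and the 99%
threshold are hypothetical stand-ins, not an existing interface:

/* Sketch of a residency-based heuristic; everything here is a stand-in. */
#include <stdbool.h>
#include <stdint.h>

#define OVERLOAD_THRESHOLD_PERMILLE 990	/* ~99% utilisation */

struct residency_sample {
	uint64_t busy_ns;	/* accumulated engine-busy (C0-style) time */
	uint64_t now_ns;	/* timestamp of the sample */
};

/* True when the engine was busy for more than ~99% of the window. */
bool residency_overloaded(const struct residency_sample *prev,
			  const struct residency_sample *cur)
{
	uint64_t window = cur->now_ns - prev->now_ns;
	uint64_t busy = cur->busy_ns - prev->busy_ns;

	if (!window)
		return false;

	return busy * 1000 > window * OVERLOAD_THRESHOLD_PERMILLE;
}

The upward bias shows up because busy_ns keeps accumulating until the
GPU actually reaches a sleep state rather than when it goes idle, so
intermittent workloads look busier than they really are.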

> But if you want to use the active/idle state as your heuristic, then you
> want something like
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 3102159d2b3b..3e08ecd53ecb 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2443,6 +2443,8 @@ static void process_csb(struct intel_engine_cs *engine)
>
>                         GEM_BUG_ON(execlists->active - execlists->inflight >
>                                    execlists_num_ports(execlists));
> +
> +                       WRITE_ONCE(execlists->overload, !!*execlists->active);
>                 }
>         } while (head != tail);
>
>
> That will be set to true when we have enough work to keep the GPU busy
> into the second context, and only be set to false when the GPU idles.
>
> However, just because we switched contexts does not necessarily imply
> that the previous context did substantial work. And there may be lots of
> fun with timeslicing preemptions that do not let a single context run to
> completion. So perhaps,
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 3102159d2b3b..05bb533d556c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2340,6 +2340,7 @@ static void process_csb(struct intel_engine_cs *engine)
>         rmb();
>
>         ENGINE_TRACE(engine, "cs-irq head=%d, tail=%d\n", head, tail);
> +       bool overload = !!*execlists->active;
>         do {
>                 bool promote;
>
> @@ -2443,8 +2444,11 @@ static void process_csb(struct intel_engine_cs *engine)
>
>                         GEM_BUG_ON(execlists->active - execlists->inflight >
>                                    execlists_num_ports(execlists));
> +
> +                       overload &= !!*execlists->active; /* active->idle? */
>                 }
>         } while (head != tail);
> +       WRITE_ONCE(execlists->overload, overload && *execlists->active);
>
>
> which sets the overload flag if we enter with an active context and
> leave with another active context, without going through an idle state.
> But that will set overload for timeslicing and for high priority
> preemption.
>
> (Now I have to ask whether that's what you had before :)

Not exactly.  I'm clearing the overload flag unconditionally on any
completion event, since that indicates we're back to the single-port
submission best case.  If we then go back to dual-ELSP, overload will
get signalled by a subsequent iteration of process_csb() (rather than
doing it from execlists_dequeue()).  In addition, I got the preemption
handler to clear the overload flag in cases where that causes us to go
back to single-ELSP submission, in order to fix the bug you pointed out
earlier.
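
In pseudo-code, the semantics I described above are roughly the
following (the struct and hooks are illustrative stand-ins, not the
actual patch):

#include <stdbool.h>

/* Minimal stand-in for the execlists state; not the real i915 structs. */
struct el_sketch {
	void *port[2];	/* requests occupying the two ELSP ports, or NULL */
	bool overload;	/* the proposed flag */
};

/* Any completion event puts us back in the single-port best case. */
void on_csb_completion(struct el_sketch *el)
{
	el->overload = false;
}

/* A promote event with both ports busy means dual-ELSP submission. */
void on_csb_promote(struct el_sketch *el)
{
	if (el->port[0] && el->port[1])
		el->overload = true;
}

/* Preemption that drops us to a single port clears the flag too. */
void on_preempt_to_single_port(struct el_sketch *el)
{
	el->overload = false;
}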

>  
>> You might argue that this will introduce a delay in the signalling of
>> overload roughly equal to the latency it takes for the CPU to receive
>> the execlists IRQ with the hardware ACK.  However, that seems
>> beneficial, since the clearing of overload suffers from the same
>> latency; the fraction of time that overload is signalled would
>> otherwise be biased by the latency difference, causing overload to
>> be overreported on average.  Delaying the signalling of overload to
>> the CSB handler
>> means that any systematic latency in our interrupt processing is
>> self-correcting.
>
> That only worries me if we are trying to balance decisions made on
> submission with the async ack.  If we can rely solely on the CSB
> events, that worry is moot.
>  
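
To put rough numbers on the bias: if raising overload happens
immediately at submission but clearing it waits for an IRQ with latency
L, then a workload that is genuinely overloaded for O ns out of every
T ns is reported as overloaded for roughly (O + L)/T of the time, an
upward bias of L/T.  If both edges go through the CSB handler, a common
latency L shifts both transitions equally and the reported fraction
stays at O/T.
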
>> Anyway, I'm open to other suggestions if you have other ideas that at
>> least don't worsen the pre-existing regression from my series.
>
> Likewise, if the best is the current overload semantics then so be it.

Thanks!

> -Chris

Thread overview: 19+ messages
2020-03-20 13:01 [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Chris Wilson
2020-03-20 13:01 ` [Intel-gfx] [PATCH 2/4] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson
2020-03-20 17:47   ` [Intel-gfx] [PATCH] " Chris Wilson
2020-03-20 20:28     ` Francisco Jerez
2020-03-20 22:14       ` Francisco Jerez
2020-03-23  9:45         ` Chris Wilson
2020-03-23 22:30           ` Francisco Jerez
2020-03-23 23:50             ` Chris Wilson
2020-03-24 22:55               ` Francisco Jerez
2020-03-21 13:12     ` Chris Wilson
2020-03-20 13:01 ` [Intel-gfx] [PATCH 3/4] drm/i915: Immediately execute the fenced work Chris Wilson
2020-03-20 13:01 ` [Intel-gfx] [PATCH 4/4] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup Chris Wilson
2020-03-20 13:47   ` Tvrtko Ursulin
2020-03-20 13:56     ` Chris Wilson
2020-03-20 13:15 ` [Intel-gfx] [PATCH 1/4] drm/i915/gt: Report context-is-closed prior to pinning Tvrtko Ursulin
2020-03-20 18:08 ` [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/4] " Patchwork
2020-03-20 20:06 ` [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev2) Patchwork
2020-03-21  2:24 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2020-03-24  0:11 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/4] drm/i915/gt: Report context-is-closed prior to pinning (rev3) Patchwork
