[PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt
@ 2017-02-16  9:29 Chris Wilson
  2017-02-16  9:29 ` [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending Chris Wilson
  2017-02-16 10:21 ` [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt Tvrtko Ursulin
  0 siblings, 2 replies; 20+ messages in thread
From: Chris Wilson @ 2017-02-16  9:29 UTC (permalink / raw)
  To: intel-gfx

If an interrupt arrives whilst we are performing the irq-seqno barrier,
recheck the seqno again before returning.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c1b400f1ede4..ecb8b414bdd2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -4102,6 +4102,9 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
 	if (__i915_gem_request_completed(req, seqno))
 		return true;
 
+	if (!engine->irq_seqno_barrier)
+		return false;
+
 	/* Ensure our read of the seqno is coherent so that we
 	 * do not "miss an interrupt" (i.e. if this is the last
 	 * request and the seqno write from the GPU is not visible
@@ -4113,8 +4116,7 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
 	 * but it is easier and safer to do it every time the waiter
 	 * is woken.
 	 */
-	if (engine->irq_seqno_barrier &&
-	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
+	while (test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
 		unsigned long flags;
 
 		/* The ordering of irq_posted versus applying the barrier
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending
  2017-02-16  9:29 [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt Chris Wilson
@ 2017-02-16  9:29 ` Chris Wilson
  2017-02-16 10:38   ` Tvrtko Ursulin
  2017-02-16 11:13   ` [PATCH v2] drm/i915: Postpone fake breadcrumb interrupt until real interrupts cease Chris Wilson
  2017-02-16 10:21 ` [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt Tvrtko Ursulin
  1 sibling, 2 replies; 20+ messages in thread
From: Chris Wilson @ 2017-02-16  9:29 UTC (permalink / raw)
  To: intel-gfx

If the timer expires for enabling the fake interrupt, check to see
if there is a real interrupt queued before making the decision to start
polling. This helps in situations where we have a very slow irq-seqno
barrier that may accrue more breadcrumb interrupts before it is able to
catch up. It still leaves a hole for the timer to expire as we are
processing the last irq-seqno barrier, but it appears to help reduce the
frequency of "missed-interrupts" on Ironlake, at least.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99816
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index ef3adfd37d7d..21269421bd2a 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -41,6 +41,11 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
 		return;
 	}
 
+	if (test_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
+		mod_timer(&b->hangcheck, jiffies + 1);
+		return;
+	}
+
 	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
 	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt
  2017-02-16  9:29 [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt Chris Wilson
  2017-02-16  9:29 ` [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending Chris Wilson
@ 2017-02-16 10:21 ` Tvrtko Ursulin
  2017-02-16 10:36   ` Chris Wilson
  1 sibling, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2017-02-16 10:21 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/02/2017 09:29, Chris Wilson wrote:
> If an interrupt arrives whilst we are performing the irq-seqno barrier,
> recheck the seqno again before returning.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c1b400f1ede4..ecb8b414bdd2 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -4102,6 +4102,9 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
>  	if (__i915_gem_request_completed(req, seqno))
>  		return true;
>
> +	if (!engine->irq_seqno_barrier)
> +		return false;
> +
>  	/* Ensure our read of the seqno is coherent so that we
>  	 * do not "miss an interrupt" (i.e. if this is the last
>  	 * request and the seqno write from the GPU is not visible
> @@ -4113,8 +4116,7 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
>  	 * but it is easier and safer to do it every time the waiter
>  	 * is woken.
>  	 */
> -	if (engine->irq_seqno_barrier &&
> -	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> +	while (test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
>  		unsigned long flags;
>
>  		/* The ordering of irq_posted versus applying the barrier
>

Hmmm.. if this helps it feels that there is a race somewhere. Because we 
have processed one interrupt, but it wasn't for us. That means there 
will be another one coming. Why handle that at this level? Why check 
twice and not three, four, five times? :)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt
  2017-02-16 10:21 ` [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt Tvrtko Ursulin
@ 2017-02-16 10:36   ` Chris Wilson
  2017-02-16 10:45     ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2017-02-16 10:36 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Feb 16, 2017 at 10:21:17AM +0000, Tvrtko Ursulin wrote:
> 
> On 16/02/2017 09:29, Chris Wilson wrote:
> >If an interrupt arrives whilst we are performing the irq-seqno barrier,
> >recheck the seqno again before returning.
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >---
> > drivers/gpu/drm/i915/i915_drv.h | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >index c1b400f1ede4..ecb8b414bdd2 100644
> >--- a/drivers/gpu/drm/i915/i915_drv.h
> >+++ b/drivers/gpu/drm/i915/i915_drv.h
> >@@ -4102,6 +4102,9 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> > 	if (__i915_gem_request_completed(req, seqno))
> > 		return true;
> >
> >+	if (!engine->irq_seqno_barrier)
> >+		return false;
> >+
> > 	/* Ensure our read of the seqno is coherent so that we
> > 	 * do not "miss an interrupt" (i.e. if this is the last
> > 	 * request and the seqno write from the GPU is not visible
> >@@ -4113,8 +4116,7 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> > 	 * but it is easier and safer to do it every time the waiter
> > 	 * is woken.
> > 	 */
> >-	if (engine->irq_seqno_barrier &&
> >-	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >+	while (test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> > 		unsigned long flags;
> >
> > 		/* The ordering of irq_posted versus applying the barrier
> >
> 
> Hmmm.. if this helps it feels that there is a race somewhere.
> Because we have processed one interrupt, but it wasn't for us.

There may be many interrupts between now and the one we actually want.
The caller of this function is (or was at the time) has the first seqno
to be signaled, it is just not necessarily the next interrupt.

> That
> means there will be another one coming. Why handle that at this
> level? Why check twice and not three, four, five times? :)

Because they are all for us! This only saves a trip through schedule. If
we don't loop here, we just loop at the next level.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending
  2017-02-16  9:29 ` [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending Chris Wilson
@ 2017-02-16 10:38   ` Tvrtko Ursulin
  2017-02-16 10:47     ` Chris Wilson
  2017-02-16 11:13   ` [PATCH v2] drm/i915: Postpone fake breadcrumb interrupt until real interrupts cease Chris Wilson
  1 sibling, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2017-02-16 10:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/02/2017 09:29, Chris Wilson wrote:
> If the timer expires for enabling the fake interrupt, check to see
> if there is a real interrupt queued before making the decision to start
> polling. This helps in situations where we have a very slow irq-seqno
> barrier that may accrue more breadcrumb interrupts before it is able to
> catch up. It still leaves a hole for the timer to expire as we are
> processing the last irq-seqno barrier, but it appears to help reduce the
> frequency of "missed-interrupts" on Ironlake, at least.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99816
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_breadcrumbs.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index ef3adfd37d7d..21269421bd2a 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -41,6 +41,11 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
>  		return;
>  	}
>
> +	if (test_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> +		mod_timer(&b->hangcheck, jiffies + 1);
> +		return;
> +	}
> +
>  	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
>  	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>  	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
>

Another hmm :), barriers are so much shorter than the hangcheck interval 
(1500ms) so I don't quite understand what is the problem.

Should we instead put a mod_timer when raising the user irq?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt
  2017-02-16 10:36   ` Chris Wilson
@ 2017-02-16 10:45     ` Tvrtko Ursulin
  2017-02-16 10:50       ` Chris Wilson
  2017-02-16 11:21       ` Chris Wilson
  0 siblings, 2 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2017-02-16 10:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/02/2017 10:36, Chris Wilson wrote:
> On Thu, Feb 16, 2017 at 10:21:17AM +0000, Tvrtko Ursulin wrote:
>>
>> On 16/02/2017 09:29, Chris Wilson wrote:
>>> If an interrupt arrives whilst we are performing the irq-seqno barrier,
>>> recheck the seqno again before returning.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>> drivers/gpu/drm/i915/i915_drv.h | 6 ++++--
>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index c1b400f1ede4..ecb8b414bdd2 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -4102,6 +4102,9 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
>>> 	if (__i915_gem_request_completed(req, seqno))
>>> 		return true;
>>>
>>> +	if (!engine->irq_seqno_barrier)
>>> +		return false;
>>> +
>>> 	/* Ensure our read of the seqno is coherent so that we
>>> 	 * do not "miss an interrupt" (i.e. if this is the last
>>> 	 * request and the seqno write from the GPU is not visible
>>> @@ -4113,8 +4116,7 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
>>> 	 * but it is easier and safer to do it every time the waiter
>>> 	 * is woken.
>>> 	 */
>>> -	if (engine->irq_seqno_barrier &&
>>> -	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
>>> +	while (test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
>>> 		unsigned long flags;
>>>
>>> 		/* The ordering of irq_posted versus applying the barrier
>>>
>>
>> Hmmm.. if this helps it feels that there is a race somewhere.
>> Because we have processed one interrupt, but it wasn't for us.
>
> There may be many interrupts between now and the one we actually want.
> The caller of this function is (or was at the time) has the first seqno
> to be signaled, it is just not necessarily the next interrupt.
>
>> That
>> means there will be another one coming. Why handle that at this
>> level? Why check twice and not three, four, five times? :)
>
> Because they are all for us! This only saves a trip through schedule. If
> we don't loop here, we just loop at the next level.

Yes, which looks more correct to me. Because the caller then do what 
they want. Signaller just sleeps and i915_wait_request potentially busy 
waits for a bit.

So essentially this is just a performance optimisation for some cases, 
but we have to be sure it is not the opposite for some other ones.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending
  2017-02-16 10:38   ` Tvrtko Ursulin
@ 2017-02-16 10:47     ` Chris Wilson
  2017-02-16 11:17       ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2017-02-16 10:47 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Feb 16, 2017 at 10:38:08AM +0000, Tvrtko Ursulin wrote:
> 
> On 16/02/2017 09:29, Chris Wilson wrote:
> >If the timer expires for enabling the fake interrupt, check to see
> >if there is a real interrupt queued before making the decision to start
> >polling. This helps in situations where we have a very slow irq-seqno
> >barrier that may accrue more breadcrumb interrupts before it is able to
> >catch up. It still leaves a hole for the timer to expire as we are
> >processing the last irq-seqno barrier, but it appears to help reduce the
> >frequency of "missed-interrupts" on Ironlake, at least.
> >
> >References: https://bugs.freedesktop.org/show_bug.cgi?id=99816
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >---
> > drivers/gpu/drm/i915/intel_breadcrumbs.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> >diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >index ef3adfd37d7d..21269421bd2a 100644
> >--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >@@ -41,6 +41,11 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
> > 		return;
> > 	}
> >
> >+	if (test_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >+		mod_timer(&b->hangcheck, jiffies + 1);
> >+		return;
> >+	}
> >+
> > 	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
> > 	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> > 	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
> >
> 
> Another hmm :), barriers are so much shorter than the hangcheck
> interval (1500ms) so I don't quite understand what is the problem.

We may queue the wait many, many seconds before it is even being
processed. The point of this timer is to detect when we haven't seen an
interrupt for some time and that happens to be the waiter missed (backup
for seqno-barrier failing).
 
> Should we instead put a mod_timer when raising the user irq?

Frequency, mod_timer may require reprogramming of the apic and often
does when profiling ;)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt
  2017-02-16 10:45     ` Tvrtko Ursulin
@ 2017-02-16 10:50       ` Chris Wilson
  2017-02-16 11:19         ` Tvrtko Ursulin
  2017-02-16 11:21       ` Chris Wilson
  1 sibling, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2017-02-16 10:50 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Feb 16, 2017 at 10:45:56AM +0000, Tvrtko Ursulin wrote:
> 
> On 16/02/2017 10:36, Chris Wilson wrote:
> >On Thu, Feb 16, 2017 at 10:21:17AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 16/02/2017 09:29, Chris Wilson wrote:
> >>>If an interrupt arrives whilst we are performing the irq-seqno barrier,
> >>>recheck the seqno again before returning.
> >>>
> >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>---
> >>>drivers/gpu/drm/i915/i915_drv.h | 6 ++++--
> >>>1 file changed, 4 insertions(+), 2 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>index c1b400f1ede4..ecb8b414bdd2 100644
> >>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>@@ -4102,6 +4102,9 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >>>	if (__i915_gem_request_completed(req, seqno))
> >>>		return true;
> >>>
> >>>+	if (!engine->irq_seqno_barrier)
> >>>+		return false;
> >>>+
> >>>	/* Ensure our read of the seqno is coherent so that we
> >>>	 * do not "miss an interrupt" (i.e. if this is the last
> >>>	 * request and the seqno write from the GPU is not visible
> >>>@@ -4113,8 +4116,7 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >>>	 * but it is easier and safer to do it every time the waiter
> >>>	 * is woken.
> >>>	 */
> >>>-	if (engine->irq_seqno_barrier &&
> >>>-	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>+	while (test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>		unsigned long flags;
> >>>
> >>>		/* The ordering of irq_posted versus applying the barrier
> >>>
> >>
> >>Hmmm.. if this helps it feels that there is a race somewhere.
> >>Because we have processed one interrupt, but it wasn't for us.
> >
> >There may be many interrupts between now and the one we actually want.
> >The caller of this function is (or was at the time) has the first seqno
> >to be signaled, it is just not necessarily the next interrupt.
> >
> >>That
> >>means there will be another one coming. Why handle that at this
> >>level? Why check twice and not three, four, five times? :)
> >
> >Because they are all for us! This only saves a trip through schedule. If
> >we don't loop here, we just loop at the next level.
> 
> Yes, which looks more correct to me. Because the caller then do what
> they want. Signaller just sleeps and i915_wait_request potentially
> busy waits for a bit.

That's incorrect.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2] drm/i915: Postpone fake breadcrumb interrupt until real interrupts cease
  2017-02-16  9:29 ` [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending Chris Wilson
  2017-02-16 10:38   ` Tvrtko Ursulin
@ 2017-02-16 11:13   ` Chris Wilson
  2017-02-16 11:42     ` Chris Wilson
  2017-02-16 11:44     ` Tvrtko Ursulin
  1 sibling, 2 replies; 20+ messages in thread
From: Chris Wilson @ 2017-02-16 11:13 UTC (permalink / raw)
  To: intel-gfx

When the timer expires for checking on interrupt processing, check to
see if any interrupts arrived within the last time period. If real
interrupts are still being delivered, we can be reassured that we
haven't missed the final interrupt as the waiter will still be woken.
Only once all activity ceases, do we have to worry about the waiter
never being woken and so need to install a timer to kick the waiter for
a slow arrival of a seqno.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99816
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c          |  1 +
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 26 ++++++++++++--------------
 drivers/gpu/drm/i915/intel_ringbuffer.h  |  3 ++-
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 1ce54601432b..d3c851c93ee3 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1037,6 +1037,7 @@ static void notify_ring(struct intel_engine_cs *engine)
 	struct intel_wait *wait;
 
 	trace_i915_gem_request_notify(engine);
+	atomic_inc(&engine->irq_count);
 	set_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted);
 
 	rcu_read_lock();
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index ef3adfd37d7d..3267e671888c 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -26,6 +26,11 @@
 
 #include "i915_drv.h"
 
+static unsigned long wait_timeout(void)
+{
+	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
+}
+
 static void intel_breadcrumbs_hangcheck(unsigned long data)
 {
 	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
@@ -36,8 +41,9 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
 		return;
 	}
 
-	if (time_before(jiffies, b->timeout)) {
-		mod_timer(&b->hangcheck, b->timeout);
+	if (b->hangcheck_interrupts != atomic_read(&engine->irq_count)) {
+		b->hangcheck_interrupts = atomic_read(&engine->irq_count);
+		mod_timer(&b->hangcheck, wait_timeout());
 		return;
 	}
 
@@ -57,11 +63,6 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
 	i915_queue_hangcheck(engine->i915);
 }
 
-static unsigned long wait_timeout(void)
-{
-	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
-}
-
 static void intel_breadcrumbs_fake_irq(unsigned long data)
 {
 	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
@@ -176,8 +177,8 @@ static void __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
 		i915_queue_hangcheck(i915);
 	} else {
 		/* Ensure we never sleep indefinitely */
-		GEM_BUG_ON(!time_after(b->timeout, jiffies));
-		mod_timer(&b->hangcheck, b->timeout);
+		b->hangcheck_interrupts = atomic_read(&engine->irq_count);
+		mod_timer(&b->hangcheck, wait_timeout());
 	}
 }
 
@@ -270,7 +271,6 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 		GEM_BUG_ON(!next && !first);
 		if (next && next != &wait->node) {
 			GEM_BUG_ON(first);
-			b->timeout = wait_timeout();
 			b->first_wait = to_wait(next);
 			/* As there is a delay between reading the current
 			 * seqno, processing the completed tasks and selecting
@@ -297,7 +297,6 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
 
 	if (first) {
 		GEM_BUG_ON(rb_first(&b->waiters) != &wait->node);
-		b->timeout = wait_timeout();
 		b->first_wait = wait;
 		/* After assigning ourselves as the new bottom-half, we must
 		 * perform a cursory check to prevent a missed interrupt.
@@ -396,7 +395,6 @@ static void __intel_engine_remove_wait(struct intel_engine_cs *engine,
 			 * the interrupt, or if we have to handle an
 			 * exception rather than a seqno completion.
 			 */
-			b->timeout = wait_timeout();
 			b->first_wait = to_wait(next);
 			if (b->first_wait->seqno != wait->seqno)
 				__intel_breadcrumbs_enable_irq(b);
@@ -702,10 +700,10 @@ void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
 		irq_disable(engine);
 
 	if (intel_engine_has_waiter(engine)) {
-		b->timeout = wait_timeout();
 		if (test_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted))
 			wake_up_process(b->first_wait->tsk);
-		mod_timer(&b->hangcheck, b->timeout);
+		b->hangcheck_interrupts = atomic_read(&engine->irq_count);
+		mod_timer(&b->hangcheck, wait_timeout());
 		/* sanitize the IMR and unmask any auxiliary interrupts */
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index e036ee3e35f9..75b572a9e80a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -213,6 +213,7 @@ struct intel_engine_cs {
 
 	struct intel_render_state *render_state;
 
+	atomic_t irq_count;
 	unsigned long irq_posted;
 #define ENGINE_IRQ_BREADCRUMB 0
 #define ENGINE_IRQ_EXECLIST 1
@@ -243,7 +244,7 @@ struct intel_engine_cs {
 		struct timer_list fake_irq; /* used after a missed interrupt */
 		struct timer_list hangcheck; /* detect missed interrupts */
 
-		unsigned long timeout;
+		unsigned int hangcheck_interrupts;
 
 		bool irq_armed : 1;
 		bool irq_enabled : 1;
-- 
2.11.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending
  2017-02-16 10:47     ` Chris Wilson
@ 2017-02-16 11:17       ` Tvrtko Ursulin
  2017-02-16 11:40         ` Chris Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2017-02-16 11:17 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/02/2017 10:47, Chris Wilson wrote:
> On Thu, Feb 16, 2017 at 10:38:08AM +0000, Tvrtko Ursulin wrote:
>>
>> On 16/02/2017 09:29, Chris Wilson wrote:
>>> If the timer expires for enabling the fake interrupt, check to see
>>> if there is a real interrupt queued before making the decision to start
>>> polling. This helps in situations where we have a very slow irq-seqno
>>> barrier that may accrue more breadcrumb interrupts before it is able to
>>> catch up. It still leaves a hole for the timer to expire as we are
>>> processing the last irq-seqno barrier, but it appears to help reduce the
>>> frequency of "missed-interrupts" on Ironlake, at least.
>>>
>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=99816
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>> drivers/gpu/drm/i915/intel_breadcrumbs.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> index ef3adfd37d7d..21269421bd2a 100644
>>> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
>>> @@ -41,6 +41,11 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
>>> 		return;
>>> 	}
>>>
>>> +	if (test_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
>>> +		mod_timer(&b->hangcheck, jiffies + 1);
>>> +		return;
>>> +	}
>>> +
>>> 	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
>>> 	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>>> 	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
>>>
>>
>> Another hmm :), barriers are so much shorter than the hangcheck
>> interval (1500ms) so I don't quite understand what is the problem.
>
> We may queue the wait many, many seconds before it is even being
> processed. The point of this timer is to detect when we haven't seen an
> interrupt for some time and that happens to be the waiter missed (backup
> for seqno-barrier failing).

But it is the same as a slow batch which completes just as the hangcheck 
timer fires.

Is the problem is seqno barrier on some platforms can slow down the 
signaller thread a lot? Hm, if the request duration is right it would 
sleep on every invocation. So in effect extend the request duration as 
seen form userspace by the barrier duration. Why would that be a problem 
though? Haven't fully figured it out.

>> Should we instead put a mod_timer when raising the user irq?
>
> Frequency, mod_timer may require reprogramming of the apic and often
> does when profiling ;)

Agreed on this one so no complaints, just trying to understand the 
precise problem.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt
  2017-02-16 10:50       ` Chris Wilson
@ 2017-02-16 11:19         ` Tvrtko Ursulin
  2017-02-16 11:34           ` Chris Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2017-02-16 11:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/02/2017 10:50, Chris Wilson wrote:
> On Thu, Feb 16, 2017 at 10:45:56AM +0000, Tvrtko Ursulin wrote:
>>
>> On 16/02/2017 10:36, Chris Wilson wrote:
>>> On Thu, Feb 16, 2017 at 10:21:17AM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 16/02/2017 09:29, Chris Wilson wrote:
>>>>> If an interrupt arrives whilst we are performing the irq-seqno barrier,
>>>>> recheck the seqno again before returning.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/i915/i915_drv.h | 6 ++++--
>>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>> index c1b400f1ede4..ecb8b414bdd2 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>> @@ -4102,6 +4102,9 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
>>>>> 	if (__i915_gem_request_completed(req, seqno))
>>>>> 		return true;
>>>>>
>>>>> +	if (!engine->irq_seqno_barrier)
>>>>> +		return false;
>>>>> +
>>>>> 	/* Ensure our read of the seqno is coherent so that we
>>>>> 	 * do not "miss an interrupt" (i.e. if this is the last
>>>>> 	 * request and the seqno write from the GPU is not visible
>>>>> @@ -4113,8 +4116,7 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
>>>>> 	 * but it is easier and safer to do it every time the waiter
>>>>> 	 * is woken.
>>>>> 	 */
>>>>> -	if (engine->irq_seqno_barrier &&
>>>>> -	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
>>>>> +	while (test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
>>>>> 		unsigned long flags;
>>>>>
>>>>> 		/* The ordering of irq_posted versus applying the barrier
>>>>>
>>>>
>>>> Hmmm.. if this helps it feels that there is a race somewhere.
>>>> Because we have processed one interrupt, but it wasn't for us.
>>>
>>> There may be many interrupts between now and the one we actually want.
>>> The caller of this function is (or was at the time) has the first seqno
>>> to be signaled, it is just not necessarily the next interrupt.
>>>
>>>> That
>>>> means there will be another one coming. Why handle that at this
>>>> level? Why check twice and not three, four, five times? :)
>>>
>>> Because they are all for us! This only saves a trip through schedule. If
>>> we don't loop here, we just loop at the next level.
>>
>> Yes, which looks more correct to me. Because the caller then do what
>> they want. Signaller just sleeps and i915_wait_request potentially
>> busy waits for a bit.
>
> That's incorrect.

Which part is incorrect? I double checked and as far as I can see it is 
how I described it.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt
  2017-02-16 10:45     ` Tvrtko Ursulin
  2017-02-16 10:50       ` Chris Wilson
@ 2017-02-16 11:21       ` Chris Wilson
  2017-02-16 12:47         ` Tvrtko Ursulin
  1 sibling, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2017-02-16 11:21 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Feb 16, 2017 at 10:45:56AM +0000, Tvrtko Ursulin wrote:
> 
> On 16/02/2017 10:36, Chris Wilson wrote:
> >On Thu, Feb 16, 2017 at 10:21:17AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 16/02/2017 09:29, Chris Wilson wrote:
> >>>If an interrupt arrives whilst we are performing the irq-seqno barrier,
> >>>recheck the seqno again before returning.
> >>>
> >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>---
> >>>drivers/gpu/drm/i915/i915_drv.h | 6 ++++--
> >>>1 file changed, 4 insertions(+), 2 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>index c1b400f1ede4..ecb8b414bdd2 100644
> >>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>@@ -4102,6 +4102,9 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >>>	if (__i915_gem_request_completed(req, seqno))
> >>>		return true;
> >>>
> >>>+	if (!engine->irq_seqno_barrier)
> >>>+		return false;
> >>>+
> >>>	/* Ensure our read of the seqno is coherent so that we
> >>>	 * do not "miss an interrupt" (i.e. if this is the last
> >>>	 * request and the seqno write from the GPU is not visible
> >>>@@ -4113,8 +4116,7 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >>>	 * but it is easier and safer to do it every time the waiter
> >>>	 * is woken.
> >>>	 */
> >>>-	if (engine->irq_seqno_barrier &&
> >>>-	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>+	while (test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>		unsigned long flags;
> >>>
> >>>		/* The ordering of irq_posted versus applying the barrier
> >>>
> >>
> >>Hmmm.. if this helps it feels that there is a race somewhere.
> >>Because we have processed one interrupt, but it wasn't for us.
> >
> >There may be many interrupts between now and the one we actually want.
> >The caller of this function is (or was at the time) has the first seqno
> >to be signaled, it is just not necessarily the next interrupt.
> >
> >>That
> >>means there will be another one coming. Why handle that at this
> >>level? Why check twice and not three, four, five times? :)
> >
> >Because they are all for us! This only saves a trip through schedule. If
> >we don't loop here, we just loop at the next level.
> 
> Yes, which looks more correct to me. Because the caller then do what
> they want. Signaller just sleeps and i915_wait_request potentially
> busy waits for a bit.

A persuasive counter argument would be:

Interrupts can be arriving frequently for an indefinite period before we
see our seqno that breaks the loop. We should not allow this to spin
indefinitely and at the very least need to be checking for pending
interrupts or timeslice exhaustion. Both of which are handled in the
callers.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt
  2017-02-16 11:19         ` Tvrtko Ursulin
@ 2017-02-16 11:34           ` Chris Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2017-02-16 11:34 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Feb 16, 2017 at 11:19:41AM +0000, Tvrtko Ursulin wrote:
> 
> On 16/02/2017 10:50, Chris Wilson wrote:
> >On Thu, Feb 16, 2017 at 10:45:56AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 16/02/2017 10:36, Chris Wilson wrote:
> >>>On Thu, Feb 16, 2017 at 10:21:17AM +0000, Tvrtko Ursulin wrote:
> >>>>
> >>>>On 16/02/2017 09:29, Chris Wilson wrote:
> >>>>>If an interrupt arrives whilst we are performing the irq-seqno barrier,
> >>>>>recheck the seqno again before returning.
> >>>>>
> >>>>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>---
> >>>>>drivers/gpu/drm/i915/i915_drv.h | 6 ++++--
> >>>>>1 file changed, 4 insertions(+), 2 deletions(-)
> >>>>>
> >>>>>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>index c1b400f1ede4..ecb8b414bdd2 100644
> >>>>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>@@ -4102,6 +4102,9 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >>>>>	if (__i915_gem_request_completed(req, seqno))
> >>>>>		return true;
> >>>>>
> >>>>>+	if (!engine->irq_seqno_barrier)
> >>>>>+		return false;
> >>>>>+
> >>>>>	/* Ensure our read of the seqno is coherent so that we
> >>>>>	 * do not "miss an interrupt" (i.e. if this is the last
> >>>>>	 * request and the seqno write from the GPU is not visible
> >>>>>@@ -4113,8 +4116,7 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >>>>>	 * but it is easier and safer to do it every time the waiter
> >>>>>	 * is woken.
> >>>>>	 */
> >>>>>-	if (engine->irq_seqno_barrier &&
> >>>>>-	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>>>+	while (test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>>>		unsigned long flags;
> >>>>>
> >>>>>		/* The ordering of irq_posted versus applying the barrier
> >>>>>
> >>>>
> >>>>Hmmm.. if this helps it feels that there is a race somewhere.
> >>>>Because we have processed one interrupt, but it wasn't for us.
> >>>
> >>>There may be many interrupts between now and the one we actually want.
> >>>The caller of this function is (or was at the time) has the first seqno
> >>>to be signaled, it is just not necessarily the next interrupt.
> >>>
> >>>>That
> >>>>means there will be another one coming. Why handle that at this
> >>>>level? Why check twice and not three, four, five times? :)
> >>>
> >>>Because they are all for us! This only saves a trip through schedule. If
> >>>we don't loop here, we just loop at the next level.
> >>
> >>Yes, which looks more correct to me. Because the caller then do what
> >>they want. Signaller just sleeps and i915_wait_request potentially
> >>busy waits for a bit.
> >
> >That's incorrect.
> 
> Which part is incorrect? I double checked and as far as I can see it
> is how I described it.

I had removed the spin part, so for both parties this was just in the
{ check; sleep; } loop. Note that in the case where we do need the
seqno-barrier, the barrier is more effective than spinning. :|

diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 7760d7481f85..9e42b2687cae 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -993,6 +993,9 @@ bool __i915_spin_request(const struct drm_i915_gem_request *req,
                                      seqno))
                        return true;
 
+               if (test_bit(ENGINE_IRQ_BREADCRUMB, &req->engine->irq_posted))
+                       break;
+
                if (signal_pending_state(state, current))
                        break;

which basically tells us that interrupt has arrived without the seqno
becoming visible, abort.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending
  2017-02-16 11:17       ` Tvrtko Ursulin
@ 2017-02-16 11:40         ` Chris Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2017-02-16 11:40 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Feb 16, 2017 at 11:17:11AM +0000, Tvrtko Ursulin wrote:
> 
> On 16/02/2017 10:47, Chris Wilson wrote:
> >On Thu, Feb 16, 2017 at 10:38:08AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 16/02/2017 09:29, Chris Wilson wrote:
> >>>If the timer expires for enabling the fake interrupt, check to see
> >>>if there is a real interrupt queued before making the decision to start
> >>>polling. This helps in situations where we have a very slow irq-seqno
> >>>barrier that may accrue more breadcrumb interrupts before it is able to
> >>>catch up. It still leaves a hole for the timer to expire as we are
> >>>processing the last irq-seqno barrier, but it appears to help reduce the
> >>>frequency of "missed-interrupts" on Ironlake, at least.
> >>>
> >>>References: https://bugs.freedesktop.org/show_bug.cgi?id=99816
> >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>---
> >>>drivers/gpu/drm/i915/intel_breadcrumbs.c | 5 +++++
> >>>1 file changed, 5 insertions(+)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >>>index ef3adfd37d7d..21269421bd2a 100644
> >>>--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >>>+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> >>>@@ -41,6 +41,11 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
> >>>		return;
> >>>	}
> >>>
> >>>+	if (test_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>+		mod_timer(&b->hangcheck, jiffies + 1);
> >>>+		return;
> >>>+	}
> >>>+
> >>>	DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
> >>>	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> >>>	mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
> >>>
> >>
> >>Another hmm :), barriers are so much shorter than the hangcheck
> >>interval (1500ms) so I don't quite understand what is the problem.
> >
> >We may queue the wait many, many seconds before it is even being
> >processed. The point of this timer is to detect when we haven't seen an
> >interrupt for some time and that happens to be the waiter missed (backup
> >for seqno-barrier failing).
> 
> But it is the same as a slow batch which completes just as the
> hangcheck timer fires.

Yes.

> Is the problem is seqno barrier on some platforms can slow down the
> signaller thread a lot?

Yes. Waiting for the seqno to become visible appears unbounded...

> Hm, if the request duration is right it
> would sleep on every invocation. So in effect extend the request
> duration as seen form userspace by the barrier duration. Why would
> that be a problem though? Haven't fully figured it out.

It's only a problem if we get interrupts in that case and no seqno
advance. But as we are getting the interrupts, we are at least checking
the seqno and doing a timer wakeup is not going to progress the waiter.

The premise behind the timer is that we will wakeup after an interrupt
and not see the seqno advance. (Or even worse, never see an interrupt!)
Only when the interrupts stop, and the waiter stops checking, do we need
to intervene.

I've rewritten this to explicitly check for stopped interrupts,
hopefully that makes this backup strategy clear.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] drm/i915: Postpone fake breadcrumb interrupt until real interrupts cease
  2017-02-16 11:13   ` [PATCH v2] drm/i915: Postpone fake breadcrumb interrupt until real interrupts cease Chris Wilson
@ 2017-02-16 11:42     ` Chris Wilson
  2017-02-16 11:44     ` Tvrtko Ursulin
  1 sibling, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2017-02-16 11:42 UTC (permalink / raw)
  To: intel-gfx

On Thu, Feb 16, 2017 at 11:13:47AM +0000, Chris Wilson wrote:
> When the timer expires for checking on interrupt processing, check to
> see if any interrupts arrived within the last time period. If real
> interrupts are still being delivered, we can be reassured that we
> haven't missed the final interrupt as the waiter will still be woken.
> Only once all activity ceases, do we have to worry about the waiter
> never being woken and so need to install a timer to kick the waiter for
> a slow arrival of a seqno.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99816

Bah, not relevant anymore. Ironlake misses are still present.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] drm/i915: Postpone fake breadcrumb interrupt until real interrupts cease
  2017-02-16 11:13   ` [PATCH v2] drm/i915: Postpone fake breadcrumb interrupt until real interrupts cease Chris Wilson
  2017-02-16 11:42     ` Chris Wilson
@ 2017-02-16 11:44     ` Tvrtko Ursulin
  2017-02-16 12:00       ` Chris Wilson
  1 sibling, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2017-02-16 11:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/02/2017 11:13, Chris Wilson wrote:
> When the timer expires for checking on interrupt processing, check to
> see if any interrupts arrived within the last time period. If real
> interrupts are still being delivered, we can be reassured that we
> haven't missed the final interrupt as the waiter will still be woken.
> Only once all activity ceases, do we have to worry about the waiter
> never being woken and so need to install a timer to kick the waiter for
> a slow arrival of a seqno.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=99816
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_irq.c          |  1 +
>  drivers/gpu/drm/i915/intel_breadcrumbs.c | 26 ++++++++++++--------------
>  drivers/gpu/drm/i915/intel_ringbuffer.h  |  3 ++-
>  3 files changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 1ce54601432b..d3c851c93ee3 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1037,6 +1037,7 @@ static void notify_ring(struct intel_engine_cs *engine)
>  	struct intel_wait *wait;
>
>  	trace_i915_gem_request_notify(engine);
> +	atomic_inc(&engine->irq_count);
>  	set_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted);
>
>  	rcu_read_lock();
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index ef3adfd37d7d..3267e671888c 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -26,6 +26,11 @@
>
>  #include "i915_drv.h"
>
> +static unsigned long wait_timeout(void)
> +{
> +	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
> +}
> +
>  static void intel_breadcrumbs_hangcheck(unsigned long data)
>  {
>  	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> @@ -36,8 +41,9 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
>  		return;
>  	}
>
> -	if (time_before(jiffies, b->timeout)) {
> -		mod_timer(&b->hangcheck, b->timeout);
> +	if (b->hangcheck_interrupts != atomic_read(&engine->irq_count)) {
> +		b->hangcheck_interrupts = atomic_read(&engine->irq_count);
> +		mod_timer(&b->hangcheck, wait_timeout());
>  		return;
>  	}
>
> @@ -57,11 +63,6 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
>  	i915_queue_hangcheck(engine->i915);
>  }
>
> -static unsigned long wait_timeout(void)
> -{
> -	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
> -}
> -
>  static void intel_breadcrumbs_fake_irq(unsigned long data)
>  {
>  	struct intel_engine_cs *engine = (struct intel_engine_cs *)data;
> @@ -176,8 +177,8 @@ static void __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
>  		i915_queue_hangcheck(i915);
>  	} else {
>  		/* Ensure we never sleep indefinitely */
> -		GEM_BUG_ON(!time_after(b->timeout, jiffies));
> -		mod_timer(&b->hangcheck, b->timeout);
> +		b->hangcheck_interrupts = atomic_read(&engine->irq_count);
> +		mod_timer(&b->hangcheck, wait_timeout());
>  	}
>  }
>
> @@ -270,7 +271,6 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
>  		GEM_BUG_ON(!next && !first);
>  		if (next && next != &wait->node) {
>  			GEM_BUG_ON(first);
> -			b->timeout = wait_timeout();
>  			b->first_wait = to_wait(next);
>  			/* As there is a delay between reading the current
>  			 * seqno, processing the completed tasks and selecting
> @@ -297,7 +297,6 @@ static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
>
>  	if (first) {
>  		GEM_BUG_ON(rb_first(&b->waiters) != &wait->node);
> -		b->timeout = wait_timeout();
>  		b->first_wait = wait;
>  		/* After assigning ourselves as the new bottom-half, we must
>  		 * perform a cursory check to prevent a missed interrupt.
> @@ -396,7 +395,6 @@ static void __intel_engine_remove_wait(struct intel_engine_cs *engine,
>  			 * the interrupt, or if we have to handle an
>  			 * exception rather than a seqno completion.
>  			 */
> -			b->timeout = wait_timeout();
>  			b->first_wait = to_wait(next);
>  			if (b->first_wait->seqno != wait->seqno)
>  				__intel_breadcrumbs_enable_irq(b);
> @@ -702,10 +700,10 @@ void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
>  		irq_disable(engine);
>
>  	if (intel_engine_has_waiter(engine)) {
> -		b->timeout = wait_timeout();
>  		if (test_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted))
>  			wake_up_process(b->first_wait->tsk);
> -		mod_timer(&b->hangcheck, b->timeout);
> +		b->hangcheck_interrupts = atomic_read(&engine->irq_count);
> +		mod_timer(&b->hangcheck, wait_timeout());
>  		/* sanitize the IMR and unmask any auxiliary interrupts */
>  	}
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index e036ee3e35f9..75b572a9e80a 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -213,6 +213,7 @@ struct intel_engine_cs {
>
>  	struct intel_render_state *render_state;
>
> +	atomic_t irq_count;
>  	unsigned long irq_posted;
>  #define ENGINE_IRQ_BREADCRUMB 0
>  #define ENGINE_IRQ_EXECLIST 1
> @@ -243,7 +244,7 @@ struct intel_engine_cs {
>  		struct timer_list fake_irq; /* used after a missed interrupt */
>  		struct timer_list hangcheck; /* detect missed interrupts */
>
> -		unsigned long timeout;
> +		unsigned int hangcheck_interrupts;
>
>  		bool irq_armed : 1;
>  		bool irq_enabled : 1;
>

I prefer this version by far. Especially since it keeps the normal 
hangcheck period as long as the interrupts are happening.

I suppose it could only get foiled if interrupts kept coming but the 
seqno staying put. Does this sounds like a concern?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] drm/i915: Postpone fake breadcrumb interrupt until real interrupts cease
  2017-02-16 11:44     ` Tvrtko Ursulin
@ 2017-02-16 12:00       ` Chris Wilson
  2017-02-16 12:15         ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2017-02-16 12:00 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Feb 16, 2017 at 11:44:31AM +0000, Tvrtko Ursulin wrote:
> 
> On 16/02/2017 11:13, Chris Wilson wrote:
> >When the timer expires for checking on interrupt processing, check to
> >see if any interrupts arrived within the last time period. If real
> >interrupts are still being delivered, we can be reassured that we
> >haven't missed the final interrupt as the waiter will still be woken.
> >Only once all activity ceases, do we have to worry about the waiter
> >never being woken and so need to install a timer to kick the waiter for
> >a slow arrival of a seqno.
> 
> I prefer this version by far. Especially since it keeps the normal
> hangcheck period as long as the interrupts are happening.
> 
> I suppose it could only get foiled if interrupts kept coming but the
> seqno staying put. Does this sounds like a concern?

The failure condition would be the hw gets stuck sending out
user-interrupts, even after the rings stop, and the seqno is stuck.

The seqno sticking even though the rings keep running (until the ring is
empty) is definitely a problem seen in some GPU hangs. But we haven't
been suceptible to a stuck user-interrupt before so I'm not sure if that
is feasible. I hope not.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] drm/i915: Postpone fake breadcrumb interrupt until real interrupts cease
  2017-02-16 12:00       ` Chris Wilson
@ 2017-02-16 12:15         ` Tvrtko Ursulin
  0 siblings, 0 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2017-02-16 12:15 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/02/2017 12:00, Chris Wilson wrote:
> On Thu, Feb 16, 2017 at 11:44:31AM +0000, Tvrtko Ursulin wrote:
>>
>> On 16/02/2017 11:13, Chris Wilson wrote:
>>> When the timer expires for checking on interrupt processing, check to
>>> see if any interrupts arrived within the last time period. If real
>>> interrupts are still being delivered, we can be reassured that we
>>> haven't missed the final interrupt as the waiter will still be woken.
>>> Only once all activity ceases, do we have to worry about the waiter
>>> never being woken and so need to install a timer to kick the waiter for
>>> a slow arrival of a seqno.
>>
>> I prefer this version by far. Especially since it keeps the normal
>> hangcheck period as long as the interrupts are happening.
>>
>> I suppose it could only get foiled if interrupts kept coming but the
>> seqno staying put. Does this sounds like a concern?
>
> The failure condition would be the hw gets stuck sending out
> user-interrupts, even after the rings stop, and the seqno is stuck.
>
> The seqno sticking even though the rings keep running (until the ring is
> empty) is definitely a problem seen in some GPU hangs. But we haven't
> been suceptible to a stuck user-interrupt before so I'm not sure if that
> is feasible. I hope not.

Sounds good to me.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt
  2017-02-16 11:21       ` Chris Wilson
@ 2017-02-16 12:47         ` Tvrtko Ursulin
  2017-02-16 13:48           ` Chris Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2017-02-16 12:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 16/02/2017 11:21, Chris Wilson wrote:
> On Thu, Feb 16, 2017 at 10:45:56AM +0000, Tvrtko Ursulin wrote:
>>
>> On 16/02/2017 10:36, Chris Wilson wrote:
>>> On Thu, Feb 16, 2017 at 10:21:17AM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 16/02/2017 09:29, Chris Wilson wrote:
>>>>> If an interrupt arrives whilst we are performing the irq-seqno barrier,
>>>>> recheck the seqno again before returning.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/i915/i915_drv.h | 6 ++++--
>>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>> index c1b400f1ede4..ecb8b414bdd2 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>> @@ -4102,6 +4102,9 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
>>>>> 	if (__i915_gem_request_completed(req, seqno))
>>>>> 		return true;
>>>>>
>>>>> +	if (!engine->irq_seqno_barrier)
>>>>> +		return false;
>>>>> +
>>>>> 	/* Ensure our read of the seqno is coherent so that we
>>>>> 	 * do not "miss an interrupt" (i.e. if this is the last
>>>>> 	 * request and the seqno write from the GPU is not visible
>>>>> @@ -4113,8 +4116,7 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
>>>>> 	 * but it is easier and safer to do it every time the waiter
>>>>> 	 * is woken.
>>>>> 	 */
>>>>> -	if (engine->irq_seqno_barrier &&
>>>>> -	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
>>>>> +	while (test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
>>>>> 		unsigned long flags;
>>>>>
>>>>> 		/* The ordering of irq_posted versus applying the barrier
>>>>>
>>>>
>>>> Hmmm.. if this helps it feels that there is a race somewhere.
>>>> Because we have processed one interrupt, but it wasn't for us.
>>>
>>> There may be many interrupts between now and the one we actually want.
>>> The caller of this function is (or was at the time) has the first seqno
>>> to be signaled, it is just not necessarily the next interrupt.
>>>
>>>> That
>>>> means there will be another one coming. Why handle that at this
>>>> level? Why check twice and not three, four, five times? :)
>>>
>>> Because they are all for us! This only saves a trip through schedule. If
>>> we don't loop here, we just loop at the next level.
>>
>> Yes, which looks more correct to me. Because the caller then do what
>> they want. Signaller just sleeps and i915_wait_request potentially
>> busy waits for a bit.
>
> A persuasive counter argument would be:
>
> Interrupts can be arriving frequently for an indefinite period before we
> see our seqno that breaks the loop. We should not allow this to spin
> indefinitely and at the very least need to be checking for pending
> interrupts or timeslice exhaustion. Both of which are handled in the
> callers.

I got lost in this branch of the thread. Are you making an argument for 
me doubting the patch, or for you supporting the patch? :)

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt
  2017-02-16 12:47         ` Tvrtko Ursulin
@ 2017-02-16 13:48           ` Chris Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2017-02-16 13:48 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Feb 16, 2017 at 12:47:41PM +0000, Tvrtko Ursulin wrote:
> 
> On 16/02/2017 11:21, Chris Wilson wrote:
> >On Thu, Feb 16, 2017 at 10:45:56AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 16/02/2017 10:36, Chris Wilson wrote:
> >>>On Thu, Feb 16, 2017 at 10:21:17AM +0000, Tvrtko Ursulin wrote:
> >>>>
> >>>>On 16/02/2017 09:29, Chris Wilson wrote:
> >>>>>If an interrupt arrives whilst we are performing the irq-seqno barrier,
> >>>>>recheck the seqno again before returning.
> >>>>>
> >>>>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>---
> >>>>>drivers/gpu/drm/i915/i915_drv.h | 6 ++++--
> >>>>>1 file changed, 4 insertions(+), 2 deletions(-)
> >>>>>
> >>>>>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>index c1b400f1ede4..ecb8b414bdd2 100644
> >>>>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>@@ -4102,6 +4102,9 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >>>>>	if (__i915_gem_request_completed(req, seqno))
> >>>>>		return true;
> >>>>>
> >>>>>+	if (!engine->irq_seqno_barrier)
> >>>>>+		return false;
> >>>>>+
> >>>>>	/* Ensure our read of the seqno is coherent so that we
> >>>>>	 * do not "miss an interrupt" (i.e. if this is the last
> >>>>>	 * request and the seqno write from the GPU is not visible
> >>>>>@@ -4113,8 +4116,7 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >>>>>	 * but it is easier and safer to do it every time the waiter
> >>>>>	 * is woken.
> >>>>>	 */
> >>>>>-	if (engine->irq_seqno_barrier &&
> >>>>>-	    test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>>>+	while (test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >>>>>		unsigned long flags;
> >>>>>
> >>>>>		/* The ordering of irq_posted versus applying the barrier
> >>>>>
> >>>>
> >>>>Hmmm.. if this helps it feels that there is a race somewhere.
> >>>>Because we have processed one interrupt, but it wasn't for us.
> >>>
> >>>There may be many interrupts between now and the one we actually want.
> >>>The caller of this function is (or was at the time) has the first seqno
> >>>to be signaled, it is just not necessarily the next interrupt.
> >>>
> >>>>That
> >>>>means there will be another one coming. Why handle that at this
> >>>>level? Why check twice and not three, four, five times? :)
> >>>
> >>>Because they are all for us! This only saves a trip through schedule. If
> >>>we don't loop here, we just loop at the next level.
> >>
> >>Yes, which looks more correct to me. Because the caller then do what
> >>they want. Signaller just sleeps and i915_wait_request potentially
> >>busy waits for a bit.
> >
> >A persuasive counter argument would be:
> >
> >Interrupts can be arriving frequently for an indefinite period before we
> >see our seqno that breaks the loop. We should not allow this to spin
> >indefinitely and at the very least need to be checking for pending
> >interrupts or timeslice exhaustion. Both of which are handled in the
> >callers.
> 
> I got lost in this branch of the thread. Are you making an argument
> for me doubting the patch, or for you supporting the patch? :)

Against the patch.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-02-16 13:48 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-16  9:29 [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt Chris Wilson
2017-02-16  9:29 ` [PATCH 2/2] drm/i915: Postpone fake breadcrumb interrupt if a real interrupt is pending Chris Wilson
2017-02-16 10:38   ` Tvrtko Ursulin
2017-02-16 10:47     ` Chris Wilson
2017-02-16 11:17       ` Tvrtko Ursulin
2017-02-16 11:40         ` Chris Wilson
2017-02-16 11:13   ` [PATCH v2] drm/i915: Postpone fake breadcrumb interrupt until real interrupts cease Chris Wilson
2017-02-16 11:42     ` Chris Wilson
2017-02-16 11:44     ` Tvrtko Ursulin
2017-02-16 12:00       ` Chris Wilson
2017-02-16 12:15         ` Tvrtko Ursulin
2017-02-16 10:21 ` [PATCH 1/2] drm/i915: Recheck breadcrumb seqno after an interrupt Tvrtko Ursulin
2017-02-16 10:36   ` Chris Wilson
2017-02-16 10:45     ` Tvrtko Ursulin
2017-02-16 10:50       ` Chris Wilson
2017-02-16 11:19         ` Tvrtko Ursulin
2017-02-16 11:34           ` Chris Wilson
2017-02-16 11:21       ` Chris Wilson
2017-02-16 12:47         ` Tvrtko Ursulin
2017-02-16 13:48           ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.