All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
@ 2016-07-21  6:57 Chris Wilson
  2016-07-21  6:57 ` [PATCH v2 2/3] drm/i915: Update the breadcrumb interrupt counter before enabling Chris Wilson
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Chris Wilson @ 2016-07-21  6:57 UTC (permalink / raw)
  To: intel-gfx

During the idle-worker we disable the hangcheck and so kick any waiters
that should have been completed (since the GPU is now idle). Unlike the
hangcheck, we do not take any care to avoid the race between the irq
handler and ourselves, and so it is possible for us to declare a missed
interrupt even as the bottom-half is being scheduled to run. Let's
ignore this race to stop a potential false-positive error.

References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 40047eb48826..9e826585edb2 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	rearm_hangcheck = false;
 
 	stuck_engines = intel_kick_waiters(dev_priv);
-	if (unlikely(stuck_engines)) {
-		DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
-		dev_priv->gpu_error.missed_irq_rings |= stuck_engines;
-	}
+	if (unlikely(stuck_engines))
+		DRM_DEBUG_DRIVER("kicked stuck waiters (%x)...missed irq?\n",
+				 stuck_engines);
 
 	if (INTEL_GEN(dev_priv) >= 6)
 		gen6_rps_idle(dev_priv);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/3] drm/i915: Update the breadcrumb interrupt counter before enabling
  2016-07-21  6:57 [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Chris Wilson
@ 2016-07-21  6:57 ` Chris Wilson
  2016-07-21  6:57 ` [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt Chris Wilson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-07-21  6:57 UTC (permalink / raw)
  To: intel-gfx

In order to close a race with a long running hangcheck comparing a stale
interrupt counter with a just started waiter, we need to first bump the
counter as we start the fresh wait.

References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index f0b56e3f4abe..d893ccdd62ac 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -51,6 +51,13 @@ static void irq_enable(struct intel_engine_cs *engine)
 	 */
 	engine->breadcrumbs.irq_posted = true;
 
+	/* Make sure the current hangcheck doesn't falsely accuse a just
+	 * started irq handler from missing an interrupt (because the
+	 * interrupt count still matches the stale value from when
+	 * the irq handler was disabled, many hangchecks ago).
+	 */
+	engine->breadcrumbs.irq_wakeups++;
+
 	spin_lock_irq(&engine->i915->irq_lock);
 	engine->irq_enable(engine);
 	spin_unlock_irq(&engine->i915->irq_lock);
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt
  2016-07-21  6:57 [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Chris Wilson
  2016-07-21  6:57 ` [PATCH v2 2/3] drm/i915: Update the breadcrumb interrupt counter before enabling Chris Wilson
@ 2016-07-21  6:57 ` Chris Wilson
  2016-07-21 18:34   ` Chris Wilson
  2016-07-22  7:57   ` Chris Wilson
  2016-07-21  7:55 ` ✓ Ro.CI.BAT: success for series starting with [v2,1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Patchwork
  2016-07-21  9:58 ` [PATCH v2 1/3] " Tvrtko Ursulin
  3 siblings, 2 replies; 12+ messages in thread
From: Chris Wilson @ 2016-07-21  6:57 UTC (permalink / raw)
  To: intel-gfx

As the interrupt wakeup counter only increments when we have a waiter,
before testing to see if that counter is unchanged we have to first
check that we do expect it to change (i.e. we have a waiter).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 7104dc1463eb..45afcdfe89b1 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3062,7 +3062,9 @@ static unsigned long kick_waiters(struct intel_engine_cs *engine)
 	struct drm_i915_private *i915 = engine->i915;
 	unsigned long irq_count = READ_ONCE(engine->breadcrumbs.irq_wakeups);
 
-	if (engine->hangcheck.user_interrupts == irq_count &&
+	rcu_read_lock();
+	if (intel_engine_wakeup(engine) &&
+	    engine->hangcheck.user_interrupts == irq_count &&
 	    !test_and_set_bit(engine->id, &i915->gpu_error.missed_irq_rings)) {
 		if (!test_bit(engine->id, &i915->gpu_error.test_irq_rings))
 			DRM_ERROR("Hangcheck timer elapsed... %s idle\n",
@@ -3070,6 +3072,7 @@ static unsigned long kick_waiters(struct intel_engine_cs *engine)
 
 		intel_engine_enable_fake_irq(engine);
 	}
+	rcu_read_unlock();
 
 	return irq_count;
 }
-- 
2.8.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* ✓ Ro.CI.BAT: success for series starting with [v2,1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
  2016-07-21  6:57 [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Chris Wilson
  2016-07-21  6:57 ` [PATCH v2 2/3] drm/i915: Update the breadcrumb interrupt counter before enabling Chris Wilson
  2016-07-21  6:57 ` [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt Chris Wilson
@ 2016-07-21  7:55 ` Patchwork
  2016-07-21  9:58 ` [PATCH v2 1/3] " Tvrtko Ursulin
  3 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2016-07-21  7:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [v2,1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
URL   : https://patchwork.freedesktop.org/series/10124/
State : success

== Summary ==

Series 10124v1 Series without cover letter
http://patchwork.freedesktop.org/api/1.0/series/10124/revisions/1/mbox


fi-hsw-i7-4770k  total:244  pass:216  dwarn:0   dfail:0   fail:8   skip:20 
fi-kbl-qkkr      total:244  pass:180  dwarn:29  dfail:0   fail:8   skip:27 
fi-skl-i5-6260u  total:244  pass:224  dwarn:0   dfail:0   fail:8   skip:12 
fi-skl-i7-6700k  total:244  pass:210  dwarn:0   dfail:0   fail:8   skip:26 
fi-snb-i7-2600   total:244  pass:196  dwarn:0   dfail:0   fail:8   skip:40 
ro-bdw-i5-5250u  total:244  pass:219  dwarn:4   dfail:0   fail:8   skip:13 
ro-bdw-i7-5557U  total:244  pass:220  dwarn:2   dfail:0   fail:8   skip:14 
ro-bdw-i7-5600u  total:244  pass:204  dwarn:0   dfail:0   fail:8   skip:32 
ro-bsw-n3050     total:218  pass:173  dwarn:0   dfail:0   fail:2   skip:42 
ro-byt-n2820     total:244  pass:197  dwarn:0   dfail:0   fail:9   skip:38 
ro-hsw-i3-4010u  total:244  pass:212  dwarn:0   dfail:0   fail:8   skip:24 
ro-hsw-i7-4770r  total:244  pass:212  dwarn:0   dfail:0   fail:8   skip:24 
ro-ilk-i7-620lm  total:244  pass:172  dwarn:0   dfail:0   fail:9   skip:63 
ro-ilk1-i5-650   total:239  pass:172  dwarn:0   dfail:0   fail:9   skip:58 
ro-ivb-i7-3770   total:244  pass:203  dwarn:0   dfail:0   fail:8   skip:33 
ro-skl3-i5-6260u total:244  pass:224  dwarn:0   dfail:0   fail:8   skip:12 
ro-snb-i7-2620M  total:244  pass:193  dwarn:0   dfail:0   fail:9   skip:42 

Results at /archive/results/CI_IGT_test/RO_Patchwork_1555/

621e9dc drm-intel-nightly: 2016y-07m-20d-19h-19m-37s UTC integration manifest
baf6dda drm/i915: Check for a stuck waiter before a missed interrupt
9bbfde3 drm/i915: Update the breadcrumb interrupt counter before enabling
9741fde drm/i915: Drop racy markup of missed-irqs from idle-worker

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
  2016-07-21  6:57 [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Chris Wilson
                   ` (2 preceding siblings ...)
  2016-07-21  7:55 ` ✓ Ro.CI.BAT: success for series starting with [v2,1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Patchwork
@ 2016-07-21  9:58 ` Tvrtko Ursulin
  2016-07-21 10:10   ` Chris Wilson
  3 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-07-21  9:58 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/07/16 07:57, Chris Wilson wrote:
> During the idle-worker we disable the hangcheck and so kick any waiters
> that should have been completed (since the GPU is now idle). Unlike the
> hangcheck, we do not take any care to avoid the race between the irq
> handler and ourselves, and so it is possible for us to declare a missed
> interrupt even as the bottom-half is being scheduled to run. Let's
> ignore this race to stop a potential false-positive error.

If the bottom half is scheduled to run then then..

> References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 40047eb48826..9e826585edb2 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
>   	rearm_hangcheck = false;
>
>   	stuck_engines = intel_kick_waiters(dev_priv);

... this will not return a stucked engine since the there is a bh task 
assigned all until the bh exits.

So I don't get it. :)

Regards,

Tvrtko

> -	if (unlikely(stuck_engines)) {
> -		DRM_DEBUG_DRIVER("kicked stuck waiters...missed irq\n");
> -		dev_priv->gpu_error.missed_irq_rings |= stuck_engines;
> -	}
> +	if (unlikely(stuck_engines))
> +		DRM_DEBUG_DRIVER("kicked stuck waiters (%x)...missed irq?\n",
> +				 stuck_engines);
>
>   	if (INTEL_GEN(dev_priv) >= 6)
>   		gen6_rps_idle(dev_priv);
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
  2016-07-21  9:58 ` [PATCH v2 1/3] " Tvrtko Ursulin
@ 2016-07-21 10:10   ` Chris Wilson
  2016-07-21 10:28     ` Tvrtko Ursulin
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-07-21 10:10 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:
> 
> On 21/07/16 07:57, Chris Wilson wrote:
> >During the idle-worker we disable the hangcheck and so kick any waiters
> >that should have been completed (since the GPU is now idle). Unlike the
> >hangcheck, we do not take any care to avoid the race between the irq
> >handler and ourselves, and so it is possible for us to declare a missed
> >interrupt even as the bottom-half is being scheduled to run. Let's
> >ignore this race to stop a potential false-positive error.
> 
> If the bottom half is scheduled to run then then..
> 
> >References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >---
> >  drivers/gpu/drm/i915/i915_gem.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 40047eb48826..9e826585edb2 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
> >  	rearm_hangcheck = false;
> >
> >  	stuck_engines = intel_kick_waiters(dev_priv);
> 
> ... this will not return a stucked engine since the there is a bh
> task assigned all until the bh exits.

It reports if it wakes up a waiter on any engine. If the bh is already
running, we cannot know if it has missed the seqno update. If it isn't
running yet, we cannot know if it is about to be run.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
  2016-07-21 10:10   ` Chris Wilson
@ 2016-07-21 10:28     ` Tvrtko Ursulin
  2016-07-21 11:04       ` Chris Wilson
  0 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-07-21 10:28 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Joonas Lahtinen


On 21/07/16 11:10, Chris Wilson wrote:
> On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:
>>
>> On 21/07/16 07:57, Chris Wilson wrote:
>>> During the idle-worker we disable the hangcheck and so kick any waiters
>>> that should have been completed (since the GPU is now idle). Unlike the
>>> hangcheck, we do not take any care to avoid the race between the irq
>>> handler and ourselves, and so it is possible for us to declare a missed
>>> interrupt even as the bottom-half is being scheduled to run. Let's
>>> ignore this race to stop a potential false-positive error.
>>
>> If the bottom half is scheduled to run then then..
>>
>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_gem.c | 7 +++----
>>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 40047eb48826..9e826585edb2 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
>>>   	rearm_hangcheck = false;
>>>
>>>   	stuck_engines = intel_kick_waiters(dev_priv);
>>
>> ... this will not return a stucked engine since the there is a bh
>> task assigned all until the bh exits.
>
> It reports if it wakes up a waiter on any engine. If the bh is already
> running, we cannot know if it has missed the seqno update. If it isn't
> running yet, we cannot know if it is about to be run.

Oh I read the logic as completely opposite than what it is.

Since the idle worker runs 100ms after last retirement, that would mean 
a really slow waiter or what?

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
  2016-07-21 10:28     ` Tvrtko Ursulin
@ 2016-07-21 11:04       ` Chris Wilson
  2016-07-22 10:10         ` Tvrtko Ursulin
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2016-07-21 11:04 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Thu, Jul 21, 2016 at 11:28:02AM +0100, Tvrtko Ursulin wrote:
> 
> On 21/07/16 11:10, Chris Wilson wrote:
> >On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 21/07/16 07:57, Chris Wilson wrote:
> >>>During the idle-worker we disable the hangcheck and so kick any waiters
> >>>that should have been completed (since the GPU is now idle). Unlike the
> >>>hangcheck, we do not take any care to avoid the race between the irq
> >>>handler and ourselves, and so it is possible for us to declare a missed
> >>>interrupt even as the bottom-half is being scheduled to run. Let's
> >>>ignore this race to stop a potential false-positive error.
> >>
> >>If the bottom half is scheduled to run then then..
> >>
> >>>References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
> >>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >>>Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem.c | 7 +++----
> >>>  1 file changed, 3 insertions(+), 4 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>index 40047eb48826..9e826585edb2 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>@@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
> >>>  	rearm_hangcheck = false;
> >>>
> >>>  	stuck_engines = intel_kick_waiters(dev_priv);
> >>
> >>... this will not return a stucked engine since the there is a bh
> >>task assigned all until the bh exits.
> >
> >It reports if it wakes up a waiter on any engine. If the bh is already
> >running, we cannot know if it has missed the seqno update. If it isn't
> >running yet, we cannot know if it is about to be run.
> 
> Oh I read the logic as completely opposite than what it is.
> 
> Since the idle worker runs 100ms after last retirement, that would
> mean a really slow waiter or what?

It is dubious. But the idle worker runs 100ms after the first time we
detect all engines are idle and may be running as we detect all engines
are idle again. The only thing for sure is that in some cases that bdw-u
is reaching the idle-worker with an unwoken engine (and that there is
a race here in declaring it as a missed interrupt). I wasn't that
concerned about the race because of that 100ms delay where eveything
should have been idle, but on reflection that 100ms is not guarranteed.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt
  2016-07-21  6:57 ` [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt Chris Wilson
@ 2016-07-21 18:34   ` Chris Wilson
  2016-07-22  7:57   ` Chris Wilson
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-07-21 18:34 UTC (permalink / raw)
  To: intel-gfx

On Thu, Jul 21, 2016 at 07:57:39AM +0100, Chris Wilson wrote:
> As the interrupt wakeup counter only increments when we have a waiter,
> before testing to see if that counter is unchanged we have to first
> check that we do expect it to change (i.e. we have a waiter).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Fwiw, this series got a 10/10 pass rate for that bdw-u. It's not much,
but maybe an improvement?
https://bugs.freedesktop.org/show_bug.cgi?id=96974#c5
Tested-by: Humberto Israel Perez Rodriguez <humberto.i.perez.rodriguez@intel.com>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt
  2016-07-21  6:57 ` [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt Chris Wilson
  2016-07-21 18:34   ` Chris Wilson
@ 2016-07-22  7:57   ` Chris Wilson
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-07-22  7:57 UTC (permalink / raw)
  To: intel-gfx

On Thu, Jul 21, 2016 at 07:57:39AM +0100, Chris Wilson wrote:
> As the interrupt wakeup counter only increments when we have a waiter,
> before testing to see if that counter is unchanged we have to first
> check that we do expect it to change (i.e. we have a waiter).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 7104dc1463eb..45afcdfe89b1 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -3062,7 +3062,9 @@ static unsigned long kick_waiters(struct intel_engine_cs *engine)
>  	struct drm_i915_private *i915 = engine->i915;
>  	unsigned long irq_count = READ_ONCE(engine->breadcrumbs.irq_wakeups);
>  
> -	if (engine->hangcheck.user_interrupts == irq_count &&
> +	rcu_read_lock();
> +	if (intel_engine_wakeup(engine) &&
> +	    engine->hangcheck.user_interrupts == irq_count &&

Sigh. Completely nerfs the detection of stuck waiters.
Should be
	if (engine->hangcheck.user_interrupts == irq_count &&
	    intel_engine_wakeup(engine) &&

The test itself doesn't imply a missed interrupt either, there be a
valid long lived batch causing a delay in the waiter. We can do better
if we allow ourselves to take a spinlock here.
-Chris


-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
  2016-07-21 11:04       ` Chris Wilson
@ 2016-07-22 10:10         ` Tvrtko Ursulin
  2016-07-22 10:22           ` Chris Wilson
  0 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2016-07-22 10:10 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Joonas Lahtinen


On 21/07/16 12:04, Chris Wilson wrote:
> On Thu, Jul 21, 2016 at 11:28:02AM +0100, Tvrtko Ursulin wrote:
>> On 21/07/16 11:10, Chris Wilson wrote:
>>> On Thu, Jul 21, 2016 at 10:58:05AM +0100, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/07/16 07:57, Chris Wilson wrote:
>>>>> During the idle-worker we disable the hangcheck and so kick any waiters
>>>>> that should have been completed (since the GPU is now idle). Unlike the
>>>>> hangcheck, we do not take any care to avoid the race between the irq
>>>>> handler and ourselves, and so it is possible for us to declare a missed
>>>>> interrupt even as the bottom-half is being scheduled to run. Let's
>>>>> ignore this race to stop a potential false-positive error.
>>>>
>>>> If the bottom half is scheduled to run then then..
>>>>
>>>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_gem.c | 7 +++----
>>>>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>>> index 40047eb48826..9e826585edb2 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>> @@ -2706,10 +2706,9 @@ i915_gem_idle_work_handler(struct work_struct *work)
>>>>>   	rearm_hangcheck = false;
>>>>>
>>>>>   	stuck_engines = intel_kick_waiters(dev_priv);
>>>>
>>>> ... this will not return a stucked engine since the there is a bh
>>>> task assigned all until the bh exits.
>>>
>>> It reports if it wakes up a waiter on any engine. If the bh is already
>>> running, we cannot know if it has missed the seqno update. If it isn't
>>> running yet, we cannot know if it is about to be run.
>>
>> Oh I read the logic as completely opposite than what it is.
>>
>> Since the idle worker runs 100ms after last retirement, that would
>> mean a really slow waiter or what?
>
> It is dubious. But the idle worker runs 100ms after the first time we
> detect all engines are idle and may be running as we detect all engines
> are idle again. The only thing for sure is that in some cases that bdw-u
> is reaching the idle-worker with an unwoken engine (and that there is
> a race here in declaring it as a missed interrupt). I wasn't that
> concerned about the race because of that 100ms delay where eveything
> should have been idle, but on reflection that 100ms is not guarranteed.

Would canceling the idle worker be to expensive?

Either way, looks OK to me.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker
  2016-07-22 10:10         ` Tvrtko Ursulin
@ 2016-07-22 10:22           ` Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2016-07-22 10:22 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

On Fri, Jul 22, 2016 at 11:10:28AM +0100, Tvrtko Ursulin wrote:
> 
> Would canceling the idle worker be to expensive?

It wasn't as much as that, I was trying to keep runtime suspend simple.
In that the GT takes the wakelock to prevent suspend as required and
not have the knowledge about all the users of the device inside runtime
management callbacks. (It means the users then have to be concious that
if they don't hold an explicit wakelock, they should check rpm first.)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-07-22 10:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-21  6:57 [PATCH v2 1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Chris Wilson
2016-07-21  6:57 ` [PATCH v2 2/3] drm/i915: Update the breadcrumb interrupt counter before enabling Chris Wilson
2016-07-21  6:57 ` [PATCH v2 3/3] drm/i915: Check for a stuck waiter before a missed interrupt Chris Wilson
2016-07-21 18:34   ` Chris Wilson
2016-07-22  7:57   ` Chris Wilson
2016-07-21  7:55 ` ✓ Ro.CI.BAT: success for series starting with [v2,1/3] drm/i915: Drop racy markup of missed-irqs from idle-worker Patchwork
2016-07-21  9:58 ` [PATCH v2 1/3] " Tvrtko Ursulin
2016-07-21 10:10   ` Chris Wilson
2016-07-21 10:28     ` Tvrtko Ursulin
2016-07-21 11:04       ` Chris Wilson
2016-07-22 10:10         ` Tvrtko Ursulin
2016-07-22 10:22           ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.