* [PATCH] drm/i915/gt: Disarm breadcrumbs if engines are already idle
@ 2024-04-23 16:23 Janusz Krzysztofik
  2024-04-26 16:13 ` Nirmoy Das
  0 siblings, 1 reply; 3+ messages in thread
From: Janusz Krzysztofik @ 2024-04-23 16:23 UTC (permalink / raw)
  To: intel-gfx
  Cc: dri-devel, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Andrzej Hajda, Andi Shyti, Chris Wilson,
	Janusz Krzysztofik

From: Chris Wilson <chris@chris-wilson.co.uk>

The breadcrumbs use a GT wakeref for guarding the interrupt, but are
disarmed during release of the engine wakeref. This leaves a hole where
we may attach a breadcrumb just as the engine is parking (after it has
parked its breadcrumbs), execute the irq worker with some signalers still
attached, but never be woken again.

That issue manifests itself in CI with IGT runner timeouts while tests
are waiting indefinitely for release of all GT wakerefs.

<6> [209.151778] i915: Running live_engine_pm_selftests/live_engine_busy_stats
<7> [209.231628] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_5
<7> [209.231816] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_4
<7> [209.231944] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_3
<7> [209.232056] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_2
<7> [209.232166] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling DC_off
<7> [209.232270] i915 0000:00:02.0: [drm:skl_enable_dc6 [i915]] Enabling DC6
<7> [209.232368] i915 0000:00:02.0: [drm:gen9_set_dc_state.part.0 [i915]] Setting DC state from 00 to 02
<4> [299.356116] [IGT] Inactivity timeout exceeded. Killing the current test with SIGQUIT.
...
<6> [299.356526] sysrq: Show State
...
<6> [299.373964] task:i915_selftest   state:D stack:11784 pid:5578  tgid:5578  ppid:873    flags:0x00004002
<6> [299.373967] Call Trace:
<6> [299.373968]  <TASK>
<6> [299.373970]  __schedule+0x3bb/0xda0
<6> [299.373974]  schedule+0x41/0x110
<6> [299.373976]  intel_wakeref_wait_for_idle+0x82/0x100 [i915]
<6> [299.374083]  ? __pfx_var_wake_function+0x10/0x10
<6> [299.374087]  live_engine_busy_stats+0x9b/0x500 [i915]
<6> [299.374173]  __i915_subtests+0xbe/0x240 [i915]
<6> [299.374277]  ? __pfx___intel_gt_live_setup+0x10/0x10 [i915]
<6> [299.374369]  ? __pfx___intel_gt_live_teardown+0x10/0x10 [i915]
<6> [299.374456]  intel_engine_live_selftests+0x1c/0x30 [i915]
<6> [299.374547]  __run_selftests+0xbb/0x190 [i915]
<6> [299.374635]  i915_live_selftests+0x4b/0x90 [i915]
<6> [299.374717]  i915_pci_probe+0x10d/0x210 [i915]

At the end of the interrupt worker, if there are no more engines awake,
disarm the breadcrumb and go to sleep.

Fixes: 9d5612ca165a ("drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/10026
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Cc: <stable@vger.kernel.org> # v5.12+
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index d650beb8ed22f..20b9b04ec1e0b 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -263,8 +263,13 @@ static void signal_irq_work(struct irq_work *work)
 		i915_request_put(rq);
 	}
 
+	/* Lazy irq enabling after HW submission */
 	if (!READ_ONCE(b->irq_armed) && !list_empty(&b->signalers))
 		intel_breadcrumbs_arm_irq(b);
+
+	/* And confirm that we still want irqs enabled before we yield */
+	if (READ_ONCE(b->irq_armed) && !atomic_read(&b->active))
+		intel_breadcrumbs_disarm_irq(b);
 }
 
 struct intel_breadcrumbs *
@@ -315,13 +320,7 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 		return;
 
 	/* Kick the work once more to drain the signalers, and disarm the irq */
-	irq_work_sync(&b->irq_work);
-	while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) {
-		local_irq_disable();
-		signal_irq_work(&b->irq_work);
-		local_irq_enable();
-		cond_resched();
-	}
+	irq_work_queue(&b->irq_work);
 }
 
 void intel_breadcrumbs_free(struct kref *kref)
@@ -404,7 +403,7 @@ static void insert_breadcrumb(struct i915_request *rq)
 	 * the request as it may have completed and raised the interrupt as
 	 * we were attaching it into the lists.
 	 */
-	if (!b->irq_armed || __i915_request_is_complete(rq))
+	if (!READ_ONCE(b->irq_armed) || __i915_request_is_complete(rq))
 		irq_work_queue(&b->irq_work);
 }
 
-- 
2.44.0
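
For readers following the thread, the worker-tail logic described in the commit message can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the driver code: `struct model_breadcrumbs`, `model_arm_irq()`, `model_disarm_irq()`, `model_worker_tail()` and the `with_fix` flag are hypothetical names. The GT wakeref is modeled as a plain counter, and locking and READ_ONCE() are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct intel_breadcrumbs. */
struct model_breadcrumbs {
	bool irq_armed;   /* interrupt enabled on behalf of waiters */
	int active;       /* awake engines using these breadcrumbs */
	int signalers;    /* contexts with requests still awaiting a signal */
	int gt_wakerefs;  /* GT wakerefs held to guard the armed interrupt */
};

/* Arming is modeled as taking one GT wakeref. */
static void model_arm_irq(struct model_breadcrumbs *b)
{
	assert(!b->irq_armed);
	b->gt_wakerefs++;
	b->irq_armed = true;
}

/* Disarming is modeled as dropping that wakeref again. */
static void model_disarm_irq(struct model_breadcrumbs *b)
{
	assert(b->irq_armed);
	b->irq_armed = false;
	b->gt_wakerefs--;
}

/*
 * Simplified tail of the irq worker. with_fix selects whether the
 * extra check added by the patch is applied.
 */
static void model_worker_tail(struct model_breadcrumbs *b, bool with_fix)
{
	/* Lazy irq enabling after HW submission */
	if (!b->irq_armed && b->signalers > 0)
		model_arm_irq(b);

	/* Patched behavior: no engine is awake, so do not stay armed */
	if (with_fix && b->irq_armed && b->active == 0)
		model_disarm_irq(b);
}
```

In this model, running the tail with one signaler attached and `active == 0` leaves `gt_wakerefs` at 1 without the extra check and at 0 with it, which corresponds to the wakeref that `intel_wakeref_wait_for_idle()` is seen waiting on in the trace above.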



* Re: [PATCH] drm/i915/gt: Disarm breadcrumbs if engines are already idle
  2024-04-23 16:23 [PATCH] drm/i915/gt: Disarm breadcrumbs if engines are already idle Janusz Krzysztofik
@ 2024-04-26 16:13 ` Nirmoy Das
  2024-04-29  9:20   ` Janusz Krzysztofik
  0 siblings, 1 reply; 3+ messages in thread
From: Nirmoy Das @ 2024-04-26 16:13 UTC (permalink / raw)
  To: Janusz Krzysztofik, intel-gfx
  Cc: dri-devel, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Andrzej Hajda, Andi Shyti, Chris Wilson


On 4/23/2024 6:23 PM, Janusz Krzysztofik wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
>
> The breadcrumbs use a GT wakeref for guarding the interrupt, but are
> disarmed during release of the engine wakeref. This leaves a hole where
> we may attach a breadcrumb just as the engine is parking (after it has
> parked its breadcrumbs), execute the irq worker with some signalers still
> attached, but never be woken again.
>
> That issue manifests itself in CI with IGT runner timeouts while tests
> are waiting indefinitely for release of all GT wakerefs.
>
> <6> [209.151778] i915: Running live_engine_pm_selftests/live_engine_busy_stats
> <7> [209.231628] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_5
> <7> [209.231816] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_4
> <7> [209.231944] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_3
> <7> [209.232056] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_2
> <7> [209.232166] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling DC_off
> <7> [209.232270] i915 0000:00:02.0: [drm:skl_enable_dc6 [i915]] Enabling DC6
> <7> [209.232368] i915 0000:00:02.0: [drm:gen9_set_dc_state.part.0 [i915]] Setting DC state from 00 to 02
> <4> [299.356116] [IGT] Inactivity timeout exceeded. Killing the current test with SIGQUIT.
> ...
> <6> [299.356526] sysrq: Show State
> ...
> <6> [299.373964] task:i915_selftest   state:D stack:11784 pid:5578  tgid:5578  ppid:873    flags:0x00004002
> <6> [299.373967] Call Trace:
> <6> [299.373968]  <TASK>
> <6> [299.373970]  __schedule+0x3bb/0xda0
> <6> [299.373974]  schedule+0x41/0x110
> <6> [299.373976]  intel_wakeref_wait_for_idle+0x82/0x100 [i915]
> <6> [299.374083]  ? __pfx_var_wake_function+0x10/0x10
> <6> [299.374087]  live_engine_busy_stats+0x9b/0x500 [i915]
> <6> [299.374173]  __i915_subtests+0xbe/0x240 [i915]
> <6> [299.374277]  ? __pfx___intel_gt_live_setup+0x10/0x10 [i915]
> <6> [299.374369]  ? __pfx___intel_gt_live_teardown+0x10/0x10 [i915]
> <6> [299.374456]  intel_engine_live_selftests+0x1c/0x30 [i915]
> <6> [299.374547]  __run_selftests+0xbb/0x190 [i915]
> <6> [299.374635]  i915_live_selftests+0x4b/0x90 [i915]
> <6> [299.374717]  i915_pci_probe+0x10d/0x210 [i915]
>
> At the end of the interrupt worker, if there are no more engines awake,
> disarm the breadcrumb and go to sleep.
>
> Fixes: 9d5612ca165a ("drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10026
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> Cc: <stable@vger.kernel.org> # v5.12+
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>


Acked-by: Nirmoy Das <nirmoy.das@intel.com>

I will let others/Andrzej r-b this as I am not very familiar with the code.


Thanks,

Nirmoy



* Re: [PATCH] drm/i915/gt: Disarm breadcrumbs if engines are already idle
  2024-04-26 16:13 ` Nirmoy Das
@ 2024-04-29  9:20   ` Janusz Krzysztofik
  0 siblings, 0 replies; 3+ messages in thread
From: Janusz Krzysztofik @ 2024-04-29  9:20 UTC (permalink / raw)
  To: Andrzej Hajda
  Cc: intel-gfx, Nirmoy Das, dri-devel, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, Andi Shyti, Chris Wilson

Hi Andrzej,

On Friday, 26 April 2024 18:13:02 CEST Nirmoy Das wrote:
> 
> On 4/23/2024 6:23 PM, Janusz Krzysztofik wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> >
> > The breadcrumbs use a GT wakeref for guarding the interrupt, but are
> > disarmed during release of the engine wakeref. This leaves a hole where
> > we may attach a breadcrumb just as the engine is parking (after it has
> > parked its breadcrumbs), execute the irq worker with some signalers still
> > attached, but never be woken again.
> >
> > That issue manifests itself in CI with IGT runner timeouts while tests
> > are waiting indefinitely for release of all GT wakerefs.
> >
> > <6> [209.151778] i915: Running live_engine_pm_selftests/live_engine_busy_stats
> > <7> [209.231628] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_5
> > <7> [209.231816] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_4
> > <7> [209.231944] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_3
> > <7> [209.232056] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_2
> > <7> [209.232166] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling DC_off
> > <7> [209.232270] i915 0000:00:02.0: [drm:skl_enable_dc6 [i915]] Enabling DC6
> > <7> [209.232368] i915 0000:00:02.0: [drm:gen9_set_dc_state.part.0 [i915]] Setting DC state from 00 to 02
> > <4> [299.356116] [IGT] Inactivity timeout exceeded. Killing the current test with SIGQUIT.
> > ...
> > <6> [299.356526] sysrq: Show State
> > ...
> > <6> [299.373964] task:i915_selftest   state:D stack:11784 pid:5578  tgid:5578  ppid:873    flags:0x00004002
> > <6> [299.373967] Call Trace:
> > <6> [299.373968]  <TASK>
> > <6> [299.373970]  __schedule+0x3bb/0xda0
> > <6> [299.373974]  schedule+0x41/0x110
> > <6> [299.373976]  intel_wakeref_wait_for_idle+0x82/0x100 [i915]
> > <6> [299.374083]  ? __pfx_var_wake_function+0x10/0x10
> > <6> [299.374087]  live_engine_busy_stats+0x9b/0x500 [i915]
> > <6> [299.374173]  __i915_subtests+0xbe/0x240 [i915]
> > <6> [299.374277]  ? __pfx___intel_gt_live_setup+0x10/0x10 [i915]
> > <6> [299.374369]  ? __pfx___intel_gt_live_teardown+0x10/0x10 [i915]
> > <6> [299.374456]  intel_engine_live_selftests+0x1c/0x30 [i915]
> > <6> [299.374547]  __run_selftests+0xbb/0x190 [i915]
> > <6> [299.374635]  i915_live_selftests+0x4b/0x90 [i915]
> > <6> [299.374717]  i915_pci_probe+0x10d/0x210 [i915]
> >
> > At the end of the interrupt worker, if there are no more engines awake,
> > disarm the breadcrumb and go to sleep.
> >
> > Fixes: 9d5612ca165a ("drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission")
> > Closes: https://gitlab.freedesktop.org/drm/intel/issues/10026
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Andrzej Hajda <andrzej.hajda@intel.com>
> > Cc: <stable@vger.kernel.org> # v5.12+
> > Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> 
> 
> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
> 
> I will let others/Andrzej r-b this as I am not very familiar with the code.

This patch should be familiar to you. Could you please take a look?

Thanks,
Janusz

