stable.vger.kernel.org archive mirror
* [PATCH] drm/i915/gt: Delay execlist processing for tgl
@ 2020-10-15 19:50 Chris Wilson
  2020-10-16  1:08 ` [Intel-gfx] " Shi, Yang A
  2020-10-16  7:07 ` Mika Kuoppala
  0 siblings, 2 replies; 3+ messages in thread
From: Chris Wilson @ 2020-10-15 19:50 UTC (permalink / raw)
  To: intel-gfx
  Cc: Chris Wilson, Mika Kuoppala, Bruce Chang, Joonas Lahtinen, stable

When running gem_exec_nop, it floods the system with many requests (with
the goal of userspace submitting faster than the HW can process a single
empty batch). This causes the driver to continually resubmit new
requests onto the end of an active context, a flood of lite-restore
preemptions. If we time this just right, Tigerlake hangs.

Inserting a small delay between the processing of CS events and
submitting the next context prevents the hang. Naturally, the hang does
not occur with debugging enabled. The suspicion then is that this is
related to the issues with the CS event buffer, and inserting an mmio
read of the CS pointer status appears to be very successful in
preventing the hang. Other registers, or uncached reads, or plain mb, do
not prevent the hang, suggesting that register is key -- but that the
hang can be prevented by a simple udelay suggests it is just a timing
issue like that encountered by commit 233c1ae3c83f ("drm/i915/gt: Wait
for CSB entries on Tigerlake"). Also note that the hang is not prevented
by applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU
between requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 6170f6874f52..d15d561152ba 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2711,6 +2711,9 @@ static void process_csb(struct intel_engine_cs *engine)
 			smp_wmb(); /* complete the seqlock */
 			WRITE_ONCE(execlists->active, execlists->inflight);
 
+			/* Magic delay for tgl */
+			ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR);
+
 			WRITE_ONCE(execlists->pending[0], NULL);
 		} else {
 			if (GEM_WARN_ON(!*execlists->active)) {
-- 
2.20.1
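
For readers unfamiliar with the idiom, the ordering that the hunk above
introduces can be modelled in plain userspace C. This is only a sketch
under stated assumptions, not the driver's code: struct model_engine,
struct model_execlists, model_posting_read() and model_promote() are
hypothetical names invented here, the "register" is ordinary memory
rather than device MMIO, and the execlists state is reduced to single
pointers. It shows only the sequence: publish the new active set, read
the status-pointer register and discard the value, then clear
pending[0].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one engine's register window. In the real
 * driver this is device MMIO; here it is ordinary memory.
 */
struct model_engine {
	volatile uint32_t context_status_ptr; /* models RING_CONTEXT_STATUS_PTR */
	unsigned int posting_reads;           /* bookkeeping for this sketch only */
};

/* Simplified model of the execlists state touched by the patch. */
struct model_execlists {
	const void *inflight;   /* contexts handed to the hardware */
	const void *active;     /* what readers treat as currently running */
	const void *pending[2]; /* submission awaiting acknowledgement */
};

/*
 * A posting read: load the register and discard the value. The volatile
 * access cannot be elided by the compiler. On real hardware the CPU has
 * to wait for the device to answer, which is where the delay comes from;
 * this userspace model has no such latency.
 */
static void model_posting_read(struct model_engine *engine)
{
	(void)engine->context_status_ptr;
	engine->posting_reads++;
}

/* Mirrors the order of operations in the patched branch of process_csb(). */
static void model_promote(struct model_engine *engine,
			  struct model_execlists *el)
{
	assert(engine && el);

	el->active = el->inflight;  /* publish the new active set */
	model_posting_read(engine); /* the "magic delay" */
	el->pending[0] = NULL;      /* only now mark pending as consumed */
}
```

The model cannot reproduce the hang or the timing; the length of the
delay on real hardware is whatever one uncached register round trip
costs, which this sketch does not attempt to measure.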



* RE: [Intel-gfx] [PATCH] drm/i915/gt: Delay execlist processing for tgl
  2020-10-15 19:50 [PATCH] drm/i915/gt: Delay execlist processing for tgl Chris Wilson
@ 2020-10-16  1:08 ` Shi, Yang A
  2020-10-16  7:07 ` Mika Kuoppala
  1 sibling, 0 replies; 3+ messages in thread
From: Shi, Yang A @ 2020-10-16  1:08 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: stable

Hi Chris:
	
	How is the length of the magic delay determined here?


Best Regards.
Yang


> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Chris Wilson
> Sent: Friday, October 16, 2020 3:50 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: stable@vger.kernel.org; Chris Wilson <chris@chris-wilson.co.uk>
> Subject: [Intel-gfx] [PATCH] drm/i915/gt: Delay execlist processing for tgl
> 
> When running gem_exec_nop, it floods the system with many requests (with the goal of
> userspace submitting faster than the HW can process a single empty batch). This causes
> the driver to continually resubmit new requests onto the end of an active context, a flood
> of lite-restore preemptions. If we time this just right, Tigerlake hangs.
> 
> Inserting a small delay between the processing of CS events and submitting the next
> context, prevents the hang. Naturally it does not occur with debugging enabled. The
> suspicion then is that this is related to the issues with the CS event buffer, and inserting
> an mmio read of the CS pointer status appears to be very successful in preventing the
> hang. Other registers, or uncached reads, or plain mb, do not prevent the hang, suggesting
> that register is key -- but that the hang can be prevented by a simple udelay, suggests it is
> just a timing issue like that encountered by commit 233c1ae3c83f ("drm/i915/gt: Wait for
> CSB entries on Tigerlake"). Also note that the hang is not prevented by applying
> CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU between requests.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Bruce Chang <yu.bruce.chang@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 6170f6874f52..d15d561152ba 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2711,6 +2711,9 @@ static void process_csb(struct intel_engine_cs *engine)
>  			smp_wmb(); /* complete the seqlock */
>  			WRITE_ONCE(execlists->active, execlists->inflight);
> 
> +			/* Magic delay for tgl */
> +			ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR);
> +
>  			WRITE_ONCE(execlists->pending[0], NULL);
>  		} else {
>  			if (GEM_WARN_ON(!*execlists->active)) {
> --
> 2.20.1


* Re: [PATCH] drm/i915/gt: Delay execlist processing for tgl
  2020-10-15 19:50 [PATCH] drm/i915/gt: Delay execlist processing for tgl Chris Wilson
  2020-10-16  1:08 ` [Intel-gfx] " Shi, Yang A
@ 2020-10-16  7:07 ` Mika Kuoppala
  1 sibling, 0 replies; 3+ messages in thread
From: Mika Kuoppala @ 2020-10-16  7:07 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx
  Cc: Chris Wilson, Bruce Chang, Joonas Lahtinen, stable

Chris Wilson <chris@chris-wilson.co.uk> writes:

> When running gem_exec_nop, it floods the system with many requests (with
> the goal of userspace submitting faster than the HW can process a single
> empty batch). This causes the driver to continually resubmit new
> requests onto the end of an active context, a flood of lite-restore
> preemptions. If we time this just right, Tigerlake hangs.
>
> Inserting a small delay between the processing of CS events and
> submitting the next context, prevents the hang. Naturally it does not
> occur with debugging enabled. The suspicion then is that this is related
> to the issues with the CS event buffer, and inserting an mmio read of
> the CS pointer status appears to be very successful in preventing the
> hang. Other registers, or uncached reads, or plain mb, do not prevent
> the hang, suggesting that register is key -- but that the hang can be
> prevented by a simple udelay, suggests it is just a timing issue like
> that encountered by commit 233c1ae3c83f ("drm/i915/gt: Wait for CSB
> entries on Tigerlake"). Also note that the hang is not prevented by
> applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU
> between requests.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Bruce Chang <yu.bruce.chang@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: stable@vger.kernel.org

Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 6170f6874f52..d15d561152ba 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2711,6 +2711,9 @@ static void process_csb(struct intel_engine_cs *engine)
>  			smp_wmb(); /* complete the seqlock */
>  			WRITE_ONCE(execlists->active, execlists->inflight);
>  
> +			/* Magic delay for tgl */
> +			ENGINE_POSTING_READ(engine, RING_CONTEXT_STATUS_PTR);
> +
>  			WRITE_ONCE(execlists->pending[0], NULL);
>  		} else {
>  			if (GEM_WARN_ON(!*execlists->active)) {
> -- 
> 2.20.1

