From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/6] drm/i915/ringbuffer: Move irq seqno barrier to the GPU for gen7
Date: Mon, 31 Dec 2018 10:43:07 +0000
Message-ID: <8a4a057e-c784-2466-66ff-e69055200426@linux.intel.com>
In-Reply-To: <20181228171641.16531-4-chris@chris-wilson.co.uk>


On 28/12/2018 17:16, Chris Wilson wrote:
> The irq_seqno_barrier is a tradeoff between doing work on every request
> (on the GPU) and doing work after every interrupt (on the CPU). We
> presume we have many more requests than interrupts! However, the current
> w/a for Ivybridge is an implicit delay that currently fails sporadically
> and consistently if we move the w/a into the irq handler itself. This
> makes the CPU barrier untenable for upcoming interrupt handler changes
> and so we need to replace it with a delay on the GPU before we send the
> MI_USER_INTERRUPT. As it turns out that delay is 32x MI_STORE_DWORD_IMM,
> or about 0.6us per request! Quite nasty, but the lesser of two evils
> looking to the future.
> 
> Testcase: igt/gem_sync
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 80 ++++++++++++++-----------
>   1 file changed, 44 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 2fb3a364c390..dd996103d495 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -443,6 +443,34 @@ static void gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   }
>   static const int gen6_xcs_emit_breadcrumb_sz = 4;
>   
> +#define GEN7_XCS_WA 32
> +static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
> +{
> +	int i;
> +
> +	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
> +	*cs++ = intel_hws_seqno_address(rq->engine) | MI_FLUSH_DW_USE_GTT;
> +	*cs++ = rq->global_seqno;
> +
> +	for (i = 0; i < GEN7_XCS_WA; i++) {
> +		*cs++ = MI_STORE_DWORD_INDEX;
> +		*cs++ = I915_GEM_HWS_INDEX_ADDR;
> +		*cs++ = rq->global_seqno;
> +	}
> +
> +	*cs++ = MI_FLUSH_DW;
> +	*cs++ = 0;
> +	*cs++ = 0;
> +
> +	*cs++ = MI_USER_INTERRUPT;
> +	*cs++ = MI_NOOP;
> +
> +	rq->tail = intel_ring_offset(rq, cs);
> +	assert_ring_tail_valid(rq->ring, rq->tail);
> +}
> +static const int gen7_xcs_emit_breadcrumb_sz = 8 + GEN7_XCS_WA * 3;

Is it worth moving this before the function, and then in the function 
adding a little GEM_BUG_ON using cs pointer arithmetic to check that 
the two sizes match? Or, even more simply, the function could check 
against engine->emit_breadcrumb_sz.
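
Something along these lines, as an untested sketch: the body is just the 
hunk quoted above, only the reordering of the _sz constant and the 
GEM_BUG_ON lines are new, and checking against 
rq->engine->emit_breadcrumb_sz instead would work equally well.

#define GEN7_XCS_WA 32
static const int gen7_xcs_emit_breadcrumb_sz = 8 + GEN7_XCS_WA * 3;

static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
{
	u32 * const start = cs;
	int i;

	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
	*cs++ = intel_hws_seqno_address(rq->engine) | MI_FLUSH_DW_USE_GTT;
	*cs++ = rq->global_seqno;

	/* The w/a: redundant seqno stores to delay MI_USER_INTERRUPT. */
	for (i = 0; i < GEN7_XCS_WA; i++) {
		*cs++ = MI_STORE_DWORD_INDEX;
		*cs++ = I915_GEM_HWS_INDEX_ADDR;
		*cs++ = rq->global_seqno;
	}

	*cs++ = MI_FLUSH_DW;
	*cs++ = 0;
	*cs++ = 0;

	*cs++ = MI_USER_INTERRUPT;
	*cs++ = MI_NOOP;

	/* Sketch only: assert the emitted dwords match the declared size. */
	GEM_BUG_ON(cs - start != gen7_xcs_emit_breadcrumb_sz);
	/* or: GEM_BUG_ON(cs - start != rq->engine->emit_breadcrumb_sz); */

	rq->tail = intel_ring_offset(rq, cs);
	assert_ring_tail_valid(rq->ring, rq->tail);
}
#undef GEN7_XCS_WA

Either check would catch a size mismatch at request emission time 
rather than as silent ring corruption later.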

> +#undef GEN7_XCS_WA
> +
>   static void set_hwstam(struct intel_engine_cs *engine, u32 mask)
>   {
>   	/*
> @@ -874,31 +902,6 @@ gen5_seqno_barrier(struct intel_engine_cs *engine)
>   	usleep_range(125, 250);
>   }
>   
> -static void
> -gen6_seqno_barrier(struct intel_engine_cs *engine)
> -{
> -	struct drm_i915_private *dev_priv = engine->i915;
> -
> -	/* Workaround to force correct ordering between irq and seqno writes on
> -	 * ivb (and maybe also on snb) by reading from a CS register (like
> -	 * ACTHD) before reading the status page.
> -	 *
> -	 * Note that this effectively stalls the read by the time it takes to
> -	 * do a memory transaction, which more or less ensures that the write
> -	 * from the GPU has sufficient time to invalidate the CPU cacheline.
> -	 * Alternatively we could delay the interrupt from the CS ring to give
> -	 * the write time to land, but that would incur a delay after every
> -	 * batch i.e. much more frequent than a delay when waiting for the
> -	 * interrupt (with the same net latency).
> -	 *
> -	 * Also note that to prevent whole machine hangs on gen7, we have to
> -	 * take the spinlock to guard against concurrent cacheline access.
> -	 */
> -	spin_lock_irq(&dev_priv->uncore.lock);
> -	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
> -	spin_unlock_irq(&dev_priv->uncore.lock);
> -}
> -
>   static void
>   gen5_irq_enable(struct intel_engine_cs *engine)
>   {
> @@ -2258,10 +2261,13 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine)
>   		engine->emit_flush = gen6_bsd_ring_flush;
>   		engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
>   
> -		engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
> -		engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
> -		if (!IS_GEN(dev_priv, 6))
> -			engine->irq_seqno_barrier = gen6_seqno_barrier;
> +		if (IS_GEN(dev_priv, 6)) {
> +			engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
> +			engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
> +		} else {
> +			engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
> +			engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
> +		}
>   	} else {
>   		engine->emit_flush = bsd_ring_flush;
>   		if (IS_GEN(dev_priv, 5))
> @@ -2284,10 +2290,13 @@ int intel_init_blt_ring_buffer(struct intel_engine_cs *engine)
>   	engine->emit_flush = gen6_ring_flush;
>   	engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
>   
> -	engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
> -	engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
> -	if (!IS_GEN(dev_priv, 6))
> -		engine->irq_seqno_barrier = gen6_seqno_barrier;
> +	if (IS_GEN(dev_priv, 6)) {
> +		engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
> +		engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
> +	} else {
> +		engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
> +		engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
> +	}
>   
>   	return intel_init_ring_buffer(engine);
>   }
> @@ -2305,9 +2314,8 @@ int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine)
>   	engine->irq_enable = hsw_vebox_irq_enable;
>   	engine->irq_disable = hsw_vebox_irq_disable;
>   
> -	engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
> -	engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
> -	engine->irq_seqno_barrier = gen6_seqno_barrier;
> +	engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
> +	engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
>   
>   	return intel_init_ring_buffer(engine);
>   }
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

Thread overview: 22+ messages
2018-12-28 17:16 [PATCH 1/6] drm/i915: Remove redundant trailing request flush Chris Wilson
2018-12-28 17:16 ` [PATCH 2/6] drm/i915/ringbuffer: Remove irq-seqno w/a for gen6/7 rcs Chris Wilson
2018-12-31 10:28   ` Tvrtko Ursulin
2018-12-28 17:16 ` [PATCH 3/6] drm/i915/ringbuffer: Remove irq-seqno w/a for gen6 xcs Chris Wilson
2018-12-31 10:31   ` Tvrtko Ursulin
2018-12-28 17:16 ` [PATCH 4/6] drm/i915/ringbuffer: Move irq seqno barrier to the GPU for gen7 Chris Wilson
2018-12-31 10:43   ` Tvrtko Ursulin [this message]
2018-12-31 10:56     ` Chris Wilson
2018-12-28 17:16 ` [PATCH 5/6] drm/i915/ringbuffer: Move irq seqno barrier to the GPU for gen5 Chris Wilson
2018-12-31 10:49   ` Tvrtko Ursulin
2018-12-31 11:07     ` Chris Wilson
2018-12-31 11:21       ` Tvrtko Ursulin
2018-12-31 15:25     ` Chris Wilson
2018-12-28 17:16 ` [PATCH 6/6] drm/i915: Drop unused engine->irq_seqno_barrier w/a Chris Wilson
2018-12-31 11:35   ` Tvrtko Ursulin
2018-12-28 17:48 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/6] drm/i915: Remove redundant trailing request flush Patchwork
2018-12-28 18:18 ` ✓ Fi.CI.BAT: success " Patchwork
2018-12-31 15:38   ` Chris Wilson
2018-12-28 22:00 ` ✓ Fi.CI.IGT: " Patchwork
2018-12-31 10:25 ` [PATCH 1/6] " Tvrtko Ursulin
2018-12-31 10:32   ` Chris Wilson
2018-12-31 11:24     ` Tvrtko Ursulin
