dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: John Harrison <john.c.harrison@intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
	<intel-gfx@lists.freedesktop.org>,
	<dri-devel@lists.freedesktop.org>
Cc: <daniele.ceraolospurio@intel.com>
Subject: Re: [Intel-gfx] [PATCH 11/27] drm/i915/guc: Copy whole golden context, set engine state size of subset
Date: Thu, 26 Aug 2021 16:33:28 -0700	[thread overview]
Message-ID: <b5aecbb0-3f27-50c0-e54e-a62593b9e22c@intel.com> (raw)
In-Reply-To: <20210826032327.18078-12-matthew.brost@intel.com>

On 8/25/2021 20:23, Matthew Brost wrote:
> When the GuC does a media reset, it copies a golden context state back
> into the corrupted context's state. The address of the golden context
> and the size of the engine state restore are passed in via the GuC ADS.
> The i915 had a bug where it passed in the whole size of the golden
> context, not the size of the engine state to restore resulting in a
> memory corruption.
>
> Also copy the entire golden context on init rather than just the engine
> state that is restored.
>
> Fixes: 481d458caede ("drm/i915/guc: Add golden context to GuC ADS")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 28 +++++++++++++++++-----
>   1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 6926919bcac6..df2734bfe078 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -358,6 +358,11 @@ static int guc_prep_golden_context(struct intel_guc *guc,
>   	u8 engine_class, guc_class;
>   	struct guc_gt_system_info *info, local_info;
>   
> +	/* Skip execlist and PPGTT registers + HWSP */
> +	const u32 lr_hw_context_size = 80 * sizeof(u32);
> +	const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
> +		lr_hw_context_size;
> +
>   	/*
>   	 * Reserve the memory for the golden contexts and point GuC at it but
>   	 * leave it empty for now. The context data will be filled in later
> @@ -396,7 +401,18 @@ static int guc_prep_golden_context(struct intel_guc *guc,
>   		if (!blob)
>   			continue;
>   
> -		blob->ads.eng_state_size[guc_class] = real_size;
> +		/*
> +		 * This interface is slightly confusing. We need to pass the
> +		 * base address of the golden context and the engine state size
> +		 * which is not the size of the whole golden context, it is a
> +		 * subset that the GuC uses when doing a watchdog reset. The
> +		 * engine state size must match the size of the golden context
> +		 * minus the first part of the golden context that the GuC does
> +		 * not retore during reset. Currently no real way to verify this
> +		 * other than reading the GuC spec / code and ensuring the
> +		 * 'skip_size' below matches the value used in the GuC code.
> +		 */
> +		blob->ads.eng_state_size[guc_class] = real_size - skip_size;
>   		blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
>   		addr_ggtt += alloc_size;
>   	}
> @@ -437,8 +453,8 @@ static void guc_init_golden_context(struct intel_guc *guc)
>   	u8 *ptr;
>   
>   	/* Skip execlist and PPGTT registers + HWSP */
> -	const u32 lr_hw_context_size = 80 * sizeof(u32);
> -	const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
> +	__maybe_unused const u32 lr_hw_context_size = 80 * sizeof(u32);
> +	__maybe_unused const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
>   		lr_hw_context_size;
Not sure why the 'maybe unused'? The values are not only used in BUG_ONs 
or such that could vanish.

More importantly, you now have two sets of definitions for these magic 
numbers. That seems like a very bad idea. They should be moved into a 
helper function rather than repeated.

John.


>   
>   	if (!intel_uc_uses_guc_submission(&gt->uc))
> @@ -476,12 +492,12 @@ static void guc_init_golden_context(struct intel_guc *guc)
>   			continue;
>   		}
>   
> -		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);
> +		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] !=
> +			   real_size - skip_size);
>   		GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt);
>   		addr_ggtt += alloc_size;
>   
> -		shmem_read(engine->default_state, skip_size, ptr + skip_size,
> -			   real_size - skip_size);
> +		shmem_read(engine->default_state, 0, ptr, real_size);
>   		ptr += alloc_size;
>   	}
>   


  parent reply	other threads:[~2021-08-26 23:33 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-26  3:23 [PATCH 00/27] Clean up GuC CI failures, simplify locking, and kernel DOC Matthew Brost
2021-08-26  3:23 ` [PATCH 01/27] drm/i915/guc: Fix blocked context accounting Matthew Brost
2021-08-26  3:23 ` [PATCH 02/27] drm/i915/guc: Fix outstanding G2H accounting Matthew Brost
2021-08-26 23:09   ` Daniele Ceraolo Spurio
2021-08-27  1:36     ` Matthew Brost
2021-08-26  3:23 ` [PATCH 03/27] drm/i915/guc: Unwind context requests in reverse order Matthew Brost
2021-08-26  3:23 ` [PATCH 04/27] drm/i915/guc: Don't drop ce->guc_active.lock when unwinding context Matthew Brost
2021-08-26  3:23 ` [PATCH 05/27] drm/i915/guc: Process all G2H message at once in work queue Matthew Brost
2021-08-26  3:23 ` [PATCH 06/27] drm/i915/guc: Workaround reset G2H is received after schedule done G2H Matthew Brost
2021-08-26 23:11   ` Daniele Ceraolo Spurio
2021-08-26  3:23 ` [PATCH 07/27] Revert "drm/i915/gt: Propagate change in error status to children on unhold" Matthew Brost
2021-08-26  3:23 ` [PATCH 08/27] drm/i915/selftests: Add a cancel request selftest that triggers a reset Matthew Brost
2021-08-26  9:32   ` [Intel-gfx] " Tvrtko Ursulin
2021-08-26 14:00     ` Matthew Brost
2021-08-26  3:23 ` [PATCH 09/27] drm/i915/guc: Kick tasklet after queuing a request Matthew Brost
2021-08-26  3:23 ` [PATCH 10/27] drm/i915/guc: Don't enable scheduling on a banned context, guc_id invalid, not registered Matthew Brost
2021-08-26  3:23 ` [PATCH 11/27] drm/i915/guc: Copy whole golden context, set engine state size of subset Matthew Brost
2021-08-26 23:21   ` Daniele Ceraolo Spurio
2021-08-26 23:33   ` John Harrison [this message]
2021-08-26  3:23 ` [PATCH 12/27] drm/i915/selftests: Add initial GuC selftest for scrubbing lost G2H Matthew Brost
2021-08-26  3:23 ` [PATCH 13/27] drm/i915/guc: Take context ref when cancelling request Matthew Brost
2021-08-26  3:23 ` [PATCH 14/27] drm/i915/guc: Don't touch guc_state.sched_state without a lock Matthew Brost
2021-08-26  3:23 ` [PATCH 15/27] drm/i915/guc: Reset LRC descriptor if register returns -ENODEV Matthew Brost
2021-08-26  3:23 ` [PATCH 16/27] drm/i915: Allocate error capture in nowait context Matthew Brost
2021-08-26 16:18   ` [Intel-gfx] " Matthew Brost
2021-08-26 16:21   ` Daniel Vetter
2021-08-26  3:23 ` [PATCH 17/27] drm/i915/guc: Flush G2H work queue during reset Matthew Brost
2021-08-26  3:23 ` [PATCH 18/27] drm/i915/guc: Release submit fence from an irq_work Matthew Brost
2021-08-26  3:23 ` [PATCH 19/27] drm/i915/guc: Move guc_blocked fence to struct guc_state Matthew Brost
2021-08-26  3:23 ` [PATCH 20/27] drm/i915/guc: Rework and simplify locking Matthew Brost
2021-08-26  3:23 ` [PATCH 21/27] drm/i915/guc: Proper xarray usage for contexts_lookup Matthew Brost
2021-08-26  3:23 ` [PATCH 22/27] drm/i915/guc: Drop pin count check trick between sched_disable and re-pin Matthew Brost
2021-08-26  3:23 ` [PATCH 23/27] drm/i915/guc: Move GuC priority fields in context under guc_active Matthew Brost
2021-08-26 23:26   ` Daniele Ceraolo Spurio
2021-08-26  3:23 ` [PATCH 24/27] drm/i915/guc: Move fields protected by guc->contexts_lock into sub structure Matthew Brost
2021-08-26  3:23 ` [PATCH 25/27] drm/i915/guc: Drop guc_active move everything into guc_state Matthew Brost
2021-08-26  3:23 ` [PATCH 26/27] drm/i915/guc: Add GuC kernel doc Matthew Brost
2021-08-31 19:04   ` [Intel-gfx] " John Harrison
2021-08-26  3:23 ` [PATCH 27/27] drm/i915/guc: Drop static inline functions intel_guc_submission.c Matthew Brost
2021-08-31 19:09   ` [Intel-gfx] " John Harrison

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b5aecbb0-3f27-50c0-e54e-a62593b9e22c@intel.com \
    --to=john.c.harrison@intel.com \
    --cc=daniele.ceraolospurio@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).