From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: John.C.Harrison@Intel.com, Intel-GFX@Lists.FreeDesktop.Org
Cc: DRI-Devel@Lists.FreeDesktop.Org
Subject: Re: [Intel-gfx] [PATCH 6/6] drm/i915/guc: Don't abort on CTB_UNUSED status
Date: Thu, 28 Jul 2022 21:06:02 +0200	[thread overview]
Message-ID: <d1f3646b-ff37-0120-4284-7861b23e30b3@intel.com> (raw)
In-Reply-To: <20220728024225.2363663-7-John.C.Harrison@Intel.com>



On 28.07.2022 04:42, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> When the KMD sends a CLIENT_RESET request to GuC (as part of the
> suspend sequence), GuC will mark the CTB buffer as 'UNUSED'. If the

hmm, GuC shouldn't do that on CLIENT_RESET; GuC shall only mark the CTB
as UNUSED when we explicitly disable it using CONTROL_CTB, as only then
are the CTB descriptors known to be valid

> KMD then checked the CTB queue, it would see a non-zero status value
> and report the buffer as corrupted.
> 
> Technically, no G2H messages should be received once the CLIENT_RESET
> has been sent. However, if a context was outstanding on an engine then
> it would get reset and a reset notification would be sent. So, don't
> actually treat UNUSED as a catastrophic error. Just flag it up as
> unexpected and keep going.

we should have already marked locally that the CTB is disabled, either
explicitly when disabling it with CONTROL_CTB, or implicitly due to the
issued CLIENT_RESET, but in both cases we shouldn't try to read the CTB
any more, even if there are outstanding messages ...

is this due to a race with ct->enabled?
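A minimal sketch of the ordering described above, assuming a hypothetical, simplified `struct ct_model` (it is not the real `intel_guc_ct`): the local `enabled` flag is cleared before the stop request is sent, and the reader checks that flag before touching the descriptor. This alone does not close the race being asked about; a reader that has already passed the check still needs a lock or other synchronization, which the sketch leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a CT channel; field names are assumptions. */
struct ct_model {
	atomic_bool enabled;	/* local "CTB in use" flag owned by the KMD */
	uint32_t desc_status;	/* status word written by the firmware */
	int messages_read;	/* counts messages consumed by the reader */
};

static void ct_model_init(struct ct_model *ct)
{
	atomic_init(&ct->enabled, true);
	ct->desc_status = 0;
	ct->messages_read = 0;
}

/*
 * Clear the local flag before asking the firmware to stop. A reader that
 * starts after this store bails out; one already past the check is not
 * covered here and would need a lock.
 */
static void ct_model_disable(struct ct_model *ct)
{
	atomic_store(&ct->enabled, false);
	/* ... send CONTROL_CTB (disable) or CLIENT_RESET here ... */
}

/* Returns 1 if a message was consumed, 0 if the channel is stopped,
 * -EPIPE if the descriptor reports an error. */
static int ct_model_read(struct ct_model *ct)
{
	if (!atomic_load(&ct->enabled))
		return 0;	/* stopped locally: don't read the CTB at all */
	if (ct->desc_status)
		return -EPIPE;	/* any status bit is treated as corruption */
	ct->messages_read++;
	return 1;
}
```

With this ordering, an UNUSED status written by the firmware after the disable is never observed by a new reader, because the local flag is checked first.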

> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  .../i915/gt/uc/abi/guc_communication_ctb_abi.h |  8 +++++---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c      | 18 ++++++++++++++++--
>  2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
> index df83c1cc7c7a6..28b8387f97b77 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_communication_ctb_abi.h
> @@ -37,6 +37,7 @@
>   *  |   |       |   - _`GUC_CTB_STATUS_OVERFLOW` = 1 (head/tail too large)     |
>   *  |   |       |   - _`GUC_CTB_STATUS_UNDERFLOW` = 2 (truncated message)      |
>   *  |   |       |   - _`GUC_CTB_STATUS_MISMATCH` = 4 (head/tail modified)      |
> + *  |   |       |   - _`GUC_CTB_STATUS_UNUSED` = 8 (CTB is not in use)         |
>   *  +---+-------+--------------------------------------------------------------+
>   *  |...|       | RESERVED = MBZ                                               |
>   *  +---+-------+--------------------------------------------------------------+
> @@ -49,9 +50,10 @@ struct guc_ct_buffer_desc {
>  	u32 tail;
>  	u32 status;
>  #define GUC_CTB_STATUS_NO_ERROR				0
> -#define GUC_CTB_STATUS_OVERFLOW				(1 << 0)
> -#define GUC_CTB_STATUS_UNDERFLOW			(1 << 1)
> -#define GUC_CTB_STATUS_MISMATCH				(1 << 2)
> +#define GUC_CTB_STATUS_OVERFLOW				BIT(0)
> +#define GUC_CTB_STATUS_UNDERFLOW			BIT(1)
> +#define GUC_CTB_STATUS_MISMATCH				BIT(2)
> +#define GUC_CTB_STATUS_UNUSED				BIT(3)

nit: our goal was to use plain C definitions in ABI headers as much as
possible, without introducing any dependency on external macros such as
BIT()
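For comparison, a sketch of the status bits kept in the plain-C shift style the header already used, with the new UNUSED bit added and compile-time checks against the values listed in the ABI table (1, 2, 4, 8). This is an illustration of the style, not the committed header.

```c
#include <assert.h>

/* Plain-C bit definitions: no dependency on BIT() from linux/bits.h. */
#define GUC_CTB_STATUS_NO_ERROR		0
#define GUC_CTB_STATUS_OVERFLOW		(1 << 0)
#define GUC_CTB_STATUS_UNDERFLOW	(1 << 1)
#define GUC_CTB_STATUS_MISMATCH		(1 << 2)
#define GUC_CTB_STATUS_UNUSED		(1 << 3)

/* Values must match the ABI documentation table. */
static_assert(GUC_CTB_STATUS_OVERFLOW == 1, "OVERFLOW must be 1");
static_assert(GUC_CTB_STATUS_UNDERFLOW == 2, "UNDERFLOW must be 2");
static_assert(GUC_CTB_STATUS_MISMATCH == 4, "MISMATCH must be 4");
static_assert(GUC_CTB_STATUS_UNUSED == 8, "UNUSED must be 8");
```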

>  	u32 reserved[13];
>  } __packed;
>  static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index f01325cd1b625..11b5d4ddb19ce 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -816,8 +816,22 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	if (unlikely(ctb->broken))
>  		return -EPIPE;
>  
> -	if (unlikely(desc->status))
> -		goto corrupted;
> +	if (unlikely(desc->status)) {
> +		u32 status = desc->status;
> +
> +		if (status & GUC_CTB_STATUS_UNUSED) {
> +			/*
> +			 * Potentially valid if a CLIENT_RESET request resulted in
> +			 * contexts/engines being reset. But should never happen as
> +			 * no contexts should be active when CLIENT_RESET is sent.
> +			 */
> +			CT_ERROR(ct, "Unexpected G2H after GuC has stopped!\n");
> +			status &= ~GUC_CTB_STATUS_UNUSED;

do you really want to continue reading messages from an already disabled
CTB? maybe instead of clearing the GUC_CTB_STATUS_UNUSED bit we should
just return?
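A minimal sketch of the early-return alternative, assuming a hypothetical helper `ctb_check_status()` whose return convention (1 = proceed with reading, 0 = nothing to read, -EPIPE = corrupted) is invented here for illustration and is not the driver's API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define GUC_CTB_STATUS_NO_ERROR		0
#define GUC_CTB_STATUS_OVERFLOW		(1 << 0)
#define GUC_CTB_STATUS_UNDERFLOW	(1 << 1)
#define GUC_CTB_STATUS_MISMATCH		(1 << 2)
#define GUC_CTB_STATUS_UNUSED		(1 << 3)

static_assert(GUC_CTB_STATUS_UNUSED == 8, "UNUSED must be 8");

/* Hypothetical: stop reading on UNUSED instead of masking the bit. */
static int ctb_check_status(uint32_t status)
{
	if (status == GUC_CTB_STATUS_NO_ERROR)
		return 1;	/* healthy descriptor: go on reading */
	if (status & GUC_CTB_STATUS_UNUSED)
		return 0;	/* CTB disabled by GuC: return without reading */
	return -EPIPE;	/* overflow/underflow/mismatch: corrupted */
}
```

In this variant UNUSED takes priority even when other error bits are also set, since a disabled CTB is not read at all; whether that priority is right is left open in the thread.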

Michal

> +		}
> +
> +		if (status)
> +			goto corrupted;
> +	}
>  
>  	GEM_BUG_ON(head > size);
>  
