From: John Harrison <john.c.harrison@intel.com>
To: Matthew Brost <matthew.brost@intel.com>, <dri-devel@lists.freedesktop.org>, <intel-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 2/3] drm/i915/guc: Add work queue to trigger a GT reset
Date: Thu, 20 Jan 2022 17:34:54 -0800
Message-ID: <4eecb7af-245c-60aa-2eed-0dbb54e65189@intel.com>
In-Reply-To: <20220119212419.23068-3-matthew.brost@intel.com>

On 1/19/2022 13:24, Matthew Brost wrote:
> The G2H handler needs to be flushed during a GT reset but a G2H
> indicating engine reset failure can trigger a GT reset. Add a worker to
> trigger the GT when an engine reset failure is received to break this
trigger the GT reset?

> circular dependency.
>
> v2:
>  (John Harrison)
>   - Store engine reset mask
>   - Fix typo in commit message
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  9 +++++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++++--
>  2 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 9d26a86fe557..c4a9fc7dd246 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -119,6 +119,15 @@ struct intel_guc {
>  		 * function as it might be in an atomic context (no sleeping)
>  		 */
>  		struct work_struct destroyed_worker;
> +		/**
> +		 * @reset_worker: worker to trigger a GT reset after an engine
> +		 * reset fails
> +		 */
> +		struct work_struct reset_worker;
> +		/**
> +		 * @reset_mask: mask of engines that failed to reset
> +		 */
> +		intel_engine_mask_t reset_mask;
reset_fail_mask might be a less ambiguous name? Same for the worker
struct and function.

John.

>  	} submission_state;
>
>  	/**
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 3918f1be114f..514b3060b141 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1731,6 +1731,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
>  }
>
>  static void destroyed_worker_func(struct work_struct *w);
> +static void reset_worker_func(struct work_struct *w);
>
>  /*
>   * Set up the memory resources to be shared with the GuC (via the GGTT)
> @@ -1761,6 +1762,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
>  	INIT_LIST_HEAD(&guc->submission_state.destroyed_contexts);
>  	INIT_WORK(&guc->submission_state.destroyed_worker,
>  		  destroyed_worker_func);
> +	INIT_WORK(&guc->submission_state.reset_worker,
> +		  reset_worker_func);
>
>  	guc->submission_state.guc_ids_bitmap =
>  		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
> @@ -4026,6 +4029,26 @@ guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance)
>  	return gt->engine_class[engine_class][instance];
>  }
>
> +static void reset_worker_func(struct work_struct *w)
> +{
> +	struct intel_guc *guc = container_of(w, struct intel_guc,
> +					     submission_state.reset_worker);
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	intel_engine_mask_t reset_mask;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&guc->submission_state.lock, flags);
> +	reset_mask = guc->submission_state.reset_mask;
> +	guc->submission_state.reset_mask = 0;
> +	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> +
> +	if (likely(reset_mask))
> +		intel_gt_handle_error(gt, reset_mask,
> +				      I915_ERROR_CAPTURE,
> +				      "GuC failed to reset engine mask=0x%x\n",
> +				      reset_mask);
> +}
> +
>  int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
>  					 const u32 *msg, u32 len)
>  {
> @@ -4033,6 +4056,7 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
>  	struct intel_gt *gt = guc_to_gt(guc);
>  	u8 guc_class, instance;
>  	u32 reason;
> +	unsigned long flags;
>
>  	if (unlikely(len != 3)) {
>  		drm_err(&gt->i915->drm, "Invalid length %u", len);
> @@ -4057,10 +4081,15 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
>  	drm_err(&gt->i915->drm, "GuC engine reset request failed on %d:%d (%s) because 0x%08X",
>  		guc_class, instance, engine->name, reason);
>
> -	intel_gt_handle_error(gt, engine->mask,
> -			      I915_ERROR_CAPTURE,
> -			      "GuC failed to reset %s (reason=0x%08x)\n",
> -			      engine->name, reason);
> +	spin_lock_irqsave(&guc->submission_state.lock, flags);
> +	guc->submission_state.reset_mask |= engine->mask;
> +	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> +
> +	/*
> +	 * A GT reset flushes this worker queue (G2H handler) so we must use
> +	 * another worker to trigger a GT reset.
> +	 */
> +	queue_work(system_unbound_wq, &guc->submission_state.reset_worker);
>
>  	return 0;
>  }
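
For readers following the thread, the accumulate-then-drain idea in the quoted patch can be sketched in user space. This is only an illustration under stated assumptions, not the driver code: `struct reset_state`, `note_engine_reset_failure()` and `take_reset_fail_mask()` are hypothetical names, the field uses the `reset_fail_mask` spelling suggested in the review rather than the name in the patch, and C11 atomics stand in for the `submission_state.lock` spinlock that the patch actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for guc->submission_state. */
struct reset_state {
	/* Bit N set means engine N failed to reset (assumed encoding). */
	_Atomic uint32_t reset_fail_mask;
};

/*
 * Handler side: only record which engine failed. The handler never
 * starts the full reset itself, because the reset path flushes the
 * handler and would otherwise wait on itself.
 */
void note_engine_reset_failure(struct reset_state *s, uint32_t engine_mask)
{
	assert(engine_mask != 0);
	atomic_fetch_or(&s->reset_fail_mask, engine_mask);
}

/*
 * Worker side: take a snapshot of the pending mask and clear it in one
 * atomic step, so failures reported while the worker runs are kept for
 * the next run instead of being lost. A return value of 0 means there
 * is nothing to do.
 */
uint32_t take_reset_fail_mask(struct reset_state *s)
{
	return atomic_exchange(&s->reset_fail_mask, 0);
}
```

Several failures reported before the worker runs collapse into a single mask, which is why one queued work item is enough to cover them all. In the patch the same snapshot-and-clear step happens under `spin_lock_irqsave()`, and the resulting mask is passed to `intel_gt_handle_error()`.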