From: Matthew Brost <matthew.brost@intel.com>
To: John Harrison <john.c.harrison@intel.com>
Cc: intel-gfx@lists.freedesktop.org, daniele.ceraolospurio@intel.com, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 4/7] drm/i915/guc: Don't hog IRQs when destroying contexts
Date: Fri, 10 Dec 2021 17:10:50 -0800
Message-ID: <20211211011049.GA8660@jons-linux-dev-box>
In-Reply-To: <ec870417-3894-0bb2-6561-722b8345be6f@intel.com>

On Fri, Dec 10, 2021 at 05:07:12PM -0800, John Harrison wrote:
> On 12/10/2021 16:56, Matthew Brost wrote:
> > From: John Harrison <John.C.Harrison@Intel.com>
> >
> > While attempting to debug a CT deadlock issue in various CI failures
> > (most easily reproduced with gem_ctx_create/basic-files), I was seeing
> > CPU deadlock errors being reported. This were because the context
> > destroy loop was blocking waiting on H2G space from inside an IRQ
> > spinlock. There was deadlock as such, it's just that the H2G queue was
> There was *no* deadlock as such
>

Let's fix this up when applying the series. With that:

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> John.
>
> > full of context destroy commands and GuC was taking a long time to
> > process them. However, the kernel was seeing the large amount of time
> > spent inside the IRQ lock as a dead CPU. Various Bad Things(tm) would
> > then happen (heartbeat failures, CT deadlock errors, outstanding H2G
> > WARNs, etc.).
> >
> > Re-working the loop to only acquire the spinlock around the list
> > management (which is all it is meant to protect) rather than the
> > entire destroy operation seems to fix all the above issues.
> >
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > ---
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 45 ++++++++++++-------
> >  1 file changed, 28 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 36c2965db49b..96fcf869e3ff 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -2644,7 +2644,6 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >  	unsigned long flags;
> >  	bool disabled;
> >
> > -	lockdep_assert_held(&guc->submission_state.lock);
> >  	GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
> >  	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id.id));
> >  	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id.id));
> > @@ -2660,7 +2659,7 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >  	}
> >  	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> >  	if (unlikely(disabled)) {
> > -		__release_guc_id(guc, ce);
> > +		release_guc_id(guc, ce);
> >  		__guc_context_destroy(ce);
> >  		return;
> >  	}
> > @@ -2694,36 +2693,48 @@ static void __guc_context_destroy(struct intel_context *ce)
> >  static void guc_flush_destroyed_contexts(struct intel_guc *guc)
> >  {
> > -	struct intel_context *ce, *cn;
> > +	struct intel_context *ce;
> >  	unsigned long flags;
> >
> >  	GEM_BUG_ON(!submission_disabled(guc) &&
> >  		   guc_submission_initialized(guc));
> >
> > -	spin_lock_irqsave(&guc->submission_state.lock, flags);
> > -	list_for_each_entry_safe(ce, cn,
> > -				 &guc->submission_state.destroyed_contexts,
> > -				 destroyed_link) {
> > -		list_del_init(&ce->destroyed_link);
> > -		__release_guc_id(guc, ce);
> > +	while (!list_empty(&guc->submission_state.destroyed_contexts)) {
> > +		spin_lock_irqsave(&guc->submission_state.lock, flags);
> > +		ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
> > +					      struct intel_context,
> > +					      destroyed_link);
> > +		if (ce)
> > +			list_del_init(&ce->destroyed_link);
> > +		spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> > +
> > +		if (!ce)
> > +			break;
> > +
> > +		release_guc_id(guc, ce);
> >  		__guc_context_destroy(ce);
> >  	}
> > -	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> >  }
> >
> >  static void deregister_destroyed_contexts(struct intel_guc *guc)
> >  {
> > -	struct intel_context *ce, *cn;
> > +	struct intel_context *ce;
> >  	unsigned long flags;
> >
> > -	spin_lock_irqsave(&guc->submission_state.lock, flags);
> > -	list_for_each_entry_safe(ce, cn,
> > -				 &guc->submission_state.destroyed_contexts,
> > -				 destroyed_link) {
> > -		list_del_init(&ce->destroyed_link);
> > +	while (!list_empty(&guc->submission_state.destroyed_contexts)) {
> > +		spin_lock_irqsave(&guc->submission_state.lock, flags);
> > +		ce = list_first_entry_or_null(&guc->submission_state.destroyed_contexts,
> > +					      struct intel_context,
> > +					      destroyed_link);
> > +		if (ce)
> > +			list_del_init(&ce->destroyed_link);
> > +		spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> > +
> > +		if (!ce)
> > +			break;
> > +
> > 		guc_lrc_desc_unpin(ce);
> >  	}
> > -	spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> >  }
> >
> >  static void destroyed_worker_func(struct work_struct *w)
>