From: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>
To: "tvrtko.ursulin@linux.intel.com" <tvrtko.ursulin@linux.intel.com>,
	"intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH 1/1] drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis
Date: Wed, 21 Sep 2022 22:30:29 +0000	[thread overview]
Message-ID: <35b85f4d4156e3afe558a9b231605769249de449.camel@intel.com> (raw)
In-Reply-To: <70636d43-57e8-46ac-7751-d8fd0fb13d72@linux.intel.com>


On Fri, 2022-09-16 at 09:58 +0100, Tvrtko Ursulin wrote:
> On 16/09/2022 08:53, Teres Alexis, Alan Previn wrote:
> > On Thu, 2022-09-15 at 09:48 +0100, Tvrtko Ursulin wrote:
> > > On 15/09/2022 03:12, Alan Previn wrote:
> > > > From: Matthew Brost <matthew.brost@intel.com>
> > > > 
> > > > +static void guc_flush_all_delayed_disable_sched_contexts(struct intel_guc *guc)
> > > > +{
> > > > +	struct intel_context *ce;
> > > > +	unsigned long index;
> > > > +	unsigned long flags;
> > > > +	unsigned long ceflags;
> > > >    
> > > > -	with_intel_runtime_pm(runtime_pm, wakeref)
> > > > -		__guc_context_sched_disable(guc, ce, guc_id);
> > > > +	xa_lock_irqsave(&guc->context_lookup, flags);
> > > > +	xa_for_each(&guc->context_lookup, index, ce) {
> > > > +		if (!kref_get_unless_zero(&ce->ref))
> > > > +			continue;
> > > > +		xa_unlock(&guc->context_lookup);
> > > 
> > > So this whole loop _needs_ to run with interrupts disabled? Explaining
> > > why in a comment would be good.
> > > 
> > Being mid-reset, the locking mode is consistent with other functions also used
> > as part of the reset preparation that parse and potentially modify contexts.
> > I believe the goal is to handle all of this parsing without getting conflicting
> > latent G2H replies that break the preparation flow (which herds active contexts
> > into a smaller set of states as part of reset) - but I will double-check
> > with my colleagues.
> > 
Alan: Update: I realize the GuC reset-preparation code disables IRQs for the GuC being reset
prior to this function, so in fact we really ought not to see any change to that xa_array
from a context-id change arriving via an interrupt that is orthogonal to this
thread. I will change to xa_lock.

> > > > +		if (test_bit(CONTEXT_GUC_INIT, &ce->flags) &&
> > > > +		    cancel_delayed_work(&ce->guc_state.sched_disable_delay)) {
> > > > +			spin_lock_irqsave(&ce->guc_state.lock, ceflags);
> > > > +			spin_unlock_irqrestore(&ce->guc_state.lock, ceflags);
> > > 
> > > This deserves a comment about what lock toggling wants to ensure.
> > > 
> > I realize this might have been my local rebasing mistake; the intention was to
> > handle cases where sched_disable_delay wasn't pending but was potentially still
> > executing do_sched_disable. I believe I could try cancel_delayed_work_sync (but
> > I am not sure if I can call that might-sleep function mid-reset while not
> > interruptible). Else, I would move that lock-unlock to the case where
> > cancel_delayed_work did not return true (as per the original intent before my
> > rebase error).
> 
> Okay a comment like "flush any currently executing do_sched_disable" 
> above the lock toggling would do then. Even better if you add what 
> happens if that flushing isn't done.
> 
> > > Also, if the loops runs with interrupts disabled what is the point of
> > > irqsave variant in here??
> > Yes - it's redundant; let me fix that. Apologies for that.
> > 
Same thing here: a context's GuC state should not change in response to an IRQ
from that GuC, since we disabled it while we are in reset preparation
- so actually "not needed" as opposed to "redundant".

> > > Also2, what is the reason for dropping the lock? intel_context_put?
> > Being consistent with other reset preparation code that closes contexts,
> > the lock is dropped before the intel_context_put.
> > (I hope i am not misunderstanding your question).
> 
> Right, okay.. so for locking inversion issues - intel_context_put must 
> not be called with guc_state.lock held, yes?
> 
> Not a mandatory request, but if you want you could look into the option 
> of avoiding lock dropping and instead doing xa_erase and recording the 
> list of contexts for sched disable or put in a local list, and doing 
> them all in one block outside the locked/irq disabled section. Although 
> tbh I am not sure if that would improve anything. Probably not in this 
> case of a reset path, but if there are other places in GuC code which do 
> this it may be beneficial for less hammering on the CPU and core 
> synchronisation for atomics.
> 
Apologies for my ignorance - perhaps I misunderstood how these functions work, but
I assumed that calling kref_get_unless_zero with a non-zero return that led us
here meant there is still a ref on the context that didn't come from the
reset path, so I am just following the correct rules to release that ref
and not destroy the context yet - leaving it in the pending-disable state
that will be handled in scrub_guc_desc_for_outstanding_g2h, which gets called
later in the reset preparation.

Actually, I realize the better option might be to move the above code
into the loop already present inside scrub_guc_desc_for_outstanding_g2h,
just prior to its check of whether a context is pending-disable.
That would ensure we take all these context locks once, in the same function,
when herding all possible states while scrubbing those outstanding G2Hs.




Thread overview: 14+ messages
2022-09-15  2:12 [Intel-gfx] [PATCH 0/1] Delay disabling GuC scheduling of an idle context Alan Previn
2022-09-15  2:12 ` [Intel-gfx] [PATCH 1/1] drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis Alan Previn
2022-09-15  8:48   ` Tvrtko Ursulin
2022-09-16  7:53     ` Teres Alexis, Alan Previn
2022-09-16  8:58       ` Tvrtko Ursulin
2022-09-16 15:36         ` Ceraolo Spurio, Daniele
2022-09-20  6:22           ` Teres Alexis, Alan Previn
2022-09-20 14:33           ` Tvrtko Ursulin
2022-09-21  7:59           ` Teres Alexis, Alan Previn
2022-09-21 22:30           ` Teres Alexis, Alan Previn
2022-09-21 22:30         ` Teres Alexis, Alan Previn [this message]
2022-09-15  3:22 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Delay disabling GuC scheduling of an idle context Patchwork
2022-09-15  3:41 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-09-15 20:41 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
