Re: [Intel-gfx] [PATCH 1/1] drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis

From: "Teres Alexis, Alan Previn" <alan.previn.teres.alexis@intel.com>
To: "tvrtko.ursulin@linux.intel.com" <tvrtko.ursulin@linux.intel.com>,
	"Ceraolo Spurio, Daniele" <daniele.ceraolospurio@intel.com>,
	"intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>
Subject: Re: [Intel-gfx] [PATCH 1/1] drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis
Date: Tue, 20 Sep 2022 06:22:48 +0000	[thread overview]
Message-ID: <d69e9fd9b3df695779adc2cb94b96bf44ff59878.camel@intel.com> (raw)
In-Reply-To: <c6e968c8-aaa0-949c-61c9-857ef5d9f617@intel.com>


On Fri, 2022-09-16 at 08:36 -0700, Ceraolo Spurio, Daniele wrote:
> 
> On 9/16/2022 1:58 AM, Tvrtko Ursulin wrote:
> > On 16/09/2022 08:53, Teres Alexis, Alan Previn wrote:
> > > On Thu, 2022-09-15 at 09:48 +0100, Tvrtko Ursulin wrote:
> > > > On 15/09/2022 03:12, Alan Previn wrote:
> > > > > From: Matthew Brost <matthew.brost@intel.com>
> > > > > 
> > > > > +static void guc_context_sched_disable(struct intel_context *ce)
> > > > > +{
> > > > > +    struct intel_guc *guc = ce_to_guc(ce);
> > > > > +    u64 delay = guc->submission_state.sched_disable_delay_ms;
> > > > > +    unsigned long flags;
> > > > > +
> > > > > +    spin_lock_irqsave(&ce->guc_state.lock, flags);
> > > > > +
> > > > > +    if (bypass_sched_disable(guc, ce)) {
> > > > > +        spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > > > +        intel_context_sched_disable_unpin(ce);
> > > > > +    } else if (!intel_context_is_closed(ce) && 
> > > > > !guc_id_pressure(guc, ce) &&
> > > > > +           delay) {
> > > > > spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > > > -        goto unpin;
> > > > > +        mod_delayed_work(system_unbound_wq,
> > > > > +                 &ce->guc_state.sched_disable_delay,
> > > > > +                 msecs_to_jiffies(delay));
> > > > > +    } else {
> > > > > +        do_sched_disable(guc, ce, flags);
> > > > >        }
> > > > 
> > > > lock
> > > > if
> > > >     unlock
> > > >     do stuff
> > > > else if
> > > >     unlock
> > > >     do stuff
> > > > else
> > > >     do_sched_disable - which unlocks inside
> > > > 
> > > > IMO it creates less readable code for the benefit of not repeating
> > > > with_intel_runtime_pm -> __guc_context_sched_disable two times. Dunno..
> > > > it's ugly but I have no suggestions. Hm does it have to send using the
> > > > busy loop? It couldn't just queue the request and then wait for 
> > > > reply if
> > > > disable message was emitted?
> > > > 
> > > I agree that the above code lacks readability - will see if i can 
> > > break it down to smaller
> > > functions with cleaner in-function lock/unlock pairs. I agree that a 
> > > little code duplication
> > > is better than less readable code. It was inherited code i didn't 
> > > want to modify but I'll
> > > go ahead and refactor this.
> > > 
> > > On the busy loop - im assuming you are refering to the actual ct 
> > > sending. I'll consult my
> > > team if i am missing anything more but based on comments, I believe 
> > > callers must use that
> > > function to guarantee reservation of space in the G2H CTB to always 
> > > have space to capture
> > > responses for actions that MUST be acknowledged from GuC 
> > > (acknowledged by either replying
> > > with a success or failure). This is necessary for coherent guc-id 
> > > state machine (because the
> > > GuC firmware will drop requests for guc-id's that are not registered 
> > > or not in a
> > > 'sched-enabled' state).
> > 
> > Maybe to explain what I meant a bit better. I thought that the reason 
> > for strange unlock pattern is the with_runtime_pm which needs to 
> > happen for the CT send/receive loop. What I meant was would it be 
> > possible to do it like this:
> > 
Alan: I assumed the strange lock-unlock was for no other reason than to ensure that
once the delayed task begins executing, we dont let go of the lock until after
we complete the call prep_context_pending_disable (which modifies the state-machine
for the context's guc-id). And as we expect, the runtime_pm is about trying to send
the H2G signal to the GuC which touches register. (i.e. lock was on ce-gucid state
and power was for guc-interaction via ct). If my assumption are correct, then I
can refactor the functions a bit to remove that weird unlock flow while keeping the
lock held from start until prep_context_pending_disable, else i could
still go ahead and do that (with some minimal duplication).


> > state lock
> > ...
> > sent = queue_msg_2_guc (this would be i915 only, no runtime pm needed)
> > ...
> > state unlock
> > 
> > if (sent)
> >   with_runtime_pm:
> >      send/receive queued guc messages (only this would talk to guc)
> > 
> > But I have truly no idea if the CT messaging infrastructure supports 
> > something like this.
> > 
Alan: I'll look around and see if we have something that provides that flow
but i don't think I should pursue such a redesign as part of this series
if none exists.


> > Anyway, see what it is possible and if it is not or too hard for now 
> > leave it be.
> > 
> > > > > -    guc_id = prep_context_pending_disable(ce);
> > > > > +}
> > > > >    -    spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > > > > +static void guc_flush_all_delayed_disable_sched_contexts(struct 
> > > > > intel_guc *guc)
> > > > > +{
> > > > > +    struct intel_context *ce;
> > > > > +    unsigned long index;
> > > > > +    unsigned long flags;
> > > > > +    unsigned long ceflags;
> > > > >    -    with_intel_runtime_pm(runtime_pm, wakeref)
> > > > > -        __guc_context_sched_disable(guc, ce, guc_id);
> > > > > +    xa_lock_irqsave(&guc->context_lookup, flags);
> > > > > +    xa_for_each(&guc->context_lookup, index, ce) {
> > > > > +        if (!kref_get_unless_zero(&ce->ref))
> > > > > +            continue;
> > > > > +        xa_unlock(&guc->context_lookup);
> > > > 
> > > > So this whole loop _needs_ to run with interrupts disabled? Explaining
> > > > why in a comment would be good.
> > > > 
> > > Being mid-reset, the locking mode is consistent with other functions 
> > > also used
> > > as part of the reset preparation that parses and potentially modifies 
> > > contexts.
> > > I believe the goal is to handle all of this parsing without getting 
> > > conflicting
> > > latent G2H replies that breaks the preparation flow (that herds 
> > > active contexts
> > > into a fewer set of states as part of reset) - but i will double check
> > > with my colleagues.
> > > 
> > > > > +        if (test_bit(CONTEXT_GUC_INIT, &ce->flags) &&
> > > > > + cancel_delayed_work(&ce->guc_state.sched_disable_delay)) {
> > > > > +            spin_lock_irqsave(&ce->guc_state.lock, ceflags);
> > > > > + spin_unlock_irqrestore(&ce->guc_state.lock, ceflags);
> > > > 
> > > > This deserves a comment about what lock toggling wants to ensure.
> > > > 
> > > I realize this might have been my local rebasing mistake, the 
> > > intention was to
> > > handle cases where sched_disable_delay wasn't pending but potentially 
> > > still
> > > executing do_sched_disable. I believe I could try 
> > > cancel_delayed_work_sync (but
> > > not sure if i can call that might-sleep funtion mid reset while not-
> > > interruptible). Else, i would move that lock-unlock to if the
> > > cancel_delayed_work did not return true (as per original intent 
> > > before my
> > > rebase error).
> > 
> > Okay a comment like "flush any currently executing do_sched_disable" 
> > above the lock toggling would do then. Even better if you add what 
> > happens if that flushing isn't done.
> > 
got it. will do.


> > > > Also, if the loops runs with interrupts disabled what is the point of
> > > > irqsave variant in here??
> > > Yes - its redundant, let me fix that, apologies for that.
> > > 
> > > > Also2, what is the reason for dropping the lock? intel_context_put?
> > > Being consistent with other reset preparation code that closes contexts,
> > > the lock is dropped before the intel_context_put.
> > > (I hope i am not misunderstanding your question).
> > 
> > Right, okay.. so for locking inversion issues - intel_context_put must 
> > not be called with guc_state.lock held, yes?
> > 
> > Not a mandatory request, but if you want you could look into the 
> > option of avoiding lock dropping and instead doing xa_erase and 
> > recording the list of contexts for sched disable or put in a local 
> > list, and doing them all in one block outside the locked/irq disabled 
> > section. Although tbh I am not sure if that would improve anything. 
> > Probably not in this case of a reset path, but if there are other 
> > places in GuC code which do this it may be beneficial for less 
> > hammering on the CPU and core synchronisation for atomics.
> > 
> > > > > +    /*
> > > > > +     * If the context gets closed while the execbuf is ongoing, 
> > > > > the context
> > > > > +     * close code will race with the below code to cancel the 
> > > > > delayed work.
> > > > > +     * If the context close wins the race and cancels the work, it 
> > > > > will
> > > > > +     * immediately call the sched disable (see guc_context_close), 
> > > > > so there
> > > > > +     * is a chance we can get past this check while the 
> > > > > sched_disable code
> > > > > +     * is being executed. To make sure that code completes before 
> > > > > we check
> > > > > +     * the status further down, we wait for the close process to 
> > > > > complete.
> > > > > +     */
> > > > > +    if (cancel_delayed_work_sync(&ce->guc_state.sched_disable_delay))
> > > > > +        intel_context_sched_disable_unpin(ce);
> > > > > +    else if (intel_context_is_closed(ce))
> > > > > +        wait_for(context_close_done(ce), 1);
> > > > 
> > > > Comment makes it sounds important to handle the race, althought it
> > > > doesn't really explain the consequences. But most importantly, what if
> > > > close doesn't complete in 1ms?
> > > 
> > > will add the consequence (i believe the consequence is that we could 
> > > prematurely
> > > read context flags bit indicating its gucid is still registered and 
> > > after skipping
> > > re-registration, find that context gets closed and guc-id freed).
> > > 
> > > Yes the 1 second is arbitrary and unnervingly short. Just spent 
> > > sometime trying to
> > 
> > One millisecond even. :)
> 
hehe - typo :)

> Normally 1ms is not a slow time for this. We can only hit the wait if 
> the context_close call has already called the cancel_delayed_work, so 
> the only thing left to do in that function is to send the H2G, which is 
> relatively fast. However, I've just realized that if the H2G buffer is 
> full the H2G will stall, so in that case it can take longer (maximum 
> stall time before a hang is declared is 1.5s).
> 
Daniele, so increase to 1500 milisecs?

> > What would be the consequence of prematurely reading the still 
> > registered context flag? Execbuf fails? GuC hangs? Kernel crashes? 
> > Something else?
> 
> i915 thinks that a request pending with the GuC, while GuC thinks we 
> disabled it (because the completion of the delayed_disable happens after 
> the new request has been submitted). The heartbeat will still go 
> through, so no reset will be triggered, the request is just lost. A new 
> request submission on the same context should be able to recover it, but 
> we didn't test that.
> 
> 
> > And why cant' this race with context close happen at any other point 
> > than this particular spot in guc_request_alloc? Immediately after the 
> > added checks? After atomic_add_unless?
> 
> The race is tied to the canceling of the delayed work. context_close 
> only does something if it cancels the delayed work, so if the 
> cancel_delayed_work_sync here does the cancelling then context_close is 
> a no-op.
> 
> > > figure out portions of the SCHED_foo state machine bits and believe 
> > > that its possible
> > > for guc_request_alloc to just force context_close to be done from 
> > > here as it would
> > > force it into a state requiring re-registration and would close that 
> > > a few lines
> > > below. I will however verify with my team mates as i am new to these 
> > > SCHED_foo state
> > > machine bits.
> 
> I'm not sure we want to force context_close unconditionally here, that'd 
> be a big overhead. Doing it only if there is a close pending is also 
> potentially an issue, the whole point is that the close can race in. The 
> close also has to work on its own, because in the normal use-cases we 
> don't get a context_close while a request submission is ongoing.
> Unless you meant something different and I completely misunderstood.
> 
> Daniele
> 
> > Yes it always looked complicated and perhaps it has even grown more so 
> > - I'm afraid I cannot be of any help there.
> > 
> > Regards,
> > 
> > Tvrtko