From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: John Harrison <john.c.harrison@intel.com>,
	Intel-GFX@Lists.FreeDesktop.Org
Cc: DRI-Devel@Lists.FreeDesktop.Org
Subject: Re: [Intel-gfx] [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts
Date: Wed, 2 Mar 2022 11:07:14 +0000	[thread overview]
Message-ID: <42b114a7-04ea-71eb-7cd6-507fb4fb1655@linux.intel.com> (raw)
In-Reply-To: <8fb0a3a7-7968-79cb-9ea1-e31b0593acaa@intel.com>


On 01/03/2022 20:59, John Harrison wrote:
> On 3/1/2022 04:09, Tvrtko Ursulin wrote:
>>
>> I'll trim it a bit again..
>>
>> On 28/02/2022 18:55, John Harrison wrote:
>>> On 2/28/2022 09:12, Tvrtko Ursulin wrote:
>>>> On 25/02/2022 18:48, John Harrison wrote:
>>>>> On 2/25/2022 10:14, Tvrtko Ursulin wrote:
>>
>> [snip]
>>
>>>>>> Your only objection is that it ends up with too long a total time 
>>>>>> before reset? Or something else as well?
>>>>> An unnecessarily long total heartbeat timeout is the main 
>>>>> objection. (2.5 + 12) * 5 = 72.5 seconds. That is a massive change 
>>>>> from the current 12.5s.
>>>>>
>>>>> If we are happy with that huge increase then fine. But I'm pretty 
>>>>> sure you are going to get a lot more bug reports about hung systems 
>>>>> not recovering. 10-20s is just about long enough for someone to 
>>>>> wait before leaning on the power button of their machine. Over a 
>>>>> minute is not. That kind of delay is going to cause support issues.
>>>>
>>>> Sorry I wrote 12s, while you actually said tP * 12, so 7.68s, chosen 
>>>> just so it is longer than tH * 3?
>>>>
>>>> And how do you keep coming up with factor of five? Isn't it four 
>>>> periods before "heartbeat stopped"? (Prio normal, hearbeat, barrier 
>>>> and then reset.)
>>> Prio starts at low not normal.
>>
>> Right, slipped my mind since I only keep seeing that one priority 
>> ladder block in intel_engine_heartbeat.c/heartbeat()..
>>
>>>> From the point of view of user experience I agree reasonable 
>>>> responsiveness is needed before user "reaches for the power button".
>>>>
>>>> In your proposal we are talking about 3 * 2.5s + 2 * 7.5s, so 22.5s.
>>>>
>>>> Question of workloads.. what is the actual preempt timeout compute 
>>>> is happy with? And I don't mean compute setups with disabled 
>>>> hangcheck, which you say they want anyway, but if we target defaults 
>>>> for end users. Do we have some numbers on what they are likely to run?
>>> Not that I have ever seen. This is all just finger in the air stuff. 
>>> I don't recall if we invented the number and the compute people 
>>> agreed with it or if they proposed the number to us.
>>
>> Yeah me neither. And found nothing in my email archives. :(
>>
>> Thinking about it today I don't see that a disabled timeout is a 
>> practical default.
>>
>> With it, if users have something un-preemptable to run (assuming prio 
>> normal), it would get killed after ~13s (5 * 2.5).
>>
>> If we go for my scheme it gets killed in ~17.5s (3 * (2.5 + 2.5) + 2.5 
>> (third pulse triggers preempt timeout)).
>>
>> And if we go for your scheme it gets killed in ~22.5s (4 * 2.5 + 2 * 3 
>> * 2.5).
> Erm, that is not an apples to apples comparison. Your 17.5 is for an 
> engine reset tripped by the pre-emption timeout, but your 22.5s is for a 
> GT reset tripped by the heartbeat reaching the end and nuking the universe.

Right, in your scheme I did get it wrong. It would wait for the GuC to 
reset the engine at the end, rather than hit the fake "heartbeat 
stopped" full reset path in that case.

4 * 2.5 to trigger a max prio pulse, then 3 * 2.5 preempt timeout for 
the GuC to reset (last heartbeat delay extended so it does not 
trigger). So 17.5 as well.

> If you are saying that the first pulse at sufficient priority (third 
> being normal prio) is what causes the reset because the system is 
> working as expected and the pre-emption timeout trips the reset. In that 
> case, you have two periods to get to normal prio plus one pre-emption 
> timeout to trip the reset. I.e. (tH * 2) + tP.
> 
> Your scheme is then tH(actual) = tH(user) + tP, yes?
> So pre-emption based reset is after ((tH(user) + tP) * 2) + tP => (3 * 
> tP) + (2 * tH)
> And GT based reset is after (tH(user) + tP) * 5 => (5 * tP) + (5 * tH)
> 
> My scheme is tH(actual) = tH(user) for first four, then max(tH(user), 
> tP) for fifth.
> So pre-emption based reset is after (tH(user) * 2) + tP => tP + (2 * tH);
> And GT based reset is after (tH(user) * 4) + (max(tH(user), tP) * 1) => 
> greater of ((4 * tH) + tP) or (5 * tH)
> 
> Either way your scheme is longer. With tH(user) = 2.5s, tP(RCS) = 7.5s, 
> we get 27.5s for engine and 50s for GT versus my 12.5s for engine and 
> 17.5s for GT. With tP(RCS) = 2.5s, yours is 12.5s for engine and 25s for 
> GT versus my 7.5s for engine and 12.5s for GT.
> 
> Plus, not sure why your calculations above are using 2.5 for tP? Are you 
> still arguing that 7.5s is too long? That is a separate issue and not 
> related to the heartbeat algorithms. tP must be long enough to allow 
> 'out of box OpenCL workloads to complete'. That doesn't just mean not 
> being killed by the heartbeat, it also means not being killed by running 
> two of them concurrently (or one plus desktop OpenGL rendering) and not 
> having it killed by basic time slicing between the two contexts. The 
> heartbeat is not involved in that process. That is purely the 
> pre-emption timeout. And that is the fundamental reason why tP needs to 
> be much larger on RCS/CCS.
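
For the record, plugging the numbers into your formulas above does 
reproduce your figures (a quick sketch; tH and tP values are the ones 
from this thread):

```python
tH = 2.5  # tH(user), heartbeat period in seconds

for tP in (7.5, 2.5):  # pre-emption timeout: proposed RCS value, then 2.5s
    # My scheme: tH(actual) = tH(user) + tP for every pulse.
    tvrtko_engine = 3 * tP + 2 * tH      # pre-emption based engine reset
    tvrtko_gt = 5 * tP + 5 * tH          # heartbeat end-stop GT reset
    # Your scheme: plain tH for the first four pulses, max(tH, tP) for
    # the fifth.
    john_engine = tP + 2 * tH
    john_gt = max(4 * tH + tP, 5 * tH)
    print(tP, tvrtko_engine, tvrtko_gt, john_engine, john_gt)
    # tP=7.5: 27.5 / 50.0 versus 12.5 / 17.5
    # tP=2.5: 12.5 / 25.0 versus  7.5 / 12.5
```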

I was assuming a 2.5s tP is enough and basing all calculations on that, 
heartbeat or timeslicing regardless. I thought we established that 
neither of us knows how long is long enough.

Are you now saying 2.5s is definitely not enough? How is that usable for 
a default out of the box desktop?

>> If I did not confuse any calculation this time round, then the 
>> differences for default case for normal priority sound pretty 
>> immaterial to me.
>>
>>>> What if we gave them a default of 2.5s? That would be 4 * (2.5s + 
>>>> 2.5s) = 20s worst case until reset, comparable to your proposal. Are 
>>>> there realistic workloads which are non-preemptable for 2.5s? I am 
>>>> having hard time imagining someone would run them on a general 
>>>> purpose desktop since it would mean frozen and unusable UI anyway.
>>>>
>>>> Advantage still being in my mind that there would be no fudging of 
>>>> timeouts during driver load and heartbeat periods depending on 
>>>> priority. To me it feels more plausible to account for preempt 
>>>> timeout in heartbeat pulses than to calculate the preempt timeout to 
>>>> be longer than heartbeat pulses, just to avoid races between the two.
>>> Except that when the user asks for a heartbeat period of 2.5s you are 
>>> actually setting it to 5s. How is that not a major fudge that is 
>>> totally disregarding the user's request?
>>
>> This is indeed the core question. My thinking:
>>
>> It is not defined in the heartbeat ABI that N pulses should do 
>> anything, just that they are periodic pulses which check the health of 
>> an engine.
>>
>> If we view user priority as not under our control then we can say that 
>> any heartbeat pulse can trigger preempt timeout and we should let it 
>> do that.
>>
>> From that it follows that it is justified to account for preempt 
>> timeout in the decision when to schedule heartbeat pulses and that it 
>> is legitimate to do it for all of them.
> But it can be optimised to say that it doesn't matter if you bump the 
> priority of a pulse before waiting for the pre-emption period to expire. 
> If the pulse was already high enough prio then the pre-emption has 
> already been triggered and bumping the prio has no effect. If it was too 
> low then waiting for longer has no benefit at all.
> 
> All that matters is that you don't hit the end stop and trigger the GT 
> reset too early.

Yes, I agree it can also be argued the way you are saying.

I was trying to weigh the pros and cons of both approaches by bringing 
into the discussion the question of what heartbeats are. Given they are 
loosely/vaguely defined, I think we have freedom to tweak things.

And I don't have a problem with extending the last pulse. It is 
fundamentally correct to do regardless of the backend. I just raised 
the question of whether to extend all heartbeats to account for 
preemption (and scheduling delays). (What is the point of bumping their 
priority and re-scheduling if we didn't give the engine enough time to 
react? So the opposite of the question you raise.)

What I do have a problem with is deriving the preempt timeout from the 
heartbeat period. Hence I am looking at whether we can instead find a 
fixed number which works, and so avoid the bi-directional coupling.
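
To illustrate the direction of coupling I mean (a hypothetical sketch, 
not the i915 code; the function name is invented):

```python
# One-directional coupling: the heartbeat schedule reads the engine's
# preempt timeout, but the preempt timeout is never derived from the
# heartbeat period. Names are invented for illustration only.

def next_pulse_delay(heartbeat_s: float, preempt_timeout_s: float) -> float:
    # Account for the preempt timeout (and, implicitly, scheduling
    # latency) in every pulse, so a pulse is only re-evaluated after
    # the engine has had time to react to the priority bump.
    return heartbeat_s + preempt_timeout_s

print(next_pulse_delay(2.5, 2.5))  # 5.0 with the values assumed here
```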

>> It also avoids the double reset problem, regardless of the backend and 
>> regardless of how the user configured the timeouts. Without the need 
>> to fudge them either during driver load or during sysfs store.
>>
>> User has configured that heartbeat pulses should be sent every N 
>> seconds, yes, but I think we are free to account for inherent hardware 
>> and software latencies in our calculations. Especially since other 
>> than flawed Gen12 RCS, other engines will be much closer to the 
>> configured period.
>>
>> It is just the same as user asking for preempt timeout N and we say on 
>> driver load "oh no you won't get it". Same for heartbeats, they said 
>> 2.5s, we said 2.5s + broken engine factor...
> Why would you not get the pre-emption timeout? Because it is too large? 
> That is a physical limitation (of the GuC firmware) not an override 
> because we think we know better. If we can obey the user then we should 
> do so.

I was simply referring to the override in intel_engine_setup.

Regards,

Tvrtko
