* [PATCH] drm/i915/slpc: Optimize waitboost for SLPC
@ 2022-10-18 22:15 Vinay Belgaumkar
  2022-10-19  7:40 ` [Intel-gfx] " Tvrtko Ursulin
  0 siblings, 1 reply; 5+ messages in thread
From: Vinay Belgaumkar @ 2022-10-18 22:15 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: Vinay Belgaumkar

Waitboost (when SLPC is enabled) results in an H2G message. This can result
in thousands of messages during a stress test and fill up an already full
CTB. There is no need to request RP0 if GuC is already requesting the
same.

Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_rps.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index fc23c562d9b2..a20ae4fceac8 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1005,13 +1005,20 @@ void intel_rps_dec_waiters(struct intel_rps *rps)
 void intel_rps_boost(struct i915_request *rq)
 {
 	struct intel_guc_slpc *slpc;
+	struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
 
 	if (i915_request_signaled(rq) || i915_request_has_waitboost(rq))
 		return;
 
+	/* If GuC is already requesting RP0, skip */
+	if (rps_uses_slpc(rps)) {
+		slpc = rps_to_slpc(rps);
+		if (intel_rps_get_requested_frequency(rps) == slpc->rp0_freq)
+			return;
+	}
+
 	/* Serializes with i915_request_retire() */
 	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
-		struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
 
 		if (rps_uses_slpc(rps)) {
 			slpc = rps_to_slpc(rps);
-- 
2.35.1



* Re: [Intel-gfx] [PATCH] drm/i915/slpc: Optimize waitboost for SLPC
  2022-10-18 22:15 [PATCH] drm/i915/slpc: Optimize waitboost for SLPC Vinay Belgaumkar
@ 2022-10-19  7:40 ` Tvrtko Ursulin
  2022-10-19 21:12   ` Belgaumkar, Vinay
  0 siblings, 1 reply; 5+ messages in thread
From: Tvrtko Ursulin @ 2022-10-19  7:40 UTC (permalink / raw)
  To: Vinay Belgaumkar, intel-gfx, dri-devel


On 18/10/2022 23:15, Vinay Belgaumkar wrote:
> Waitboost (when SLPC is enabled) results in an H2G message. This can result
> in thousands of messages during a stress test and fill up an already full
> CTB. There is no need to request RP0 if GuC is already requesting the
> same.
> 
> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_rps.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> index fc23c562d9b2..a20ae4fceac8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -1005,13 +1005,20 @@ void intel_rps_dec_waiters(struct intel_rps *rps)
>   void intel_rps_boost(struct i915_request *rq)
>   {
>   	struct intel_guc_slpc *slpc;
> +	struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>   
>   	if (i915_request_signaled(rq) || i915_request_has_waitboost(rq))
>   		return;
>   
> +	/* If GuC is already requesting RP0, skip */
> +	if (rps_uses_slpc(rps)) {
> +		slpc = rps_to_slpc(rps);
> +		if (intel_rps_get_requested_frequency(rps) == slpc->rp0_freq)
> +			return;
> +	}
> +

Feels a little bit like a layering violation. Wait boost reference 
counts and request markings will be changed based on asynchronous state - 
an mmio read.

Also, a little below we have this:

"""
	/* Serializes with i915_request_retire() */
	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
		struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;

		if (rps_uses_slpc(rps)) {
			slpc = rps_to_slpc(rps);

			/* Return if old value is non zero */
			if (!atomic_fetch_inc(&slpc->num_waiters))

***>>>> Wouldn't it skip doing anything here already? <<<<***

				schedule_work(&slpc->boost_work);

			return;
		}

		if (atomic_fetch_inc(&rps->num_waiters))
			return;
"""

But I wonder if this is not a layering violation already. Looks like one 
to me at the moment. And as it happens there is an ongoing debug of 
clvk slowness where I was a bit puzzled by the lack of "boost fence" in 
trace_printk logs - but now I see how that happens. Does not feel right 
to me that we lose that tracing with SLPC.

So in general - why wouldn't the correct approach be to solve this in 
the worker - which perhaps should fork to an slpc-specific branch and do 
the consolidation/skips based on mmio reads in there?
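
Roughly something like this (untested sketch - the worker function name, 
the way back from the slpc to rps, and any locking are approximated; the 
only point is where the mmio read lives):

"""
/* Sketch of the worker behind slpc->boost_work; names are approximate */
static void slpc_boost_work(struct work_struct *work)
{
	struct intel_guc_slpc *slpc =
		container_of(work, typeof(*slpc), boost_work);
	struct intel_rps *rps = &slpc_to_gt(slpc)->rps; /* assumed helper */

	/* One mmio read here, instead of one per intel_rps_boost() call */
	if (intel_rps_get_requested_frequency(rps) == slpc->rp0_freq)
		return;

	/* ... then the existing code that sends the H2G to raise min freq ... */
}
"""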

Regards,

Tvrtko

>   	/* Serializes with i915_request_retire() */
>   	if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
> -		struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>   
>   		if (rps_uses_slpc(rps)) {
>   			slpc = rps_to_slpc(rps);


* Re: [Intel-gfx] [PATCH] drm/i915/slpc: Optimize waitboost for SLPC
  2022-10-19  7:40 ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-10-19 21:12   ` Belgaumkar, Vinay
  2022-10-19 23:05     ` Belgaumkar, Vinay
  2022-10-20  8:14     ` Tvrtko Ursulin
  0 siblings, 2 replies; 5+ messages in thread
From: Belgaumkar, Vinay @ 2022-10-19 21:12 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx, dri-devel


On 10/19/2022 12:40 AM, Tvrtko Ursulin wrote:
>
> On 18/10/2022 23:15, Vinay Belgaumkar wrote:
>> Waitboost (when SLPC is enabled) results in an H2G message. This can 
>> result
>> in thousands of messages during a stress test and fill up an already 
>> full
>> CTB. There is no need to request RP0 if GuC is already requesting 
>> the
>> same.
>>
>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_rps.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c 
>> b/drivers/gpu/drm/i915/gt/intel_rps.c
>> index fc23c562d9b2..a20ae4fceac8 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
>> @@ -1005,13 +1005,20 @@ void intel_rps_dec_waiters(struct intel_rps 
>> *rps)
>>   void intel_rps_boost(struct i915_request *rq)
>>   {
>>       struct intel_guc_slpc *slpc;
>> +    struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>>         if (i915_request_signaled(rq) || i915_request_has_waitboost(rq))
>>           return;
>>   +    /* If GuC is already requesting RP0, skip */
>> +    if (rps_uses_slpc(rps)) {
>> +        slpc = rps_to_slpc(rps);
>> +        if (intel_rps_get_requested_frequency(rps) == slpc->rp0_freq)
One correction here is this should be slpc->boost_freq (the corrected 
line is spelled out just below the quoted hunk).
>> +            return;
>> +    }
>> +
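
With that correction the check simply becomes (same line as in the hunk 
above, only the field swapped):

"""
	if (intel_rps_get_requested_frequency(rps) == slpc->boost_freq)
		return;
"""
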
>
> Feels a little bit like a layering violation. Wait boost reference 
> counts and request markings will be changed based on asynchronous state - 
> an mmio read.
>
> Also, a little below we have this:
>
> """
>     /* Serializes with i915_request_retire() */
>     if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
>         struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>
>         if (rps_uses_slpc(rps)) {
>             slpc = rps_to_slpc(rps);
>
>             /* Return if old value is non zero */
>             if (!atomic_fetch_inc(&slpc->num_waiters))
>
> ***>>>> Wouldn't it skip doing anything here already? <<<<***
It will skip only if boost is already happening. This patch is trying to 
prevent even that first one if possible.
>
>                 schedule_work(&slpc->boost_work);
>
>             return;
>         }
>
>         if (atomic_fetch_inc(&rps->num_waiters))
>             return;
> """
>
> But I wonder if this is not a layering violation already. Looks like 
> one to me at the moment. And as it happens there is an ongoing debug 
> of clvk slowness where I was a bit puzzled by the lack of "boost 
> fence" in trace_printk logs - but now I see how that happens. Does not 
> feel right to me that we lose that tracing with SLPC.
Agreed. Will add the trace to the SLPC case as well.  However, the 
question is what does that trace indicate? Even in the host case, we log 
the trace, but may skip the actual boost as the req is already matching 
boost freq. IMO, we should log the trace only when we actually decide to 
boost.
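
If we only trace when we actually schedule the boost, the SLPC branch 
would look roughly like this (sketch, untested; assuming the "boost 
fence" trace you are missing is the GT_TRACE in the non-SLPC path 
further down):

"""
		if (rps_uses_slpc(rps)) {
			slpc = rps_to_slpc(rps);

			/* Return if old value is non zero */
			if (!atomic_fetch_inc(&slpc->num_waiters)) {
				/* same breadcrumb as the non-SLPC path */
				GT_TRACE(rps_to_gt(rps),
					 "boost fence:%llx:%llx\n",
					 rq->fence.context, rq->fence.seqno);
				schedule_work(&slpc->boost_work);
			}

			return;
		}
"""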
>
> So in general - why wouldn't the correct approach be to solve this in 
> the worker - which perhaps should fork to an slpc-specific branch and do 
> the consolidation/skips based on mmio reads in there?

sure, I can move the mmio read to the SLPC worker thread.

Thanks,

Vinay.

>
> Regards,
>
> Tvrtko
>
>>       /* Serializes with i915_request_retire() */
>>       if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
>> -        struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>>             if (rps_uses_slpc(rps)) {
>>               slpc = rps_to_slpc(rps);


* Re: [Intel-gfx] [PATCH] drm/i915/slpc: Optimize waitboost for SLPC
  2022-10-19 21:12   ` Belgaumkar, Vinay
@ 2022-10-19 23:05     ` Belgaumkar, Vinay
  2022-10-20  8:14     ` Tvrtko Ursulin
  1 sibling, 0 replies; 5+ messages in thread
From: Belgaumkar, Vinay @ 2022-10-19 23:05 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx, dri-devel


On 10/19/2022 2:12 PM, Belgaumkar, Vinay wrote:
>
> On 10/19/2022 12:40 AM, Tvrtko Ursulin wrote:
>>
>> On 18/10/2022 23:15, Vinay Belgaumkar wrote:
>>> Waitboost (when SLPC is enabled) results in an H2G message. This can 
>>> result
>>> in thousands of messages during a stress test and fill up an already 
>>> full
>>> CTB. There is no need to request RP0 if GuC is already 
>>> requesting the
>>> same.
>>>
>>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_rps.c | 9 ++++++++-
>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c 
>>> b/drivers/gpu/drm/i915/gt/intel_rps.c
>>> index fc23c562d9b2..a20ae4fceac8 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
>>> @@ -1005,13 +1005,20 @@ void intel_rps_dec_waiters(struct intel_rps 
>>> *rps)
>>>   void intel_rps_boost(struct i915_request *rq)
>>>   {
>>>       struct intel_guc_slpc *slpc;
>>> +    struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>>>         if (i915_request_signaled(rq) || 
>>> i915_request_has_waitboost(rq))
>>>           return;
>>>   +    /* If GuC is already requesting RP0, skip */
>>> +    if (rps_uses_slpc(rps)) {
>>> +        slpc = rps_to_slpc(rps);
>>> +        if (intel_rps_get_requested_frequency(rps) == slpc->rp0_freq)
> One correction here is this should be slpc->boost_freq.
>>> +            return;
>>> +    }
>>> +
>>
>> Feels a little bit like a layering violation. Wait boost reference 
>> counts and request markings will be changed based on asynchronous state 
>> - an mmio read.
>>
>> Also, a little below we have this:
>>
>> """
>>     /* Serializes with i915_request_retire() */
>>     if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
>>         struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>>
>>         if (rps_uses_slpc(rps)) {
>>             slpc = rps_to_slpc(rps);
>>
>>             /* Return if old value is non zero */
>>             if (!atomic_fetch_inc(&slpc->num_waiters))
>>
>> ***>>>> Wouldn't it skip doing anything here already? <<<<***
> It will skip only if boost is already happening. This patch is trying 
> to prevent even that first one if possible.
>>
>>                 schedule_work(&slpc->boost_work);
>>
>>             return;
>>         }
>>
>>         if (atomic_fetch_inc(&rps->num_waiters))
>>             return;
>> """
>>
>> But I wonder if this is not a layering violation already. Looks like 
>> one to me at the moment. And as it happens there is an ongoing debug 
>> of clvk slowness where I was a bit puzzled by the lack of "boost 
>> fence" in trace_printk logs - but now I see how that happens. Does 
>> not feel right to me that we lose that tracing with SLPC.
> Agreed. Will add the trace to the SLPC case as well.  However, the 
> question is what does that trace indicate? Even in the host case, we 
> log the trace, but may skip the actual boost as the req is already 
> matching boost freq. IMO, we should log the trace only when we 
> actually decide to boost.
On second thoughts, that trace only tracks the boost fence, which is set 
in this case. So it might be ok to have it regardless. We count 
num_boosts anyway if we ever want to know how many of those actually 
went on to boost the freq.
>>
>> So in general - why wouldn't the correct approach be to solve this in 
>> the worker - which perhaps should fork to an slpc-specific branch and do 
>> the consolidation/skips based on mmio reads in there?
>
> sure, I can move the mmio read to the SLPC worker thread.
>
> Thanks,
>
> Vinay.
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>>       /* Serializes with i915_request_retire() */
>>>       if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
>>> -        struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>>>             if (rps_uses_slpc(rps)) {
>>>               slpc = rps_to_slpc(rps);


* Re: [Intel-gfx] [PATCH] drm/i915/slpc: Optimize waitboost for SLPC
  2022-10-19 21:12   ` Belgaumkar, Vinay
  2022-10-19 23:05     ` Belgaumkar, Vinay
@ 2022-10-20  8:14     ` Tvrtko Ursulin
  1 sibling, 0 replies; 5+ messages in thread
From: Tvrtko Ursulin @ 2022-10-20  8:14 UTC (permalink / raw)
  To: Belgaumkar, Vinay, intel-gfx, dri-devel


On 19/10/2022 22:12, Belgaumkar, Vinay wrote:
> 
> On 10/19/2022 12:40 AM, Tvrtko Ursulin wrote:
>>
>> On 18/10/2022 23:15, Vinay Belgaumkar wrote:
>>> Waitboost (when SLPC is enabled) results in an H2G message. This can 
>>> result
>>> in thousands of messages during a stress test and fill up an already 
>>> full
>>> CTB. There is no need to request RP0 if GuC is already requesting 
>>> the
>>> same.
>>>
>>> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_rps.c | 9 ++++++++-
>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c 
>>> b/drivers/gpu/drm/i915/gt/intel_rps.c
>>> index fc23c562d9b2..a20ae4fceac8 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
>>> @@ -1005,13 +1005,20 @@ void intel_rps_dec_waiters(struct intel_rps 
>>> *rps)
>>>   void intel_rps_boost(struct i915_request *rq)
>>>   {
>>>       struct intel_guc_slpc *slpc;
>>> +    struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>>>         if (i915_request_signaled(rq) || i915_request_has_waitboost(rq))
>>>           return;
>>>   +    /* If GuC is already requesting RP0, skip */
>>> +    if (rps_uses_slpc(rps)) {
>>> +        slpc = rps_to_slpc(rps);
>>> +        if (intel_rps_get_requested_frequency(rps) == slpc->rp0_freq)
> One correction here is this should be slpc->boost_freq.
>>> +            return;
>>> +    }
>>> +
>>
>> Feels a little bit like a layering violation. Wait boost reference 
>> counts and request markings will be changed based on asynchronous state - 
>> an mmio read.
>>
>> Also, a little below we have this:
>>
>> """
>>     /* Serializes with i915_request_retire() */
>>     if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
>>         struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>>
>>         if (rps_uses_slpc(rps)) {
>>             slpc = rps_to_slpc(rps);
>>
>>             /* Return if old value is non zero */
>>             if (!atomic_fetch_inc(&slpc->num_waiters))
>>
>> ***>>>> Wouldn't it skip doing anything here already? <<<<***
> It will skip only if boost is already happening. This patch is trying to 
> prevent even that first one if possible.

Do you mean that the first boost request comes from outside the driver's control?

>>
>>                 schedule_work(&slpc->boost_work);
>>
>>             return;
>>         }
>>
>>         if (atomic_fetch_inc(&rps->num_waiters))
>>             return;
>> """
>>
>> But I wonder if this is not a layering violation already. Looks like 
>> one to me at the moment. And as it happens there is an ongoing debug 
>> of clvk slowness where I was a bit puzzled by the lack of "boost 
>> fence" in trace_printk logs - but now I see how that happens. Does not 
>> feel right to me that we lose that tracing with SLPC.
> Agreed. Will add the trace to the SLPC case as well.  However, the 
> question is what does that trace indicate? Even in the host case, we log 
> the trace, but may skip the actual boost as the req is already matching 
> boost freq. IMO, we should log the trace only when we actually decide to 
> boost.

Good question - let me come back to this later when the current 
emergencies subside. Feel free to remind me if I forget.

>> So in general - why wouldn't the correct approach be to solve this in 
>> the worker - which perhaps should fork to an slpc-specific branch and do 
>> the consolidation/skips based on mmio reads in there?
> 
> sure, I can move the mmio read to the SLPC worker thread.

Thanks, yes I think that will even be better since the mmio read will only 
happen if the higher level thinks that it should boost. So the hierarchy 
of "duties" would be slightly improved: driver tracking -> SLPC tracking 
-> HW status.

I'll come back to the latest version of the patch later today or tomorrow.

Regards,

Tvrtko
> Thanks,
> 
> Vinay.
> 
>>
>> Regards,
>>
>> Tvrtko
>>
>>>       /* Serializes with i915_request_retire() */
>>>       if (!test_and_set_bit(I915_FENCE_FLAG_BOOST, &rq->fence.flags)) {
>>> -        struct intel_rps *rps = &READ_ONCE(rq->engine)->gt->rps;
>>>             if (rps_uses_slpc(rps)) {
>>>               slpc = rps_to_slpc(rps);

