From: Francisco Jerez <currojerez@riseup.net>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	intel-gfx@lists.freedesktop.org, linux-pm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	"Pandruvada, Srinivas" <srinivas.pandruvada@intel.com>
Subject: Re: [Intel-gfx] [PATCH 02/10] drm/i915: Adjust PM QoS response frequency based on GPU load.
Date: Tue, 10 Mar 2020 17:34:00 -0700
Message-ID: <87r1xzafwn.fsf@riseup.net>
In-Reply-To: <158387916218.28297.4489489879582782488@build.alporthouse.com>

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Francisco Jerez (2020-03-10 21:41:55)
>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> index b9b3f78f1324..a5d7a80b826d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>> @@ -1577,6 +1577,11 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
>>         /* we need to manually load the submit queue */
>>         if (execlists->ctrl_reg)
>>                 writel(EL_CTRL_LOAD, execlists->ctrl_reg);
>> +
>> +       if (execlists_num_ports(execlists) > 1 &&
> pending[1] is always defined, the minimum submission is one slot, with
> pending[1] as the sentinel NULL.
>
>> +           execlists->pending[1] &&
>> +           !atomic_xchg(&execlists->overload, 1))
>> +               intel_gt_pm_active_begin(&engine->i915->gt);
>
> engine->gt
>

Applied your suggestions above locally, will probably wait to have a few
more changes batched up before sending a v2.

>>  }
>>  
>>  static bool ctx_single_port_submission(const struct intel_context *ce)
>> @@ -2213,6 +2218,12 @@ cancel_port_requests(struct intel_engine_execlists * const execlists)
>>         clear_ports(execlists->inflight, ARRAY_SIZE(execlists->inflight));
>>  
>>         WRITE_ONCE(execlists->active, execlists->inflight);
>> +
>> +       if (atomic_xchg(&execlists->overload, 0)) {
>> +               struct intel_engine_cs *engine =
>> +                       container_of(execlists, typeof(*engine), execlists);
>> +               intel_gt_pm_active_end(&engine->i915->gt);
>> +       }
>>  }
>>  
>>  static inline void
>> @@ -2386,6 +2397,9 @@ static void process_csb(struct intel_engine_cs *engine)
>>                         /* port0 completed, advanced to port1 */
>>                         trace_ports(execlists, "completed", execlists->active);
>>  
>> +                       if (atomic_xchg(&execlists->overload, 0))
>> +                               intel_gt_pm_active_end(&engine->i915->gt);
>
> So this loses track if we preempt a dual-ELSP submission with a
> single-ELSP submission (and never go back to dual).
>

Yes, good point.  You're right that if a dual-ELSP submission gets
preempted by a single-ELSP submission, "overload" will remain signaled
until the first completion interrupt arrives (e.g. from the preempting
submission).

> If you move this to the end of the loop and check
>
> if (!execlists->active[1] && atomic_xchg(&execlists->overload, 0))
> 	intel_gt_pm_active_end(engine->gt);
>
> so that it covers both preemption/promotion and completion.
>

That sounds reasonable.
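To make the corner case concrete, here is a minimal user-space model in
C11.  It is a sketch, not i915 code: struct model_engine, model_submit()
and the other helpers are hypothetical; a submission is assumed to land in
the ports immediately (the real driver goes through pending[] and a CSB
promotion event); and C11 atomic_exchange() stands in for the kernel's
atomic_xchg().  It compares the v1 rule, which only drops the request on a
completion, with the suggested end-of-loop check on active[1].

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical model of one engine.  ports[] stands in for
 * execlists->active[0..1]; pm_requests counts begin() minus end() calls
 * on the GT-wide PM QoS request.
 */
struct model_engine {
	const char *ports[2];  /* NULL means the port is empty */
	atomic_int overload;   /* 1 while this engine holds the request */
	int pm_requests;
};

static void model_pm_end(struct model_engine *e)
{
	assert(e->pm_requests > 0);  /* unbalanced end() */
	e->pm_requests--;
}

/* Submission (a preemption is just a new submission replacing both ports). */
static void model_submit(struct model_engine *e, const char *p0, const char *p1)
{
	e->ports[0] = p0;
	e->ports[1] = p1;
	if (p1 && !atomic_exchange(&e->overload, 1))
		e->pm_requests++;
}

/* v1 rule: only a "port0 completed" event releases the request. */
static void model_complete_v1(struct model_engine *e)
{
	e->ports[0] = e->ports[1];
	e->ports[1] = NULL;
	if (atomic_exchange(&e->overload, 0))
		model_pm_end(e);
}

/*
 * Suggested rule: run after any CSB event (promotion or completion) has
 * updated ports[]; release the request as soon as port 1 is empty.
 */
static void model_csb_end_of_loop(struct model_engine *e)
{
	if (!e->ports[1] && atomic_exchange(&e->overload, 0))
		model_pm_end(e);
}
```

In this model, with the v1 rule a dual-port submission preempted by a
single-port one keeps the request until the next completion, whereas the
end-of-loop check releases it on the promotion event itself.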

> However, that will fluctuate quite rapidly. (And runs the risk of
> exceeding the sentinel.)
>
> An alternative approach would be to couple along
> schedule_in/schedule_out
>
> atomic_set(overload, -1);
>
> __execlists_schedule_in:
> 	if (!atomic_fetch_inc(overload))
> 		intel_gt_pm_active_begin(engine->gt);
> __execlists_schedule_out:
> 	if (!atomic_dec_return(overload))
> 		intel_gt_pm_active_end(engine->gt);
>
> which would mean we are overloaded as soon as we try to submit an
> overlapping ELSP.
>

That sounds good to me too, and AFAICT would have roughly the same
behavior as this metric except for the preemption corner case you
mention above.  I'll try this and verify that I get approximately the
same performance numbers.
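The schedule_in/schedule_out coupling can be modelled in user space as
follows.  Again this is a hypothetical sketch rather than i915 code:
atomic_fetch_add() plays the role of atomic_fetch_inc(), and since C11 has
no atomic_dec_return(), its result is computed as atomic_fetch_sub() - 1.
The -1 bias means the counter only reaches 0, and the request is only
taken, once a second context overlaps the first.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model; pm_requests stands in for the begin()/end() balance. */
struct model_engine {
	atomic_int overload;  /* number of contexts in flight, minus one */
	int pm_requests;
};

static void model_init(struct model_engine *e)
{
	atomic_init(&e->overload, -1);  /* bias: one context is not overload */
	e->pm_requests = 0;
}

/* Like __execlists_schedule_in(): fetch_add returns the old value. */
static void model_schedule_in(struct model_engine *e)
{
	if (atomic_fetch_add(&e->overload, 1) == 0)
		e->pm_requests++;  /* second overlapping context: take request */
}

/* Like __execlists_schedule_out(): dec_return yields the new value. */
static void model_schedule_out(struct model_engine *e)
{
	int now = atomic_fetch_sub(&e->overload, 1) - 1;

	assert(now >= -1);  /* more schedule_out than schedule_in calls */
	if (now == 0)
		e->pm_requests--;  /* back to a single context: drop request */
}
```

In this model, initialising the counter to 0 instead of -1 would take the
request as soon as the first context is scheduled in, which is closer to
the first-ELSP approach discussed below.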

>
> The metric feels very multiple client (game + display server, or
> saturated transcode) centric. In the endless kernel world, we expect
> 100% engine utilisation from a single context, and never a dual-ELSP
> submission. They are also likely to want to avoid being throttled to
> conserve TDP for the CPU.
>

Yes, this metric is fairly conservative: it won't trigger in every case
that could potentially benefit from the energy-efficiency optimization,
only in those where we can be reasonably certain that CPU latency is not
critical to keeping the GPU busy (e.g. because the CS has an additional
ELSP port pending execution that will kick in as soon as the current one
completes).

My original approach was to call intel_gt_pm_active_begin() directly as
soon as the first ELSP is submitted to the GPU, which was somewhat more
effective at improving the energy efficiency of the system than waiting
for the second port to be in use, but it involved a slight execlists
submission latency cost that led to some regressions.  It would
certainly cover the single-context case you have in mind though.  I'll
get some updated numbers with my previous approach so we can decide
which one provides a better trade-off.

> Should we also reduce the overload for the number of clients who are
> waiting for interrupts from the GPU, so that their wakeup latency is not
> impacted?

The number of clients waiting doesn't necessarily indicate that wake-up
latency is a concern.  It frequently indicates the opposite: that the
GPU has a bottleneck which will only be exacerbated by attempting to
reduce the ramp-up latency of the CPU.  IOW, I think we should only care
about reducing CPU wake-up latency in cases where the client is unable
to keep the GPU fully utilized under the latency target that allows the
GPU to run at maximum throughput.  If the client is unable to do so, GPU
utilization will already drop, so the PM QoS request will be removed
whether the client is waiting or not.

> -Chris

Thanks!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx