[PATCH v3 0/3] Add allowed CPU capacity knowledge to EAS

From: Lukasz Luba @ 2021-06-10 15:03 UTC
  To: linux-kernel
  Cc: linux-pm, peterz, rjw, viresh.kumar, vincent.guittot, qperret,
	dietmar.eggemann, vincent.donnefort, lukasz.luba,
	Beata.Michalska, mingo, juri.lelli, rostedt, segall, mgorman,
	bristot, thara.gopinath, amit.kachhap, amitk, rui.zhang,
	daniel.lezcano

Hi all,

The patch set v3 aims to add knowledge about reduced CPU capacity
into the Energy Model (EM) and Energy Aware Scheduler (EAS). Currently,
when a CPU is thermally capped, the frequency predicted by SchedUtil and
the frequency used by the EM for its estimation are not aligned, which
causes an energy estimation error. This patch set provides the allowed
CPU capacity to the EM (derived from the thermal pressure information),
which improves the energy estimation. More information about this
mechanism can be found in the patch descriptions.

This v3 contains a new patch 1/3, which addresses an issue triggered
for hotplugged-out CPUs. Offline CPUs do not get a proper value stored
by the thermal framework in their per-cpu thermal_pressure. Thus, the
thermal pressure geometric series machinery reads a 'stale' value when
the CPU comes back online. The patch fixes this, so that all mechanisms
using CPU capacity, such as load balancing and not only EAS, have more
accurate capacity information for CPUs returning online. I've also
added the related CPU cooling maintainers to CC on this patch set.

Changelog:
v3:
- switched to the 'raw' per-cpu thermal pressure instead of the thermal
  pressure geometric series signal, since it is better suited to this
  use case: predicting the SchedUtil frequency (Vincent, Dietmar)
- added more comments to the patch 2/3 description for the case when
  thermal capping might be applied even if the CPUs are not
  over-utilized (Dietmar)
- added the ACK from Rafael for the SchedUtil part
- added a patch fixing the missing per-cpu thermal_pressure update for
  offline CPUs in cpufreq_cooling
v2 [2]:
- clamp the value returned by effective_cpu_util() and avoid IRQ
  util scaling issues (Quentin)
v1 is available at [1]

Regards,
Lukasz

[1] https://lore.kernel.org/linux-pm/20210602135609.10867-1-lukasz.luba@arm.com/
[2] https://lore.kernel.org/lkml/20210604080954.13915-1-lukasz.luba@arm.com/

Lukasz Luba (3):
  thermal: cpufreq_cooling: Update also offline CPUs per-cpu
    thermal_pressure
  sched/fair: Take thermal pressure into account while estimating energy
  sched/cpufreq: Consider reduced CPU capacity in energy calculation

 drivers/thermal/cpufreq_cooling.c |  2 +-
 include/linux/energy_model.h      | 16 +++++++++++++---
 include/linux/sched/cpufreq.h     |  2 +-
 kernel/sched/cpufreq_schedutil.c  |  1 +
 kernel/sched/fair.c               | 14 ++++++++++----
 5 files changed, 26 insertions(+), 9 deletions(-)

-- 
2.17.1
