archive mirror
 help / color / mirror / Atom feed
From: Ricardo Neri <>
To: "Rafael J. Wysocki" <>,
	Daniel Lezcano <>,
	Len Brown <>,
	Srinivas Pandruvada <>,
	Aubrey Li <>,
	Amit Kucheria <>, Andi Kleen <>,
	Tim Chen <>,
	Lukasz Luba <>,
	"Ravi V. Shankar" <>,
	Ricardo Neri <>,,
	Ricardo Neri <>
Subject: [PATCH v2 0/7] Thermal: Introduce the Hardware Feedback Interface for thermal and performance management
Date: Mon, 20 Dec 2021 07:14:31 -0800	[thread overview]
Message-ID: <> (raw)


This is v2 of this patchset after having incorporated the feedback from
reviewers. Please find v1 here [1].

The Intel Hardware Feedback Interface (HFI) [2] provides information about
the performance and energy efficiency of each CPU in the system. It uses a
table that is shared between hardware and the operating system. The
contents of the table may be updated as a result of changes in the
operating conditions of the system (e.g., reaching a thermal limit) or the
action of external factors (e.g., changes in the thermal design power).

The information that HFI provides are specified as numeric, unit-less
capabilities relative to other CPUs in the system. These capabilities have
a range of [0-255] where higher numbers represent higher capabilities.
Energy efficiency and performance are reported in separate capabilities.
If either the performance or energy capabilities efficiency of a CPU are 0,
the hardware recommends to not schedule any tasks on such CPU for
performance, energy efficiency or thermal reasons, respectively.

The kernel or user space may use the information from the HFI to modify
task placement and/or adjust power limits. This patchset focuses on the
user space. The thermal notification framework is extended to relay
updates of CPU capacity. Thus, a userspace daemon can affinitize workloads
to certain CPUs and/or offline CPUs whose capabilities are zero.

The frequency of HFI updates is specific to each processor model. On some
systems, there is just a single HFI update at boot. On other systems, there
may be updates every tens of milliseconds. In order to not overwhelm
userspace with too many updates, they are limited to one update every
CONFIG_HZ jiffies.

Thanks and BR,


Changes since v1:
 ++ Unchanged patches: none
 * Clarified that HFI capabilities are independent. (Patch 1; Daniel)
 * Provided examples on changes reflected in the HFI table.
   (Patch 1; Daniel)
 * Renamed X86_FEATURE_INTEL_HFI as X86_FEATURE_HFI. (Patch 2, 3, 4, 5;
 * Reworked parsing of HFI features using bitfields instead of bitmasks.
   (Patch 3; PeterZ).
 * Removed hfi_instance::parsed. (Patch 3; Rafael)
 * Converted pr_err() to pr_debug(). (Patch 3; Srinivas, Rafael)
 * Removed unnecessary dependency on CONFIG_SCHED_MC. (Patch 3)
 * Renamed hfi_instance::ts_counter as hfi_instance::timestamp.
   (Patch 3)
 * Renamed hfi_instance::table_base as hfi_instance::local_table and
   relocated its definition to this patch. (Patch 3)
 * Wrapped hfi_instance::timestamp and hfi_instance:local_table in an
   anonymous union, since both point at the same location. (Patch 3)
 * Relocated definitions of MSR bits from intel_hfi.h to intel_hfi.c.
   (Patch 4)
   (Patch 4).
 * Reworked init_hfi_cpu_index() to take a pointer of type struct
   hfi_cpu_info. (Patch 4; Rafael)
 * In intel_hfi_online(), check for null hfi_instances and improve checks
   of the die_id. (Patch 4; Rafael)
 * Use a local cpumask_var_t to keep track of the CPUs of each
   hfi_instance. (Patch 4; Rafael)
 * Removed hfi_instance::die_id. It is not used anywhere. (Patch 4)
 * Renamed hfi_instance::table_base as hfi_instance::local_table for
   clarity. (Patch 4)
 * Repurposed hfi_instance::event_lock to handle HFI interrupt
   events to avoid keeping CPUs spinning needlessly. Added a new
   hfi_instance::table_lock to serialize access to an HFI table.
   (Patch 5; PeterZ)
 * Replaced raw_spin_[un]lock[restore|save]() with raw_spin_[un]lock().
   intel_hfi_process_event() runs in interrupt context and hence there
   is no need to save interrupt flags. (Patch 5)
   for clarity. (Patch 5)
 * Relaxed checks on timestamp to allow time wrap around. (Patch 5)
 * Reworded the members of struct cpu_capacity for clarity. (Patch 6;
 * Defined the CPU capability attributes to be scaled in the [0, 1023]
   interval. (Patch 6; Lukasz)
 * Made get_one_hfi_cap() return void. Removed unnecessary checks.
   (Patch 7; Rafael)
 * Replaced raw_spin_[un]lock_irq[restore|save]() with raw_spin_
   [un]lock_irq() in get_one_hfi_cap(). This function is only called from
   a workqueue and there is no need to save and restore irq flags.
 * Scaled performance and energy efficiency values to a [0, 1023] interval
   when reporting values to user space via thermal netlink notifications.
   (Patch 7; Lucasz).

Ricardo Neri (5):
  x86/Documentation: Describe the Intel Hardware Feedback Interface
  x86: Add definitions for the Intel Hardware Feedback Interface
  thermal: intel: hfi: Minimally initialize the Hardware Feedback
  thermal: intel: hfi: Handle CPU hotplug events
  thermal: intel: hfi: Enable notification interrupt

Srinivas Pandruvada (2):
  thermal: netlink: Add a new event to notify CPU capabilities change
  thermal: intel: hfi: Notify user space for HFI events

 Documentation/x86/index.rst         |   1 +
 Documentation/x86/intel-hfi.rst     |  72 ++++
 arch/x86/include/asm/cpufeatures.h  |   1 +
 arch/x86/include/asm/msr-index.h    |   6 +
 drivers/thermal/intel/Kconfig       |  13 +
 drivers/thermal/intel/Makefile      |   1 +
 drivers/thermal/intel/intel_hfi.c   | 533 ++++++++++++++++++++++++++++
 drivers/thermal/intel/intel_hfi.h   |  17 +
 drivers/thermal/intel/therm_throt.c |  21 ++
 drivers/thermal/thermal_netlink.c   |  55 +++
 drivers/thermal/thermal_netlink.h   |  13 +
 include/uapi/linux/thermal.h        |   6 +-
 12 files changed, 738 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/x86/intel-hfi.rst
 create mode 100644 drivers/thermal/intel/intel_hfi.c
 create mode 100644 drivers/thermal/intel/intel_hfi.h


             reply	other threads:[~2021-12-20 15:23 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-20 15:14 Ricardo Neri [this message]
2021-12-20 15:14 ` [PATCH v2 1/7] x86/Documentation: Describe the Intel Hardware Feedback Interface Ricardo Neri
2021-12-20 15:14 ` [PATCH v2 2/7] x86: Add definitions for " Ricardo Neri
2021-12-30 18:03   ` Rafael J. Wysocki
2021-12-30 18:13     ` Borislav Petkov
2022-01-02 21:35       ` Ricardo Neri
2021-12-20 15:14 ` [PATCH v2 3/7] thermal: intel: hfi: Minimally initialize the " Ricardo Neri
2021-12-30 18:43   ` Rafael J. Wysocki
2022-01-02 21:46     ` Ricardo Neri
2022-01-03  2:22       ` srinivas pandruvada
2022-01-03 18:24         ` Ricardo Neri
2021-12-20 15:14 ` [PATCH v2 4/7] thermal: intel: hfi: Handle CPU hotplug events Ricardo Neri
2021-12-30 19:15   ` Rafael J. Wysocki
2022-01-02 21:34     ` Ricardo Neri
2021-12-20 15:14 ` [PATCH v2 5/7] thermal: intel: hfi: Enable notification interrupt Ricardo Neri
2021-12-20 15:14 ` [PATCH v2 6/7] thermal: netlink: Add a new event to notify CPU capabilities change Ricardo Neri
2021-12-20 15:14 ` [PATCH v2 7/7] thermal: intel: hfi: Notify user space for HFI events Ricardo Neri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).