linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 00/22] sched: Introduce IPC classes for load balance
@ 2022-11-28 13:20 Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 01/22] sched/task_struct: Introduce IPC classes of tasks Ricardo Neri
                   ` (21 more replies)
  0 siblings, 22 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri

Hi,

This is the v2 of the patchset. Since it did not receive strong objections
on the design, I took the liberty of promoting the series from RFC to
PATCH :)

The problem statement and design do not change in this version. Thus, I did
not repeat the cover letter. It can be retrieved here [1].

This series depends on my other patches to use identical asym_packing CPU
priorities on all the SMT siblings of a physical core on x86 [2].

These patches apply cleanly on top of [2]. For convenience, these patches
and [2] can be found here:

	https://github.com/ricardon/tip.git rneri/ipc_classes_v2 

Thanks and BR,
Ricardo

Changes since v1 (sorted by significance):
 * Renamed task_struct::class as task_struct::ipcc. (Joel)
 * Use task_struct::ipcc = 0 for unclassified tasks. (PeterZ)
 * Renamed CONFIG_SCHED_TASK_CLASSES as CONFIG_IPC_CLASSES. (PeterZ, Joel)
 * Dropped patch to take spin lock to read the HFI table from the
   scheduler and from the HFI enabling code.
 * Implemented per-CPU variables to store the IPCC scores of each class.
   These can be read without holding a lock. (PeterZ).
 * Dropped patch to expose is_core_idle() outside the scheduler. It is
   now exposed as part of [2].
 * Implemented cleanups and reworks from PeterZ when collecting IPCC
   statistics. I took all his suggestions, except the computation of the
   total IPC score of two physical cores.
 * Quantified the cost of HRESET.
 * Use an ALTERNATIVE macro instead of static_cpu_has() to execute HRESET
   when supported. (PeterZ)
 * Fixed a bug when selecting a busiest runqueue: when comparing two
   runqueues with equal nr_running, we must compute the IPCC score delta
   of both runqueues.
 * Fixed the bit number DISABLE_ITD to the correct DISABLE_MASK: 14 instead
   of 13.
 * Redefined union hfi_thread_feedback_char_msr to ensure all
   bit-fields are packed. (PeterZ)
 * Use bit-fields to fit all the ipcc members of task_struct in 4 bytes.
   (PeterZ)
 * Shortened the names of the IPCC interfaces (PeterZ):
   sched_task_classes_enabled >> sched_ipcc_enabled
   arch_has_task_classes >> arch_has_ipc_classes
   arch_update_task_class >> arch_update_ipcc
   arch_get_task_class_score >> arch_get_ipcc_score
 * Removed smt_siblings_idle argument from arch_update_ipcc(). (PeterZ)
 * Added a comment to clarify why sched_asym_prefer() needs a tie breaker
   only in update_sd_pick_busiest(). (PeterZ)
 * Renamed functions for accuracy:
   sched_asym_class_prefer() >> sched_asym_ipcc_prefer()
   sched_asym_class_pick() >> sched_asym_ipcc_pick()
 * Renamed local variables to improve the layout of the code block I added
   in find_busiest_queue(). (PeterZ)
 * Removed proposed CONFIG_INTEL_THREAD_DIRECTOR Kconfig option.
 * Mark hardware_history_features as __ro_after_init instead of
   __read_mostly. (PeterZ)
 
[1]. https://lore.kernel.org/lkml/20220909231205.14009-1-ricardo.neri-calderon@linux.intel.com/
[2]. https://lore.kernel.org/lkml/20221122203532.15013-1-ricardo.neri-calderon@linux.intel.com/

Ricardo Neri (22):
  sched/task_struct: Introduce IPC classes of tasks
  sched: Add interfaces for IPC classes
  sched/core: Initialize the IPC class of a new task
  sched/core: Add user_tick as argument to scheduler_tick()
  sched/core: Update the IPC class of the current task
  sched/fair: Collect load-balancing stats for IPC classes
  sched/fair: Compute IPC class scores for load balancing
  sched/fair: Use IPC class to pick the busiest group
  sched/fair: Use IPC class score to select a busiest runqueue
  thermal: intel: hfi: Introduce Intel Thread Director classes
  thermal: intel: hfi: Store per-CPU IPCC scores
  x86/cpufeatures: Add the Intel Thread Director feature definitions
  thermal: intel: hfi: Update the IPC class of the current task
  thermal: intel: hfi: Report the IPC class score of a CPU
  thermal: intel: hfi: Define a default class for unclassified tasks
  thermal: intel: hfi: Enable the Intel Thread Director
  sched/task_struct: Add helpers for IPC classification
  sched/core: Initialize helpers of task classification
  thermal: intel: hfi: Implement model-specific checks for task
    classification
  x86/cpufeatures: Add feature bit for HRESET
  x86/hreset: Configure history reset
  x86/process: Reset hardware history in context switch

 arch/x86/include/asm/cpufeatures.h       |   2 +
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/hreset.h            |  30 +++
 arch/x86/include/asm/msr-index.h         |   6 +-
 arch/x86/include/asm/topology.h          |  10 +
 arch/x86/kernel/cpu/common.c             |  30 ++-
 arch/x86/kernel/cpu/cpuid-deps.c         |   1 +
 arch/x86/kernel/cpu/scattered.c          |   1 +
 arch/x86/kernel/process_32.c             |   3 +
 arch/x86/kernel/process_64.c             |   3 +
 drivers/thermal/intel/intel_hfi.c        | 229 ++++++++++++++++++++++-
 include/linux/sched.h                    |  22 ++-
 init/Kconfig                             |  12 ++
 kernel/sched/core.c                      |  10 +-
 kernel/sched/fair.c                      | 229 ++++++++++++++++++++++-
 kernel/sched/sched.h                     |  60 ++++++
 kernel/sched/topology.c                  |   8 +
 kernel/time/timer.c                      |   2 +-
 18 files changed, 653 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/include/asm/hreset.h

-- 
2.25.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v2 01/22] sched/task_struct: Introduce IPC classes of tasks
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 02/22] sched: Add interfaces for IPC classes Ricardo Neri
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

On hybrid processors, the microarchitectural differences between the types
of CPUs lead to a different number of instructions per cycle (IPC) on each
type of CPU. The IPC may further vary with the type of instructions being
executed. Instructions can be grouped into classes of similar IPC.

Hence, tasks can be classified into groups based on the type of
instructions they execute.

Add a new member task_struct::ipcc to associate a particular task to
an IPC class that depends on the instructions it executes.

The scheduler may use the IPC class of a task and data about the
performance among CPUs of a given IPC class to improve throughput. It
may, for instance, place certain classes of tasks on CPUs of higher
performance.

The methods to determine the classification of a task and its relative
IPC score are specific to each CPU architecture.
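
As an illustrative sketch only (none of these names exist in this series;
the score table and helper are made up), a scheduler could consult
per-class, per-CPU scores to favor a higher-performing CPU for a given
class of task:

```c
/*
 * Toy sketch: toy_ipcc_score and toy_pick_cpu are hypothetical.
 * In the series, scores come from hardware via arch hooks.
 */
#define TOY_NR_CLASSES	3
#define TOY_NR_CPUS	4

/* score[class][cpu]: higher means the class runs faster on that CPU */
static const int toy_ipcc_score[TOY_NR_CLASSES][TOY_NR_CPUS] = {
	{ 10, 10, 10, 10 },	/* class 0: no CPU preference */
	{  8,  8, 16, 16 },	/* class 1: much faster on CPUs 2-3 */
	{  9,  9, 11, 11 },	/* class 2: mildly faster on CPUs 2-3 */
};

/* Return the CPU among @nr_cpus candidates where @ipcc scores highest. */
static int toy_pick_cpu(unsigned int ipcc, int nr_cpus)
{
	int cpu, best = 0;

	for (cpu = 1; cpu < nr_cpus; cpu++) {
		if (toy_ipcc_score[ipcc][cpu] > toy_ipcc_score[ipcc][best])
			best = cpu;
	}
	return best;
}
```

With the toy table above, class 1 tasks would be steered to CPU 2, while
class 0 tasks show no preference.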

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Renamed task_struct::class as task_struct::ipcc. (Joel)
 * Use task_struct::ipcc = 0 for unclassified tasks. (PeterZ)
 * Renamed CONFIG_SCHED_TASK_CLASSES as CONFIG_IPC_CLASSES. (PeterZ, Joel)
---
 include/linux/sched.h | 10 ++++++++++
 init/Kconfig          | 12 ++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 68c07ae0d7ff..47ae3557ba07 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -127,6 +127,8 @@ struct task_group;
 					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
 					 TASK_PARKED)
 
+#define IPC_CLASS_UNCLASSIFIED		0
+
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
 #define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
@@ -1525,6 +1527,14 @@ struct task_struct {
 	union rv_task_monitor		rv[RV_PER_TASK_MONITORS];
 #endif
 
+#ifdef CONFIG_IPC_CLASSES
+	/*
+	 * A hardware-defined classification of task based on the number
+	 * of instructions per cycle.
+	 */
+	unsigned int			ipcc;
+#endif
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git a/init/Kconfig b/init/Kconfig
index abf65098f1b6..cd17dd4d3718 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -865,6 +865,18 @@ config UCLAMP_BUCKETS_COUNT
 
 	  If in doubt, use the default value.
 
+config IPC_CLASSES
+	bool "IPC classes of tasks"
+	depends on SMP
+	help
+	  If selected, each task is assigned a classification value that
+	  reflects the type of instructions that the task executes. This
+	  classification reflects but is not equal to the number of
+	  instructions retired per cycle.
+
+	  The scheduler uses the classification value to improve the placement
+	  of tasks.
+
 endmenu
 
 #
-- 
2.25.1



* [PATCH v2 02/22] sched: Add interfaces for IPC classes
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 01/22] sched/task_struct: Introduce IPC classes of tasks Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-12-08  8:48   ` Ionela Voinescu
  2022-12-14  7:36   ` Lukasz Luba
  2022-11-28 13:20 ` [PATCH v2 03/22] sched/core: Initialize the IPC class of a new task Ricardo Neri
                   ` (19 subsequent siblings)
  21 siblings, 2 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

Add the interfaces that architectures shall implement to convey the data
to support IPC classes.

arch_update_ipcc() updates the IPC classification of the current task as
given by hardware.

arch_get_ipcc_score() provides a performance score for a given IPC class
when placed on a specific CPU. Higher scores indicate higher performance.

The number of classes and the score of each class of task are determined
by hardware.
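
A hypothetical arch-side implementation of these hooks might look as
follows (a sketch only: the toy_* names and static table are made up; on
x86, later patches in this series back the score with the HFI table):

```c
#include <stdbool.h>

#define TOY_NR_CLASSES	4
#define TOY_NR_CPUS	8

/* score[class][cpu], filled from hardware feedback in a real arch */
static int toy_hw_score[TOY_NR_CLASSES][TOY_NR_CPUS];

static bool toy_arch_has_ipc_classes(void)
{
	/* In practice, gated on a CPU feature bit. */
	return true;
}

static int toy_arch_get_ipcc_score(unsigned short ipcc, int cpu)
{
	/* Error when either @ipcc or @cpu is invalid. */
	if (ipcc >= TOY_NR_CLASSES || cpu < 0 || cpu >= TOY_NR_CPUS)
		return -22;	/* -EINVAL */

	return toy_hw_score[ipcc][cpu];
}
```

The arch would also provide arch_update_ipcc() to refresh
task_struct::ipcc from the hardware classification of the current task.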

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Shortened the names of the IPCC interfaces (PeterZ):
   sched_task_classes_enabled >> sched_ipcc_enabled
   arch_has_task_classes >> arch_has_ipc_classes
   arch_update_task_class >> arch_update_ipcc
   arch_get_task_class_score >> arch_get_ipcc_score
 * Removed smt_siblings_idle argument from arch_update_ipcc(). (PeterZ)
---
 kernel/sched/sched.h    | 60 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/topology.c |  8 ++++++
 2 files changed, 68 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b1d338a740e5..75e22baa2622 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2531,6 +2531,66 @@ void arch_scale_freq_tick(void)
 }
 #endif
 
+#ifdef CONFIG_IPC_CLASSES
+DECLARE_STATIC_KEY_FALSE(sched_ipcc);
+
+static inline bool sched_ipcc_enabled(void)
+{
+	return static_branch_unlikely(&sched_ipcc);
+}
+
+#ifndef arch_has_ipc_classes
+/**
+ * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks
+ *
+ * Returns: true if IPC classes of tasks are supported.
+ */
+static __always_inline
+bool arch_has_ipc_classes(void)
+{
+	return false;
+}
+#endif
+
+#ifndef arch_update_ipcc
+/**
+ * arch_update_ipcc() - Update the IPC class of the current task
+ * @curr:		The current task
+ *
+ * Request that the IPC classification of @curr is updated.
+ *
+ * Returns: none
+ */
+static __always_inline
+void arch_update_ipcc(struct task_struct *curr)
+{
+}
+#endif
+
+#ifndef arch_get_ipcc_score
+/**
+ * arch_get_ipcc_score() - Get the IPC score of a class of task
+ * @ipcc:	The IPC class
+ * @cpu:	A CPU number
+ *
+ * Returns the performance score of an IPC class when running on @cpu.
+ * Error when either @ipcc or @cpu is invalid.
+ */
+static __always_inline
+int arch_get_ipcc_score(unsigned short ipcc, int cpu)
+{
+	return 1;
+}
+#endif
+#else /* CONFIG_IPC_CLASSES */
+
+#define arch_get_ipcc_score(ipcc, cpu) (-EINVAL)
+#define arch_update_ipcc(curr)
+
+static inline bool sched_ipcc_enabled(void) { return false; }
+
+#endif /* CONFIG_IPC_CLASSES */
+
 #ifndef arch_scale_freq_capacity
 /**
  * arch_scale_freq_capacity - get the frequency scale factor of a given CPU.
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8154ef590b9f..eb1654b64df7 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -669,6 +669,9 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
 DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
+#ifdef CONFIG_IPC_CLASSES
+DEFINE_STATIC_KEY_FALSE(sched_ipcc);
+#endif
 
 static void update_top_cache_domain(int cpu)
 {
@@ -2388,6 +2391,11 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	if (has_asym)
 		static_branch_inc_cpuslocked(&sched_asym_cpucapacity);
 
+#ifdef CONFIG_IPC_CLASSES
+	if (arch_has_ipc_classes())
+		static_branch_enable_cpuslocked(&sched_ipcc);
+#endif
+
 	if (rq && sched_debug_verbose) {
 		pr_info("root domain span: %*pbl (max cpu_capacity = %lu)\n",
 			cpumask_pr_args(cpu_map), rq->rd->max_cpu_capacity);
-- 
2.25.1



* [PATCH v2 03/22] sched/core: Initialize the IPC class of a new task
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 01/22] sched/task_struct: Introduce IPC classes of tasks Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 02/22] sched: Add interfaces for IPC classes Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 04/22] sched/core: Add user_tick as argument to scheduler_tick() Ricardo Neri
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

New tasks shall start life as unclassified. They will be classified by
hardware when they run.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * None
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b2d5cabcc5..8dd43ee05534 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4372,6 +4372,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
+#ifdef CONFIG_IPC_CLASSES
+	p->ipcc				= IPC_CLASS_UNCLASSIFIED;
+#endif
 	INIT_LIST_HEAD(&p->se.group_node);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-- 
2.25.1



* [PATCH v2 04/22] sched/core: Add user_tick as argument to scheduler_tick()
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (2 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 03/22] sched/core: Initialize the IPC class of a new task Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-12-07 12:21   ` Dietmar Eggemann
  2022-11-28 13:20 ` [PATCH v2 05/22] sched/core: Update the IPC class of the current task Ricardo Neri
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

Differentiate between user and kernel ticks so that the scheduler updates
the IPC class of the current task during the latter.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * None
---
 include/linux/sched.h | 2 +-
 kernel/sched/core.c   | 2 +-
 kernel/time/timer.c   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 47ae3557ba07..ddabc7449edd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -293,7 +293,7 @@ enum {
 	TASK_COMM_LEN = 16,
 };
 
-extern void scheduler_tick(void);
+extern void scheduler_tick(bool user_tick);
 
 #define	MAX_SCHEDULE_TIMEOUT		LONG_MAX
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8dd43ee05534..8bb6f597c42b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5487,7 +5487,7 @@ static inline u64 cpu_resched_latency(struct rq *rq) { return 0; }
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
  */
-void scheduler_tick(void)
+void scheduler_tick(bool user_tick)
 {
 	int cpu = smp_processor_id();
 	struct rq *rq = cpu_rq(cpu);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 63a8ce7177dd..e15e24105891 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -2073,7 +2073,7 @@ void update_process_times(int user_tick)
 	if (in_irq())
 		irq_work_tick();
 #endif
-	scheduler_tick();
+	scheduler_tick(user_tick);
 	if (IS_ENABLED(CONFIG_POSIX_TIMERS))
 		run_posix_cpu_timers();
 }
-- 
2.25.1



* [PATCH v2 05/22] sched/core: Update the IPC class of the current task
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (3 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 04/22] sched/core: Add user_tick as argument to scheduler_tick() Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 06/22] sched/fair: Collect load-balancing stats for IPC classes Ricardo Neri
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

When supported, hardware monitors the instruction stream to classify the
current task. Hence, at the user tick, we are ready to read the most
recent classification result for the current task.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Removed argument smt_siblings_idle from call to arch_update_ipcc().
 * Used the new IPCC interfaces names.
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8bb6f597c42b..2cd409536b72 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5496,6 +5496,9 @@ void scheduler_tick(bool user_tick)
 	unsigned long thermal_pressure;
 	u64 resched_latency;
 
+	if (sched_ipcc_enabled() && user_tick)
+		arch_update_ipcc(curr);
+
 	arch_scale_freq_tick();
 	sched_clock_tick();
 
-- 
2.25.1



* [PATCH v2 06/22] sched/fair: Collect load-balancing stats for IPC classes
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (4 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 05/22] sched/core: Update the IPC class of the current task Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-12-07 17:00   ` Dietmar Eggemann
  2022-12-08  8:50   ` Ionela Voinescu
  2022-11-28 13:20 ` [PATCH v2 07/22] sched/fair: Compute IPC class scores for load balancing Ricardo Neri
                   ` (15 subsequent siblings)
  21 siblings, 2 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

When selecting a busiest scheduling group, the IPC class of the current
task can be used to select between two scheduling groups of equal
asym_packing priority and number of running tasks.

Compute a new IPC class performance score for a scheduling group. It
is the sum of the performance of the current tasks of all the runqueues.

Also, keep track of the task with the lowest IPC class score on the
scheduling group.

These two metrics will be used during idle load balancing to compute the
current and the prospective task-class performance of a scheduling
group.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Implemented cleanups and reworks from PeterZ. Thanks!
 * Used the new interface names.
---
 kernel/sched/fair.c | 55 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 224107278471..3a1d6c50a19b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9100,6 +9100,57 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
+struct sg_lb_ipcc_stats {
+	int min_score;	/* Min(score(rq->curr->ipcc)) */
+	int min_ipcc;	/* Min(rq->curr->ipcc) */
+	long sum_score; /* Sum(score(rq->curr->ipcc)) */
+};
+
+#ifdef CONFIG_IPC_CLASSES
+static void init_rq_ipcc_stats(struct sg_lb_ipcc_stats *sgcs)
+{
+	*sgcs = (struct sg_lb_ipcc_stats) {
+		.min_score = INT_MAX,
+	};
+}
+
+/** Called only if cpu_of(@rq) is not idle and has tasks running. */
+static void update_sg_lb_ipcc_stats(struct sg_lb_ipcc_stats *sgcs,
+				    struct rq *rq)
+{
+	struct task_struct *curr;
+	unsigned short ipcc;
+	int score;
+
+	if (!sched_ipcc_enabled())
+		return;
+
+	curr = rcu_dereference(rq->curr);
+	if (!curr || (curr->flags & PF_EXITING) || is_idle_task(curr))
+		return;
+
+	ipcc = curr->ipcc;
+	score = arch_get_ipcc_score(ipcc, cpu_of(rq));
+
+	sgcs->sum_score += score;
+
+	if (score < sgcs->min_score) {
+		sgcs->min_score = score;
+		sgcs->min_ipcc = ipcc;
+	}
+}
+
+#else /* CONFIG_IPC_CLASSES */
+static void update_sg_lb_ipcc_stats(struct sg_lb_ipcc_stats *sgcs,
+				    struct rq *rq)
+{
+}
+
+static void init_rq_ipcc_stats(struct sg_lb_ipcc_stats *class_sgs)
+{
+}
+#endif /* CONFIG_IPC_CLASSES */
+
 /**
  * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
  * @dst_cpu:	Destination CPU of the load balancing
@@ -9212,9 +9263,11 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 				      struct sg_lb_stats *sgs,
 				      int *sg_status)
 {
+	struct sg_lb_ipcc_stats sgcs;
 	int i, nr_running, local_group;
 
 	memset(sgs, 0, sizeof(*sgs));
+	init_rq_ipcc_stats(&sgcs);
 
 	local_group = group == sds->local;
 
@@ -9264,6 +9317,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			if (sgs->group_misfit_task_load < load)
 				sgs->group_misfit_task_load = load;
 		}
+
+		update_sg_lb_ipcc_stats(&sgcs, rq);
 	}
 
 	sgs->group_capacity = group->sgc->capacity;
-- 
2.25.1



* [PATCH v2 07/22] sched/fair: Compute IPC class scores for load balancing
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (5 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 06/22] sched/fair: Collect load-balancing stats for IPC classes Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 08/22] sched/fair: Use IPC class to pick the busiest group Ricardo Neri
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

Compute the joint total (both current and prospective) IPC class score of
a scheduling group and the local scheduling group.

These IPCC statistics are used during asym_packing load balancing. It
implies that the candidate sched group will have one fewer busy CPU after
load balancing. This observation is important for physical cores with
SMT support.

The IPCC score of scheduling groups composed of SMT siblings needs to
consider that the siblings share CPU resources. When computing the total
IPCC score of the scheduling group, divide the score from each sibling by
the number of busy siblings.
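
As a toy version of that computation (helper names and numbers are
illustrative, not taken from the patch): suppose two busy SMT siblings run
tasks scoring 10 and 6 (sum = 16, min = 6), and the minimum-score class
would score 12 on the destination CPU.

```c
/* Group score before load balancing. */
static long toy_ipcc_before(long sum, int busy_cpus, int smt)
{
	/* Busy SMT siblings share the throughput of the core. */
	if (smt && busy_cpus > 1)
		return sum / busy_cpus;
	return sum;
}

/* Prospective group score after the min-score task migrates away. */
static long toy_ipcc_after(long sum, long min, int busy_cpus,
			   int smt, int score_on_dst)
{
	long after = sum - min;

	/* One sibling becomes idle after load balance. */
	if (smt && busy_cpus > 1)
		after /= busy_cpus - 1;

	return after + score_on_dst;
}
```

With the numbers above: before = 16 / 2 = 8, and after =
(16 - 6) / 1 + 12 = 22, so moving the minimum-score task both relieves SMT
contention and lands the task on a CPU where its class performs well.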

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Implemented cleanups and reworks from PeterZ. I took all his
   suggestions, except the computation of the IPC score before and after
   load balancing. We are computing not the average score, but the *total*.
 * Check for the SD_SHARE_CPUCAPACITY to compute the throughput of the SMT
   siblings of a physical core.
 * Used the new interface names.
 * Reworded commit message for clarity.
---
 kernel/sched/fair.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3a1d6c50a19b..e333f9623b3a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8766,6 +8766,10 @@ struct sg_lb_stats {
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
 #endif
+#ifdef CONFIG_IPC_CLASSES
+	long ipcc_score_after; /* Prospective IPCC score after load balancing */
+	long ipcc_score_before; /* IPCC score before load balancing */
+#endif
 };
 
 /*
@@ -9140,6 +9144,38 @@ static void update_sg_lb_ipcc_stats(struct sg_lb_ipcc_stats *sgcs,
 	}
 }
 
+static void update_sg_lb_stats_scores(struct sg_lb_ipcc_stats *sgcs,
+				      struct sg_lb_stats *sgs,
+				      struct sched_group *sg,
+				      int dst_cpu)
+{
+	int busy_cpus, score_on_dst_cpu;
+	long before, after;
+
+	if (!sched_ipcc_enabled())
+		return;
+
+	busy_cpus = sgs->group_weight - sgs->idle_cpus;
+	/* No busy CPUs in the group. No tasks to move. */
+	if (!busy_cpus)
+		return;
+
+	score_on_dst_cpu = arch_get_ipcc_score(sgcs->min_ipcc, dst_cpu);
+
+	before = sgcs->sum_score;
+	after = before - sgcs->min_score;
+
+	/* SMT siblings share throughput. */
+	if (busy_cpus > 1 && sg->flags & SD_SHARE_CPUCAPACITY) {
+		before /= busy_cpus;
+		/* One sibling will become idle after load balance. */
+		after /= busy_cpus - 1;
+	}
+
+	sgs->ipcc_score_after = after + score_on_dst_cpu;
+	sgs->ipcc_score_before = before;
+}
+
 #else /* CONFIG_IPC_CLASSES */
 static void update_sg_lb_ipcc_stats(struct sg_lb_ipcc_stats *sgcs,
 				    struct rq *rq)
@@ -9149,6 +9185,14 @@ static void update_sg_lb_ipcc_stats(struct sg_lb_ipcc_stats *sgcs,
 static void init_rq_ipcc_stats(struct sg_lb_ipcc_stats *class_sgs)
 {
 }
+
+static void update_sg_lb_stats_scores(struct sg_lb_ipcc_stats *sgcs,
+				      struct sg_lb_stats *sgs,
+				      struct sched_group *sg,
+				      int dst_cpu)
+{
+}
+
 #endif /* CONFIG_IPC_CLASSES */
 
 /**
@@ -9329,6 +9373,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
 	    env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
 	    sched_asym(env, sds, sgs, group)) {
+		update_sg_lb_stats_scores(&sgcs, sgs, group, env->dst_cpu);
 		sgs->group_asym_packing = 1;
 	}
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 08/22] sched/fair: Use IPC class to pick the busiest group
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (6 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 07/22] sched/fair: Compute IPC class scores for load balancing Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 09/22] sched/fair: Use IPC class score to select a busiest runqueue Ricardo Neri
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

As it iterates, update_sd_pick_busiest() may encounter sched groups of
identical priority. Since both groups have the same priority, either group
is a good choice. The IPCC scores of the tasks placed in a sched group can
break this tie.

Pick as busiest the sched group that yields a higher IPCC score after
load balancing.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Added a comment to clarify why sched_asym_prefer() needs a tie breaker
   only in update_sd_pick_busiest(). (PeterZ)
 * Renamed functions for accuracy:
   sched_asym_class_prefer() >> sched_asym_ipcc_prefer()
   sched_asym_class_pick() >> sched_asym_ipcc_pick()
 * Reworded commit message for clarity.
---
 kernel/sched/fair.c | 75 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e333f9623b3a..e8b181c31842 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9176,6 +9176,63 @@ static void update_sg_lb_stats_scores(struct sg_lb_ipcc_stats *sgcs,
 	sgs->ipcc_score_before = before;
 }
 
+/**
+ * sched_asym_ipcc_prefer - Select a sched group based on its IPCC score
+ * @a:	Load balancing statistics of @sg_a
+ * @b:	Load balancing statistics of @sg_b
+ *
+ * Returns: true if preferring @a has a higher IPCC score than @b after
+ * balancing load. Returns false otherwise.
+ */
+static bool sched_asym_ipcc_prefer(struct sg_lb_stats *a,
+				   struct sg_lb_stats *b)
+{
+	if (!sched_ipcc_enabled())
+		return false;
+
+	/* @a increases overall throughput after load balance. */
+	if (a->ipcc_score_after > b->ipcc_score_after)
+		return true;
+
+	/*
+	 * If @a and @b yield the same overall throughput, pick @a if
+	 * its current throughput is lower than that of @b.
+	 */
+	if (a->ipcc_score_after == b->ipcc_score_after)
+		return a->ipcc_score_before < b->ipcc_score_before;
+
+	return false;
+}
+
+/**
+ * sched_asym_ipcc_pick - Select a sched group based on its IPCC score
+ * @a:		A scheduling group
+ * @b:		A second scheduling group
+ * @a_stats:	Load balancing statistics of @a
+ * @b_stats:	Load balancing statistics of @b
+ *
+ * Returns: true if @a has the same priority and @a has tasks with IPCC classes
+ * that yield higher overall throughput after load balance.
+ * Returns false otherwise.
+ */
+static bool sched_asym_ipcc_pick(struct sched_group *a,
+				 struct sched_group *b,
+				 struct sg_lb_stats *a_stats,
+				 struct sg_lb_stats *b_stats)
+{
+	/*
+	 * Only use the class-specific preference selection if both sched
+	 * groups have the same priority. We are not looking at a specific
+	 * CPU. We do not care about the idle state of the groups'
+	 * preferred CPU.
+	 */
+	if (arch_asym_cpu_priority(a->asym_prefer_cpu, false) !=
+	    arch_asym_cpu_priority(b->asym_prefer_cpu, false))
+		return false;
+
+	return sched_asym_ipcc_prefer(a_stats, b_stats);
+}
+
 #else /* CONFIG_IPC_CLASSES */
 static void update_sg_lb_ipcc_stats(struct sg_lb_ipcc_stats *sgcs,
 				    struct rq *rq)
@@ -9193,6 +9250,14 @@ static void update_sg_lb_stats_scores(struct sg_lb_ipcc_stats *sgcs,
 {
 }
 
+static bool sched_asym_ipcc_pick(struct sched_group *a,
+				 struct sched_group *b,
+				 struct sg_lb_stats *a_stats,
+				 struct sg_lb_stats *b_stats)
+{
+	return false;
+}
+
 #endif /* CONFIG_IPC_CLASSES */
 
 /**
@@ -9452,6 +9517,16 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 				      sds->busiest->asym_prefer_cpu,
 				      false))
 			return false;
+
+		/*
+		 * Unlike other callers of sched_asym_prefer(), here both @sg
+		 * and @sds::busiest have tasks running. When they have equal
+		 * priority, their IPC class scores can be used to select a
+		 * better busiest.
+		 */
+		if (sched_asym_ipcc_pick(sds->busiest, sg, &sds->busiest_stat, sgs))
+			return false;
+
 		break;
 
 	case group_misfit_task:
-- 
2.25.1



* [PATCH v2 09/22] sched/fair: Use IPC class score to select a busiest runqueue
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (7 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 08/22] sched/fair: Use IPC class to pick the busiest group Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-12-08  8:51   ` Ionela Voinescu
  2022-11-28 13:20 ` [PATCH v2 10/22] thermal: intel: hfi: Introduce Intel Thread Director classes Ricardo Neri
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

For two runqueues of equal priority and equal number of running tasks,
select the one whose current task would have the highest IPC class score
if placed on the destination CPU.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Fixed a bug when selecting a busiest runqueue: when comparing two
   runqueues with equal nr_running, we must compute the IPCC score delta
   of both.
 * Renamed local variables to improve the layout of the code block.
   (PeterZ)
 * Used the new interface names.
---
 kernel/sched/fair.c | 54 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e8b181c31842..113470bbd7a5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9233,6 +9233,24 @@ static bool sched_asym_ipcc_pick(struct sched_group *a,
 	return sched_asym_ipcc_prefer(a_stats, b_stats);
 }
 
+/**
+ * ipcc_score_delta - Get the IPCC score delta on a different CPU
+ * @p:		A task
+ * @alt_cpu:	A prospective CPU to place @p
+ *
+ * Returns: The IPCC score delta that @p would get if placed on @alt_cpu
+ */
+static int ipcc_score_delta(struct task_struct *p, int alt_cpu)
+{
+	unsigned long ipcc = p->ipcc;
+
+	if (!sched_ipcc_enabled())
+		return INT_MIN;
+
+	return arch_get_ipcc_score(ipcc, alt_cpu) -
+	       arch_get_ipcc_score(ipcc, task_cpu(p));
+}
+
 #else /* CONFIG_IPC_CLASSES */
 static void update_sg_lb_ipcc_stats(struct sg_lb_ipcc_stats *sgcs,
 				    struct rq *rq)
@@ -9258,6 +9276,11 @@ static bool sched_asym_ipcc_pick(struct sched_group *a,
 	return false;
 }
 
+static int ipcc_score_delta(struct task_struct *p, int alt_cpu)
+{
+	return INT_MIN;
+}
+
 #endif /* CONFIG_IPC_CLASSES */
 
 /**
@@ -10419,8 +10442,8 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 {
 	struct rq *busiest = NULL, *rq;
 	unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1;
+	int i, busiest_ipcc_delta = INT_MIN;
 	unsigned int busiest_nr = 0;
-	int i;
 
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
 		unsigned long capacity, load, util;
@@ -10526,8 +10549,37 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 
 		case migrate_task:
 			if (busiest_nr < nr_running) {
+				struct task_struct *curr;
+
 				busiest_nr = nr_running;
 				busiest = rq;
+
+				/*
+				 * Remember the IPC score delta of busiest::curr.
+				 * We may need it to break a tie with other queues
+				 * with equal nr_running.
+				 */
+				curr = rcu_dereference(busiest->curr);
+				busiest_ipcc_delta = ipcc_score_delta(curr,
+								      env->dst_cpu);
+			/*
+			 * If rq and busiest have the same number of running
+			 * tasks, pick rq if doing so would give rq::curr a
+			 * bigger IPC boost on dst_cpu.
+			 */
+			} else if (sched_ipcc_enabled() &&
+				   busiest_nr == nr_running) {
+				struct task_struct *curr;
+				int delta;
+
+				curr = rcu_dereference(rq->curr);
+				delta = ipcc_score_delta(curr, env->dst_cpu);
+
+				if (busiest_ipcc_delta < delta) {
+					busiest_ipcc_delta = delta;
+					busiest_nr = nr_running;
+					busiest = rq;
+				}
 			}
 			break;
 
-- 
2.25.1



* [PATCH v2 10/22] thermal: intel: hfi: Introduce Intel Thread Director classes
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (8 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 09/22] sched/fair: Use IPC class score to select a busiest runqueue Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 11/22] thermal: intel: hfi: Store per-CPU IPCC scores Ricardo Neri
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

On Intel hybrid parts, each type of CPU has specific performance and
energy efficiency capabilities. The Intel Thread Director technology
extends the Hardware Feedback Interface (HFI) to provide performance and
energy efficiency data for advanced classes of instructions.

Add support to parse the per-class capabilities.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Removed a now obsolete comment.
---
 drivers/thermal/intel/intel_hfi.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index a0640f762dc5..df4dc50e19fb 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -79,7 +79,7 @@ union cpuid6_edx {
  * @ee_cap:		Energy efficiency capability
  *
  * Capabilities of a logical processor in the HFI table. These capabilities are
- * unitless.
+ * unitless and specific to each HFI class.
  */
 struct hfi_cpu_data {
 	u8	perf_cap;
@@ -91,7 +91,8 @@ struct hfi_cpu_data {
  * @perf_updated:	Hardware updated performance capabilities
  * @ee_updated:		Hardware updated energy efficiency capabilities
  *
- * Properties of the data in an HFI table.
+ * Properties of the data in an HFI table. There exists one header per
+ * HFI class.
  */
 struct hfi_hdr {
 	u8	perf_updated;
@@ -129,16 +130,21 @@ struct hfi_instance {
 
 /**
  * struct hfi_features - Supported HFI features
+ * @nr_classes:		Number of classes supported
  * @nr_table_pages:	Size of the HFI table in 4KB pages
  * @cpu_stride:		Stride size to locate the capability data of a logical
  *			processor within the table (i.e., row stride)
+ * @class_stride:	Stride size to locate a class within the capability
+ *			data of a logical processor or the HFI table header
  * @hdr_size:		Size of the table header
  *
  * Parameters and supported features that are common to all HFI instances
  */
 struct hfi_features {
+	unsigned int	nr_classes;
 	unsigned int	nr_table_pages;
 	unsigned int	cpu_stride;
+	unsigned int	class_stride;
 	unsigned int	hdr_size;
 };
 
@@ -325,8 +331,8 @@ static void init_hfi_cpu_index(struct hfi_cpu_info *info)
 }
 
 /*
- * The format of the HFI table depends on the number of capabilities that the
- * hardware supports. Keep a data structure to navigate the table.
+ * The format of the HFI table depends on the number of capabilities and classes
+ * that the hardware supports. Keep a data structure to navigate the table.
  */
 static void init_hfi_instance(struct hfi_instance *hfi_instance)
 {
@@ -507,18 +513,30 @@ static __init int hfi_parse_features(void)
 	/* The number of 4KB pages required by the table */
 	hfi_features.nr_table_pages = edx.split.table_pages + 1;
 
+	/*
+	 * Capability fields of an HFI class are grouped together. Classes are
+	 * contiguous in memory.  Hence, use the number of supported features to
+	 * locate a specific class.
+	 */
+	hfi_features.class_stride = nr_capabilities;
+
+	/* For now, use only one class of the HFI table */
+	hfi_features.nr_classes = 1;
+
 	/*
 	 * The header contains change indications for each supported feature.
 	 * The size of the table header is rounded up to be a multiple of 8
 	 * bytes.
 	 */
-	hfi_features.hdr_size = DIV_ROUND_UP(nr_capabilities, 8) * 8;
+	hfi_features.hdr_size = DIV_ROUND_UP(nr_capabilities *
+					     hfi_features.nr_classes, 8) * 8;
 
 	/*
 	 * Data of each logical processor is also rounded up to be a multiple
 	 * of 8 bytes.
 	 */
-	hfi_features.cpu_stride = DIV_ROUND_UP(nr_capabilities, 8) * 8;
+	hfi_features.cpu_stride = DIV_ROUND_UP(nr_capabilities *
+					       hfi_features.nr_classes, 8) * 8;
 
 	return 0;
 }
-- 
2.25.1



* [PATCH v2 11/22] thermal: intel: hfi: Store per-CPU IPCC scores
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (9 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 10/22] thermal: intel: hfi: Introduce Intel Thread Director classes Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 12/22] x86/cpufeatures: Add the Intel Thread Director feature definitions Ricardo Neri
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

The scheduler reads the IPCC scores when balancing load. These reads can
be quite frequent. Hardware can also update the HFI table frequently.
Concurrent access may cause a lot of contention. It gets worse as the
number of CPUs increases.

Instead, create separate per-CPU IPCC scores that the scheduler can read
without the HFI table lock.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Added this patch.
---
 drivers/thermal/intel/intel_hfi.c | 38 +++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index df4dc50e19fb..56dba967849c 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -29,6 +29,7 @@
 #include <linux/kernel.h>
 #include <linux/math.h>
 #include <linux/mutex.h>
+#include <linux/percpu.h>
 #include <linux/percpu-defs.h>
 #include <linux/printk.h>
 #include <linux/processor.h>
@@ -172,6 +173,35 @@ static struct workqueue_struct *hfi_updates_wq;
 #define HFI_UPDATE_INTERVAL		HZ
 #define HFI_MAX_THERM_NOTIFY_COUNT	16
 
+#ifdef CONFIG_IPC_CLASSES
+static int __percpu *hfi_ipcc_scores;
+
+static int alloc_hfi_ipcc_scores(void)
+{
+	hfi_ipcc_scores = __alloc_percpu(sizeof(*hfi_ipcc_scores) *
+					 hfi_features.nr_classes,
+					 sizeof(*hfi_ipcc_scores));
+
+	return !hfi_ipcc_scores;
+}
+
+static void set_hfi_ipcc_score(void *caps, int cpu)
+{
+	int i, *hfi_class = per_cpu_ptr(hfi_ipcc_scores, cpu);
+
+	for (i = 0;  i < hfi_features.nr_classes; i++) {
+		struct hfi_cpu_data *class_caps;
+
+		class_caps = caps + i * hfi_features.class_stride;
+		WRITE_ONCE(hfi_class[i], class_caps->perf_cap);
+	}
+}
+
+#else
+static int alloc_hfi_ipcc_scores(void) { return 0; }
+static void set_hfi_ipcc_score(void *caps, int cpu) { }
+#endif /* CONFIG_IPC_CLASSES */
+
 static void get_hfi_caps(struct hfi_instance *hfi_instance,
 			 struct thermal_genl_cpu_caps *cpu_caps)
 {
@@ -194,6 +224,8 @@ static void get_hfi_caps(struct hfi_instance *hfi_instance,
 		cpu_caps[i].efficiency = caps->ee_cap << 2;
 
 		++i;
+
+		set_hfi_ipcc_score(caps, cpu);
 	}
 	raw_spin_unlock_irq(&hfi_instance->table_lock);
 }
@@ -572,8 +604,14 @@ void __init intel_hfi_init(void)
 	if (!hfi_updates_wq)
 		goto err_nomem;
 
+	if (alloc_hfi_ipcc_scores())
+		goto err_ipcc;
+
 	return;
 
+err_ipcc:
+	destroy_workqueue(hfi_updates_wq);
+
 err_nomem:
 	for (j = 0; j < i; ++j) {
 		hfi_instance = &hfi_instances[j];
-- 
2.25.1



* [PATCH v2 12/22] x86/cpufeatures: Add the Intel Thread Director feature definitions
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (10 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 11/22] thermal: intel: hfi: Store per-CPU IPCC scores Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 13/22] thermal: intel: hfi: Update the IPC class of the current task Ricardo Neri
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

Intel Thread Director (ITD) provides hardware resources to classify
the current task. The classification reflects the type of instructions that
a task currently executes.

ITD extends the Hardware Feedback Interface table to provide performance
and energy efficiency capabilities for each of the supported classes of
tasks.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Removed dependency on CONFIG_INTEL_THREAD_DIRECTOR. Instead, depend on
   CONFIG_IPC_CLASSES.
 * Added DISABLE_ITD to the correct DISABLE_MASK: 14 instead of 13.
---
 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index b6525491a41b..80b2beafc81e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -344,6 +344,7 @@
 #define X86_FEATURE_HWP_EPP		(14*32+10) /* HWP Energy Perf. Preference */
 #define X86_FEATURE_HWP_PKG_REQ		(14*32+11) /* HWP Package Level Request */
 #define X86_FEATURE_HFI			(14*32+19) /* Hardware Feedback Interface */
+#define X86_FEATURE_ITD			(14*32+23) /* Intel Thread Director */
 
 /* AMD SVM Feature Identification, CPUID level 0x8000000a (EDX), word 15 */
 #define X86_FEATURE_NPT			(15*32+ 0) /* Nested Page Table support */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index c44b56f7ffba..0edd9bef7f2e 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -99,6 +99,12 @@
 # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
 #endif
 
+#ifdef CONFIG_IPC_CLASSES
+# define DISABLE_ITD	0
+#else
+# define DISABLE_ITD	(1 << (X86_FEATURE_ITD & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -117,7 +123,7 @@
 			 DISABLE_CALL_DEPTH_TRACKING)
 #define DISABLED_MASK12	0
 #define DISABLED_MASK13	0
-#define DISABLED_MASK14	0
+#define DISABLED_MASK14	(DISABLE_ITD)
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
 			 DISABLE_ENQCMD)
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index d95221117129..277f157e067e 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -79,6 +79,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
+	{ X86_FEATURE_ITD,			X86_FEATURE_HFI       },
 	{}
 };
 
-- 
2.25.1



* [PATCH v2 13/22] thermal: intel: hfi: Update the IPC class of the current task
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (11 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 12/22] x86/cpufeatures: Add the Intel Thread Director feature definitions Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 14/22] thermal: intel: hfi: Report the IPC class score of a CPU Ricardo Neri
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

Use Intel Thread Director classification to update the IPC class of a
task. Implement the needed scheduler interfaces.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Adjusted the result the classification of Intel Thread Director to start
   at class 1. Class 0 for the scheduler means that the task is
   unclassified.
 * Redefined union hfi_thread_feedback_char_msr to ensure all
   bit-fields are packed. (PeterZ)
 * Removed CONFIG_INTEL_THREAD_DIRECTOR. (PeterZ)
 * Shortened the names of the functions that implement IPC classes.
 * Removed argument smt_siblings_idle from intel_hfi_update_ipcc().
   (PeterZ)
---
 arch/x86/include/asm/topology.h   |  8 +++++++
 drivers/thermal/intel/intel_hfi.c | 37 +++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 458c891a8273..cf46a3aea283 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -227,4 +227,12 @@ void init_freq_invariance_cppc(void);
 #define arch_init_invariance_cppc init_freq_invariance_cppc
 #endif
 
+#if defined(CONFIG_IPC_CLASSES) && defined(CONFIG_INTEL_HFI_THERMAL)
+int intel_hfi_has_ipc_classes(void);
+void intel_hfi_update_ipcc(struct task_struct *curr);
+
+#define arch_has_ipc_classes intel_hfi_has_ipc_classes
+#define arch_update_ipcc intel_hfi_update_ipcc
+#endif /* defined(CONFIG_IPC_CLASSES) && defined(CONFIG_INTEL_HFI_THERMAL) */
+
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 56dba967849c..f85394b532a7 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -74,6 +74,17 @@ union cpuid6_edx {
 	u32 full;
 };
 
+#ifdef CONFIG_IPC_CLASSES
+union hfi_thread_feedback_char_msr {
+	struct {
+		u64	classid : 8;
+		u64	__reserved : 55;
+		u64	valid : 1;
+	} split;
+	u64 full;
+};
+#endif
+
 /**
  * struct hfi_cpu_data - HFI capabilities per CPU
  * @perf_cap:		Performance capability
@@ -176,6 +187,32 @@ static struct workqueue_struct *hfi_updates_wq;
 #ifdef CONFIG_IPC_CLASSES
 static int __percpu *hfi_ipcc_scores;
 
+int intel_hfi_has_ipc_classes(void)
+{
+	return cpu_feature_enabled(X86_FEATURE_ITD);
+}
+
+void intel_hfi_update_ipcc(struct task_struct *curr)
+{
+	union hfi_thread_feedback_char_msr msr;
+
+	/* We should not be here if ITD is not supported. */
+	if (!cpu_feature_enabled(X86_FEATURE_ITD)) {
+		pr_warn_once("task classification requested but not supported!");
+		return;
+	}
+
+	rdmsrl(MSR_IA32_HW_FEEDBACK_CHAR, msr.full);
+	if (!msr.split.valid)
+		return;
+
+	/*
+	 * 0 is a valid classification for Intel Thread Director. A scheduler
+	 * IPCC class of 0 means that the task is unclassified. Adjust.
+	 */
+	curr->ipcc = msr.split.classid + 1;
+}
+
 static int alloc_hfi_ipcc_scores(void)
 {
 	hfi_ipcc_scores = __alloc_percpu(sizeof(*hfi_ipcc_scores) *
-- 
2.25.1



* [PATCH v2 14/22] thermal: intel: hfi: Report the IPC class score of a CPU
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (12 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 13/22] thermal: intel: hfi: Update the IPC class of the current task Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 15/22] thermal: intel: hfi: Define a default class for unclassified tasks Ricardo Neri
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

Implement the arch_get_ipcc_score() interface of the scheduler. Use the
performance capabilities of the extended Hardware Feedback Interface table
as the IPC score of a class of tasks when placed on a given CPU.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Adjusted the returned HFI class (which starts at 0) to match the
   scheduler IPCC class (which starts at 1). (PeterZ)
 * Used the new interface names.
---
 arch/x86/include/asm/topology.h   |  2 ++
 drivers/thermal/intel/intel_hfi.c | 27 +++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index cf46a3aea283..0fae13058f01 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -230,9 +230,11 @@ void init_freq_invariance_cppc(void);
 #if defined(CONFIG_IPC_CLASSES) && defined(CONFIG_INTEL_HFI_THERMAL)
 int intel_hfi_has_ipc_classes(void);
 void intel_hfi_update_ipcc(struct task_struct *curr);
+int intel_hfi_get_ipcc_score(unsigned short ipcc, int cpu);
 
 #define arch_has_ipc_classes intel_hfi_has_ipc_classes
 #define arch_update_ipcc intel_hfi_update_ipcc
+#define arch_get_ipcc_score intel_hfi_get_ipcc_score
 #endif /* defined(CONFIG_IPC_CLASSES) && defined(CONFIG_INTEL_HFI_THERMAL) */
 
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index f85394b532a7..1f7b18198bd4 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -213,6 +213,33 @@ void intel_hfi_update_ipcc(struct task_struct *curr)
 	curr->ipcc = msr.split.classid + 1;
 }
 
+int intel_hfi_get_ipcc_score(unsigned short ipcc, int cpu)
+{
+	unsigned short hfi_class;
+	int *scores;
+
+	if (cpu < 0 || cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (ipcc == IPC_CLASS_UNCLASSIFIED)
+		return -EINVAL;
+
+	/*
+	 * Scheduler IPC classes start at 1. HFI classes start at 0.
+	 * See the note in intel_hfi_update_ipcc().
+	 */
+	hfi_class = ipcc - 1;
+
+	if (hfi_class >= hfi_features.nr_classes)
+		return -EINVAL;
+
+	scores = per_cpu_ptr(hfi_ipcc_scores, cpu);
+	if (!scores)
+		return -ENODEV;
+
+	return READ_ONCE(scores[hfi_class]);
+}
+
 static int alloc_hfi_ipcc_scores(void)
 {
 	hfi_ipcc_scores = __alloc_percpu(sizeof(*hfi_ipcc_scores) *
-- 
2.25.1



* [PATCH v2 15/22] thermal: intel: hfi: Define a default class for unclassified tasks
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (13 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 14/22] thermal: intel: hfi: Report the IPC class score of a CPU Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 16/22] thermal: intel: hfi: Enable the Intel Thread Director Ricardo Neri
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

A task may be unclassified if it has been recently created, spends most of
its lifetime sleeping, or if hardware has not provided a classification.

Most tasks will eventually be classified as the scheduler's IPC class 1
(HFI class 0). This class corresponds to the capabilities in the legacy,
classless HFI table.

IPC class 1 is a reasonable choice until hardware provides an actual
classification. Meanwhile, the scheduler will place other tasks with higher
scores on higher-performance CPUs.
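The fallback described above can be modeled with a few lines of user-space C. This is a sketch of the behavior, not the kernel code; the function name is illustrative.

```c
#include <assert.h>

#define IPC_CLASS_UNCLASSIFIED		0
#define HFI_UNCLASSIFIED_DEFAULT	1

/*
 * A task that has not yet been classified is scored as if it belonged to
 * IPC class 1, i.e. the capabilities of the legacy, classless HFI table.
 */
static unsigned short effective_ipcc(unsigned short ipcc)
{
	if (ipcc == IPC_CLASS_UNCLASSIFIED)
		ipcc = HFI_UNCLASSIFIED_DEFAULT;

	return ipcc;
}
```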

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Now the default class is 1.
---
 drivers/thermal/intel/intel_hfi.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 1f7b18198bd4..1b3fd704ae9a 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -187,6 +187,19 @@ static struct workqueue_struct *hfi_updates_wq;
 #ifdef CONFIG_IPC_CLASSES
 static int __percpu *hfi_ipcc_scores;
 
+/*
+ * A task may be unclassified if it has been recently created, spends most
+ * of its lifetime sleeping, or if hardware has not provided a classification.
+ *
+ * Most tasks will eventually be classified as the scheduler's IPC class 1
+ * (HFI class 0). Meanwhile, the scheduler will place tasks with higher IPC
+ * scores on higher-performance CPUs.
+ *
+ * IPC class 1 is a reasonable choice. It matches the performance capability
+ * of the legacy, classless, HFI table.
+ */
+#define HFI_UNCLASSIFIED_DEFAULT 1
+
 int intel_hfi_has_ipc_classes(void)
 {
 	return cpu_feature_enabled(X86_FEATURE_ITD);
@@ -222,7 +235,7 @@ int intel_hfi_get_ipcc_score(unsigned short ipcc, int cpu)
 		return -EINVAL;
 
 	if (ipcc == IPC_CLASS_UNCLASSIFIED)
-		return -EINVAL;
+		ipcc = HFI_UNCLASSIFIED_DEFAULT;
 
 	/*
 	 * Scheduler IPC classes start at 1. HFI classes start at 0.
-- 
2.25.1



* [PATCH v2 16/22] thermal: intel: hfi: Enable the Intel Thread Director
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (14 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 15/22] thermal: intel: hfi: Define a default class for unclassified tasks Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 17/22] sched/task_struct: Add helpers for IPC classification Ricardo Neri
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

Enable Intel Thread Director from the CPU hotplug callback: enable it
globally from CPU0, then enable the thread-classification hardware on each
logical processor individually.

Also, initialize the number of classes supported.
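The class enumeration added by this patch can be sketched as follows. This is an illustrative user-space model (the function name is a stand-in), mirroring the cpuid6_ecx bit-field introduced in the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * When Intel Thread Director is supported, the number of classes lives in
 * bits 15:8 of CPUID(6).ECX (the cpuid6_ecx.split.nr_classes field).
 * Otherwise, a single, classless HFI class is assumed.
 */
static uint8_t hfi_nr_classes(uint32_t cpuid6_ecx, int has_itd)
{
	if (!has_itd)
		return 1;

	return (cpuid6_ecx >> 8) & 0xff;
}
```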

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * None
---
 arch/x86/include/asm/msr-index.h  |  2 ++
 drivers/thermal/intel/intel_hfi.c | 30 ++++++++++++++++++++++++++++--
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 37ff47552bcb..96303330223b 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1075,6 +1075,8 @@
 /* Hardware Feedback Interface */
 #define MSR_IA32_HW_FEEDBACK_PTR        0x17d0
 #define MSR_IA32_HW_FEEDBACK_CONFIG     0x17d1
+#define MSR_IA32_HW_FEEDBACK_THREAD_CONFIG 0x17d4
+#define MSR_IA32_HW_FEEDBACK_CHAR	0x17d2
 
 /* x2APIC locked status */
 #define MSR_IA32_XAPIC_DISABLE_STATUS	0xBD
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 1b3fd704ae9a..8287bfd7d6b6 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -50,6 +50,8 @@
 /* Hardware Feedback Interface MSR configuration bits */
 #define HW_FEEDBACK_PTR_VALID_BIT		BIT(0)
 #define HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT	BIT(0)
+#define HW_FEEDBACK_CONFIG_ITD_ENABLE_BIT	BIT(1)
+#define HW_FEEDBACK_THREAD_CONFIG_ENABLE_BIT	BIT(0)
 
 /* CPUID detection and enumeration definitions for HFI */
 
@@ -74,6 +76,15 @@ union cpuid6_edx {
 	u32 full;
 };
 
+union cpuid6_ecx {
+	struct {
+		u32	dont_care0:8;
+		u32	nr_classes:8;
+		u32	dont_care1:16;
+	} split;
+	u32 full;
+};
+
 #ifdef CONFIG_IPC_CLASSES
 union hfi_thread_feedback_char_msr {
 	struct {
@@ -495,6 +506,11 @@ void intel_hfi_online(unsigned int cpu)
 
 	init_hfi_cpu_index(info);
 
+	if (cpu_feature_enabled(X86_FEATURE_ITD)) {
+		msr_val = HW_FEEDBACK_THREAD_CONFIG_ENABLE_BIT;
+		wrmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val);
+	}
+
 	/*
 	 * Now check if the HFI instance of the package/die of @cpu has been
 	 * initialized (by checking its header). In such case, all we have to
@@ -550,6 +566,10 @@ void intel_hfi_online(unsigned int cpu)
 	 */
 	rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
 	msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT;
+
+	if (cpu_feature_enabled(X86_FEATURE_ITD))
+		msr_val |= HW_FEEDBACK_CONFIG_ITD_ENABLE_BIT;
+
 	wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
 
 unlock:
@@ -629,8 +649,14 @@ static __init int hfi_parse_features(void)
 	 */
 	hfi_features.class_stride = nr_capabilities;
 
-	/* For now, use only one class of the HFI table */
-	hfi_features.nr_classes = 1;
+	if (cpu_feature_enabled(X86_FEATURE_ITD)) {
+		union cpuid6_ecx ecx;
+
+		ecx.full = cpuid_ecx(CPUID_HFI_LEAF);
+		hfi_features.nr_classes = ecx.split.nr_classes;
+	} else {
+		hfi_features.nr_classes = 1;
+	}
 
 	/*
 	 * The header contains change indications for each supported feature.
-- 
2.25.1



* [PATCH v2 17/22] sched/task_struct: Add helpers for IPC classification
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (15 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 16/22] thermal: intel: hfi: Enable the Intel Thread Director Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 18/22] sched/core: Initialize helpers of task classification Ricardo Neri
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

The unprocessed classification that hardware provides for a task may not
be usable by the scheduler: the classification may change too frequently or
architectures may want to consider extra factors. For instance, some
processors with Intel Thread Director need to consider the state of the SMT
siblings of a core.

Provide per-task helper variables that architectures can use to post-
process the classification that hardware provides.
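As noted in the changelog, the helper variables are bit-fields sized to share a single 32-bit word. A minimal sketch of that layout (the struct name is illustrative):

```c
#include <assert.h>

/*
 * Two 9-bit class fields (room for 511 classes plus the unclassified
 * value 0) and a 14-bit debounce counter pack into one 32-bit word,
 * since 9 + 9 + 14 = 32, on common ABIs.
 */
struct ipcc_bits {
	unsigned int ipcc      : 9;	/* committed IPC class */
	unsigned int ipcc_tmp  : 9;	/* candidate class under evaluation */
	unsigned int ipcc_cntr : 14;	/* debounce counter */
};
```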

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Used bit-fields to fit all the IPC class data in 4 bytes. (PeterZ)
 * Shortened names of the helpers.
 * Renamed helpers with the ipcc_ prefix.
 * Reworded commit message for clarity
---
 include/linux/sched.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ddabc7449edd..8a99aa316c37 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1532,7 +1532,17 @@ struct task_struct {
 	 * A hardware-defined classification of task based on the number
 	 * of instructions per cycle.
 	 */
-	unsigned int			ipcc;
+	unsigned int			ipcc : 9;
+	/*
+	 * A candidate classification that arch-specific implementations
+	 * qualify for correctness.
+	 */
+	unsigned int			ipcc_tmp : 9;
+	/*
+	 * Counter to filter out transient values of the candidate
+	 * classification of a task.
+	 */
+	unsigned int			ipcc_cntr : 14;
 #endif
 
 	/*
-- 
2.25.1



* [PATCH v2 18/22] sched/core: Initialize helpers of task classification
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (16 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 17/22] sched/task_struct: Add helpers for IPC classification Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 19/22] thermal: intel: hfi: Implement model-specific checks for " Ricardo Neri
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

Just as tasks start life unclassified, initialize the auxiliary
classification variables.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * None
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2cd409536b72..0406b07c51a0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4374,6 +4374,8 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.vruntime			= 0;
 #ifdef CONFIG_IPC_CLASSES
 	p->ipcc				= IPC_CLASS_UNCLASSIFIED;
+	p->ipcc_tmp			= IPC_CLASS_UNCLASSIFIED;
+	p->ipcc_cntr			= 0;
 #endif
 	INIT_LIST_HEAD(&p->se.group_node);
 
-- 
2.25.1



* [PATCH v2 19/22] thermal: intel: hfi: Implement model-specific checks for task classification
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (17 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 18/22] sched/core: Initialize helpers of task classification Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 20/22] x86/cpufeatures: Add feature bit for HRESET Ricardo Neri
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

In Alder Lake and Raptor Lake, the result of thread classification is more
accurate when only one SMT sibling is busy. Classification results for
classes 2 and 3 are always reliable.

To avoid unnecessary migrations, only update the class of a task if it has
been the same for 4 consecutive ticks.
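The debouncing can be modeled in user-space C as below. This is a behavioral sketch of debounce_and_update_class(), not the kernel code; the struct and function names are illustrative.

```c
#include <assert.h>

#define CLASS_DEBOUNCER_SKIPS 4

struct ipcc_state {
	unsigned short ipcc;		/* committed class */
	unsigned short ipcc_tmp;	/* last candidate observed */
	unsigned short ipcc_cntr;	/* consecutive observations of tmp */
};

/*
 * The committed class changes only after the same candidate has been
 * observed on CLASS_DEBOUNCER_SKIPS consecutive ticks; any change of
 * candidate restarts the counter.
 */
static void debounce(struct ipcc_state *t, unsigned short new_ipcc)
{
	if (t->ipcc_tmp != new_ipcc) {
		t->ipcc_cntr = 1;
	} else if (t->ipcc_cntr + 1 < CLASS_DEBOUNCER_SKIPS) {
		t->ipcc_cntr++;
	} else {
		t->ipcc = new_ipcc;
	}

	t->ipcc_tmp = new_ipcc;
}
```

For example, three identical observations leave the committed class untouched; the fourth commits it, and a later divergent observation does not undo it.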

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Adjusted the result the classification of Intel Thread Director to start
   at class 1. Class 0 for the scheduler means that the task is
   unclassified.
 * Used the new names of the IPC classes members in task_struct.
 * Reworked helper functions to use sched_smt_siblings_idle() to query
   the idle state of the SMT siblings of a CPU.
---
 drivers/thermal/intel/intel_hfi.c | 60 ++++++++++++++++++++++++++++++-
 1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 8287bfd7d6b6..a9ae09036909 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -40,6 +40,7 @@
 #include <linux/workqueue.h>
 
 #include <asm/msr.h>
+#include <asm/intel-family.h>
 
 #include "../thermal_core.h"
 #include "intel_hfi.h"
@@ -216,9 +217,64 @@ int intel_hfi_has_ipc_classes(void)
 	return cpu_feature_enabled(X86_FEATURE_ITD);
 }
 
+#define CLASS_DEBOUNCER_SKIPS 4
+
+/**
+ * debounce_and_update_class() - Process and update a task's classification
+ *
+ * @p:		The task of which the classification will be updated
+ * @new_ipcc:	The new IPC classification
+ *
+ * Update the classification of @p with the new value that hardware provides.
+ * Only update the classification of @p if it has been the same during
+ * CLASS_DEBOUNCER_SKIPS consecutive ticks.
+ */
+static void debounce_and_update_class(struct task_struct *p, u8 new_ipcc)
+{
+	u16 debounce_skip;
+
+	/* The class of @p changed, only restart the debounce counter. */
+	if (p->ipcc_tmp != new_ipcc) {
+		p->ipcc_cntr = 1;
+		goto out;
+	}
+
+	/*
+	 * The class of @p did not change. Update it if it has been the same
+	 * for CLASS_DEBOUNCER_SKIPS consecutive ticks.
+	 */
+	debounce_skip = p->ipcc_cntr + 1;
+	if (debounce_skip < CLASS_DEBOUNCER_SKIPS)
+		p->ipcc_cntr++;
+	else
+		p->ipcc = new_ipcc;
+
+out:
+	p->ipcc_tmp = new_ipcc;
+}
+
+static bool classification_is_accurate(u8 hfi_class, bool smt_siblings_idle)
+{
+	switch (boot_cpu_data.x86_model) {
+	case INTEL_FAM6_ALDERLAKE:
+	case INTEL_FAM6_ALDERLAKE_L:
+	case INTEL_FAM6_RAPTORLAKE:
+	case INTEL_FAM6_RAPTORLAKE_P:
+	case INTEL_FAM6_RAPTORLAKE_S:
+		if (hfi_class == 3 || hfi_class == 2 || smt_siblings_idle)
+			return true;
+
+		return false;
+
+	default:
+		return true;
+	}
+}
+
 void intel_hfi_update_ipcc(struct task_struct *curr)
 {
 	union hfi_thread_feedback_char_msr msr;
+	bool idle;
 
 	/* We should not be here if ITD is not supported. */
 	if (!cpu_feature_enabled(X86_FEATURE_ITD)) {
@@ -234,7 +290,9 @@ void intel_hfi_update_ipcc(struct task_struct *curr)
 	 * 0 is a valid classification for Intel Thread Director. A scheduler
 	 * IPCC class of 0 means that the task is unclassified. Adjust.
 	 */
-	curr->ipcc = msr.split.classid + 1;
+	idle = sched_smt_siblings_idle(task_cpu(curr));
+	if (classification_is_accurate(msr.split.classid, idle))
+		debounce_and_update_class(curr, msr.split.classid + 1);
 }
 
 int intel_hfi_get_ipcc_score(unsigned short ipcc, int cpu)
-- 
2.25.1



* [PATCH v2 20/22] x86/cpufeatures: Add feature bit for HRESET
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (18 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 19/22] thermal: intel: hfi: Implement model-specific checks for " Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:20 ` [PATCH v2 21/22] x86/hreset: Configure history reset Ricardo Neri
  2022-11-28 13:21 ` [PATCH v2 22/22] x86/process: Reset hardware history in context switch Ricardo Neri
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

The HRESET instruction prevents the classification of the current task
from influencing the classification of the next task when running serially
on the same logical processor.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * None
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/include/asm/msr-index.h   | 4 +++-
 arch/x86/kernel/cpu/scattered.c    | 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 80b2beafc81e..281a7c861b8d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@
 #define X86_FEATURE_CALL_DEPTH		(11*32+19) /* "" Call depth tracking for RSB stuffing */
 
 #define X86_FEATURE_MSR_TSX_CTRL	(11*32+20) /* "" MSR IA32_TSX_CTRL (Intel) implemented */
+#define X86_FEATURE_HRESET		(11*32+21) /* Hardware history reset instruction */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 96303330223b..7a3ff73164bd 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1078,6 +1078,9 @@
 #define MSR_IA32_HW_FEEDBACK_THREAD_CONFIG 0x17d4
 #define MSR_IA32_HW_FEEDBACK_CHAR	0x17d2
 
+/* Hardware History Reset  */
+#define MSR_IA32_HW_HRESET_ENABLE	0x17da
+
 /* x2APIC locked status */
 #define MSR_IA32_XAPIC_DISABLE_STATUS	0xBD
 #define LEGACY_XAPIC_DISABLED		BIT(0) /*
@@ -1085,5 +1088,4 @@
 						* disabling x2APIC will cause
 						* a #GP
 						*/
-
 #endif /* _ASM_X86_MSR_INDEX_H */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index f53944fb8f7f..66bc5713644d 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -28,6 +28,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_EPB,		CPUID_ECX,  3, 0x00000006, 0 },
 	{ X86_FEATURE_INTEL_PPIN,	CPUID_EBX,  0, 0x00000007, 1 },
 	{ X86_FEATURE_RRSBA_CTRL,	CPUID_EDX,  2, 0x00000007, 2 },
+	{ X86_FEATURE_HRESET,		CPUID_EAX, 22, 0x00000007, 1 },
 	{ X86_FEATURE_CQM_LLC,		CPUID_EDX,  1, 0x0000000f, 0 },
 	{ X86_FEATURE_CQM_OCCUP_LLC,	CPUID_EDX,  0, 0x0000000f, 1 },
 	{ X86_FEATURE_CQM_MBM_TOTAL,	CPUID_EDX,  1, 0x0000000f, 1 },
-- 
2.25.1



* [PATCH v2 21/22] x86/hreset: Configure history reset
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (19 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 20/22] x86/cpufeatures: Add feature bit for HRESET Ricardo Neri
@ 2022-11-28 13:20 ` Ricardo Neri
  2022-11-28 13:21 ` [PATCH v2 22/22] x86/process: Reset hardware history in context switch Ricardo Neri
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:20 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

Configure the MSR that controls the behavior of HRESET on each logical
processor.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Marked hardware_history_features as __ro_after_init instead of
   __read_mostly. (PeterZ)
---
 arch/x86/kernel/cpu/common.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 73cc546e024d..f8630da2a6dd 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -412,6 +412,26 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 	cr4_clear_bits(X86_CR4_UMIP);
 }
 
+static u32 hardware_history_features __ro_after_init;
+
+static __always_inline void setup_hreset(struct cpuinfo_x86 *c)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_HRESET))
+		return;
+
+	/*
+	 * Use on all CPUs the hardware history features that the boot
+	 * CPU supports.
+	 */
+	if (c == &boot_cpu_data)
+		hardware_history_features = cpuid_ebx(0x20);
+
+	if (!hardware_history_features)
+		return;
+
+	wrmsrl(MSR_IA32_HW_HRESET_ENABLE, hardware_history_features);
+}
+
 /* These bits should not change their value after CPU init is finished. */
 static const unsigned long cr4_pinned_mask =
 	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
@@ -1844,10 +1864,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
+	/* Set up SMEP/SMAP/UMIP/HRESET */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_hreset(c);
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
-- 
2.25.1



* [PATCH v2 22/22] x86/process: Reset hardware history in context switch
  2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
                   ` (20 preceding siblings ...)
  2022-11-28 13:20 ` [PATCH v2 21/22] x86/hreset: Configure history reset Ricardo Neri
@ 2022-11-28 13:21 ` Ricardo Neri
  21 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-11-28 13:21 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Ricardo Neri, Tim C . Chen

Reset the classification history of the current task when switching to the
next task. Hardware will start the classification of the next task from
scratch.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Changes since v1:
 * Measurements of the cost of the HRESET instruction

   Methodology:
   I created a tight loop with interrupts and preemption disabled. I
   recorded the value of the TSC counter before and after executing
   HRESET or RDTSC. I repeated the measurement 100,000 times.
   I performed the experiment using an Alder Lake S system. I set the
   frequency of the CPUs at a fixed value.

   The table below compares the cost of HRESET with RDTSC (expressed in
   the elapsed TSC count). The cost of the two instructions is
   comparable.

                              PCore      ECore
        Frequency (GHz)        5.0        3.8
        HRESET (avg)          28.5       44.7
        HRESET (stdev %)       3.6        2.3
        RDTSC  (avg)          25.2       35.7
        RDTSC  (stdev %)       3.9        2.6

 * Used an ALTERNATIVE macro instead of static_cpu_has() to execute HRESET
   when supported. (PeterZ)
---
 arch/x86/include/asm/hreset.h | 30 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c  |  7 +++++++
 arch/x86/kernel/process_32.c  |  3 +++
 arch/x86/kernel/process_64.c  |  3 +++
 4 files changed, 43 insertions(+)
 create mode 100644 arch/x86/include/asm/hreset.h

diff --git a/arch/x86/include/asm/hreset.h b/arch/x86/include/asm/hreset.h
new file mode 100644
index 000000000000..d68ca2fb8642
--- /dev/null
+++ b/arch/x86/include/asm/hreset.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_HRESET_H
+#define _ASM_X86_HRESET_H
+
+/**
+ * HRESET - History reset. Available since binutils v2.36.
+ *
+ * Request the processor to reset the history of task classification on the
+ * current logical processor. The history components to be
+ * reset are specified in %eax. Only bits specified in CPUID(0x20).EBX
+ * and enabled in the IA32_HRESET_ENABLE MSR can be selected.
+ *
+ * The assembly code looks like:
+ *
+ *	hreset %eax
+ *
+ * The corresponding machine code looks like:
+ *
+ *	F3 0F 3A F0 ModRM Imm
+ *
+ * The value of ModRM is 0xc0 to specify %eax register addressing.
+ * The ignored immediate operand is set to 0.
+ *
+ * The instruction is documented in the Intel SDM.
+ */
+
+#define __ASM_HRESET  ".byte 0xf3, 0x0f, 0x3a, 0xf0, 0xc0, 0x00"
+
+void reset_hardware_history(void);
+
+#endif /* _ASM_X86_HRESET_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f8630da2a6dd..6c2b9768698e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -53,6 +53,7 @@
 #include <asm/mce.h>
 #include <asm/msr.h>
 #include <asm/cacheinfo.h>
+#include <asm/hreset.h>
 #include <asm/memtype.h>
 #include <asm/microcode.h>
 #include <asm/microcode_intel.h>
@@ -414,6 +415,12 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 
 static u32 hardware_history_features __ro_after_init;
 
+void reset_hardware_history(void)
+{
+	asm_inline volatile (ALTERNATIVE("", __ASM_HRESET, X86_FEATURE_HRESET)
+			     : : "a" (hardware_history_features) : "memory");
+}
+
 static __always_inline void setup_hreset(struct cpuinfo_x86 *c)
 {
 	if (!cpu_feature_enabled(X86_FEATURE_HRESET))
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 470c128759ea..397a6e6f4e61 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -52,6 +52,7 @@
 #include <asm/switch_to.h>
 #include <asm/vm86.h>
 #include <asm/resctrl.h>
+#include <asm/hreset.h>
 #include <asm/proto.h>
 
 #include "process.h"
@@ -214,6 +215,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
 
+	reset_hardware_history();
+
 	return prev_p;
 }
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 084ec467dbb1..ac9b3d44c1bd 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -53,6 +53,7 @@
 #include <asm/xen/hypervisor.h>
 #include <asm/vdso.h>
 #include <asm/resctrl.h>
+#include <asm/hreset.h>
 #include <asm/unistd.h>
 #include <asm/fsgsbase.h>
 #ifdef CONFIG_IA32_EMULATION
@@ -658,6 +659,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
 
+	reset_hardware_history();
+
 	return prev_p;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 04/22] sched/core: Add user_tick as argument to scheduler_tick()
  2022-11-28 13:20 ` [PATCH v2 04/22] sched/core: Add user_tick as argument to scheduler_tick() Ricardo Neri
@ 2022-12-07 12:21   ` Dietmar Eggemann
  2022-12-12 18:47     ` Ricardo Neri
  0 siblings, 1 reply; 39+ messages in thread
From: Dietmar Eggemann @ 2022-12-07 12:21 UTC (permalink / raw)
  To: Ricardo Neri, Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

On 28/11/2022 14:20, Ricardo Neri wrote:
> Differentiate between user and kernel ticks so that the scheduler updates
> the IPC class of the current task during the latter.

Just to make sure ... 05/22 introduces the `rq->curr` IPCC update during
user_tick, i.e. the former?

[...]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/22] sched/fair: Collect load-balancing stats for IPC classes
  2022-11-28 13:20 ` [PATCH v2 06/22] sched/fair: Collect load-balancing stats for IPC classes Ricardo Neri
@ 2022-12-07 17:00   ` Dietmar Eggemann
  2022-12-12 21:41     ` Ricardo Neri
  2022-12-08  8:50   ` Ionela Voinescu
  1 sibling, 1 reply; 39+ messages in thread
From: Dietmar Eggemann @ 2022-12-07 17:00 UTC (permalink / raw)
  To: Ricardo Neri, Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

On 28/11/2022 14:20, Ricardo Neri wrote:

[...]

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 224107278471..3a1d6c50a19b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9100,6 +9100,57 @@ group_type group_classify(unsigned int imbalance_pct,
>  	return group_has_spare;
>  }
>  
> +struct sg_lb_ipcc_stats {
> +	int min_score;	/* Min(score(rq->curr->ipcc)) */
> +	int min_ipcc;	/* Min(rq->curr->ipcc) */
> +	long sum_score; /* Sum(score(rq->curr->ipcc)) */
> +};

Wouldn't it be cleaner to put `min_score`, `min_ipcc` and `sum_score`
into `struct sg_lb_stats` next to `ipcc_score_{after, before}` under the
same #ifdef CONFIG_IPC_CLASSES?

Looks like those IPCC stats would only be needed in the specific
condition under which update_sg_lb_stats_scores() is called?

> +#ifdef CONFIG_IPC_CLASSES
> +static void init_rq_ipcc_stats(struct sg_lb_ipcc_stats *sgcs)
> +{
> +	*sgcs = (struct sg_lb_ipcc_stats) {
> +		.min_score = INT_MAX,
> +	};
> +}
> +
> +/** Called only if cpu_of(@rq) is not idle and has tasks running. */
> +static void update_sg_lb_ipcc_stats(struct sg_lb_ipcc_stats *sgcs,
> +				    struct rq *rq)
> +{
> +	struct task_struct *curr;
> +	unsigned short ipcc;
> +	int score;
> +
> +	if (!sched_ipcc_enabled())
> +		return;
> +
> +	curr = rcu_dereference(rq->curr);
> +	if (!curr || (curr->flags & PF_EXITING) || is_idle_task(curr))

So the Idle task is excluded but RT, DL, (Stopper) tasks are not. Looks
weird if non-CFS tasks could influence CFS load-balancing.
AFAICS, RT and DL tasks could have p->ipcc != IPC_CLASS_UNCLASSIFIED?

[...]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 02/22] sched: Add interfaces for IPC classes
  2022-11-28 13:20 ` [PATCH v2 02/22] sched: Add interfaces for IPC classes Ricardo Neri
@ 2022-12-08  8:48   ` Ionela Voinescu
  2022-12-14  0:31     ` Ricardo Neri
  2022-12-14  7:36   ` Lukasz Luba
  1 sibling, 1 reply; 39+ messages in thread
From: Ionela Voinescu @ 2022-12-08  8:48 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

Hi,

On Monday 28 Nov 2022 at 05:20:40 (-0800), Ricardo Neri wrote:
[..]
> +#ifndef arch_has_ipc_classes
> +/**
> + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks
> + *
> + * Returns: true if IPC classes of tasks are supported.
> + */
> +static __always_inline
> +bool arch_has_ipc_classes(void)
> +{
> +	return false;
> +}
> +#endif
> +
> +#ifndef arch_update_ipcc
> +/**
> + * arch_update_ipcc() - Update the IPC class of the current task
> + * @curr:		The current task
> + *
> + * Request that the IPC classification of @curr is updated.
> + *
> + * Returns: none
> + */
> +static __always_inline
> +void arch_update_ipcc(struct task_struct *curr)
> +{
> +}
> +#endif
> +
> +#ifndef arch_get_ipcc_score
> +/**
> + * arch_get_ipcc_score() - Get the IPC score of a class of task
> + * @ipcc:	The IPC class
> + * @cpu:	A CPU number
> + *
> + * Returns the performance score of an IPC class when running on @cpu.
> + * Error when either @ipcc or @cpu is invalid.
> + */
> +static __always_inline
> +int arch_get_ipcc_score(unsigned short ipcc, int cpu)
> +{
> +	return 1;
> +}
> +#endif

The interface looks mostly alright but this arch_get_ipcc_score() leaves
unclear what the characteristics of the returned value are.

Does it have a meaning as an absolute value or is it a value on an
abstract scale? If it should be interpreted as instructions per cycle,
if I wanted to have a proper comparison between the ability of two CPUs
to handle this class of tasks then I would need to take into consideration
the maximum frequency of each CPU. If it's a performance value on an
abstract scale (more likely), similar to capacity, then it might be good
to better define this abstract scale. That would help with the default
implementation where possibly the best choice for a return value would
be the maximum value on the scale, suggesting equal/maximum performance
for different CPUs handling the class of tasks.

I suppose you avoided returning 0 for the default implementation as you
intend that to mean the inability of the CPU to handle that class of
tasks? It would be good to document this.

> +#else /* CONFIG_IPC_CLASSES */
> +
> +#define arch_get_ipcc_score(ipcc, cpu) (-EINVAL)
> +#define arch_update_ipcc(curr)
> +
> +static inline bool sched_ipcc_enabled(void) { return false; }
> +
> +#endif /* CONFIG_IPC_CLASSES */
> +
>  #ifndef arch_scale_freq_capacity
>  /**
>   * arch_scale_freq_capacity - get the frequency scale factor of a given CPU.
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 8154ef590b9f..eb1654b64df7 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -669,6 +669,9 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
>  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
>  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
>  DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
> +#ifdef CONFIG_IPC_CLASSES
> +DEFINE_STATIC_KEY_FALSE(sched_ipcc);
> +#endif
>  
>  static void update_top_cache_domain(int cpu)
>  {
> @@ -2388,6 +2391,11 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>  	if (has_asym)
>  		static_branch_inc_cpuslocked(&sched_asym_cpucapacity);
>  
> +#ifdef CONFIG_IPC_CLASSES
> +	if (arch_has_ipc_classes())
> +		static_branch_enable_cpuslocked(&sched_ipcc);
> +#endif

Wouldn't this be better placed directly in sched_init_smp()?
It's not gated by and it does not need any sched domains information.

Hope it helps,
Ionela.

> +
>  	if (rq && sched_debug_verbose) {
>  		pr_info("root domain span: %*pbl (max cpu_capacity = %lu)\n",
>  			cpumask_pr_args(cpu_map), rq->rd->max_cpu_capacity);
> -- 
> 2.25.1
> 
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/22] sched/fair: Collect load-balancing stats for IPC classes
  2022-11-28 13:20 ` [PATCH v2 06/22] sched/fair: Collect load-balancing stats for IPC classes Ricardo Neri
  2022-12-07 17:00   ` Dietmar Eggemann
@ 2022-12-08  8:50   ` Ionela Voinescu
  2022-12-14  0:31     ` Ricardo Neri
  1 sibling, 1 reply; 39+ messages in thread
From: Ionela Voinescu @ 2022-12-08  8:50 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

Hi,

On Monday 28 Nov 2022 at 05:20:44 (-0800), Ricardo Neri wrote:
[..]
>  
> +struct sg_lb_ipcc_stats {
> +	int min_score;	/* Min(score(rq->curr->ipcc)) */
> +	int min_ipcc;	/* Min(rq->curr->ipcc) */

Nit: this is not the minimum IPCC between the current tasks of all
runqueues, but the IPCC specific to the task with the minimum score.

Possibly there's not much to be done about the variable name, but the
comment can be made more clear.

Thanks,
Ionela.

> +	long sum_score; /* Sum(score(rq->curr->ipcc)) */
> +};
[..]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 09/22] sched/fair: Use IPC class score to select a busiest runqueue
  2022-11-28 13:20 ` [PATCH v2 09/22] sched/fair: Use IPC class score to select a busiest runqueue Ricardo Neri
@ 2022-12-08  8:51   ` Ionela Voinescu
  2022-12-14  0:32     ` Ricardo Neri
  0 siblings, 1 reply; 39+ messages in thread
From: Ionela Voinescu @ 2022-12-08  8:51 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

Hi,

On Monday 28 Nov 2022 at 05:20:47 (-0800), Ricardo Neri wrote:
> For two runqueues of equal priority and equal number of running tasks,
> select the one whose current task would have the highest IPC class score
> if placed on the destination CPU.
> 
[..]
> +static int ipcc_score_delta(struct task_struct *p, int alt_cpu)
> +{
> +	unsigned long ipcc = p->ipcc;
> +
> +	if (!sched_ipcc_enabled())
> +		return INT_MIN;
> +
> +	return arch_get_ipcc_score(ipcc, alt_cpu) -
> +	       arch_get_ipcc_score(ipcc, task_cpu(p));

Nit: arch_get_ipcc_score() return values are never checked for error.

> +}
> +
>  #else /* CONFIG_IPC_CLASSES */
>  static void update_sg_lb_ipcc_stats(struct sg_lb_ipcc_stats *sgcs,
>  				    struct rq *rq)
> @@ -9258,6 +9276,11 @@ static bool sched_asym_ipcc_pick(struct sched_group *a,
>  	return false;
>  }
>  
> +static int ipcc_score_delta(struct task_struct *p, int alt_cpu)
> +{
> +	return INT_MIN;
> +}
> +
>  #endif /* CONFIG_IPC_CLASSES */
>  
>  /**
> @@ -10419,8 +10442,8 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>  {
>  	struct rq *busiest = NULL, *rq;
>  	unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1;
> +	int i, busiest_ipcc_delta = INT_MIN;
>  	unsigned int busiest_nr = 0;
> -	int i;
>  
>  	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
>  		unsigned long capacity, load, util;
> @@ -10526,8 +10549,37 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>  
>  		case migrate_task:
>  			if (busiest_nr < nr_running) {
> +				struct task_struct *curr;
> +
>  				busiest_nr = nr_running;
>  				busiest = rq;
> +
> +				/*
> +				 * Remember the IPC score delta of busiest::curr.
> +				 * We may need it to break a tie with other queues
> +				 * with equal nr_running.
> +				 */
> +				curr = rcu_dereference(busiest->curr);
> +				busiest_ipcc_delta = ipcc_score_delta(curr,
> +								      env->dst_cpu);
> +			/*
> +			 * If rq and busiest have the same number of running
> +			 * tasks, pick rq if doing so would give rq::curr a
> +			 * bigger IPC boost on dst_cpu.
> +			 */
> +			} else if (sched_ipcc_enabled() &&
> +				   busiest_nr == nr_running) {
> +				struct task_struct *curr;
> +				int delta;
> +
> +				curr = rcu_dereference(rq->curr);
> +				delta = ipcc_score_delta(curr, env->dst_cpu);
> +
> +				if (busiest_ipcc_delta < delta) {
> +					busiest_ipcc_delta = delta;
> +					busiest_nr = nr_running;
> +					busiest = rq;
> +				}
>  			}
>  			break;
>  

While in the commit message you describe this as breaking a tie for
asym_packing, the code here does not only affect asym_packing. If
another architecture would have sched_ipcc_enabled() it would use this
as generic policy, and that might not be desired.

Hope it helps,
Ionela.

> -- 
> 2.25.1
> 
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 04/22] sched/core: Add user_tick as argument to scheduler_tick()
  2022-12-07 12:21   ` Dietmar Eggemann
@ 2022-12-12 18:47     ` Ricardo Neri
  0 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-12-12 18:47 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

On Wed, Dec 07, 2022 at 01:21:47PM +0100, Dietmar Eggemann wrote:
> On 28/11/2022 14:20, Ricardo Neri wrote:
> > Differentiate between user and kernel ticks so that the scheduler updates
> > the IPC class of the current task during the latter.
> 
> Just to make sure ... 05/22 introduces the `rq->curr` IPCC update during
> user_tick, i.e. the former?

Yes. Thank you for the catch!

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/22] sched/fair: Collect load-balancing stats for IPC classes
  2022-12-07 17:00   ` Dietmar Eggemann
@ 2022-12-12 21:41     ` Ricardo Neri
  0 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-12-12 21:41 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Len Brown, Mel Gorman,
	Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

On Wed, Dec 07, 2022 at 06:00:32PM +0100, Dietmar Eggemann wrote:
> On 28/11/2022 14:20, Ricardo Neri wrote:
> 
> [...]
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 224107278471..3a1d6c50a19b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9100,6 +9100,57 @@ group_type group_classify(unsigned int imbalance_pct,
> >  	return group_has_spare;
> >  }
> >  
> > +struct sg_lb_ipcc_stats {
> > +	int min_score;	/* Min(score(rq->curr->ipcc)) */
> > +	int min_ipcc;	/* Min(rq->curr->ipcc) */
> > +	long sum_score; /* Sum(score(rq->curr->ipcc)) */
> > +};
> 
> Wouldn't it be cleaner to put `min_score`, `min_ipcc` and `sum_score`
> into `struct sg_lb_stats` next to `ipcc_score_{after, before}` under the
> same #ifdef CONFIG_IPC_CLASSES?

Yes, that is a good observation. I initially wanted to hide these
intermediate values and only expose the end result ipcc_score_{after, before}
in struct sg_lb_stats. I agree, it would look cleaner as you suggest.
> 
> Looks like those IPCC stats would only be needed in the specific
> condition under which update_sg_lb_stats_scores() is called?

True.

> 
> > +#ifdef CONFIG_IPC_CLASSES
> > +static void init_rq_ipcc_stats(struct sg_lb_ipcc_stats *sgcs)
> > +{
> > +	*sgcs = (struct sg_lb_ipcc_stats) {
> > +		.min_score = INT_MAX,
> > +	};
> > +}
> > +
> > +/** Called only if cpu_of(@rq) is not idle and has tasks running. */
> > +static void update_sg_lb_ipcc_stats(struct sg_lb_ipcc_stats *sgcs,
> > +				    struct rq *rq)
> > +{
> > +	struct task_struct *curr;
> > +	unsigned short ipcc;
> > +	int score;
> > +
> > +	if (!sched_ipcc_enabled())
> > +		return;
> > +
> > +	curr = rcu_dereference(rq->curr);
> > +	if (!curr || (curr->flags & PF_EXITING) || is_idle_task(curr))
> 
> So the Idle task is excluded but RT, DL, (Stopper) tasks are not. Looks
> weird if non-CFS tasks could influence CFS load-balancing.
> AFAICS, RT and DL tasks could have p->ipcc != IPC_CLASS_UNCLASSIFIED?

Agreed. Perhaps I can also check for !task_is_realtime(), which seems
to cover all these cases.
> 
> [...]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 02/22] sched: Add interfaces for IPC classes
  2022-12-08  8:48   ` Ionela Voinescu
@ 2022-12-14  0:31     ` Ricardo Neri
  2022-12-14 23:15       ` Ionela Voinescu
  0 siblings, 1 reply; 39+ messages in thread
From: Ricardo Neri @ 2022-12-14  0:31 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

On Thu, Dec 08, 2022 at 08:48:46AM +0000, Ionela Voinescu wrote:
> Hi,
> 
> On Monday 28 Nov 2022 at 05:20:40 (-0800), Ricardo Neri wrote:
> [..]
> > +#ifndef arch_has_ipc_classes
> > +/**
> > + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks
> > + *
> > + * Returns: true if IPC classes of tasks are supported.
> > + */
> > +static __always_inline
> > +bool arch_has_ipc_classes(void)
> > +{
> > +	return false;
> > +}
> > +#endif
> > +
> > +#ifndef arch_update_ipcc
> > +/**
> > + * arch_update_ipcc() - Update the IPC class of the current task
> > + * @curr:		The current task
> > + *
> > + * Request that the IPC classification of @curr is updated.
> > + *
> > + * Returns: none
> > + */
> > +static __always_inline
> > +void arch_update_ipcc(struct task_struct *curr)
> > +{
> > +}
> > +#endif
> > +
> > +#ifndef arch_get_ipcc_score
> > +/**
> > + * arch_get_ipcc_score() - Get the IPC score of a class of task
> > + * @ipcc:	The IPC class
> > + * @cpu:	A CPU number
> > + *
> > + * Returns the performance score of an IPC class when running on @cpu.
> > + * Error when either @ipcc or @cpu is invalid.
> > + */
> > +static __always_inline
> > +int arch_get_ipcc_score(unsigned short ipcc, int cpu)
> > +{
> > +	return 1;
> > +}
> > +#endif

Thank you very much for your feedback Ionela!

> 
> The interface looks mostly alright but this arch_get_ipcc_score() leaves
> unclear what the characteristics of the returned value are.

Fair point. I mean for the return value to be defined by architectures;
but yes, architectures need to know how to implement this function.

> 
> Does it have a meaning as an absolute value or is it a value on an
> abstract scale? If it should be interpreted as instructions per cycle,
> if I wanted to have a proper comparison between the ability of two CPUs
> to handle this class of tasks then I would need to take into consideration
> the maximum frequency of each CPU.

Do you mean when calling arch_get_ipcc_score()? If yes, then I agree, IPC
class may not be the only factor, but the criteria to use the return value
is up to the caller.

In asym_packing it is assumed that higher-priority CPUs are preferred.
When balancing load, IPC class scores are used to select between otherwise
identical runqueues. This should also be the case for migrate_misfit: we
know already that the tasks being considered do not fit on their current
CPU.

We would need to think what to do with other type of balancing, if at all.

That said, arch_get_ipcc_score() should only return a metric of the
instructions-per-*cycle*, independent of frequency, no?

> If it's a performance value on an
> abstract scale (more likely), similar to capacity, then it might be good
> to better define this abstract scale. That would help with the default
> implementation where possibly the best choice for a return value would
> be the maximum value on the scale, suggesting equal/maximum performance
> for different CPUs handling the class of tasks.

I guess something like:

#define SCHED_IPCC_DEFAULT_SCALE 1024

?

I think I am fine with this value being the default. I also think that it
is up to architectures whether to scale all IPC class scores relative to
the best-performing class on the best-performing CPU. Doing so would introduce
overhead, especially if hardware updates the IPC class scores multiple
times during runtime.

> 
> I suppose you avoided returning 0 for the default implementation as you
> intend that to mean the inability of the CPU to handle that class of
> tasks? It would be good to document this.

I meant this to be the minimum possible IPC class score for any CPU: any
CPU should be able to handle any IPC class. If not implemented, all CPUs
handle all IPC classes equally.

> 
> > +#else /* CONFIG_IPC_CLASSES */
> > +
> > +#define arch_get_ipcc_score(ipcc, cpu) (-EINVAL)
> > +#define arch_update_ipcc(curr)
> > +
> > +static inline bool sched_ipcc_enabled(void) { return false; }
> > +
> > +#endif /* CONFIG_IPC_CLASSES */
> > +
> >  #ifndef arch_scale_freq_capacity
> >  /**
> >   * arch_scale_freq_capacity - get the frequency scale factor of a given CPU.
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 8154ef590b9f..eb1654b64df7 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -669,6 +669,9 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> >  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
> >  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
> >  DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
> > +#ifdef CONFIG_IPC_CLASSES
> > +DEFINE_STATIC_KEY_FALSE(sched_ipcc);
> > +#endif
> >  
> >  static void update_top_cache_domain(int cpu)
> >  {
> > @@ -2388,6 +2391,11 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
> >  	if (has_asym)
> >  		static_branch_inc_cpuslocked(&sched_asym_cpucapacity);
> >  
> > +#ifdef CONFIG_IPC_CLASSES
> > +	if (arch_has_ipc_classes())
> > +		static_branch_enable_cpuslocked(&sched_ipcc);
> > +#endif
> 
> Wouldn't this be better placed directly in sched_init_smp()?
> It's not gated by and it does not need any sched domains information.

Very true. I will take your suggestion.

> 
> Hope it helps,

It does help significantly. Thanks again for your feedback.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 06/22] sched/fair: Collect load-balancing stats for IPC classes
  2022-12-08  8:50   ` Ionela Voinescu
@ 2022-12-14  0:31     ` Ricardo Neri
  0 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-12-14  0:31 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

On Thu, Dec 08, 2022 at 08:50:23AM +0000, Ionela Voinescu wrote:
> Hi,
> 
> On Monday 28 Nov 2022 at 05:20:44 (-0800), Ricardo Neri wrote:
> [..]
> >  
> > +struct sg_lb_ipcc_stats {
> > +	int min_score;	/* Min(score(rq->curr->ipcc)) */
> > +	int min_ipcc;	/* Min(rq->curr->ipcc) */
> 
> Nit: this is not the minimum IPCC between the current tasks of all
> runqueues, but the IPCC specific to the task with the minimum score.

> Possibly there's not much to be done about the variable name, but the
> comment can be made more clear.

Very true. I will make the change.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 09/22] sched/fair: Use IPC class score to select a busiest runqueue
  2022-12-08  8:51   ` Ionela Voinescu
@ 2022-12-14  0:32     ` Ricardo Neri
  2022-12-14 23:16       ` Ionela Voinescu
  0 siblings, 1 reply; 39+ messages in thread
From: Ricardo Neri @ 2022-12-14  0:32 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

On Thu, Dec 08, 2022 at 08:51:03AM +0000, Ionela Voinescu wrote:
> Hi,
> 
> On Monday 28 Nov 2022 at 05:20:47 (-0800), Ricardo Neri wrote:
> > For two runqueues of equal priority and equal number of running tasks,
> > select the one whose current task would have the highest IPC class score
> > if placed on the destination CPU.
> > 
> [..]
> > +static int ipcc_score_delta(struct task_struct *p, int alt_cpu)
> > +{
> > +	unsigned long ipcc = p->ipcc;
> > +
> > +	if (!sched_ipcc_enabled())
> > +		return INT_MIN;
> > +
> > +	return arch_get_ipcc_score(ipcc, alt_cpu) -
> > +	       arch_get_ipcc_score(ipcc, task_cpu(p));
> 
> Nit: arch_get_ipcc_score() return values are never checked for error.

Fair point. I will handle error values.

> 
> > +}
> > +
> >  #else /* CONFIG_IPC_CLASSES */
> >  static void update_sg_lb_ipcc_stats(struct sg_lb_ipcc_stats *sgcs,
> >  				    struct rq *rq)
> > @@ -9258,6 +9276,11 @@ static bool sched_asym_ipcc_pick(struct sched_group *a,
> >  	return false;
> >  }
> >  
> > +static int ipcc_score_delta(struct task_struct *p, int alt_cpu)
> > +{
> > +	return INT_MIN;
> > +}
> > +
> >  #endif /* CONFIG_IPC_CLASSES */
> >  
> >  /**
> > @@ -10419,8 +10442,8 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >  {
> >  	struct rq *busiest = NULL, *rq;
> >  	unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1;
> > +	int i, busiest_ipcc_delta = INT_MIN;
> >  	unsigned int busiest_nr = 0;
> > -	int i;
> >  
> >  	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
> >  		unsigned long capacity, load, util;
> > @@ -10526,8 +10549,37 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >  
> >  		case migrate_task:
> >  			if (busiest_nr < nr_running) {
> > +				struct task_struct *curr;
> > +
> >  				busiest_nr = nr_running;
> >  				busiest = rq;
> > +
> > +				/*
> > +				 * Remember the IPC score delta of busiest::curr.
> > +				 * We may need it to break a tie with other queues
> > +				 * with equal nr_running.
> > +				 */
> > +				curr = rcu_dereference(busiest->curr);
> > +				busiest_ipcc_delta = ipcc_score_delta(curr,
> > +								      env->dst_cpu);
> > +			/*
> > +			 * If rq and busiest have the same number of running
> > +			 * tasks, pick rq if doing so would give rq::curr a
> > +			 * bigger IPC boost on dst_cpu.
> > +			 */
> > +			} else if (sched_ipcc_enabled() &&
> > +				   busiest_nr == nr_running) {
> > +				struct task_struct *curr;
> > +				int delta;
> > +
> > +				curr = rcu_dereference(rq->curr);
> > +				delta = ipcc_score_delta(curr, env->dst_cpu);
> > +
> > +				if (busiest_ipcc_delta < delta) {
> > +					busiest_ipcc_delta = delta;
> > +					busiest_nr = nr_running;
> > +					busiest = rq;
> > +				}
> >  			}
> >  			break;
> >  
> 
> While in the commit message you describe this as breaking a tie for
> asym_packing,

Are you referring to the overall series or this specific patch? I checked
the commit message and do not see references to asym_packing.

> the code here does not only affect asym_packing. If
> another architecture would have sched_ipcc_enabled() it would use this
> as generic policy, and that might not be desired.

Indeed, the patchset implements support to use IPCC classes for asym_packing,
but it is not limited to it.

It is true that I don't check here for asym_packing, but it should not be a
problem, IMO. I compare two runqueues with equal nr_running; either runqueue
is a good choice. This tie breaker is an overall improvement, no?

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 02/22] sched: Add interfaces for IPC classes
  2022-11-28 13:20 ` [PATCH v2 02/22] sched: Add interfaces for IPC classes Ricardo Neri
  2022-12-08  8:48   ` Ionela Voinescu
@ 2022-12-14  7:36   ` Lukasz Luba
  2022-12-16 21:56     ` Ricardo Neri
  1 sibling, 1 reply; 39+ messages in thread
From: Lukasz Luba @ 2022-12-14  7:36 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Juri Lelli, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen, Vincent Guittot,
	Peter Zijlstra (Intel)

Hi Ricardo,

I have some generic comments on the design of these interfaces.

On 11/28/22 13:20, Ricardo Neri wrote:
> Add the interfaces that architectures shall implement to convey the data
> to support IPC classes.
> 
> arch_update_ipcc() updates the IPC classification of the current task as
> given by hardware.
> 
> arch_get_ipcc_score() provides a performance score for a given IPC class
> when placed on a specific CPU. Higher scores indicate higher performance.
> 
> The number of classes and the score of each class of task are determined
> by hardware.
> 
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim C. Chen <tim.c.chen@intel.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-pm@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v1:
>   * Shortened the names of the IPCC interfaces (PeterZ):
>     sched_task_classes_enabled >> sched_ipcc_enabled
>     arch_has_task_classes >> arch_has_ipc_classes
>     arch_update_task_class >> arch_update_ipcc
>     arch_get_task_class_score >> arch_get_ipcc_score
>   * Removed smt_siblings_idle argument from arch_update_ipcc(). (PeterZ)
> ---
>   kernel/sched/sched.h    | 60 +++++++++++++++++++++++++++++++++++++++++
>   kernel/sched/topology.c |  8 ++++++
>   2 files changed, 68 insertions(+)
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index b1d338a740e5..75e22baa2622 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2531,6 +2531,66 @@ void arch_scale_freq_tick(void)
>   }
>   #endif
>   
> +#ifdef CONFIG_IPC_CLASSES
> +DECLARE_STATIC_KEY_FALSE(sched_ipcc);
> +
> +static inline bool sched_ipcc_enabled(void)
> +{
> +	return static_branch_unlikely(&sched_ipcc);
> +}
> +
> +#ifndef arch_has_ipc_classes
> +/**
> + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks
> + *
> + * Returns: true if IPC classes of tasks are supported.
> + */
> +static __always_inline
> +bool arch_has_ipc_classes(void)
> +{
> +	return false;
> +}
> +#endif
> +
> +#ifndef arch_update_ipcc
> +/**
> + * arch_update_ipcc() - Update the IPC class of the current task
> + * @curr:		The current task
> + *
> + * Request that the IPC classification of @curr is updated.
> + *
> + * Returns: none
> + */
> +static __always_inline
> +void arch_update_ipcc(struct task_struct *curr)
> +{
> +}
> +#endif
> +
> +#ifndef arch_get_ipcc_score
> +/**
> + * arch_get_ipcc_score() - Get the IPC score of a class of task
> + * @ipcc:	The IPC class
> + * @cpu:	A CPU number
> + *
> + * Returns the performance score of an IPC class when running on @cpu.
> + * Error when either @ipcc or @cpu are invalid.
> + */
> +static __always_inline
> +int arch_get_ipcc_score(unsigned short ipcc, int cpu)
> +{
> +	return 1;
> +}
> +#endif

Those interfaces are quite simple and probably work really well with
your HW/FW. If any other architecture is going to re-use them
in the future, we might face some issues. Let me explain why.

These kernel functions start to be used very early in boot.
Your HW/FW is probably instantly ready to work from the very
beginning of boot. But what if some other HW needs some
preparation code, like setting up a communication channel to the FW or
enabling needed clocks/events/etc.?

What I would like to see is a similar mechanism to the one in schedutil.
The schedutil governor has to wait until cpufreq initializes the cpufreq
driver and policy objects (which sometimes takes ~2-3 sec). After that,
the cpufreq framework starts the governor, which populates this hook [1].
It's based on an RCU mechanism with a function pointer that can then be
called from the task scheduler when everything is ready to work.

If we (Arm) are going to use your proposed interfaces, we might need a
different mechanism, because the platform would likely be ready only
after our SCMI FW channels and cpufreq are set up.

Would it be possible to address such a need now, or would I have to
change that interface code later?

Regards,
Lukasz

[1] 
https://elixir.bootlin.com/linux/latest/source/kernel/sched/cpufreq.c#L29



* Re: [PATCH v2 02/22] sched: Add interfaces for IPC classes
  2022-12-14  0:31     ` Ricardo Neri
@ 2022-12-14 23:15       ` Ionela Voinescu
  2022-12-20  0:12         ` Ricardo Neri
  0 siblings, 1 reply; 39+ messages in thread
From: Ionela Voinescu @ 2022-12-14 23:15 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

Hi,

On Tuesday 13 Dec 2022 at 16:31:28 (-0800), Ricardo Neri wrote:
> On Thu, Dec 08, 2022 at 08:48:46AM +0000, Ionela Voinescu wrote:
> > Hi,
> > 
> > On Monday 28 Nov 2022 at 05:20:40 (-0800), Ricardo Neri wrote:
> > [..]
> > > +#ifndef arch_has_ipc_classes
> > > +/**
> > > + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks
> > > + *
> > > + * Returns: true if IPC classes of tasks are supported.
> > > + */
> > > +static __always_inline
> > > +bool arch_has_ipc_classes(void)
> > > +{
> > > +	return false;
> > > +}
> > > +#endif
> > > +
> > > +#ifndef arch_update_ipcc
> > > +/**
> > > + * arch_update_ipcc() - Update the IPC class of the current task
> > > + * @curr:		The current task
> > > + *
> > > + * Request that the IPC classification of @curr is updated.
> > > + *
> > > + * Returns: none
> > > + */
> > > +static __always_inline
> > > +void arch_update_ipcc(struct task_struct *curr)
> > > +{
> > > +}
> > > +#endif
> > > +
> > > +#ifndef arch_get_ipcc_score
> > > +/**
> > > + * arch_get_ipcc_score() - Get the IPC score of a class of task
> > > + * @ipcc:	The IPC class
> > > + * @cpu:	A CPU number
> > > + *
> > > + * Returns the performance score of an IPC class when running on @cpu.
> > > + * Error when either @ipcc or @cpu are invalid.
> > > + */
> > > +static __always_inline
> > > +int arch_get_ipcc_score(unsigned short ipcc, int cpu)
> > > +{
> > > +	return 1;
> > > +}
> > > +#endif
> 
> Thank you very much for your feedback Ionela!
> 
> > 
> > The interface looks mostly alright but this arch_get_ipcc_score() leaves
> > unclear what are the characteristics of the returned value.
> 
> Fair point. I mean for the return value to be defined by architectures;
> but yes, architectures need to know how to implement this function.
> 
> > 
> > Does it have a meaning as an absolute value or is it a value on an
> > abstract scale? If it should be interpreted as instructions per cycle,
> > if I wanted to have a proper comparison between the ability of two CPUs
> > to handle this class of tasks then I would need to take into consideration
> > the maximum frequency of each CPU.
> 
> Do you mean when calling arch_get_ipcc_score()? If yes, then I agree, IPC
> class may not be the only factor, but the criteria to use the return value
> is up to the caller.
> 

Yes, but if different architectures give different meanings to this score
(scale, relative difference between two values, etc) while the policies
are common (uses of arch_get_ipcc_score() in common scheduler paths)
then the outcome can be vastly different.

If the "criteria to use the returned value is up to the caller", then
the caller of arch_get_ipcc_score() should always be architecture
specific code, which currently is not (see 09/22).

> In asym_packing it is assumed that higher-priority CPUs are preferred.
> When balancing load, IPC class scores are used to select between otherwise
> identical runqueues. This should also be the case for migrate_misfit: we
> know already that the tasks being considered do not fit on their current
> CPU.
> 
> We would need to think what to do with other type of balancing, if at all.
> 
> That said, arch_get_ipcc_score() should only return a metric of the
> instructions-per-*cycle*, independent of frequency, no?
> 

Yes, performance on an abstract scale is preferred here. We would not
want to have to scale the score by frequency :). It was just an example
showing that the description of arch_get_ipcc_score() should be clarified.
Another possible clarification: is it expected that the scores scale
linearly with performance (does double the score mean double the
performance?).

> > If it's a performance value on an
> > abstract scale (more likely), similar to capacity, then it might be good
> > to better define this abstract scale. That would help with the default
> > implementation where possibly the best choice for a return value would
> > be the maximum value on the scale, suggesting equal/maximum performance
> > for different CPUs handling the class of tasks.
> 
> I guess something like:
> 
> #define SCHED_IPCC_DEFAULT_SCALE 1024
> 
> ?
> 
> I think I am fine with this value being the default. I also think that it
> is up to architectures to whether scale all IPC class scores from the
> best-performing class on the best-performing CPU. Doing so would introduce
> overhead, especially if hardware updates the IPC class scores multiple
> times during runtime.
>

Yes, it's a very good point. Initially I thought that one would need to
rescale the values anyway for them to make sense relative to each other,
but I now realise that would not be needed.

Therefore, you are right, to avoid this extra work it's best to leave
the range of possible score values up to the implementer and not force
something like [0 - 1024].

But again, this raises the point that if one architecture decides to
return its scores on a scale [0 - 1024] and possibly use these scores to
scale utilization/alter capacity for example, this cannot be generic
policy, as not all architectures are guaranteed to use this scale for their
scores.

So leaving the score unrestricted makes it more difficult to have
generic policies across architectures that use them.

> > 
> > I suppose you avoided returning 0 for the default implementation as you
> > intend that to mean the inability of the CPU to handle that class of
> > tasks? It would be good to document this.
> 
> I meant this to be minimum possible IPC class score for any CPU: any
> CPU should be able to handle any IPC class. If not implemented, all CPUs
> handle all IPC classes equally.
> 

Ah, I see. In this case you might as well return 0 in the default
implementation of arch_get_ipcc_score(). I know it does not matter much
what gets returned there, but returning a meaningless "1" is strange to
me :).

Thanks,
Ionela.


* Re: [PATCH v2 09/22] sched/fair: Use IPC class score to select a busiest runqueue
  2022-12-14  0:32     ` Ricardo Neri
@ 2022-12-14 23:16       ` Ionela Voinescu
  2022-12-16 23:24         ` Ricardo Neri
  0 siblings, 1 reply; 39+ messages in thread
From: Ionela Voinescu @ 2022-12-14 23:16 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

Hi Ricardo,

On Tuesday 13 Dec 2022 at 16:32:43 (-0800), Ricardo Neri wrote:
[..]
> > >  /**
> > > @@ -10419,8 +10442,8 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> > >  {
> > >  	struct rq *busiest = NULL, *rq;
> > >  	unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1;
> > > +	int i, busiest_ipcc_delta = INT_MIN;
> > >  	unsigned int busiest_nr = 0;
> > > -	int i;
> > >  
> > >  	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
> > >  		unsigned long capacity, load, util;
> > > @@ -10526,8 +10549,37 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> > >  
> > >  		case migrate_task:
> > >  			if (busiest_nr < nr_running) {
> > > +				struct task_struct *curr;
> > > +
> > >  				busiest_nr = nr_running;
> > >  				busiest = rq;
> > > +
> > > +				/*
> > > +				 * Remember the IPC score delta of busiest::curr.
> > > +				 * We may need it to break a tie with other queues
> > > +				 * with equal nr_running.
> > > +				 */
> > > +				curr = rcu_dereference(busiest->curr);
> > > +				busiest_ipcc_delta = ipcc_score_delta(curr,
> > > +								      env->dst_cpu);
> > > +			/*
> > > +			 * If rq and busiest have the same number of running
> > > +			 * tasks, pick rq if doing so would give rq::curr a
> > > +			 * bigger IPC boost on dst_cpu.
> > > +			 */
> > > +			} else if (sched_ipcc_enabled() &&
> > > +				   busiest_nr == nr_running) {
> > > +				struct task_struct *curr;
> > > +				int delta;
> > > +
> > > +				curr = rcu_dereference(rq->curr);
> > > +				delta = ipcc_score_delta(curr, env->dst_cpu);
> > > +
> > > +				if (busiest_ipcc_delta < delta) {
> > > +					busiest_ipcc_delta = delta;
> > > +					busiest_nr = nr_running;
> > > +					busiest = rq;
> > > +				}
> > >  			}
> > >  			break;
> > >  
> > 
> > While in the commit message you describe this as breaking a tie for
> > asym_packing,
> 
> Are you referring to the overall series or this specific patch? I checked
> the commit message and I do not see references to asym_packing.

Sorry, my bad, I was thinking about the cover letter, not the commit
message. It's under "+++ Balancing load using classes of tasks. Theory
of operation".

> 
> > the code here does not only affect asym_packing. If
> > another architecture would have sched_ipcc_enabled() it would use this
> > as generic policy, and that might not be desired.
> 
> Indeed, the patchset implements support to use IPCC classes for asym_packing,
> but it is not limited to it.
> 

So is your current intention to support IPC classes only for asym_packing
for now? What would be the impact on you if you were to limit the
functionality in this patch to asym_packing only?

> It is true that I don't check here for asym_packing, but it should not be a
> problem, IMO. I compare two runqueues with equal nr_running; either runqueue
> is a good choice. This tie breaker is an overall improvement, no?
> 

It could be, but equally there could be other better policies as well -
other ways to consider IPC class information to break the tie.

If other architectures start having sched_ipcc_enabled() they would
automatically use the policy you've decided on here. If other policies
are better for those architectures this generic policy would be difficult
to modify to ensure there are no regressions for all other architectures
that use it, or it would be difficult to work around it.

For this and for future support of IPC classes I am just wondering if we
can better design how we enable different architectures to have different
policies.

Thanks,
Ionela.

> Thanks and BR,
> Ricardo


* Re: [PATCH v2 02/22] sched: Add interfaces for IPC classes
  2022-12-14  7:36   ` Lukasz Luba
@ 2022-12-16 21:56     ` Ricardo Neri
  0 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-12-16 21:56 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Juri Lelli, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen, Vincent Guittot,
	Peter Zijlstra (Intel)

On Wed, Dec 14, 2022 at 07:36:44AM +0000, Lukasz Luba wrote:
> Hi Ricardo,
> 
> I have some generic comment for the design of those interfaces.
> 
> On 11/28/22 13:20, Ricardo Neri wrote:
> > Add the interfaces that architectures shall implement to convey the data
> > to support IPC classes.
> > 
> > arch_update_ipcc() updates the IPC classification of the current task as
> > given by hardware.
> > 
> > arch_get_ipcc_score() provides a performance score for a given IPC class
> > when placed on a specific CPU. Higher scores indicate higher performance.
> > 
> > The number of classes and the score of each class of task are determined
> > by hardware.
> > 
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Cc: Len Brown <len.brown@intel.com>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Tim C. Chen <tim.c.chen@intel.com>
> > Cc: Valentin Schneider <vschneid@redhat.com>
> > Cc: x86@kernel.org
> > Cc: linux-pm@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > Changes since v1:
> >   * Shortened the names of the IPCC interfaces (PeterZ):
> >     sched_task_classes_enabled >> sched_ipcc_enabled
> >     arch_has_task_classes >> arch_has_ipc_classes
> >     arch_update_task_class >> arch_update_ipcc
> >     arch_get_task_class_score >> arch_get_ipcc_score
> >   * Removed smt_siblings_idle argument from arch_update_ipcc(). (PeterZ)
> > ---
> >   kernel/sched/sched.h    | 60 +++++++++++++++++++++++++++++++++++++++++
> >   kernel/sched/topology.c |  8 ++++++
> >   2 files changed, 68 insertions(+)
> > 
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index b1d338a740e5..75e22baa2622 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -2531,6 +2531,66 @@ void arch_scale_freq_tick(void)
> >   }
> >   #endif
> > +#ifdef CONFIG_IPC_CLASSES
> > +DECLARE_STATIC_KEY_FALSE(sched_ipcc);
> > +
> > +static inline bool sched_ipcc_enabled(void)
> > +{
> > +	return static_branch_unlikely(&sched_ipcc);
> > +}
> > +
> > +#ifndef arch_has_ipc_classes
> > +/**
> > + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks
> > + *
> > + * Returns: true if IPC classes of tasks are supported.
> > + */
> > +static __always_inline
> > +bool arch_has_ipc_classes(void)
> > +{
> > +	return false;
> > +}
> > +#endif
> > +
> > +#ifndef arch_update_ipcc
> > +/**
> > + * arch_update_ipcc() - Update the IPC class of the current task
> > + * @curr:		The current task
> > + *
> > + * Request that the IPC classification of @curr is updated.
> > + *
> > + * Returns: none
> > + */
> > +static __always_inline
> > +void arch_update_ipcc(struct task_struct *curr)
> > +{
> > +}
> > +#endif
> > +
> > +#ifndef arch_get_ipcc_score
> > +/**
> > + * arch_get_ipcc_score() - Get the IPC score of a class of task
> > + * @ipcc:	The IPC class
> > + * @cpu:	A CPU number
> > + *
> > + * Returns the performance score of an IPC class when running on @cpu.
> > + * Error when either @ipcc or @cpu are invalid.
> > + */
> > +static __always_inline
> > +int arch_get_ipcc_score(unsigned short ipcc, int cpu)
> > +{
> > +	return 1;
> > +}
> > +#endif
> 
> Those interfaces are quite simple and probably work really well with
> your HW/FW. If any other architecture is going to re-use them
> in the future, we might face some issues. Let me explain why.
> 
> These kernel functions start to be used very early in boot.
> Your HW/FW is probably instantly ready to work from the very
> beginning of boot. But what if some other HW needs some
> preparation code, like setting up a communication channel to the FW or
> enabling needed clocks/events/etc.?
> 
> What I would like to see is a similar mechanism to the one in schedutil.
> The schedutil governor has to wait until cpufreq initializes the cpufreq
> driver and policy objects (which sometimes takes ~2-3 sec). After that,
> the cpufreq framework starts the governor, which populates this hook [1].
> It's based on an RCU mechanism with a function pointer that can then be
> called from the task scheduler when everything is ready to work.
> 
> If we (Arm) are going to use your proposed interfaces, we might need a
> different mechanism, because the platform would likely be ready only
> after our SCMI FW channels and cpufreq are set up.
> 
> Would it be possible to address such a need now, or would I have to
> change that interface code later?

Thank you very much for your feedback, Lukasz!

I took a look at the cpufreq implementation you refer to. I can
certainly try to accommodate your requirements. Before jumping into it,
I have a few questions.

I see that cpufreq_update_util() only does something when the per-CPU
pointers cpufreq_update_util_data become non-NULL. I use a static key for
the same purpose. Is this not usable for you?

Indeed, arch_has_ipc_classes() implies that it has to return true very early
after boot if called, as per Ionela's suggestion, from sched_init_smp(). I
can convert this interface to an arch_enable_ipc_classes() that drivers or
preparation code can call when ready. Would this be acceptable?

Do you think that a hook per CPU would be needed? If unsure, perhaps this can
be left for future work.

Thanks and BR,
Ricardo
>
> [1]
> https://elixir.bootlin.com/linux/latest/source/kernel/sched/cpufreq.c#L29
> 


* Re: [PATCH v2 09/22] sched/fair: Use IPC class score to select a busiest runqueue
  2022-12-14 23:16       ` Ionela Voinescu
@ 2022-12-16 23:24         ` Ricardo Neri
  0 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-12-16 23:24 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

On Wed, Dec 14, 2022 at 11:16:39PM +0000, Ionela Voinescu wrote:
> Hi Ricardo,
> 
> On Tuesday 13 Dec 2022 at 16:32:43 (-0800), Ricardo Neri wrote:
> [..]
> > > >  /**
> > > > @@ -10419,8 +10442,8 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> > > >  {
> > > >  	struct rq *busiest = NULL, *rq;
> > > >  	unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1;
> > > > +	int i, busiest_ipcc_delta = INT_MIN;
> > > >  	unsigned int busiest_nr = 0;
> > > > -	int i;
> > > >  
> > > >  	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
> > > >  		unsigned long capacity, load, util;
> > > > @@ -10526,8 +10549,37 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> > > >  
> > > >  		case migrate_task:
> > > >  			if (busiest_nr < nr_running) {
> > > > +				struct task_struct *curr;
> > > > +
> > > >  				busiest_nr = nr_running;
> > > >  				busiest = rq;
> > > > +
> > > > +				/*
> > > > +				 * Remember the IPC score delta of busiest::curr.
> > > > +				 * We may need it to break a tie with other queues
> > > > +				 * with equal nr_running.
> > > > +				 */
> > > > +				curr = rcu_dereference(busiest->curr);
> > > > +				busiest_ipcc_delta = ipcc_score_delta(curr,
> > > > +								      env->dst_cpu);
> > > > +			/*
> > > > +			 * If rq and busiest have the same number of running
> > > > +			 * tasks, pick rq if doing so would give rq::curr a
> > > > +			 * bigger IPC boost on dst_cpu.
> > > > +			 */
> > > > +			} else if (sched_ipcc_enabled() &&
> > > > +				   busiest_nr == nr_running) {
> > > > +				struct task_struct *curr;
> > > > +				int delta;
> > > > +
> > > > +				curr = rcu_dereference(rq->curr);
> > > > +				delta = ipcc_score_delta(curr, env->dst_cpu);
> > > > +
> > > > +				if (busiest_ipcc_delta < delta) {
> > > > +					busiest_ipcc_delta = delta;
> > > > +					busiest_nr = nr_running;
> > > > +					busiest = rq;
> > > > +				}
> > > >  			}
> > > >  			break;
> > > >  
> > > 
> > > While in the commit message you describe this as breaking a tie for
> > > asym_packing,
> > 
> > Are you referring to the overall series or this specific patch? I checked
> > the commit message and I do not see references to asym_packing.
> 
> Sorry, my bad, I was thinking about the cover letter, not the commit
> message. It's under "+++ Balancing load using classes of tasks. Theory
> of operation".
> 
> > 
> > > the code here does not only affect asym_packing. If
> > > another architecture would have sched_ipcc_enabled() it would use this
> > > as generic policy, and that might not be desired.
> > 
> > Indeed, the patchset implements support to use IPCC classes for asym_packing,
> > but it is not limited to it.
> > 
> 
> So is your current intention to support IPC classes only for asym_packing
> for now?

My intention is to introduce IPC classes in general and make them available
to other policies or architectures. I use asym_packing as the use case.


> What would be the impact on you if you were to limit the
> functionality in this patch to asym_packing only?

There would not be any adverse impact.

> 
> > It is true that I don't check here for asym_packing, but it should not be a
> > problem, IMO. I compare two runqueues with equal nr_running; either runqueue
> > is a good choice. This tie breaker is an overall improvement, no?
> > 
> 
> It could be, but equally there could be other better policies as well -
> other ways to consider IPC class information to break the tie.
> 
> If other architectures start having sched_ipcc_enabled() they would
> automatically use the policy you've decided on here. If other policies
> are better for those architectures this generic policy would be difficult
> to modify to ensure there are no regressions for all other architectures
> that use it, or it would be difficult to work around it.
> 
> For this and for future support of IPC classes I am just wondering if we
> can better design how we enable different architectures to have different
> policies.

I see your point. I agree that other architectures may want to implement
policies differently. I'll add an extra check for env->sd & SD_ASYM_PACKING.

Thanks and BR,
Ricardo


* Re: [PATCH v2 02/22] sched: Add interfaces for IPC classes
  2022-12-14 23:15       ` Ionela Voinescu
@ 2022-12-20  0:12         ` Ricardo Neri
  0 siblings, 0 replies; 39+ messages in thread
From: Ricardo Neri @ 2022-12-20  0:12 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86,
	Joel Fernandes (Google),
	linux-kernel, linux-pm, Tim C . Chen

On Wed, Dec 14, 2022 at 11:15:56PM +0000, Ionela Voinescu wrote:
> Hi,
> 
> On Tuesday 13 Dec 2022 at 16:31:28 (-0800), Ricardo Neri wrote:
> > On Thu, Dec 08, 2022 at 08:48:46AM +0000, Ionela Voinescu wrote:
> > > Hi,
> > > 
> > > On Monday 28 Nov 2022 at 05:20:40 (-0800), Ricardo Neri wrote:
> > > [..]
> > > > +#ifndef arch_has_ipc_classes
> > > > +/**
> > > > + * arch_has_ipc_classes() - Check whether hardware supports IPC classes of tasks
> > > > + *
> > > > + * Returns: true if IPC classes of tasks are supported.
> > > > + */
> > > > +static __always_inline
> > > > +bool arch_has_ipc_classes(void)
> > > > +{
> > > > +	return false;
> > > > +}
> > > > +#endif
> > > > +
> > > > +#ifndef arch_update_ipcc
> > > > +/**
> > > > + * arch_update_ipcc() - Update the IPC class of the current task
> > > > + * @curr:		The current task
> > > > + *
> > > > + * Request that the IPC classification of @curr is updated.
> > > > + *
> > > > + * Returns: none
> > > > + */
> > > > +static __always_inline
> > > > +void arch_update_ipcc(struct task_struct *curr)
> > > > +{
> > > > +}
> > > > +#endif
> > > > +
> > > > +#ifndef arch_get_ipcc_score
> > > > +/**
> > > > + * arch_get_ipcc_score() - Get the IPC score of a class of task
> > > > + * @ipcc:	The IPC class
> > > > + * @cpu:	A CPU number
> > > > + *
> > > > + * Returns the performance score of an IPC class when running on @cpu.
> > > > + * Error when either @ipcc or @cpu are invalid.
> > > > + */
> > > > +static __always_inline
> > > > +int arch_get_ipcc_score(unsigned short ipcc, int cpu)
> > > > +{
> > > > +	return 1;
> > > > +}
> > > > +#endif
> > 
> > Thank you very much for your feedback Ionela!
> > 
> > > 
> > > The interface looks mostly alright but this arch_get_ipcc_score() leaves
> > > unclear what are the characteristics of the returned value.
> > 
> > Fair point. I mean for the return value to be defined by architectures;
> > but yes, architectures need to know how to implement this function.
> > 
> > > 
> > > Does it have a meaning as an absolute value or is it a value on an
> > > abstract scale? If it should be interpreted as instructions per cycle,
> > > if I wanted to have a proper comparison between the ability of two CPUs
> > > to handle this class of tasks then I would need to take into consideration
> > > the maximum frequency of each CPU.
> > 
> > Do you mean when calling arch_get_ipcc_score()? If yes, then I agree, IPC
> > class may not be the only factor, but the criteria to use the return value
> > is up to the caller.
> > 
> 
> Yes, but if different architectures give different meanings to this score
> (scale, relative difference between two values, etc) while the policies
> are common (uses of arch_get_ipcc_score() in common scheduler paths)
> then the outcome can be vastly different.

One more reason to leave to the caller the handling of the returned value.

> 
> If the "criteria to use the returned value is up to the caller", then
> the caller of arch_get_ipcc_score() should always be architecture
> specific code, which currently is not (see 09/22).

Agreed. I now get your point. I'll change my patch accordingly.

> 
> > In asym_packing it is assumed that higher-priority CPUs are preferred.
> > When balancing load, IPC class scores are used to select between otherwise
> > identical runqueues. This should also be the case for migrate_misfit: we
> > know already that the tasks being considered do not fit on their current
> > CPU.
> > 
> > We would need to think what to do with other type of balancing, if at all.
> > 
> > That said, arch_get_ipcc_score() should only return a metric of the
> > instructions-per-*cycle*, independent of frequency, no?
> > 
> 
> Yes, performance on an abstract scale is preferred here. We would not
> want to have to scale the score by frequency :). It was just an example
> showing that the description of arch_get_ipcc_score() should be clarified.
> Another possible clarification: is it expected that the scores scale
> linearly with performance (does double the score mean double the
> performance?).

Indeed this seems sensible.

> 
> > > If it's a performance value on an
> > > abstract scale (more likely), similar to capacity, then it might be good
> > > to better define this abstract scale. That would help with the default
> > > implementation where possibly the best choice for a return value would
> > > be the maximum value on the scale, suggesting equal/maximum performance
> > > for different CPUs handling the class of tasks.
> > 
> > I guess something like:
> > 
> > #define SCHED_IPCC_DEFAULT_SCALE 1024
> > 
> > ?
> > 
> > I think I am fine with this value being the default. I also think that it
> > is up to architectures to whether scale all IPC class scores from the
> > best-performing class on the best-performing CPU. Doing so would introduce
> > overhead, especially if hardware updates the IPC class scores multiple
> > times during runtime.
> >
> 
> Yes, it's a very good point. Initially I thought that one would need to
> rescale the values anyway for them to make sense relative to each other,
> but I now realise that would not be needed.
> 
> Therefore, you are right, to avoid this extra work it's best to leave
> the range of possible score values up to the implementer and not force
> something like [0 - 1024].
> 
> But again, this raises the point that if one architecture decides to
> return its scores on a scale [0 - 1024] and possibly use these scores to
> scale utilization/alter capacity for example, this cannot be generic
> policy, as not all architectures are guaranteed to use this scale for their
> scores.

Very good point.

> 
> So leaving the score unrestricted makes it more difficult to have
> generic policies across architectures that use them.
>

In asym_packing we select the CPU of higher priority, regardless of how big
the priority delta is. IPC classes extend the same mechanism. (We do have
a throughput calculation, but it does not require IPC class scores to be
scaled).

So yes, IPC classes need to be scaled when combined with another metric.

Another addition to the documentation of the interface? :)
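To illustrate the point with a hedged userspace sketch (illustrative names
only, not kernel code): an ordering-only pick, as in asym_packing, is
insensitive to the scale of the scores, whereas combining a score with
another metric bakes in an assumed scale (here 1024, purely an assumption):

```c
#include <stddef.h>

/*
 * Illustrative only: models per-CPU IPC class scores as a plain array.
 * The real interface is arch_get_ipcc_score(); the scale of its return
 * value is left to the architecture.
 */
static const int score[4] = { 200, 350, 350, 900 };

/* asym_packing-style pick: only the ordering of scores matters. */
static int pick_cpu(const int *s, size_t n)
{
	size_t i, best = 0;

	for (i = 1; i < n; i++)
		if (s[i] > s[best])
			best = i;
	return (int)best;
}

/*
 * Combining a score with, e.g., utilization assumes a known scale
 * (1024 here). That assumption is exactly what cannot be made generic
 * if each architecture picks its own range for the scores.
 */
static long scaled_util(long util, int ipcc_score)
{
	return util * ipcc_score / 1024;
}
```

Multiplying every entry of score[] by any positive constant leaves
pick_cpu()'s answer unchanged, which is why an asym_packing-style selection
needs no normalization; scaled_util() does not have that property.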

> 
> > > 
> > > I suppose you avoided returning 0 for the default implementation as you
> > > intend that to mean the inability of the CPU to handle that class of
> > > tasks? It would be good to document this.
> > 
> > I meant this to be the minimum possible IPC class score for any CPU: any
> > CPU should be able to handle any IPC class. If not implemented, all CPUs
> > handle all IPC classes equally.
> > 
> 
> Ah, I see. In this case you might as well return 0 in the default
> implementation of arch_get_ipcc_score(). I know it does not matter much
> what gets returned there, but returning a meaningless "1" is strange to
> me :).

Yes, the value does not really matter to my use case, as long as it is the
same for all CPUs. I can use 1024, as other scheduler metrics do.
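For reference, a default implementation along the lines discussed might
look like the following sketch (hypothetical: the constant name follows the
SCHED_IPCC_DEFAULT_SCALE suggestion above, and a weak symbol stands in for
the arch override mechanism):

```c
#define SCHED_IPCC_DEFAULT_SCALE 1024

/*
 * Default: all CPUs are assumed to handle all IPC classes equally well,
 * so every (class, cpu) pair gets the same score. An architecture with
 * hardware feedback (e.g. Intel HFI) would override this.
 */
__attribute__((weak)) int arch_get_ipcc_score(unsigned short ipcc, int cpu)
{
	(void)ipcc;
	(void)cpu;
	return SCHED_IPCC_DEFAULT_SCALE;
}
```

Because the default returns the same value for every class and CPU, the
load balancer's comparisons degrade to no-ops, which is the intended
behavior when no architecture support is present.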

Thanks and BR,
Ricardo



Thread overview: 39+ messages
2022-11-28 13:20 [PATCH v2 00/22] sched: Introduce IPC classes for load balance Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 01/22] sched/task_struct: Introduce IPC classes of tasks Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 02/22] sched: Add interfaces for IPC classes Ricardo Neri
2022-12-08  8:48   ` Ionela Voinescu
2022-12-14  0:31     ` Ricardo Neri
2022-12-14 23:15       ` Ionela Voinescu
2022-12-20  0:12         ` Ricardo Neri
2022-12-14  7:36   ` Lukasz Luba
2022-12-16 21:56     ` Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 03/22] sched/core: Initialize the IPC class of a new task Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 04/22] sched/core: Add user_tick as argument to scheduler_tick() Ricardo Neri
2022-12-07 12:21   ` Dietmar Eggemann
2022-12-12 18:47     ` Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 05/22] sched/core: Update the IPC class of the current task Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 06/22] sched/fair: Collect load-balancing stats for IPC classes Ricardo Neri
2022-12-07 17:00   ` Dietmar Eggemann
2022-12-12 21:41     ` Ricardo Neri
2022-12-08  8:50   ` Ionela Voinescu
2022-12-14  0:31     ` Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 07/22] sched/fair: Compute IPC class scores for load balancing Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 08/22] sched/fair: Use IPC class to pick the busiest group Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 09/22] sched/fair: Use IPC class score to select a busiest runqueue Ricardo Neri
2022-12-08  8:51   ` Ionela Voinescu
2022-12-14  0:32     ` Ricardo Neri
2022-12-14 23:16       ` Ionela Voinescu
2022-12-16 23:24         ` Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 10/22] thermal: intel: hfi: Introduce Intel Thread Director classes Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 11/22] thermal: intel: hfi: Store per-CPU IPCC scores Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 12/22] x86/cpufeatures: Add the Intel Thread Director feature definitions Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 13/22] thermal: intel: hfi: Update the IPC class of the current task Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 14/22] thermal: intel: hfi: Report the IPC class score of a CPU Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 15/22] thermal: intel: hfi: Define a default class for unclassified tasks Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 16/22] thermal: intel: hfi: Enable the Intel Thread Director Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 17/22] sched/task_struct: Add helpers for IPC classification Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 18/22] sched/core: Initialize helpers of task classification Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 19/22] thermal: intel: hfi: Implement model-specific checks for " Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 20/22] x86/cpufeatures: Add feature bit for HRESET Ricardo Neri
2022-11-28 13:20 ` [PATCH v2 21/22] x86/hreset: Configure history reset Ricardo Neri
2022-11-28 13:21 ` [PATCH v2 22/22] x86/process: Reset hardware history in context switch Ricardo Neri
