* [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance
@ 2022-09-09 23:11 Ricardo Neri
  2022-09-09 23:11 ` [RFC PATCH 01/23] sched/task_struct: Introduce classes of tasks Ricardo Neri
                   ` (24 more replies)
  0 siblings, 25 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri

+++ Introduction

On hybrid processors, the microarchitectural properties of the different
types of CPUs cause them to have different instructions-per-cycle (IPC)
capabilities. IPC can be higher on some CPUs for advanced instructions.
Figure 1 illustrates this concept: it plots hypothetical workloads,
grouped by class of instructions, against the IPC ratio between high- and
low-performance CPUs.

IPC ratio
  ^
  | Class0 .             Class1               .   ClassN-1    .  ClassN
  |        .                                  .               .   +
  |        .                                  .               .  +
  |        .                                  .               . +
  |        .                                  . + + + + + + + +
  |        .                                  .               .
  |        .                                  .               .
  |        .             + + + + + + + + + + +                .
  |        + + + + + + +                      .               .
  |      + .                                  .               .
  |     +  .                                  .               .
  |    +   .                                  .               .
  |   +    .                                  .               .
1 |-------------------------------------------//---------------------------->
  |                                                      workloads of interest
															
            Figure 1. Hypothetical workloads sorted by IPC ratio


The load balancer can discover the use of advanced instructions and prefer
CPUs with higher IPC for tasks running those instructions.

Hardware is free to partition its instruction set into an arbitrary number
of classes. It must provide a mechanism to identify the class of the
currently running task and inform the kernel about the performance of each
class of task on each type of CPU.

This patchset introduces the concept of classes of tasks, proposes the
interfaces that hardware needs to implement and proposes changes to the
load balancer to leverage this extra information in combination with
asymmetric packing.

This patchset includes a full implementation for Intel hybrid processors
using Intel Thread Director technology [1].


+++ Structure of this series

Patches 1-6 introduce the concept of classes of tasks. They also introduce
interfaces that architectures implement to update the class of a task and
to inform the scheduler about the class-specific performance scores.

Patches 7-10 use the class of the current task of each runqueue to break
ties between two otherwise identical group_asym_packing scheduling groups
and to select a busiest runqueue.

Patches 11-17 implement support for classes of tasks on Intel hybrid
processors using Intel Thread Director technology.

Patches 18-19 introduce extra helper members to task_struct to deal with
transient classification of tasks and arch-specific implementation
vagaries.

Patches 20-23 are specific to Intel Thread Director. They add
model-specific checks for task classification and reset the
classification hardware when switching to a new task.


+++ Classes of tasks

The first step to leverage the asymmetries in IPC ratios is to assign a
class label to each individual task. Hardware can monitor the instruction
stream and classify a task as it runs. At user tick, the kernel reads the
resulting classification and assigns it to the currently running task. It
stores the class in the proposed task_struct::class.
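
Concretely, the hook added in patch 6 of this series boils down to reading
the hardware classification at each user tick when the feature is enabled:

	/* In scheduler_tick(), as proposed in patch 6: */
	if (sched_task_classes_enabled() && user_tick)
		arch_update_task_class(curr, is_core_idle(cpu));

The core-idle hint matters because, on some CPUs, the classification is
only reliable when all of the SMT siblings of the ticked CPU are idle.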


+++ Balancing load using classes of tasks. Theory of operation

Intel hybrid processors rely on asymmetric packing to place tasks on
higher performance CPUs first. The class of the current task on each
runqueue can be used to break ties between two otherwise identical
scheduling groups.

Consider these scenarios (for simplicity, assume that the task-class
performance scores are such that

score(Cl0) < score(Cl1) < ... < score(Cl(n-1)) < score(Cln)).    (Eq. I)

Boxes depict scheduling groups being considered during load balance.
Labels inside the box depict the class of rq->curr, or the CPU being
idle.

    asym
    packing
    priorities    50    50           30           30
                _____________
                |  i  .  i  |
                |  d  .  d  |
                |  l  .  l  |      _______      _______
                |  e  .  e  |      | Cl0 |      | Cl1 |
                |___________|      |_____|      |_____|

                         ^
                      dst_cpu        sgA          sgB
                                                   ^
                                                busiest

                           Figure 2. Scenario A
	
In Figure 2, dst_cpu, part of an idle group of SMT siblings, has higher
priority and is identifying the busiest group. sgA and sgB are of
group_asym_packing type, have the same priority, each have a single CPU,
and have the same number of running tasks. By checking the class of the
task currently running on each scheduling group, the load balancer selects
sgB as the busiest, because the class of its current task yields a higher
performance score if placed on dst_cpu.
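
For instance, with hypothetical scores score(Cl0) = 2 and score(Cl1) = 3
on dst_cpu, pulling the current task of sgB adds 3 to the overall
throughput, whereas pulling from sgA would only add 2.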

    asym
    packing
    priorities    50    50           50    50           30
                _____________     _____________
                |     .     |     |     .     |
                |     .     |     |     .     |       idle
                | cl0 . cl1 |     | cl0 . cl2 |      _______
                |     .     |     |     .     |      |     |
                |___________|     |___________|      |_____|

                                                        ^
                     sgA                sgB          dst_cpu
                                         ^
                                    busiest group

                                     ^
                                  busiest queue

                           Figure 3. Scenario B

In Figure 3, dst_cpu has become idle, has lower priority, and is
identifying the busiest group. sgA and sgB are groups of SMT siblings. In
both groups the two siblings are busy and, therefore, the groups are
classified as group_asym_packing [2]; they have the same priority and the
same number of running tasks. The load balancer computes the
class-specific performance score (scaled by the number of busy siblings)
by observing the currently running task on each runqueue.

As per Eq. I, cl0+cl2 has a higher throughput than cl0+cl1. So, it selects
sgB as the busiest group. If cl2 is left to run with the whole big core to
itself, it would deliver higher throughput than cl0. Hence, the runqueue of
cl0 is selected as the busiest.
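
As a worked example with hypothetical scores, let cl0, cl1 and cl2 score
1, 2 and 4, respectively, on the big cores, and let cl0 score 1 on the
idle dst_cpu. Balancing sgB leaves cl2 alone on its core (score 4) and
moves cl0 to dst_cpu (score 1), a combined score of 5. Balancing sgA
instead leaves cl1 alone (score 2) plus cl0 on dst_cpu (score 1), a
combined score of 3. Hence, sgB is the better choice.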


+++ Dependencies

These patches assume that both SMT siblings of a core have the same
priority, as proposed in [3]. Also, they rely on the existing support for
the Hardware Feedback Interface [4].


I look forward to your review and thank you in advance!

These patches have been Acked-by: Len Brown <len.brown@intel.com>

Thanks and BR,
Ricardo

[1]. Intel Software Developer Manual, Volume 3, Section 14.6
     https://intel.com/sdm
[2]. https://lkml.kernel.org/r/20210911011819.12184-7-ricardo.neri-calderon@linux.intel.com
[3]. https://lore.kernel.org/lkml/20220825225529.26465-1-ricardo.neri-calderon@linux.intel.com/
[4]. https://lore.kernel.org/linux-pm/20220127193454.12814-1-ricardo.neri-calderon@linux.intel.com/

Ricardo Neri (23):
  sched/task_struct: Introduce classes of tasks
  sched: Add interfaces for classes of tasks
  sched/core: Initialize the class of a new task
  sched/core: Add user_tick as argument to scheduler_tick()
  sched/core: Move is_core_idle() out of fair.c
  sched/core: Update the classification of the current task
  sched/fair: Collect load-balancing stats for task classes
  sched/fair: Compute task-class performance scores for load balancing
  sched/fair: Use task-class performance score to pick the busiest group
  sched/fair: Use classes of tasks when selecting a busiest runqueue
  thermal: intel: hfi: Introduce Hardware Feedback Interface classes
  thermal: intel: hfi: Convert table_lock to use flags-handling variants
  x86/cpufeatures: Add the Intel Thread Director feature definitions
  thermal: intel: hfi: Update the class of the current task
  thermal: intel: hfi: Report per-cpu class-specific performance scores
  thermal: intel: hfi: Define a default classification for unclassified
    tasks
  thermal: intel: hfi: Enable the Intel Thread Director
  sched/task_struct: Add helpers for task classification
  sched/core: Initialize helpers of task classification
  thermal: intel: hfi: Implement model-specific checks for task
    classification
  x86/cpufeatures: Add feature bit for HRESET
  x86/hreset: Configure history reset
  x86/process: Reset hardware history in context switch

 arch/x86/include/asm/cpufeatures.h       |   2 +
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/hreset.h            |  30 ++++
 arch/x86/include/asm/msr-index.h         |   4 +
 arch/x86/include/asm/topology.h          |  10 ++
 arch/x86/kernel/cpu/common.c             |  35 +++-
 arch/x86/kernel/cpu/cpuid-deps.c         |   1 +
 arch/x86/kernel/cpu/scattered.c          |   1 +
 arch/x86/kernel/process_32.c             |   3 +
 arch/x86/kernel/process_64.c             |   3 +
 drivers/thermal/intel/Kconfig            |  12 ++
 drivers/thermal/intel/intel_hfi.c        | 215 +++++++++++++++++++++--
 include/linux/sched.h                    |  19 +-
 init/Kconfig                             |   9 +
 kernel/sched/core.c                      |  10 +-
 kernel/sched/fair.c                      | 214 ++++++++++++++++++++--
 kernel/sched/sched.h                     |  81 +++++++++
 kernel/sched/topology.c                  |   8 +
 kernel/time/timer.c                      |   2 +-
 19 files changed, 635 insertions(+), 32 deletions(-)
 create mode 100644 arch/x86/include/asm/hreset.h

-- 
2.25.1



* [RFC PATCH 01/23] sched/task_struct: Introduce classes of tasks
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-14 13:46   ` Peter Zijlstra
  2022-09-09 23:11 ` [RFC PATCH 02/23] sched: Add interfaces for " Ricardo Neri
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

On hybrid processors, the architecture differences between the types of
CPUs lead to different instructions-per-cycle (IPC) on each type of CPU.
IPCs may differ further by the type of instructions. Instructions can be
grouped into classes of similar IPCs.

Hence, tasks may be classified into groups depending on the type of
instructions they execute.

Add a new member task_struct::class to associate a particular task with
a classification that depends on the instructions it executes.

The scheduler may use the classification of a task and information about
the relative performance among CPUs of a particular class of task to
improve throughput. It may, for instance, place certain tasks on CPUs that
have higher performance.

The methods to determine the classification of a task and its relative
IPC are specific to each CPU architecture.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/sched.h | 7 +++++++
 init/Kconfig          | 9 +++++++++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e7b2f8a5c711..acc33dbaa47c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -117,6 +117,8 @@ struct task_group;
 					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
 					 TASK_PARKED)
 
+#define TASK_CLASS_UNCLASSIFIED		-1
+
 #define task_is_running(task)		(READ_ONCE((task)->__state) == TASK_RUNNING)
 
 #define task_is_traced(task)		((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0)
@@ -1512,6 +1514,11 @@ struct task_struct {
 	union rv_task_monitor		rv[RV_PER_TASK_MONITORS];
 #endif
 
+#ifdef CONFIG_SCHED_TASK_CLASSES
+	/* Class of task that the scheduler uses for task placement decisions */
+	short				class;
+#endif
+
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
diff --git a/init/Kconfig b/init/Kconfig
index 532362fcfe31..edfa27f8717a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -854,6 +854,15 @@ config UCLAMP_BUCKETS_COUNT
 
 	  If in doubt, use the default value.
 
+config SCHED_TASK_CLASSES
+	bool "Classes of tasks"
+	depends on SMP
+	help
+	  If selected, each task is assigned a classification value that
+	  reflects the types of instructions that the task executes. The
+	  scheduler uses the classification value to improve the placement of
+	  tasks.
+
 endmenu
 
 #
-- 
2.25.1



* [RFC PATCH 02/23] sched: Add interfaces for classes of tasks
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
  2022-09-09 23:11 ` [RFC PATCH 01/23] sched/task_struct: Introduce classes of tasks Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-09 23:11 ` [RFC PATCH 03/23] sched/core: Initialize the class of a new task Ricardo Neri
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Add the interfaces that architectures shall implement to convey the data
needed to support classes of tasks.

arch_update_task_class() updates the classification of the current task as
given by dedicated hardware resources.

arch_get_task_class_score() provides a performance score for a given class
of task when placed on a specific CPU relative to other CPUs. Higher scores
indicate higher performance.

The number of classes and the score of each class of task are determined
by hardware.
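
For reference, later patches in this series consume these interfaces
roughly as follows (sketch only; see patches 6 and 7 for the actual call
sites):

	/* At the user tick (patch 6): */
	if (sched_task_classes_enabled() && user_tick)
		arch_update_task_class(curr, is_core_idle(cpu));

	/* While collecting load-balancing statistics (patch 7): */
	score = arch_get_task_class_score(rq->curr->class, cpu_of(rq));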

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/sched.h    | 64 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/topology.c |  8 ++++++
 2 files changed, 72 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e26688d387ae..719bdba660e2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2510,6 +2510,70 @@ void arch_scale_freq_tick(void)
 }
 #endif
 
+#ifdef CONFIG_SCHED_TASK_CLASSES
+DECLARE_STATIC_KEY_FALSE(sched_task_classes);
+
+static inline bool sched_task_classes_enabled(void)
+{
+	return static_branch_unlikely(&sched_task_classes);
+}
+
+#ifndef arch_has_task_classes
+/**
+ * arch_has_task_classes() - Check whether hardware supports classes of tasks
+ *
+ * Returns: true if classes of tasks are supported.
+ */
+static __always_inline
+bool arch_has_task_classes(void)
+{
+	return false;
+}
+#endif
+
+#ifndef arch_update_task_class
+/**
+ * arch_update_task_class() - Update the classification of the current task
+ * @curr:		The current task
+ * @smt_siblings_idle:	True if all of the SMT siblings of the CPU of @curr
+ *			are idle.
+ *
+ * Request that the classification of @curr is updated. On certain CPUs, the
+ * classification is only reliable if all of the SMT siblings of the CPU of
+ * @curr are idle.
+ *
+ * Returns: none
+ */
+static __always_inline
+void arch_update_task_class(struct task_struct *curr, bool smt_siblings_idle)
+{
+}
+#endif
+
+#ifndef arch_get_task_class_score
+/**
+ * arch_get_task_class_score() - Get the performance score of a class
+ * @class:	A classification value
+ * @cpu:	A CPU number
+ *
+ * Returns the performance score of a class of tasks when running on @cpu.
+ * Error when either @class or @cpu are invalid.
+ */
+static __always_inline
+int arch_get_task_class_score(int class, int cpu)
+{
+	return 1;
+}
+#endif
+#else /* CONFIG_SCHED_TASK_CLASSES */
+
+#define arch_get_task_class_score(class, cpu) (-EINVAL)
+#define arch_update_task_class(curr, smt_siblings_idle)
+
+static inline bool sched_task_classes_enabled(void) { return false; }
+
+#endif /* CONFIG_SCHED_TASK_CLASSES */
+
 #ifndef arch_scale_freq_capacity
 /**
  * arch_scale_freq_capacity - get the frequency scale factor of a given CPU.
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8739c2a5a54e..8d886dbe6318 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -669,6 +669,9 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
 DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
+#ifdef CONFIG_SCHED_TASK_CLASSES
+DEFINE_STATIC_KEY_FALSE(sched_task_classes);
+#endif
 
 static void update_top_cache_domain(int cpu)
 {
@@ -2384,6 +2387,11 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	if (has_asym)
 		static_branch_inc_cpuslocked(&sched_asym_cpucapacity);
 
+#ifdef CONFIG_SCHED_TASK_CLASSES
+	if (arch_has_task_classes())
+		static_branch_enable_cpuslocked(&sched_task_classes);
+#endif
+
 	if (rq && sched_debug_verbose) {
 		pr_info("root domain span: %*pbl (max cpu_capacity = %lu)\n",
 			cpumask_pr_args(cpu_map), rq->rd->max_cpu_capacity);
-- 
2.25.1



* [RFC PATCH 03/23] sched/core: Initialize the class of a new task
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
  2022-09-09 23:11 ` [RFC PATCH 01/23] sched/task_struct: Introduce classes of tasks Ricardo Neri
  2022-09-09 23:11 ` [RFC PATCH 02/23] sched: Add interfaces for " Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-26 14:57   ` Joel Fernandes
  2022-09-09 23:11 ` [RFC PATCH 04/23] sched/core: Add user_tick as argument to scheduler_tick() Ricardo Neri
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

New tasks shall start life as unclassified. They will be classified by
hardware when they run.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee28253c9ac0..db548c1a25ef 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4336,6 +4336,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
+#ifdef CONFIG_SCHED_TASK_CLASSES
+	p->class			= TASK_CLASS_UNCLASSIFIED;
+#endif
 	INIT_LIST_HEAD(&p->se.group_node);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-- 
2.25.1



* [RFC PATCH 04/23] sched/core: Add user_tick as argument to scheduler_tick()
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (2 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 03/23] sched/core: Initialize the class of a new task Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-09 23:11 ` [RFC PATCH 05/23] sched/core: Move is_core_idle() out of fair.c Ricardo Neri
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Differentiate between user and kernel ticks so that the scheduler updates
the class of the current task during the former.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 include/linux/sched.h | 2 +-
 kernel/sched/core.c   | 2 +-
 kernel/time/timer.c   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index acc33dbaa47c..73f9e745d17d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -283,7 +283,7 @@ enum {
 	TASK_COMM_LEN = 16,
 };
 
-extern void scheduler_tick(void);
+extern void scheduler_tick(bool user_tick);
 
 #define	MAX_SCHEDULE_TIMEOUT		LONG_MAX
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index db548c1a25ef..76015dbc45c5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5436,7 +5436,7 @@ static inline u64 cpu_resched_latency(struct rq *rq) { return 0; }
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
  */
-void scheduler_tick(void)
+void scheduler_tick(bool user_tick)
 {
 	int cpu = smp_processor_id();
 	struct rq *rq = cpu_rq(cpu);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 717fcb9fb14a..b444b6f5f585 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1841,7 +1841,7 @@ void update_process_times(int user_tick)
 	if (in_irq())
 		irq_work_tick();
 #endif
-	scheduler_tick();
+	scheduler_tick(user_tick);
 	if (IS_ENABLED(CONFIG_POSIX_TIMERS))
 		run_posix_cpu_timers();
 }
-- 
2.25.1



* [RFC PATCH 05/23] sched/core: Move is_core_idle() out of fair.c
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (3 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 04/23] sched/core: Add user_tick as argument to scheduler_tick() Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-09 23:11 ` [RFC PATCH 06/23] sched/core: Update the classification of the current task Ricardo Neri
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Information about the idle state of the SMT siblings of a core can
be used to improve the accuracy of the classification of the current task.

Move is_core_idle() to sched.h to make it available to scheduler_tick().

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/fair.c  | 17 -----------------
 kernel/sched/sched.h | 17 +++++++++++++++++
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0089b33ea7b8..2f2a6bb5990d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1536,23 +1536,6 @@ struct numa_stats {
 	int idle_cpu;
 };
 
-static inline bool is_core_idle(int cpu)
-{
-#ifdef CONFIG_SCHED_SMT
-	int sibling;
-
-	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
-		if (cpu == sibling)
-			continue;
-
-		if (!idle_cpu(sibling))
-			return false;
-	}
-#endif
-
-	return true;
-}
-
 struct task_numa_env {
 	struct task_struct *p;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 719bdba660e2..2bddedc55ee9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3212,6 +3212,23 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
 }
 #endif
 
+static inline bool is_core_idle(int cpu)
+{
+#ifdef CONFIG_SCHED_SMT
+	int sibling;
+
+	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
+		if (cpu == sibling)
+			continue;
+
+		if (!idle_cpu(sibling))
+			return false;
+	}
+#endif
+
+	return true;
+}
+
 extern void swake_up_all_locked(struct swait_queue_head *q);
 extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
 
-- 
2.25.1



* [RFC PATCH 06/23] sched/core: Update the classification of the current task
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (4 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 05/23] sched/core: Move is_core_idle() out of fair.c Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-14 13:44   ` Peter Zijlstra
  2022-09-09 23:11 ` [RFC PATCH 07/23] sched/fair: Collect load-balancing stats for task classes Ricardo Neri
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

When supported, hardware monitors the instruction stream to classify the
current task. Hence, at userspace tick, we are ready to read the most
recent classification result for the current task.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 76015dbc45c5..477a90bddcd5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5445,6 +5445,9 @@ void scheduler_tick(bool user_tick)
 	unsigned long thermal_pressure;
 	u64 resched_latency;
 
+	if (sched_task_classes_enabled() && user_tick)
+		arch_update_task_class(curr, is_core_idle(cpu));
+
 	arch_scale_freq_tick();
 	sched_clock_tick();
 
-- 
2.25.1



* [RFC PATCH 07/23] sched/fair: Collect load-balancing stats for task classes
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (5 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 06/23] sched/core: Update the classification of the current task Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-09 23:11 ` [RFC PATCH 08/23] sched/fair: Compute task-class performance scores for load balancing Ricardo Neri
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

When selecting a busiest scheduling group, the class of the current task
can be used to select between two scheduling groups of equal asym_packing
priority and number of running tasks.

Compute a new task-class performance score for a scheduling group. It is
the sum of the performance scores of the current tasks of all its
runqueues.

Also, keep track of the task with the lowest performance score on the
scheduling group.

These two metrics will be used during idle load balancing to compute the
current and the prospective task-class performance of a scheduling
group.
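
For example, a two-sibling group whose current tasks have hypothetical
scores of 4 and 2 ends up with sum_score = 6, min_score = 2, and
p_min_score pointing to the task scoring 2, i.e., the task whose migration
would cost the group the least throughput.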

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/fair.c | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f2a6bb5990d..58a435a04c1c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8686,6 +8686,63 @@ group_type group_classify(unsigned int imbalance_pct,
 	return group_has_spare;
 }
 
+struct sg_lb_task_class_stats {
+	/*
+	 * Score of the task with lowest score among the current tasks (i.e.,
+	 * runqueue::curr) of all runqueues in the scheduling group.
+	 */
+	int min_score;
+	/*
+	 * Sum of the scores of the current tasks of all runqueues in the
+	 * scheduling group.
+	 */
+	long sum_score;
+	/* The task with score equal to @min_score */
+	struct task_struct *p_min_score;
+};
+
+#ifdef CONFIG_SCHED_TASK_CLASSES
+static void init_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs)
+{
+	class_sgs->min_score = INT_MAX;
+	class_sgs->sum_score = 0;
+	class_sgs->p_min_score = NULL;
+}
+
+/* Called only if cpu_of(@rq) is not idle and has tasks running. */
+static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs,
+					 struct rq *rq)
+{
+	int score;
+
+	if (!sched_task_classes_enabled())
+		return;
+
+	/*
+	 * TODO: if nr_running > 1 we may want to go through all the tasks behind
+	 * rq->curr.
+	 */
+	score = arch_get_task_class_score(rq->curr->class, cpu_of(rq));
+
+	class_sgs->sum_score += score;
+
+	if (score >= class_sgs->min_score)
+		return;
+
+	class_sgs->min_score = score;
+	class_sgs->p_min_score = rq->curr;
+}
+#else /* CONFIG_SCHED_TASK_CLASSES */
+static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs,
+					 struct rq *rq)
+{
+}
+
+static void init_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs)
+{
+}
+#endif /* CONFIG_SCHED_TASK_CLASSES */
+
 /**
  * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
  * @dst_cpu:	Destination CPU of the load balancing
@@ -8797,9 +8854,11 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 				      struct sg_lb_stats *sgs,
 				      int *sg_status)
 {
+	struct sg_lb_task_class_stats class_stats;
 	int i, nr_running, local_group;
 
 	memset(sgs, 0, sizeof(*sgs));
+	init_rq_task_classes_stats(&class_stats);
 
 	local_group = group == sds->local;
 
@@ -8849,6 +8908,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			if (sgs->group_misfit_task_load < load)
 				sgs->group_misfit_task_load = load;
 		}
+
+		update_rq_task_classes_stats(&class_stats, rq);
 	}
 
 	sgs->group_capacity = group->sgc->capacity;
-- 
2.25.1



* [RFC PATCH 08/23] sched/fair: Compute task-class performance scores for load balancing
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (6 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 07/23] sched/fair: Collect load-balancing stats for task classes Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-27  9:15   ` Peter Zijlstra
  2022-09-09 23:11 ` [RFC PATCH 09/23] sched/fair: Use task-class performance score to pick the busiest group Ricardo Neri
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Compute both the current and the prospective task-class performance of a
scheduling group. Task-class statistics are used during asym_packing load
balancing, when an idle destination CPU pulls tasks from the scheduling
group.

For a scheduling group with only one CPU, the prospective performance is
the performance of its current task if placed on the destination CPU.

In a scheduling group composed of SMT siblings, the current tasks of all
CPUs share the resources of the core. Divide the task-class performance of
the scheduling group by the number of busy CPUs.

After load balancing, the throughput of the siblings that remain busy
increases. Plus, the destination CPU now contributes to the overall
throughput.
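
As a worked example with hypothetical scores, a fully busy two-sibling
group whose current tasks score 4 and 2 has a current group score of
(4 + 2) / 2 = 3. If the task scoring 2 is pulled to an idle destination
CPU on which its class scores 1, the prospective score becomes
(6 - 2) / (2 - 1) + 1 = 5.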

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/fair.c | 53 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 58a435a04c1c..97731f81b570 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8405,6 +8405,8 @@ struct sg_lb_stats {
 	enum group_type group_type;
 	unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */
 	unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */
+	long task_class_score_after; /* Prospective task-class score after load balancing */
+	long task_class_score_before; /* Task-class score before load balancing */
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
@@ -8732,6 +8734,49 @@ static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sg
 	class_sgs->min_score = score;
 	class_sgs->p_min_score = rq->curr;
 }
+
+static void compute_ilb_sg_task_class_scores(struct sg_lb_task_class_stats *class_sgs,
+					     struct sg_lb_stats *sgs,
+					     int dst_cpu)
+{
+	int group_score, group_score_without, score_on_dst_cpu;
+	int busy_cpus = sgs->group_weight - sgs->idle_cpus;
+
+	if (!sched_task_classes_enabled())
+		return;
+
+	/* No busy CPUs in the group. No tasks to move. */
+	if (!busy_cpus)
+		return;
+
+	score_on_dst_cpu = arch_get_task_class_score(class_sgs->p_min_score->class,
+						     dst_cpu);
+
+	/*
+	 * The simplest case. The single busy CPU in the current group will
+	 * become idle after pulling its current task. The destination CPU is
+	 * idle.
+	 */
+	if (busy_cpus == 1) {
+		sgs->task_class_score_before = class_sgs->sum_score;
+		sgs->task_class_score_after = score_on_dst_cpu;
+		return;
+	}
+
+	/*
+	 * Now compute the group score with and without the task with the
+	 * lowest score. We assume that the tasks that remain in the group share
+	 * the CPU resources equally.
+	 */
+	group_score = class_sgs->sum_score / busy_cpus;
+
+	group_score_without =  (class_sgs->sum_score - class_sgs->min_score) /
+			       (busy_cpus - 1);
+
+	sgs->task_class_score_after = group_score_without + score_on_dst_cpu;
+	sgs->task_class_score_before = group_score;
+}
+
 #else /* CONFIG_SCHED_TASK_CLASSES */
 static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs,
 					 struct rq *rq)
@@ -8741,6 +8786,13 @@ static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sg
 static void init_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs)
 {
 }
+
+static void compute_ilb_sg_task_class_scores(struct sg_lb_task_class_stats *class_sgs,
+					     struct sg_lb_stats *sgs,
+					     int dst_cpu)
+{
+}
+
 #endif /* CONFIG_SCHED_TASK_CLASSES */
 
 /**
@@ -8920,6 +8972,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
 	    env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
 	    sched_asym(env, sds, sgs, group)) {
+		compute_ilb_sg_task_class_scores(&class_stats, sgs, env->dst_cpu);
 		sgs->group_asym_packing = 1;
 	}
 
-- 
2.25.1



* [RFC PATCH 09/23] sched/fair: Use task-class performance score to pick the busiest group
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (7 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 08/23] sched/fair: Compute task-class performance scores for load balancing Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-27 11:01   ` Peter Zijlstra
  2022-09-09 23:11 ` [RFC PATCH 10/23] sched/fair: Use classes of tasks when selecting a busiest runqueue Ricardo Neri
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

When update_sd_pick_busiest() encounters two scheduling groups of
identical asym_packing priority, either group is an equally good choice.
The classes of the tasks in the scheduling groups can break this tie.

Pick as busiest the scheduling group that yields a higher task-class
performance score after load balancing.
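
When two candidate groups yield the same prospective score, prefer the one
whose current score is lower: it is the group that gains the most
throughput from the migration.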

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/fair.c | 68 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 97731f81b570..7368a0b453ee 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8777,6 +8777,60 @@ static void compute_ilb_sg_task_class_scores(struct sg_lb_task_class_stats *clas
 	sgs->task_class_score_before = group_score;
 }
 
+/**
+ * sched_asym_class_prefer - Select a sched group based on its classes of tasks
+ * @a:	Load balancing statistics of @sg_a
+ * @b:	Load balancing statistics of @sg_b
+ *
+ * Returns: true if preferring @a yields a higher overall throughput after
+ * balancing load. Returns false otherwise.
+ */
+static bool sched_asym_class_prefer(struct sg_lb_stats *a,
+				    struct sg_lb_stats *b)
+{
+	if (!sched_task_classes_enabled())
+		return false;
+
+	/* @a increases overall throughput after load balance. */
+	if (a->task_class_score_after > b->task_class_score_after)
+		return true;
+
+	/*
+	 * If @a and @b yield the same overall throughput, pick @a if
+	 * its current throughput is lower than that of @b.
+	 */
+	if (a->task_class_score_after == b->task_class_score_after)
+		return a->task_class_score_before < b->task_class_score_before;
+
+	return false;
+}
+
+/**
+ * sched_asym_class_pick - Select a sched group based on classes of tasks
+ * @a:		A scheduling group
+ * @b:		A second scheduling group
+ * @a_stats:	Load balancing statistics of @a
+ * @b_stats:	Load balancing statistics of @b
+ *
+ * Returns: true if @a has the same priority and @a has classes of tasks that
+ * yield higher overall throughput after load balance. Returns false otherwise.
+ */
+static bool sched_asym_class_pick(struct sched_group *a,
+				  struct sched_group *b,
+				  struct sg_lb_stats *a_stats,
+				  struct sg_lb_stats *b_stats)
+{
+	/*
+	 * Only use the class-specific preference selection if both sched
+	 * groups have the same priority.
+	 */
+	if (arch_asym_cpu_priority(a->asym_prefer_cpu) !=
+	    arch_asym_cpu_priority(b->asym_prefer_cpu))
+		return false;
+
+	return sched_asym_class_prefer(a_stats, b_stats);
+}
+
 #else /* CONFIG_SCHED_TASK_CLASSES */
 static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs,
 					 struct rq *rq)
@@ -8793,6 +8847,14 @@ static void compute_ilb_sg_task_class_scores(struct sg_lb_task_class_stats *clas
 {
 }
 
+static bool sched_asym_class_pick(struct sched_group *a,
+				  struct sched_group *b,
+				  struct sg_lb_stats *a_stats,
+				  struct sg_lb_stats *b_stats)
+{
+	return false;
+}
+
 #endif /* CONFIG_SCHED_TASK_CLASSES */
 
 /**
@@ -9049,6 +9111,12 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 		/* Prefer to move from lowest priority CPU's work */
 		if (sched_asym_prefer(sg->asym_prefer_cpu, sds->busiest->asym_prefer_cpu))
 			return false;
+
+		/* @sg and @sds::busiest have the same priority. */
+		if (sched_asym_class_pick(sds->busiest, sg, &sds->busiest_stat, sgs))
+			return false;
+
+		/* @sg has lower priority than @sds::busiest. */
 		break;
 
 	case group_misfit_task:
-- 
2.25.1



* [RFC PATCH 10/23] sched/fair: Use classes of tasks when selecting a busiest runqueue
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (8 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 09/23] sched/fair: Use task-class performance score to pick the busiest group Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-27 11:25   ` Peter Zijlstra
  2022-09-09 23:11 ` [RFC PATCH 11/23] thermal: intel: hfi: Introduce Hardware Feedback Interface classes Ricardo Neri
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

For two runqueues of equal priority and equal number of running tasks,
select the one whose current task would have the highest task class score
if placed on the destination CPU.
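
For example (hypothetical scores), a runqueue whose current task scores 4
on the destination CPU but only 2 on its current CPU (a delta of 2) is
preferred over a runqueue whose task scores 3 versus 2 (a delta of 1):
migrating the former gains more throughput.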

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/fair.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7368a0b453ee..085b1f75d90b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10009,6 +10009,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 	int i;
 
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
+		int busiest_class_delta_score = INT_MIN;
 		unsigned long capacity, load, util;
 		unsigned int nr_running;
 		enum fbq_type rt;
@@ -10118,6 +10119,20 @@ static struct rq *find_busiest_queue(struct lb_env *env,
 			if (busiest_nr < nr_running) {
 				busiest_nr = nr_running;
 				busiest = rq;
+			} else if (sched_task_classes_enabled() &&
+				   busiest_nr == nr_running) {
+				int curr_class_delta_score;
+
+				curr_class_delta_score = arch_get_task_class_score(rq->curr->class,
+										   env->dst_cpu) -
+							 arch_get_task_class_score(rq->curr->class,
+										   cpu_of(rq));
+
+				if (busiest_class_delta_score < curr_class_delta_score) {
+					busiest_class_delta_score = curr_class_delta_score;
+					busiest_nr = nr_running;
+					busiest = rq;
+				}
 			}
 			break;
 
-- 
2.25.1



* [RFC PATCH 11/23] thermal: intel: hfi: Introduce Hardware Feedback Interface classes
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (9 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 10/23] sched/fair: Use classes of tasks when selecting a busiest runqueue Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-09 23:11 ` [RFC PATCH 12/23] thermal: intel: hfi: Convert table_lock to use flags-handling variants Ricardo Neri
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

On Intel hybrid parts, each type of CPU has specific performance and
energy efficiency capabilities. The Intel Thread Director technology
extends the Hardware Feedback Interface (HFI) to provide performance and
energy efficiency data for advanced classes of instructions.

Add support to parse per-class capabilities.
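
As a sketch of the resulting table layout (the strides below are computed
in this patch; locating classes other than 0 is left to later patches, and
the helper name is hypothetical):

	/*
	 * Hypothetical helper: locate the capability data of @class for
	 * the CPU at row @index of an HFI table.
	 */
	static struct hfi_cpu_data *hfi_class_caps(struct hfi_instance *hfi,
						   s16 index, int class)
	{
		return hfi->data + index * hfi_features.cpu_stride +
		       class * hfi_features.class_stride;
	}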

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 drivers/thermal/intel/intel_hfi.c | 34 +++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 239afe02e518..405495dad0b2 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -79,7 +79,7 @@ union cpuid6_edx {
  * @ee_cap:		Energy efficiency capability
  *
  * Capabilities of a logical processor in the HFI table. These capabilities are
- * unitless.
+ * unitless and specific to each HFI class.
  */
 struct hfi_cpu_data {
 	u8	perf_cap;
@@ -91,7 +91,8 @@ struct hfi_cpu_data {
  * @perf_updated:	Hardware updated performance capabilities
  * @ee_updated:		Hardware updated energy efficiency capabilities
  *
- * Properties of the data in an HFI table.
+ * Properties of the data in an HFI table. There exists one header for each
+ * HFI class.
  */
 struct hfi_hdr {
 	u8	perf_updated;
@@ -129,16 +130,21 @@ struct hfi_instance {
 
 /**
  * struct hfi_features - Supported HFI features
+ * @nr_classes:		Number of classes supported
  * @nr_table_pages:	Size of the HFI table in 4KB pages
  * @cpu_stride:		Stride size to locate the capability data of a logical
  *			processor within the table (i.e., row stride)
+ * @class_stride:	Stride size to locate a class within the capability
+ *			data of a logical processor or the HFI table header
  * @hdr_size:		Size of the table header
  *
  * Parameters and supported features that are common to all HFI instances
  */
 struct hfi_features {
+	int		nr_classes;
 	size_t		nr_table_pages;
 	unsigned int	cpu_stride;
+	unsigned int	class_stride;
 	unsigned int	hdr_size;
 };
 
@@ -177,6 +183,10 @@ static void get_hfi_caps(struct hfi_instance *hfi_instance,
 		s16 index;
 
 		index = per_cpu(hfi_cpu_info, cpu).index;
+		/*
+		 * Only report the legacy (i.e., class 0) capabilities. No need
+		 * of hfi_features.class_stride.
+		 */
 		caps = hfi_instance->data + index * hfi_features.cpu_stride;
 		cpu_caps[i].cpu = cpu;
 
@@ -325,8 +335,8 @@ static void init_hfi_cpu_index(struct hfi_cpu_info *info)
 }
 
 /*
- * The format of the HFI table depends on the number of capabilities that the
- * hardware supports. Keep a data structure to navigate the table.
+ * The format of the HFI table depends on the number of capabilities and classes
+ * that the hardware supports. Keep a data structure to navigate the table.
  */
 static void init_hfi_instance(struct hfi_instance *hfi_instance)
 {
@@ -507,18 +517,30 @@ static __init int hfi_parse_features(void)
 	/* The number of 4KB pages required by the table */
 	hfi_features.nr_table_pages = edx.split.table_pages + 1;
 
+	/*
+	 * Capability fields of an HFI class are grouped together. Classes are
+	 * contiguous in memory.  Hence, use the number of supported features to
+	 * locate a specific class.
+	 */
+	hfi_features.class_stride = nr_capabilities;
+
+	/* For now, use only one class of the HFI table */
+	hfi_features.nr_classes = 1;
+
 	/*
 	 * The header contains change indications for each supported feature.
 	 * The size of the table header is rounded up to be a multiple of 8
 	 * bytes.
 	 */
-	hfi_features.hdr_size = DIV_ROUND_UP(nr_capabilities, 8) * 8;
+	hfi_features.hdr_size = DIV_ROUND_UP(nr_capabilities *
+					     hfi_features.nr_classes, 8) * 8;
 
 	/*
 	 * Data of each logical processor is also rounded up to be a multiple
 	 * of 8 bytes.
 	 */
-	hfi_features.cpu_stride = DIV_ROUND_UP(nr_capabilities, 8) * 8;
+	hfi_features.cpu_stride = DIV_ROUND_UP(nr_capabilities *
+					       hfi_features.nr_classes, 8) * 8;
 
 	return 0;
 }
-- 
2.25.1



* [RFC PATCH 12/23] thermal: intel: hfi: Convert table_lock to use flags-handling variants
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (10 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 11/23] thermal: intel: hfi: Introduce Hardware Feedback Interface classes Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-27 11:34   ` Peter Zijlstra
  2022-09-09 23:11 ` [RFC PATCH 13/23] x86/cpufeatures: Add the Intel Thread Director feature definitions Ricardo Neri
                   ` (12 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Currently, the table of an HFI instance is only accessed from the HFI
thermal interrupt handler and from the delayed work that sends the thermal
netlink event to user space.

When using Intel Thread Director to support classes of tasks in the
scheduler, the HFI table will also be accessed from the timer interrupt
handler.

As two interrupt handlers will concurrently access the table, update locks
to use raw_spin_[un]lock_irq[save|restore].
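
(The plain _irq variants unconditionally re-enable interrupts when
unlocking. If the lock is taken from the timer interrupt handler, where
interrupts are already disabled, re-enabling them on unlock would be
incorrect; the irqsave/irqrestore variants restore the caller's original
interrupt state instead.)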

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 drivers/thermal/intel/intel_hfi.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 405495dad0b2..4bafe6848d5d 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -175,9 +175,10 @@ static struct workqueue_struct *hfi_updates_wq;
 static void get_hfi_caps(struct hfi_instance *hfi_instance,
 			 struct thermal_genl_cpu_caps *cpu_caps)
 {
+	unsigned long flags;
 	int cpu, i = 0;
 
-	raw_spin_lock_irq(&hfi_instance->table_lock);
+	raw_spin_lock_irqsave(&hfi_instance->table_lock, flags);
 	for_each_cpu(cpu, hfi_instance->cpus) {
 		struct hfi_cpu_data *caps;
 		s16 index;
@@ -199,7 +200,7 @@ static void get_hfi_caps(struct hfi_instance *hfi_instance,
 
 		++i;
 	}
-	raw_spin_unlock_irq(&hfi_instance->table_lock);
+	raw_spin_unlock_irqrestore(&hfi_instance->table_lock, flags);
 }
 
 /*
@@ -262,6 +263,7 @@ void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
 	struct hfi_instance *hfi_instance;
 	int cpu = smp_processor_id();
 	struct hfi_cpu_info *info;
+	unsigned long flags;
 	u64 new_timestamp;
 
 	if (!pkg_therm_status_msr_val)
@@ -298,7 +300,7 @@ void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
 		return;
 	}
 
-	raw_spin_lock(&hfi_instance->table_lock);
+	raw_spin_lock_irqsave(&hfi_instance->table_lock, flags);
 
 	/*
 	 * Copy the updated table into our local copy. This includes the new
@@ -307,7 +309,7 @@ void intel_hfi_process_event(__u64 pkg_therm_status_msr_val)
 	memcpy(hfi_instance->local_table, hfi_instance->hw_table,
 	       hfi_features.nr_table_pages << PAGE_SHIFT);
 
-	raw_spin_unlock(&hfi_instance->table_lock);
+	raw_spin_unlock_irqrestore(&hfi_instance->table_lock, flags);
 	raw_spin_unlock(&hfi_instance->event_lock);
 
 	/*
-- 
2.25.1



* [RFC PATCH 13/23] x86/cpufeatures: Add the Intel Thread Director feature definitions
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (11 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 12/23] thermal: intel: hfi: Convert table_lock to use flags-handling variants Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-09 23:11 ` [RFC PATCH 14/23] thermal: intel: hfi: Update the class of the current task Ricardo Neri
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Intel Thread Director (ITD) provides hardware resources to categorize
the currently running task with a classification value. The classification
reflects the type of instructions that a task executes.

ITD extends the Hardware Feedback Interface table to provide performance
and energy efficiency capabilities for each of the supported classes of
tasks.
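
For reference, the effect of also wiring X86_FEATURE_ITD into the
disabled-features mask can be sketched like this (an illustrative fragment;
enable_itd() is a hypothetical helper, not part of the patch):

	/*
	 * With CONFIG_INTEL_THREAD_DIRECTOR=n, DISABLE_ITD is set in
	 * DISABLED_MASK13, so this test folds to constant false at
	 * compile time and the branch is discarded entirely.
	 */
	if (cpu_feature_enabled(X86_FEATURE_ITD))
		enable_itd();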

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/cpu/cpuid-deps.c         | 1 +
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ef4775c6db01..d3202d665ac0 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -339,6 +339,7 @@
 #define X86_FEATURE_HWP_EPP		(14*32+10) /* HWP Energy Perf. Preference */
 #define X86_FEATURE_HWP_PKG_REQ		(14*32+11) /* HWP Package Level Request */
 #define X86_FEATURE_HFI			(14*32+19) /* Hardware Feedback Interface */
+#define X86_FEATURE_ITD			(14*32+23) /* Intel Thread Director */
 
 /* AMD SVM Feature Identification, CPUID level 0x8000000a (EDX), word 15 */
 #define X86_FEATURE_NPT			(15*32+ 0) /* Nested Page Table support */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 33d2cd04d254..225657aff476 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -87,6 +87,12 @@
 # define DISABLE_TDX_GUEST	(1 << (X86_FEATURE_TDX_GUEST & 31))
 #endif
 
+#ifdef CONFIG_INTEL_THREAD_DIRECTOR
+# define DISABLE_ITD	0
+#else
+# define DISABLE_ITD	(1 << (X86_FEATURE_ITD & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -103,7 +109,7 @@
 #define DISABLED_MASK10	0
 #define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET)
 #define DISABLED_MASK12	0
-#define DISABLED_MASK13	0
+#define DISABLED_MASK13	(DISABLE_ITD)
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index c881bcafba7d..f6f8a3cd4f2c 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -78,6 +78,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
+	{ X86_FEATURE_ITD,			X86_FEATURE_HFI       },
 	{}
 };
 
-- 
2.25.1



* [RFC PATCH 14/23] thermal: intel: hfi: Update the class of the current task
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (12 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 13/23] x86/cpufeatures: Add the Intel Thread Director feature definitions Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-27 11:46   ` Peter Zijlstra
  2022-09-09 23:11 ` [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores Ricardo Neri
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Support the interfaces of the scheduler for classes of tasks. If an Intel
processor supports Intel Thread Director, then it supports classes of
tasks.

Use the hardware resources that Intel Thread Director provides to assign
a class to the currently running task.
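
For context, the scheduler-side call site proposed earlier in the series
(patch 6) invokes this hook from the tick, with this rough shape:

	/* In scheduler_tick(), as posted in patch 6 of this series: */
	if (sched_task_classes_enabled() && user_tick)
		arch_update_task_class(curr, is_core_idle(cpu));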

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/topology.h   |  8 +++++++
 drivers/thermal/intel/intel_hfi.c | 35 +++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 458c891a8273..9c6df4fd9414 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -227,4 +227,12 @@ void init_freq_invariance_cppc(void);
 #define arch_init_invariance_cppc init_freq_invariance_cppc
 #endif
 
+#ifdef CONFIG_INTEL_THREAD_DIRECTOR
+int intel_hfi_has_task_classes(void);
+void intel_hfi_update_task_class(struct task_struct *curr, bool smt_siblings_idle);
+
+#define arch_has_task_classes intel_hfi_has_task_classes
+#define arch_update_task_class intel_hfi_update_task_class
+#endif
+
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 4bafe6848d5d..f46d9331f912 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -73,6 +73,17 @@ union cpuid6_edx {
 	u32 full;
 };
 
+#ifdef CONFIG_INTEL_THREAD_DIRECTOR
+union hfi_thread_feedback_char_msr {
+	struct {
+		u8	classid;
+		u64	__reserved:55;
+		u8	valid:1;
+	} split;
+	u64 full;
+};
+#endif
+
 /**
  * struct hfi_cpu_data - HFI capabilities per CPU
  * @perf_cap:		Performance capability
@@ -172,6 +183,30 @@ static struct workqueue_struct *hfi_updates_wq;
 #define HFI_UPDATE_INTERVAL		HZ
 #define HFI_MAX_THERM_NOTIFY_COUNT	16
 
+#ifdef CONFIG_INTEL_THREAD_DIRECTOR
+int intel_hfi_has_task_classes(void)
+{
+	return cpu_feature_enabled(X86_FEATURE_ITD);
+}
+
+void intel_hfi_update_task_class(struct task_struct *curr, bool smt_siblings_idle)
+{
+	union hfi_thread_feedback_char_msr msr;
+
+	/* We should not be here if ITD is not supported. */
+	if (!cpu_feature_enabled(X86_FEATURE_ITD)) {
+		pr_warn_once("task classification requested but not supported!");
+		return;
+	}
+
+	rdmsrl(MSR_IA32_HW_FEEDBACK_CHAR, msr.full);
+	if (!msr.split.valid)
+		return;
+
+	curr->class = msr.split.classid;
+}
+#endif /* CONFIG_INTEL_THREAD_DIRECTOR */
+
 static void get_hfi_caps(struct hfi_instance *hfi_instance,
 			 struct thermal_genl_cpu_caps *cpu_caps)
 {
-- 
2.25.1



* [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (13 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 14/23] thermal: intel: hfi: Update the class of the current task Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-27 11:59   ` Peter Zijlstra
  2022-09-09 23:11 ` [RFC PATCH 16/23] thermal: intel: hfi: Define a default classification for unclassified tasks Ricardo Neri
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Support the arch_get_task_class_score() interface of the scheduler. Use the
data that Intel Thread Director provides to inform the scheduler of the
performance of a class of tasks when placed on a given CPU.
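
A minimal sketch of how a caller might use this interface to compare two
CPUs; pick_better_cpu() is a hypothetical helper, not part of the series:

	/*
	 * Pick the better of two CPUs for @p based on its class.
	 * Negative scores are errno-style error codes from the interface.
	 */
	static int pick_better_cpu(struct task_struct *p, int cpu_a, int cpu_b)
	{
		int score_a = arch_get_task_class_score(p->class, cpu_a);
		int score_b = arch_get_task_class_score(p->class, cpu_b);

		if (score_a < 0 || score_b < 0)
			return cpu_a;	/* no data; arbitrary fallback */

		return score_a >= score_b ? cpu_a : cpu_b;
	}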

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/topology.h   |  2 ++
 drivers/thermal/intel/intel_hfi.c | 40 +++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 9c6df4fd9414..2ed234104ef4 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -230,9 +230,11 @@ void init_freq_invariance_cppc(void);
 #ifdef CONFIG_INTEL_THREAD_DIRECTOR
 int intel_hfi_has_task_classes(void);
 void intel_hfi_update_task_class(struct task_struct *curr, bool smt_siblings_idle);
+int intel_hfi_get_task_class_score(int class, int cpu);
 
 #define arch_has_task_classes intel_hfi_has_task_classes
 #define arch_update_task_class intel_hfi_update_task_class
+#define arch_get_task_class_score intel_hfi_get_task_class_score
 #endif
 
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index f46d9331f912..1b6072c828ff 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -205,6 +205,46 @@ void intel_hfi_update_task_class(struct task_struct *curr, bool smt_siblings_idl
 
 	curr->class = msr.split.classid;
 }
+
+static void get_one_hfi_cap(struct hfi_instance *hfi_instance, s16 index,
+			    struct hfi_cpu_data *hfi_caps, int class)
+{
+	struct hfi_cpu_data *caps;
+
+	/* Find the capabilities of the CPU at position @index in the table. */
+	caps = hfi_instance->data + index * hfi_features.cpu_stride +
+	       class * hfi_features.class_stride;
+	memcpy(hfi_caps, caps, sizeof(*hfi_caps));
+}
+
+int intel_hfi_get_task_class_score(int class, int cpu)
+{
+	struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, cpu);
+	struct hfi_instance *instance;
+	struct hfi_cpu_data caps;
+	unsigned long flags;
+	int cap;
+
+	if (cpu < 0 || cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (class == TASK_CLASS_UNCLASSIFIED)
+		return -EINVAL;
+
+	if (class >= (int)hfi_features.nr_classes)
+		return -EINVAL;
+
+	instance = info->hfi_instance;
+	if (!instance)
+		return -ENOENT;
+
+	raw_spin_lock_irqsave(&instance->table_lock, flags);
+	get_one_hfi_cap(instance, info->index, &caps, class);
+	cap = caps.perf_cap;
+	raw_spin_unlock_irqrestore(&instance->table_lock, flags);
+
+	return cap;
+}
 #endif /* CONFIG_INTEL_THREAD_DIRECTOR */
 
 static void get_hfi_caps(struct hfi_instance *hfi_instance,
-- 
2.25.1



* [RFC PATCH 16/23] thermal: intel: hfi: Define a default classification for unclassified tasks
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (14 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-09 23:11 ` [RFC PATCH 17/23] thermal: intel: hfi: Enable the Intel Thread Director Ricardo Neri
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Tasks may be unclassified if they have been recently created, spend most
of their lifetime sleeping, or hardware has not provided a classification.

Most tasks will be eventually classified as 0. This class corresponds to
the capabilities in the legacy, classless, Hardware Feedback Interface
table.

Class 0 is a reasonable choice until hardware provides an actual
classification. Meanwhile, the scheduler will place tasks of higher-
performance classes on higher-performance CPUs.
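
The choice also falls out of the table layout: class-specific capabilities
live at class * hfi_features.class_stride from each CPU's entry, so a class
of 0 adds no offset and the lookup lands exactly on the legacy per-CPU
entry.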

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 drivers/thermal/intel/intel_hfi.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 1b6072c828ff..967899d2c01f 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -184,6 +184,20 @@ static struct workqueue_struct *hfi_updates_wq;
 #define HFI_MAX_THERM_NOTIFY_COUNT	16
 
 #ifdef CONFIG_INTEL_THREAD_DIRECTOR
+
+/*
+ * Tasks may be unclassified if they have been recently created, spend most
+ * of their lifetime sleeping, or hardware has not provided a classification.
+ *
+ * Most tasks will be classified as 0 eventually. Meanwhile, the scheduler
+ * will place tasks of higher-performance classes on higher-performance
+ * CPUs.
+ *
+ * Class 0 is a reasonable choice. It matches the performance capability
+ * of the legacy, classless, Hardware Feedback Interface table.
+ */
+#define UNCLASSIFIED_TASK_DEFAULT 0
+
 int intel_hfi_has_task_classes(void)
 {
 	return cpu_feature_enabled(X86_FEATURE_ITD);
@@ -222,14 +236,14 @@ int intel_hfi_get_task_class_score(int class, int cpu)
 	struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, cpu);
 	struct hfi_instance *instance;
 	struct hfi_cpu_data caps;
+	int cap, _class = class;
 	unsigned long flags;
-	int cap;
 
 	if (cpu < 0 || cpu >= nr_cpu_ids)
 		return -EINVAL;
 
 	if (class == TASK_CLASS_UNCLASSIFIED)
-		return -EINVAL;
+		_class = UNCLASSIFIED_TASK_DEFAULT;
 
 	if (class >= (int)hfi_features.nr_classes)
 		return -EINVAL;
@@ -239,7 +253,7 @@ int intel_hfi_get_task_class_score(int class, int cpu)
 		return -ENOENT;
 
 	raw_spin_lock_irqsave(&instance->table_lock, flags);
-	get_one_hfi_cap(instance, info->index, &caps, class);
+	get_one_hfi_cap(instance, info->index, &caps, _class);
 	cap = caps.perf_cap;
 	raw_spin_unlock_irqrestore(&instance->table_lock, flags);
 
-- 
2.25.1



* [RFC PATCH 17/23] thermal: intel: hfi: Enable the Intel Thread Director
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (15 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 16/23] thermal: intel: hfi: Define a default classification for unclassified tasks Ricardo Neri
@ 2022-09-09 23:11 ` Ricardo Neri
  2022-09-27 12:00   ` Peter Zijlstra
  2022-09-09 23:12 ` [RFC PATCH 18/23] sched/task_struct: Add helpers for task classification Ricardo Neri
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:11 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Enable Intel Thread Director from the CPU hotplug callback: enable it
globally from CPU0, then enable the thread-classification hardware on each
logical processor individually.

Also, initialize the number of classes supported.
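
As a hypothetical example of the enumeration added below: a processor
reporting CPUID(6).ECX = 0x00000400 would yield ecx.split.nr_classes = 4,
i.e. a four-class HFI table, since the class count sits in bits 15:8.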

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/msr-index.h  |  2 ++
 drivers/thermal/intel/Kconfig     | 12 ++++++++++++
 drivers/thermal/intel/intel_hfi.c | 30 ++++++++++++++++++++++++++++--
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6674bdb096f3..810b950dc2e0 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1053,5 +1053,7 @@
 /* Hardware Feedback Interface */
 #define MSR_IA32_HW_FEEDBACK_PTR        0x17d0
 #define MSR_IA32_HW_FEEDBACK_CONFIG     0x17d1
+#define MSR_IA32_HW_FEEDBACK_CHAR	0x17d2
+#define MSR_IA32_HW_FEEDBACK_THREAD_CONFIG 0x17d4
 
 #endif /* _ASM_X86_MSR_INDEX_H */
diff --git a/drivers/thermal/intel/Kconfig b/drivers/thermal/intel/Kconfig
index f0c845679250..8848bf45ffbc 100644
--- a/drivers/thermal/intel/Kconfig
+++ b/drivers/thermal/intel/Kconfig
@@ -113,3 +113,15 @@ config INTEL_HFI_THERMAL
 	  These capabilities may change as a result of changes in the operating
 	  conditions of the system such power and thermal limits. If selected,
 	  the kernel relays updates in CPUs' capabilities to userspace.
+
+config INTEL_THREAD_DIRECTOR
+	bool "Intel Thread Director"
+	depends on INTEL_HFI_THERMAL
+	depends on SMP
+	select SCHED_TASK_CLASSES
+	help
+	  Select this option to enable the Intel Thread Director. If selected,
+	  hardware classifies tasks based on the type of instructions they
+	  execute. It also provides performance capabilities for each class of
+	  task. On hybrid processors, the scheduler uses this data to place
+	  tasks of higher-performance classes on higher-performance CPUs.
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 967899d2c01f..9ddd3047eb39 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -49,6 +49,8 @@
 /* Hardware Feedback Interface MSR configuration bits */
 #define HW_FEEDBACK_PTR_VALID_BIT		BIT(0)
 #define HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT	BIT(0)
+#define HW_FEEDBACK_CONFIG_ITD_ENABLE_BIT	BIT(1)
+#define HW_FEEDBACK_THREAD_CONFIG_ENABLE_BIT	BIT(0)
 
 /* CPUID detection and enumeration definitions for HFI */
 
@@ -73,6 +75,15 @@ union cpuid6_edx {
 	u32 full;
 };
 
+union cpuid6_ecx {
+	struct {
+		u32	dont_care0:8;
+		u32	nr_classes:8;
+		u32	dont_care1:16;
+	} split;
+	u32 full;
+};
+
 #ifdef CONFIG_INTEL_THREAD_DIRECTOR
 union hfi_thread_feedback_char_msr {
 	struct {
@@ -481,6 +492,11 @@ void intel_hfi_online(unsigned int cpu)
 
 	init_hfi_cpu_index(info);
 
+	if (cpu_feature_enabled(X86_FEATURE_ITD)) {
+		msr_val = HW_FEEDBACK_THREAD_CONFIG_ENABLE_BIT;
+		wrmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val);
+	}
+
 	/*
 	 * Now check if the HFI instance of the package/die of @cpu has been
 	 * initialized (by checking its header). In such case, all we have to
@@ -536,6 +552,10 @@ void intel_hfi_online(unsigned int cpu)
 	 */
 	rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
 	msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT;
+
+	if (cpu_feature_enabled(X86_FEATURE_ITD))
+		msr_val |= HW_FEEDBACK_CONFIG_ITD_ENABLE_BIT;
+
 	wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val);
 
 unlock:
@@ -615,8 +635,14 @@ static __init int hfi_parse_features(void)
 	 */
 	hfi_features.class_stride = nr_capabilities;
 
-	/* For now, use only one class of the HFI table */
-	hfi_features.nr_classes = 1;
+	if (cpu_feature_enabled(X86_FEATURE_ITD)) {
+		union cpuid6_ecx ecx;
+
+		ecx.full = cpuid_ecx(CPUID_HFI_LEAF);
+		hfi_features.nr_classes = ecx.split.nr_classes;
+	} else {
+		hfi_features.nr_classes = 1;
+	}
 
 	/*
 	 * The header contains change indications for each supported feature.
-- 
2.25.1



* [RFC PATCH 18/23] sched/task_struct: Add helpers for task classification
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (16 preceding siblings ...)
  2022-09-09 23:11 ` [RFC PATCH 17/23] thermal: intel: hfi: Enable the Intel Thread Director Ricardo Neri
@ 2022-09-09 23:12 ` Ricardo Neri
  2022-09-27 11:52   ` Peter Zijlstra
  2022-09-09 23:12 ` [RFC PATCH 19/23] sched/core: Initialize helpers of " Ricardo Neri
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:12 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

The unprocessed classification that hardware provides for a task may not
be usable by the scheduler. The classification may change too frequently.
An arch-specific implementation may want to consider other factors. For
instance, Intel Thread Director considers the state of the SMT siblings
of the core.

Provide per-task helper variables that architectures can use to post-
process the classification that hardware provides.
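
Roughly how an arch implementation is meant to use these fields, condensed
from the debouncing that patch 20 of this series adds (a sketch, not a
verbatim copy):

	/* hw_class is the classification just read from hardware. */
	if (p->class_candidate != hw_class) {
		/* The candidate changed: restart the debounce count. */
		p->class_debounce_counter = 1;
	} else if (p->class_debounce_counter + 1 < CLASS_DEBOUNCER_SKIPS) {
		p->class_debounce_counter++;
	} else {
		/* Stable long enough: commit the new classification. */
		p->class = hw_class;
	}
	p->class_candidate = hw_class;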

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
Plain char is unsigned on the riscv architecture.
Use a signed char to avoid compilation warnings.
---
 include/linux/sched.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 73f9e745d17d..6c8907622f00 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1517,6 +1517,16 @@ struct task_struct {
 #ifdef CONFIG_SCHED_TASK_CLASSES
 	/* Class of task that the scheduler uses for task placement decisions */
 	short				class;
+	/*
+	 * A candidate classification that arch-specific implementations
+	 * qualify for correctness.
+	 */
+	short				class_candidate;
+	/*
+	 * Counter to filter out transient values of the candidate
+	 * classification of a task.
+	 */
+	char				class_debounce_counter;
 #endif
 
 	/*
-- 
2.25.1



* [RFC PATCH 19/23] sched/core: Initialize helpers of task classification
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (17 preceding siblings ...)
  2022-09-09 23:12 ` [RFC PATCH 18/23] sched/task_struct: Add helpers for task classification Ricardo Neri
@ 2022-09-09 23:12 ` Ricardo Neri
  2022-09-09 23:12 ` [RFC PATCH 20/23] thermal: intel: hfi: Implement model-specific checks for " Ricardo Neri
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:12 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Since tasks start life unclassified, initialize the auxiliary
classification variables accordingly.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 477a90bddcd5..5f84aae735d9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4338,6 +4338,8 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.vruntime			= 0;
 #ifdef CONFIG_SCHED_TASK_CLASSES
 	p->class			= TASK_CLASS_UNCLASSIFIED;
+	p->class_candidate		= TASK_CLASS_UNCLASSIFIED;
+	p->class_debounce_counter	= 0;
 #endif
 	INIT_LIST_HEAD(&p->se.group_node);
 
-- 
2.25.1



* [RFC PATCH 20/23] thermal: intel: hfi: Implement model-specific checks for task classification
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (18 preceding siblings ...)
  2022-09-09 23:12 ` [RFC PATCH 19/23] sched/core: Initialize helpers of " Ricardo Neri
@ 2022-09-09 23:12 ` Ricardo Neri
  2022-09-09 23:12 ` [RFC PATCH 21/23] x86/cpufeatures: Add feature bit for HRESET Ricardo Neri
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:12 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

In Alderlake and Raptorlake, the result of thread classification is more
accurate when only one SMT sibling is busy. Classification results for
classes 2 and 3 are always reliable.

To avoid unnecessary migrations, only update the class of a task if it has
been the same during 4 consecutive ticks.
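
For example, with CLASS_DEBOUNCER_SKIPS == 4, hardware readings of 1, 1,
1, 1 on four consecutive ticks update the task's class to 1 on the fourth
tick; a reading of 2 on any of those ticks restarts the count with 2 as
the new candidate.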

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 drivers/thermal/intel/intel_hfi.c | 58 ++++++++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c
index 9ddd3047eb39..972ccfd392cf 100644
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -39,6 +39,7 @@
 #include <linux/workqueue.h>
 
 #include <asm/msr.h>
+#include <asm/intel-family.h>
 
 #include "../thermal_core.h"
 #include "intel_hfi.h"
@@ -214,6 +215,60 @@ int intel_hfi_has_task_classes(void)
 	return cpu_feature_enabled(X86_FEATURE_ITD);
 }
 
+#define CLASS_DEBOUNCER_SKIPS 4
+
+/**
+ * debounce_and_update_class() - Process and update a task's classification
+ *
+ * @p:		The task of which the classification will be updated
+ * @new_class:	The new class that hardware provides
+ *
+ * Update the classification of @p with the new value that hardware provides.
+ * Only update the classification of @p if it has been the same during
+ * CLASS_DEBOUNCER_SKIPS consecutive ticks.
+ */
+static void debounce_and_update_class(struct task_struct *p, u8 new_class)
+{
+	char debounce_skip;
+
+	/* The class of @p changed, only restart the debounce counter. */
+	if (p->class_candidate != new_class) {
+		p->class_debounce_counter = 1;
+		goto out;
+	}
+
+	/*
+	 * The class of @p did not change. Update it if it has been the same
+	 * for CLASS_DEBOUNCER_SKIPS user ticks.
+	 */
+	debounce_skip = p->class_debounce_counter + 1;
+	if (debounce_skip < CLASS_DEBOUNCER_SKIPS)
+		p->class_debounce_counter++;
+	else
+		p->class = new_class;
+
+out:
+	p->class_candidate = new_class;
+}
+
+static bool classification_is_accurate(u8 class, bool smt_siblings_idle)
+{
+	switch (boot_cpu_data.x86_model) {
+	case INTEL_FAM6_ALDERLAKE:
+	case INTEL_FAM6_ALDERLAKE_L:
+	case INTEL_FAM6_RAPTORLAKE:
+	case INTEL_FAM6_RAPTORLAKE_P:
+	case INTEL_FAM6_RAPTORLAKE_S:
+		if (class == 3 || class == 2 || smt_siblings_idle)
+			return true;
+
+		return false;
+
+	default:
+		return true;
+	}
+}
+
 void intel_hfi_update_task_class(struct task_struct *curr, bool smt_siblings_idle)
 {
 	union hfi_thread_feedback_char_msr msr;
@@ -228,7 +283,8 @@ void intel_hfi_update_task_class(struct task_struct *curr, bool smt_siblings_idl
 	if (!msr.split.valid)
 		return;
 
-	curr->class = msr.split.classid;
+	if (classification_is_accurate(msr.split.classid, smt_siblings_idle))
+		debounce_and_update_class(curr, msr.split.classid);
 }
 
 static void get_one_hfi_cap(struct hfi_instance *hfi_instance, s16 index,
-- 
2.25.1



* [RFC PATCH 21/23] x86/cpufeatures: Add feature bit for HRESET
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (19 preceding siblings ...)
  2022-09-09 23:12 ` [RFC PATCH 20/23] thermal: intel: hfi: Implement model-specific checks for " Ricardo Neri
@ 2022-09-09 23:12 ` Ricardo Neri
  2022-09-09 23:12 ` [RFC PATCH 22/23] x86/hreset: Configure history reset Ricardo Neri
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:12 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

The HRESET instruction prevents the classification of the
current task from influencing the classification of the next task
when running serially on the same logical processor.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/include/asm/msr-index.h   | 2 ++
 arch/x86/kernel/cpu/scattered.c    | 1 +
 3 files changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d3202d665ac0..b5e64203c6b2 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -304,6 +304,7 @@
 #define X86_FEATURE_UNRET		(11*32+15) /* "" AMD BTB untrain return */
 #define X86_FEATURE_USE_IBPB_FW		(11*32+16) /* "" Use IBPB during runtime firmware calls */
 #define X86_FEATURE_RSB_VMEXIT_LITE	(11*32+17) /* "" Fill RSB on VM exit when EIBRS is enabled */
+#define X86_FEATURE_HRESET		(11*32+18) /* Hardware history reset instruction */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 810b950dc2e0..18f741499465 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1056,4 +1056,6 @@
 #define MSR_IA32_HW_FEEDBACK_CHAR	0x17d2
 #define MSR_IA32_HW_FEEDBACK_THREAD_CONFIG 0x17d4
 
+/* Hardware History Reset  */
+#define MSR_IA32_HW_HRESET_ENABLE	0x17da
 #endif /* _ASM_X86_MSR_INDEX_H */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index fd44b54c90d5..7d268377b03a 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -28,6 +28,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_EPB,		CPUID_ECX,  3, 0x00000006, 0 },
 	{ X86_FEATURE_INTEL_PPIN,	CPUID_EBX,  0, 0x00000007, 1 },
 	{ X86_FEATURE_RRSBA_CTRL,	CPUID_EDX,  2, 0x00000007, 2 },
+	{ X86_FEATURE_HRESET,		CPUID_EAX, 22, 0x00000007, 1 },
 	{ X86_FEATURE_CQM_LLC,		CPUID_EDX,  1, 0x0000000f, 0 },
 	{ X86_FEATURE_CQM_OCCUP_LLC,	CPUID_EDX,  0, 0x0000000f, 1 },
 	{ X86_FEATURE_CQM_MBM_TOTAL,	CPUID_EDX,  1, 0x0000000f, 1 },
-- 
2.25.1



* [RFC PATCH 22/23] x86/hreset: Configure history reset
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (20 preceding siblings ...)
  2022-09-09 23:12 ` [RFC PATCH 21/23] x86/cpufeatures: Add feature bit for HRESET Ricardo Neri
@ 2022-09-09 23:12 ` Ricardo Neri
  2022-09-27 12:03   ` Peter Zijlstra
  2022-09-09 23:12 ` [RFC PATCH 23/23] x86/process: Reset hardware history in context switch Ricardo Neri
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:12 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Configure the MSR that controls the behavior of HRESET on each logical
processor.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/kernel/cpu/common.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..108642fe6761 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -411,6 +411,28 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 	cr4_clear_bits(X86_CR4_UMIP);
 }
 
+static u32 hardware_history_features __read_mostly;
+
+static __always_inline void setup_hreset(struct cpuinfo_x86 *c)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_HRESET))
+		return;
+
+	/*
+	 * On all CPUs, use the hardware history features that the boot
+	 * CPU supports.
+	 */
+	if (c == &boot_cpu_data)
+		hardware_history_features = cpuid_ebx(0x20);
+
+	if (!hardware_history_features)
+		return;
+
+	wrmsrl(MSR_IA32_HW_HRESET_ENABLE, hardware_history_features);
+
+	pr_info_once("x86/cpu: Intel History Reset (HRESET) activated\n");
+}
+
 /* These bits should not change their value after CPU init is finished. */
 static const unsigned long cr4_pinned_mask =
 	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
@@ -1823,10 +1845,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
+	/* Set up SMEP/SMAP/UMIP/HRESET */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_hreset(c);
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
-- 
2.25.1



* [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (21 preceding siblings ...)
  2022-09-09 23:12 ` [RFC PATCH 22/23] x86/hreset: Configure history reset Ricardo Neri
@ 2022-09-09 23:12 ` Ricardo Neri
  2022-09-27 12:52   ` Peter Zijlstra
                     ` (2 more replies)
       [not found] ` <20220910072120.2651-1-hdanton@sina.com>
  2022-10-11 19:12 ` Trying to apply patch set Carlos Bilbao
  24 siblings, 3 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-09 23:12 UTC (permalink / raw)
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri, Tim C . Chen

Reset the classification history of the current task when switching to
the next task. Hardware will then start the classification of the next
running task anew.

Cc: Ben Segall <bsegall@google.com>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim C. Chen <tim.c.chen@intel.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/hreset.h | 30 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c  | 10 ++++++++++
 arch/x86/kernel/process_32.c  |  3 +++
 arch/x86/kernel/process_64.c  |  3 +++
 4 files changed, 46 insertions(+)
 create mode 100644 arch/x86/include/asm/hreset.h

diff --git a/arch/x86/include/asm/hreset.h b/arch/x86/include/asm/hreset.h
new file mode 100644
index 000000000000..d68ca2fb8642
--- /dev/null
+++ b/arch/x86/include/asm/hreset.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_HRESET_H
+#define _ASM_X86_HRESET_H
+/**
+ * HRESET - History reset. Available since binutils v2.36.
+ *
+ * Request the processor to reset the history of task classification on the
+ * current logical processor. The history components to be
+ * reset are specified in %eax. Only bits specified in CPUID(0x20).EBX
+ * and enabled in the IA32_HRESET_ENABLE MSR can be selected.
+ *
+ * The assembly code looks like:
+ *
+ *	hreset %eax
+ *
+ * The corresponding machine code looks like:
+ *
+ *	F3 0F 3A F0 ModRM Imm
+ *
+ * The value of ModRM is 0xc0 to specify %eax register addressing.
+ * The ignored immediate operand is set to 0.
+ *
+ * The instruction is documented in the Intel SDM.
+ */
+
+#define __ASM_HRESET  ".byte 0xf3, 0xf, 0x3a, 0xf0, 0xc0, 0x0"
+
+void reset_hardware_history(void);
+
+#endif /* _ASM_X86_HRESET_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 108642fe6761..4622a0ebf8a1 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -52,6 +52,7 @@
 #include <asm/cpu.h>
 #include <asm/mce.h>
 #include <asm/msr.h>
+#include <asm/hreset.h>
 #include <asm/memtype.h>
 #include <asm/microcode.h>
 #include <asm/microcode_intel.h>
@@ -413,6 +414,15 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 
 static u32 hardware_history_features __read_mostly;
 
+void reset_hardware_history(void)
+{
+	if (!static_cpu_has(X86_FEATURE_HRESET))
+		return;
+
+	asm volatile("mov %0, %%eax;" __ASM_HRESET "\n" : :
+		     "r" (hardware_history_features) : "%rax");
+}
+
 static __always_inline void setup_hreset(struct cpuinfo_x86 *c)
 {
 	if (!cpu_feature_enabled(X86_FEATURE_HRESET))
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 2f314b170c9f..74d8ad83e0b3 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -52,6 +52,7 @@
 #include <asm/switch_to.h>
 #include <asm/vm86.h>
 #include <asm/resctrl.h>
+#include <asm/hreset.h>
 #include <asm/proto.h>
 
 #include "process.h"
@@ -214,6 +215,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
 
+	reset_hardware_history();
+
 	return prev_p;
 }
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 1962008fe743..0b175f30f359 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -53,6 +53,7 @@
 #include <asm/xen/hypervisor.h>
 #include <asm/vdso.h>
 #include <asm/resctrl.h>
+#include <asm/hreset.h>
 #include <asm/unistd.h>
 #include <asm/fsgsbase.h>
 #ifdef CONFIG_IA32_EMULATION
@@ -657,6 +658,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
 
+	reset_hardware_history();
+
 	return prev_p;
 }
 
-- 
2.25.1



* Re: [RFC PATCH 06/23] sched/core: Update the classification of the current task
  2022-09-09 23:11 ` [RFC PATCH 06/23] sched/core: Update the classification of the current task Ricardo Neri
@ 2022-09-14 13:44   ` Peter Zijlstra
  2022-09-16 14:42     ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-14 13:44 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:11:48PM -0700, Ricardo Neri wrote:

> +	if (sched_task_classes_enabled() && user_tick)
> +		arch_update_task_class(curr, is_core_idle(cpu));

This evaluates is_core_idle() even if the hardware improves.



* Re: [RFC PATCH 01/23] sched/task_struct: Introduce classes of tasks
  2022-09-09 23:11 ` [RFC PATCH 01/23] sched/task_struct: Introduce classes of tasks Ricardo Neri
@ 2022-09-14 13:46   ` Peter Zijlstra
  2022-09-16 14:41     ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-14 13:46 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:11:43PM -0700, Ricardo Neri wrote:

>  include/linux/sched.h | 7 +++++++
>  init/Kconfig          | 9 +++++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index e7b2f8a5c711..acc33dbaa47c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -117,6 +117,8 @@ struct task_group;
>  					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
>  					 TASK_PARKED)
>  
> +#define TASK_CLASS_UNCLASSIFIED		-1

> +#ifdef CONFIG_SCHED_TASK_CLASSES
> +	/* Class of task that the scheduler uses for task placement decisions */
> +	short				class;
> +#endif

You're missing a hunk for init/init_task.c for this non-zero init value.

Do we really have to use a signed type and non-zero init value?


* Re: [RFC PATCH 01/23] sched/task_struct: Introduce classes of tasks
  2022-09-14 13:46   ` Peter Zijlstra
@ 2022-09-16 14:41     ` Ricardo Neri
  2022-09-27 13:01       ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-09-16 14:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Wed, Sep 14, 2022 at 03:46:34PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:11:43PM -0700, Ricardo Neri wrote:
> 
> >  include/linux/sched.h | 7 +++++++
> >  init/Kconfig          | 9 +++++++++
> >  2 files changed, 16 insertions(+)
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index e7b2f8a5c711..acc33dbaa47c 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -117,6 +117,8 @@ struct task_group;
> >  					 __TASK_TRACED | EXIT_DEAD | EXIT_ZOMBIE | \
> >  					 TASK_PARKED)
> >  
> > +#define TASK_CLASS_UNCLASSIFIED		-1
> 
> > +#ifdef CONFIG_SCHED_TASK_CLASSES
> > +	/* Class of task that the scheduler uses for task placement decisions */
> > +	short				class;
> > +#endif

Thanks for your feedback Peter!
> 
> You're missing a hunk for init/init_task.c for this non-zero init value.

Ah, yes. I'll add it.
> 
> Do we really have to use a signed type and non-zero init value?

At least on Intel processors, class 0 is a valid class. The scheduler needs to
have a notion of unclassified tasks and decide how to handle them, IMO.

Intel processors currently support 8-bit, unsigned classes. I doubt other
architectures will ever support more than 256 classes. Short can handle all the
possible classification values and also the unclassified case.

On the other hand, class 0 could be the default classification unless hardware
classifies differently. 0 would be special and need to be documented clearly.
This would work for Intel processors.

Thanks and BR,
Ricardo 


* Re: [RFC PATCH 06/23] sched/core: Update the classification of the current task
  2022-09-14 13:44   ` Peter Zijlstra
@ 2022-09-16 14:42     ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-16 14:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Wed, Sep 14, 2022 at 03:44:29PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:11:48PM -0700, Ricardo Neri wrote:
> 
> > +	if (sched_task_classes_enabled() && user_tick)
> > +		arch_update_task_class(curr, is_core_idle(cpu));
> 
> This evaluates is_core_idle() even if the hardware improves.

Yes, this is true. Do you think it would make sense to expose is_core_idle()
outside the scheduler? In this manner, only hardware that needs it would
call it.

Thanks and BR,
Ricardo
> 


* Re: [RFC PATCH 06/23] sched/core: Update the classification of the current task
       [not found] ` <20220910072120.2651-1-hdanton@sina.com>
@ 2022-09-16 14:51   ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-16 14:51 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Peter Zijlstra, Mel Gorman, linux-kernel, Vincent Guittot

On Sat, Sep 10, 2022 at 03:21:20PM +0800, Hillf Danton wrote:
> On 9 Sep 2022 16:11:48 -0700 Ricardo Neri <ricardo.neri-calderon@linux.intel.com> wrote:
> > @@ -5445,6 +5445,9 @@ void scheduler_tick(bool user_tick)
> >  	unsigned long thermal_pressure;
> >  	u64 resched_latency;
> >  
> > +	if (sched_task_classes_enabled() && user_tick)
> > +		arch_update_task_class(curr, is_core_idle(cpu));
> > +
> >  	arch_scale_freq_tick();
> >  	sched_clock_tick();
> 
Thank you very much for your feedback Hillf!

> Given user_tick == true, core is not idle regardless of SMT.
> IOW I doubt is_core_idle() helps here.

Perhaps is_core_idle() is a bad name? The second argument of arch_update_
task_class() is smt_siblings_idle. is_core_idle() gives us the answer
we want.

is_core_idle() only checks the siblings of @cpu; it explicitly excludes
@cpu itself from the idle check. We are only interested in the idle state
of the siblings.

Am I missing anything?

Thanks and BR,
Ricardo


* Re: [RFC PATCH 03/23] sched/core: Initialize the class of a new task
  2022-09-09 23:11 ` [RFC PATCH 03/23] sched/core: Initialize the class of a new task Ricardo Neri
@ 2022-09-26 14:57   ` Joel Fernandes
  2022-09-26 21:53     ` Ricardo Neri
  2022-09-27 13:04     ` Peter Zijlstra
  0 siblings, 2 replies; 78+ messages in thread
From: Joel Fernandes @ 2022-09-26 14:57 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

Hi Ricardo,

On Fri, Sep 09, 2022 at 04:11:45PM -0700, Ricardo Neri wrote:
> New tasks shall start life as unclassified. They will be classified by
> hardware when they run.
> 
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Len Brown <len.brown@intel.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim C. Chen <tim.c.chen@intel.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  kernel/sched/core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ee28253c9ac0..db548c1a25ef 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4336,6 +4336,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
>  	p->se.prev_sum_exec_runtime	= 0;
>  	p->se.nr_migrations		= 0;
>  	p->se.vruntime			= 0;
> +#ifdef CONFIG_SCHED_TASK_CLASSES
> +	p->class			= TASK_CLASS_UNCLASSIFIED;
> +#endif

I find the term 'class' very broad and unclear what kind of class (without
further reading). So I am worried about how this generic term usage plays
with Linux source code in the long-term (like what if someone else comes up
with a usage of term 'class' that is unrelated to IPC.)

To that end, I was wondering if it could be renamed to p->ipc_class, and
CONFIG_SCHED_TASK_IPC_CLASSES, or something.

thanks,

 - Joel



>  	INIT_LIST_HEAD(&p->se.group_node);
>  
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> -- 
> 2.25.1
> 


* Re: [RFC PATCH 03/23] sched/core: Initialize the class of a new task
  2022-09-26 14:57   ` Joel Fernandes
@ 2022-09-26 21:53     ` Ricardo Neri
  2022-09-27 13:04     ` Peter Zijlstra
  1 sibling, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-09-26 21:53 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Mon, Sep 26, 2022 at 02:57:29PM +0000, Joel Fernandes wrote:
> Hi Ricardo,
> 
> On Fri, Sep 09, 2022 at 04:11:45PM -0700, Ricardo Neri wrote:
> > New tasks shall start life as unclassified. They will be classified by
> > hardware when they run.
> > 
> > Cc: Ben Segall <bsegall@google.com>
> > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Len Brown <len.brown@intel.com>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Cc: Tim C. Chen <tim.c.chen@intel.com>
> > Cc: Valentin Schneider <vschneid@redhat.com>
> > Cc: x86@kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  kernel/sched/core.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index ee28253c9ac0..db548c1a25ef 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4336,6 +4336,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
> >  	p->se.prev_sum_exec_runtime	= 0;
> >  	p->se.nr_migrations		= 0;
> >  	p->se.vruntime			= 0;
> > +#ifdef CONFIG_SCHED_TASK_CLASSES
> > +	p->class			= TASK_CLASS_UNCLASSIFIED;
> > +#endif
> 
> I find the term 'class' very broad and unclear what kind of class (without
> further reading). So I am worried about how this generic term usage plays
> with Linux source code in the long-term (like what if someone else comes up
> with a usage of term 'class' that is unrelated to IPC.)

Thank you very much for your review, Joel! Yes, 'class' is meant to be read
in context, for instance as task_struct::class, p->class, or rq->curr->class,
which should imply that we are referring to the class of a task. But yes, I
agree that on its own the term is too generic.

> 
> To that end, I was wondering if it could be renamed to p->ipc_class, and
> CONFIG_SCHED_TASK_IPC_CLASSES, or something.

This is a good suggestion. I will take it, unless others disagree.

BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 08/23] sched/fair: Compute task-class performance scores for load balancing
  2022-09-09 23:11 ` [RFC PATCH 08/23] sched/fair: Compute task-class performance scores for load balancing Ricardo Neri
@ 2022-09-27  9:15   ` Peter Zijlstra
  2022-10-26  3:57     ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27  9:15 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:11:50PM -0700, Ricardo Neri wrote:

> +static void compute_ilb_sg_task_class_scores(struct sg_lb_task_class_stats *class_sgs,
> +					     struct sg_lb_stats *sgs,
> +					     int dst_cpu)
> +{
> +	int group_score, group_score_without, score_on_dst_cpu;
> +	int busy_cpus = sgs->group_weight - sgs->idle_cpus;
> +
> +	if (!sched_task_classes_enabled())
> +		return;
> +
> +	/* No busy CPUs in the group. No tasks to move. */
> +	if (!busy_cpus)
> +		return;
> +
> +	score_on_dst_cpu = arch_get_task_class_score(class_sgs->p_min_score->class,
> +						     dst_cpu);
> +
> +	/*
> +	 * The simplest case. The single busy CPU in the current group will
> +	 * become idle after pulling its current task. The destination CPU is
> +	 * idle.
> +	 */
> +	if (busy_cpus == 1) {
> +		sgs->task_class_score_before = class_sgs->sum_score;
> +		sgs->task_class_score_after = score_on_dst_cpu;
> +		return;
> +	}
> +
> +	/*
> +	 * Now compute the group score with and without the task with the
> +	 * lowest score. We assume that the tasks that remain in the group share
> +	 * the CPU resources equally.
> +	 */
> +	group_score = class_sgs->sum_score / busy_cpus;
> +
> +	group_score_without =  (class_sgs->sum_score - class_sgs->min_score) /
> +			       (busy_cpus - 1);
> +
> +	sgs->task_class_score_after = group_score_without + score_on_dst_cpu;
> +	sgs->task_class_score_before = group_score;
> +}

That's just plain broken; also lots of cleanups done...
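
To see the problem, take a hypothetical group (numbers invented purely for
illustration): two busy CPUs whose current tasks score 10 and 30, so
sum_score = 40 and min_score = 10, and suppose the lowest-scoring task would
score 12 on the destination CPU. The original code then computes:

	group_score            = 40 / 2   = 20	/* average per busy CPU   */
	group_score_without    = 30 / 1   = 30
	task_class_score_after = 30 + 12  = 42	/* average plus raw score */

"after" adds a raw per-task score to a per-CPU average, so comparing it with
"before" compares different units; the rewrite below keeps both sides as raw
sums and divides both by the same busy_cpus.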

---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8405,12 +8405,14 @@ struct sg_lb_stats {
 	enum group_type group_type;
 	unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */
 	unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */
-	long task_class_score_after; /* Prospective task-class score after load balancing */
-	long task_class_score_before; /* Task-class score before load balancing */
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
 #endif
+#ifdef CONFIG_SCHED_TASK_CLASSES
+	long task_class_score_after;  /* Prospective task-class score after load balancing */
+	long task_class_score_before; /* Task-class score before load balancing */
+#endif
 };
 
 /*
@@ -8689,58 +8691,54 @@ group_type group_classify(unsigned int i
 }
 
 struct sg_lb_task_class_stats {
-	/*
-	 * Score of the task with lowest score among the current tasks (i.e.,
-	 * runqueue::curr) of all runqueues in the scheduling group.
-	 */
-	int min_score;
-	/*
-	 * Sum of the scores of the current tasks of all runqueues in the
-	 * scheduling group.
-	 */
-	long sum_score;
-	/* The task with score equal to @min_score */
-	struct task_struct *p_min_score;
+	int min_score;                   /* Min(rq->curr->score) */
+	int min_class;
+	long sum_score;                  /* Sum(rq->curr->score) */
 };
 
 #ifdef CONFIG_SCHED_TASK_CLASSES
-static void init_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs)
+static void init_sg_lb_task_class_stats(struct sg_lb_task_class_stats *sgcs)
 {
-	class_sgs->min_score = INT_MAX;
-	class_sgs->sum_score = 0;
-	class_sgs->p_min_score = NULL;
+	*sgcs = (struct sg_lb_task_class_stats){
+		.min_score = INT_MAX,
+	};
 }
 
 /** Called only if cpu_of(@rq) is not idle and has tasks running. */
-static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs,
-					 struct rq *rq)
+static void update_sg_lb_task_class_stats(struct sg_lb_task_class_stats *sgcs,
+					  struct rq *rq)
 {
-	int score;
+	struct task_struct *curr;
+	int class, score;
 
 	if (!sched_task_classes_enabled())
 		return;
 
+	curr = rcu_dereference(rq->curr);
+	if (!curr || (curr->flags & PF_EXITING) || is_idle_task(curr))
+		return;
+
 	/*
 	 * TODO: if nr_running > 1 we may want to go through all the tasks behind
 	 * rq->curr.
 	 */
-	score = arch_get_task_class_score(rq->curr->class, cpu_of(rq));
-
-	class_sgs->sum_score += score;
+	class = curr->class;
+	score = arch_get_task_class_score(class, cpu_of(rq));
 
-	if (score >= class_sgs->min_score)
-		return;
+	sgcs->sum_score += score;
 
-	class_sgs->min_score = score;
-	class_sgs->p_min_score = rq->curr;
+	if (score < sgcs->min_score) {
+		sgcs->min_score = score;
+		sgcs->min_class = class;
+	}
 }
 
-static void compute_ilb_sg_task_class_scores(struct sg_lb_task_class_stats *class_sgs,
-					     struct sg_lb_stats *sgs,
-					     int dst_cpu)
+static void update_sg_lb_stats_scores(struct sg_lb_task_class_stats *sgcs,
+				      struct sg_lb_stats *sgs,
+				      int dst_cpu)
 {
-	int group_score, group_score_without, score_on_dst_cpu;
 	int busy_cpus = sgs->group_weight - sgs->idle_cpus;
+	int score_on_dst_cpu;
+	long before, after;
 
 	if (!sched_task_classes_enabled())
 		return;
@@ -8749,32 +8747,18 @@ static void compute_ilb_sg_task_class_sc
 	if (!busy_cpus)
 		return;
 
-	score_on_dst_cpu = arch_get_task_class_score(class_sgs->p_min_score->class,
-						     dst_cpu);
+	score_on_dst_cpu = arch_get_task_class_score(sgcs->min_class, dst_cpu);
 
-	/*
-	 * The simplest case. The single busy CPU in the current group will
-	 * become idle after pulling its current task. The destination CPU is
-	 * idle.
-	 */
-	if (busy_cpus == 1) {
-		sgs->task_class_score_before = class_sgs->sum_score;
-		sgs->task_class_score_after = score_on_dst_cpu;
-		return;
-	}
+	before = sgcs->sum_score;
+	after  = before - sgcs->min_score + score_on_dst_cpu;
 
-	/*
-	 * Now compute the group score with and without the task with the
-	 * lowest score. We assume that the tasks that remain in the group share
-	 * the CPU resources equally.
-	 */
-	group_score = class_sgs->sum_score / busy_cpus;
-
-	group_score_without =  (class_sgs->sum_score - class_sgs->min_score) /
-			       (busy_cpus - 1);
+	if (busy_cpus > 1) {
+		before /= busy_cpus;
+		after  /= busy_cpus;
+	}
 
-	sgs->task_class_score_after = group_score_without + score_on_dst_cpu;
-	sgs->task_class_score_before = group_score;
+	sgs->task_class_score_before = before;
+	sgs->task_class_score_after  = after;
 }
 
 /**
@@ -8832,18 +8816,19 @@ static bool sched_asym_class_pick(struct
 }
 
 #else /* CONFIG_SCHED_TASK_CLASSES */
-static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs,
-					 struct rq *rq)
+
+static void init_sg_lb_task_class_stats(struct sg_lb_task_class_stats *sgcs)
 {
 }
 
-static void init_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs)
+static void update_sg_lb_task_class_stats(struct sg_lb_task_class_stats *sgcs,
+					  struct rq *rq)
 {
 }
 
-static void compute_ilb_sg_task_class_scores(struct sg_lb_task_class_stats *class_sgs,
-					     struct sg_lb_stats *sgs,
-					     int dst_cpu)
+static void update_sg_lb_stats_scores(struct sg_lb_task_class_stats *sgcs,
+				      struct sg_lb_stats *sgs,
+				      int dst_cpu)
 {
 }
 
@@ -8854,7 +8839,6 @@ static bool sched_asym_class_pick(struct
 {
 	return false;
 }
-
 #endif /* CONFIG_SCHED_TASK_CLASSES */
 
 /**
@@ -8979,11 +8963,11 @@ static inline void update_sg_lb_stats(st
 				      struct sg_lb_stats *sgs,
 				      int *sg_status)
 {
-	struct sg_lb_task_class_stats class_stats;
+	struct sg_lb_task_class_stats sgcs;
 	int i, nr_running, local_group;
 
 	memset(sgs, 0, sizeof(*sgs));
-	init_rq_task_classes_stats(&class_stats);
+	init_sg_lb_task_class_stats(&sgcs);
 
 	local_group = group == sds->local;
 
@@ -9034,7 +9018,7 @@ static inline void update_sg_lb_stats(st
 				sgs->group_misfit_task_load = load;
 		}
 
-		update_rq_task_classes_stats(&class_stats, rq);
+		update_sg_lb_task_class_stats(&sgcs, rq);
 	}
 
 	sgs->group_capacity = group->sgc->capacity;
@@ -9045,7 +9029,7 @@ static inline void update_sg_lb_stats(st
 	if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
 	    env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
 	    sched_asym(env, sds, sgs, group)) {
-		compute_ilb_sg_task_class_scores(&class_stats, sgs, env->dst_cpu);
+		update_sg_lb_stats_scores(&sgcs, sgs, env->dst_cpu);
 		sgs->group_asym_packing = 1;
 	}
 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 09/23] sched/fair: Use task-class performance score to pick the busiest group
  2022-09-09 23:11 ` [RFC PATCH 09/23] sched/fair: Use task-class performance score to pick the busiest group Ricardo Neri
@ 2022-09-27 11:01   ` Peter Zijlstra
  2022-10-05 23:38     ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 11:01 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:11:51PM -0700, Ricardo Neri wrote:
> update_sd_pick_busiest() keeps on selecting as the busiest group scheduling
> groups of identical priority. Since both groups have the same priority,
> either group is a good choice. The classes of tasks in the scheduling
> groups can break this tie.
> 
> Pick as busiest the scheduling group that yields a higher task-class
> performance score after load balancing.

> +/**
> + * sched_asym_class_pick - Select a sched group based on classes of tasks
> + * @a:		A scheduling group
> + * @b:		A second scheduling group
> + * @a_stats:	Load balancing statistics of @a
> + * @b_stats:	Load balancing statistics of @b
> + *
> + * Returns: true if @a has the same priority and @a has classes of tasks that
> + * yield higher overall throughput after load balance. Returns false otherwise.
> + */
> +static bool sched_asym_class_pick(struct sched_group *a,
> +				  struct sched_group *b,
> +				  struct sg_lb_stats *a_stats,
> +				  struct sg_lb_stats *b_stats)
> +{
> +	/*
> +	 * Only use the class-specific preference selection if both sched
> +	 * groups have the same priority.
> +	 */
> +	if (arch_asym_cpu_priority(a->asym_prefer_cpu) !=
> +	    arch_asym_cpu_priority(b->asym_prefer_cpu))
> +		return false;
> +
> +	return sched_asym_class_prefer(a_stats, b_stats);
> +}
> +
>  #else /* CONFIG_SCHED_TASK_CLASSES */
>  static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs,
>  					 struct rq *rq)

> @@ -9049,6 +9111,12 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  		/* Prefer to move from lowest priority CPU's work */
>  		if (sched_asym_prefer(sg->asym_prefer_cpu, sds->busiest->asym_prefer_cpu))
>  			return false;
> +
> +		/* @sg and @sds::busiest have the same priority. */
> +		if (sched_asym_class_pick(sds->busiest, sg, &sds->busiest_stat, sgs))
> +			return false;
> +
> +		/* @sg has lower priority than @sds::busiest. */
>  		break;
>  
>  	case group_misfit_task:

So why does only this one instance of asym_prefer() require tie
breaking?

I must also re-iterate how much I hate having two different means of
dealing with big-little topologies.

And while looking through this, I must ask about the comment that goes
with sched_set_itmt_core_prio() vs the sg->asym_prefer_cpu assignment in
init_sched_groups_capacity(), what-up ?!



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 10/23] sched/fair: Use classes of tasks when selecting a busiest runqueue
  2022-09-09 23:11 ` [RFC PATCH 10/23] sched/fair: Use classes of tasks when selecting a busiest runqueue Ricardo Neri
@ 2022-09-27 11:25   ` Peter Zijlstra
  2022-10-07 23:36     ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 11:25 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:11:52PM -0700, Ricardo Neri wrote:

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7368a0b453ee..085b1f75d90b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10009,6 +10009,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>  	int i;
>  
>  	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
> +		int busiest_class_delta_score = INT_MIN;

You sure you want to break ties toward a negative IPC 'improvement' ?

>  		unsigned long capacity, load, util;
>  		unsigned int nr_running;
>  		enum fbq_type rt;
> @@ -10118,6 +10119,20 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>  			if (busiest_nr < nr_running) {
>  				busiest_nr = nr_running;
>  				busiest = rq;
> +			} else if (sched_task_classes_enabled() &&
> +				   busiest_nr == nr_running) {
> +				int curr_class_delta_score;
> +
> +				curr_class_delta_score = arch_get_task_class_score(rq->curr->class,
> +										   env->dst_cpu) -
> +							 arch_get_task_class_score(rq->curr->class,
> +										   cpu_of(rq));

*sigh*, you really couldn't find a more ugly layout if you tried.

Perhaps use less-than-novella-length identifiers?

> +
> +				if (busiest_class_delta_score < curr_class_delta_score) {
> +					busiest_class_delta_score = curr_class_delta_score;
> +					busiest_nr = nr_running;
> +					busiest = rq;
> +				}
>  			}
>  			break;
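
One way to honour that, sketched here with the identifiers from the patch
and with the running maximum kept across loop iterations rather than
re-initialized per CPU (whether negative deltas should be rejected outright
is of course a policy choice):

	int busiest_class_delta_score = 0;	/* accept only IPC improvements */

	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
		...
		} else if (sched_task_classes_enabled() &&
			   busiest_nr == nr_running) {
			int delta;

			delta = arch_get_task_class_score(rq->curr->class, env->dst_cpu) -
				arch_get_task_class_score(rq->curr->class, cpu_of(rq));

			/* Break the tie only when the move does not hurt IPC. */
			if (delta > busiest_class_delta_score) {
				busiest_class_delta_score = delta;
				busiest_nr = nr_running;
				busiest = rq;
			}
		}
	}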



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 12/23] thermal: intel: hfi: Convert table_lock to use flags-handling variants
  2022-09-09 23:11 ` [RFC PATCH 12/23] thermal: intel: hfi: Convert table_lock to use flags-handling variants Ricardo Neri
@ 2022-09-27 11:34   ` Peter Zijlstra
  2022-09-27 11:36     ` Peter Zijlstra
  2022-10-26  3:58     ` Ricardo Neri
  0 siblings, 2 replies; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 11:34 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:11:54PM -0700, Ricardo Neri wrote:

> --- a/drivers/thermal/intel/intel_hfi.c
> +++ b/drivers/thermal/intel/intel_hfi.c
> @@ -175,9 +175,10 @@ static struct workqueue_struct *hfi_updates_wq;
>  static void get_hfi_caps(struct hfi_instance *hfi_instance,
>  			 struct thermal_genl_cpu_caps *cpu_caps)
>  {
> +	unsigned long flags;
>  	int cpu, i = 0;
>  
> -	raw_spin_lock_irq(&hfi_instance->table_lock);
> +	raw_spin_lock_irqsave(&hfi_instance->table_lock, flags);
>  	for_each_cpu(cpu, hfi_instance->cpus) {
>  		struct hfi_cpu_data *caps;
>  		s16 index;

^^^^ Anti-pattern alert!

Now your IRQ latency depends on nr_cpus -- which is a fair fail. The
existing code is already pretty crap in that it has the preemption
latency depend on nr_cpus.
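
One way to bound the IRQ-off time, assuming get_one_hfi_cap() and the
per-CPU hfi_cpu_info introduced later in this series, is to snapshot one
row per critical section instead of holding the lock across the whole loop;
a rough sketch:

	for_each_cpu(cpu, hfi_instance->cpus) {
		struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, cpu);
		struct hfi_cpu_data caps;
		unsigned long flags;

		raw_spin_lock_irqsave(&hfi_instance->table_lock, flags);
		/* Copy one CPU's row (class 0, i.e. the legacy data). */
		get_one_hfi_cap(hfi_instance, info->index, &caps, 0);
		raw_spin_unlock_irqrestore(&hfi_instance->table_lock, flags);

		/* Fill cpu_caps[i++] from the snapshot, outside the lock. */
	}

The snapshot is then no longer atomic across CPUs, which may or may not
matter for the thermal netlink report.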

While I'm here looking at the HFI stuff, did they fix that HFI interrupt
broadcast mess already? Sending an interrupt to *all* CPUs is quite
insane.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 12/23] thermal: intel: hfi: Convert table_lock to use flags-handling variants
  2022-09-27 11:34   ` Peter Zijlstra
@ 2022-09-27 11:36     ` Peter Zijlstra
  2022-10-26  3:59       ` Ricardo Neri
  2022-10-26  3:58     ` Ricardo Neri
  1 sibling, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 11:36 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 01:34:07PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:11:54PM -0700, Ricardo Neri wrote:
> 
> > --- a/drivers/thermal/intel/intel_hfi.c
> > +++ b/drivers/thermal/intel/intel_hfi.c
> > @@ -175,9 +175,10 @@ static struct workqueue_struct *hfi_updates_wq;
> >  static void get_hfi_caps(struct hfi_instance *hfi_instance,
> >  			 struct thermal_genl_cpu_caps *cpu_caps)
> >  {
> > +	unsigned long flags;
> >  	int cpu, i = 0;
> >  
> > -	raw_spin_lock_irq(&hfi_instance->table_lock);
> > +	raw_spin_lock_irqsave(&hfi_instance->table_lock, flags);
> >  	for_each_cpu(cpu, hfi_instance->cpus) {
> >  		struct hfi_cpu_data *caps;
> >  		s16 index;
> 
> ^^^^ Anti-pattern alert!
> 
> Now your IRQ latency depends on nr_cpus -- which is a fair fail. The
> existing code is already pretty crap in that it has the preemption
> latency depend on nr_cpus.
> 
> While I'm here looking at the HFI stuff, did they fix that HFI interrupt
> broadcast mess already? Sending an interrupt to *all* CPUs is quite
> insane.

Anyway; given the existence of this here loop; why not have something
like:

DEFINE_PER_CPU(int, hfi_ipc_class);

	class = // extract from HFI mess
	WRITE_ONCE(per_cpu(hfi_ipc_class, cpu), class);

And then have the tick use this_cpu_read(hfi_ipc_class)? No extra
locking required.
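
Fleshing that suggestion out a little (function names invented here):

	DEFINE_PER_CPU(int, hfi_ipc_class);

	/* HFI update path: one plain store per CPU slot, no lock needed. */
	static void intel_hfi_set_ipc_class(int cpu, int class)
	{
		WRITE_ONCE(per_cpu(hfi_ipc_class, cpu), class);
	}

	/* Tick path: a CPU only ever reads its own slot. */
	static int intel_hfi_get_ipc_class(void)
	{
		return this_cpu_read(hfi_ipc_class);
	}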

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 14/23] thermal: intel: hfi: Update the class of the current task
  2022-09-09 23:11 ` [RFC PATCH 14/23] thermal: intel: hfi: Update the class of the current task Ricardo Neri
@ 2022-09-27 11:46   ` Peter Zijlstra
  2022-10-07 20:34     ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 11:46 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:11:56PM -0700, Ricardo Neri wrote:
> +union hfi_thread_feedback_char_msr {
> +	struct {
> +		u8	classid;
> +		u64	__reserved:55;
> +		u8	valid:1;
> +	} split;
> +	u64 full;
> +};

Urgh, did you perhaps mean:

	struct {
		u64	classid    :8;
		u64	__reserved :55;
		u64	valid      :1;
	};

?

Because yes, GCC does fold that into a single u64, but that's
implementation defined behaviour; the C spec doesn't require one to pack
adjacent bitfields of different types together.

A layout of:

	u8	classid;		// offset 0
	u64	__reserved : 55;	// offset 8
	u8	valid : 1;		// offset 16

with a total size of 24 bytes is, AFAIU, a valid result of what you
wrote.
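
For reference, the single-type variant with the intended size pinned down at
build time would look like (a sketch):

	union hfi_thread_feedback_char_msr {
		struct {
			u64	classid    :  8;
			u64	__reserved : 55;
			u64	valid      :  1;
		} split;
		u64 full;
	};
	static_assert(sizeof(union hfi_thread_feedback_char_msr) == sizeof(u64));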

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 18/23] sched/task_struct: Add helpers for task classification
  2022-09-09 23:12 ` [RFC PATCH 18/23] sched/task_struct: Add helpers for task classification Ricardo Neri
@ 2022-09-27 11:52   ` Peter Zijlstra
  2022-10-08  0:38     ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 11:52 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:12:00PM -0700, Ricardo Neri wrote:

>  #ifdef CONFIG_SCHED_TASK_CLASSES
>  	short				class;
> +	short				class_candidate;
> +	char				class_debounce_counter;
>  #endif

That's 5 bytes; is that really needed? 4 would be so much better :-)
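
One possible 4-byte arrangement, assuming the 8-bit hardware classes
mentioned elsewhere in the thread and keeping a short only for the field
that must also represent "unclassified":

	#ifdef CONFIG_SCHED_TASK_CLASSES
		short	class;			/* can be TASK_CLASS_UNCLASSIFIED */
		char	class_candidate;	/* raw 8-bit hardware class */
		char	class_debounce_counter;
	#endif

Whether class_candidate can really shrink to 8 bits depends on how the
unclassified value ends up being encoded.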

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores
  2022-09-09 23:11 ` [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores Ricardo Neri
@ 2022-09-27 11:59   ` Peter Zijlstra
  2022-10-05 23:59     ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 11:59 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:11:57PM -0700, Ricardo Neri wrote:
> Support the arch_get_task_class_score() interface of the scheduler. Use the
> data that Intel Thread Director provides to inform the scheduler the
> performance of a class of tasks when placed on a given CPU.
> 

> +static void get_one_hfi_cap(struct hfi_instance *hfi_instance, s16 index,
> +			    struct hfi_cpu_data *hfi_caps, int class)
> +{
> +	struct hfi_cpu_data *caps;
> +
> +	/* Find the capabilities of @cpu */
> +	caps = hfi_instance->data + index * hfi_features.cpu_stride +
> +	       class * hfi_features.class_stride;
> +	memcpy(hfi_caps, caps, sizeof(*hfi_caps));
> +}
> +
> +int intel_hfi_get_task_class_score(int class, int cpu)
> +{
> +	struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, cpu);
> +	struct hfi_instance *instance;
> +	struct hfi_cpu_data caps;
> +	unsigned long flags;
> +	int cap;
> +
> +	if (cpu < 0 || cpu >= nr_cpu_ids)
> +		return -EINVAL;
> +
> +	if (class == TASK_CLASS_UNCLASSIFIED)
> +		return -EINVAL;
> +
> +	if (class >= (int)hfi_features.nr_classes)
> +		return -EINVAL;
> +
> +	instance = info->hfi_instance;
> +	if (!instance)
> +		return -ENOENT;
> +
> +	raw_spin_lock_irqsave(&instance->table_lock, flags);
> +	get_one_hfi_cap(instance, info->index, &caps, class);
> +	cap = caps.perf_cap;
> +	raw_spin_unlock_irqrestore(&instance->table_lock, flags);
> +
> +	return cap;
> +}

Does any of that data actually ever change? Isn't the class score fixed
per CPU type?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 17/23] thermal: intel: hfi: Enable the Intel Thread Director
  2022-09-09 23:11 ` [RFC PATCH 17/23] thermal: intel: hfi: Enable the Intel Thread Director Ricardo Neri
@ 2022-09-27 12:00   ` Peter Zijlstra
  2022-10-06  1:50     ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 12:00 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:11:59PM -0700, Ricardo Neri wrote:

> +config INTEL_THREAD_DIRECTOR
> +	bool "Intel Thread Director"
> +	depends on INTEL_HFI_THERMAL
> +	depends on SMP
> +	select SCHED_TASK_CLASSES
> +	help
> +	  Select this option to enable the Intel Thread Director. If selected,
> +	  hardware classifies tasks based on the type of instructions they
> +	  execute. It also provides performance capabilities for each class of
> +	  task. On hybrid processors, the scheduler uses this data to place
> > +	  tasks of classes of higher performance on higher-performance CPUs.

Do we really need yet another CONFIG symbol for all this!? AFAICT this
Thread Director crud simply extends the HFI table and doesn't actually
carry that much code with it.

Best to always have it on when HFI is on, no?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 22/23] x86/hreset: Configure history reset
  2022-09-09 23:12 ` [RFC PATCH 22/23] x86/hreset: Configure history reset Ricardo Neri
@ 2022-09-27 12:03   ` Peter Zijlstra
  2022-10-02 22:34     ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 12:03 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:12:04PM -0700, Ricardo Neri wrote:

> +static u32 hardware_history_features __read_mostly;

__ro_after_init ?

> +static __always_inline void setup_hreset(struct cpuinfo_x86 *c)
> +{
> +	if (!cpu_feature_enabled(X86_FEATURE_HRESET))
> +		return;
> +
> +	/*
> +	 * Use on all CPUs the hardware history features that the boot
> +	 * CPU supports.
> +	 */
> +	if (c == &boot_cpu_data)
> +		hardware_history_features = cpuid_ebx(0x20);
> +
> +	if (!hardware_history_features)
> +		return;
> +
> +	wrmsrl(MSR_IA32_HW_HRESET_ENABLE, hardware_history_features);
> +
> +	pr_info_once("x86/cpu: Intel History Reset (HRESET) activated\n");
> +}
> +
>  /* These bits should not change their value after CPU init is finished. */
>  static const unsigned long cr4_pinned_mask =
>  	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-09-09 23:12 ` [RFC PATCH 23/23] x86/process: Reset hardware history in context switch Ricardo Neri
@ 2022-09-27 12:52   ` Peter Zijlstra
  2022-10-03 23:07     ` Ricardo Neri
  2022-09-27 12:53   ` Peter Zijlstra
  2022-09-27 13:15   ` Borislav Petkov
  2 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 12:52 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:12:05PM -0700, Ricardo Neri wrote:

> +void reset_hardware_history(void)
> +{
> +	if (!static_cpu_has(X86_FEATURE_HRESET))
> +		return;
> +
> +	asm volatile("mov %0, %%eax;" __ASM_HRESET "\n" : :
> +		     "r" (hardware_history_features) : "%rax");
> +}

	asm_inline volatile (ALTERNATIVE("", __ASM_HRESET, X86_FEATURE_HRESET)
			     : : "a" (hardware_history_features) : "memory");


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-09-09 23:12 ` [RFC PATCH 23/23] x86/process: Reset hardware history in context switch Ricardo Neri
  2022-09-27 12:52   ` Peter Zijlstra
@ 2022-09-27 12:53   ` Peter Zijlstra
  2022-10-02 22:02     ` Ricardo Neri
  2022-09-27 13:15   ` Borislav Petkov
  2 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 12:53 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:12:05PM -0700, Ricardo Neri wrote:
> Reset the classification history of the current task when switching to
> the next task. Hardware will start anew the classification of the next
> running task.

Please quantify the cost of this HRESET instruction.

> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 2f314b170c9f..74d8ad83e0b3 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -52,6 +52,7 @@
>  #include <asm/switch_to.h>
>  #include <asm/vm86.h>
>  #include <asm/resctrl.h>
> +#include <asm/hreset.h>
>  #include <asm/proto.h>
>  
>  #include "process.h"
> @@ -214,6 +215,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  	/* Load the Intel cache allocation PQR MSR. */
>  	resctrl_sched_in();
>  
> +	reset_hardware_history();
> +
>  	return prev_p;
>  }
>  
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 1962008fe743..0b175f30f359 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -53,6 +53,7 @@
>  #include <asm/xen/hypervisor.h>
>  #include <asm/vdso.h>
>  #include <asm/resctrl.h>
> +#include <asm/hreset.h>
>  #include <asm/unistd.h>
>  #include <asm/fsgsbase.h>
>  #ifdef CONFIG_IA32_EMULATION
> @@ -657,6 +658,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  	/* Load the Intel cache allocation PQR MSR. */
>  	resctrl_sched_in();
>  
> +	reset_hardware_history();
> +
>  	return prev_p;
>  }
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 01/23] sched/task_struct: Introduce classes of tasks
  2022-09-16 14:41     ` Ricardo Neri
@ 2022-09-27 13:01       ` Peter Zijlstra
  2022-10-02 22:32         ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 13:01 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 16, 2022 at 07:41:13AM -0700, Ricardo Neri wrote:

> At least on Intel processors, class 0 is a valid class. The scheduler needs to
> have a notion of unclassified tasks and decide how to handle them, IMO.
> 
> Intel processors currently support 8-bit, unsigned classes. I doubt other
> architectures will ever support more than 256 classes. Short can handle all the
> possible classification values and also the unclassified case.
> 
> On the other hand, class 0 could be the default classification unless hardware
> classifies differently. 0 would be special and need to be documented clearly.
> This would work for Intel processors.

You can always do: class = hw_class + 1; that makes 0 'special' and the
hardware class can be trivially reconstructed by subtracting 1.
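
A sketch of that mapping (helper names invented), with TASK_CLASS_UNCLASSIFIED
defined as 0 under this scheme:

	#define TASK_CLASS_UNCLASSIFIED	0

	static inline short task_class_from_hw(u8 hw_class)
	{
		/* Shift up by one so 0 is free to mean "not yet classified". */
		return (short)hw_class + 1;
	}

	static inline u8 hw_class_from_task_class(short class)
	{
		/* Only meaningful when class != TASK_CLASS_UNCLASSIFIED. */
		return (u8)(class - 1);
	}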

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 03/23] sched/core: Initialize the class of a new task
  2022-09-26 14:57   ` Joel Fernandes
  2022-09-26 21:53     ` Ricardo Neri
@ 2022-09-27 13:04     ` Peter Zijlstra
  2022-09-27 15:48       ` Joel Fernandes
  2022-10-01 20:32       ` Ricardo Neri
  1 sibling, 2 replies; 78+ messages in thread
From: Peter Zijlstra @ 2022-09-27 13:04 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Ricardo Neri, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V. Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J. Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On Mon, Sep 26, 2022 at 02:57:29PM +0000, Joel Fernandes wrote:

> > +#ifdef CONFIG_SCHED_TASK_CLASSES
> > +	p->class			= TASK_CLASS_UNCLASSIFIED;
> > +#endif
> 
> I find the term 'class' very broad and unclear what kind of class (without
> further reading). So I am worried about how this generic term usage plays
> with Linux source code in the long-term (like what if someone else comes up
> with a usage of term 'class' that is unrelated to IPC.)

However much I like making a pain for people using C++ to compile the
kernel, I do think ipcc might be better here
(instructions_per_cycle_class for those of the novel per identifier
school of thought).

> To that end, I was wondering if it could be renamed to p->ipc_class, and
> CONFIG_SCHED_TASK_IPC_CLASSES, or something.

Can we *please* shorten those thing instead of writing a novel?
CONFIG_SCHED_IPC_CLASS works just as well, no? Or TASK_IPC, whatever.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-09-09 23:12 ` [RFC PATCH 23/23] x86/process: Reset hardware history in context switch Ricardo Neri
  2022-09-27 12:52   ` Peter Zijlstra
  2022-09-27 12:53   ` Peter Zijlstra
@ 2022-09-27 13:15   ` Borislav Petkov
  2022-10-02 22:12     ` Ricardo Neri
  2 siblings, 1 reply; 78+ messages in thread
From: Borislav Petkov @ 2022-09-27 13:15 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Fri, Sep 09, 2022 at 04:12:05PM -0700, Ricardo Neri wrote:
> +void reset_hardware_history(void)
> +{
> +	if (!static_cpu_has(X86_FEATURE_HRESET))

In your whole patchset:

	s/static_cpu_has/cpu_feature_enabled/g

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 03/23] sched/core: Initialize the class of a new task
  2022-09-27 13:04     ` Peter Zijlstra
@ 2022-09-27 15:48       ` Joel Fernandes
  2022-10-01 20:32       ` Ricardo Neri
  1 sibling, 0 replies; 78+ messages in thread
From: Joel Fernandes @ 2022-09-27 15:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ricardo Neri, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V. Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J. Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On Tue, Sep 27, 2022 at 03:04:39PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 26, 2022 at 02:57:29PM +0000, Joel Fernandes wrote:
> 
> > > +#ifdef CONFIG_SCHED_TASK_CLASSES
> > > +	p->class			= TASK_CLASS_UNCLASSIFIED;
> > > +#endif
> > 
> > I find the term 'class' very broad and unclear what kind of class (without
> > further reading). So I am worried about how this generic term usage plays
> > with Linux source code in the long-term (like what if someone else comes up
> > with a usage of term 'class' that is unrelated to IPC.)
> 
> However much I like making a pain for people using C++ to compile the
> kernel, I do think ipcc might be better here
> (instructions_per_cycle_class for those of the novel per identifier
> school of thought).

Yes, ipcc sounds fine to me.

> > To that end, I was wondering if it could be renamed to p->ipc_class, and
> > CONFIG_SCHED_TASK_IPC_CLASSES, or something.
> 
> Can we *please* shorten those thing instead of writing a novel?
> CONFIG_SCHED_IPC_CLASS works just as well, no? Or TASK_IPC, whatever.

CONFIG_SCHED_IPC_CLASS also sounds fine, or:

CONFIG_SCHED_IPC_CLASS_SHORTENED_VERSION_TO_NOT_ANNOY_PETER.


thanks,

 - Joel


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 03/23] sched/core: Initialize the class of a new task
  2022-09-27 13:04     ` Peter Zijlstra
  2022-09-27 15:48       ` Joel Fernandes
@ 2022-10-01 20:32       ` Ricardo Neri
  1 sibling, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-01 20:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Joel Fernandes, Juri Lelli, Vincent Guittot, Ricardo Neri,
	Ravi V. Shankar, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Len Brown, Mel Gorman, Rafael J. Wysocki,
	Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Tim C . Chen

On Tue, Sep 27, 2022 at 03:04:39PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 26, 2022 at 02:57:29PM +0000, Joel Fernandes wrote:
> 
> > > +#ifdef CONFIG_SCHED_TASK_CLASSES
> > > +	p->class			= TASK_CLASS_UNCLASSIFIED;
> > > +#endif
> > 
> > I find the term 'class' very broad and unclear what kind of class (without
> > further reading). So I am worried about how this generic term usage plays
> > with Linux source code in the long-term (like what if someone else comes up
> > with a usage of term 'class' that is unrelated to IPC.)
> 
> However much I like making a pain for people using C++ to compile the
> kernel, I do think ipcc might be better here
> (instructions_per_cycle_class for those of the novel per identifier
> school of thought).

Sure, I will use ipcc
> 
> > To that end, I was wondering if it could be renamed to p->ipc_class, and
> > CONFIG_SCHED_TASK_IPC_CLASSES, or something.
> 
> Can we *please* shorten those thing instead of writing a novel?
> CONFIG_SCHED_IPC_CLASS works just as well, no? Or TASK_IPC, whatever.

... and avoid the novel-style identifiers :) .

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-09-27 12:53   ` Peter Zijlstra
@ 2022-10-02 22:02     ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-02 22:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 02:53:29PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:12:05PM -0700, Ricardo Neri wrote:
> > Reset the classification history of the current task when switching to
> > the next task. Hardware will start anew the classification of the next
> > running task.
> 
> Please quantify the cost of this HRESET instruction.

Sure, Peter. I will.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-09-27 13:15   ` Borislav Petkov
@ 2022-10-02 22:12     ` Ricardo Neri
  2022-10-02 22:15       ` Borislav Petkov
  0 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-10-02 22:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 03:15:47PM +0200, Borislav Petkov wrote:
> On Fri, Sep 09, 2022 at 04:12:05PM -0700, Ricardo Neri wrote:
> > +void reset_hardware_history(void)
> > +{
> > +	if (!static_cpu_has(X86_FEATURE_HRESET))
> 
> In your whole patchset:
> 
> 	s/static_cpu_has/cpu_feature_enabled/g

Sure, I can do this, Boris. I guess this implies that I also need to add the
DISABLE_MASK bits. Otherwise, IIUC, cpu_feature_enabled() falls back to
static_cpu_has().

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-10-02 22:12     ` Ricardo Neri
@ 2022-10-02 22:15       ` Borislav Petkov
  2022-10-03 19:49         ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Borislav Petkov @ 2022-10-02 22:15 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Sun, Oct 02, 2022 at 03:12:38PM -0700, Ricardo Neri wrote:
> Sure I can do this, Boris. I guess this implies that I also need to
> add the DISABLE_MASK bits.

Are you adding a CONFIG_ item which can control the DISABLE_MASK bit
too?

> Othewise, IIUC, cpu_feature_enabled() falls back to static_cpu_has().

And?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 01/23] sched/task_struct: Introduce classes of tasks
  2022-09-27 13:01       ` Peter Zijlstra
@ 2022-10-02 22:32         ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-02 22:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 03:01:07PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 16, 2022 at 07:41:13AM -0700, Ricardo Neri wrote:
> 
> > At least on Intel processors, class 0 is a valid class. The scheduler needs to
> > have a notion of unclassified tasks and decide how to handle them, IMO.
> > 
> > Intel processors currently support 8-bit, unsigned classes. I doubt other
> > architectures will ever support more than 256 classes. Short can handle all the
> > possible classification values and also the unclassified case.
> > 
> > On the other hand, class 0 could be the default classification unless hardware
> > classifies differently. 0 would be special and need to be documented clearly.
> > This would work for Intel processors.
> 
> You can always do: class = hw_class + 1; that makes 0 'special' and the
> hardware class can be trivially reconstructed by subtracting 1.

This makes sense to me. I will implement it as you suggest.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 22/23] x86/hreset: Configure history reset
  2022-09-27 12:03   ` Peter Zijlstra
@ 2022-10-02 22:34     ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-02 22:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 02:03:59PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:12:04PM -0700, Ricardo Neri wrote:
> 
> > +static u32 hardware_history_features __read_mostly;
> 
> __ro_after_init ?

Yes! You are correct. The HRESET features are only set once.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-10-02 22:15       ` Borislav Petkov
@ 2022-10-03 19:49         ` Ricardo Neri
  2022-10-03 19:55           ` Borislav Petkov
  0 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-10-03 19:49 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Mon, Oct 03, 2022 at 12:15:32AM +0200, Borislav Petkov wrote:
> On Sun, Oct 02, 2022 at 03:12:38PM -0700, Ricardo Neri wrote:
> > Sure I can do this, Boris. I guess this implies that I also need to
> > add the DISABLE_MASK bits.
> 
> Are you adding a CONFIG_ item which can control the DISABLE_MASK bit
> too?

I am not. I could use CONFIG_INTEL_HFI_THERMAL, as using HRESET only makes
sense when Intel Thread Director is in use (Peter argued in a separate email
against having a config option specific to Intel Thread Director).

> 
> > Otherwise, IIUC, cpu_feature_enabled() falls back to static_cpu_has().
> 
> And?

Since I did not implement a DISABLE_MASK bit nor a CONFIG_ option for
HRESET, I thought that static_cpu_has() was sufficient.

I am not against using cpu_feature_enabled(), I just want to confirm that
I also need to implement the DISABLE_MASK bit and the CONFIG_ option.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-10-03 19:49         ` Ricardo Neri
@ 2022-10-03 19:55           ` Borislav Petkov
  0 siblings, 0 replies; 78+ messages in thread
From: Borislav Petkov @ 2022-10-03 19:55 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Peter Zijlstra (Intel),
	Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Mon, Oct 03, 2022 at 12:49:35PM -0700, Ricardo Neri wrote:
> Since I did not implement a DISABLE_MASK bit nor a CONFIG_ option for
> HRESET, I thought that static_cpu_has() was sufficient.

The only function that should be used to query X86_FEATURE_ flags is
cpu_feature_enabled().

> I am not against using cpu_feature_enabled(), I just want to confirm that
> I also need to implement the DISABLE_MASK bit and the CONFIG_ option.

You don't have to.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-09-27 12:52   ` Peter Zijlstra
@ 2022-10-03 23:07     ` Ricardo Neri
  2022-10-06  8:35       ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-10-03 23:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 02:52:24PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:12:05PM -0700, Ricardo Neri wrote:
> 
> > +void reset_hardware_history(void)
> > +{
> > +	if (!static_cpu_has(X86_FEATURE_HRESET))
> > +		return;

If I used cpu_feature_enabled(X86_FEATURE_ITD) along with
CONFIG_HFI_THERMAL and its corresponding DISABLE_MASK bit, the code below
would be compiled out.

> > +
> > +	asm volatile("mov %0, %%eax;" __ASM_HRESET "\n" : :
> > +		     "r" (hardware_history_features) : "%rax");
> > +}
> 
> 	asm_inline volatile (ALTERNATIVE("", __ASM_HRESET, X86_FEATURE_HRESET)
> 			     : : "a" (hardware_history_features) : "memory");

Do you still prefer to have it implemented as an ALTERNATIVE?

Thanks and BR,
Ricardo
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 09/23] sched/fair: Use task-class performance score to pick the busiest group
  2022-09-27 11:01   ` Peter Zijlstra
@ 2022-10-05 23:38     ` Ricardo Neri
  2022-10-06  8:37       ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-10-05 23:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 01:01:32PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:11:51PM -0700, Ricardo Neri wrote:
> > update_sd_pick_busiest() keeps on selecting as the busiest group scheduling
> > groups of identical priority. Since both groups have the same priority,
> > either group is a good choice. The classes of tasks in the scheduling
> > groups can break this tie.
> > 
> > Pick as busiest the scheduling group that yields a higher task-class
> > performance score after load balancing.
> 
> > +/**
> > + * sched_asym_class_pick - Select a sched group based on classes of tasks
> > + * @a:		A scheduling group
> > + * @b:		A second scheduling group
> > + * @a_stats:	Load balancing statistics of @a
> > + * @b_stats:	Load balancing statistics of @b
> > + *
> > + * Returns: true if @a has the same priority and @a has classes of tasks that
> > + * yield higher overall throughput after load balance. Returns false otherwise.
> > + */
> > +static bool sched_asym_class_pick(struct sched_group *a,
> > +				  struct sched_group *b,
> > +				  struct sg_lb_stats *a_stats,
> > +				  struct sg_lb_stats *b_stats)
> > +{
> > +	/*
> > +	 * Only use the class-specific preference selection if both sched
> > +	 * groups have the same priority.
> > +	 */
> > +	if (arch_asym_cpu_priority(a->asym_prefer_cpu) !=
> > +	    arch_asym_cpu_priority(b->asym_prefer_cpu))
> > +		return false;
> > +
> > +	return sched_asym_class_prefer(a_stats, b_stats);
> > +}
> > +
> >  #else /* CONFIG_SCHED_TASK_CLASSES */
> >  static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs,
> >  					 struct rq *rq)
> 
> > @@ -9049,6 +9111,12 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  		/* Prefer to move from lowest priority CPU's work */
> >  		if (sched_asym_prefer(sg->asym_prefer_cpu, sds->busiest->asym_prefer_cpu))
> >  			return false;
> > +
> > +		/* @sg and @sds::busiest have the same priority. */
> > +		if (sched_asym_class_pick(sds->busiest, sg, &sds->busiest_stat, sgs))
> > +			return false;
> > +
> > +		/* @sg has lower priority than @sds::busiest. */
> >  		break;
> >  
> >  	case group_misfit_task:
> 
> So why does only this one instance of asym_prefer() require tie
> breaking?

This is the only place in which two sched groups with running tasks and of
equal priority are compared.

In all other places sched_asym_prefer() is used to compare the destination
CPU with others. Since asym_packing is done only when the destination CPU is
idle, there is no need to break this tie.

> 
> I must also re-iterate how much I hate having two different means of
> dealing with big-little topologies.

Yes, that is true. We discussed the challenges of using EAS on x86 during
this year's Linux Plumbers Conference. For the discussion at hand, I think
that the most relevant part is the difficulty of assessing CPU capacity
on Intel processors given the presence of SMT, a large range of turbo
frequencies and hardware-controlled frequency scaling.

We are looking into these challenges.

This patchset focuses mainly on the infrastructure to support IPC classes of
tasks in the scheduler. We use it for this tie break when using asym_packing,
but it could be used in capacity computations.

> 
> And while looking through this, I must ask about the comment that goes
> with sched_set_itmt_core_prio() vs the sg->asym_prefer_cpu assignment in
> init_sched_groups_capacity(), what-up ?!

Are you referring to this comment?

	"No need to rebuild sched domain after updating
	 the CPU priorities. The sched domains have no
	 dependency on CPU priorities"

If yes, then it looks wrong to me. Sched domains are rebuilt after updating
priorities.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores
  2022-09-27 11:59   ` Peter Zijlstra
@ 2022-10-05 23:59     ` Ricardo Neri
  2022-10-06  8:52       ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-10-05 23:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 01:59:15PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:11:57PM -0700, Ricardo Neri wrote:
> > Support the arch_get_task_class_score() interface of the scheduler. Use the
> > data that Intel Thread Director provides to inform the scheduler the
> > performance of a class of tasks when placed on a given CPU.
> > 
> 
> > +static void get_one_hfi_cap(struct hfi_instance *hfi_instance, s16 index,
> > +			    struct hfi_cpu_data *hfi_caps, int class)
> > +{
> > +	struct hfi_cpu_data *caps;
> > +
> > +	/* Find the capabilities of @cpu */
> > +	caps = hfi_instance->data + index * hfi_features.cpu_stride +
> > +	       class * hfi_features.class_stride;
> > +	memcpy(hfi_caps, caps, sizeof(*hfi_caps));
> > +}
> > +
> > +int intel_hfi_get_task_class_score(int class, int cpu)
> > +{
> > +	struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, cpu);
> > +	struct hfi_instance *instance;
> > +	struct hfi_cpu_data caps;
> > +	unsigned long flags;
> > +	int cap;
> > +
> > +	if (cpu < 0 || cpu >= nr_cpu_ids)
> > +		return -EINVAL;
> > +
> > +	if (class == TASK_CLASS_UNCLASSIFIED)
> > +		return -EINVAL;
> > +
> > +	if (class >= (int)hfi_features.nr_classes)
> > +		return -EINVAL;
> > +
> > +	instance = info->hfi_instance;
> > +	if (!instance)
> > +		return -ENOENT;
> > +
> > +	raw_spin_lock_irqsave(&instance->table_lock, flags);
> > +	get_one_hfi_cap(instance, info->index, &caps, class);
> > +	cap = caps.perf_cap;
> > +	raw_spin_unlock_irqrestore(&instance->table_lock, flags);
> > +
> > +	return cap;
> > +}
> 
> Does any of that data actually ever change? Isn't the class score fixed
> per CPU type?

Yes, data can change. The Intel SDM Vol 3 Section 14.6.7 states that the
table can be updated during runtime.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 17/23] thermal: intel: hfi: Enable the Intel Thread Director
  2022-09-27 12:00   ` Peter Zijlstra
@ 2022-10-06  1:50     ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-06  1:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 02:00:39PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:11:59PM -0700, Ricardo Neri wrote:
> 
> > +config INTEL_THREAD_DIRECTOR
> > +	bool "Intel Thread Director"
> > +	depends on INTEL_HFI_THERMAL
> > +	depends on SMP
> > +	select SCHED_TASK_CLASSES
> > +	help
> > +	  Select this option to enable the Intel Thread Director. If selected,
> > +	  hardware classifies tasks based on the type of instructions they
> > +	  execute. It also provides performance capabilities for each class of
> > +	  task. On hybrid processors, the scheduler uses this data to place
> > +	  tasks of classes of higher performance on higher-performnance CPUs.
> 
> Do we really need yet another CONFIG symbol for all this!? AFAICT this
> Thread Director crud simply extends the HFI table and doesn't actually
> carry that much code with it.
> 
> Best to always have it on when HFI is on, no?

I decided to add CONFIG_INTEL_THREAD_DIRECTOR mainly to select
CONFIG_IPC_CLASS and include the extra task_struct members only when needed.
Legacy, classless HFI can work with the Thread Director part on some
processors (e.g., Sapphire Rapids or, FWIW, Lakefield).

I could get rid of CONFIG_INTEL_THREAD_DIRECTOR and wrap the Thread
Director code in an #ifdef CONFIG_IPC_CLASS block instead.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-10-03 23:07     ` Ricardo Neri
@ 2022-10-06  8:35       ` Peter Zijlstra
  2022-10-06 22:55         ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-10-06  8:35 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Mon, Oct 03, 2022 at 04:07:58PM -0700, Ricardo Neri wrote:
> On Tue, Sep 27, 2022 at 02:52:24PM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 09, 2022 at 04:12:05PM -0700, Ricardo Neri wrote:
> > 
> > > +void reset_hardware_history(void)
> > > +{
> > > +	if (!static_cpu_has(X86_FEATURE_HRESET))
> > > +		return;
> 
> If I used cpu_feature_enabled(X86_FEATURE_ITD) along with
> CONFIG_HFI_THERMAL and its corresponding DISABLE_MASK bit, the code below
> would be compiled out.

Nobody cares about compiled out -- distros must enable all this. So
what counts is the code size, and the alternative is smaller.

> > > +
> > > +	asm volatile("mov %0, %%eax;" __ASM_HRESET "\n" : :
> > > +		     "r" (hardware_history_features) : "%rax");
> > > +}
> > 
> > 	asm_inline volatile (ALTERNATIVE("", __ASM_HRESET, X86_FEATURE_HRESET)
> > 			     : : "a" (hardware_history_features) : "memory");
> 
> Do you still prefer to have it implemented as an ALTERNATIVE?

Yes, look at the generated code.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 09/23] sched/fair: Use task-class performance score to pick the busiest group
  2022-10-05 23:38     ` Ricardo Neri
@ 2022-10-06  8:37       ` Peter Zijlstra
  2022-10-06 19:07         ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-10-06  8:37 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Wed, Oct 05, 2022 at 04:38:41PM -0700, Ricardo Neri wrote:
> On Tue, Sep 27, 2022 at 01:01:32PM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 09, 2022 at 04:11:51PM -0700, Ricardo Neri wrote:

> > > @@ -9049,6 +9111,12 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > >  		/* Prefer to move from lowest priority CPU's work */
> > >  		if (sched_asym_prefer(sg->asym_prefer_cpu, sds->busiest->asym_prefer_cpu))
> > >  			return false;
> > > +
> > > +		/* @sg and @sds::busiest have the same priority. */
> > > +		if (sched_asym_class_pick(sds->busiest, sg, &sds->busiest_stat, sgs))
> > > +			return false;
> > > +
> > > +		/* @sg has lower priority than @sds::busiest. */
> > >  		break;
> > >  
> > >  	case group_misfit_task:
> > 
> > So why does only this one instance of asym_prefer() require tie
> > breaking?
> 
> This is the only place in which two sched groups with running tasks and of
> equal priority are compared.
> 
> In all other places sched_asym_prefer() is used to compare the destination
> CPU with others. Since asym_packing is done only when the destination CPU is
> idle, there is no need to break this tie.

That would make for a fine comment, no? Because as presented, one is
left wondering why, if asym_prefer() needs tie breaking, only this one
site needs it.

> > And while looking through this, I must ask about the comment that goes
> > with sched_set_itmt_core_prio() vs the sg->asym_prefer_cpu assignment in
> > init_sched_groups_capacity(), what-up ?!
> 
> Are you referring to this comment?
> 
> 	"No need to rebuild sched domain after updating
> 	 the CPU priorities. The sched domains have no
> 	 dependency on CPU priorities"
> 
> If yes, then it looks wrong to me. Sched domains are rebuilt after updating
> priorities.

Right.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores
  2022-10-05 23:59     ` Ricardo Neri
@ 2022-10-06  8:52       ` Peter Zijlstra
  2022-10-06  9:14         ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-10-06  8:52 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Wed, Oct 05, 2022 at 04:59:59PM -0700, Ricardo Neri wrote:
> On Tue, Sep 27, 2022 at 01:59:15PM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 09, 2022 at 04:11:57PM -0700, Ricardo Neri wrote:
> > > Support the arch_get_task_class_score() interface of the scheduler. Use the
> > > data that Intel Thread Director provides to inform the scheduler the
> > > performance of a class of tasks when placed on a given CPU.
> > > 
> > 
> > > +static void get_one_hfi_cap(struct hfi_instance *hfi_instance, s16 index,
> > > +			    struct hfi_cpu_data *hfi_caps, int class)
> > > +{
> > > +	struct hfi_cpu_data *caps;
> > > +
> > > +	/* Find the capabilities of @cpu */
> > > +	caps = hfi_instance->data + index * hfi_features.cpu_stride +
> > > +	       class * hfi_features.class_stride;
> > > +	memcpy(hfi_caps, caps, sizeof(*hfi_caps));
> > > +}
> > > +
> > > +int intel_hfi_get_task_class_score(int class, int cpu)
> > > +{
> > > +	struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, cpu);
> > > +	struct hfi_instance *instance;
> > > +	struct hfi_cpu_data caps;
> > > +	unsigned long flags;
> > > +	int cap;
> > > +
> > > +	if (cpu < 0 || cpu >= nr_cpu_ids)
> > > +		return -EINVAL;
> > > +
> > > +	if (class == TASK_CLASS_UNCLASSIFIED)
> > > +		return -EINVAL;
> > > +
> > > +	if (class >= (int)hfi_features.nr_classes)
> > > +		return -EINVAL;
> > > +
> > > +	instance = info->hfi_instance;
> > > +	if (!instance)
> > > +		return -ENOENT;
> > > +
> > > +	raw_spin_lock_irqsave(&instance->table_lock, flags);
> > > +	get_one_hfi_cap(instance, info->index, &caps, class);
> > > +	cap = caps.perf_cap;
> > > +	raw_spin_unlock_irqrestore(&instance->table_lock, flags);
> > > +
> > > +	return cap;
> > > +}
> > 
> > Does any of that data actually ever change? Isn't the class score fixed
> > per CPU type?
> 
> Yes, data can change. The Intel SDM Vol 3 Section 14.6.7 states that the
> table can be updated during runtime.

I find the SDM is often unreadable gibberish; this part doesn't disappoint.

There's a ton of might and maybe there; what does it actually do and how
often does it do it? Given the thermal interrupt is such a shitshow, we
really, as in *REALLY* don't want this to happen at any frequency at
all.

And if it barely happens, why do we care if it happens at all?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores
  2022-10-06  8:52       ` Peter Zijlstra
@ 2022-10-06  9:14         ` Peter Zijlstra
  2022-10-06 15:05           ` Brown, Len
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-10-06  9:14 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Thu, Oct 06, 2022 at 10:52:16AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 05, 2022 at 04:59:59PM -0700, Ricardo Neri wrote:
> > On Tue, Sep 27, 2022 at 01:59:15PM +0200, Peter Zijlstra wrote:
> > > On Fri, Sep 09, 2022 at 04:11:57PM -0700, Ricardo Neri wrote:
> > > > Support the arch_get_task_class_score() interface of the scheduler. Use the
> > > > data that Intel Thread Director provides to inform the scheduler the
> > > > performance of a class of tasks when placed on a given CPU.
> > > > 
> > > 
> > > > +static void get_one_hfi_cap(struct hfi_instance *hfi_instance, s16 index,
> > > > +			    struct hfi_cpu_data *hfi_caps, int class)
> > > > +{
> > > > +	struct hfi_cpu_data *caps;
> > > > +
> > > > +	/* Find the capabilities of @cpu */
> > > > +	caps = hfi_instance->data + index * hfi_features.cpu_stride +
> > > > +	       class * hfi_features.class_stride;
> > > > +	memcpy(hfi_caps, caps, sizeof(*hfi_caps));
> > > > +}
> > > > +
> > > > +int intel_hfi_get_task_class_score(int class, int cpu)
> > > > +{
> > > > +	struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, cpu);
> > > > +	struct hfi_instance *instance;
> > > > +	struct hfi_cpu_data caps;
> > > > +	unsigned long flags;
> > > > +	int cap;
> > > > +
> > > > +	if (cpu < 0 || cpu >= nr_cpu_ids)
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (class == TASK_CLASS_UNCLASSIFIED)
> > > > +		return -EINVAL;
> > > > +
> > > > +	if (class >= (int)hfi_features.nr_classes)
> > > > +		return -EINVAL;
> > > > +
> > > > +	instance = info->hfi_instance;
> > > > +	if (!instance)
> > > > +		return -ENOENT;
> > > > +
> > > > +	raw_spin_lock_irqsave(&instance->table_lock, flags);
> > > > +	get_one_hfi_cap(instance, info->index, &caps, class);
> > > > +	cap = caps.perf_cap;
> > > > +	raw_spin_unlock_irqrestore(&instance->table_lock, flags);
> > > > +
> > > > +	return cap;
> > > > +}
> > > 
> > > Does any of that data actually ever change? Isn't the class score fixed
> > > per CPU type?
> > 
> > Yes, data can change. The Intel SDM Vol 3 Section 14.6.7 states that the
> > table can be updated during runtime.
> 
> I find the SDM is often unreadable gibberish; this part doesn't disappoint.
> 
> There's a ton of might and maybe there; what does it actually do and how
> often does it do it? Given the thermal interrupt is such a shitshow, we
> really, as in *REALLY* don't want this to happen at any frequency at
> all.
> 
> And if it barely happens, why do we care if it happens at all?

I enabled this HFI crud on my ADL (INTEL_HFI_THERMAL -- because
apparently Debian doesn't default enable this) and now I get all of _1_
interrupt during boot.

Building a kernel on that machine doesn't manage to trip another one.
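
This is the TRM line from /proc/interrupts: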

TRM:          1          1          1          1          1          1          1          1          1          1          1          1          1          1          1          1          1          1          1          1          1          1          1          1   Thermal event interrupts

So yeah.. *can* change, but doesn't.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores
  2022-10-06  9:14         ` Peter Zijlstra
@ 2022-10-06 15:05           ` Brown, Len
  2022-10-06 16:14             ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Brown, Len @ 2022-10-06 15:05 UTC (permalink / raw)
  To: Peter Zijlstra, Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Neri, Ricardo, Shankar, Ravi V,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Mel Gorman, Wysocki, Rafael J, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Chen, Tim C

> > > Does any of that data actually ever change? Isn't the class score 
> > > fixed per CPU type?

Depends on the chip.

As we described at LPC, the ADL chips shipping today update their tables in response
to RAPL working to keep the system below PL1.  Linux or user-space can scribble on PL1
at any time, so technically, this table update can happen at any time.

That said, it is true that, say, an ADL desktop part that operates with plenty of power and cooling
will send the initial table and never have a need to update the table after that.

Upcoming chips are smarter and will give us more dynamic information.
We expect the P-unit to send only "meaningful" changes, and that they
shall not occur more often than every 10ms.

Re: class score fixed per CPU type
Usually (but not always) the class score for every CPU of a type will be
the same. The scores can change, but likely will all change in concert.
However, the scores for the different types can change relative to each
other.

The not-always bit is because CPUs of the same type may not actually be
identical. Recall that ITMT was originally implemented to favor some
Pcores that are faster than other Pcores of the same type. Similarly, I
would expect that Ecores within a module will be uniform, but some Ecore
modules may differ from other modules -- and how they differ can change
at run-time.

Cheers,
-Len




^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores
  2022-10-06 15:05           ` Brown, Len
@ 2022-10-06 16:14             ` Peter Zijlstra
  2022-10-07 11:20               ` Len Brown
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-10-06 16:14 UTC (permalink / raw)
  To: Brown, Len
  Cc: Ricardo Neri, Juri Lelli, Vincent Guittot, Neri, Ricardo,
	Shankar, Ravi V, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Mel Gorman, Wysocki, Rafael J,
	Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Chen, Tim C

On Thu, Oct 06, 2022 at 03:05:18PM +0000, Brown, Len wrote:
> > > > Does any of that data actually ever change? Isn't the class score 
> > > > fixed per CPU type?
> 
> Depends on the chip.
> 
> As we described at LPC, the ADL chips shipping today update their
> tables in response to RAPL working to keep the system below PL1.
> Linux or user-space can scribble on PL1 at any time, so technically,
> this table update can happen at any time.
> 
> That said, it is true that, say, an ADL desktop part that operates
> with plenty of power and cooling will send the initial table and never
> have a need to update the table after that.

I have a NUC, so a laptop part with limited thermals (the Lenovo P360
Ultra was announced just after I ordered the P360 Tiny). Still, I wasn't
able to trigger this during normal operation.

> Upcoming chips are smarter and will give us more dynamic information.
> We expect the P-unit to send only "meaningful" changes, and that they
> Shall not occur more often than every 10ms.

Make *very* sure those upcoming chips don't broadcast that interrupt.
Broadcast interrupts are unconditional crap.

Broadcast interrupts every 10ms is terrifying crap and a good reason for
people to force disable this stuff.



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 09/23] sched/fair: Use task-class performance score to pick the busiest group
  2022-10-06  8:37       ` Peter Zijlstra
@ 2022-10-06 19:07         ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-06 19:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Thu, Oct 06, 2022 at 10:37:52AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 05, 2022 at 04:38:41PM -0700, Ricardo Neri wrote:
> > On Tue, Sep 27, 2022 at 01:01:32PM +0200, Peter Zijlstra wrote:
> > > On Fri, Sep 09, 2022 at 04:11:51PM -0700, Ricardo Neri wrote:
> 
> > > > @@ -9049,6 +9111,12 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > > >  		/* Prefer to move from lowest priority CPU's work */
> > > >  		if (sched_asym_prefer(sg->asym_prefer_cpu, sds->busiest->asym_prefer_cpu))
> > > >  			return false;
> > > > +
> > > > +		/* @sg and @sds::busiest have the same priority. */
> > > > +		if (sched_asym_class_pick(sds->busiest, sg, &sds->busiest_stat, sgs))
> > > > +			return false;
> > > > +
> > > > +		/* @sg has lower priority than @sds::busiest. */
> > > >  		break;
> > > >  
> > > >  	case group_misfit_task:
> > > 
> > > So why does only this one instance of asym_prefer() require tie
> > > breaking?
> > 
> > This is the only place in which two sched groups with running tasks and of
> > equal priority are compared.
> > 
> > In all other places sched_asym_prefer() is used to compare the destination
> > CPU with others. Since asym_packing is done only when the destination CPU is
> > idle, there is no need to break this tie.

> 
> That would make for a fine comment, no? Because as presented, one is
> left wondering why, if asym_prefer() needs tie breaking, only this one
> site needs it.

Sure.  I will add this comment.
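
Something along these lines (wording tentative):

	/*
	 * Everywhere else sched_asym_prefer() compares other CPUs against
	 * an idle destination CPU, so ties cannot occur. This is the only
	 * site that compares two groups which both have running tasks and
	 * equal priority, hence the only one that needs a tie breaker.
	 */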

> 
> > > And while looking through this, I must ask about the comment that goes
> > > with sched_set_itmt_core_prio() vs the sg->asym_prefer_cpu assignment in
> > > init_sched_groups_capacity(), what-up ?!
> > 
> > Are you referring to this comment?
> > 
> > 	"No need to rebuild sched domain after updating
> > 	 the CPU priorities. The sched domains have no
> > 	 dependency on CPU priorities"
> > 
> > If yes, then it looks wrong to me. Sched domains are rebuilt after updating
> > priorities.

I can include in the series a patch removing this comment.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 23/23] x86/process: Reset hardware history in context switch
  2022-10-06  8:35       ` Peter Zijlstra
@ 2022-10-06 22:55         ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-06 22:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Thu, Oct 06, 2022 at 10:35:36AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 03, 2022 at 04:07:58PM -0700, Ricardo Neri wrote:
> > On Tue, Sep 27, 2022 at 02:52:24PM +0200, Peter Zijlstra wrote:
> > > On Fri, Sep 09, 2022 at 04:12:05PM -0700, Ricardo Neri wrote:
> > > 
> > > > +void reset_hardware_history(void)
> > > > +{
> > > > +	if (!static_cpu_has(X86_FEATURE_HRESET))
> > > > +		return;
> > 
> > If I used cpu_feature_enabled(X86_FEATURE_ITD) along with
> > CONFIG_HFI_THERMAL and its corresponding DISABLE_MASK bit, the code
> > below would be compiled out.
> 
> Nobody cares about compiled out -- distros must enable all this.

I see.

> So
> what counts is the code size, and the alternative is smaller.
> 
> > > > +
> > > > +	asm volatile("mov %0, %%eax;" __ASM_HRESET "\n" : :
> > > > +		     "r" (hardware_history_features) : "%rax");
> > > > +}
> > > 
> > > 	asm_inline volatile (ALTERNATIVE("", __ASM_HRESET, X86_FEATURE_HRESET)
> > > 			     : : "a" (hardware_history_features) : "memory");
> > 
> > Do you still prefer to have it implemented as an ALTERNATIVE?
> 
> Yes, look at the generated code.

I did compare the sizes of the two options; several NOPs are added at
the end. I will take your code.
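
So the function would end up roughly as (a sketch combining the original
function with your ALTERNATIVE):

	void reset_hardware_history(void)
	{
		asm_inline volatile (ALTERNATIVE("", __ASM_HRESET, X86_FEATURE_HRESET)
				     : : "a" (hardware_history_features) : "memory");
	}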

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores
  2022-10-06 16:14             ` Peter Zijlstra
@ 2022-10-07 11:20               ` Len Brown
  0 siblings, 0 replies; 78+ messages in thread
From: Len Brown @ 2022-10-07 11:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Brown, Len, Ricardo Neri, Juri Lelli, Vincent Guittot, Neri,
	Ricardo, Shankar, Ravi V, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Mel Gorman, Wysocki, Rafael J,
	Srinivas Pandruvada, Steven Rostedt, Tim Chen,
	Valentin Schneider, x86, linux-kernel, Chen, Tim C

On Thu, Oct 6, 2022 at 6:15 PM Peter Zijlstra <peterz@infradead.org> wrote:

> > That said, it is true that, say, an ADL desktop part that operates
> > with plenty of power and cooling will send the initial table and never
> > have a need to update the table after that.
>
> I have a NUC, so laptop part with limited thermals (the Lenovo P360
> Ultra was announed just after I ordered the P360 Tiny). Still I wasn't
> able to trigger this during normal operation.

To force an update, scribble on the software power limit:

# echo 0 >   /sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw
# echo 0 >   /sys/class/powercap/intel-rapl:0/constraint_1_power_limit_uw

Note that your P-unit may not instantaneously respond to this stimulus.

> > Upcoming chips are smarter and will give us more dynamic information.
> > We expect the P-unit to send only "meaningful" changes, and that they
> > Shall not occur more often than every 10ms.
>
> Make *very* sure those upcoming chips don't broadcast that interrupt.
> Broadcast interrupts are unconditional crap.
>
> Broadcast interrupts every 10ms is terrifying crap and a good reason for
> people to force disable this stuff.

The package-wide broadcast interrupt is indeed ugly, and we've
been screaming bloody murder about it from day 1.  Unfortunately
the chip pipeline is long and deep, and so more chips with this issue will
materialize before the broadcast interrupt issue completely goes away.
--
Len Brown, Intel

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 14/23] thermal: intel: hfi: Update the class of the current task
  2022-09-27 11:46   ` Peter Zijlstra
@ 2022-10-07 20:34     ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-07 20:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 01:46:59PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:11:56PM -0700, Ricardo Neri wrote:
> > +union hfi_thread_feedback_char_msr {
> > +	struct {
> > +		u8	classid;
> > +		u64	__reserved:55;
> > +		u8	valid:1;
> > +	} split;
> > +	u64 full;
> > +};
> 
> Urgh, did you perhaps mean:
> 
> 	struct {
> 		u64	classid    :8;
> 		u64	__reserved :55;
> 		u64	valid      :1
> 	};
> 
> ?
> 
> Because yes, GCC does fold that into a single u64, but that's
> implementation defined behaviour; the C spec doesn't require one to pack
> adjacent bitfields of different types together.
> 
> I layout of:
> 
> 	u8	class;			// offset 0
> 	u64	__reserver : 55;	// offset 8
> 	u8	valid : 1;		// offset 16
> 
> with a total size of 24 bytes is, AFAIU, a valid result of what you
> wrote.

I checked the C99 and C11 specs and, IIUC, they do prescribe how to
handle adjacent bit-fields:

	"An implementation may allocate any addressable storage unit large
	 enough to hold a bitfield. If enough space remains, a bit-field
	 that immediately follows another bit-field in a structure shall be
	 packed into adjacent bits of the same unit."

Hence, __reserved and valid should be packed. classid, however, is not
guaranteed to be adjacent to __reserved.

I will implement the struct as you have described.
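
That is, something like this (a sketch with all fields declared u64 so
the compiler must pack them into a single storage unit):

	union hfi_thread_feedback_char_msr {
		struct {
			u64	classid    : 8;
			u64	__reserved : 55;
			u64	valid      : 1;
		} split;
		u64 full;
	};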

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 10/23] sched/fair: Use classes of tasks when selecting a busiest runqueue
  2022-09-27 11:25   ` Peter Zijlstra
@ 2022-10-07 23:36     ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-07 23:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 01:25:57PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:11:52PM -0700, Ricardo Neri wrote:
> 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 7368a0b453ee..085b1f75d90b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10009,6 +10009,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >  	int i;
> >  
> >  	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
> > +		int busiest_class_delta_score = INT_MIN;
> 
> You sure you want to break ties toward a negative IPC 'improvement' ?

Yes. We may have a negative IPC class delta if the destination CPU has
lower priority than the CPUs in the busiest group. This may occur when
the busiest group is a core with more than one busy SMT sibling.
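
For example (made-up scores): if the task's class scores 80 on its
current CPU but 60 on the idle, lower-priority destination CPU, the
delta is 60 - 80 = -20; the move can still be a win because it relieves
the busy SMT siblings of the busiest core.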

> 
> >  		unsigned long capacity, load, util;
> >  		unsigned int nr_running;
> >  		enum fbq_type rt;
> > @@ -10118,6 +10119,20 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >  			if (busiest_nr < nr_running) {
> >  				busiest_nr = nr_running;
> >  				busiest = rq;
> > +			} else if (sched_task_classes_enabled() &&
> > +				   busiest_nr == nr_running) {
> > +				int curr_class_delta_score;
> > +
> > +				curr_class_delta_score = arch_get_task_class_score(rq->curr->class,
> > +										   env->dst_cpu) -
> > +							 arch_get_task_class_score(rq->curr->class,
> > +										   cpu_of(rq));
> 
> *sigh*, you really couldn't find a more ugly layout if you tried.
> 
> Perhaps use less than novella length identifiers?

Sure, I will revisit the naming of the identifiers.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 18/23] sched/task_struct: Add helpers for task classification
  2022-09-27 11:52   ` Peter Zijlstra
@ 2022-10-08  0:38     ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-08  0:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 01:52:24PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:12:00PM -0700, Ricardo Neri wrote:
> 
> >  #ifdef CONFIG_SCHED_TASK_CLASSES
> >  	short				class;
> > +	short				class_candidate;
> > +	char				class_debounce_counter;
> >  #endif

> 
> That's 5 bytes; is that really needed? 4 would be so much better :-)

It can probably be implemented as:

u32	class : 9,
	class_candidate : 9,
	class_debounce_counter : 14;

9 bits accommodate the 256 possible classes + 'unclassified' that the
existing hardware can support (I doubt there will ever be this many
classes). The remaining bits can be used for debouncing, or even have
some reserved for future use.
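
A rough sketch with a build-time check (the standalone struct name is
hypothetical, used here only to illustrate the packing):

	struct task_class_bits {
		u32	class			: 9,
			class_candidate		: 9,
			class_debounce_counter	: 14;
	};
	static_assert(sizeof(struct task_class_bits) == sizeof(u32));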

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Trying to apply patch set
  2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
                   ` (23 preceding siblings ...)
       [not found] ` <20220910072120.2651-1-hdanton@sina.com>
@ 2022-10-11 19:12 ` Carlos Bilbao
  2022-10-18  2:31   ` Ricardo Neri
  24 siblings, 1 reply; 78+ messages in thread
From: Carlos Bilbao @ 2022-10-11 19:12 UTC (permalink / raw)
  To: ricardo.neri-calderon
  Cc: bristot, bsegall, dietmar.eggemann, juri.lelli, len.brown,
	linux-kernel, mgorman, peterz, rafael.j.wysocki, ravi.v.shankar,
	ricardo.neri, rostedt, srinivas.pandruvada, tim.c.chen,
	vincent.guittot, vschneid, x86, Carlos Bilbao

Could you please tell me on top of what tree this applies? It would be
great if you could provide a link to a patched repository, given the
dependencies on other patch sets.

Thanks in advance,
Carlos

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: Trying to apply patch set
  2022-10-11 19:12 ` Trying to apply patch set Carlos Bilbao
@ 2022-10-18  2:31   ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-18  2:31 UTC (permalink / raw)
  To: Carlos Bilbao
  Cc: bristot, bsegall, dietmar.eggemann, juri.lelli, len.brown,
	linux-kernel, mgorman, peterz, rafael.j.wysocki, ravi.v.shankar,
	ricardo.neri, rostedt, srinivas.pandruvada, tim.c.chen,
	vincent.guittot, vschneid, x86

On Tue, Oct 11, 2022 at 02:12:21PM -0500, Carlos Bilbao wrote:
> Could you please tell me on top of what tree does this apply? It would be
> great if you could provide a link to a patched repository, given the
> depedencies on other patch sets.

Hi Carlos,

These patches apply on top of v6.0. Also, you need to apply these patches
first:

https://lore.kernel.org/lkml/20220825225529.26465-1-ricardo.neri-calderon@linux.intel.com/

I will push these patches to a repo to make it convenient.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 08/23] sched/fair: Compute task-class performance scores for load balancing
  2022-09-27  9:15   ` Peter Zijlstra
@ 2022-10-26  3:57     ` Ricardo Neri
  2022-10-26  8:55       ` Peter Zijlstra
  0 siblings, 1 reply; 78+ messages in thread
From: Ricardo Neri @ 2022-10-26  3:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 11:15:46AM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:11:50PM -0700, Ricardo Neri wrote:
> 
> > +static void compute_ilb_sg_task_class_scores(struct sg_lb_task_class_stats *class_sgs,
> > +					     struct sg_lb_stats *sgs,
> > +					     int dst_cpu)
> > +{
> > +	int group_score, group_score_without, score_on_dst_cpu;
> > +	int busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > +
> > +	if (!sched_task_classes_enabled())
> > +		return;
> > +
> > +	/* No busy CPUs in the group. No tasks to move. */
> > +	if (!busy_cpus)
> > +		return;
> > +
> > +	score_on_dst_cpu = arch_get_task_class_score(class_sgs->p_min_score->class,
> > +						     dst_cpu);
> > +
> > +	/*
> > +	 * The simpest case. The single busy CPU in the current group will
> > +	 * become idle after pulling its current task. The destination CPU is
> > +	 * idle.
> > +	 */
> > +	if (busy_cpus == 1) {
> > +		sgs->task_class_score_before = class_sgs->sum_score;
> > +		sgs->task_class_score_after = score_on_dst_cpu;
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * Now compute the group score with and without the task with the
> > +	 * lowest score. We assume that the tasks that remain in the group share
> > +	 * the CPU resources equally.
> > +	 */
> > +	group_score = class_sgs->sum_score / busy_cpus;
> > +
> > +	group_score_without =  (class_sgs->sum_score - class_sgs->min_score) /
> > +			       (busy_cpus - 1);
> > +
> > +	sgs->task_class_score_after = group_score_without + score_on_dst_cpu;
> > +	sgs->task_class_score_before = group_score;
> > +}

(I am sorry, Peter, I just found that several emails were sitting in my
drafts directory.)

> 
> That's just plain broken; also lots of cleanups done...

Thank you very much for your suggestions. They make sense to me. I only
have a comment...

Do you want me to add your Signed-off-by and Co-developed-by tags?

> 
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8405,12 +8405,14 @@ struct sg_lb_stats {
>  	enum group_type group_type;
>  	unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */
>  	unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */
> -	long task_class_score_after; /* Prospective task-class score after load balancing */
> -	long task_class_score_before; /* Task-class score before load balancing */
>  #ifdef CONFIG_NUMA_BALANCING
>  	unsigned int nr_numa_running;
>  	unsigned int nr_preferred_running;
>  #endif
> +#ifdef CONFIG_SCHED_TASK_CLASSES
> +	long task_class_score_after;  /* Prospective task-class score after load balancing */
> +	long task_class_score_before; /* Task-class score before load balancing */
> +#endif
>  };
>  
>  /*
> @@ -8689,58 +8691,54 @@ group_type group_classify(unsigned int i
>  }
>  
>  struct sg_lb_task_class_stats {
> -	/*
> -	 * Score of the task with lowest score among the current tasks (i.e.,
> -	 * runqueue::curr) of all runqueues in the scheduling group.
> -	 */
> -	int min_score;
> -	/*
> -	 * Sum of the scores of the current tasks of all runqueues in the
> -	 * scheduling group.
> -	 */
> -	long sum_score;
> -	/* The task with score equal to @min_score */
> -	struct task_struct *p_min_score;
> +	int min_score;                   /* Min(rq->curr->score) */
> +	int min_class;
> +	long sum_score;                  /* Sum(rq->curr->score) */
>  };
>  
>  #ifdef CONFIG_SCHED_TASK_CLASSES
> -static void init_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs)
> +static void init_sg_lb_task_class_stats(struct sg_lb_task_class_stats *sgcs)
>  {
> -	class_sgs->min_score = INT_MAX;
> -	class_sgs->sum_score = 0;
> -	class_sgs->p_min_score = NULL;
> +	*sgcs = (struct sg_lb_task_class_stats){
> +		.min_score = INT_MAX,
> +	};
>  }
>  
>  /** Called only if cpu_of(@rq) is not idle and has tasks running. */
> -static void update_rq_task_classes_stats(struct sg_lb_task_class_stats *class_sgs,
> -					 struct rq *rq)
> +static void update_sg_lb_task_class_stats(struct sg_lb_task_class_stats *sgcs,
> +					  struct rq *rq)
>  {
> -	int score;
> +	struct task_struct *curr;
> +	int class, score;
>  
>  	if (!sched_task_classes_enabled())
>  		return;
>  
> +	curr = rcu_dereference(rq->curr);
> +	if (!curr || (curr->flags & PF_EXITING) || is_idle_task(curr))
> +		return;
> +
>  	/*
>  	 * TODO: if nr_running > 1 we may want go through all the tasks behind
>  	 * rq->curr.
>  	 */
> -	score = arch_get_task_class_score(rq->curr->class, cpu_of(rq));
> -
> -	class_sgs->sum_score += score;
> +	class = curr->class;
> +	score = arch_get_task_class_score(class, cpu_of(rq));
>  
> -	if (score >= class_sgs->min_score)
> -		return;
> +	sgcs->sum_score += score;
>  
> -	class_sgs->min_score = score;
> -	class_sgs->p_min_score = rq->curr;
> +	if (score < sgcs->min_score) {
> +		sgcs->min_score = score;
> +		sgcs->min_class = class;
> +	}
>  }
>  
> -static void compute_ilb_sg_task_class_scores(struct sg_lb_task_class_stats *class_sgs,
> -					     struct sg_lb_stats *sgs,
> -					     int dst_cpu)
> +static void update_sg_lb_stats_scores(struct sg_lb_task_class_stats *sgcs,
> +				      struct sg_lb_stats *sgs,
> +				      int dst_cpu)
>  {
> -	int group_score, group_score_without, score_on_dst_cpu;
>  	int busy_cpus = sgs->group_weight - sgs->idle_cpus;
> +	int score_on_dst_cpu;
> +	long before, after;
>  
>  	if (!sched_task_classes_enabled())
>  		return;
> @@ -8749,32 +8747,18 @@ static void compute_ilb_sg_task_class_sc
>  	if (!busy_cpus)
>  		return;
>  
> -	score_on_dst_cpu = arch_get_task_class_score(class_sgs->p_min_score->class,
> -						     dst_cpu);
> +	score_on_dst_cpu = arch_get_task_class_score(sgcs->min_class, dst_cpu);
>  
> -	/*
> -	 * The simpest case. The single busy CPU in the current group will
> -	 * become idle after pulling its current task. The destination CPU is
> -	 * idle.
> -	 */
> -	if (busy_cpus == 1) {
> -		sgs->task_class_score_before = class_sgs->sum_score;
> -		sgs->task_class_score_after = score_on_dst_cpu;
> -		return;
> -	}
> +	before = sgcs->sum_score
> +	after  = before - sgcs->min_score + score_on_dst_cpu;

This works when the sched group being evaluated has only one busy CPU
because it will become idle if the destination CPU (which was idle) pulls
the current task.

>  
> -	/*
> -	 * Now compute the group score with and without the task with the
> -	 * lowest score. We assume that the tasks that remain in the group share
> -	 * the CPU resources equally.
> -	 */
> -	group_score = class_sgs->sum_score / busy_cpus;
> -
> -	group_score_without =  (class_sgs->sum_score - class_sgs->min_score) /
> -			       (busy_cpus - 1);
> +	if (busy_cpus > 1) {
> +		before /= busy_cpus;
> +		after  /= busy_cpus;


However, I don't think this works when the sched group has more than one
busy CPU. 'before' and 'after' reflect the total throughput score of both
the sched group *and* the destination CPU.

One of the CPUs in the sched group will become idle after the balance.

Also, at this point we have already added score_on_dst_cpu. We are incorrectly
scaling it by the number of busy CPUs in the sched group.

We instead must scale 'after' by busy_cpus - 1 and then add score_on_dst_cpu.
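
That is (sketch, inside the busy_cpus > 1 branch):

	after = (sgcs->sum_score - sgcs->min_score) / (busy_cpus - 1) +
		score_on_dst_cpu;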

> +	}
>  
> -	sgs->task_class_score_after = group_score_without + score_on_dst_cpu;
> -	sgs->task_class_score_before = group_score;
> +	sgs->task_class_score_before = before;
> +	sgs->task_class_score_after  = after;
>  

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 12/23] thermal: intel: hfi: Convert table_lock to use flags-handling variants
  2022-09-27 11:34   ` Peter Zijlstra
  2022-09-27 11:36     ` Peter Zijlstra
@ 2022-10-26  3:58     ` Ricardo Neri
  1 sibling, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-26  3:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 01:34:07PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 09, 2022 at 04:11:54PM -0700, Ricardo Neri wrote:
> 
> > --- a/drivers/thermal/intel/intel_hfi.c
> > +++ b/drivers/thermal/intel/intel_hfi.c
> > @@ -175,9 +175,10 @@ static struct workqueue_struct *hfi_updates_wq;
> >  static void get_hfi_caps(struct hfi_instance *hfi_instance,
> >  			 struct thermal_genl_cpu_caps *cpu_caps)
> >  {
> > +	unsigned long flags;
> >  	int cpu, i = 0;
> >  
> > -	raw_spin_lock_irq(&hfi_instance->table_lock);
> > +	raw_spin_lock_irqsave(&hfi_instance->table_lock, flags);
> >  	for_each_cpu(cpu, hfi_instance->cpus) {
> >  		struct hfi_cpu_data *caps;
> >  		s16 index;

(Another email I thought I had sent but did not. Sorry!)

> 
> ^^^^ Anti-pattern alert!
> 
> Now your IRQ latency depends on nr_cpus -- which is a fair fail. The
> existing code is already pretty crap in that it has the preemption
> latency depend on nr_cpus.

I see.

> 
> While I'm here looking at the HFI stuff, did they fix that HFI interrupt
> broadcast mess already? Sending an interrupt to *all* CPUs is quite
> insane.

This issue has been raised with hardware teams and they are looking into
fixes. The issue, however, may persist on several models while a fix
propagates.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 12/23] thermal: intel: hfi: Convert table_lock to use flags-handling variants
  2022-09-27 11:36     ` Peter Zijlstra
@ 2022-10-26  3:59       ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-26  3:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Sep 27, 2022 at 01:36:53PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 27, 2022 at 01:34:07PM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 09, 2022 at 04:11:54PM -0700, Ricardo Neri wrote:
> > 
> > > --- a/drivers/thermal/intel/intel_hfi.c
> > > +++ b/drivers/thermal/intel/intel_hfi.c
> > > @@ -175,9 +175,10 @@ static struct workqueue_struct *hfi_updates_wq;
> > >  static void get_hfi_caps(struct hfi_instance *hfi_instance,
> > >  			 struct thermal_genl_cpu_caps *cpu_caps)
> > >  {
> > > +	unsigned long flags;
> > >  	int cpu, i = 0;
> > >  
> > > -	raw_spin_lock_irq(&hfi_instance->table_lock);
> > > +	raw_spin_lock_irqsave(&hfi_instance->table_lock, flags);
> > >  	for_each_cpu(cpu, hfi_instance->cpus) {
> > >  		struct hfi_cpu_data *caps;
> > >  		s16 index;
> > 
> > ^^^^ Anti-pattern alert!
> > 
> > Now your IRQ latency depends on nr_cpus -- which is a fair fail. The
> > existing code is already pretty crap in that it has the preemption
> > latency depend on nr_cpus.
> > 
> > While I'm here looking at the HFI stuff, did they fix that HFI interrupt
> > broadcast mess already? Sending an interrupt to *all* CPUs is quite
> > insane.
> 
> Anyway; given the existence of this here loop; why not have something
> like:
> 
> DEFINE_PER_CPU(int, hfi_ipc_class);
> 
> 	class = // extract from HFI mess
> 	WRITE_ONCE(per_cpu(hfi_ipc_class, cpu), class);
> 
> And then have the tick use this_cpu_read(hfi_ipc_class)? No extra
> locking required.

Thanks Peter. I think this is a good solution. I will implement it.
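
A sketch of the read side, assuming the HFI update path does the
WRITE_ONCE() you show above:

	/* In the tick path; a plain aligned int read needs no locking. */
	int class = this_cpu_read(hfi_ipc_class);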

BR,
Ricardo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 08/23] sched/fair: Compute task-class performance scores for load balancing
  2022-10-26  3:57     ` Ricardo Neri
@ 2022-10-26  8:55       ` Peter Zijlstra
  2022-10-27  3:30         ` Ricardo Neri
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Zijlstra @ 2022-10-26  8:55 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Tue, Oct 25, 2022 at 08:57:24PM -0700, Ricardo Neri wrote:

> Do you want me to add your Signed-off-by and Co-developed-by tags?

Nah; who cares ;-)


> > @@ -8749,32 +8747,18 @@ static void compute_ilb_sg_task_class_sc
> >  	if (!busy_cpus)
> >  		return;
> >  
> > -	score_on_dst_cpu = arch_get_task_class_score(class_sgs->p_min_score->class,
> > -						     dst_cpu);
> > +	score_on_dst_cpu = arch_get_task_class_score(sgcs->min_class, dst_cpu);
> >  
> > -	/*
> > -	 * The simpest case. The single busy CPU in the current group will
> > -	 * become idle after pulling its current task. The destination CPU is
> > -	 * idle.
> > -	 */
> > -	if (busy_cpus == 1) {
> > -		sgs->task_class_score_before = class_sgs->sum_score;
> > -		sgs->task_class_score_after = score_on_dst_cpu;
> > -		return;
> > -	}
> > +	before = sgcs->sum_score;
> > +	after  = before - sgcs->min_score + score_on_dst_cpu;
> 
> This works when the sched group being evaluated has only one busy CPU
> because it will become idle if the destination CPU (which was idle) pulls
> the current task.
> 
> >  
> > -	/*
> > -	 * Now compute the group score with and without the task with the
> > -	 * lowest score. We assume that the tasks that remain in the group share
> > -	 * the CPU resources equally.
> > -	 */
> > -	group_score = class_sgs->sum_score / busy_cpus;
> > -
> > -	group_score_without =  (class_sgs->sum_score - class_sgs->min_score) /
> > -			       (busy_cpus - 1);
> > +	if (busy_cpus > 1) {
> > +		before /= busy_cpus;
> > +		after  /= busy_cpus;
> 
> 
> However, I don't think this works when the sched group has more than one
> busy CPU. 'before' and 'after' reflect the total throughput score of both
> the sched group *and* the destination CPU.
> 
> One of the CPUs in the sched group will become idle after the balance.
> 
> Also, at this point we have already added score_on_dst_cpu. We are incorrectly
> scaling it by the number of busy CPUs in the sched group.
> 
> We instead must scale 'after' by busy_cpus - 1 and then add score_on_dst_cpu.

So none of that makes sense.

'x/n + y' != '(x+y)/(n+1)'

IOW:

> > +	}
> >  
> > -	sgs->task_class_score_after = group_score_without + score_on_dst_cpu;
> > -	sgs->task_class_score_before = group_score;
> > +	sgs->task_class_score_before = before;
> > +	sgs->task_class_score_after  = after;

your task_class_score_after is a sum value for 2 cpus worth, not a value
for a single cpu, while your task_class_score_before is a single cpu
average. You can't compare these numbers and have a sensible outcome.

If you have a number of values: x_1....x_n, their average is
Sum(x_1...x_n) / n, which is a single value again.

If you want to update one of the x's, say x_i->x'_i, then the average
changes like:

	Sum(x_1...x_n-x_i+x'_i) / n

If you want to remove one of the x's, then you get:

	Sum(x_1...x_n-x_i) / (n-1) ; 0<i<n

if you want to add an x:

	Sum(x_1...x_n+i_i) / (n+1) ; i>n

Nowhere would you ever get:

	Sum(x_1...x_n) / n + x_i

That's just straight up nonsense.
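
(Concretely: for x = {10, 20} and x_i = 30, 'Sum/n + x_i' gives
15 + 30 = 45, while '(Sum+x_i)/(n+1)' gives 60/3 = 20; these are not
even the same kind of quantity.)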

So I might buy an argument for:

	if (busy_cpus > 1) {
		before /= (busy_cpus-1);
		after  /= (busy_cpus+1);
	}

Or something along those lines (where you remove an entry from before
and add it to after), but not this.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [RFC PATCH 08/23] sched/fair: Compute task-class performance scores for load balancing
  2022-10-26  8:55       ` Peter Zijlstra
@ 2022-10-27  3:30         ` Ricardo Neri
  0 siblings, 0 replies; 78+ messages in thread
From: Ricardo Neri @ 2022-10-27  3:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
	Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann,
	Len Brown, Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Tim C . Chen

On Wed, Oct 26, 2022 at 10:55:11AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 25, 2022 at 08:57:24PM -0700, Ricardo Neri wrote:
> 
> > Do you want me to add your Signed-off-by and Co-developed-by tags?
> 
> Nah; who cares ;-)
> 
> 
> > > @@ -8749,32 +8747,18 @@ static void compute_ilb_sg_task_class_sc
> > >  	if (!busy_cpus)
> > >  		return;
> > >  
> > > -	score_on_dst_cpu = arch_get_task_class_score(class_sgs->p_min_score->class,
> > > -						     dst_cpu);
> > > +	score_on_dst_cpu = arch_get_task_class_score(sgcs->min_class, dst_cpu);
> > >  
> > > -	/*
> > > -	 * The simpest case. The single busy CPU in the current group will
> > > -	 * become idle after pulling its current task. The destination CPU is
> > > -	 * idle.
> > > -	 */
> > > -	if (busy_cpus == 1) {
> > > -		sgs->task_class_score_before = class_sgs->sum_score;
> > > -		sgs->task_class_score_after = score_on_dst_cpu;
> > > -		return;
> > > -	}
> > > +	before = sgcs->sum_score;
> > > +	after  = before - sgcs->min_score + score_on_dst_cpu;
> > 
> > This works when the sched group being evaluated has only one busy CPU
> > because it will become idle if the destination CPU (which was idle) pulls
> > the current task.
> > 
> > >  
> > > -	/*
> > > -	 * Now compute the group score with and without the task with the
> > > -	 * lowest score. We assume that the tasks that remain in the group share
> > > -	 * the CPU resources equally.
> > > -	 */
> > > -	group_score = class_sgs->sum_score / busy_cpus;
> > > -
> > > -	group_score_without =  (class_sgs->sum_score - class_sgs->min_score) /
> > > -			       (busy_cpus - 1);
> > > +	if (busy_cpus > 1) {
> > > +		before /= busy_cpus;
> > > +		after  /= busy_cpus;
> > 
> > 
> > However, I don't think this works when the sched group has more than one
> > busy CPU. 'before' and 'after' reflect the total throughput score of both
> > the sched group *and* the destination CPU.
> > 
> > One of the CPUs in the sched group will become idle after the balance.
> > 
> > Also, at this point we have already added score_on_dst_cpu. We are incorrectly
> > scaling it by the number of busy CPUs in the sched group.
> > 
> > We instead must scale 'after' by busy_cpus - 1 and then add score_on_dst_cpu.
> 
> So none of that makes sense.
> 
> 'x/n + y' != '(x+y)/(n+1)'
> 
> IOW:
> 
> > > +	}
> > >  
> > > -	sgs->task_class_score_after = group_score_without + score_on_dst_cpu;
> > > -	sgs->task_class_score_before = group_score;
> > > +	sgs->task_class_score_before = before;
> > > +	sgs->task_class_score_after  = after;
> 
> your task_class_score_after is a sum value for 2 cpus worth, not a value
> for a single cpu,

Agreed.

> while your task_class_score_before is a single cpu
> average.

Agreed. You can also regard task_class_score_before as a value for 2
CPUs' worth, except that the contribution of dst_cpu to throughput is 0.

> You can't compare these numbers and have a sensible outcome.
> 
> If you have a number of values: x_1....x_n, their average is
> Sum(x_1...x_n) / n, which is a single value again.
> 
> If you want to update one of the x's, say x_i->x'_i, then the average
> changes like:
> 
> 	Sum(x_1...x_n-x_i+x'_i) / n
> 
> If you want to remove one of the x's, then you get:
> 
> 	Sum(x_1...x_n-x_i) / (n-1) ; 0<i<n
> 
> if you want to add an x:
> 
> 	Sum(x_1...x_n+x_i) / (n+1) ; i>n
> 
> Nowhere would you ever get:
> 
> 	Sum(x_1...x_n) / n + x_i
> 
> That's just straight up nonsense.

But we are not computing the average throughput. We are computing the
*total* throughput of two CPUs. Hence, what we need is the sum of the
throughput scores of both CPUs.

We may be here because the sched group of sgcs is composed of SMT
siblings. Hence, we divide by busy_cpus assuming that all busy siblings
share the core resources evenly. (For non-SMT sched groups, busy_cpus is 1
at most).

> 
> So I might buy an argument for:
> 
> 	if (busy_cpus > 1) {
> 		before /= (busy_cpus-1);
> 		after  /= (busy_cpus+1);
> 	}
> 
> Or something along those lines (where you remove an entry from before
> and add it to after), but not this.

The entry that we remove from 'before' will go to 'after': the entry will
be placed in the local group. This group has a different busy_cpus: all of
its SMT siblings, if any, are idle (otherwise, we would not be here). Its
busy_cpus will become 1 after the balance.

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread

Thread overview: 78+ messages
2022-09-09 23:11 [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 01/23] sched/task_struct: Introduce classes of tasks Ricardo Neri
2022-09-14 13:46   ` Peter Zijlstra
2022-09-16 14:41     ` Ricardo Neri
2022-09-27 13:01       ` Peter Zijlstra
2022-10-02 22:32         ` Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 02/23] sched: Add interfaces for " Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 03/23] sched/core: Initialize the class of a new task Ricardo Neri
2022-09-26 14:57   ` Joel Fernandes
2022-09-26 21:53     ` Ricardo Neri
2022-09-27 13:04     ` Peter Zijlstra
2022-09-27 15:48       ` Joel Fernandes
2022-10-01 20:32       ` Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 04/23] sched/core: Add user_tick as argument to scheduler_tick() Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 05/23] sched/core: Move is_core_idle() out of fair.c Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 06/23] sched/core: Update the classification of the current task Ricardo Neri
2022-09-14 13:44   ` Peter Zijlstra
2022-09-16 14:42     ` Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 07/23] sched/fair: Collect load-balancing stats for task classes Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 08/23] sched/fair: Compute task-class performance scores for load balancing Ricardo Neri
2022-09-27  9:15   ` Peter Zijlstra
2022-10-26  3:57     ` Ricardo Neri
2022-10-26  8:55       ` Peter Zijlstra
2022-10-27  3:30         ` Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 09/23] sched/fair: Use task-class performance score to pick the busiest group Ricardo Neri
2022-09-27 11:01   ` Peter Zijlstra
2022-10-05 23:38     ` Ricardo Neri
2022-10-06  8:37       ` Peter Zijlstra
2022-10-06 19:07         ` Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 10/23] sched/fair: Use classes of tasks when selecting a busiest runqueue Ricardo Neri
2022-09-27 11:25   ` Peter Zijlstra
2022-10-07 23:36     ` Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 11/23] thermal: intel: hfi: Introduce Hardware Feedback Interface classes Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 12/23] thermal: intel: hfi: Convert table_lock to use flags-handling variants Ricardo Neri
2022-09-27 11:34   ` Peter Zijlstra
2022-09-27 11:36     ` Peter Zijlstra
2022-10-26  3:59       ` Ricardo Neri
2022-10-26  3:58     ` Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 13/23] x86/cpufeatures: Add the Intel Thread Director feature definitions Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 14/23] thermal: intel: hfi: Update the class of the current task Ricardo Neri
2022-09-27 11:46   ` Peter Zijlstra
2022-10-07 20:34     ` Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 15/23] thermal: intel: hfi: Report per-cpu class-specific performance scores Ricardo Neri
2022-09-27 11:59   ` Peter Zijlstra
2022-10-05 23:59     ` Ricardo Neri
2022-10-06  8:52       ` Peter Zijlstra
2022-10-06  9:14         ` Peter Zijlstra
2022-10-06 15:05           ` Brown, Len
2022-10-06 16:14             ` Peter Zijlstra
2022-10-07 11:20               ` Len Brown
2022-09-09 23:11 ` [RFC PATCH 16/23] thermal: intel: hfi: Define a default classification for unclassified tasks Ricardo Neri
2022-09-09 23:11 ` [RFC PATCH 17/23] thermal: intel: hfi: Enable the Intel Thread Director Ricardo Neri
2022-09-27 12:00   ` Peter Zijlstra
2022-10-06  1:50     ` Ricardo Neri
2022-09-09 23:12 ` [RFC PATCH 18/23] sched/task_struct: Add helpers for task classification Ricardo Neri
2022-09-27 11:52   ` Peter Zijlstra
2022-10-08  0:38     ` Ricardo Neri
2022-09-09 23:12 ` [RFC PATCH 19/23] sched/core: Initialize helpers of " Ricardo Neri
2022-09-09 23:12 ` [RFC PATCH 20/23] thermal: intel: hfi: Implement model-specific checks for " Ricardo Neri
2022-09-09 23:12 ` [RFC PATCH 21/23] x86/cpufeatures: Add feature bit for HRESET Ricardo Neri
2022-09-09 23:12 ` [RFC PATCH 22/23] x86/hreset: Configure history reset Ricardo Neri
2022-09-27 12:03   ` Peter Zijlstra
2022-10-02 22:34     ` Ricardo Neri
2022-09-09 23:12 ` [RFC PATCH 23/23] x86/process: Reset hardware history in context switch Ricardo Neri
2022-09-27 12:52   ` Peter Zijlstra
2022-10-03 23:07     ` Ricardo Neri
2022-10-06  8:35       ` Peter Zijlstra
2022-10-06 22:55         ` Ricardo Neri
2022-09-27 12:53   ` Peter Zijlstra
2022-10-02 22:02     ` Ricardo Neri
2022-09-27 13:15   ` Borislav Petkov
2022-10-02 22:12     ` Ricardo Neri
2022-10-02 22:15       ` Borislav Petkov
2022-10-03 19:49         ` Ricardo Neri
2022-10-03 19:55           ` Borislav Petkov
     [not found] ` <20220910072120.2651-1-hdanton@sina.com>
2022-09-16 14:51   ` [RFC PATCH 06/23] sched/core: Update the classification of the current task Ricardo Neri
2022-10-11 19:12 ` Trying to apply patch set Carlos Bilbao
2022-10-18  2:31   ` Ricardo Neri
