linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 00/23] sched: Introduce classes of tasks for load balance
@ 2022-09-09 23:11 Ricardo Neri
From: Ricardo Neri @ 2022-09-09 23:11 UTC
  To: Peter Zijlstra (Intel), Juri Lelli, Vincent Guittot
  Cc: Ricardo Neri, Ravi V. Shankar, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
	Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada,
	Steven Rostedt, Tim Chen, Valentin Schneider, x86, linux-kernel,
	Ricardo Neri

+++ Introduction

On hybrid processors, the microarchitectural properties of the different
types of CPUs cause them to have different instruction-per-cycle (IPC)
capabilities. IPC can be higher on some CPUs for advanced instructions.
Figure 1 illustrates this concept. It plots hypothetical workloads
grouped by classes of instructions vs the IPC ratio between high and low
performance CPUs.

IPC ratio
  ^
  | Class0 .             Class1               .   ClassN-1    .  ClassN
  |        .                                  .               .   +
  |        .                                  .               .  +
  |        .                                  .               . +
  |        .                                  . + + + + + + + +
  |        .                                  .               .
  |        .                                  .               .
  |        .             + + + + + + + + + + +                .
  |        + + + + + + +                      .               .
  |      + .                                  .               .
  |     +  .                                  .               .
  |    +   .                                  .               .
  |   +    .                                  .               .
1 |-------------------------------------------//---------------------------->
  |                                                      workloads of interest

            Figure 1. Hypothetical workloads sorted by IPC ratio


The load balancer can discover the use of advanced instructions and prefer
CPUs with higher IPC for tasks running those instructions.

Hardware is free to partition its instruction set into an arbitrary number
of classes. It must provide a mechanism to identify the class of the
currently running task and inform the kernel about the performance of each
class of task on each type of CPU.
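
As an illustration only, the information that hardware reports can be
pictured as a table of scores indexed by type of CPU and class of task. The
sketch below is plain userspace C, not code from this series. The names
(class_score, task_class_score(), ipc_ratio_pct()), the number of classes,
and the score values are all made up for the example.

```c
#include <assert.h>

/* Hypothetical sizes; real hardware chooses its own number of classes. */
#define NR_TASK_CLASSES 4

enum cpu_type { CPU_TYPE_LOW_PERF, CPU_TYPE_HIGH_PERF, NR_CPU_TYPES };

/*
 * Made-up performance scores. Row 0 is the low-performance CPU type and
 * row 1 is the high-performance CPU type. Columns are classes 0..3.
 */
static const int class_score[NR_CPU_TYPES][NR_TASK_CLASSES] = {
	{ 100, 100, 100, 100 },
	{ 110, 150, 200, 260 },
};

/* Score of a task of class @class_id when it runs on a CPU of @type. */
int task_class_score(enum cpu_type type, int class_id)
{
	assert((int)type >= 0 && (int)type < NR_CPU_TYPES);
	assert(class_id >= 0 && class_id < NR_TASK_CLASSES);

	return class_score[type][class_id];
}

/* Ratio, in percent, between the high and low performance scores. */
int ipc_ratio_pct(int class_id)
{
	return 100 * task_class_score(CPU_TYPE_HIGH_PERF, class_id) /
	       task_class_score(CPU_TYPE_LOW_PERF, class_id);
}
```

With these made-up numbers the ratio grows with the class index, which is
the shape that Figure 1 depicts.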

This patchset introduces the concept of classes of tasks, proposes the
interfaces that hardware needs to implement and proposes changes to the
load balancer to leverage this extra information in combination with
asymmetric packing.

This patchset includes a full implementation for Intel hybrid processors
using Intel Thread Director technology [1].


+++ Structure of this series

Patches 1-6 introduce the concept of classes of tasks. They also introduce
interfaces that architectures implement to update the class of a task and
to inform the scheduler about the class-specific performance scores.

Patches 7-10 use the class of the current task of each runqueue to break
ties between two otherwise identical group_asym_packing scheduling groups
and to select the busiest runqueue.

Patches 11-17 implement support for classes of tasks on Intel hybrid
processors using Intel Thread Director technology.

Patches 18-20 introduce extra helper members to task_struct to deal with
transient classification of tasks and arch-specific implementation
vagaries.

Patches 21-23 are specific to Intel Thread Director. They reset the
classification hardware when switching to a new task.


+++ Classes of tasks

The first step to leverage the asymmetries in IPC ratios is to assign a
class label to each individual task. Hardware can monitor the instruction
stream and classify a task as it runs. At user tick, the kernel reads the
resulting classification and assigns it to the currently running task. It
stores the class in the proposed task_struct::class.
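
The flow above can be sketched in userspace C as follows. This is a
simplified model, not the code in the patches: struct task, struct
hw_classification, TASK_CLASS_UNCLASSIFIED and both functions are
hypothetical stand-ins, and class_id stands in for the proposed
task_struct::class. Keeping the previous class when hardware has no valid
result is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical marker for a task that hardware has not classified yet. */
#define TASK_CLASS_UNCLASSIFIED (-1)

/* Minimal stand-in for task_struct; class_id models task_struct::class. */
struct task {
	int class_id;
};

/* Hypothetical snapshot of what the classification hardware reports. */
struct hw_classification {
	bool valid;	/* hardware has a result for the running task */
	int class_id;	/* class reported by hardware */
};

/* A new task starts without a classification. */
void task_init_class(struct task *p)
{
	p->class_id = TASK_CLASS_UNCLASSIFIED;
}

/*
 * Called from the tick. Only user ticks update the class. If hardware has
 * no valid result, the previous class is kept (assumption of this sketch).
 */
void update_task_class(struct task *curr, const struct hw_classification *hw,
		       bool user_tick)
{
	assert(curr && hw);

	if (!user_tick || !hw->valid)
		return;

	curr->class_id = hw->class_id;
}
```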


+++ Balancing load using classes of tasks. Theory of operation

Intel hybrid processors rely on asymmetric packing to place tasks on
higher performance CPUs first. The class of the current task on each
runqueue can be used to break ties between two otherwise identical
scheduling groups.

Consider these scenarios (for simplicity, assume that task-class
performance score is such that

score(Cl0) < score(Cl1) < ... < score(Cl(n-1)) < score(Cln)). (Eq I)

Boxes depict scheduling groups being considered during load balance.
Labels inside the box depict the class of rq->curr, or the CPU being
idle.

    asym
    packing
    priorities    50    50           30           30
                _____________
                |  i  .  i  |
                |  d  .  d  |
                |  l  .  l  |      _______      _______
                |  e  .  e  |      | Cl0 |      | Cl1 |
                |___________|      |_____|      |_____|

                         ^
                      dst_cpu        sgA          sgB
                                                   ^
                                                busiest

                           Figure 2. Scenario A
	
In Figure 2, dst_cpu is a group of SMT siblings, has become idle, has
higher priority, and is identifying the busiest group. sgA and sgB are of
group_asym_packing type, have the same priority, have a single CPU, and
have the same number of running tasks. By checking the class of the task
currently running on each scheduling group, the load balancer selects sgB
as the busiest because its task belongs to a class with a higher
performance score if placed on dst_cpu.
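
A minimal sketch of the tie-break in Scenario A, in userspace C. It is not
the code in the patches: struct sg_stats, class_breaks_tie() and the
dst_score[] array (the score of each class if placed on dst_cpu) are
hypothetical names, and the sketch only models the case in which the two
groups are otherwise identical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_TASK_CLASSES 4	/* hypothetical number of classes */

/* Hypothetical subset of per-group load-balancing statistics. */
struct sg_stats {
	int asym_prio;			/* asym packing priority */
	unsigned int nr_running;	/* number of running tasks */
	int curr_class;			/* class of rq->curr in the group */
};

/*
 * Return true if @candidate should replace @busiest. The class is only
 * consulted when priority and number of running tasks are equal; it never
 * overrides those criteria.
 */
bool class_breaks_tie(const struct sg_stats *busiest,
		      const struct sg_stats *candidate,
		      const int dst_score[NR_TASK_CLASSES])
{
	assert(busiest->curr_class >= 0 &&
	       busiest->curr_class < NR_TASK_CLASSES);
	assert(candidate->curr_class >= 0 &&
	       candidate->curr_class < NR_TASK_CLASSES);

	if (candidate->asym_prio != busiest->asym_prio ||
	    candidate->nr_running != busiest->nr_running)
		return false;

	return dst_score[candidate->curr_class] >
	       dst_score[busiest->curr_class];
}
```

With scores ordered as in Eq. I, a candidate sgB running Cl1 replaces a
current busiest sgA running Cl0, but not the other way around.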

    asym
    packing
    priorities    50    50           50    50           30
                _____________     _____________
                |     .     |     |     .     |
                |     .     |     |     .     |       idle
                | cl0 . cl1 |     | cl0 . cl2 |      _______
                |     .     |     |     .     |      |     |
                |___________|     |___________|      |_____|

                                                        ^
                     sgA                sgB          dst_cpu
                                         ^
                                    busiest group

                                     ^
                                  busiest queue

                           Figure 3. Scenario B

In Figure 3, dst_cpu has become idle, has lower priority and is identifying
a busiest group. sgA and sgB are groups of SMT siblings. In each group,
both siblings are busy and, therefore, the groups are classified as
group_asym_packing [2]. They have the same priority and the same number of
running tasks. The load balancer computes the class-specific performance
score (scaled by the number of busy siblings) by observing the currently
running task on each runqueue.

As per Eq. I, cl0+cl2 has a higher throughput than cl0+cl1. So, it selects
sgB as the busiest group. If cl2 is left to run with the whole big core to
itself, it would deliver higher throughput than cl0. Hence, the runqueue of
cl0 is selected as the busiest.
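
The reasoning in Scenario B can be sketched in userspace C as below. This
is one simplified reading of the description above, not the arithmetic used
in the patches. It assumes that the score of a group is the sum of the
scores of the running tasks divided by the number of busy siblings, and that
the busiest runqueue is the one whose task has the lowest score, so that the
task left behind is the one that benefits the most from the whole core. All
names and score values are hypothetical.

```c
#include <assert.h>

#define NR_TASK_CLASSES	3	/* hypothetical number of classes */
#define SMT_SIBLINGS	2

/* Class of rq->curr on each sibling of a core; -1 means idle. */
struct smt_group {
	int curr_class[SMT_SIBLINGS];
};

/* Sum of the class scores scaled by the number of busy siblings. */
int group_score(const struct smt_group *sg,
		const int score[NR_TASK_CLASSES])
{
	int sum = 0, busy = 0;

	for (int i = 0; i < SMT_SIBLINGS; i++) {
		if (sg->curr_class[i] < 0)
			continue;
		sum += score[sg->curr_class[i]];
		busy++;
	}

	return busy ? sum / busy : 0;
}

/*
 * Index of the sibling whose current task has the lowest score, or -1 if
 * all siblings are idle. Pulling from this runqueue leaves the
 * highest-scoring task with the core to itself.
 */
int busiest_sibling(const struct smt_group *sg,
		    const int score[NR_TASK_CLASSES])
{
	int busiest = -1;

	for (int i = 0; i < SMT_SIBLINGS; i++) {
		if (sg->curr_class[i] < 0)
			continue;
		if (busiest < 0 ||
		    score[sg->curr_class[i]] < score[sg->curr_class[busiest]])
			busiest = i;
	}

	return busiest;
}
```

With scores that follow Eq. I, the group running cl0+cl2 scores higher than
the group running cl0+cl1, and the sibling running cl0 is picked as the
busiest runqueue.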


+++ Dependencies

These patches assume that both SMT siblings of a core have the same
priority, as proposed in [3]. Also, they rely on the existing support for
the Hardware Feedback Interface [4].


I look forward to your review and thank you in advance!

These patches have been Acked-by: Len Brown <len.brown@intel.com>

Thanks and BR,
Ricardo

[1]. Intel Software Developer Manual, Volume 3, Section 14.6
     https://intel.com/sdm
[2]. https://lkml.kernel.org/r/20210911011819.12184-7-ricardo.neri-calderon@linux.intel.com
[3]. https://lore.kernel.org/lkml/20220825225529.26465-1-ricardo.neri-calderon@linux.intel.com/
[4]. https://lore.kernel.org/linux-pm/20220127193454.12814-1-ricardo.neri-calderon@linux.intel.com/

Ricardo Neri (23):
  sched/task_struct: Introduce classes of tasks
  sched: Add interfaces for classes of tasks
  sched/core: Initialize the class of a new task
  sched/core: Add user_tick as argument to scheduler_tick()
  sched/core: Move is_core_idle() out of fair.c
  sched/core: Update the classification of the current task
  sched/fair: Collect load-balancing stats for task classes
  sched/fair: Compute task-class performance scores for load balancing
  sched/fair: Use task-class performance score to pick the busiest group
  sched/fair: Use classes of tasks when selecting a busiest runqueue
  thermal: intel: hfi: Introduce Hardware Feedback Interface classes
  thermal: intel: hfi: Convert table_lock to use flags-handling variants
  x86/cpufeatures: Add the Intel Thread Director feature definitions
  thermal: intel: hfi: Update the class of the current task
  thermal: intel: hfi: Report per-cpu class-specific performance scores
  thermal: intel: hfi: Define a default classification for unclassified
    tasks
  thermal: intel: hfi: Enable the Intel Thread Director
  sched/task_struct: Add helpers for task classification
  sched/core: Initialize helpers of task classification
  thermal: intel: hfi: Implement model-specific checks for task
    classification
  x86/cpufeatures: Add feature bit for HRESET
  x86/hreset: Configure history reset
  x86/process: Reset hardware history in context switch

 arch/x86/include/asm/cpufeatures.h       |   2 +
 arch/x86/include/asm/disabled-features.h |   8 +-
 arch/x86/include/asm/hreset.h            |  30 ++++
 arch/x86/include/asm/msr-index.h         |   4 +
 arch/x86/include/asm/topology.h          |  10 ++
 arch/x86/kernel/cpu/common.c             |  35 +++-
 arch/x86/kernel/cpu/cpuid-deps.c         |   1 +
 arch/x86/kernel/cpu/scattered.c          |   1 +
 arch/x86/kernel/process_32.c             |   3 +
 arch/x86/kernel/process_64.c             |   3 +
 drivers/thermal/intel/Kconfig            |  12 ++
 drivers/thermal/intel/intel_hfi.c        | 215 +++++++++++++++++++++--
 include/linux/sched.h                    |  19 +-
 init/Kconfig                             |   9 +
 kernel/sched/core.c                      |  10 +-
 kernel/sched/fair.c                      | 214 ++++++++++++++++++++--
 kernel/sched/sched.h                     |  81 +++++++++
 kernel/sched/topology.c                  |   8 +
 kernel/time/timer.c                      |   2 +-
 19 files changed, 635 insertions(+), 32 deletions(-)
 create mode 100644 arch/x86/include/asm/hreset.h

-- 
2.25.1


