qemu-devel.nongnu.org archive mirror
* [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support
@ 2021-11-21 12:24 Yanan Wang via
  2021-11-21 12:24 ` [PATCH v4 01/10] qemu-options: Improve readability of SMP related Docs Yanan Wang via
                   ` (11 more replies)
  0 siblings, 12 replies; 15+ messages in thread
From: Yanan Wang via @ 2021-11-21 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang, Yanan Wang

Hi,

This series introduces a new CPU topology parameter, clusters,
and enables support for it on ARM virt machines.

Background and descriptions:
Cluster-Aware Scheduling support has landed in Linux 5.16 and has
been shown to benefit scheduling performance (e.g. load balancing
and the wake_affine strategy) on both x86_64 and AArch64. See the
kernel PR [1] and the latest patch set [2] for reference.

Linux 5.16 therefore has a four-level, arch-neutral CPU topology
definition, shown below, and a new scheduler level for clusters.
struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};

A cluster generally means a group of CPU cores that share an L2 cache
or other mid-level resources, and it is these shared resources that
are used to improve the scheduler's behavior. In terms of size, a
cluster sits between a CPU die and a CPU core. For example, on some
ARM64 Kunpeng servers there are 6 clusters in each NUMA node and
4 CPU cores in each cluster. The 4 CPU cores share a separate L2
cache and an L3 cache tag, which brings a cache affinity advantage.
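
As a rough illustration of the Kunpeng example above, the sketch below
derives a cluster_id and a cluster_sibling-style bitmask for a core
inside one NUMA node. It assumes that cores are numbered contiguously
within a cluster, which the text does not state. cluster_id_of() and
cluster_sibling_mask() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the Kunpeng example: 6 clusters of 4 cores per node. */
enum { CORES_PER_CLUSTER = 4, CLUSTERS_PER_NODE = 6 };

/* Assumption: cores 0-3 form cluster 0, cores 4-7 form cluster 1, ... */
unsigned cluster_id_of(unsigned core)
{
    assert(core < CORES_PER_CLUSTER * CLUSTERS_PER_NODE);
    return core / CORES_PER_CLUSTER;
}

/* One bit per core in the node; set bits are the cores sharing the L2. */
uint32_t cluster_sibling_mask(unsigned core)
{
    uint32_t one_cluster = ((uint32_t)1 << CORES_PER_CLUSTER) - 1;

    return one_cluster << (cluster_id_of(core) * CORES_PER_CLUSTER);
}
```

Under these assumptions, core 5 belongs to cluster 1 and its sibling
mask covers cores 4 to 7.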

[1] https://lore.kernel.org/lkml/163572864855.3357115.17938524897008353101.tglx@xen13/
[2] https://lkml.org/lkml/2021/9/24/178

In virtualization, on hosts that have physical clusters (pClusters),
if we design a vCPU topology with a cluster level for the guest
kernel and use dedicated vCPU pinning, a cluster-aware guest kernel
can also make use of the cache affinity of CPU clusters to gain
similar scheduling performance.

This series consists of two parts:
The first part (patches 1-3):
Implement infrastructure for CPU cluster level topology support,
including the SMP documentation, configuration and parsing.

The second part (patches 4-10):
Enable CPU cluster support on ARM virt machines, so that users
can specify a 4-level CPU hierarchy of sockets/clusters/cores/threads.
The 4-level topology is described to the guest kernel through the
ACPI PPTT and the DT cpu-map.

Changelog:
v3->v4:
- Significant changes from v3 to v4, since the whole series has been
  reworked on top of the latest QEMU SMP framework.
- v3: https://lore.kernel.org/qemu-devel/20210516103228.37792-1-wangyanan55@huawei.com/

Yanan Wang (10):
  qemu-options: Improve readability of SMP related Docs
  hw/core/machine: Introduce CPU cluster topology support
  hw/core/machine: Wrap target specific parameters together
  hw/arm/virt: Support clusters on ARM virt machines
  hw/arm/virt: Support cluster level in DT cpu-map
  hw/acpi/aml-build: Improve scalability of PPTT generation
  hw/arm/virt-acpi-build: Make an ARM specific PPTT generator
  tests/acpi/bios-tables-test: Allow changes to virt/PPTT file
  hw/acpi/virt-acpi-build: Support cluster level in PPTT generation
  tests/acpi/bios-table-test: Update expected virt/PPTT file

 hw/acpi/aml-build.c         |  66 ++------------------------
 hw/arm/virt-acpi-build.c    |  92 +++++++++++++++++++++++++++++++++++-
 hw/arm/virt.c               |  16 ++++---
 hw/core/machine-smp.c       |  29 +++++++++---
 hw/core/machine.c           |   3 ++
 include/hw/acpi/aml-build.h |   5 +-
 include/hw/boards.h         |   6 ++-
 qapi/machine.json           |   5 +-
 qemu-options.hx             |  91 +++++++++++++++++++++++++++--------
 softmmu/vl.c                |   3 ++
 tests/data/acpi/virt/PPTT   | Bin 76 -> 96 bytes
 11 files changed, 214 insertions(+), 102 deletions(-)

--
2.19.1




* [PATCH v4 01/10] qemu-options: Improve readability of SMP related Docs
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
@ 2021-11-21 12:24 ` Yanan Wang via
  2021-11-21 12:24 ` [PATCH v4 02/10] hw/core/machine: Introduce CPU cluster topology support Yanan Wang via
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Yanan Wang via @ 2021-11-21 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang, Yanan Wang

We have a description in qemu-options.hx for each CPU topology
parameter to explain what it exactly means, and also an extra
declaration for the target-specific one, e.g. "for PC only"
when describing "dies", and "for PC, it's on one die" when
describing "cores".

Now we are going to introduce one more non-generic parameter,
"clusters". The docs would become less readable and less scalable
if we continued to describe it in the legacy way.

So let's first make two tweaks to the docs to improve their
readability and scalability:
1) In the -help text: Delete the extra specific declaration and
   describe each topology parameter level by level. Then add a
   note to declare that different machines may support different
   subsets and the actual meaning of the supported parameters
   will vary accordingly.
2) In the rST text: List all the sub-hierarchies currently
   supported in QEMU, and correspondingly give an example of
   -smp configuration for each of them.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 qemu-options.hx | 76 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 59 insertions(+), 17 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index ae2c6dbbfc..7a59db7764 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -207,14 +207,26 @@ ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
     "-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]\n"
-    "                set the number of CPUs to 'n' [default=1]\n"
+    "                set the number of initial CPUs to 'n' [default=1]\n"
     "                maxcpus= maximum number of total CPUs, including\n"
     "                offline CPUs for hotplug, etc\n"
-    "                sockets= number of discrete sockets in the system\n"
-    "                dies= number of CPU dies on one socket (for PC only)\n"
-    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
-    "                threads= number of threads on one CPU core\n",
-        QEMU_ARCH_ALL)
+    "                sockets= number of sockets on the machine board\n"
+    "                dies= number of dies in one socket\n"
+    "                cores= number of cores in one die\n"
+    "                threads= number of threads in one core\n"
+    "Note: Different machines may have different subsets of the CPU topology\n"
+    "      parameters supported, so the actual meaning of the supported parameters\n"
+    "      will vary accordingly. For example, for a machine type that supports a\n"
+    "      three-level CPU hierarchy of sockets/cores/threads, the parameters will\n"
+    "      sequentially mean as below:\n"
+    "                sockets means the number of sockets on the machine board\n"
+    "                cores means the number of cores in one socket\n"
+    "                threads means the number of threads in one core\n"
+    "      For a particular machine type board, an expected CPU topology hierarchy\n"
+    "      can be defined through the supported sub-option. Unsupported parameters\n"
+    "      can also be provided in addition to the sub-option, but their values\n"
+    "      must be set as 1 in the purpose of correct parsing.\n",
+    QEMU_ARCH_ALL)
 SRST
 ``-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]``
     Simulate a SMP system with '\ ``n``\ ' CPUs initially present on
@@ -225,27 +237,57 @@ SRST
     initial CPU count will match the maximum number. When only one of them
     is given then the omitted one will be set to its counterpart's value.
     Both parameters may be specified, but the maximum number of CPUs must
-    be equal to or greater than the initial CPU count. Both parameters are
-    subject to an upper limit that is determined by the specific machine
-    type chosen.
-
-    To control reporting of CPU topology information, the number of sockets,
-    dies per socket, cores per die, and threads per core can be specified.
-    The sum `` sockets * cores * dies * threads `` must be equal to the
-    maximum CPU count. CPU targets may only support a subset of the topology
-    parameters. Where a CPU target does not support use of a particular
-    topology parameter, its value should be assumed to be 1 for the purpose
-    of computing the CPU maximum count.
+    be equal to or greater than the initial CPU count. Product of the
+    CPU topology hierarchy must be equal to the maximum number of CPUs.
+    Both parameters are subject to an upper limit that is determined by
+    the specific machine type chosen.
+
+    To control reporting of CPU topology information, values of the topology
+    parameters can be specified. Machines may only support a subset of the
+    parameters and different machines may have different subsets supported
+    which vary depending on capacity of the corresponding CPU targets. So
+    for a particular machine type board, an expected topology hierarchy can
+    be defined through the supported sub-option. Unsupported parameters can
+    also be provided in addition to the sub-option, but their values must be
+    set as 1 in the purpose of correct parsing.
 
     Either the initial CPU count, or at least one of the topology parameters
     must be specified. The specified parameters must be greater than zero,
     explicit configuration like "cpus=0" is not allowed. Values for any
     omitted parameters will be computed from those which are given.
+
+    For example, the following sub-option defines a CPU topology hierarchy
+    (2 sockets totally on the machine, 2 cores per socket, 2 threads per
+    core) for a machine that only supports sockets/cores/threads.
+    Some members of the option can be omitted but their values will be
+    automatically computed:
+
+    ::
+
+        -smp 8,sockets=2,cores=2,threads=2,maxcpus=8
+
+    The following sub-option defines a CPU topology hierarchy (2 sockets
+    totally on the machine, 2 dies per socket, 2 cores per die, 2 threads
+    per core) for PC machines which support sockets/dies/cores/threads.
+    Some members of the option can be omitted but their values will be
+    automatically computed:
+
+    ::
+
+        -smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16
+
     Historically preference was given to the coarsest topology parameters
     when computing missing values (ie sockets preferred over cores, which
     were preferred over threads), however, this behaviour is considered
     liable to change. Prior to 6.2 the preference was sockets over cores
     over threads. Since 6.2 the preference is cores over sockets over threads.
+
+    For example, the following option defines a machine board with 2 sockets
+    of 1 core before 6.2 and 1 socket of 2 cores after 6.2:
+
+    ::
+
+        -smp 2
 ERST
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
-- 
2.19.1




* [PATCH v4 02/10] hw/core/machine: Introduce CPU cluster topology support
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
  2021-11-21 12:24 ` [PATCH v4 01/10] qemu-options: Improve readability of SMP related Docs Yanan Wang via
@ 2021-11-21 12:24 ` Yanan Wang via
  2021-11-21 12:24 ` [PATCH v4 03/10] hw/core/machine: Wrap target specific parameters together Yanan Wang via
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Yanan Wang via @ 2021-11-21 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang, Yanan Wang

Cluster-Aware Scheduling support has landed in Linux 5.16 and has
been shown to benefit scheduling performance (e.g. load balancing
and the wake_affine strategy) on both x86_64 and AArch64.

Linux 5.16 therefore has a four-level, arch-neutral CPU topology
definition, shown below, and a new scheduler level for clusters.
struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};

A cluster generally means a group of CPU cores that share an L2 cache
or other mid-level resources, and it is these shared resources that
are used to improve the scheduler's behavior. In terms of size, a
cluster sits between a CPU die and a CPU core. For example, on some
ARM64 Kunpeng servers there are 6 clusters in each NUMA node and
4 CPU cores in each cluster. The 4 CPU cores share a separate L2
cache and an L3 cache tag, which brings a cache affinity advantage.

In virtualization, on hosts that have physical clusters (pClusters),
if we design a vCPU topology with a cluster level for the guest
kernel and use dedicated vCPU pinning, a cluster-aware guest kernel
can also make use of the cache affinity of CPU clusters to gain
similar scheduling performance.

This patch adds infrastructure for configuring and parsing the CPU
cluster topology level, so that users can specify the clusters
parameter if their machine supports it.
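
The parsing rule can be sketched in standalone form as below. This is
a simplified model, not the code in the patch: the Topo struct and
topo_complete() are hypothetical names, the cpus parameter and the
pre-6.2 "prefer sockets" branch are left out, and 0 stands for an
omitted value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parsed -smp values; 0 means "omitted". */
typedef struct {
    unsigned sockets, dies, clusters, cores, threads, maxcpus;
} Topo;

/*
 * Fill in omitted levels (cores preferred over sockets, as since 6.2),
 * then report whether the product of the hierarchy matches maxcpus.
 */
bool topo_complete(Topo *t)
{
    /* Omitted dies and clusters count as 1. */
    t->dies = t->dies > 0 ? t->dies : 1;
    t->clusters = t->clusters > 0 ? t->clusters : 1;

    if (t->maxcpus == 0) {
        /* Nothing to divide: every omitted level defaults to 1. */
        t->sockets = t->sockets > 0 ? t->sockets : 1;
        t->cores = t->cores > 0 ? t->cores : 1;
        t->threads = t->threads > 0 ? t->threads : 1;
        t->maxcpus = t->sockets * t->dies * t->clusters *
                     t->cores * t->threads;
    } else {
        if (t->cores == 0) {
            t->sockets = t->sockets > 0 ? t->sockets : 1;
            t->threads = t->threads > 0 ? t->threads : 1;
            t->cores = t->maxcpus /
                       (t->sockets * t->dies * t->clusters * t->threads);
        } else if (t->sockets == 0) {
            t->threads = t->threads > 0 ? t->threads : 1;
            t->sockets = t->maxcpus /
                         (t->dies * t->clusters * t->cores * t->threads);
        }
        /* Omitted threads are calculated last. */
        if (t->threads == 0) {
            t->threads = t->maxcpus /
                         (t->sockets * t->dies * t->clusters * t->cores);
        }
    }

    return t->sockets * t->dies * t->clusters * t->cores * t->threads
           == t->maxcpus;
}
```

In this model, sockets=2,clusters=2,threads=2,maxcpus=16 yields
cores=2, while sockets=2,clusters=3,maxcpus=16 is rejected because
16 is not divisible by 2 * 3.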

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/core/machine-smp.c | 26 +++++++++++++++++++-------
 hw/core/machine.c     |  3 +++
 include/hw/boards.h   |  6 +++++-
 qapi/machine.json     |  5 ++++-
 qemu-options.hx       |  7 ++++---
 softmmu/vl.c          |  3 +++
 6 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index 116a0cbbfa..87ceb45470 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -37,6 +37,10 @@ static char *cpu_hierarchy_to_string(MachineState *ms)
         g_string_append_printf(s, " * dies (%u)", ms->smp.dies);
     }
 
+    if (mc->smp_props.clusters_supported) {
+        g_string_append_printf(s, " * clusters (%u)", ms->smp.clusters);
+    }
+
     g_string_append_printf(s, " * cores (%u)", ms->smp.cores);
     g_string_append_printf(s, " * threads (%u)", ms->smp.threads);
 
@@ -69,6 +73,7 @@ void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
     unsigned cpus    = config->has_cpus ? config->cpus : 0;
     unsigned sockets = config->has_sockets ? config->sockets : 0;
     unsigned dies    = config->has_dies ? config->dies : 0;
+    unsigned clusters = config->has_clusters ? config->clusters : 0;
     unsigned cores   = config->has_cores ? config->cores : 0;
     unsigned threads = config->has_threads ? config->threads : 0;
     unsigned maxcpus = config->has_maxcpus ? config->maxcpus : 0;
@@ -80,6 +85,7 @@ void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
     if ((config->has_cpus && config->cpus == 0) ||
         (config->has_sockets && config->sockets == 0) ||
         (config->has_dies && config->dies == 0) ||
+        (config->has_clusters && config->clusters == 0) ||
         (config->has_cores && config->cores == 0) ||
         (config->has_threads && config->threads == 0) ||
         (config->has_maxcpus && config->maxcpus == 0)) {
@@ -95,8 +101,13 @@ void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
         error_setg(errp, "dies not supported by this machine's CPU topology");
         return;
     }
+    if (!mc->smp_props.clusters_supported && clusters > 1) {
+        error_setg(errp, "clusters not supported by this machine's CPU topology");
+        return;
+    }
 
     dies = dies > 0 ? dies : 1;
+    clusters = clusters > 0 ? clusters : 1;
 
     /* compute missing values based on the provided ones */
     if (cpus == 0 && maxcpus == 0) {
@@ -111,41 +122,42 @@ void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
             if (sockets == 0) {
                 cores = cores > 0 ? cores : 1;
                 threads = threads > 0 ? threads : 1;
-                sockets = maxcpus / (dies * cores * threads);
+                sockets = maxcpus / (dies * clusters * cores * threads);
             } else if (cores == 0) {
                 threads = threads > 0 ? threads : 1;
-                cores = maxcpus / (sockets * dies * threads);
+                cores = maxcpus / (sockets * dies * clusters * threads);
             }
         } else {
             /* prefer cores over sockets since 6.2 */
             if (cores == 0) {
                 sockets = sockets > 0 ? sockets : 1;
                 threads = threads > 0 ? threads : 1;
-                cores = maxcpus / (sockets * dies * threads);
+                cores = maxcpus / (sockets * dies * clusters * threads);
             } else if (sockets == 0) {
                 threads = threads > 0 ? threads : 1;
-                sockets = maxcpus / (dies * cores * threads);
+                sockets = maxcpus / (dies * clusters * cores * threads);
             }
         }
 
         /* try to calculate omitted threads at last */
         if (threads == 0) {
-            threads = maxcpus / (sockets * dies * cores);
+            threads = maxcpus / (sockets * dies * clusters * cores);
         }
     }
 
-    maxcpus = maxcpus > 0 ? maxcpus : sockets * dies * cores * threads;
+    maxcpus = maxcpus > 0 ? maxcpus : sockets * dies * clusters * cores * threads;
     cpus = cpus > 0 ? cpus : maxcpus;
 
     ms->smp.cpus = cpus;
     ms->smp.sockets = sockets;
     ms->smp.dies = dies;
+    ms->smp.clusters = clusters;
     ms->smp.cores = cores;
     ms->smp.threads = threads;
     ms->smp.max_cpus = maxcpus;
 
     /* sanity-check of the computed topology */
-    if (sockets * dies * cores * threads != maxcpus) {
+    if (sockets * dies * clusters * cores * threads != maxcpus) {
         g_autofree char *topo_msg = cpu_hierarchy_to_string(ms);
         error_setg(errp, "Invalid CPU topology: "
                    "product of the hierarchy must match maxcpus: "
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 53a99abc56..d4fa6e0306 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -742,10 +742,12 @@ static void machine_get_smp(Object *obj, Visitor *v, const char *name,
         .has_cpus = true, .cpus = ms->smp.cpus,
         .has_sockets = true, .sockets = ms->smp.sockets,
         .has_dies = true, .dies = ms->smp.dies,
+        .has_clusters = true, .clusters = ms->smp.clusters,
         .has_cores = true, .cores = ms->smp.cores,
         .has_threads = true, .threads = ms->smp.threads,
         .has_maxcpus = true, .maxcpus = ms->smp.max_cpus,
     };
+
     if (!visit_type_SMPConfiguration(v, name, &config, &error_abort)) {
         return;
     }
@@ -932,6 +934,7 @@ static void machine_initfn(Object *obj)
     ms->smp.max_cpus = mc->default_cpus;
     ms->smp.sockets = 1;
     ms->smp.dies = 1;
+    ms->smp.clusters = 1;
     ms->smp.cores = 1;
     ms->smp.threads = 1;
 }
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 9c1c190104..1a136edb0e 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -128,10 +128,12 @@ typedef struct {
  * SMPCompatProps:
  * @prefer_sockets - whether sockets are preferred over cores in smp parsing
  * @dies_supported - whether dies are supported by the machine
+ * @clusters_supported - whether clusters are supported by the machine
  */
 typedef struct {
     bool prefer_sockets;
     bool dies_supported;
+    bool clusters_supported;
 } SMPCompatProps;
 
 /**
@@ -298,7 +300,8 @@ typedef struct DeviceMemoryState {
  * @cpus: the number of present logical processors on the machine
  * @sockets: the number of sockets on the machine
  * @dies: the number of dies in one socket
- * @cores: the number of cores in one die
+ * @clusters: the number of clusters in one die
+ * @cores: the number of cores in one cluster
  * @threads: the number of threads in one core
  * @max_cpus: the maximum number of logical processors on the machine
  */
@@ -306,6 +309,7 @@ typedef struct CpuTopology {
     unsigned int cpus;
     unsigned int sockets;
     unsigned int dies;
+    unsigned int clusters;
     unsigned int cores;
     unsigned int threads;
     unsigned int max_cpus;
diff --git a/qapi/machine.json b/qapi/machine.json
index 067e3f5378..b9dd8f7f90 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1396,7 +1396,9 @@
 #
 # @dies: number of dies per socket in the CPU topology
 #
-# @cores: number of cores per die in the CPU topology
+# @clusters: number of clusters per die in the CPU topology
+#
+# @cores: number of cores per cluster in the CPU topology
 #
 # @threads: number of threads per core in the CPU topology
 #
@@ -1408,6 +1410,7 @@
      '*cpus': 'int',
      '*sockets': 'int',
      '*dies': 'int',
+     '*clusters': 'int',
      '*cores': 'int',
      '*threads': 'int',
      '*maxcpus': 'int' } }
diff --git a/qemu-options.hx b/qemu-options.hx
index 7a59db7764..0f26f7dad7 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -206,13 +206,14 @@ SRST
 ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
-    "-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]\n"
+    "-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]\n"
     "                set the number of initial CPUs to 'n' [default=1]\n"
     "                maxcpus= maximum number of total CPUs, including\n"
     "                offline CPUs for hotplug, etc\n"
     "                sockets= number of sockets on the machine board\n"
     "                dies= number of dies in one socket\n"
-    "                cores= number of cores in one die\n"
+    "                clusters= number of clusters in one die\n"
+    "                cores= number of cores in one cluster\n"
     "                threads= number of threads in one core\n"
     "Note: Different machines may have different subsets of the CPU topology\n"
     "      parameters supported, so the actual meaning of the supported parameters\n"
@@ -228,7 +229,7 @@ DEF("smp", HAS_ARG, QEMU_OPTION_smp,
     "      must be set as 1 in the purpose of correct parsing.\n",
     QEMU_ARCH_ALL)
 SRST
-``-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,cores=cores][,threads=threads]``
+``-smp [[cpus=]n][,maxcpus=maxcpus][,sockets=sockets][,dies=dies][,clusters=clusters][,cores=cores][,threads=threads]``
     Simulate a SMP system with '\ ``n``\ ' CPUs initially present on
     the machine type board. On boards supporting CPU hotplug, the optional
     '\ ``maxcpus``\ ' parameter can be set to enable further CPUs to be
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 1159a64bce..7acf06dace 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -726,6 +726,9 @@ static QemuOptsList qemu_smp_opts = {
         }, {
             .name = "dies",
             .type = QEMU_OPT_NUMBER,
+        }, {
+            .name = "clusters",
+            .type = QEMU_OPT_NUMBER,
         }, {
             .name = "cores",
             .type = QEMU_OPT_NUMBER,
-- 
2.19.1




* [PATCH v4 03/10] hw/core/machine: Wrap target specific parameters together
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
  2021-11-21 12:24 ` [PATCH v4 01/10] qemu-options: Improve readability of SMP related Docs Yanan Wang via
  2021-11-21 12:24 ` [PATCH v4 02/10] hw/core/machine: Introduce CPU cluster topology support Yanan Wang via
@ 2021-11-21 12:24 ` Yanan Wang via
  2021-12-16 13:23   ` Philippe Mathieu-Daudé
  2021-11-21 12:24 ` [PATCH v4 04/10] hw/arm/virt: Support clusters on ARM virt machines Yanan Wang via
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 15+ messages in thread
From: Yanan Wang via @ 2021-11-21 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang, Yanan Wang

Wrap the CPU target specific parameters together into a single
variable, so that only one line needs to be updated when new
topology parameters are introduced.

No functional change intended.
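
A minimal sketch of why the result is unchanged, using the computation
of omitted cores as the example. Both helpers are hypothetical and
exist only for this comparison; the second one takes the folded factor
others = dies * clusters.

```c
#include <assert.h>

/* Before: each target-specific level appears in the divisor. */
unsigned cores_spelled_out(unsigned maxcpus, unsigned sockets,
                           unsigned dies, unsigned clusters,
                           unsigned threads)
{
    assert(sockets && dies && clusters && threads);
    return maxcpus / (sockets * dies * clusters * threads);
}

/* After: the target-specific levels are folded into one factor. */
unsigned cores_with_others(unsigned maxcpus, unsigned sockets,
                           unsigned threads, unsigned others)
{
    assert(sockets && threads && others);
    return maxcpus / (sockets * threads * others);
}
```

Since the divisor is the same product with its factors regrouped,
both forms return the same value for the same inputs.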

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/core/machine-smp.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index 87ceb45470..2a3f16e52b 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -77,6 +77,7 @@ void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
     unsigned cores   = config->has_cores ? config->cores : 0;
     unsigned threads = config->has_threads ? config->threads : 0;
     unsigned maxcpus = config->has_maxcpus ? config->maxcpus : 0;
+    unsigned others;
 
     /*
      * Specified CPU topology parameters must be greater than zero,
@@ -109,6 +110,8 @@ void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
     dies = dies > 0 ? dies : 1;
     clusters = clusters > 0 ? clusters : 1;
 
+    others = dies * clusters;
+
     /* compute missing values based on the provided ones */
     if (cpus == 0 && maxcpus == 0) {
         sockets = sockets > 0 ? sockets : 1;
@@ -122,30 +125,30 @@ void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
             if (sockets == 0) {
                 cores = cores > 0 ? cores : 1;
                 threads = threads > 0 ? threads : 1;
-                sockets = maxcpus / (dies * clusters * cores * threads);
+                sockets = maxcpus / (cores * threads * others);
             } else if (cores == 0) {
                 threads = threads > 0 ? threads : 1;
-                cores = maxcpus / (sockets * dies * clusters * threads);
+                cores = maxcpus / (sockets * threads * others);
             }
         } else {
             /* prefer cores over sockets since 6.2 */
             if (cores == 0) {
                 sockets = sockets > 0 ? sockets : 1;
                 threads = threads > 0 ? threads : 1;
-                cores = maxcpus / (sockets * dies * clusters * threads);
+                cores = maxcpus / (sockets * threads * others);
             } else if (sockets == 0) {
                 threads = threads > 0 ? threads : 1;
-                sockets = maxcpus / (dies * clusters * cores * threads);
+                sockets = maxcpus / (cores * threads * others);
             }
         }
 
         /* try to calculate omitted threads at last */
         if (threads == 0) {
-            threads = maxcpus / (sockets * dies * clusters * cores);
+            threads = maxcpus / (sockets * cores * others);
         }
     }
 
-    maxcpus = maxcpus > 0 ? maxcpus : sockets * dies * clusters * cores * threads;
+    maxcpus = maxcpus > 0 ? maxcpus : sockets * cores * threads * others;
     cpus = cpus > 0 ? cpus : maxcpus;
 
     ms->smp.cpus = cpus;
@@ -157,7 +160,7 @@ void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
     ms->smp.max_cpus = maxcpus;
 
     /* sanity-check of the computed topology */
-    if (sockets * dies * clusters * cores * threads != maxcpus) {
+    if (sockets * cores * threads * others != maxcpus) {
         g_autofree char *topo_msg = cpu_hierarchy_to_string(ms);
         error_setg(errp, "Invalid CPU topology: "
                    "product of the hierarchy must match maxcpus: "
-- 
2.19.1




* [PATCH v4 04/10] hw/arm/virt: Support clusters on ARM virt machines
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
                   ` (2 preceding siblings ...)
  2021-11-21 12:24 ` [PATCH v4 03/10] hw/core/machine: Wrap target specific parameters together Yanan Wang via
@ 2021-11-21 12:24 ` Yanan Wang via
  2021-11-21 12:24 ` [PATCH v4 05/10] hw/arm/virt: Support cluster level in DT cpu-map Yanan Wang via
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Yanan Wang via @ 2021-11-21 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang, Yanan Wang

Implementations of the ARM64 architecture can define a CPU topology
hierarchy of at most "sockets/dies/clusters/cores/threads". For
example, a server based on the Kunpeng 920 ARM64 chip has 2 sockets
in total, 2 NUMA nodes (which also correspond to CPU dies) in each
socket, 6 clusters in each NUMA node, and 4 CPU cores in each cluster.

Clusters within the same NUMA node share the L3 cache data, and cores
within the same cluster share an L2 cache and an L3 cache tag.
Given that a vCPU topology with a cluster level can improve
scheduling performance in the guest, let's support this new
parameter on ARM virt machines.

After this, we can define a 4-level CPU topology hierarchy like:
cpus=*,maxcpus=*,sockets=*,clusters=*,cores=*,threads=*.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/arm/virt.c   |  1 +
 qemu-options.hx | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 369552ad45..b2129f7ccd 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2698,6 +2698,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
     hc->unplug_request = virt_machine_device_unplug_request_cb;
     hc->unplug = virt_machine_device_unplug_cb;
     mc->nvdimm_supported = true;
+    mc->smp_props.clusters_supported = true;
     mc->auto_enable_numa_with_memhp = true;
     mc->auto_enable_numa_with_memdev = true;
     mc->default_ram_id = "mach-virt.ram";
diff --git a/qemu-options.hx b/qemu-options.hx
index 0f26f7dad7..74d335e4c3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -277,6 +277,16 @@ SRST
 
         -smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16
 
+    The following sub-option defines a CPU topology hierarchy (2 sockets
+    totally on the machine, 2 clusters per socket, 2 cores per cluster,
+    2 threads per core) for ARM virt machines which support sockets/clusters
+    /cores/threads. Some members of the option can be omitted but their values
+    will be automatically computed:
+
+    ::
+
+        -smp 16,sockets=2,clusters=2,cores=2,threads=2,maxcpus=16
+
     Historically preference was given to the coarsest topology parameters
     when computing missing values (ie sockets preferred over cores, which
     were preferred over threads), however, this behaviour is considered
-- 
2.19.1




* [PATCH v4 05/10] hw/arm/virt: Support cluster level in DT cpu-map
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
                   ` (3 preceding siblings ...)
  2021-11-21 12:24 ` [PATCH v4 04/10] hw/arm/virt: Support clusters on ARM virt machines Yanan Wang via
@ 2021-11-21 12:24 ` Yanan Wang via
  2021-11-21 12:24 ` [PATCH v4 06/10] hw/acpi/aml-build: Improve scalability of PPTT generation Yanan Wang via
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Yanan Wang via @ 2021-11-21 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang, Yanan Wang

Support one cluster level between core and physical package in the
cpu-map of the Arm virt devicetree. This is consistent with the Linux
document "Documentation/devicetree/bindings/cpu/cpu-topology.txt".
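
For readers following the index arithmetic, here is a standalone sketch of
how a linear CPU index is split into socket/cluster/core/thread and turned
into a cpu-map path. It mirrors the expressions used in the diff, but the
struct and helper names are made up for illustration and plain snprintf()
stands in for g_strdup_printf():

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-level counts; stands in for ms->smp in the patch. */
struct cpu_topo {
    int clusters; /* clusters per socket */
    int cores;    /* cores per cluster */
    int threads;  /* threads per core */
};

struct cpu_pos {
    int socket, cluster, core, thread;
};

/* Split a linear CPU index into its position at each topology level. */
static struct cpu_pos cpu_to_pos(int cpu, const struct cpu_topo *t)
{
    struct cpu_pos p;

    assert(t->clusters > 0 && t->cores > 0 && t->threads > 0);
    p.socket  = cpu / (t->clusters * t->cores * t->threads);
    p.cluster = (cpu / (t->cores * t->threads)) % t->clusters;
    p.core    = (cpu / t->threads) % t->cores;
    p.thread  = cpu % t->threads;
    return p;
}

/* Format the cpu-map path; thread nodes only exist when threads > 1. */
static int cpu_map_path(char *buf, size_t len, int cpu,
                        const struct cpu_topo *t)
{
    struct cpu_pos p = cpu_to_pos(cpu, t);

    if (t->threads > 1) {
        return snprintf(buf, len,
                        "/cpus/cpu-map/socket%d/cluster%d/core%d/thread%d",
                        p.socket, p.cluster, p.core, p.thread);
    }
    return snprintf(buf, len, "/cpus/cpu-map/socket%d/cluster%d/core%d",
                    p.socket, p.cluster, p.core);
}
```

For example, with clusters=2,cores=2,threads=2, CPU 13 lands at
/cpus/cpu-map/socket1/cluster1/core0/thread1.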

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/arm/virt.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b2129f7ccd..dfdc64c4e3 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -430,9 +430,8 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
          * can contain several layers of clustering within a single physical
          * package and cluster nodes can be contained in parent cluster nodes.
          *
-         * Given that cluster is not yet supported in the vCPU topology,
-         * we currently generate one cluster node within each socket node
-         * by default.
+         * Note: currently we only support one layer of clustering within
+         * each physical package.
          */
         qemu_fdt_add_subnode(ms->fdt, "/cpus/cpu-map");
 
@@ -442,14 +441,16 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 
             if (ms->smp.threads > 1) {
                 map_path = g_strdup_printf(
-                    "/cpus/cpu-map/socket%d/cluster0/core%d/thread%d",
-                    cpu / (ms->smp.cores * ms->smp.threads),
+                    "/cpus/cpu-map/socket%d/cluster%d/core%d/thread%d",
+                    cpu / (ms->smp.clusters * ms->smp.cores * ms->smp.threads),
+                    (cpu / (ms->smp.cores * ms->smp.threads)) % ms->smp.clusters,
                     (cpu / ms->smp.threads) % ms->smp.cores,
                     cpu % ms->smp.threads);
             } else {
                 map_path = g_strdup_printf(
-                    "/cpus/cpu-map/socket%d/cluster0/core%d",
-                    cpu / ms->smp.cores,
+                    "/cpus/cpu-map/socket%d/cluster%d/core%d",
+                    cpu / (ms->smp.clusters * ms->smp.cores),
+                    (cpu / ms->smp.cores) % ms->smp.clusters,
                     cpu % ms->smp.cores);
             }
             qemu_fdt_add_path(ms->fdt, map_path);
-- 
2.19.1




* [PATCH v4 06/10] hw/acpi/aml-build: Improve scalability of PPTT generation
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
                   ` (4 preceding siblings ...)
  2021-11-21 12:24 ` [PATCH v4 05/10] hw/arm/virt: Support cluster level in DT cpu-map Yanan Wang via
@ 2021-11-21 12:24 ` Yanan Wang via
  2021-11-21 12:24 ` [PATCH v4 07/10] hw/arm/virt-acpi-build: Make an ARM specific PPTT generator Yanan Wang via
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Yanan Wang via @ 2021-11-21 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang, Yanan Wang

Currently we generate a PPTT table of n-level processor hierarchy
with n levels of nested loops in build_pptt(). This works fine while
there are only three CPU topology parameters, but the nesting scales
poorly as more processor hierarchy levels are added.

This patch improves the scalability of build_pptt() by replacing the
nested loops with one queue-driven pass per hierarchy level, and
intends to make no functional change.
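
The level-by-level idea can be sketched outside of QEMU as follows. This is
only an illustration under simplifying assumptions: a flat array plays the
role of the GQueue, array indexes stand in for table offsets, and the
struct and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* One hierarchy node; 'parent' is an index into the output array. */
struct topo_node {
    int parent; /* -1 for top-level nodes */
    int level;  /* 0 = outermost level */
    int id;     /* index among the siblings under the same parent */
};

/*
 * Emit counts[l] children under every node of level l - 1, one level
 * at a time. Returns the number of nodes written, or -1 if 'cap' is
 * too small. Adding a level only means adding one entry to counts[].
 */
static int build_levels(const int *counts, size_t nlevels,
                        struct topo_node *out, size_t cap)
{
    size_t n = 0;          /* nodes emitted so far */
    size_t prev_begin = 0; /* first node of the previous level */
    size_t prev_end = 0;   /* one past the last node of that level */

    for (size_t level = 0; level < nlevels; level++) {
        size_t level_begin = n;
        /* Level 0 hangs off a single virtual root. */
        size_t nparents = (level == 0) ? 1 : prev_end - prev_begin;

        assert(counts[level] > 0);
        for (size_t p = 0; p < nparents; p++) {
            int parent = (level == 0) ? -1 : (int)(prev_begin + p);

            for (int id = 0; id < counts[level]; id++) {
                if (n == cap) {
                    return -1;
                }
                out[n].parent = parent;
                out[n].level = (int)level;
                out[n].id = id;
                n++;
            }
        }
        prev_begin = level_begin;
        prev_end = n;
    }
    return (int)n;
}
```

With counts = {2, 2, 2} (sockets, cores, threads) this emits 2 + 4 + 8 = 14
nodes, and a fourth level such as clusters needs no extra nesting.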

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/acpi/aml-build.c | 50 +++++++++++++++++++++++++++++----------------
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index b3b3310df3..be3851be36 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2001,7 +2001,10 @@ static void build_processor_hierarchy_node(GArray *tbl, uint32_t flags,
 void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
                 const char *oem_id, const char *oem_table_id)
 {
-    int pptt_start = table_data->len;
+    GQueue *list = g_queue_new();
+    guint pptt_start = table_data->len;
+    guint father_offset;
+    guint length, i;
     int uid = 0;
     int socket;
     AcpiTable table = { .sig = "PPTT", .rev = 2,
@@ -2010,9 +2013,8 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
     acpi_table_begin(&table, table_data);
 
     for (socket = 0; socket < ms->smp.sockets; socket++) {
-        uint32_t socket_offset = table_data->len - pptt_start;
-        int core;
-
+        g_queue_push_tail(list,
+            GUINT_TO_POINTER(table_data->len - pptt_start));
         build_processor_hierarchy_node(
             table_data,
             /*
@@ -2021,35 +2023,47 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
              */
             (1 << 0),
             0, socket, NULL, 0);
+    }
 
-        for (core = 0; core < ms->smp.cores; core++) {
-            uint32_t core_offset = table_data->len - pptt_start;
-            int thread;
+    length = g_queue_get_length(list);
+    for (i = 0; i < length; i++) {
+        int core;
 
+        father_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
+        for (core = 0; core < ms->smp.cores; core++) {
             if (ms->smp.threads > 1) {
+                g_queue_push_tail(list,
+                    GUINT_TO_POINTER(table_data->len - pptt_start));
                 build_processor_hierarchy_node(
                     table_data,
                     (0 << 0), /* not a physical package */
-                    socket_offset, core, NULL, 0);
-
-                for (thread = 0; thread < ms->smp.threads; thread++) {
-                    build_processor_hierarchy_node(
-                        table_data,
-                        (1 << 1) | /* ACPI Processor ID valid */
-                        (1 << 2) | /* Processor is a Thread */
-                        (1 << 3),  /* Node is a Leaf */
-                        core_offset, uid++, NULL, 0);
-                }
+                    father_offset, core, NULL, 0);
             } else {
                 build_processor_hierarchy_node(
                     table_data,
                     (1 << 1) | /* ACPI Processor ID valid */
                     (1 << 3),  /* Node is a Leaf */
-                    socket_offset, uid++, NULL, 0);
+                    father_offset, uid++, NULL, 0);
             }
         }
     }
 
+    length = g_queue_get_length(list);
+    for (i = 0; i < length; i++) {
+        int thread;
+
+        father_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
+        for (thread = 0; thread < ms->smp.threads; thread++) {
+            build_processor_hierarchy_node(
+                table_data,
+                (1 << 1) | /* ACPI Processor ID valid */
+                (1 << 2) | /* Processor is a Thread */
+                (1 << 3),  /* Node is a Leaf */
+                father_offset, uid++, NULL, 0);
+        }
+    }
+
+    g_queue_free(list);
     acpi_table_end(linker, &table);
 }
 
-- 
2.19.1




* [PATCH v4 07/10] hw/arm/virt-acpi-build: Make an ARM specific PPTT generator
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
                   ` (5 preceding siblings ...)
  2021-11-21 12:24 ` [PATCH v4 06/10] hw/acpi/aml-build: Improve scalability of PPTT generation Yanan Wang via
@ 2021-11-21 12:24 ` Yanan Wang via
  2021-11-21 12:25 ` [PATCH v4 08/10] tests/acpi/bios-tables-test: Allow changes to virt/PPTT file Yanan Wang via
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Yanan Wang via @ 2021-11-21 12:24 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang, Yanan Wang

We have a generic build_pptt() in hw/acpi/aml-build.c, but it is
currently only used in ARM ACPI initialization. We are now going
to support the new CPU cluster parameter, which is currently only
supported by ARM. Adding it to the generic build_pptt() would not
be a good idea, as it would make the code complex and hard to
maintain, especially once we also support CPU cache topology
hierarchy in build_pptt(). Note that the cache topology
design also varies between different CPU targets.

So an ARM specific PPTT generator becomes necessary now. Given
that the generic one is currently only used by ARM, let's just
move build_pptt() from aml-build.c to virt-acpi-build.c with
minor updates.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/acpi/aml-build.c         | 80 ++-----------------------------------
 hw/arm/virt-acpi-build.c    | 77 ++++++++++++++++++++++++++++++++++-
 include/hw/acpi/aml-build.h |  5 ++-
 3 files changed, 81 insertions(+), 81 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index be3851be36..040fbc9b4b 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1968,10 +1968,9 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
  * ACPI spec, Revision 6.3
  * 5.2.29.1 Processor hierarchy node structure (Type 0)
  */
-static void build_processor_hierarchy_node(GArray *tbl, uint32_t flags,
-                                           uint32_t parent, uint32_t id,
-                                           uint32_t *priv_rsrc,
-                                           uint32_t priv_num)
+void build_processor_hierarchy_node(GArray *tbl, uint32_t flags,
+                                    uint32_t parent, uint32_t id,
+                                    uint32_t *priv_rsrc, uint32_t priv_num)
 {
     int i;
 
@@ -1994,79 +1993,6 @@ static void build_processor_hierarchy_node(GArray *tbl, uint32_t flags,
     }
 }
 
-/*
- * ACPI spec, Revision 6.3
- * 5.2.29 Processor Properties Topology Table (PPTT)
- */
-void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
-                const char *oem_id, const char *oem_table_id)
-{
-    GQueue *list = g_queue_new();
-    guint pptt_start = table_data->len;
-    guint father_offset;
-    guint length, i;
-    int uid = 0;
-    int socket;
-    AcpiTable table = { .sig = "PPTT", .rev = 2,
-                        .oem_id = oem_id, .oem_table_id = oem_table_id };
-
-    acpi_table_begin(&table, table_data);
-
-    for (socket = 0; socket < ms->smp.sockets; socket++) {
-        g_queue_push_tail(list,
-            GUINT_TO_POINTER(table_data->len - pptt_start));
-        build_processor_hierarchy_node(
-            table_data,
-            /*
-             * Physical package - represents the boundary
-             * of a physical package
-             */
-            (1 << 0),
-            0, socket, NULL, 0);
-    }
-
-    length = g_queue_get_length(list);
-    for (i = 0; i < length; i++) {
-        int core;
-
-        father_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
-        for (core = 0; core < ms->smp.cores; core++) {
-            if (ms->smp.threads > 1) {
-                g_queue_push_tail(list,
-                    GUINT_TO_POINTER(table_data->len - pptt_start));
-                build_processor_hierarchy_node(
-                    table_data,
-                    (0 << 0), /* not a physical package */
-                    father_offset, core, NULL, 0);
-            } else {
-                build_processor_hierarchy_node(
-                    table_data,
-                    (1 << 1) | /* ACPI Processor ID valid */
-                    (1 << 3),  /* Node is a Leaf */
-                    father_offset, uid++, NULL, 0);
-            }
-        }
-    }
-
-    length = g_queue_get_length(list);
-    for (i = 0; i < length; i++) {
-        int thread;
-
-        father_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
-        for (thread = 0; thread < ms->smp.threads; thread++) {
-            build_processor_hierarchy_node(
-                table_data,
-                (1 << 1) | /* ACPI Processor ID valid */
-                (1 << 2) | /* Processor is a Thread */
-                (1 << 3),  /* Node is a Leaf */
-                father_offset, uid++, NULL, 0);
-        }
-    }
-
-    g_queue_free(list);
-    acpi_table_end(linker, &table);
-}
-
 /* build rev1/rev3/rev5.1 FADT */
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
                 const char *oem_id, const char *oem_table_id)
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 674f902652..bef7056213 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -807,6 +807,80 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     acpi_table_end(linker, &table);
 }
 
+/*
+ * ACPI spec, Revision 6.3
+ * 5.2.29 Processor Properties Topology Table (PPTT)
+ */
+static void
+build_pptt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
+{
+    MachineState *ms = MACHINE(vms);
+    GQueue *list = g_queue_new();
+    guint pptt_start = table_data->len;
+    guint father_offset;
+    guint length, i;
+    int uid = 0;
+    int socket;
+    AcpiTable table = { .sig = "PPTT", .rev = 2, .oem_id = vms->oem_id,
+                        .oem_table_id = vms->oem_table_id };
+
+    acpi_table_begin(&table, table_data);
+
+    for (socket = 0; socket < ms->smp.sockets; socket++) {
+        g_queue_push_tail(list,
+            GUINT_TO_POINTER(table_data->len - pptt_start));
+        build_processor_hierarchy_node(
+            table_data,
+            /*
+             * Physical package - represents the boundary
+             * of a physical package
+             */
+            (1 << 0),
+            0, socket, NULL, 0);
+    }
+
+    length = g_queue_get_length(list);
+    for (i = 0; i < length; i++) {
+        int core;
+
+        father_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
+        for (core = 0; core < ms->smp.cores; core++) {
+            if (ms->smp.threads > 1) {
+                g_queue_push_tail(list,
+                    GUINT_TO_POINTER(table_data->len - pptt_start));
+                build_processor_hierarchy_node(
+                    table_data,
+                    (0 << 0), /* not a physical package */
+                    father_offset, core, NULL, 0);
+            } else {
+                build_processor_hierarchy_node(
+                    table_data,
+                    (1 << 1) | /* ACPI Processor ID valid */
+                    (1 << 3),  /* Node is a Leaf */
+                    father_offset, uid++, NULL, 0);
+            }
+        }
+    }
+
+    length = g_queue_get_length(list);
+    for (i = 0; i < length; i++) {
+        int thread;
+
+        father_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
+        for (thread = 0; thread < ms->smp.threads; thread++) {
+            build_processor_hierarchy_node(
+                table_data,
+                (1 << 1) | /* ACPI Processor ID valid */
+                (1 << 2) | /* Processor is a Thread */
+                (1 << 3),  /* Node is a Leaf */
+                father_offset, uid++, NULL, 0);
+        }
+    }
+
+    g_queue_free(list);
+    acpi_table_end(linker, &table);
+}
+
 /* FADT */
 static void build_fadt_rev5(GArray *table_data, BIOSLinker *linker,
                             VirtMachineState *vms, unsigned dsdt_tbl_offset)
@@ -952,8 +1026,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 
     if (!vmc->no_cpu_topology) {
         acpi_add_table(table_offsets, tables_blob);
-        build_pptt(tables_blob, tables->linker, ms,
-                   vms->oem_id, vms->oem_table_id);
+        build_pptt(tables_blob, tables->linker, vms);
     }
 
     acpi_add_table(table_offsets, tables_blob);
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 8346003a22..2c457c8f17 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -489,8 +489,9 @@ void build_srat_memory(GArray *table_data, uint64_t base,
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
                 const char *oem_id, const char *oem_table_id);
 
-void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
-                const char *oem_id, const char *oem_table_id);
+void build_processor_hierarchy_node(GArray *tbl, uint32_t flags,
+                                    uint32_t parent, uint32_t id,
+                                    uint32_t *priv_rsrc, uint32_t priv_num);
 
 void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f,
                 const char *oem_id, const char *oem_table_id);
-- 
2.19.1




* [PATCH v4 08/10] tests/acpi/bios-tables-test: Allow changes to virt/PPTT file
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
                   ` (6 preceding siblings ...)
  2021-11-21 12:24 ` [PATCH v4 07/10] hw/arm/virt-acpi-build: Make an ARM specific PPTT generator Yanan Wang via
@ 2021-11-21 12:25 ` Yanan Wang via
  2021-11-21 12:25 ` [PATCH v4 09/10] hw/acpi/virt-acpi-build: Support cluster level in PPTT generation Yanan Wang via
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Yanan Wang via @ 2021-11-21 12:25 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang, Yanan Wang

List tests/data/acpi/virt/PPTT as an expected file that is allowed
to change in tests/qtest/bios-tables-test-allowed-diff.h.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 tests/qtest/bios-tables-test-allowed-diff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..cb143a55a6 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,2 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/PPTT",
-- 
2.19.1




* [PATCH v4 09/10] hw/acpi/virt-acpi-build: Support cluster level in PPTT generation
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
                   ` (7 preceding siblings ...)
  2021-11-21 12:25 ` [PATCH v4 08/10] tests/acpi/bios-tables-test: Allow changes to virt/PPTT file Yanan Wang via
@ 2021-11-21 12:25 ` Yanan Wang via
  2021-11-21 12:25 ` [PATCH v4 10/10] tests/acpi/bios-table-test: Update expected virt/PPTT file Yanan Wang via
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Yanan Wang via @ 2021-11-21 12:25 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang, Yanan Wang

Support cluster level in generation of ACPI Processor Properties
Topology Table (PPTT) for ARM virt machines.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/arm/virt-acpi-build.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index bef7056213..b34f0dbee0 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -839,6 +839,21 @@ build_pptt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
             0, socket, NULL, 0);
     }
 
+    length = g_queue_get_length(list);
+    for (i = 0; i < length; i++) {
+        int cluster;
+
+        father_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
+        for (cluster = 0; cluster < ms->smp.clusters; cluster++) {
+            g_queue_push_tail(list,
+                GUINT_TO_POINTER(table_data->len - pptt_start));
+            build_processor_hierarchy_node(
+                table_data,
+                (0 << 0), /* not a physical package */
+                father_offset, cluster, NULL, 0);
+        }
+    }
+
     length = g_queue_get_length(list);
     for (i = 0; i < length; i++) {
         int core;
-- 
2.19.1




* [PATCH v4 10/10] tests/acpi/bios-table-test: Update expected virt/PPTT file
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
                   ` (8 preceding siblings ...)
  2021-11-21 12:25 ` [PATCH v4 09/10] hw/acpi/virt-acpi-build: Support cluster level in PPTT generation Yanan Wang via
@ 2021-11-21 12:25 ` Yanan Wang via
  2021-12-16  3:22 ` [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support wangyanan (Y) via
  2021-12-28 10:56 ` wangyanan (Y) via
  11 siblings, 0 replies; 15+ messages in thread
From: Yanan Wang via @ 2021-11-21 12:25 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang, Yanan Wang

Run ./tests/data/acpi/rebuild-expected-aml.sh from the build directory
to update the PPTT binary. Also empty bios-tables-test-allowed-diff.h.

The disassembled differences between actual and expected PPTT:

 /*
  * Intel ACPI Component Architecture
  * AML/ASL+ Disassembler version 20180810 (64-bit version)
  * Copyright (c) 2000 - 2018 Intel Corporation
  *
- * Disassembly of tests/data/acpi/virt/PPTT, Mon Oct 25 20:24:53 2021
+ * Disassembly of /tmp/aml-BPI5B1, Mon Oct 25 20:24:53 2021
  *
  * ACPI Data Table [PPTT]
  *
  * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
  */

 [000h 0000   4]                    Signature : "PPTT"    [Processor Properties Topology Table]
-[004h 0004   4]                 Table Length : 0000004C
+[004h 0004   4]                 Table Length : 00000060
 [008h 0008   1]                     Revision : 02
-[009h 0009   1]                     Checksum : A8
+[009h 0009   1]                     Checksum : 48
 [00Ah 0010   6]                       Oem ID : "BOCHS "
 [010h 0016   8]                 Oem Table ID : "BXPC    "
 [018h 0024   4]                 Oem Revision : 00000001
 [01Ch 0028   4]              Asl Compiler ID : "BXPC"
 [020h 0032   4]        Asl Compiler Revision : 00000001

 [024h 0036   1]                Subtable Type : 00 [Processor Hierarchy Node]
 [025h 0037   1]                       Length : 14
 [026h 0038   2]                     Reserved : 0000
 [028h 0040   4]        Flags (decoded below) : 00000001
                             Physical package : 1
                      ACPI Processor ID valid : 0
 [02Ch 0044   4]                       Parent : 00000000
 [030h 0048   4]            ACPI Processor ID : 00000000
 [034h 0052   4]      Private Resource Number : 00000000

 [038h 0056   1]                Subtable Type : 00 [Processor Hierarchy Node]
 [039h 0057   1]                       Length : 14
 [03Ah 0058   2]                     Reserved : 0000
-[03Ch 0060   4]        Flags (decoded below) : 0000000A
+[03Ch 0060   4]        Flags (decoded below) : 00000000
                             Physical package : 0
-                     ACPI Processor ID valid : 1
+                     ACPI Processor ID valid : 0
 [040h 0064   4]                       Parent : 00000024
 [044h 0068   4]            ACPI Processor ID : 00000000
 [048h 0072   4]      Private Resource Number : 00000000

-Raw Table Data: Length 76 (0x4C)
+[04Ch 0076   1]                Subtable Type : 00 [Processor Hierarchy Node]
+[04Dh 0077   1]                       Length : 14
+[04Eh 0078   2]                     Reserved : 0000
+[050h 0080   4]        Flags (decoded below) : 0000000A
+                            Physical package : 0
+                     ACPI Processor ID valid : 1
+[054h 0084   4]                       Parent : 00000038
+[058h 0088   4]            ACPI Processor ID : 00000000
+[05Ch 0092   4]      Private Resource Number : 00000000
+
+Raw Table Data: Length 96 (0x60)

-    0000: 50 50 54 54 4C 00 00 00 02 A8 42 4F 43 48 53 20  // PPTTL.....BOCHS
+    0000: 50 50 54 54 60 00 00 00 02 48 42 4F 43 48 53 20  // PPTT`....HBOCHS
     0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPC    ....BXPC
     0020: 01 00 00 00 00 14 00 00 01 00 00 00 00 00 00 00  // ................
-    0030: 00 00 00 00 00 00 00 00 00 14 00 00 0A 00 00 00  // ................
-    0040: 24 00 00 00 00 00 00 00 00 00 00 00              // $...........
+    0030: 00 00 00 00 00 00 00 00 00 14 00 00 00 00 00 00  // ................
+    0040: 24 00 00 00 00 00 00 00 00 00 00 00 00 14 00 00  // $...............
+    0050: 0A 00 00 00 38 00 00 00 00 00 00 00 00 00 00 00  // ....8...........
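
The old and new table lengths can be cross-checked with a little arithmetic.
In the disassembly the first node starts at offset 024h, so the header is
36 bytes, and each node reports Length 14h, i.e. 20 bytes with no private
resources. The helper below is a hypothetical sketch built on those two
numbers; it is not code from QEMU:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    PPTT_HEADER_LEN = 36, /* first node starts at offset 024h */
    PPTT_NODE_LEN = 20,   /* node Length field is 14h */
};

/*
 * Expected PPTT size for a given topology. 'cluster_level' selects
 * whether cluster nodes are emitted between sockets and cores; when it
 * is false, 'clusters' is ignored. Thread nodes are only emitted when
 * threads > 1, otherwise the core nodes are the leaves.
 */
static uint32_t pptt_length(uint32_t sockets, uint32_t clusters,
                            uint32_t cores, uint32_t threads,
                            bool cluster_level)
{
    uint32_t groups = sockets; /* nodes at the level just emitted */
    uint32_t nodes = sockets;  /* running total of hierarchy nodes */

    assert(sockets && clusters && cores && threads);
    if (cluster_level) {
        groups *= clusters;
        nodes += groups;
    }
    groups *= cores;
    nodes += groups;
    if (threads > 1) {
        nodes += groups * threads;
    }
    return PPTT_HEADER_LEN + PPTT_NODE_LEN * nodes;
}
```

For the single-CPU test topology this gives 76 bytes (0x4C) without the
cluster level and 96 bytes (0x60) with it, which agrees with the Table
Length change shown in the disassembly.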

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 tests/data/acpi/virt/PPTT                   | Bin 76 -> 96 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   1 -
 2 files changed, 1 deletion(-)

diff --git a/tests/data/acpi/virt/PPTT b/tests/data/acpi/virt/PPTT
index 7a1258ecf123555b24462c98ccbb76b4ac1d0c2b..f56ea63b369a604877374ad696c396e796ab1c83 100644
GIT binary patch
delta 53
zcmV-50LuSNU<y!BR8(L90006=kqR;-00000Bme*a000000000002BZK3IG5AH~;_u
L0000000000uCW9Z

delta 32
qcmV+*0N?*$ObSp?R8&j=00080kqR=APy`Gl00000000000001OcLdh}

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h b/tests/qtest/bios-tables-test-allowed-diff.h
index cb143a55a6..dfb8523c8b 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,2 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/virt/PPTT",
-- 
2.19.1




* Re: [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
                   ` (9 preceding siblings ...)
  2021-11-21 12:25 ` [PATCH v4 10/10] tests/acpi/bios-table-test: Update expected virt/PPTT file Yanan Wang via
@ 2021-12-16  3:22 ` wangyanan (Y) via
  2021-12-28 10:56 ` wangyanan (Y) via
  11 siblings, 0 replies; 15+ messages in thread
From: wangyanan (Y) via @ 2021-12-16  3:22 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang

Ping...

On 2021/11/21 20:24, Yanan Wang wrote:
> Hi,
>
> This series introduces the new CPU clusters topology parameter
> and enable the support for it on ARM virt machines.
>
> Background and descriptions:
> The new Cluster-Aware Scheduling support has landed in Linux 5.16,
> which has been proved to benefit the scheduling performance (e.g.
> load balance and wake_affine strategy) on both x86_64 and AArch64.
> We can see Kernel PR [1] and the latest patch set [2] for reference.
>
> So now in Linux 5.16 we have four-level arch-neutral CPU topology
> definition like below and a new scheduler level for clusters.
> struct cpu_topology {
>      int thread_id;
>      int core_id;
>      int cluster_id;
>      int package_id;
>      int llc_id;
>      cpumask_t thread_sibling;
>      cpumask_t core_sibling;
>      cpumask_t cluster_sibling;
>      cpumask_t llc_sibling;
> }
>
> A cluster generally means a group of CPU cores which share L2 cache
> or other mid-level resources, and it is the shared resources that
> is used to improve scheduler's behavior. From the point of view of
> the size range, it's between CPU die and CPU core. For example, on
> some ARM64 Kunpeng servers, we have 6 clusters in each NUMA node,
> and 4 CPU cores in each cluster. The 4 CPU cores share a separate
> L2 cache and a L3 cache tag, which brings cache affinity advantage.
>
> [1] https://lore.kernel.org/lkml/163572864855.3357115.17938524897008353101.tglx@xen13/
> [2] https://lkml.org/lkml/2021/9/24/178
>
> In virtualization, on Hosts which have pClusters, if we can
> design a vCPU topology with a cluster level for the guest kernel and
> have dedicated vCPU pinning, a Cluster-Aware Guest kernel can
> also make use of the cache affinity of CPU clusters to gain
> similar scheduling performance.
>
> This series consists of two parts:
> The first part (patch 1-3):
> Implement infrastructure for CPU cluster level topology support,
> including the SMP documentation, configuration and parsing.
>
> The second part (patch 4-10):
> Enable CPU cluster support on ARM virt machines, so that users
> can specify a 4-level CPU hierarchy sockets/clusters/cores/threads.
> And the 4-level topology will be described to guest kernel through
> ACPI PPTT and DT cpu-map.
>
> Changelog:
> v3->v4:
> - Significant change from v3 to v4, since the whole series is reworked
>    based on latest QEMU SMP frame.
> - v3: https://lore.kernel.org/qemu-devel/20210516103228.37792-1-wangyanan55@huawei.com/
>
> Yanan Wang (10):
>    qemu-options: Improve readability of SMP related Docs
>    hw/core/machine: Introduce CPU cluster topology support
>    hw/core/machine: Wrap target specific parameters together
>    hw/arm/virt: Support clusters on ARM virt machines
>    hw/arm/virt: Support cluster level in DT cpu-map
>    hw/acpi/aml-build: Improve scalability of PPTT generation
>    hw/arm/virt-acpi-build: Make an ARM specific PPTT generator
>    tests/acpi/bios-tables-test: Allow changes to virt/PPTT file
>    hw/acpi/virt-acpi-build: Support cluster level in PPTT generation
>    tests/acpi/bios-table-test: Update expected virt/PPTT file
>
>   hw/acpi/aml-build.c         |  66 ++------------------------
>   hw/arm/virt-acpi-build.c    |  92 +++++++++++++++++++++++++++++++++++-
>   hw/arm/virt.c               |  16 ++++---
>   hw/core/machine-smp.c       |  29 +++++++++---
>   hw/core/machine.c           |   3 ++
>   include/hw/acpi/aml-build.h |   5 +-
>   include/hw/boards.h         |   6 ++-
>   qapi/machine.json           |   5 +-
>   qemu-options.hx             |  91 +++++++++++++++++++++++++++--------
>   softmmu/vl.c                |   3 ++
>   tests/data/acpi/virt/PPTT   | Bin 76 -> 96 bytes
>   11 files changed, 214 insertions(+), 102 deletions(-)
>
> --
> 2.19.1
>
> .




* Re: [PATCH v4 03/10] hw/core/machine: Wrap target specific parameters together
  2021-11-21 12:24 ` [PATCH v4 03/10] hw/core/machine: Wrap target specific parameters together Yanan Wang via
@ 2021-12-16 13:23   ` Philippe Mathieu-Daudé
  2021-12-16 14:06     ` wangyanan (Y) via
  0 siblings, 1 reply; 15+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-12-16 13:23 UTC (permalink / raw)
  To: Yanan Wang, qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, wanghaibin.wang, Markus Armbruster,
	Igor Mammedov, Ani Sinha, Paolo Bonzini, Eric Blake

On 11/21/21 13:24, Yanan Wang wrote:
> Wrap the CPU target specific parameters together into a single
> variable, so that we don't need to update the other lines but
> a single line when new topology parameters are introduced.

Where are the new params introduced? Apparently not in this series.

> No functional change intended.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  hw/core/machine-smp.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)




* Re: [PATCH v4 03/10] hw/core/machine: Wrap target specific parameters together
  2021-12-16 13:23   ` Philippe Mathieu-Daudé
@ 2021-12-16 14:06     ` wangyanan (Y) via
  0 siblings, 0 replies; 15+ messages in thread
From: wangyanan (Y) via @ 2021-12-16 14:06 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost, Marcel Apfelbaum,
	Paolo Bonzini, Michael S . Tsirkin, Igor Mammedov, Ani Sinha,
	Markus Armbruster, Eric Blake, wanghaibin.wang, qemu-devel,
	qemu-arm

Hi,

On 2021/12/16 21:23, Philippe Mathieu-Daudé wrote:
> On 11/21/21 13:24, Yanan Wang wrote:
>> Wrap the CPU target specific parameters together into a single
>> variable, so that we don't need to update the other lines but
>> a single line when new topology parameters are introduced.
> Where new params are introduced? Not in this series apparently.
The commit message may not express clearly what I mean.
A new parameter "clusters" is added in patch #2, so we now have two
target-specific parameters, dies and clusters. I tried to wrap these two
parameters together so that the code lines are shorter and more concise.
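
The wrapping described above can be sketched roughly as follows. This is
only an illustration, not the actual hw/core/machine-smp.c code: the
struct and all function names (SketchSMPConfig, sketch_target_specific,
and so on) are hypothetical. The idea is that the target-specific levels
are folded into one factor, so a future level would touch one line only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the SMP configuration. */
typedef struct {
    unsigned sockets;
    unsigned dies;      /* target-specific level */
    unsigned clusters;  /* target-specific level, added in patch #2 */
    unsigned cores;
    unsigned threads;
    unsigned maxcpus;
} SketchSMPConfig;

/*
 * Fold every target-specific level into a single factor.
 * A newly introduced level would only change this one expression.
 */
unsigned sketch_target_specific(const SketchSMPConfig *c)
{
    assert(c != NULL);
    return c->dies * c->clusters;
}

/* Total CPUs described by the hierarchy, using the wrapped factor. */
unsigned sketch_total_cpus(const SketchSMPConfig *c)
{
    unsigned others = sketch_target_specific(c);

    return c->sockets * others * c->cores * c->threads;
}

/* The topology is consistent when the product matches maxcpus. */
bool sketch_topology_is_valid(const SketchSMPConfig *c)
{
    return sketch_total_cpus(c) == c->maxcpus;
}
```

For instance, with sockets=2, dies=1, clusters=2, cores=4 and threads=2
the wrapped factor is 2 and the hierarchy describes 32 CPUs, so the
check passes only when maxcpus is 32.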

If this change is not considered necessary, I will of course drop this
patch.

Thanks,
Yanan
>
>> No functional change intended.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   hw/core/machine-smp.c | 17 ++++++++++-------
>>   1 file changed, 10 insertions(+), 7 deletions(-)




* Re: [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support
  2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
                   ` (10 preceding siblings ...)
  2021-12-16  3:22 ` [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support wangyanan (Y) via
@ 2021-12-28 10:56 ` wangyanan (Y) via
  11 siblings, 0 replies; 15+ messages in thread
From: wangyanan (Y) via @ 2021-12-28 10:56 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Peter Maydell, Andrew Jones, Eduardo Habkost,
	Philippe Mathieu-Daudé,
	Marcel Apfelbaum, Paolo Bonzini, Michael S . Tsirkin,
	Igor Mammedov, Ani Sinha, Markus Armbruster, Eric Blake,
	wanghaibin.wang

I have sent a v5 with four new patches added, so this v4 can be ignored.
v5: https://patchew.org/QEMU/20211228092221.21068-1-wangyanan55@huawei.com/

Thanks,
Yanan



Thread overview: 15+ messages
2021-11-21 12:24 [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support Yanan Wang via
2021-11-21 12:24 ` [PATCH v4 01/10] qemu-options: Improve readability of SMP related Docs Yanan Wang via
2021-11-21 12:24 ` [PATCH v4 02/10] hw/core/machine: Introduce CPU cluster topology support Yanan Wang via
2021-11-21 12:24 ` [PATCH v4 03/10] hw/core/machine: Wrap target specific parameters together Yanan Wang via
2021-12-16 13:23   ` Philippe Mathieu-Daudé
2021-12-16 14:06     ` wangyanan (Y) via
2021-11-21 12:24 ` [PATCH v4 04/10] hw/arm/virt: Support clusters on ARM virt machines Yanan Wang via
2021-11-21 12:24 ` [PATCH v4 05/10] hw/arm/virt: Support cluster level in DT cpu-map Yanan Wang via
2021-11-21 12:24 ` [PATCH v4 06/10] hw/acpi/aml-build: Improve scalability of PPTT generation Yanan Wang via
2021-11-21 12:24 ` [PATCH v4 07/10] hw/arm/virt-acpi-build: Make an ARM specific PPTT generator Yanan Wang via
2021-11-21 12:25 ` [PATCH v4 08/10] tests/acpi/bios-tables-test: Allow changes to virt/PPTT file Yanan Wang via
2021-11-21 12:25 ` [PATCH v4 09/10] hw/acpi/virt-acpi-build: Support cluster level in PPTT generation Yanan Wang via
2021-11-21 12:25 ` [PATCH v4 10/10] tests/acpi/bios-table-test: Update expected virt/PPTT file Yanan Wang via
2021-12-16  3:22 ` [PATCH v4 00/10] ARM virt: Introduce CPU clusters topology support wangyanan (Y) via
2021-12-28 10:56 ` wangyanan (Y) via
