QEMU-Devel Archive on lore.kernel.org
* [RFC PATCH 0/6] Introduce cluster cpu topology support
@ 2021-03-31  9:53 Yanan Wang
  2021-03-31  9:53 ` [RFC PATCH 1/6] vl.c: Add arch-neutral -smp, clusters=* command line support Yanan Wang
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Yanan Wang @ 2021-03-31  9:53 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, wanghaibin.wang, Richard Henderson,
	Yanan Wang, Shannon Zhao, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li

Hi,
This series introduces cluster cpu topology support, in addition to the
existing sockets, cores, and threads.

A cluster is a group of cores that share some resources (e.g. cache)
below the LLC. For example, the ARM64 server chip Kunpeng 920 has 6 or 8
clusters in each NUMA node, and each cluster has 4 cores. All clusters
share the L3 cache, while cores within each cluster share the L2 cache.

Also, there are some x86 CPU implementations (e.g. Jacobsville) where the
L2 cache is shared among a cluster of cores instead of being exclusive to
a single core. For example, on Jacobsville there are 6 clusters of 4 Atom
cores, each cluster sharing a separate L2 cache and all 24 cores sharing
the L3 cache.

The cache affinity of clusters has been shown to improve Linux kernel
scheduling performance, and a patchset [1] has already been posted, in
which a generic sched_domain for clusters was added and a cluster level
was added to the arch-neutral cpu topology struct, as below.
struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};

Also, the kernel doc [2]: Documentation/devicetree/bindings/cpu/cpu-topology.txt
defines a four-level CPU topology hierarchy: socket/cluster/core/thread.
According to that binding, a socket node's children must be one or more
cluster nodes, and a cluster node's children must be either one or more
cluster nodes or one or more core nodes.
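For reference, a four-level hierarchy under that binding can be sketched
as the following cpu-map fragment (an illustrative sketch only; the node
names and cpu phandles here are made up for the example):

```
cpus {
    cpu-map {
        socket0 {
            cluster0 {
                core0 {
                    thread0 { cpu = <&cpu0>; };
                    thread1 { cpu = <&cpu1>; };
                };
                core1 {
                    thread0 { cpu = <&cpu2>; };
                    thread1 { cpu = <&cpu3>; };
                };
            };
            cluster1 {
                /* further core nodes ... */
            };
        };
    };
};
```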

So let's add arch-neutral -smp clusters=* command line support, so that
future guest OSes can make use of cluster cpu topology for better scheduling
performance. Any architecture that has groups of cpus sharing some separate
resources (e.g. L2 cache) internally below the LLC can use this command
line parameter to define a VM with a cluster level in its cpu topology.

For ARM machines, a four-level cpu hierarchy can be defined:
sockets/clusters/cores/threads. For PC machines, a five-level cpu
hierarchy can be defined: sockets/dies/clusters/cores/threads.

About this series:
Note that this series is implemented on top of [3] and [4]. Although they
have not been merged into QEMU mainline yet, it is still worthwhile to post
this series to share the idea first. So it is sent as an RFC, and any
comments are welcome and appreciated.

Test results:
With command line: -smp 96,sockets=2,clusters=6,cores=4,threads=2, VM's cpu
topology description shows as below.
lscpu:
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        1
Vendor ID:           0x48
Model:               0
Stepping:            0x1
BogoMIPS:            200.00
L1d cache:           unknown size
L1i cache:           unknown size
L2 cache:            unknown size
NUMA node0 CPU(s):   0-95

Topology information of clusters can also be obtained:
cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list: 0-7
cat /sys/devices/system/cpu/cpu0/topology/cluster_id: 56

cat /sys/devices/system/cpu/cpu8/topology/cluster_cpus_list: 8-15
cat /sys/devices/system/cpu/cpu8/topology/cluster_id: 316
...
cat /sys/devices/system/cpu/cpu95/topology/cluster_cpus_list: 88-95
cat /sys/devices/system/cpu/cpu95/topology/cluster_id: 2936

Links:
[1] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210319041618.14316-1-song.bao.hua@hisilicon.com/
[2] https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/cpu/cpu-topology.txt
[3] https://patchwork.kernel.org/project/qemu-devel/cover/20210225085627.2263-1-fangying1@huawei.com/
[4] https://patchwork.kernel.org/project/qemu-devel/patch/20201109030452.2197-4-fangying1@huawei.com/

Yanan Wang (6):
  vl.c: Add arch-neutral -smp, clusters=* command line support
  hw/core/machine: Parse cluster cpu topology in smp_parse()
  hw/arm/virt: Parse cluster cpu topology for ARM machines
  hw/i386/pc: Parse cluster cpu topology for PC machines
  hw/arm/virt-acpi-build: Add cluster level for ARM PPTT table
  hw/arm/virt: Add cluster level for ARM device tree

 hw/acpi/aml-build.c         | 11 +++++++++
 hw/arm/virt-acpi-build.c    | 43 ++++++++++++++++++++---------------
 hw/arm/virt.c               | 45 ++++++++++++++++++++++---------------
 hw/core/machine.c           | 32 +++++++++++++++-----------
 hw/i386/pc.c                | 31 +++++++++++++++----------
 include/hw/acpi/aml-build.h |  2 ++
 include/hw/boards.h         |  4 +++-
 qemu-options.hx             | 27 +++++++++++++---------
 softmmu/vl.c                |  3 +++
 9 files changed, 125 insertions(+), 73 deletions(-)

-- 
2.19.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 1/6] vl.c: Add arch-neutral -smp, clusters=* command line support
  2021-03-31  9:53 [RFC PATCH 0/6] Introduce cluster cpu topology support Yanan Wang
@ 2021-03-31  9:53 ` Yanan Wang
  2021-03-31  9:53 ` [RFC PATCH 2/6] hw/core/machine: Parse cluster cpu topology in smp_parse() Yanan Wang
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Yanan Wang @ 2021-03-31  9:53 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, wanghaibin.wang, Richard Henderson,
	Yanan Wang, Shannon Zhao, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li

A cluster is a group of cores that share some resources (e.g. cache)
below the LLC. For example, the ARM64 server chip Kunpeng 920 has 6 or 8
clusters in each NUMA node, and each cluster has 4 cores. All clusters
share the L3 cache, while cores within each cluster share the L2 cache.

Also, there are some x86 CPU implementations (e.g. Jacobsville) where the
L2 cache is shared among a cluster of cores instead of being exclusive to
a single core. For example, on Jacobsville there are 6 clusters of 4 Atom
cores, each cluster sharing a separate L2 cache and all 24 cores sharing
the L3 cache.

The cache affinity of clusters has been shown to improve Linux kernel
scheduling performance, and a patchset has been posted, in which a generic
sched_domain for clusters was added and a cluster level was added to the
arch-neutral cpu topology struct, as below.
struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};

Also, the kernel doc Documentation/devicetree/bindings/cpu/cpu-topology.txt
defines a four-level CPU topology hierarchy: socket/cluster/core/thread.
According to that binding, a socket node's children must be one or more
cluster nodes, and a cluster node's children must be either one or more
cluster nodes or one or more core nodes.

So let's add arch-neutral -smp clusters=* command line support, so that
future guest OSes can make use of cluster cpu topology for better scheduling
performance. Any architecture that has groups of cpus sharing some separate
resources (e.g. L2 cache) internally below the LLC can use this command
line parameter to define a VM with a cluster level in its cpu topology.

For ARM machines, a four-level cpu hierarchy can be defined:
sockets/clusters/cores/threads. For PC machines, a five-level cpu
hierarchy can be defined: sockets/dies/clusters/cores/threads.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 include/hw/boards.h |  4 +++-
 qemu-options.hx     | 27 ++++++++++++++++-----------
 softmmu/vl.c        |  3 +++
 3 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index a46dfe5d1a..776c3fe5e1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -235,13 +235,15 @@ typedef struct DeviceMemoryState {
 /**
  * CpuTopology:
  * @cpus: the number of present logical processors on the machine
- * @cores: the number of cores in one package
+ * @clusters: the number of clusters in one package
+ * @cores: the number of cores in one cluster
  * @threads: the number of threads in one core
  * @sockets: the number of sockets on the machine
  * @max_cpus: the maximum number of logical processors on the machine
  */
 typedef struct CpuTopology {
     unsigned int cpus;
+    unsigned int clusters;
     unsigned int cores;
     unsigned int threads;
     unsigned int sockets;
diff --git a/qemu-options.hx b/qemu-options.hx
index 6c34c7050f..213904ceda 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -184,25 +184,30 @@ SRST
 ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
-    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
+    "-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
     "                set the number of CPUs to 'n' [default=1]\n"
     "                maxcpus= maximum number of total cpus, including\n"
     "                offline CPUs for hotplug, etc\n"
-    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
+    "                clusters= number of CPU clusters on one socket\n"
+    "                (for PC, it's on one die)\n"
+    "                cores= number of CPU cores on one cluster\n"
     "                threads= number of threads on one CPU core\n"
     "                dies= number of CPU dies on one socket (for PC only)\n"
     "                sockets= number of discrete sockets in the system\n",
         QEMU_ARCH_ALL)
 SRST
-``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
-    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
-    are supported. On Sparc32 target, Linux limits the number of usable
-    CPUs to 4. For the PC target, the number of cores per die, the
-    number of threads per cores, the number of dies per packages and the
-    total number of sockets can be specified. Missing values will be
-    computed. If any on the three values is given, the total number of
-    CPUs n can be omitted. maxcpus specifies the maximum number of
-    hotpluggable CPUs.
+``-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]``
+    Simulate an SMP system with n CPUs. On the PC target, up to 255
+    CPUs are supported. On the Sparc32 target, Linux limits the number
+    of usable CPUs to 4. For the PC target, the number of threads per
+    core, the number of cores per cluster, the number of clusters per
+    die, the number of dies per package and the total number of sockets
+    can be specified. For the ARM target, the number of threads per core,
+    the number of cores per cluster, the number of clusters per socket
+    and the total number of sockets can be specified. And missing values
+    will be computed. If any of the five values is given, the total
+    number of CPUs n can be omitted. Maxcpus specifies the maximum
+    number of hotpluggable CPUs.
 ERST
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
diff --git a/softmmu/vl.c b/softmmu/vl.c
index b219ce1f35..b915f0606a 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -711,6 +711,9 @@ static QemuOptsList qemu_smp_opts = {
         }, {
             .name = "dies",
             .type = QEMU_OPT_NUMBER,
+        }, {
+            .name = "clusters",
+            .type = QEMU_OPT_NUMBER,
         }, {
             .name = "cores",
             .type = QEMU_OPT_NUMBER,
-- 
2.19.1




* [RFC PATCH 2/6] hw/core/machine: Parse cluster cpu topology in smp_parse()
  2021-03-31  9:53 [RFC PATCH 0/6] Introduce cluster cpu topology support Yanan Wang
  2021-03-31  9:53 ` [RFC PATCH 1/6] vl.c: Add arch-neutral -smp, clusters=* command line support Yanan Wang
@ 2021-03-31  9:53 ` Yanan Wang
  2021-03-31  9:53 ` [RFC PATCH 3/6] hw/arm/virt: Parse cluster cpu topology for ARM machines Yanan Wang
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Yanan Wang @ 2021-03-31  9:53 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, wanghaibin.wang, Richard Henderson,
	Yanan Wang, Shannon Zhao, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li

Function smp_parse() in hw/core/machine.c is the generic/default function
used for parsing the -smp command line. Since the new clusters parameter
has been added to struct CpuTopology, we should parse this new parameter
in the default function.

In smp_parse(), the logic for computing missing values prefers sockets
over cores over threads. The value of clusters defaults to 1 if not
explicitly specified, so that it will not affect the parsing results for
machines that don't specify "clusters=" on the -smp command line because
they simply don't support it.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/core/machine.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 970046f438..dd77ad183d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -720,33 +720,38 @@ static void smp_parse(MachineState *ms, QemuOpts *opts)
     if (opts) {
         unsigned cpus    = qemu_opt_get_number(opts, "cpus", 0);
         unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
+        unsigned clusters = qemu_opt_get_number(opts, "clusters", 1);
         unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
         unsigned threads = qemu_opt_get_number(opts, "threads", 0);
 
-        /* compute missing values, prefer sockets over cores over threads */
+        /*
+         * Compute missing values, prefer sockets over cores
+         * over threads. And the value of clusters has been
+         * set as default 1 if not explicitly specified.
+         */
         if (cpus == 0 || sockets == 0) {
             cores = cores > 0 ? cores : 1;
             threads = threads > 0 ? threads : 1;
             if (cpus == 0) {
                 sockets = sockets > 0 ? sockets : 1;
-                cpus = cores * threads * sockets;
+                cpus = sockets * clusters * cores * threads;
             } else {
                 ms->smp.max_cpus =
                         qemu_opt_get_number(opts, "maxcpus", cpus);
-                sockets = ms->smp.max_cpus / (cores * threads);
+                sockets = ms->smp.max_cpus / (clusters * cores * threads);
             }
         } else if (cores == 0) {
             threads = threads > 0 ? threads : 1;
-            cores = cpus / (sockets * threads);
+            cores = cpus / (sockets * clusters * threads);
             cores = cores > 0 ? cores : 1;
         } else if (threads == 0) {
-            threads = cpus / (cores * sockets);
+            threads = cpus / (sockets * clusters * cores);
             threads = threads > 0 ? threads : 1;
-        } else if (sockets * cores * threads < cpus) {
+        } else if (sockets * clusters * cores * threads < cpus) {
             error_report("cpu topology: "
-                         "sockets (%u) * cores (%u) * threads (%u) < "
-                         "smp_cpus (%u)",
-                         sockets, cores, threads, cpus);
+                         "sockets (%u) * clusters (%u) * cores (%u) * "
+                         "threads (%u) < smp_cpus (%u)",
+                         sockets, clusters, cores, threads, cpus);
             exit(1);
         }
 
@@ -758,16 +763,17 @@ static void smp_parse(MachineState *ms, QemuOpts *opts)
             exit(1);
         }
 
-        if (sockets * cores * threads != ms->smp.max_cpus) {
+        if (sockets * clusters * cores * threads != ms->smp.max_cpus) {
             error_report("Invalid CPU topology: "
-                         "sockets (%u) * cores (%u) * threads (%u) "
-                         "!= maxcpus (%u)",
-                         sockets, cores, threads,
+                         "sockets (%u) * clusters (%u) * cores (%u) * "
+                         "threads (%u) != maxcpus (%u)",
+                         sockets, clusters, cores, threads,
                          ms->smp.max_cpus);
             exit(1);
         }
 
         ms->smp.cpus = cpus;
+        ms->smp.clusters = clusters;
         ms->smp.cores = cores;
         ms->smp.threads = threads;
         ms->smp.sockets = sockets;
-- 
2.19.1




* [RFC PATCH 3/6] hw/arm/virt: Parse cluster cpu topology for ARM machines
  2021-03-31  9:53 [RFC PATCH 0/6] Introduce cluster cpu topology support Yanan Wang
  2021-03-31  9:53 ` [RFC PATCH 1/6] vl.c: Add arch-neutral -smp, clusters=* command line support Yanan Wang
  2021-03-31  9:53 ` [RFC PATCH 2/6] hw/core/machine: Parse cluster cpu topology in smp_parse() Yanan Wang
@ 2021-03-31  9:53 ` Yanan Wang
  2021-03-31  9:53 ` [RFC PATCH 4/6] hw/i386/pc: Parse cluster cpu topology for PC machines Yanan Wang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Yanan Wang @ 2021-03-31  9:53 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, wanghaibin.wang, Richard Henderson,
	Yanan Wang, Shannon Zhao, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li

There is a separate function virt_smp_parse() in hw/arm/virt.c used
to parse the cpu topology for ARM machines. And there are some ARM
implementations that have the concept of a cluster; for example, the
ARM64 server chip Kunpeng 920 has 6 or 8 clusters in each NUMA node,
and each cluster has 4 cores. All clusters share the L3 cache, while
the cores within each cluster share the L2 cache. So parse the cluster
cpu topology for ARM machines, and the guest kernel will then take
advantage of it for better scheduling performance.

In virt_smp_parse(), the logic for computing missing values prefers
cores over sockets over threads. The value of clusters defaults to 1
if not explicitly specified, so that it will not affect the parsing
results for machines that don't specify "clusters=" on the -smp
command line because they simply don't support it.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/arm/virt.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 911ad7d3aa..c9ad76ff64 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2608,35 +2608,37 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
     if (opts) {
         unsigned cpus    = qemu_opt_get_number(opts, "cpus", 0);
         unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
+        unsigned clusters = qemu_opt_get_number(opts, "clusters", 1);
         unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
         unsigned threads = qemu_opt_get_number(opts, "threads", 0);
 
         /*
-         * Compute missing values; prefer cores over sockets and
-         * sockets over threads.
+         * Compute missing values, prefer cores over sockets
+         * and sockets over threads. The value of clusters has
+         * been set as default 1 if not explicitly specified.
          */
         if (cpus == 0 || cores == 0) {
             sockets = sockets > 0 ? sockets : 1;
             threads = threads > 0 ? threads : 1;
             if (cpus == 0) {
                 cores = cores > 0 ? cores : 1;
-                cpus = cores * threads * sockets;
+                cpus = sockets * clusters * cores * threads;
             } else {
                 ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
-                cores = ms->smp.max_cpus / (sockets * threads);
+                cores = ms->smp.max_cpus / (sockets * clusters * threads);
             }
         } else if (sockets == 0) {
             threads = threads > 0 ? threads : 1;
-            sockets = cpus / (cores * threads);
+            sockets = cpus / (clusters * cores * threads);
             sockets = sockets > 0 ? sockets : 1;
         } else if (threads == 0) {
-            threads = cpus / (cores * sockets);
+            threads = cpus / (sockets * clusters * cores);
             threads = threads > 0 ? threads : 1;
-        } else if (sockets * cores * threads < cpus) {
+        } else if (sockets * clusters * cores * threads < cpus) {
             error_report("cpu topology: "
-                         "sockets (%u) * cores (%u) * threads (%u) < "
-                         "smp_cpus (%u)",
-                         sockets, cores, threads, cpus);
+                         "sockets (%u) * clusters (%u) * cores (%u) * "
+                         "threads (%u) < smp_cpus (%u)",
+                         sockets, clusters, cores, threads, cpus);
             exit(1);
         }
 
@@ -2647,16 +2649,17 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
             exit(1);
         }
 
-        if (sockets * cores * threads != ms->smp.max_cpus) {
+        if (sockets * clusters * cores * threads != ms->smp.max_cpus) {
             error_report("cpu topology: "
-                         "sockets (%u) * cores (%u) * threads (%u)"
-                         "!= maxcpus (%u)",
-                         sockets, cores, threads,
+                         "sockets (%u) * clusters (%u) * cores (%u) * "
+                         "threads (%u) != maxcpus (%u)",
+                         sockets, clusters, cores, threads,
                          ms->smp.max_cpus);
             exit(1);
         }
 
         ms->smp.cpus = cpus;
+        ms->smp.clusters = clusters;
         ms->smp.cores = cores;
         ms->smp.threads = threads;
         ms->smp.sockets = sockets;
-- 
2.19.1




* [RFC PATCH 4/6] hw/i386/pc: Parse cluster cpu topology for PC machines
  2021-03-31  9:53 [RFC PATCH 0/6] Introduce cluster cpu topology support Yanan Wang
                   ` (2 preceding siblings ...)
  2021-03-31  9:53 ` [RFC PATCH 3/6] hw/arm/virt: Parse cluster cpu topology for ARM machines Yanan Wang
@ 2021-03-31  9:53 ` Yanan Wang
  2021-03-31  9:53 ` [RFC PATCH 5/6] hw/arm/virt-acpi-build: Add cluster level for ARM PPTT table Yanan Wang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Yanan Wang @ 2021-03-31  9:53 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, wanghaibin.wang, Richard Henderson,
	Yanan Wang, Shannon Zhao, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li

There is a separate function pc_smp_parse() in hw/i386/pc.c used
to parse the cpu topology for PC machines. And there are some x86
implementations that have a similar concept of a cluster; for example,
on Jacobsville there are 6 clusters of 4 Atom cores, each cluster
sharing a separate L2 cache and all 24 cores sharing the L3 cache.
So parse the cluster cpu topology for PC machines, and the guest
kernel will then take advantage of it for better scheduling performance.

In pc_smp_parse(), the logic for computing missing values prefers
sockets over cores over threads. The value of clusters defaults to 1
if not explicitly specified, so that it will not affect the parsing
results for machines that don't specify "clusters=" on the -smp
command line because they simply don't support it.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/i386/pc.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8aa85dec54..f2906f9185 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -716,33 +716,39 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts)
         unsigned cpus    = qemu_opt_get_number(opts, "cpus", 0);
         unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
         unsigned dies = qemu_opt_get_number(opts, "dies", 1);
+        unsigned clusters = qemu_opt_get_number(opts, "clusters", 1);
         unsigned cores   = qemu_opt_get_number(opts, "cores", 0);
         unsigned threads = qemu_opt_get_number(opts, "threads", 0);
 
-        /* compute missing values, prefer sockets over cores over threads */
+        /*
+         * Compute missing values, prefer sockets over cores
+         * over threads. And the value of dies or clusters has
+         * been set as default 1 if not explicitly specified.
+         */
         if (cpus == 0 || sockets == 0) {
             cores = cores > 0 ? cores : 1;
             threads = threads > 0 ? threads : 1;
             if (cpus == 0) {
                 sockets = sockets > 0 ? sockets : 1;
-                cpus = cores * threads * dies * sockets;
+                cpus = sockets * dies * clusters * cores * threads;
             } else {
                 ms->smp.max_cpus =
                         qemu_opt_get_number(opts, "maxcpus", cpus);
-                sockets = ms->smp.max_cpus / (cores * threads * dies);
+                sockets = ms->smp.max_cpus /
+                          (dies * clusters * cores * threads);
             }
         } else if (cores == 0) {
             threads = threads > 0 ? threads : 1;
-            cores = cpus / (sockets * dies * threads);
+            cores = cpus / (sockets * dies * clusters * threads);
             cores = cores > 0 ? cores : 1;
         } else if (threads == 0) {
-            threads = cpus / (cores * dies * sockets);
+            threads = cpus / (sockets * dies * clusters * cores);
             threads = threads > 0 ? threads : 1;
-        } else if (sockets * dies * cores * threads < cpus) {
+        } else if (sockets * dies * clusters * cores * threads < cpus) {
             error_report("cpu topology: "
-                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) < "
-                         "smp_cpus (%u)",
-                         sockets, dies, cores, threads, cpus);
+                         "sockets (%u) * dies (%u) * clusters (%u) * "
+                         "cores (%u) * threads (%u) < smp_cpus (%u)",
+                         sockets, dies, clusters, cores, threads, cpus);
             exit(1);
         }
 
@@ -756,14 +762,15 @@ void pc_smp_parse(MachineState *ms, QemuOpts *opts)
 
         if (sockets * dies * cores * threads != ms->smp.max_cpus) {
             error_report("Invalid CPU topology deprecated: "
-                         "sockets (%u) * dies (%u) * cores (%u) * threads (%u) "
-                         "!= maxcpus (%u)",
-                         sockets, dies, cores, threads,
+                         "sockets (%u) * dies (%u) * clusters (%u) * "
+                         "cores (%u) * threads (%u) != maxcpus (%u)",
+                         sockets, dies, clusters, cores, threads,
                          ms->smp.max_cpus);
             exit(1);
         }
 
         ms->smp.cpus = cpus;
+        ms->smp.clusters = clusters;
         ms->smp.cores = cores;
         ms->smp.threads = threads;
         ms->smp.sockets = sockets;
-- 
2.19.1




* [RFC PATCH 5/6] hw/arm/virt-acpi-build: Add cluster level for ARM PPTT table
  2021-03-31  9:53 [RFC PATCH 0/6] Introduce cluster cpu topology support Yanan Wang
                   ` (3 preceding siblings ...)
  2021-03-31  9:53 ` [RFC PATCH 4/6] hw/i386/pc: Parse cluster cpu topology for PC machines Yanan Wang
@ 2021-03-31  9:53 ` Yanan Wang
  2021-03-31  9:53 ` [RFC PATCH 6/6] hw/arm/virt: Add cluster level for ARM device tree Yanan Wang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Yanan Wang @ 2021-03-31  9:53 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, wanghaibin.wang, Richard Henderson,
	Yanan Wang, Shannon Zhao, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li

Add a Processor Hierarchy Node of cluster level between the core level
and the package level for the ARM PPTT table.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/acpi/aml-build.c         | 11 ++++++++++
 hw/arm/virt-acpi-build.c    | 43 +++++++++++++++++++++----------------
 include/hw/acpi/aml-build.h |  2 ++
 3 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a0af3e9d73..fe07817013 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1902,6 +1902,17 @@ void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
     build_append_int_noprefix(tbl, 0, 4);  /* Number of private resources */
 }
 
+void build_cluster_hierarchy(GArray *tbl, uint32_t parent, uint32_t id)
+{
+    build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */
+    build_append_byte(tbl, 20);           /* Length, no private resources */
+    build_append_int_noprefix(tbl, 0, 2);      /* Reserved */
+    build_append_int_noprefix(tbl, 0, 4);      /* Flags */
+    build_append_int_noprefix(tbl, parent, 4); /* Parent */
+    build_append_int_noprefix(tbl, id, 4);     /* ACPI processor ID */
+    build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */
+}
+
 void build_processor_hierarchy(GArray *tbl, uint32_t flags,
                                uint32_t parent, uint32_t id)
 {
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 38d50ce66c..3b80518a90 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -442,6 +442,7 @@ build_pptt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     int pptt_start = table_data->len;
     int uid = 0, cpus = 0, socket = 0;
     MachineState *ms = MACHINE(vms);
+    unsigned int smp_clusters = ms->smp.clusters;
     unsigned int smp_cores = ms->smp.cores;
     unsigned int smp_threads = ms->smp.threads;
 
@@ -449,29 +450,35 @@ build_pptt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 
     for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
         uint32_t socket_offset = table_data->len - pptt_start;
-        int core;
+        int cluster;
 
         build_socket_hierarchy(table_data, 0, socket);
 
-        for (core = 0; core < smp_cores; core++) {
-            uint32_t core_offset = table_data->len - pptt_start;
-            int thread;
-
-            if (smp_threads <= 1) {
-                build_processor_hierarchy(table_data,
-                                          ACPI_PPTT_ACPI_PROCESSOR_ID_VALID |
-                                          ACPI_PPTT_ACPI_LEAF_NODE,
-                                          socket_offset, uid++);
-             } else {
-                build_processor_hierarchy(table_data,
-                                          ACPI_PPTT_ACPI_PROCESSOR_ID_VALID,
-                                          socket_offset, core);
-                for (thread = 0; thread < smp_threads; thread++) {
-                    build_thread_hierarchy(table_data, core_offset, uid++);
+        for (cluster = 0; cluster < smp_clusters; cluster++) {
+            uint32_t cluster_offset = table_data->len - pptt_start;
+            int core;
+
+            build_cluster_hierarchy(table_data, socket_offset, cluster);
+
+            for (core = 0; core < smp_cores; core++) {
+                uint32_t core_offset = table_data->len - pptt_start;
+                int thread;
+
+                if (smp_threads <= 1) {
+                    build_processor_hierarchy(table_data,
+                                              ACPI_PPTT_ACPI_PROCESSOR_ID_VALID |
+                                              ACPI_PPTT_ACPI_LEAF_NODE,
+                                              cluster_offset, uid++);
+                } else {
+                    build_processor_hierarchy(table_data, 0, cluster_offset, core);
+
+                    for (thread = 0; thread < smp_threads; thread++) {
+                        build_thread_hierarchy(table_data, core_offset, uid++);
+                    }
                 }
-             }
+            }
         }
-        cpus += smp_cores * smp_threads;
+        cpus += smp_clusters * smp_cores * smp_threads;
     }
 
     build_header(linker, table_data,
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 7f0ca1a198..cb68535cf1 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -464,6 +464,8 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms,
 
 void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
 
+void build_cluster_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
+
 void build_processor_hierarchy(GArray *tbl, uint32_t flags,
                                uint32_t parent, uint32_t id);
 
-- 
2.19.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 6/6] hw/arm/virt: Add cluster level for ARM device tree
  2021-03-31  9:53 [RFC PATCH 0/6] Introduce cluster cpu topology support Yanan Wang
                   ` (4 preceding siblings ...)
  2021-03-31  9:53 ` [RFC PATCH 5/6] hw/arm/virt-acpi-build: Add cluster level for ARM PPTT table Yanan Wang
@ 2021-03-31  9:53 ` Yanan Wang
  2021-03-31 10:00 ` [RFC PATCH 0/6] Introduce cluster cpu topology support Paolo Bonzini
  2021-04-27  9:57 ` Philippe Mathieu-Daudé
  7 siblings, 0 replies; 13+ messages in thread
From: Yanan Wang @ 2021-03-31  9:53 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, wanghaibin.wang, Richard Henderson,
	Yanan Wang, Shannon Zhao, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li

Add a cluster level between the core level and the package level in
the ARM device tree cpu-map.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/arm/virt.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c9ad76ff64..d78db3387e 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -430,14 +430,20 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 
             if (ms->smp.threads > 1) {
                 map_path = g_strdup_printf(
-                            "/cpus/cpu-map/%s%d/%s%d/%s%d",
-                            "socket", cpu / (ms->smp.cores * ms->smp.threads),
+                            "/cpus/cpu-map/%s%d/%s%d/%s%d/%s%d",
+                            "socket",
+                            cpu / (ms->smp.clusters * ms->smp.cores *
+                            ms->smp.threads),
+                            "cluster",
+                            (cpu / (ms->smp.cores * ms->smp.threads)) %
+                            ms->smp.clusters,
                             "core", (cpu / ms->smp.threads) % ms->smp.cores,
                             "thread", cpu % ms->smp.threads);
             } else {
                 map_path = g_strdup_printf(
-                            "/cpus/cpu-map/%s%d/%s%d",
-                            "socket", cpu / ms->smp.cores,
+                            "/cpus/cpu-map/%s%d/%s%d/%s%d",
+                            "socket", cpu / (ms->smp.clusters * ms->smp.cores),
+                            "cluster", (cpu / ms->smp.cores) % ms->smp.clusters,
                             "core", cpu % ms->smp.cores);
             }
             qemu_fdt_add_path(vms->fdt, map_path);
-- 
2.19.1




* Re: [RFC PATCH 0/6] Introduce cluster cpu topology support
  2021-03-31  9:53 [RFC PATCH 0/6] Introduce cluster cpu topology support Yanan Wang
                   ` (5 preceding siblings ...)
  2021-03-31  9:53 ` [RFC PATCH 6/6] hw/arm/virt: Add cluster level for ARM device tree Yanan Wang
@ 2021-03-31 10:00 ` Paolo Bonzini
  2021-04-01  8:43   ` wangyanan (Y)
  2021-04-27  9:57 ` Philippe Mathieu-Daudé
  7 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2021-03-31 10:00 UTC (permalink / raw)
  To: Yanan Wang, qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, Richard Henderson, Shannon Zhao,
	wanghaibin.wang, yuzenghui, Igor Mammedov, zhukeqian1, Jiajie Li

On 31/03/21 11:53, Yanan Wang wrote:
> A cluster means a group of cores that share some resources (e.g. cache)
> among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
> 6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
> share L3 cache data while cores within each cluster share the L2 cache.

Is this different from what we already have with "-smp dies"?

Paolo




* Re: [RFC PATCH 0/6] Introduce cluster cpu topology support
  2021-03-31 10:00 ` [RFC PATCH 0/6] Introduce cluster cpu topology support Paolo Bonzini
@ 2021-04-01  8:43   ` wangyanan (Y)
  0 siblings, 0 replies; 13+ messages in thread
From: wangyanan (Y) @ 2021-04-01  8:43 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, Richard Henderson, Shannon Zhao,
	wanghaibin.wang, yuzenghui, Igor Mammedov, zhukeqian1, Jiajie Li

Hi Paolo,

On 2021/3/31 18:00, Paolo Bonzini wrote:
> On 31/03/21 11:53, Yanan Wang wrote:
>> A cluster means a group of cores that share some resources (e.g. cache)
>> among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
>> 6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
>> share L3 cache data while cores within each cluster share the L2 cache.
>
> Is this different from what we already have with "-smp dies"?
As far as I know, yes. I think they are two architectural concepts at
different levels. A cpu socket/package can consist of multiple dies,
and each die can consist of multiple clusters, which means dies are the
parents of clusters. This kind of cpu hierarchy is common on ARM64
platforms.

Taking the ARM64 server chip Kunpeng 920 above as an example again,
there are 2 sockets in total, 2 dies in each socket, 6 or 8 clusters in
each die, and 4 cores in each cluster. Since it also supports NUMA,
each NUMA node actually corresponds to one die.

Although there is already a "-smp dies=*" command line parameter for PC
machines, a cluster level could also be added for the x86 architecture
if it proves meaningful, though there will be more work to do in QEMU
to make use of it. And I am sure that ARM needs this level.

If there is something wrong above, please correct me, thanks!

Thanks,
Yanan
>
> Paolo
>
> .



* Re: [RFC PATCH 0/6] Introduce cluster cpu topology support
  2021-03-31  9:53 [RFC PATCH 0/6] Introduce cluster cpu topology support Yanan Wang
                   ` (6 preceding siblings ...)
  2021-03-31 10:00 ` [RFC PATCH 0/6] Introduce cluster cpu topology support Paolo Bonzini
@ 2021-04-27  9:57 ` Philippe Mathieu-Daudé
  2021-04-27 11:00   ` wangyanan (Y)
  7 siblings, 1 reply; 13+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-04-27  9:57 UTC (permalink / raw)
  To: Yanan Wang, qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, Richard Henderson, Shannon Zhao,
	Igor Mammedov, Paolo Bonzini, yuzenghui, wanghaibin.wang,
	zhukeqian1, Jiajie Li

Hi Yanan,

On 3/31/21 11:53 AM, Yanan Wang wrote:
> Hi,
> This series introduces the cluster cpu topology support, besides now
> existing sockets, cores, and threads.
> 
> A cluster means a group of cores that share some resources (e.g. cache)
> among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
> 6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
> share L3 cache data while cores within each cluster share the L2 cache.
> 
> Also, there are some x86 CPU implementations (e.g. Jacobsville) where L2
> cache is shared among a cluster of cores instead of being exclusive to
> one single core. For example, on Jacobsville there are 6 clusters of 4
> Atom cores, each cluster sharing a separate L2, and 24 cores sharing L3).

> About this series:
> Note that, this series was implemented based on [3] and [4]. Although they
> have not been merged into the qemu mainline yet, it's still meaningful to
> post this series to express the thoughts first. So an RFC is sent and any
> comments are welcome and appreciated.

At a glance: tests/unit/test-x86-cpuid.c should be adapted to be generic
(but still supporting target-specific sub-tests) and some aarch64 tests
added.

Similarly the ARM PPTT tables tested in tests/qtest/bios-tables-test.c.

Otherwise, the overall series looks good and coherent, but it isn't my
area :)

Regards,

Phil.




* Re: [RFC PATCH 0/6] Introduce cluster cpu topology support
  2021-04-27  9:57 ` Philippe Mathieu-Daudé
@ 2021-04-27 11:00   ` wangyanan (Y)
  2021-04-27 12:08     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 13+ messages in thread
From: wangyanan (Y) @ 2021-04-27 11:00 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, Richard Henderson, Shannon Zhao,
	Igor Mammedov, Paolo Bonzini, yuzenghui, wanghaibin.wang,
	zhukeqian1, Jiajie Li

Hi Philippe,

On 2021/4/27 17:57, Philippe Mathieu-Daudé wrote:
> Hi Yanan,
>
> On 3/31/21 11:53 AM, Yanan Wang wrote:
>> Hi,
>> This series introduces the cluster cpu topology support, besides now
>> existing sockets, cores, and threads.
>>
>> A cluster means a group of cores that share some resources (e.g. cache)
>> among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
>> 6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
>> share L3 cache data while cores within each cluster share the L2 cache.
>>
>> Also, there are some x86 CPU implementations (e.g. Jacobsville) where L2
>> cache is shared among a cluster of cores instead of being exclusive to
>> one single core. For example, on Jacobsville there are 6 clusters of 4
>> Atom cores, each cluster sharing a separate L2, and 24 cores sharing L3).
>> About this series:
>> Note that, this series was implemented based on [3] and [4]. Although they
>> have not been merged into the qemu mainline yet, it's still meaningful to
>> post this series to express the thoughts first. So an RFC is sent and any
>> comments are welcome and appreciated.
> At a glance: tests/unit/test-x86-cpuid.c should be adapted to be generic
> (but still supporting target-specific sub-tests) and some aarch64 tests
> added.
>
> Similarly the ARM PPTT tables tested in tests/qtest/bios-tables-test.c.
>
> Otherwise, the overall series looks good and coherent, but it isn't my
> area :)
Thank you for the reminder about the related tests. :)
I will do some work to make them cover the new features introduced
by this series.

Thanks,
Yanan
>
> Regards,
>
> Phil.
>
> .



* Re: [RFC PATCH 0/6] Introduce cluster cpu topology support
  2021-04-27 11:00   ` wangyanan (Y)
@ 2021-04-27 12:08     ` Philippe Mathieu-Daudé
  2021-04-27 12:34       ` wangyanan (Y)
  0 siblings, 1 reply; 13+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-04-27 12:08 UTC (permalink / raw)
  To: wangyanan (Y), qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, Richard Henderson, Shannon Zhao,
	Igor Mammedov, Paolo Bonzini, yuzenghui, wanghaibin.wang,
	zhukeqian1, Jiajie Li

On 4/27/21 1:00 PM, wangyanan (Y) wrote:
> On 2021/4/27 17:57, Philippe Mathieu-Daudé wrote:
>> On 3/31/21 11:53 AM, Yanan Wang wrote:
>>> Hi,
>>> This series introduces the cluster cpu topology support, besides now
>>> existing sockets, cores, and threads.
>>>
>>> A cluster means a group of cores that share some resources (e.g. cache)
>>> among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
>>> 6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
>>> share L3 cache data while cores within each cluster share the L2 cache.
>>>
>>> Also, there are some x86 CPU implementations (e.g. Jacobsville) where L2
>>> cache is shared among a cluster of cores instead of being exclusive to
>>> one single core. For example, on Jacobsville there are 6 clusters of 4
>>> Atom cores, each cluster sharing a separate L2, and 24 cores sharing
>>> L3).
>>> About this series:
>>> Note that, this series was implemented based on [3] and [4]. Although
>>> they have not been merged into the qemu mainline yet, it's still
>>> meaningful to post this series to express the thoughts first. So an
>>> RFC is sent and any comments are welcome and appreciated.
>> At a glance: tests/unit/test-x86-cpuid.c should be adapted to be generic
>> (but still supporting target-specific sub-tests) and some aarch64 tests
>> added.
>>
>> Similarly the ARM PPTT tables tested in tests/qtest/bios-tables-test.c.
>>
>> Otherwise, the overall series looks good and coherent, but it isn't my
>> area :)
> Thank you for the reminder about the related tests. :)
> I will do some work to make them cover the new features introduced
> by this series.

BTW if after 4 weeks and 2 pings nobody has sent negative feedback or
NAcked your series, it usually means the community is not against your
proposal, but has some doubts about whether this feature is necessary
or well designed. Tests help to show your work is safe, as it doesn't
break anything. You might need to better explain why this feature is
needed, and what the limitations of what is currently possible are.

OTOH QEMU has been in "feature freeze" for the next v6.0 release
for the same amount of time, so maybe the maintainers/reviewers
were busy with bugs and still have your series in their TODO list.

Regards,

Phil.




* Re: [RFC PATCH 0/6] Introduce cluster cpu topology support
  2021-04-27 12:08     ` Philippe Mathieu-Daudé
@ 2021-04-27 12:34       ` wangyanan (Y)
  0 siblings, 0 replies; 13+ messages in thread
From: wangyanan (Y) @ 2021-04-27 12:34 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Eduardo Habkost,
	Michael S . Tsirkin, Richard Henderson, Shannon Zhao,
	Igor Mammedov, Paolo Bonzini, yuzenghui, wanghaibin.wang,
	zhukeqian1, Jiajie Li


On 2021/4/27 20:08, Philippe Mathieu-Daudé wrote:
> On 4/27/21 1:00 PM, wangyanan (Y) wrote:
>> On 2021/4/27 17:57, Philippe Mathieu-Daudé wrote:
>>> On 3/31/21 11:53 AM, Yanan Wang wrote:
>>>> Hi,
>>>> This series introduces the cluster cpu topology support, besides now
>>>> existing sockets, cores, and threads.
>>>>
>>>> A cluster means a group of cores that share some resources (e.g. cache)
>>>> among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
>>>> 6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
>>>> share L3 cache data while cores within each cluster share the L2 cache.
>>>>
>>>> Also, there are some x86 CPU implementations (e.g. Jacobsville) where L2
>>>> cache is shared among a cluster of cores instead of being exclusive to
>>>> one single core. For example, on Jacobsville there are 6 clusters of 4
>>>> Atom cores, each cluster sharing a separate L2, and 24 cores sharing
>>>> L3).
>>>> About this series:
>>>> Note that, this series was implemented based on [3] and [4]. Although
>>>> they have not been merged into the qemu mainline yet, it's still
>>>> meaningful to post this series to express the thoughts first. So an
>>>> RFC is sent and any comments are welcome and appreciated.
>>> At a glance: tests/unit/test-x86-cpuid.c should be adapted to be generic
>>> (but still supporting target-specific sub-tests) and some aarch64 tests
>>> added.
>>>
>>> Similarly the ARM PPTT tables tested in tests/qtest/bios-tables-test.c.
>>>
>>> Otherwise, the overall series looks good and coherent, but it isn't my
>>> area :)
>> Thank you for the reminder of the related tests. :)
>> I will have some work to make them cover the new features introduced
>> by this series.
> BTW if after 4 weeks and 2 pings nobody has sent negative feedback or
> NAcked your series, it usually means the community is not against your
> proposal, but has some doubts about whether this feature is necessary
> or well designed. Tests help to show your work is safe, as it doesn't
> break anything. You might need to better explain why this feature is
> needed, and what the limitations of what is currently possible are.
>
> OTOH QEMU has been in "feature freeze" for the next v6.0 release
> for the same amount of time, so maybe the maintainers/reviewers
> were busy with bugs and still have your series in their TODO list.
I fully understand your point, and thanks for the explanations.

I will just patiently wait for some feedback, and of course at the same
time will also refine the solution with some convincing tests for the
later new version.

Thanks,
Yanan
> Regards,
>
> Phil.
>
> .




Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-31  9:53 [RFC PATCH 0/6] Introduce cluster cpu topology support Yanan Wang
2021-03-31  9:53 ` [RFC PATCH 1/6] vl.c: Add arch-neutral -smp, clusters=* command line support Yanan Wang
2021-03-31  9:53 ` [RFC PATCH 2/6] hw/core/machine: Parse cluster cpu topology in smp_parse() Yanan Wang
2021-03-31  9:53 ` [RFC PATCH 3/6] hw/arm/virt: Parse cluster cpu topology for ARM machines Yanan Wang
2021-03-31  9:53 ` [RFC PATCH 4/6] hw/i386/pc: Parse cluster cpu topology for PC machines Yanan Wang
2021-03-31  9:53 ` [RFC PATCH 5/6] hw/arm/virt-acpi-build: Add cluster level for ARM PPTT table Yanan Wang
2021-03-31  9:53 ` [RFC PATCH 6/6] hw/arm/virt: Add cluster level for ARM device tree Yanan Wang
2021-03-31 10:00 ` [RFC PATCH 0/6] Introduce cluster cpu topology support Paolo Bonzini
2021-04-01  8:43   ` wangyanan (Y)
2021-04-27  9:57 ` Philippe Mathieu-Daudé
2021-04-27 11:00   ` wangyanan (Y)
2021-04-27 12:08     ` Philippe Mathieu-Daudé
2021-04-27 12:34       ` wangyanan (Y)
