All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v3 0/4] hw/arm/virt: Introduce cluster cpu topology support
@ 2021-05-16 10:32 Yanan Wang
  2021-05-16 10:32 ` [RFC PATCH v3 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu Yanan Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Yanan Wang @ 2021-05-16 10:32 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini, Andrew Jones, Michael S . Tsirkin,
	Igor Mammedov, Shannon Zhao, qemu-devel, qemu-arm
  Cc: Barry Song, Philippe Mathieu-Daudé,
	Yanan Wang, prime.zeng, wanghaibin.wang, yuzenghui, yangyicong,
	zhukeqian1

Hi,

This is v3 of the series [1] that I posted to introduce cluster cpu topology
besides now existing sockets, cores, and threads for ARM platform.

Description:
In implementations of ARM architecture, at most there could be a
cpu hierarchy like "sockets/dies/clusters/cores/threads" defined.
For example, ARM64 server chip Kunpeng 920 totally has 2 sockets,
2 NUMA nodes (also means cpu dies) in each socket, 6 clusters in
each NUMA node, 4 cores in each cluster, and doesn't support SMT.
Clusters within the same NUMA share a L3 cache and cores within
the same cluster share a L2 cache.

The cache affinity of ARM cluster has been proved to improve the
kernel scheduling performance and a patchset [2] has been posted,
where a general sched_domain for clusters was added and a cluster
level was added in the arch-neutral cpu topology struct like below.

struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
}

In virtuallization, exposing the cluster level topology to guest
kernel may also improve the scheduling performance. So let's add
the -smp, clusters=* command line support for ARM cpu, then users
will be able to define a four-level cpu hierarchy for machines
and it will be sockets/clusters/cores/threads.

In this series, we only add the cluster concept of cpu topology
for ARM platform currently, and only focus on exposure of the
topology to guest through ACPI and DT.

[1] https://patchwork.kernel.org/project/qemu-devel/cover/20210413083147.34236-1-wangyanan55@huawei.com/
[2] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210420001844.9116-1-song.bao.hua@hisilicon.com/

Test results about exposure of topology:
After applying this patch series, launch a guest with virt-6.1.

Cmdline: -smp 96, sockets=2, clusters=12, cores=4, threads=1
Output:
linux-atxcNc:~ # lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        1
Vendor ID:           0x48

Topology information of clusters can also be got:
cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list: 0-3
cat /sys/devices/system/cpu/cpu4/topology/cluster_cpus_list: 4-7
cat /sys/devices/system/cpu/cpu8/topology/cluster_cpus_list: 8-11
...

cat /sys/devices/system/cpu/cpu95/topology/cluster_cpus_list: 92-95

THINGS TO DO SOON:
1) Run some benchmark to test the scheduling improvement of guest kernel
   introduced by cluster level virtual topology
2) Add some QEMU tests about ARM vcpu topology, ACPI PPTT table, and DT
   cpu nodes. Will post in a separate patchset later.

---

Changelogs:
v2->v3:
- Address comments from Philippe, and Andrew. Thanks!
- Rebased the code on v3 of series " hw/arm/virt: Introduce cpu topology support"
- v2: https://patchwork.kernel.org/project/qemu-devel/cover/20210413083147.34236-1-wangyanan55@huawei.com/

v1->v2:
- Only focus on cluster support for ARM platform
- v1: https://patchwork.kernel.org/project/qemu-devel/cover/20210331095343.12172-1-wangyanan55@huawei.com/

---

Yanan Wang (4):
  vl.c: Add -smp, clusters=* command line support for ARM cpu
  hw/arm/virt: Add cluster level to device tree
  hw/arm/virt-acpi-build: Add cluster level to PPTT table
  hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse

 hw/arm/virt-acpi-build.c | 45 ++++++++++++++++++++++++----------------
 hw/arm/virt.c            | 44 +++++++++++++++++++++++----------------
 include/hw/arm/virt.h    |  1 +
 qemu-options.hx          | 26 +++++++++++++----------
 softmmu/vl.c             |  3 +++
 5 files changed, 72 insertions(+), 47 deletions(-)

-- 
2.19.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu
  2021-05-16 10:32 [RFC PATCH v3 0/4] hw/arm/virt: Introduce cluster cpu topology support Yanan Wang
@ 2021-05-16 10:32 ` Yanan Wang
  2021-05-17  9:07   ` Andrew Jones
  2021-05-16 10:32 ` [RFC PATCH v3 2/4] hw/arm/virt: Add cluster level to device tree Yanan Wang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Yanan Wang @ 2021-05-16 10:32 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini, Andrew Jones, Michael S . Tsirkin,
	Igor Mammedov, Shannon Zhao, qemu-devel, qemu-arm
  Cc: Barry Song, Philippe Mathieu-Daudé,
	Yanan Wang, prime.zeng, wanghaibin.wang, yuzenghui, yangyicong,
	zhukeqian1

In implementations of ARM architecture, at most there could be a
cpu hierarchy like "sockets/dies/clusters/cores/threads" defined.
For example, ARM64 server chip Kunpeng 920 totally has 2 sockets,
2 NUMA nodes (also means cpu dies) in each socket, 6 clusters in
each NUMA node, 4 cores in each cluster, and doesn't support SMT.
Clusters within the same NUMA share a L3 cache and cores within
the same cluster share a L2 cache.

The cache affinity of ARM cluster has been proved to improve the
kernel scheduling performance and a patchset has been posted, in
which a general sched_domain for clusters was added and a cluster
level was added in the arch-neutral cpu topology struct like below.

struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
}

In virtuallization, exposing the cluster level topology to guest
kernel may also improve the scheduling performance. So let's add
the -smp, clusters=* command line support for ARM cpu, then users
will be able to define a four-level cpu hierarchy for machines
and it will be sockets/clusters/cores/threads.

Because we only support clusters for ARM cpu currently, a new member
"smp_clusters" is only added to the VirtMachineState structure.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 include/hw/arm/virt.h |  1 +
 qemu-options.hx       | 26 +++++++++++++++-----------
 softmmu/vl.c          |  3 +++
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index f546dd2023..74fff9667b 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -156,6 +156,7 @@ struct VirtMachineState {
     char *pciehb_nodename;
     const int *irqmap;
     int fdt_size;
+    unsigned smp_clusters;
     uint32_t clock_phandle;
     uint32_t gic_phandle;
     uint32_t msi_phandle;
diff --git a/qemu-options.hx b/qemu-options.hx
index bd97086c21..245eb415a6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -184,25 +184,29 @@ SRST
 ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
-    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
+    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets]\n"
     "                set the number of CPUs to 'n' [default=1]\n"
     "                maxcpus= maximum number of total cpus, including\n"
     "                offline CPUs for hotplug, etc\n"
-    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
+    "                cores= number of CPU cores on one socket\n"
+    "                (it's on one die for PC, and on one cluster for ARM)\n"
     "                threads= number of threads on one CPU core\n"
+    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
     "                dies= number of CPU dies on one socket (for PC only)\n"
     "                sockets= number of discrete sockets in the system\n",
         QEMU_ARCH_ALL)
 SRST
-``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
-    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
-    are supported. On Sparc32 target, Linux limits the number of usable
-    CPUs to 4. For the PC target, the number of cores per die, the
-    number of threads per cores, the number of dies per packages and the
-    total number of sockets can be specified. Missing values will be
-    computed. If any on the three values is given, the total number of
-    CPUs n can be omitted. maxcpus specifies the maximum number of
-    hotpluggable CPUs.
+``-smp [cpus=]n[,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
+    Simulate an SMP system with n CPUs. On the PC target, up to 255
+    CPUs are supported. On the Sparc32 target, Linux limits the number
+    of usable CPUs to 4. For the PC target, the number of threads per
+    core, the number of cores per die, the number of dies per package
+    and the total number of sockets can be specified. For the ARM target,
+    the number of threads per core, the number of cores per cluster, the
+    number of clusters per socket and the total number of sockets can be
+    specified. And missing values will be computed. If any of the five
+    values is given, the total number of CPUs n can be omitted. Maxcpus
+    specifies the maximum number of hotpluggable CPUs.
 
     For the ARM target, at least one of cpus or maxcpus must be provided.
     Threads will default to 1 if not provided. Sockets and cores must be
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 307944aef3..69a5c73ef7 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -719,6 +719,9 @@ static QemuOptsList qemu_smp_opts = {
         }, {
             .name = "dies",
             .type = QEMU_OPT_NUMBER,
+        }, {
+            .name = "clusters",
+            .type = QEMU_OPT_NUMBER,
         }, {
             .name = "cores",
             .type = QEMU_OPT_NUMBER,
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 2/4] hw/arm/virt: Add cluster level to device tree
  2021-05-16 10:32 [RFC PATCH v3 0/4] hw/arm/virt: Introduce cluster cpu topology support Yanan Wang
  2021-05-16 10:32 ` [RFC PATCH v3 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu Yanan Wang
@ 2021-05-16 10:32 ` Yanan Wang
  2021-05-16 10:32 ` [RFC PATCH v3 3/4] hw/arm/virt-acpi-build: Add cluster level to PPTT table Yanan Wang
  2021-05-16 10:32 ` [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse Yanan Wang
  3 siblings, 0 replies; 14+ messages in thread
From: Yanan Wang @ 2021-05-16 10:32 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini, Andrew Jones, Michael S . Tsirkin,
	Igor Mammedov, Shannon Zhao, qemu-devel, qemu-arm
  Cc: Barry Song, Philippe Mathieu-Daudé,
	Yanan Wang, prime.zeng, wanghaibin.wang, yuzenghui, yangyicong,
	zhukeqian1

Add a cluster level between core level and socket level to ARM
device tree. This is also consistent with content in Linux Doc
"Documentation/devicetree/bindings/cpu/cpu-topology.txt".

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/arm/virt.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 44e990e3be..7de822e491 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -431,14 +431,18 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 
             if (ms->smp.threads > 1) {
                 map_path = g_strdup_printf(
-                    "/cpus/cpu-map/%s%d/%s%d/%s%d",
-                    "socket", cpu / (ms->smp.cores * ms->smp.threads),
+                    "/cpus/cpu-map/%s%d/%s%d/%s%d/%s%d",
+                    "socket", cpu / (vms->smp_clusters * ms->smp.cores *
+                    ms->smp.threads),
+                    "cluster", (cpu / (ms->smp.cores * ms->smp.threads)) %
+                    vms->smp_clusters,
                     "core", (cpu / ms->smp.threads) % ms->smp.cores,
                     "thread", cpu % ms->smp.threads);
             } else {
                 map_path = g_strdup_printf(
-                    "/cpus/cpu-map/%s%d/%s%d",
-                    "socket", cpu / ms->smp.cores,
+                    "/cpus/cpu-map/%s%d/%s%d/%s%d",
+                    "socket", cpu / (vms->smp_clusters * ms->smp.cores),
+                    "cluster", (cpu / ms->smp.cores) % vms->smp_clusters,
                     "core", cpu % ms->smp.cores);
             }
             qemu_fdt_add_path(ms->fdt, map_path);
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 3/4] hw/arm/virt-acpi-build: Add cluster level to PPTT table
  2021-05-16 10:32 [RFC PATCH v3 0/4] hw/arm/virt: Introduce cluster cpu topology support Yanan Wang
  2021-05-16 10:32 ` [RFC PATCH v3 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu Yanan Wang
  2021-05-16 10:32 ` [RFC PATCH v3 2/4] hw/arm/virt: Add cluster level to device tree Yanan Wang
@ 2021-05-16 10:32 ` Yanan Wang
  2021-05-16 10:32 ` [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse Yanan Wang
  3 siblings, 0 replies; 14+ messages in thread
From: Yanan Wang @ 2021-05-16 10:32 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini, Andrew Jones, Michael S . Tsirkin,
	Igor Mammedov, Shannon Zhao, qemu-devel, qemu-arm
  Cc: Barry Song, Philippe Mathieu-Daudé,
	Yanan Wang, prime.zeng, wanghaibin.wang, yuzenghui, yangyicong,
	zhukeqian1

Add a Processor Hierarchy Node of cluster level between core level
and socket level to ARM PPTT table.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/arm/virt-acpi-build.c | 45 ++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index b03d57745a..4d09b51bb0 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -447,34 +447,43 @@ static void build_pptt(GArray *table_data, BIOSLinker *linker,
 
     for (socket = 0; socket < ms->smp.sockets; socket++) {
         uint32_t socket_offset = table_data->len - pptt_start;
-        int core;
+        int cluster;
 
         build_processor_hierarchy_node(
             table_data,
             (1 << 0), /* ACPI 6.2 - Physical package */
             0, socket, NULL, 0);
 
-        for (core = 0; core < ms->smp.cores; core++) {
-            uint32_t core_offset = table_data->len - pptt_start;
-            int thread;
-
-            if (ms->smp.threads <= 1) {
-                build_processor_hierarchy_node(
-                    table_data,
-                    (1 << 1) | /* ACPI 6.2 - ACPI Processor ID valid */
-                    (1 << 3),  /* ACPI 6.3 - Node is a Leaf */
-                    socket_offset, uid++, NULL, 0);
-            } else {
-                build_processor_hierarchy_node(table_data, 0, socket_offset,
-                                               core, NULL, 0);
-
-                for (thread = 0; thread < ms->smp.threads; thread++) {
+        for (cluster = 0; cluster < vms->smp_clusters; cluster++) {
+            uint32_t cluster_offset = table_data->len - pptt_start;
+            int core;
+
+            build_processor_hierarchy_node(table_data, 0, socket_offset,
+                                           cluster, NULL, 0);
+
+            for (core = 0; core < ms->smp.cores; core++) {
+                uint32_t core_offset = table_data->len - pptt_start;
+                int thread;
+
+                if (ms->smp.threads <= 1) {
                     build_processor_hierarchy_node(
                         table_data,
                         (1 << 1) | /* ACPI 6.2 - ACPI Processor ID valid */
-                        (1 << 2) | /* ACPI 6.3 - Processor is a Thread */
                         (1 << 3),  /* ACPI 6.3 - Node is a Leaf */
-                        core_offset, uid++, NULL, 0);
+                        cluster_offset, uid++, NULL, 0);
+                } else {
+                    build_processor_hierarchy_node(table_data, 0,
+                                                   cluster_offset,
+                                                   core, NULL, 0);
+
+                    for (thread = 0; thread < ms->smp.threads; thread++) {
+                        build_processor_hierarchy_node(
+                            table_data,
+                            (1 << 1) | /* ACPI 6.2 - ACPI Processor ID valid */
+                            (1 << 2) | /* ACPI 6.3 - Processor is a Thread */
+                            (1 << 3),  /* ACPI 6.3 - Node is a Leaf */
+                            core_offset, uid++, NULL, 0);
+                    }
                 }
             }
         }
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-05-16 10:32 [RFC PATCH v3 0/4] hw/arm/virt: Introduce cluster cpu topology support Yanan Wang
                   ` (2 preceding siblings ...)
  2021-05-16 10:32 ` [RFC PATCH v3 3/4] hw/arm/virt-acpi-build: Add cluster level to PPTT table Yanan Wang
@ 2021-05-16 10:32 ` Yanan Wang
  2021-05-17  9:12   ` Andrew Jones
  2021-05-17 15:17   ` Salil Mehta
  3 siblings, 2 replies; 14+ messages in thread
From: Yanan Wang @ 2021-05-16 10:32 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini, Andrew Jones, Michael S . Tsirkin,
	Igor Mammedov, Shannon Zhao, qemu-devel, qemu-arm
  Cc: Barry Song, Philippe Mathieu-Daudé,
	Yanan Wang, prime.zeng, wanghaibin.wang, yuzenghui, yangyicong,
	zhukeqian1

There is a separate function virt_smp_parse() in hw/virt/arm.c used
to parse cpu topology for the ARM machines. So add parsing of -smp
cluster parameter in it, then total number of logical cpus will be
calculated like: max_cpus = sockets * clusters * cores * threads.

Note, we will assume multi-cluster in one socket is not supported
and default the value of clusters to 1, if it's not explicitly
specified in -smp cmdline.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/arm/virt.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7de822e491..678d5ef36c 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2642,8 +2642,8 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
  * with the -smp cmdlines when parsing them.
  *
  * We require that at least one of cpus or maxcpus must be provided.
- * Threads will default to 1 if not provided. Sockets and cores must
- * be either both provided or both not.
+ * Clusters and threads will default to 1 if they are not provided.
+ * Sockets and cores must be either both provided or both not.
  *
  * Note, if neither sockets nor cores are specified, we will calculate
  * all the missing values just like smp_parse() does, but will disable
@@ -2652,15 +2652,18 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
 static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
 {
     VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(ms);
+    VirtMachineState *vms = VIRT_MACHINE(ms);
 
     if (opts) {
         unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
         unsigned maxcpus = qemu_opt_get_number(opts, "maxcpus", 0);
         unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
+        unsigned clusters = qemu_opt_get_number(opts, "clusters", 0);
         unsigned cores = qemu_opt_get_number(opts, "cores", 0);
         unsigned threads = qemu_opt_get_number(opts, "threads", 0);
 
-        /* Default threads to 1 if not provided */
+        /* Default clusters and threads to 1 if not provided */
+        clusters = clusters > 0 ? clusters : 1;
         threads = threads > 0 ? threads : 1;
 
         if (cpus == 0 && maxcpus == 0) {
@@ -2676,13 +2679,13 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
             cores = 1;
             if (cpus == 0) {
                 sockets = 1;
-                cpus = sockets * cores * threads;
+                cpus = sockets * clusters * cores * threads;
             } else {
                 maxcpus = maxcpus > 0 ? maxcpus : cpus;
-                sockets = maxcpus / (cores * threads);
+                sockets = maxcpus / (clusters * cores * threads);
             }
         } else if (sockets > 0 && cores > 0) {
-            cpus = cpus > 0 ? cpus : sockets * cores * threads;
+            cpus = cpus > 0 ? cpus : sockets * clusters * cores * threads;
             maxcpus = maxcpus > 0 ? maxcpus : cpus;
         } else {
             error_report("sockets and cores must be both provided "
@@ -2695,25 +2698,26 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
             exit(1);
         }
 
-        if (sockets * cores * threads < cpus) {
+        if (sockets * clusters * cores * threads < cpus) {
             error_report("cpu topology: "
-                         "sockets (%u) * cores (%u) * threads (%u) < "
-                         "smp_cpus (%u)",
-                         sockets, cores, threads, cpus);
+                         "sockets (%u) * clusters (%u) * cores (%u) * "
+                         "threads (%u) < smp_cpus (%u)",
+                         sockets, clusters, cores, threads, cpus);
             exit(1);
         }
 
-        if (sockets * cores * threads != maxcpus) {
+        if (sockets * clusters * cores * threads != maxcpus) {
             error_report("cpu topology: "
-                         "sockets (%u) * cores (%u) * threads (%u) "
-                         "!= maxcpus (%u)",
-                         sockets, cores, threads, maxcpus);
+                         "sockets (%u) * clusters (%u) * cores (%u) * "
+                         "threads (%u) != maxcpus (%u)",
+                         sockets, clusters, cores, threads, maxcpus);
             exit(1);
         }
 
         ms->smp.cpus = cpus;
         ms->smp.max_cpus = maxcpus;
         ms->smp.sockets = sockets;
+        vms->smp_clusters = clusters;
         ms->smp.cores = cores;
         ms->smp.threads = threads;
     }
-- 
2.19.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu
  2021-05-16 10:32 ` [RFC PATCH v3 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu Yanan Wang
@ 2021-05-17  9:07   ` Andrew Jones
  2021-05-17 15:07     ` wangyanan (Y)
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Jones @ 2021-05-17  9:07 UTC (permalink / raw)
  To: Yanan Wang
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	zhukeqian1, qemu-devel, yangyicong, Shannon Zhao, qemu-arm,
	prime.zeng, Paolo Bonzini, yuzenghui, Igor Mammedov,
	Philippe Mathieu-Daudé

On Sun, May 16, 2021 at 06:32:25PM +0800, Yanan Wang wrote:
> In implementations of ARM architecture, at most there could be a
> cpu hierarchy like "sockets/dies/clusters/cores/threads" defined.
> For example, ARM64 server chip Kunpeng 920 totally has 2 sockets,
> 2 NUMA nodes (also means cpu dies) in each socket, 6 clusters in
> each NUMA node, 4 cores in each cluster, and doesn't support SMT.
> Clusters within the same NUMA share a L3 cache and cores within
> the same cluster share a L2 cache.
> 
> The cache affinity of ARM cluster has been proved to improve the
> kernel scheduling performance and a patchset has been posted, in
> which a general sched_domain for clusters was added and a cluster
> level was added in the arch-neutral cpu topology struct like below.
> 
> struct cpu_topology {
>     int thread_id;
>     int core_id;
>     int cluster_id;
>     int package_id;
>     int llc_id;
>     cpumask_t thread_sibling;
>     cpumask_t core_sibling;
>     cpumask_t cluster_sibling;
>     cpumask_t llc_sibling;
> }
> 
> In virtuallization, exposing the cluster level topology to guest
> kernel may also improve the scheduling performance. So let's add
> the -smp, clusters=* command line support for ARM cpu, then users
> will be able to define a four-level cpu hierarchy for machines
> and it will be sockets/clusters/cores/threads.
> 
> Because we only support clusters for ARM cpu currently, a new member
> "smp_clusters" is only added to the VirtMachineState structure.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  include/hw/arm/virt.h |  1 +
>  qemu-options.hx       | 26 +++++++++++++++-----------
>  softmmu/vl.c          |  3 +++
>  3 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index f546dd2023..74fff9667b 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -156,6 +156,7 @@ struct VirtMachineState {
>      char *pciehb_nodename;
>      const int *irqmap;
>      int fdt_size;
> +    unsigned smp_clusters;
>      uint32_t clock_phandle;
>      uint32_t gic_phandle;
>      uint32_t msi_phandle;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index bd97086c21..245eb415a6 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -184,25 +184,29 @@ SRST
>  ERST
>  
>  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
> -    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
> +    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets]\n"
>      "                set the number of CPUs to 'n' [default=1]\n"
>      "                maxcpus= maximum number of total cpus, including\n"
>      "                offline CPUs for hotplug, etc\n"
> -    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
> +    "                cores= number of CPU cores on one socket\n"
> +    "                (it's on one die for PC, and on one cluster for ARM)\n"
>      "                threads= number of threads on one CPU core\n"
> +    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
>      "                dies= number of CPU dies on one socket (for PC only)\n"
>      "                sockets= number of discrete sockets in the system\n",
>          QEMU_ARCH_ALL)
>  SRST
> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
> -    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
> -    are supported. On Sparc32 target, Linux limits the number of usable
> -    CPUs to 4. For the PC target, the number of cores per die, the
> -    number of threads per cores, the number of dies per packages and the
> -    total number of sockets can be specified. Missing values will be
> -    computed. If any on the three values is given, the total number of
> -    CPUs n can be omitted. maxcpus specifies the maximum number of
> -    hotpluggable CPUs.
> +``-smp [cpus=]n[,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
> +    Simulate an SMP system with n CPUs. On the PC target, up to 255
> +    CPUs are supported. On the Sparc32 target, Linux limits the number
> +    of usable CPUs to 4. For the PC target, the number of threads per
> +    core, the number of cores per die, the number of dies per package
> +    and the total number of sockets can be specified. For the ARM target,
> +    the number of threads per core, the number of cores per cluster, the
> +    number of clusters per socket and the total number of sockets can be
> +    specified. And missing values will be computed. If any of the five
                  ^ Why did you add this 'And'?
> +    values is given, the total number of CPUs n can be omitted.

The last two sentences are not valid for Arm, which requires most of its
parameters to be given.

> Maxcpus
> +    specifies the maximum number of hotpluggable CPUs.
>  
>      For the ARM target, at least one of cpus or maxcpus must be provided.
>      Threads will default to 1 if not provided. Sockets and cores must be
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 307944aef3..69a5c73ef7 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -719,6 +719,9 @@ static QemuOptsList qemu_smp_opts = {
>          }, {
>              .name = "dies",
>              .type = QEMU_OPT_NUMBER,
> +        }, {
> +            .name = "clusters",
> +            .type = QEMU_OPT_NUMBER,
>          }, {
>              .name = "cores",
>              .type = QEMU_OPT_NUMBER,
> -- 
> 2.19.1
>

Thanks,
drew



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-05-16 10:32 ` [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse Yanan Wang
@ 2021-05-17  9:12   ` Andrew Jones
  2021-05-17 15:10     ` wangyanan (Y)
  2021-05-17 15:17   ` Salil Mehta
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Jones @ 2021-05-17  9:12 UTC (permalink / raw)
  To: Yanan Wang
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	zhukeqian1, qemu-devel, yangyicong, Shannon Zhao, qemu-arm,
	prime.zeng, Paolo Bonzini, yuzenghui, Igor Mammedov,
	Philippe Mathieu-Daudé

On Sun, May 16, 2021 at 06:32:28PM +0800, Yanan Wang wrote:
> There is a separate function virt_smp_parse() in hw/virt/arm.c used
> to parse cpu topology for the ARM machines. So add parsing of -smp
> cluster parameter in it, then total number of logical cpus will be
> calculated like: max_cpus = sockets * clusters * cores * threads.
> 
> Note, we will assume multi-cluster in one socket is not supported
> and default the value of clusters to 1, if it's not explicitly
> specified in -smp cmdline.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  hw/arm/virt.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 7de822e491..678d5ef36c 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2642,8 +2642,8 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
>   * with the -smp cmdlines when parsing them.
>   *
>   * We require that at least one of cpus or maxcpus must be provided.
> - * Threads will default to 1 if not provided. Sockets and cores must
> - * be either both provided or both not.
> + * Clusters and threads will default to 1 if they are not provided.
> + * Sockets and cores must be either both provided or both not.
>   *
>   * Note, if neither sockets nor cores are specified, we will calculate
>   * all the missing values just like smp_parse() does, but will disable
> @@ -2652,15 +2652,18 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
>  static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>  {
>      VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(ms);
> +    VirtMachineState *vms = VIRT_MACHINE(ms);
>  
>      if (opts) {
>          unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
>          unsigned maxcpus = qemu_opt_get_number(opts, "maxcpus", 0);
>          unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
> +        unsigned clusters = qemu_opt_get_number(opts, "clusters", 0);
>          unsigned cores = qemu_opt_get_number(opts, "cores", 0);
>          unsigned threads = qemu_opt_get_number(opts, "threads", 0);
>  
> -        /* Default threads to 1 if not provided */
> +        /* Default clusters and threads to 1 if not provided */
> +        clusters = clusters > 0 ? clusters : 1;
>          threads = threads > 0 ? threads : 1;
>  
>          if (cpus == 0 && maxcpus == 0) {
> @@ -2676,13 +2679,13 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>              cores = 1;
>              if (cpus == 0) {
>                  sockets = 1;
> -                cpus = sockets * cores * threads;
> +                cpus = sockets * clusters * cores * threads;
>              } else {
>                  maxcpus = maxcpus > 0 ? maxcpus : cpus;
> -                sockets = maxcpus / (cores * threads);
> +                sockets = maxcpus / (clusters * cores * threads);
>              }
>          } else if (sockets > 0 && cores > 0) {
> -            cpus = cpus > 0 ? cpus : sockets * cores * threads;
> +            cpus = cpus > 0 ? cpus : sockets * clusters * cores * threads;
>              maxcpus = maxcpus > 0 ? maxcpus : cpus;
>          } else {
>              error_report("sockets and cores must be both provided "
> @@ -2695,25 +2698,26 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>              exit(1);
>          }
>  
> -        if (sockets * cores * threads < cpus) {
> +        if (sockets * clusters * cores * threads < cpus) {
>              error_report("cpu topology: "
> -                         "sockets (%u) * cores (%u) * threads (%u) < "
> -                         "smp_cpus (%u)",
> -                         sockets, cores, threads, cpus);
> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> +                         "threads (%u) < smp_cpus (%u)",
> +                         sockets, clusters, cores, threads, cpus);
>              exit(1);
>          }
>  
> -        if (sockets * cores * threads != maxcpus) {
> +        if (sockets * clusters * cores * threads != maxcpus) {
>              error_report("cpu topology: "
> -                         "sockets (%u) * cores (%u) * threads (%u) "
> -                         "!= maxcpus (%u)",
> -                         sockets, cores, threads, maxcpus);
> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> +                         "threads (%u) != maxcpus (%u)",
> +                         sockets, clusters, cores, threads, maxcpus);
>              exit(1);
>          }
>  
>          ms->smp.cpus = cpus;
>          ms->smp.max_cpus = maxcpus;
>          ms->smp.sockets = sockets;
> +        vms->smp_clusters = clusters;
>          ms->smp.cores = cores;
>          ms->smp.threads = threads;
>      }
> -- 
> 2.19.1
>

After reworking "[RFC PATCH v3 9/9] hw/arm/virt: Add separate -smp parsing
function for ARM machines", this should also be reworked and fully tested,
possibly using a standalone test, as as I suggested in the other review.

Thanks,
drew



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu
  2021-05-17  9:07   ` Andrew Jones
@ 2021-05-17 15:07     ` wangyanan (Y)
  0 siblings, 0 replies; 14+ messages in thread
From: wangyanan (Y) @ 2021-05-17 15:07 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	zhukeqian1, qemu-devel, yangyicong, Shannon Zhao, qemu-arm,
	prime.zeng, Paolo Bonzini, yuzenghui, Igor Mammedov,
	Philippe Mathieu-Daudé


On 2021/5/17 17:07, Andrew Jones wrote:
> On Sun, May 16, 2021 at 06:32:25PM +0800, Yanan Wang wrote:
>> In implementations of ARM architecture, at most there could be a
>> cpu hierarchy like "sockets/dies/clusters/cores/threads" defined.
>> For example, ARM64 server chip Kunpeng 920 totally has 2 sockets,
>> 2 NUMA nodes (also means cpu dies) in each socket, 6 clusters in
>> each NUMA node, 4 cores in each cluster, and doesn't support SMT.
>> Clusters within the same NUMA share a L3 cache and cores within
>> the same cluster share a L2 cache.
>>
>> The cache affinity of ARM cluster has been proved to improve the
>> kernel scheduling performance and a patchset has been posted, in
>> which a general sched_domain for clusters was added and a cluster
>> level was added in the arch-neutral cpu topology struct like below.
>>
>> struct cpu_topology {
>>      int thread_id;
>>      int core_id;
>>      int cluster_id;
>>      int package_id;
>>      int llc_id;
>>      cpumask_t thread_sibling;
>>      cpumask_t core_sibling;
>>      cpumask_t cluster_sibling;
>>      cpumask_t llc_sibling;
>> }
>>
>> In virtuallization, exposing the cluster level topology to guest
>> kernel may also improve the scheduling performance. So let's add
>> the -smp, clusters=* command line support for ARM cpu, then users
>> will be able to define a four-level cpu hierarchy for machines
>> and it will be sockets/clusters/cores/threads.
>>
>> Because we only support clusters for ARM cpu currently, a new member
>> "smp_clusters" is only added to the VirtMachineState structure.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   include/hw/arm/virt.h |  1 +
>>   qemu-options.hx       | 26 +++++++++++++++-----------
>>   softmmu/vl.c          |  3 +++
>>   3 files changed, 19 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index f546dd2023..74fff9667b 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -156,6 +156,7 @@ struct VirtMachineState {
>>       char *pciehb_nodename;
>>       const int *irqmap;
>>       int fdt_size;
>> +    unsigned smp_clusters;
>>       uint32_t clock_phandle;
>>       uint32_t gic_phandle;
>>       uint32_t msi_phandle;
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index bd97086c21..245eb415a6 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -184,25 +184,29 @@ SRST
>>   ERST
>>   
>>   DEF("smp", HAS_ARG, QEMU_OPTION_smp,
>> -    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
>> +    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets]\n"
>>       "                set the number of CPUs to 'n' [default=1]\n"
>>       "                maxcpus= maximum number of total cpus, including\n"
>>       "                offline CPUs for hotplug, etc\n"
>> -    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
>> +    "                cores= number of CPU cores on one socket\n"
>> +    "                (it's on one die for PC, and on one cluster for ARM)\n"
>>       "                threads= number of threads on one CPU core\n"
>> +    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
>>       "                dies= number of CPU dies on one socket (for PC only)\n"
>>       "                sockets= number of discrete sockets in the system\n",
>>           QEMU_ARCH_ALL)
>>   SRST
>> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
>> -    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
>> -    are supported. On Sparc32 target, Linux limits the number of usable
>> -    CPUs to 4. For the PC target, the number of cores per die, the
>> -    number of threads per cores, the number of dies per packages and the
>> -    total number of sockets can be specified. Missing values will be
>> -    computed. If any on the three values is given, the total number of
>> -    CPUs n can be omitted. maxcpus specifies the maximum number of
>> -    hotpluggable CPUs.
>> +``-smp [cpus=]n[,cores=cores][,threads=threads][,clusters=clusters][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
>> +    Simulate an SMP system with n CPUs. On the PC target, up to 255
>> +    CPUs are supported. On the Sparc32 target, Linux limits the number
>> +    of usable CPUs to 4. For the PC target, the number of threads per
>> +    core, the number of cores per die, the number of dies per package
>> +    and the total number of sockets can be specified. For the ARM target,
>> +    the number of threads per core, the number of cores per cluster, the
>> +    number of clusters per socket and the total number of sockets can be
>> +    specified. And missing values will be computed. If any of the five
>                    ^ Why did you add this 'And'?
My fault.. I will drop it.
>> +    values is given, the total number of CPUs n can be omitted.
> The last two sentences are not valid for Arm, which requires most of its
> parameters to be given.
Yes, indeed. I think I should state more *clearly* about these two 
sentences.
Will rearrange the Doc in v4.

Thanks,
Yanan
>> Maxcpus
>> +    specifies the maximum number of hotpluggable CPUs.
>>   
>>       For the ARM target, at least one of cpus or maxcpus must be provided.
>>       Threads will default to 1 if not provided. Sockets and cores must be
>> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> index 307944aef3..69a5c73ef7 100644
>> --- a/softmmu/vl.c
>> +++ b/softmmu/vl.c
>> @@ -719,6 +719,9 @@ static QemuOptsList qemu_smp_opts = {
>>           }, {
>>               .name = "dies",
>>               .type = QEMU_OPT_NUMBER,
>> +        }, {
>> +            .name = "clusters",
>> +            .type = QEMU_OPT_NUMBER,
>>           }, {
>>               .name = "cores",
>>               .type = QEMU_OPT_NUMBER,
>> -- 
>> 2.19.1
>>
> Thanks,
> drew
>
> .


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-05-17  9:12   ` Andrew Jones
@ 2021-05-17 15:10     ` wangyanan (Y)
  0 siblings, 0 replies; 14+ messages in thread
From: wangyanan (Y) @ 2021-05-17 15:10 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	zhukeqian1, qemu-devel, yangyicong, Shannon Zhao, qemu-arm,
	prime.zeng, Paolo Bonzini, yuzenghui, Igor Mammedov,
	Philippe Mathieu-Daudé

Hi Drew,

On 2021/5/17 17:12, Andrew Jones wrote:
> On Sun, May 16, 2021 at 06:32:28PM +0800, Yanan Wang wrote:
>> There is a separate function virt_smp_parse() in hw/virt/arm.c used
>> to parse cpu topology for the ARM machines. So add parsing of -smp
>> cluster parameter in it, then total number of logical cpus will be
>> calculated like: max_cpus = sockets * clusters * cores * threads.
>>
>> Note, we will assume multi-cluster in one socket is not supported
>> and default the value of clusters to 1, if it's not explicitly
>> specified in -smp cmdline.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   hw/arm/virt.c | 32 ++++++++++++++++++--------------
>>   1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 7de822e491..678d5ef36c 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -2642,8 +2642,8 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
>>    * with the -smp cmdlines when parsing them.
>>    *
>>    * We require that at least one of cpus or maxcpus must be provided.
>> - * Threads will default to 1 if not provided. Sockets and cores must
>> - * be either both provided or both not.
>> + * Clusters and threads will default to 1 if they are not provided.
>> + * Sockets and cores must be either both provided or both not.
>>    *
>>    * Note, if neither sockets nor cores are specified, we will calculate
>>    * all the missing values just like smp_parse() does, but will disable
>> @@ -2652,15 +2652,18 @@ static int virt_kvm_type(MachineState *ms, const char *type_str)
>>   static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>>   {
>>       VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(ms);
>> +    VirtMachineState *vms = VIRT_MACHINE(ms);
>>   
>>       if (opts) {
>>           unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
>>           unsigned maxcpus = qemu_opt_get_number(opts, "maxcpus", 0);
>>           unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
>> +        unsigned clusters = qemu_opt_get_number(opts, "clusters", 0);
>>           unsigned cores = qemu_opt_get_number(opts, "cores", 0);
>>           unsigned threads = qemu_opt_get_number(opts, "threads", 0);
>>   
>> -        /* Default threads to 1 if not provided */
>> +        /* Default clusters and threads to 1 if not provided */
>> +        clusters = clusters > 0 ? clusters : 1;
>>           threads = threads > 0 ? threads : 1;
>>   
>>           if (cpus == 0 && maxcpus == 0) {
>> @@ -2676,13 +2679,13 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>>               cores = 1;
>>               if (cpus == 0) {
>>                   sockets = 1;
>> -                cpus = sockets * cores * threads;
>> +                cpus = sockets * clusters * cores * threads;
>>               } else {
>>                   maxcpus = maxcpus > 0 ? maxcpus : cpus;
>> -                sockets = maxcpus / (cores * threads);
>> +                sockets = maxcpus / (clusters * cores * threads);
>>               }
>>           } else if (sockets > 0 && cores > 0) {
>> -            cpus = cpus > 0 ? cpus : sockets * cores * threads;
>> +            cpus = cpus > 0 ? cpus : sockets * clusters * cores * threads;
>>               maxcpus = maxcpus > 0 ? maxcpus : cpus;
>>           } else {
>>               error_report("sockets and cores must be both provided "
>> @@ -2695,25 +2698,26 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>>               exit(1);
>>           }
>>   
>> -        if (sockets * cores * threads < cpus) {
>> +        if (sockets * clusters * cores * threads < cpus) {
>>               error_report("cpu topology: "
>> -                         "sockets (%u) * cores (%u) * threads (%u) < "
>> -                         "smp_cpus (%u)",
>> -                         sockets, cores, threads, cpus);
>> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
>> +                         "threads (%u) < smp_cpus (%u)",
>> +                         sockets, clusters, cores, threads, cpus);
>>               exit(1);
>>           }
>>   
>> -        if (sockets * cores * threads != maxcpus) {
>> +        if (sockets * clusters * cores * threads != maxcpus) {
>>               error_report("cpu topology: "
>> -                         "sockets (%u) * cores (%u) * threads (%u) "
>> -                         "!= maxcpus (%u)",
>> -                         sockets, cores, threads, maxcpus);
>> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
>> +                         "threads (%u) != maxcpus (%u)",
>> +                         sockets, clusters, cores, threads, maxcpus);
>>               exit(1);
>>           }
>>   
>>           ms->smp.cpus = cpus;
>>           ms->smp.max_cpus = maxcpus;
>>           ms->smp.sockets = sockets;
>> +        vms->smp_clusters = clusters;
>>           ms->smp.cores = cores;
>>           ms->smp.threads = threads;
>>       }
>> -- 
>> 2.19.1
>>
> After reworking "[RFC PATCH v3 9/9] hw/arm/virt: Add separate -smp parsing
> function for ARM machines", this should also be reworked and fully tested,
> possibly using a standalone test, as as I suggested in the other review.
Ok, I will make full test.

Thanks,
Yanan
> Thanks,
> drew
>
> .


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-05-16 10:32 ` [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse Yanan Wang
  2021-05-17  9:12   ` Andrew Jones
@ 2021-05-17 15:17   ` Salil Mehta
  2021-05-18  3:48     ` wangyanan (Y)
  1 sibling, 1 reply; 14+ messages in thread
From: Salil Mehta @ 2021-05-17 15:17 UTC (permalink / raw)
  To: wangyanan (Y),
	Peter Maydell, Paolo Bonzini, Andrew Jones, Michael S . Tsirkin,
	Igor Mammedov, Shannon Zhao, qemu-devel, qemu-arm
  Cc: Song Bao Hua (Barry Song),
	zhukeqian, Linuxarm, linuxarm, wangyanan (Y), Zengtao (B),
	yangyicong, yuzenghui, Wanghaibin (D),
	Philippe Mathieu-Daudé

> From: Qemu-devel
> [mailto:qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org] On Behalf Of
> Yanan Wang
> Sent: Sunday, May 16, 2021 11:32 AM
> To: Peter Maydell <peter.maydell@linaro.org>; Paolo Bonzini
> <pbonzini@redhat.com>; Andrew Jones <drjones@redhat.com>; Michael S . Tsirkin
> <mst@redhat.com>; Igor Mammedov <imammedo@redhat.com>; Shannon Zhao
> <shannon.zhaosl@gmail.com>; qemu-devel@nongnu.org; qemu-arm@nongnu.org
> Cc: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; Philippe
> Mathieu-Daudé <philmd@redhat.com>; wangyanan (Y) <wangyanan55@huawei.com>;
> Zengtao (B) <prime.zeng@hisilicon.com>; Wanghaibin (D)
> <wanghaibin.wang@huawei.com>; yuzenghui <yuzenghui@huawei.com>; yangyicong
> <yangyicong@huawei.com>; zhukeqian <zhukeqian1@huawei.com>
> Subject: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in
> virt_smp_parse
> 
> There is a separate function virt_smp_parse() in hw/virt/arm.c used
> to parse cpu topology for the ARM machines. So add parsing of -smp
> cluster parameter in it, then total number of logical cpus will be
> calculated like: max_cpus = sockets * clusters * cores * threads.
> 
> Note, we will assume multi-cluster in one socket is not supported
> and default the value of clusters to 1, if it's not explicitly
> specified in -smp cmdline.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  hw/arm/virt.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 7de822e491..678d5ef36c 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2642,8 +2642,8 @@ static int virt_kvm_type(MachineState *ms, const char
> *type_str)
>   * with the -smp cmdlines when parsing them.
>   *
>   * We require that at least one of cpus or maxcpus must be provided.
> - * Threads will default to 1 if not provided. Sockets and cores must
> - * be either both provided or both not.
> + * Clusters and threads will default to 1 if they are not provided.
> + * Sockets and cores must be either both provided or both not.
>   *
>   * Note, if neither sockets nor cores are specified, we will calculate
>   * all the missing values just like smp_parse() does, but will disable
> @@ -2652,15 +2652,18 @@ static int virt_kvm_type(MachineState *ms, const char
> *type_str)
>  static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>  {
>      VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(ms);
> +    VirtMachineState *vms = VIRT_MACHINE(ms);
> 
>      if (opts) {
>          unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
>          unsigned maxcpus = qemu_opt_get_number(opts, "maxcpus", 0);
>          unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
> +        unsigned clusters = qemu_opt_get_number(opts, "clusters", 0);
>          unsigned cores = qemu_opt_get_number(opts, "cores", 0);
>          unsigned threads = qemu_opt_get_number(opts, "threads", 0);
> 
> -        /* Default threads to 1 if not provided */
> +        /* Default clusters and threads to 1 if not provided */
> +        clusters = clusters > 0 ? clusters : 1;
>          threads = threads > 0 ? threads : 1;
> 
>          if (cpus == 0 && maxcpus == 0) {
> @@ -2676,13 +2679,13 @@ static void virt_smp_parse(MachineState *ms, QemuOpts
> *opts)
>              cores = 1;
>              if (cpus == 0) {
>                  sockets = 1;
> -                cpus = sockets * cores * threads;
> +                cpus = sockets * clusters * cores * threads;
>              } else {
>                  maxcpus = maxcpus > 0 ? maxcpus : cpus;
> -                sockets = maxcpus / (cores * threads);
> +                sockets = maxcpus / (clusters * cores * threads);
>              }
>          } else if (sockets > 0 && cores > 0) {
> -            cpus = cpus > 0 ? cpus : sockets * cores * threads;
> +            cpus = cpus > 0 ? cpus : sockets * clusters * cores * threads;
>              maxcpus = maxcpus > 0 ? maxcpus : cpus;
>          } else {
>              error_report("sockets and cores must be both provided "
> @@ -2695,25 +2698,26 @@ static void virt_smp_parse(MachineState *ms, QemuOpts
> *opts)
>              exit(1);
>          }
> 
> -        if (sockets * cores * threads < cpus) {
> +        if (sockets * clusters * cores * threads < cpus) {
>              error_report("cpu topology: "
> -                         "sockets (%u) * cores (%u) * threads (%u) < "
> -                         "smp_cpus (%u)",
> -                         sockets, cores, threads, cpus);
> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> +                         "threads (%u) < smp_cpus (%u)",
> +                         sockets, clusters, cores, threads, cpus);
>              exit(1);
>          }
> 
> -        if (sockets * cores * threads != maxcpus) {
> +        if (sockets * clusters * cores * threads != maxcpus) {
>              error_report("cpu topology: "
> -                         "sockets (%u) * cores (%u) * threads (%u) "
> -                         "!= maxcpus (%u)",
> -                         sockets, cores, threads, maxcpus);
> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> +                         "threads (%u) != maxcpus (%u)",
> +                         sockets, clusters, cores, threads, maxcpus);
>              exit(1);
>          }
> 
>          ms->smp.cpus = cpus;
>          ms->smp.max_cpus = maxcpus;
>          ms->smp.sockets = sockets;
> +        vms->smp_clusters = clusters;


This variable naming *smp_clusters* looks out-of-sorts. I thought a similar
variable *smp_cpus* was destined to be removed for the reason given in below
link - a patch by Andrew Jones?

Link: https://lists.gnu.org/archive/html/qemu-arm/2020-12/msg00418.html

Am I missing anything here?

Salil.

>          ms->smp.cores = cores;
>          ms->smp.threads = threads;
>      }



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-05-17 15:17   ` Salil Mehta
@ 2021-05-18  3:48     ` wangyanan (Y)
  2021-05-18  6:52         ` Salil Mehta
  2021-05-18  8:19       ` Andrew Jones
  0 siblings, 2 replies; 14+ messages in thread
From: wangyanan (Y) @ 2021-05-18  3:48 UTC (permalink / raw)
  To: Salil Mehta
  Cc: Barry Song, Peter Maydell, Andrew Jones, Michael S . Tsirkin,
	wanghaibin.wang, zhukeqian1, qemu-devel, yangyicong,
	Shannon Zhao, qemu-arm, prime.zeng, Paolo Bonzini, yuzenghui,
	Igor Mammedov, Philippe Mathieu-Daudé

Hi Salil,

On 2021/5/17 23:17, Salil Mehta wrote:
>> From: Qemu-devel
>> [mailto:qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org] On Behalf Of
>> Yanan Wang
>> Sent: Sunday, May 16, 2021 11:32 AM
>> To: Peter Maydell <peter.maydell@linaro.org>; Paolo Bonzini
>> <pbonzini@redhat.com>; Andrew Jones <drjones@redhat.com>; Michael S . Tsirkin
>> <mst@redhat.com>; Igor Mammedov <imammedo@redhat.com>; Shannon Zhao
>> <shannon.zhaosl@gmail.com>; qemu-devel@nongnu.org; qemu-arm@nongnu.org
>> Cc: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; Philippe
>> Mathieu-Daudé <philmd@redhat.com>; wangyanan (Y) <wangyanan55@huawei.com>;
>> Zengtao (B) <prime.zeng@hisilicon.com>; Wanghaibin (D)
>> <wanghaibin.wang@huawei.com>; yuzenghui <yuzenghui@huawei.com>; yangyicong
>> <yangyicong@huawei.com>; zhukeqian <zhukeqian1@huawei.com>
>> Subject: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in
>> virt_smp_parse
>>
>> There is a separate function virt_smp_parse() in hw/virt/arm.c used
>> to parse cpu topology for the ARM machines. So add parsing of -smp
>> cluster parameter in it, then total number of logical cpus will be
>> calculated like: max_cpus = sockets * clusters * cores * threads.
>>
>> Note, we will assume multi-cluster in one socket is not supported
>> and default the value of clusters to 1, if it's not explicitly
>> specified in -smp cmdline.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   hw/arm/virt.c | 32 ++++++++++++++++++--------------
>>   1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 7de822e491..678d5ef36c 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -2642,8 +2642,8 @@ static int virt_kvm_type(MachineState *ms, const char
>> *type_str)
>>    * with the -smp cmdlines when parsing them.
>>    *
>>    * We require that at least one of cpus or maxcpus must be provided.
>> - * Threads will default to 1 if not provided. Sockets and cores must
>> - * be either both provided or both not.
>> + * Clusters and threads will default to 1 if they are not provided.
>> + * Sockets and cores must be either both provided or both not.
>>    *
>>    * Note, if neither sockets nor cores are specified, we will calculate
>>    * all the missing values just like smp_parse() does, but will disable
>> @@ -2652,15 +2652,18 @@ static int virt_kvm_type(MachineState *ms, const char
>> *type_str)
>>   static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>>   {
>>       VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(ms);
>> +    VirtMachineState *vms = VIRT_MACHINE(ms);
>>
>>       if (opts) {
>>           unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
>>           unsigned maxcpus = qemu_opt_get_number(opts, "maxcpus", 0);
>>           unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
>> +        unsigned clusters = qemu_opt_get_number(opts, "clusters", 0);
>>           unsigned cores = qemu_opt_get_number(opts, "cores", 0);
>>           unsigned threads = qemu_opt_get_number(opts, "threads", 0);
>>
>> -        /* Default threads to 1 if not provided */
>> +        /* Default clusters and threads to 1 if not provided */
>> +        clusters = clusters > 0 ? clusters : 1;
>>           threads = threads > 0 ? threads : 1;
>>
>>           if (cpus == 0 && maxcpus == 0) {
>> @@ -2676,13 +2679,13 @@ static void virt_smp_parse(MachineState *ms, QemuOpts
>> *opts)
>>               cores = 1;
>>               if (cpus == 0) {
>>                   sockets = 1;
>> -                cpus = sockets * cores * threads;
>> +                cpus = sockets * clusters * cores * threads;
>>               } else {
>>                   maxcpus = maxcpus > 0 ? maxcpus : cpus;
>> -                sockets = maxcpus / (cores * threads);
>> +                sockets = maxcpus / (clusters * cores * threads);
>>               }
>>           } else if (sockets > 0 && cores > 0) {
>> -            cpus = cpus > 0 ? cpus : sockets * cores * threads;
>> +            cpus = cpus > 0 ? cpus : sockets * clusters * cores * threads;
>>               maxcpus = maxcpus > 0 ? maxcpus : cpus;
>>           } else {
>>               error_report("sockets and cores must be both provided "
>> @@ -2695,25 +2698,26 @@ static void virt_smp_parse(MachineState *ms, QemuOpts
>> *opts)
>>               exit(1);
>>           }
>>
>> -        if (sockets * cores * threads < cpus) {
>> +        if (sockets * clusters * cores * threads < cpus) {
>>               error_report("cpu topology: "
>> -                         "sockets (%u) * cores (%u) * threads (%u) < "
>> -                         "smp_cpus (%u)",
>> -                         sockets, cores, threads, cpus);
>> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
>> +                         "threads (%u) < smp_cpus (%u)",
>> +                         sockets, clusters, cores, threads, cpus);
>>               exit(1);
>>           }
>>
>> -        if (sockets * cores * threads != maxcpus) {
>> +        if (sockets * clusters * cores * threads != maxcpus) {
>>               error_report("cpu topology: "
>> -                         "sockets (%u) * cores (%u) * threads (%u) "
>> -                         "!= maxcpus (%u)",
>> -                         sockets, cores, threads, maxcpus);
>> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
>> +                         "threads (%u) != maxcpus (%u)",
>> +                         sockets, clusters, cores, threads, maxcpus);
>>               exit(1);
>>           }
>>
>>           ms->smp.cpus = cpus;
>>           ms->smp.max_cpus = maxcpus;
>>           ms->smp.sockets = sockets;
>> +        vms->smp_clusters = clusters;
>
> This variable naming *smp_clusters* looks out-of-sorts. I thought a similar
> variable *smp_cpus* was destined to be removed for the reason given in below
> link - a patch by Andrew Jones?
>
> Link: https://lists.gnu.org/archive/html/qemu-arm/2020-12/msg00418.html
>
> Am I missing anything here?
The smp_clusters is added in VirtMachineState and nowhere else because
it's currently only used for ARM. But I think maybe I should also move it to
CpuTopology structure like [1] is doing to move dies to CpuTopology.

Move clusters to CpuTopology won't affect other architectures that don't
support it yet, and will also make it easy if they want to in the future.

[1] From Paolo:
https://patchwork.kernel.org/project/qemu-devel/patch/20210513162901.1310239-10-pbonzini@redhat.com/

Thanks,
Yanan
> Salil.
>
>>           ms->smp.cores = cores;
>>           ms->smp.threads = threads;
>>       }
> .


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-05-18  3:48     ` wangyanan (Y)
@ 2021-05-18  6:52         ` Salil Mehta
  2021-05-18  8:19       ` Andrew Jones
  1 sibling, 0 replies; 14+ messages in thread
From: Salil Mehta @ 2021-05-18  6:52 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: Peter Maydell, Paolo Bonzini, Andrew Jones, Michael S . Tsirkin,
	Igor Mammedov, Shannon Zhao, qemu-devel, qemu-arm,
	Philippe Mathieu-Daudé, yangyicong, Zengtao (B),
	Song Bao Hua (Barry Song), Wanghaibin (D),
	zhukeqian, yuzenghui, linux-kernel, linuxarm

> From: wangyanan (Y)
> 
> Hi Salil,
> 
> On 2021/5/17 23:17, Salil Mehta wrote:
> >> From: Qemu-devel
> >> [mailto:qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org] On Behalf Of
> >> Yanan Wang
> >> Sent: Sunday, May 16, 2021 11:32 AM
> >> To: Peter Maydell <peter.maydell@linaro.org>; Paolo Bonzini
> >> <pbonzini@redhat.com>; Andrew Jones <drjones@redhat.com>; Michael S . Tsirkin
> >> <mst@redhat.com>; Igor Mammedov <imammedo@redhat.com>; Shannon Zhao
> >> <shannon.zhaosl@gmail.com>; qemu-devel@nongnu.org; qemu-arm@nongnu.org
> >> Cc: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; Philippe
> >> Mathieu-Daudé <philmd@redhat.com>; wangyanan (Y) <wangyanan55@huawei.com>;
> >> Zengtao (B) <prime.zeng@hisilicon.com>; Wanghaibin (D)
> >> <wanghaibin.wang@huawei.com>; yuzenghui <yuzenghui@huawei.com>; yangyicong
> >> <yangyicong@huawei.com>; zhukeqian <zhukeqian1@huawei.com>
> >> Subject: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in
> >> virt_smp_parse
> >>
> >> There is a separate function virt_smp_parse() in hw/virt/arm.c used
> >> to parse cpu topology for the ARM machines. So add parsing of -smp
> >> cluster parameter in it, then total number of logical cpus will be
> >> calculated like: max_cpus = sockets * clusters * cores * threads.
> >>
> >> Note, we will assume multi-cluster in one socket is not supported
> >> and default the value of clusters to 1, if it's not explicitly
> >> specified in -smp cmdline.
> >>
> >> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> >> ---
> >>   hw/arm/virt.c | 32 ++++++++++++++++++--------------
> >>   1 file changed, 18 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >> index 7de822e491..678d5ef36c 100644
> >> --- a/hw/arm/virt.c
> >> +++ b/hw/arm/virt.c
> >> @@ -2642,8 +2642,8 @@ static int virt_kvm_type(MachineState *ms, const char
> >> *type_str)
> >>    * with the -smp cmdlines when parsing them.
> >>    *
> >>    * We require that at least one of cpus or maxcpus must be provided.
> >> - * Threads will default to 1 if not provided. Sockets and cores must
> >> - * be either both provided or both not.
> >> + * Clusters and threads will default to 1 if they are not provided.
> >> + * Sockets and cores must be either both provided or both not.
> >>    *
> >>    * Note, if neither sockets nor cores are specified, we will calculate
> >>    * all the missing values just like smp_parse() does, but will disable
> >> @@ -2652,15 +2652,18 @@ static int virt_kvm_type(MachineState *ms, const char
> >> *type_str)
> >>   static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
> >>   {
> >>       VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(ms);
> >> +    VirtMachineState *vms = VIRT_MACHINE(ms);
> >>
> >>       if (opts) {
> >>           unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
> >>           unsigned maxcpus = qemu_opt_get_number(opts, "maxcpus", 0);
> >>           unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
> >> +        unsigned clusters = qemu_opt_get_number(opts, "clusters", 0);
> >>           unsigned cores = qemu_opt_get_number(opts, "cores", 0);
> >>           unsigned threads = qemu_opt_get_number(opts, "threads", 0);
> >>
> >> -        /* Default threads to 1 if not provided */
> >> +        /* Default clusters and threads to 1 if not provided */
> >> +        clusters = clusters > 0 ? clusters : 1;
> >>           threads = threads > 0 ? threads : 1;
> >>
> >>           if (cpus == 0 && maxcpus == 0) {
> >> @@ -2676,13 +2679,13 @@ static void virt_smp_parse(MachineState *ms, QemuOpts
> >> *opts)
> >>               cores = 1;
> >>               if (cpus == 0) {
> >>                   sockets = 1;
> >> -                cpus = sockets * cores * threads;
> >> +                cpus = sockets * clusters * cores * threads;
> >>               } else {
> >>                   maxcpus = maxcpus > 0 ? maxcpus : cpus;
> >> -                sockets = maxcpus / (cores * threads);
> >> +                sockets = maxcpus / (clusters * cores * threads);
> >>               }
> >>           } else if (sockets > 0 && cores > 0) {
> >> -            cpus = cpus > 0 ? cpus : sockets * cores * threads;
> >> +            cpus = cpus > 0 ? cpus : sockets * clusters * cores * threads;
> >>               maxcpus = maxcpus > 0 ? maxcpus : cpus;
> >>           } else {
> >>               error_report("sockets and cores must be both provided "
> >> @@ -2695,25 +2698,26 @@ static void virt_smp_parse(MachineState *ms, QemuOpts
> >> *opts)
> >>               exit(1);
> >>           }
> >>
> >> -        if (sockets * cores * threads < cpus) {
> >> +        if (sockets * clusters * cores * threads < cpus) {
> >>               error_report("cpu topology: "
> >> -                         "sockets (%u) * cores (%u) * threads (%u) < "
> >> -                         "smp_cpus (%u)",
> >> -                         sockets, cores, threads, cpus);
> >> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> >> +                         "threads (%u) < smp_cpus (%u)",
> >> +                         sockets, clusters, cores, threads, cpus);
> >>               exit(1);
> >>           }
> >>
> >> -        if (sockets * cores * threads != maxcpus) {
> >> +        if (sockets * clusters * cores * threads != maxcpus) {
> >>               error_report("cpu topology: "
> >> -                         "sockets (%u) * cores (%u) * threads (%u) "
> >> -                         "!= maxcpus (%u)",
> >> -                         sockets, cores, threads, maxcpus);
> >> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> >> +                         "threads (%u) != maxcpus (%u)",
> >> +                         sockets, clusters, cores, threads, maxcpus);
> >>               exit(1);
> >>           }
> >>
> >>           ms->smp.cpus = cpus;
> >>           ms->smp.max_cpus = maxcpus;
> >>           ms->smp.sockets = sockets;
> >> +        vms->smp_clusters = clusters;
> >
> > This variable naming *smp_clusters* looks out-of-sorts. I thought a similar
> > variable *smp_cpus* was destined to be removed for the reason given in below
> > link - a patch by Andrew Jones?
> >
> > Link: https://lists.gnu.org/archive/html/qemu-arm/2020-12/msg00418.html
> >
> > Am I missing anything here?
> The smp_clusters is added in VirtMachineState and nowhere else because
> it's currently only used for ARM. But I think maybe I should also move it to
> CpuTopology structure like [1] is doing to move dies to CpuTopology.

yes, that’s the idea. It is always good to have right place holders so that
the code comprehension/usage(in this case) becomes easy and obvious. 


> 
> Move clusters to CpuTopology won't affect other architectures that don't
> support it yet, and will also make it easy if they want to in the future.
> 
> [1] From Paolo:
> https://patchwork.kernel.org/project/qemu-devel/patch/20210513162901.131023
> 9-10-pbonzini@redhat.com/

sure.


> 
> Thanks,
> Yanan
> > Salil.
> >
> >>           ms->smp.cores = cores;
> >>           ms->smp.threads = threads;
> >>       }

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
@ 2021-05-18  6:52         ` Salil Mehta
  0 siblings, 0 replies; 14+ messages in thread
From: Salil Mehta @ 2021-05-18  6:52 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: Song Bao Hua (Barry Song),
	Peter Maydell, Andrew Jones, linuxarm, Michael S . Tsirkin,
	Wanghaibin (D),
	zhukeqian, qemu-devel, yangyicong, Shannon Zhao, qemu-arm,
	Zengtao (B),
	Paolo Bonzini, yuzenghui, Igor Mammedov,
	Philippe Mathieu-Daudé,
	linux-kernel

> From: wangyanan (Y)
> 
> Hi Salil,
> 
> On 2021/5/17 23:17, Salil Mehta wrote:
> >> From: Qemu-devel
> >> [mailto:qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org] On Behalf Of
> >> Yanan Wang
> >> Sent: Sunday, May 16, 2021 11:32 AM
> >> To: Peter Maydell <peter.maydell@linaro.org>; Paolo Bonzini
> >> <pbonzini@redhat.com>; Andrew Jones <drjones@redhat.com>; Michael S . Tsirkin
> >> <mst@redhat.com>; Igor Mammedov <imammedo@redhat.com>; Shannon Zhao
> >> <shannon.zhaosl@gmail.com>; qemu-devel@nongnu.org; qemu-arm@nongnu.org
> >> Cc: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; Philippe
> >> Mathieu-Daudé <philmd@redhat.com>; wangyanan (Y) <wangyanan55@huawei.com>;
> >> Zengtao (B) <prime.zeng@hisilicon.com>; Wanghaibin (D)
> >> <wanghaibin.wang@huawei.com>; yuzenghui <yuzenghui@huawei.com>; yangyicong
> >> <yangyicong@huawei.com>; zhukeqian <zhukeqian1@huawei.com>
> >> Subject: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in
> >> virt_smp_parse
> >>
> >> There is a separate function virt_smp_parse() in hw/virt/arm.c used
> >> to parse cpu topology for the ARM machines. So add parsing of -smp
> >> cluster parameter in it, then total number of logical cpus will be
> >> calculated like: max_cpus = sockets * clusters * cores * threads.
> >>
> >> Note, we will assume multi-cluster in one socket is not supported
> >> and default the value of clusters to 1, if it's not explicitly
> >> specified in -smp cmdline.
> >>
> >> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> >> ---
> >>   hw/arm/virt.c | 32 ++++++++++++++++++--------------
> >>   1 file changed, 18 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >> index 7de822e491..678d5ef36c 100644
> >> --- a/hw/arm/virt.c
> >> +++ b/hw/arm/virt.c
> >> @@ -2642,8 +2642,8 @@ static int virt_kvm_type(MachineState *ms, const char
> >> *type_str)
> >>    * with the -smp cmdlines when parsing them.
> >>    *
> >>    * We require that at least one of cpus or maxcpus must be provided.
> >> - * Threads will default to 1 if not provided. Sockets and cores must
> >> - * be either both provided or both not.
> >> + * Clusters and threads will default to 1 if they are not provided.
> >> + * Sockets and cores must be either both provided or both not.
> >>    *
> >>    * Note, if neither sockets nor cores are specified, we will calculate
> >>    * all the missing values just like smp_parse() does, but will disable
> >> @@ -2652,15 +2652,18 @@ static int virt_kvm_type(MachineState *ms, const char
> >> *type_str)
> >>   static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
> >>   {
> >>       VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(ms);
> >> +    VirtMachineState *vms = VIRT_MACHINE(ms);
> >>
> >>       if (opts) {
> >>           unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
> >>           unsigned maxcpus = qemu_opt_get_number(opts, "maxcpus", 0);
> >>           unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
> >> +        unsigned clusters = qemu_opt_get_number(opts, "clusters", 0);
> >>           unsigned cores = qemu_opt_get_number(opts, "cores", 0);
> >>           unsigned threads = qemu_opt_get_number(opts, "threads", 0);
> >>
> >> -        /* Default threads to 1 if not provided */
> >> +        /* Default clusters and threads to 1 if not provided */
> >> +        clusters = clusters > 0 ? clusters : 1;
> >>           threads = threads > 0 ? threads : 1;
> >>
> >>           if (cpus == 0 && maxcpus == 0) {
> >> @@ -2676,13 +2679,13 @@ static void virt_smp_parse(MachineState *ms, QemuOpts
> >> *opts)
> >>               cores = 1;
> >>               if (cpus == 0) {
> >>                   sockets = 1;
> >> -                cpus = sockets * cores * threads;
> >> +                cpus = sockets * clusters * cores * threads;
> >>               } else {
> >>                   maxcpus = maxcpus > 0 ? maxcpus : cpus;
> >> -                sockets = maxcpus / (cores * threads);
> >> +                sockets = maxcpus / (clusters * cores * threads);
> >>               }
> >>           } else if (sockets > 0 && cores > 0) {
> >> -            cpus = cpus > 0 ? cpus : sockets * cores * threads;
> >> +            cpus = cpus > 0 ? cpus : sockets * clusters * cores * threads;
> >>               maxcpus = maxcpus > 0 ? maxcpus : cpus;
> >>           } else {
> >>               error_report("sockets and cores must be both provided "
> >> @@ -2695,25 +2698,26 @@ static void virt_smp_parse(MachineState *ms, QemuOpts
> >> *opts)
> >>               exit(1);
> >>           }
> >>
> >> -        if (sockets * cores * threads < cpus) {
> >> +        if (sockets * clusters * cores * threads < cpus) {
> >>               error_report("cpu topology: "
> >> -                         "sockets (%u) * cores (%u) * threads (%u) < "
> >> -                         "smp_cpus (%u)",
> >> -                         sockets, cores, threads, cpus);
> >> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> >> +                         "threads (%u) < smp_cpus (%u)",
> >> +                         sockets, clusters, cores, threads, cpus);
> >>               exit(1);
> >>           }
> >>
> >> -        if (sockets * cores * threads != maxcpus) {
> >> +        if (sockets * clusters * cores * threads != maxcpus) {
> >>               error_report("cpu topology: "
> >> -                         "sockets (%u) * cores (%u) * threads (%u) "
> >> -                         "!= maxcpus (%u)",
> >> -                         sockets, cores, threads, maxcpus);
> >> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> >> +                         "threads (%u) != maxcpus (%u)",
> >> +                         sockets, clusters, cores, threads, maxcpus);
> >>               exit(1);
> >>           }
> >>
> >>           ms->smp.cpus = cpus;
> >>           ms->smp.max_cpus = maxcpus;
> >>           ms->smp.sockets = sockets;
> >> +        vms->smp_clusters = clusters;
> >
> > This variable naming *smp_clusters* looks out-of-sorts. I thought a similar
> > variable *smp_cpus* was destined to be removed for the reason given in below
> > link - a patch by Andrew Jones?
> >
> > Link: https://lists.gnu.org/archive/html/qemu-arm/2020-12/msg00418.html
> >
> > Am I missing anything here?
> The smp_clusters is added in VirtMachineState and nowhere else because
> it's currently only used for ARM. But I think maybe I should also move it to
> CpuTopology structure like [1] is doing to move dies to CpuTopology.

yes, that’s the idea. It is always good to have right place holders so that
the code comprehension/usage(in this case) becomes easy and obvious. 


> 
> Move clusters to CpuTopology won't affect other architectures that don't
> support it yet, and will also make it easy if they want to in the future.
> 
> [1] From Paolo:
> https://patchwork.kernel.org/project/qemu-devel/patch/20210513162901.131023
> 9-10-pbonzini@redhat.com/

sure.


> 
> Thanks,
> Yanan
> > Salil.
> >
> >>           ms->smp.cores = cores;
> >>           ms->smp.threads = threads;
> >>       }

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-05-18  3:48     ` wangyanan (Y)
  2021-05-18  6:52         ` Salil Mehta
@ 2021-05-18  8:19       ` Andrew Jones
  1 sibling, 0 replies; 14+ messages in thread
From: Andrew Jones @ 2021-05-18  8:19 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: Barry Song, Peter Maydell, Salil Mehta, Michael S . Tsirkin,
	wanghaibin.wang, zhukeqian1, qemu-devel, yangyicong,
	Shannon Zhao, qemu-arm, prime.zeng, Paolo Bonzini, yuzenghui,
	Igor Mammedov, Philippe Mathieu-Daudé

On Tue, May 18, 2021 at 11:48:34AM +0800, wangyanan (Y) wrote:
> Hi Salil,
> 
> On 2021/5/17 23:17, Salil Mehta wrote:
> > > From: Qemu-devel
> > > [mailto:qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org] On Behalf Of
> > > Yanan Wang
> > > Sent: Sunday, May 16, 2021 11:32 AM
> > > To: Peter Maydell <peter.maydell@linaro.org>; Paolo Bonzini
> > > <pbonzini@redhat.com>; Andrew Jones <drjones@redhat.com>; Michael S . Tsirkin
> > > <mst@redhat.com>; Igor Mammedov <imammedo@redhat.com>; Shannon Zhao
> > > <shannon.zhaosl@gmail.com>; qemu-devel@nongnu.org; qemu-arm@nongnu.org
> > > Cc: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>; Philippe
> > > Mathieu-Daudé <philmd@redhat.com>; wangyanan (Y) <wangyanan55@huawei.com>;
> > > Zengtao (B) <prime.zeng@hisilicon.com>; Wanghaibin (D)
> > > <wanghaibin.wang@huawei.com>; yuzenghui <yuzenghui@huawei.com>; yangyicong
> > > <yangyicong@huawei.com>; zhukeqian <zhukeqian1@huawei.com>
> > > Subject: [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in
> > > virt_smp_parse
> > > 
> > > There is a separate function virt_smp_parse() in hw/virt/arm.c used
> > > to parse cpu topology for the ARM machines. So add parsing of -smp
> > > cluster parameter in it, then total number of logical cpus will be
> > > calculated like: max_cpus = sockets * clusters * cores * threads.
> > > 
> > > Note, we will assume multi-cluster in one socket is not supported
> > > and default the value of clusters to 1, if it's not explicitly
> > > specified in -smp cmdline.
> > > 
> > > Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> > > ---
> > >   hw/arm/virt.c | 32 ++++++++++++++++++--------------
> > >   1 file changed, 18 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > > index 7de822e491..678d5ef36c 100644
> > > --- a/hw/arm/virt.c
> > > +++ b/hw/arm/virt.c
> > > @@ -2642,8 +2642,8 @@ static int virt_kvm_type(MachineState *ms, const char
> > > *type_str)
> > >    * with the -smp cmdlines when parsing them.
> > >    *
> > >    * We require that at least one of cpus or maxcpus must be provided.
> > > - * Threads will default to 1 if not provided. Sockets and cores must
> > > - * be either both provided or both not.
> > > + * Clusters and threads will default to 1 if they are not provided.
> > > + * Sockets and cores must be either both provided or both not.
> > >    *
> > >    * Note, if neither sockets nor cores are specified, we will calculate
> > >    * all the missing values just like smp_parse() does, but will disable
> > > @@ -2652,15 +2652,18 @@ static int virt_kvm_type(MachineState *ms, const char
> > > *type_str)
> > >   static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
> > >   {
> > >       VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(ms);
> > > +    VirtMachineState *vms = VIRT_MACHINE(ms);
> > > 
> > >       if (opts) {
> > >           unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
> > >           unsigned maxcpus = qemu_opt_get_number(opts, "maxcpus", 0);
> > >           unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
> > > +        unsigned clusters = qemu_opt_get_number(opts, "clusters", 0);
> > >           unsigned cores = qemu_opt_get_number(opts, "cores", 0);
> > >           unsigned threads = qemu_opt_get_number(opts, "threads", 0);
> > > 
> > > -        /* Default threads to 1 if not provided */
> > > +        /* Default clusters and threads to 1 if not provided */
> > > +        clusters = clusters > 0 ? clusters : 1;
> > >           threads = threads > 0 ? threads : 1;
> > > 
> > >           if (cpus == 0 && maxcpus == 0) {
> > > @@ -2676,13 +2679,13 @@ static void virt_smp_parse(MachineState *ms, QemuOpts
> > > *opts)
> > >               cores = 1;
> > >               if (cpus == 0) {
> > >                   sockets = 1;
> > > -                cpus = sockets * cores * threads;
> > > +                cpus = sockets * clusters * cores * threads;
> > >               } else {
> > >                   maxcpus = maxcpus > 0 ? maxcpus : cpus;
> > > -                sockets = maxcpus / (cores * threads);
> > > +                sockets = maxcpus / (clusters * cores * threads);
> > >               }
> > >           } else if (sockets > 0 && cores > 0) {
> > > -            cpus = cpus > 0 ? cpus : sockets * cores * threads;
> > > +            cpus = cpus > 0 ? cpus : sockets * clusters * cores * threads;
> > >               maxcpus = maxcpus > 0 ? maxcpus : cpus;
> > >           } else {
> > >               error_report("sockets and cores must be both provided "
> > > @@ -2695,25 +2698,26 @@ static void virt_smp_parse(MachineState *ms, QemuOpts
> > > *opts)
> > >               exit(1);
> > >           }
> > > 
> > > -        if (sockets * cores * threads < cpus) {
> > > +        if (sockets * clusters * cores * threads < cpus) {
> > >               error_report("cpu topology: "
> > > -                         "sockets (%u) * cores (%u) * threads (%u) < "
> > > -                         "smp_cpus (%u)",
> > > -                         sockets, cores, threads, cpus);
> > > +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> > > +                         "threads (%u) < smp_cpus (%u)",
> > > +                         sockets, clusters, cores, threads, cpus);
> > >               exit(1);
> > >           }
> > > 
> > > -        if (sockets * cores * threads != maxcpus) {
> > > +        if (sockets * clusters * cores * threads != maxcpus) {
> > >               error_report("cpu topology: "
> > > -                         "sockets (%u) * cores (%u) * threads (%u) "
> > > -                         "!= maxcpus (%u)",
> > > -                         sockets, cores, threads, maxcpus);
> > > +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> > > +                         "threads (%u) != maxcpus (%u)",
> > > +                         sockets, clusters, cores, threads, maxcpus);
> > >               exit(1);
> > >           }
> > > 
> > >           ms->smp.cpus = cpus;
> > >           ms->smp.max_cpus = maxcpus;
> > >           ms->smp.sockets = sockets;
> > > +        vms->smp_clusters = clusters;
> > 
> > This variable naming *smp_clusters* looks out-of-sorts. I thought a similar
> > variable *smp_cpus* was destined to be removed for the reason given in below
> > link - a patch by Andrew Jones?
> > 
> > Link: https://lists.gnu.org/archive/html/qemu-arm/2020-12/msg00418.html
> > 
> > Am I missing anything here?
> The smp_clusters is added in VirtMachineState and nowhere else because
> it's currently only used for ARM. But I think maybe I should also move it to
> CpuTopology structure like [1] is doing to move dies to CpuTopology.

Yes, please do that.

Thanks,
drew

> 
> Move clusters to CpuTopology won't affect other architectures that don't
> support it yet, and will also make it easy if they want to in the future.
> 
> [1] From Paolo:
> https://patchwork.kernel.org/project/qemu-devel/patch/20210513162901.1310239-10-pbonzini@redhat.com/
> 
> Thanks,
> Yanan
> > Salil.
> > 
> > >           ms->smp.cores = cores;
> > >           ms->smp.threads = threads;
> > >       }
> > .
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-05-18  8:22 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-16 10:32 [RFC PATCH v3 0/4] hw/arm/virt: Introduce cluster cpu topology support Yanan Wang
2021-05-16 10:32 ` [RFC PATCH v3 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu Yanan Wang
2021-05-17  9:07   ` Andrew Jones
2021-05-17 15:07     ` wangyanan (Y)
2021-05-16 10:32 ` [RFC PATCH v3 2/4] hw/arm/virt: Add cluster level to device tree Yanan Wang
2021-05-16 10:32 ` [RFC PATCH v3 3/4] hw/arm/virt-acpi-build: Add cluster level to PPTT table Yanan Wang
2021-05-16 10:32 ` [RFC PATCH v3 4/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse Yanan Wang
2021-05-17  9:12   ` Andrew Jones
2021-05-17 15:10     ` wangyanan (Y)
2021-05-17 15:17   ` Salil Mehta
2021-05-18  3:48     ` wangyanan (Y)
2021-05-18  6:52       ` Salil Mehta
2021-05-18  6:52         ` Salil Mehta
2021-05-18  8:19       ` Andrew Jones

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.