qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/4] hw/arm/virt: Introduce cluster cpu topology support
@ 2021-04-13  8:31 Yanan Wang
  2021-04-13  8:31 ` [RFC PATCH v2 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu Yanan Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Yanan Wang @ 2021-04-13  8:31 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Michael S . Tsirkin,
	wanghaibin.wang, yangyicong, Yanan Wang, Shannon Zhao,
	Alistair Francis, prime.zeng, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li, David Gibson

Hi,

This series is a new version of [0] posted to introduce the cluster cpu
topology support for ARM platform, besides now existing sockets, cores,
and threads. And the code has been rewriten based on patch series [1].
[0] https://patchwork.kernel.org/project/qemu-devel/cover/20210331095343.12172-1-wangyanan55@huawei.com/
[1] https://patchwork.kernel.org/project/qemu-devel/cover/20210413080745.33004-1-wangyanan55@huawei.com/

Changelogs:
v1->v2:
- Only focus on cluster support for ARM platform
- Rebase the code on patch series [1]

Description:
A cluster means a group of cores that share some resources (e.g. cache)
among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
share L3 cache data while cores within each cluster share the L2 cache.

The cache affinity of cluster has been proved to improve the Linux kernel
scheduling performance and a patchset [2] has already been posted, where
a general sched_domain for clusters was added and a cluster level was
added in the arch-neutral cpu topology struct like below.
struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};

Also Kernel Doc [3]: Documentation/devicetree/bindings/cpu/cpu-topology.txt
defines a four-level CPU topology hierarchy like socket/cluster/core/thread.
According to the context, a socket node's child nodes must be one or more
cluster nodes and a cluster node's child nodes must be one or more cluster
nodes/one or more core nodes.

So let's add the -smp, clusters=* command line support for ARM cpu, so that
future guest os could make use of cluster cpu topology for better scheduling
performance. For ARM machines, a four-level cpu hierarchy can be defined and
it will be sockets/clusters/cores/threads.
[2] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210319041618.14316-1-song.bao.hua@hisilicon.com/
[3] https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/cpu/cpu-topology.txt

Test results:
After applying this patch series, launch a guest with virt-6.0 and cpu
topology configured with: -smp
96,sockets=2,clusters=6,cores=4,threads=2,
VM's cpu topology description shows as below.
lscpu:
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        1
Vendor ID:           0x48
Model:               0
Stepping:            0x1
BogoMIPS:            200.00
NUMA node0 CPU(s):   0-95

Topology information of clusters can also be got:
cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list: 0-7
cat /sys/devices/system/cpu/cpu0/topology/cluster_id: 56

cat /sys/devices/system/cpu/cpu8/topology/cluster_cpus_list: 8-15
cat /sys/devices/system/cpu/cpu8/topology/cluster_id: 316
...
cat /sys/devices/system/cpu/cpu95/topology/cluster_cpus_list: 88-95
cat /sys/devices/system/cpu/cpu95/topology/cluster_id: 2936

Yanan Wang (4):
  vl.c: Add -smp, clusters=* command line support for ARM cpu
  hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  hw/arm/virt-acpi-build: Add cluster level for PPTT table
  hw/arm/virt: Add cluster level for device tree

 hw/arm/virt-acpi-build.c | 55 ++++++++++++++++++++++++----------------
 hw/arm/virt.c            | 44 +++++++++++++++++++-------------
 include/hw/arm/virt.h    |  1 +
 qemu-options.hx          | 26 +++++++++++--------
 softmmu/vl.c             |  3 +++
 5 files changed, 78 insertions(+), 51 deletions(-)

-- 
2.19.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH v2 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu
  2021-04-13  8:31 [RFC PATCH v2 0/4] hw/arm/virt: Introduce cluster cpu topology support Yanan Wang
@ 2021-04-13  8:31 ` Yanan Wang
  2021-04-28 10:23   ` Andrew Jones
  2021-04-13  8:31 ` [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse Yanan Wang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Yanan Wang @ 2021-04-13  8:31 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Michael S . Tsirkin,
	wanghaibin.wang, yangyicong, Yanan Wang, Shannon Zhao,
	Alistair Francis, prime.zeng, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li, David Gibson

A cluster means a group of cores that share some resources (e.g. cache)
among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
share L3 cache data while cores within each cluster share the L2 cache.

The cache affinity of cluster has been proved to improve the Linux kernel
scheduling performance and a patchset has been posted, in which a general
sched_domain for clusters was added and a cluster level was added in the
arch-neutral cpu topology struct like below.
struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
}

Also the Kernel Doc: Documentation/devicetree/bindings/cpu/cpu-topology.txt
defines a four-level CPU topology hierarchy like socket/cluster/core/thread.
According to the context, a socket node's child nodes must be one or more
cluster nodes and a cluster node's child nodes must be one or more cluster
nodes/one or more core nodes.

So let's add the -smp, clusters=* command line support for ARM cpu, so that
future guest os could make use of cluster cpu topology for better scheduling
performance. For ARM machines, a four-level cpu hierarchy can be defined and
it will be sockets/clusters/cores/threads. Because we only support clusters
for ARM cpu currently, a new member "unsigned smp_clusters" is added to the
VirtMachineState structure.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 include/hw/arm/virt.h |  1 +
 qemu-options.hx       | 26 +++++++++++++++-----------
 softmmu/vl.c          |  3 +++
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 4a4b98e4a7..5d5d156924 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -155,6 +155,7 @@ struct VirtMachineState {
     char *pciehb_nodename;
     const int *irqmap;
     int fdt_size;
+    unsigned smp_clusters;
     uint32_t clock_phandle;
     uint32_t gic_phandle;
     uint32_t msi_phandle;
diff --git a/qemu-options.hx b/qemu-options.hx
index fd21002bd6..65343ea23c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -184,25 +184,29 @@ SRST
 ERST
 
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
-    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
+    "-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
     "                set the number of CPUs to 'n' [default=1]\n"
     "                maxcpus= maximum number of total cpus, including\n"
     "                offline CPUs for hotplug, etc\n"
-    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
+    "                cores= number of CPU cores on one socket\n"
+    "                (it's on one die for PC, and on one cluster for ARM)\n"
     "                threads= number of threads on one CPU core\n"
+    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
     "                dies= number of CPU dies on one socket (for PC only)\n"
     "                sockets= number of discrete sockets in the system\n",
         QEMU_ARCH_ALL)
 SRST
-``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
-    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
-    are supported. On Sparc32 target, Linux limits the number of usable
-    CPUs to 4. For the PC target, the number of cores per die, the
-    number of threads per cores, the number of dies per packages and the
-    total number of sockets can be specified. Missing values will be
-    computed. If any on the three values is given, the total number of
-    CPUs n can be omitted. maxcpus specifies the maximum number of
-    hotpluggable CPUs.
+``-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]``
+    Simulate an SMP system with n CPUs. On the PC target, up to 255
+    CPUs are supported. On the Sparc32 target, Linux limits the number
+    of usable CPUs to 4. For the PC target, the number of threads per
+    core, the number of cores per die, the number of dies per package
+    and the total number of sockets can be specified. For the ARM target,
+    the number of threads per core, the number of cores per cluster, the
+    number of clusters per socket and the total number of sockets can be
+    specified. And missing values will be computed. If any of the five
+    values is given, the total number of CPUs n can be omitted. Maxcpus
+    specifies the maximum number of hotpluggable CPUs.
 ERST
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
diff --git a/softmmu/vl.c b/softmmu/vl.c
index aadb526138..46f5b6b575 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -720,6 +720,9 @@ static QemuOptsList qemu_smp_opts = {
         }, {
             .name = "dies",
             .type = QEMU_OPT_NUMBER,
+        }, {
+            .name = "clusters",
+            .type = QEMU_OPT_NUMBER,
         }, {
             .name = "cores",
             .type = QEMU_OPT_NUMBER,
-- 
2.19.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-13  8:31 [RFC PATCH v2 0/4] hw/arm/virt: Introduce cluster cpu topology support Yanan Wang
  2021-04-13  8:31 ` [RFC PATCH v2 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu Yanan Wang
@ 2021-04-13  8:31 ` Yanan Wang
  2021-04-28 10:31   ` Andrew Jones
  2021-04-13  8:31 ` [RFC PATCH v2 3/4] hw/arm/virt-acpi-build: Add cluster level for PPTT table Yanan Wang
  2021-04-13  8:31 ` [RFC PATCH v2 4/4] hw/arm/virt: Add cluster level for device tree Yanan Wang
  3 siblings, 1 reply; 22+ messages in thread
From: Yanan Wang @ 2021-04-13  8:31 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Michael S . Tsirkin,
	wanghaibin.wang, yangyicong, Yanan Wang, Shannon Zhao,
	Alistair Francis, prime.zeng, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li, David Gibson

There is a separate function virt_smp_parse() in hw/virt/arm.c used
to parse cpu topology for the ARM machines. So add parsing of -smp
cluster parameter in it, then total number of logical cpus will be
calculated like: max_cpus = sockets * clusters * cores * threads.

In virt_smp_parse(), the computing logic of missing values prefers
cores over sockets over threads. And for compatibility, the value
of clusters will be set as default 1 if not explicitly specified.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/arm/virt.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 57ef961cb5..51797628db 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2639,35 +2639,38 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
     if (opts) {
         unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
         unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
+        unsigned clusters = qemu_opt_get_number(opts, "clusters", 1);
         unsigned cores = qemu_opt_get_number(opts, "cores", 0);
         unsigned threads = qemu_opt_get_number(opts, "threads", 0);
+        VirtMachineState *vms = VIRT_MACHINE(ms);
 
         /*
-         * Compute missing values; prefer cores over sockets and
-         * sockets over threads.
+         * Compute missing values; prefer cores over sockets and sockets
+         * over threads. For compatibility, value of clusters will have
+         * been set as default 1 if not explicitly specified.
          */
         if (cpus == 0 || cores == 0) {
             sockets = sockets > 0 ? sockets : 1;
             threads = threads > 0 ? threads : 1;
             if (cpus == 0) {
                 cores = cores > 0 ? cores : 1;
-                cpus = cores * threads * sockets;
+                cpus = sockets * clusters * cores * threads;
             } else {
                 ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
-                cores = ms->smp.max_cpus / (sockets * threads);
+                cores = ms->smp.max_cpus / (sockets * clusters * threads);
             }
         } else if (sockets == 0) {
             threads = threads > 0 ? threads : 1;
-            sockets = cpus / (cores * threads);
+            sockets = cpus / (clusters * cores * threads);
             sockets = sockets > 0 ? sockets : 1;
         } else if (threads == 0) {
-            threads = cpus / (cores * sockets);
+            threads = cpus / (sockets * clusters * cores);
             threads = threads > 0 ? threads : 1;
-        } else if (sockets * cores * threads < cpus) {
+        } else if (sockets * clusters * cores * threads < cpus) {
             error_report("cpu topology: "
-                         "sockets (%u) * cores (%u) * threads (%u) < "
-                         "smp_cpus (%u)",
-                         sockets, cores, threads, cpus);
+                         "sockets (%u) * clusters (%u) * cores (%u) * "
+                         "threads (%u) < smp_cpus (%u)",
+                         sockets, clusters, cores, threads, cpus);
             exit(1);
         }
 
@@ -2678,11 +2681,11 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
             exit(1);
         }
 
-        if (sockets * cores * threads != ms->smp.max_cpus) {
+        if (sockets * clusters * cores * threads != ms->smp.max_cpus) {
             error_report("cpu topology: "
-                         "sockets (%u) * cores (%u) * threads (%u)"
-                         "!= maxcpus (%u)",
-                         sockets, cores, threads,
+                         "sockets (%u) * clusters (%u) * cores (%u) * "
+                         "threads (%u) != maxcpus (%u)",
+                         sockets, clusters, cores, threads,
                          ms->smp.max_cpus);
             exit(1);
         }
@@ -2691,6 +2694,7 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
         ms->smp.cores = cores;
         ms->smp.threads = threads;
         ms->smp.sockets = sockets;
+        vms->smp_clusters = clusters;
     }
 
     if (ms->smp.cpus > 1) {
-- 
2.19.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH v2 3/4] hw/arm/virt-acpi-build: Add cluster level for PPTT table
  2021-04-13  8:31 [RFC PATCH v2 0/4] hw/arm/virt: Introduce cluster cpu topology support Yanan Wang
  2021-04-13  8:31 ` [RFC PATCH v2 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu Yanan Wang
  2021-04-13  8:31 ` [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse Yanan Wang
@ 2021-04-13  8:31 ` Yanan Wang
  2021-04-28 10:41   ` Andrew Jones
  2021-04-13  8:31 ` [RFC PATCH v2 4/4] hw/arm/virt: Add cluster level for device tree Yanan Wang
  3 siblings, 1 reply; 22+ messages in thread
From: Yanan Wang @ 2021-04-13  8:31 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Michael S . Tsirkin,
	wanghaibin.wang, yangyicong, Yanan Wang, Shannon Zhao,
	Alistair Francis, prime.zeng, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li, David Gibson

Add a Processor Hierarchy Node of cluster level between core level
and package level for ARM PPTT table.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/arm/virt-acpi-build.c | 55 ++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 03fd812d5a..2b745711d1 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -443,6 +443,7 @@ build_pptt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     int pptt_start = table_data->len;
     int uid = 0, cpus = 0, socket = 0;
     MachineState *ms = MACHINE(vms);
+    unsigned int smp_clusters = vms->smp_clusters;
     unsigned int smp_cores = ms->smp.cores;
     unsigned int smp_threads = ms->smp.threads;
 
@@ -450,42 +451,52 @@ build_pptt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 
     for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
         uint32_t socket_offset = table_data->len - pptt_start;
-        int core;
+        int cluster;
 
         build_processor_hierarchy_node(
             table_data, 1, /* Physical package */
             0, socket, /* No parent */
             NULL, 0);  /* No private resources */
 
-        for (core = 0; core < smp_cores; core++) {
-            uint32_t core_offset = table_data->len - pptt_start;
-            int thread;
-
-            if (smp_threads <= 1) {
-                build_processor_hierarchy_node(
-                    table_data,
-                    (1 << 1) | /* ACPI Processor ID valid */
-                    (1 << 3),  /* ACPI 6.3 - Node is a Leaf */
-                    socket_offset, uid++, /* Parent is a Socket */
-                    NULL, 0);  /* No private resources */
-            } else {
-                build_processor_hierarchy_node(
-                    table_data, 0,
-                    socket_offset, core, /* Parent is a Socket */
-                    NULL, 0); /* No private resources */
-
-                for (thread = 0; thread < smp_threads; thread++) {
+        for (cluster = 0; cluster < smp_clusters; cluster++) {
+            uint32_t cluster_offset = table_data->len - pptt_start;
+            int core;
+
+            build_processor_hierarchy_node(
+                table_data, 0,
+                socket_offset, cluster, /* Parent is a Socket */
+                NULL, 0); /* No private resources */
+
+            for (core = 0; core < smp_cores; core++) {
+                uint32_t core_offset = table_data->len - pptt_start;
+                int thread;
+
+                if (smp_threads <= 1) {
                     build_processor_hierarchy_node(
                         table_data,
                         (1 << 1) | /* ACPI Processor ID valid */
-                        (1 << 2) | /* ACPI 6.3 - Processor is a Thread */
                         (1 << 3),  /* ACPI 6.3 - Node is a Leaf */
-                        core_offset, uid++, /* Parent is a Core */
+                        cluster_offset, uid++, /* Parent is a Cluster */
                         NULL, 0);  /* No private resources */
+                } else {
+                    build_processor_hierarchy_node(
+                        table_data, 0,
+                        cluster_offset, core, /* Parent is a Cluster */
+                        NULL, 0); /* No private resources */
+
+                    for (thread = 0; thread < smp_threads; thread++) {
+                        build_processor_hierarchy_node(
+                            table_data,
+                            (1 << 1) | /* ACPI Processor ID valid */
+                            (1 << 2) | /* ACPI 6.3 - Processor is a Thread */
+                            (1 << 3),  /* ACPI 6.3 - Node is a Leaf */
+                            core_offset, uid++, /* Parent is a Core */
+                            NULL, 0);  /* No private resources */
+                    }
                 }
             }
         }
-        cpus += smp_cores * smp_threads;
+        cpus += smp_clusters * smp_cores * smp_threads;
     }
 
     build_header(linker, table_data,
-- 
2.19.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH v2 4/4] hw/arm/virt: Add cluster level for device tree
  2021-04-13  8:31 [RFC PATCH v2 0/4] hw/arm/virt: Introduce cluster cpu topology support Yanan Wang
                   ` (2 preceding siblings ...)
  2021-04-13  8:31 ` [RFC PATCH v2 3/4] hw/arm/virt-acpi-build: Add cluster level for PPTT table Yanan Wang
@ 2021-04-13  8:31 ` Yanan Wang
  2021-04-28 10:46   ` Andrew Jones
  3 siblings, 1 reply; 22+ messages in thread
From: Yanan Wang @ 2021-04-13  8:31 UTC (permalink / raw)
  To: qemu-devel, qemu-arm
  Cc: Barry Song, Peter Maydell, Andrew Jones, Michael S . Tsirkin,
	wanghaibin.wang, yangyicong, Yanan Wang, Shannon Zhao,
	Alistair Francis, prime.zeng, Paolo Bonzini, yuzenghui,
	Igor Mammedov, zhukeqian1, Jiajie Li, David Gibson

Add a cluster level between core level and package level for
ARM device tree.

Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
---
 hw/arm/virt.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 51797628db..4468a4734b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -434,14 +434,18 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
 
             if (ms->smp.threads > 1) {
                 map_path = g_strdup_printf(
-                    "/cpus/cpu-map/%s%d/%s%d/%s%d",
-                    "socket", cpu / (ms->smp.cores * ms->smp.threads),
+                    "/cpus/cpu-map/%s%d/%s%d/%s%d/%s%d",
+                    "socket", cpu / (vms->smp_clusters * ms->smp.cores *
+                    ms->smp.threads),
+                    "cluster", (cpu / (ms->smp.cores * ms->smp.threads)) %
+                    vms->smp_clusters,
                     "core", (cpu / ms->smp.threads) % ms->smp.cores,
                     "thread", cpu % ms->smp.threads);
             } else {
                 map_path = g_strdup_printf(
-                    "/cpus/cpu-map/%s%d/%s%d",
-                    "socket", cpu / ms->smp.cores,
+                    "/cpus/cpu-map/%s%d/%s%d/%s%d",
+                    "socket", cpu / (vms->smp_clusters * ms->smp.cores),
+                    "cluster", (cpu / ms->smp.cores) % vms->smp_clusters,
                     "core", cpu % ms->smp.cores);
             }
             qemu_fdt_add_path(ms->fdt, map_path);
-- 
2.19.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu
  2021-04-13  8:31 ` [RFC PATCH v2 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu Yanan Wang
@ 2021-04-28 10:23   ` Andrew Jones
  2021-04-29  1:22     ` wangyanan (Y)
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Jones @ 2021-04-28 10:23 UTC (permalink / raw)
  To: Yanan Wang
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

On Tue, Apr 13, 2021 at 04:31:44PM +0800, Yanan Wang wrote:
> A cluster means a group of cores that share some resources (e.g. cache)
> among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
> 6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
> share L3 cache data while cores within each cluster share the L2 cache.
> 
> The cache affinity of cluster has been proved to improve the Linux kernel
> scheduling performance and a patchset has been posted, in which a general
> sched_domain for clusters was added and a cluster level was added in the
> arch-neutral cpu topology struct like below.
> struct cpu_topology {
>     int thread_id;
>     int core_id;
>     int cluster_id;
>     int package_id;
>     int llc_id;
>     cpumask_t thread_sibling;
>     cpumask_t core_sibling;
>     cpumask_t cluster_sibling;
>     cpumask_t llc_sibling;
> }
> 
> Also the Kernel Doc: Documentation/devicetree/bindings/cpu/cpu-topology.txt
> defines a four-level CPU topology hierarchy like socket/cluster/core/thread.
> According to the context, a socket node's child nodes must be one or more
> cluster nodes and a cluster node's child nodes must be one or more cluster
> nodes/one or more core nodes.
> 
> So let's add the -smp, clusters=* command line support for ARM cpu, so that
> future guest os could make use of cluster cpu topology for better scheduling
> performance. For ARM machines, a four-level cpu hierarchy can be defined and
> it will be sockets/clusters/cores/threads. Because we only support clusters
> for ARM cpu currently, a new member "unsigned smp_clusters" is added to the
> VirtMachineState structure.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  include/hw/arm/virt.h |  1 +
>  qemu-options.hx       | 26 +++++++++++++++-----------
>  softmmu/vl.c          |  3 +++
>  3 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
> index 4a4b98e4a7..5d5d156924 100644
> --- a/include/hw/arm/virt.h
> +++ b/include/hw/arm/virt.h
> @@ -155,6 +155,7 @@ struct VirtMachineState {
>      char *pciehb_nodename;
>      const int *irqmap;
>      int fdt_size;
> +    unsigned smp_clusters;
>      uint32_t clock_phandle;
>      uint32_t gic_phandle;
>      uint32_t msi_phandle;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index fd21002bd6..65343ea23c 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -184,25 +184,29 @@ SRST
>  ERST
>  
>  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
> -    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
> +    "-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"

Please put clusters directly in front of dies in the above list, like it
is in the list description below

>      "                set the number of CPUs to 'n' [default=1]\n"
>      "                maxcpus= maximum number of total cpus, including\n"
>      "                offline CPUs for hotplug, etc\n"
> -    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
> +    "                cores= number of CPU cores on one socket\n"
> +    "                (it's on one die for PC, and on one cluster for ARM)\n"
>      "                threads= number of threads on one CPU core\n"
> +    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
>      "                dies= number of CPU dies on one socket (for PC only)\n"
>      "                sockets= number of discrete sockets in the system\n",
>          QEMU_ARCH_ALL)
>  SRST
> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
> -    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
> -    are supported. On Sparc32 target, Linux limits the number of usable
> -    CPUs to 4. For the PC target, the number of cores per die, the
> -    number of threads per cores, the number of dies per packages and the
> -    total number of sockets can be specified. Missing values will be
> -    computed. If any on the three values is given, the total number of
> -    CPUs n can be omitted. maxcpus specifies the maximum number of
> -    hotpluggable CPUs.
> +``-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]``

Also move clusters in this list over in front of dies to match the
suggested change above.

> +    Simulate an SMP system with n CPUs. On the PC target, up to 255
> +    CPUs are supported. On the Sparc32 target, Linux limits the number
> +    of usable CPUs to 4. For the PC target, the number of threads per
> +    core, the number of cores per die, the number of dies per package
> +    and the total number of sockets can be specified. For the ARM target,
> +    the number of threads per core, the number of cores per cluster, the
> +    number of clusters per socket and the total number of sockets can be
> +    specified. And missing values will be computed. If any of the five
> +    values is given, the total number of CPUs n can be omitted. Maxcpus
> +    specifies the maximum number of hotpluggable CPUs.
>  ERST
>  
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index aadb526138..46f5b6b575 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -720,6 +720,9 @@ static QemuOptsList qemu_smp_opts = {
>          }, {
>              .name = "dies",
>              .type = QEMU_OPT_NUMBER,
> +        }, {
> +            .name = "clusters",
> +            .type = QEMU_OPT_NUMBER,
>          }, {
>              .name = "cores",
>              .type = QEMU_OPT_NUMBER,
> -- 
> 2.19.1
>

Thanks,
drew



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-13  8:31 ` [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse Yanan Wang
@ 2021-04-28 10:31   ` Andrew Jones
  2021-04-29  2:14     ` wangyanan (Y)
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Jones @ 2021-04-28 10:31 UTC (permalink / raw)
  To: Yanan Wang
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

On Tue, Apr 13, 2021 at 04:31:45PM +0800, Yanan Wang wrote:
> There is a separate function virt_smp_parse() in hw/virt/arm.c used
> to parse cpu topology for the ARM machines. So add parsing of -smp
> cluster parameter in it, then total number of logical cpus will be
> calculated like: max_cpus = sockets * clusters * cores * threads.
> 
> In virt_smp_parse(), the computing logic of missing values prefers
> cores over sockets over threads. And for compatibility, the value
> of clusters will be set as default 1 if not explicitly specified.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  hw/arm/virt.c | 32 ++++++++++++++++++--------------
>  1 file changed, 18 insertions(+), 14 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 57ef961cb5..51797628db 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -2639,35 +2639,38 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>      if (opts) {
>          unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
>          unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
> +        unsigned clusters = qemu_opt_get_number(opts, "clusters", 1);
>          unsigned cores = qemu_opt_get_number(opts, "cores", 0);
>          unsigned threads = qemu_opt_get_number(opts, "threads", 0);
> +        VirtMachineState *vms = VIRT_MACHINE(ms);
>  
>          /*
> -         * Compute missing values; prefer cores over sockets and
> -         * sockets over threads.
> +         * Compute missing values; prefer cores over sockets and sockets
> +         * over threads. For compatibility, value of clusters will have
> +         * been set as default 1 if not explicitly specified.
>           */
>          if (cpus == 0 || cores == 0) {
>              sockets = sockets > 0 ? sockets : 1;
>              threads = threads > 0 ? threads : 1;
>              if (cpus == 0) {
>                  cores = cores > 0 ? cores : 1;
> -                cpus = cores * threads * sockets;
> +                cpus = sockets * clusters * cores * threads;
>              } else {
>                  ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
> -                cores = ms->smp.max_cpus / (sockets * threads);
> +                cores = ms->smp.max_cpus / (sockets * clusters * threads);
>              }
>          } else if (sockets == 0) {
>              threads = threads > 0 ? threads : 1;
> -            sockets = cpus / (cores * threads);
> +            sockets = cpus / (clusters * cores * threads);
>              sockets = sockets > 0 ? sockets : 1;

If we initialize clusters to zero instead of one and add lines in
'cpus == 0 || cores == 0' and 'sockets == 0' like
'clusters = clusters > 0 ? clusters : 1' as needed, then I think we can
add

 } else if (clusters == 0) {
     threads = threads > 0 ? threads : 1;
     clusters = cpus / (sockets * cores * thread);
     clusters = clusters > 0 ? clusters : 1;
 }

here.

>          } else if (threads == 0) {
> -            threads = cpus / (cores * sockets);
> +            threads = cpus / (sockets * clusters * cores);
>              threads = threads > 0 ? threads : 1;
> -        } else if (sockets * cores * threads < cpus) {
> +        } else if (sockets * clusters * cores * threads < cpus) {
>              error_report("cpu topology: "
> -                         "sockets (%u) * cores (%u) * threads (%u) < "
> -                         "smp_cpus (%u)",
> -                         sockets, cores, threads, cpus);
> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> +                         "threads (%u) < smp_cpus (%u)",
> +                         sockets, clusters, cores, threads, cpus);
>              exit(1);
>          }
>  
> @@ -2678,11 +2681,11 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>              exit(1);
>          }
>  
> -        if (sockets * cores * threads != ms->smp.max_cpus) {
> +        if (sockets * clusters * cores * threads != ms->smp.max_cpus) {
>              error_report("cpu topology: "
> -                         "sockets (%u) * cores (%u) * threads (%u)"
> -                         "!= maxcpus (%u)",
> -                         sockets, cores, threads,
> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
> +                         "threads (%u) != maxcpus (%u)",
> +                         sockets, clusters, cores, threads,
>                           ms->smp.max_cpus);
>              exit(1);
>          }
> @@ -2691,6 +2694,7 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>          ms->smp.cores = cores;
>          ms->smp.threads = threads;
>          ms->smp.sockets = sockets;
> +        vms->smp_clusters = clusters;
>      }
>  
>      if (ms->smp.cpus > 1) {
> -- 
> 2.19.1
>

Thanks,
drew 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 3/4] hw/arm/virt-acpi-build: Add cluster level for PPTT table
  2021-04-13  8:31 ` [RFC PATCH v2 3/4] hw/arm/virt-acpi-build: Add cluster level for PPTT table Yanan Wang
@ 2021-04-28 10:41   ` Andrew Jones
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Jones @ 2021-04-28 10:41 UTC (permalink / raw)
  To: Yanan Wang
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

On Tue, Apr 13, 2021 at 04:31:46PM +0800, Yanan Wang wrote:
> Add a Processor Hierarchy Node of cluster level between core level
> and package level for ARM PPTT table.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  hw/arm/virt-acpi-build.c | 55 ++++++++++++++++++++++++----------------
>  1 file changed, 33 insertions(+), 22 deletions(-)
>

Reviewed-by: Andrew Jones <drjones@redhat.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 4/4] hw/arm/virt: Add cluster level for device tree
  2021-04-13  8:31 ` [RFC PATCH v2 4/4] hw/arm/virt: Add cluster level for device tree Yanan Wang
@ 2021-04-28 10:46   ` Andrew Jones
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Jones @ 2021-04-28 10:46 UTC (permalink / raw)
  To: Yanan Wang
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

On Tue, Apr 13, 2021 at 04:31:47PM +0800, Yanan Wang wrote:
> Add a cluster level between core level and package level for
> ARM device tree.
> 
> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
> ---
>  hw/arm/virt.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 51797628db..4468a4734b 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -434,14 +434,18 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
>  
>              if (ms->smp.threads > 1) {
>                  map_path = g_strdup_printf(
> -                    "/cpus/cpu-map/%s%d/%s%d/%s%d",
> -                    "socket", cpu / (ms->smp.cores * ms->smp.threads),
> +                    "/cpus/cpu-map/%s%d/%s%d/%s%d/%s%d",
> +                    "socket", cpu / (vms->smp_clusters * ms->smp.cores *
> +                    ms->smp.threads),
> +                    "cluster", (cpu / (ms->smp.cores * ms->smp.threads)) %
> +                    vms->smp_clusters,
>                      "core", (cpu / ms->smp.threads) % ms->smp.cores,
>                      "thread", cpu % ms->smp.threads);
>              } else {
>                  map_path = g_strdup_printf(
> -                    "/cpus/cpu-map/%s%d/%s%d",
> -                    "socket", cpu / ms->smp.cores,
> +                    "/cpus/cpu-map/%s%d/%s%d/%s%d",
> +                    "socket", cpu / (vms->smp_clusters * ms->smp.cores),
> +                    "cluster", (cpu / ms->smp.cores) % vms->smp_clusters,
>                      "core", cpu % ms->smp.cores);
>              }
>              qemu_fdt_add_path(ms->fdt, map_path);
> -- 
> 2.19.1
>

Reviewed-by: Andrew Jones <drjones@redhat.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu
  2021-04-28 10:23   ` Andrew Jones
@ 2021-04-29  1:22     ` wangyanan (Y)
  0 siblings, 0 replies; 22+ messages in thread
From: wangyanan (Y) @ 2021-04-29  1:22 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson


On 2021/4/28 18:23, Andrew Jones wrote:
> On Tue, Apr 13, 2021 at 04:31:44PM +0800, Yanan Wang wrote:
>> A cluster means a group of cores that share some resources (e.g. cache)
>> among them under the LLC. For example, ARM64 server chip Kunpeng 920 has
>> 6 or 8 clusters in each NUMA, and each cluster has 4 cores. All clusters
>> share L3 cache data while cores within each cluster share the L2 cache.
>>
>> The cache affinity of cluster has been proved to improve the Linux kernel
>> scheduling performance and a patchset has been posted, in which a general
>> sched_domain for clusters was added and a cluster level was added in the
>> arch-neutral cpu topology struct like below.
>> struct cpu_topology {
>>      int thread_id;
>>      int core_id;
>>      int cluster_id;
>>      int package_id;
>>      int llc_id;
>>      cpumask_t thread_sibling;
>>      cpumask_t core_sibling;
>>      cpumask_t cluster_sibling;
>>      cpumask_t llc_sibling;
>> }
>>
>> Also the Kernel Doc: Documentation/devicetree/bindings/cpu/cpu-topology.txt
>> defines a four-level CPU topology hierarchy like socket/cluster/core/thread.
>> According to the context, a socket node's child nodes must be one or more
>> cluster nodes and a cluster node's child nodes must be one or more cluster
>> nodes/one or more core nodes.
>>
>> So let's add the -smp, clusters=* command line support for ARM cpu, so that
>> future guest os could make use of cluster cpu topology for better scheduling
>> performance. For ARM machines, a four-level cpu hierarchy can be defined and
>> it will be sockets/clusters/cores/threads. Because we only support clusters
>> for ARM cpu currently, a new member "unsigned smp_clusters" is added to the
>> VirtMachineState structure.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   include/hw/arm/virt.h |  1 +
>>   qemu-options.hx       | 26 +++++++++++++++-----------
>>   softmmu/vl.c          |  3 +++
>>   3 files changed, 19 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
>> index 4a4b98e4a7..5d5d156924 100644
>> --- a/include/hw/arm/virt.h
>> +++ b/include/hw/arm/virt.h
>> @@ -155,6 +155,7 @@ struct VirtMachineState {
>>       char *pciehb_nodename;
>>       const int *irqmap;
>>       int fdt_size;
>> +    unsigned smp_clusters;
>>       uint32_t clock_phandle;
>>       uint32_t gic_phandle;
>>       uint32_t msi_phandle;
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index fd21002bd6..65343ea23c 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -184,25 +184,29 @@ SRST
>>   ERST
>>   
>>   DEF("smp", HAS_ARG, QEMU_OPTION_smp,
>> -    "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
>> +    "-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]\n"
> Please put clusters directly in front of dies in the above list, like it
> is in the list description below
>
>>       "                set the number of CPUs to 'n' [default=1]\n"
>>       "                maxcpus= maximum number of total cpus, including\n"
>>       "                offline CPUs for hotplug, etc\n"
>> -    "                cores= number of CPU cores on one socket (for PC, it's on one die)\n"
>> +    "                cores= number of CPU cores on one socket\n"
>> +    "                (it's on one die for PC, and on one cluster for ARM)\n"
>>       "                threads= number of threads on one CPU core\n"
>> +    "                clusters= number of CPU clusters on one socket (for ARM only)\n"
>>       "                dies= number of CPU dies on one socket (for PC only)\n"
>>       "                sockets= number of discrete sockets in the system\n",
>>           QEMU_ARCH_ALL)
>>   SRST
>> -``-smp [cpus=]n[,cores=cores][,threads=threads][,dies=dies][,sockets=sockets][,maxcpus=maxcpus]``
>> -    Simulate an SMP system with n CPUs. On the PC target, up to 255 CPUs
>> -    are supported. On Sparc32 target, Linux limits the number of usable
>> -    CPUs to 4. For the PC target, the number of cores per die, the
>> -    number of threads per cores, the number of dies per packages and the
>> -    total number of sockets can be specified. Missing values will be
>> -    computed. If any on the three values is given, the total number of
>> -    CPUs n can be omitted. maxcpus specifies the maximum number of
>> -    hotpluggable CPUs.
>> +``-smp [cpus=]n[,maxcpus=cpus][,clusters=clusters][,cores=cores][,threads=threads][,dies=dies][,sockets=sockets]``
> Also move clusters in this list over in front of dies to match the
> suggested change above.
Thanks, I will change the place.

Thanks,
Yanan
>> +    Simulate an SMP system with n CPUs. On the PC target, up to 255
>> +    CPUs are supported. On the Sparc32 target, Linux limits the number
>> +    of usable CPUs to 4. For the PC target, the number of threads per
>> +    core, the number of cores per die, the number of dies per package
>> +    and the total number of sockets can be specified. For the ARM target,
>> +    the number of threads per core, the number of cores per cluster, the
>> +    number of clusters per socket and the total number of sockets can be
>> +    specified. And missing values will be computed. If any of the five
>> +    values is given, the total number of CPUs n can be omitted. Maxcpus
>> +    specifies the maximum number of hotpluggable CPUs.
>>   ERST
>>   
>>   DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> index aadb526138..46f5b6b575 100644
>> --- a/softmmu/vl.c
>> +++ b/softmmu/vl.c
>> @@ -720,6 +720,9 @@ static QemuOptsList qemu_smp_opts = {
>>           }, {
>>               .name = "dies",
>>               .type = QEMU_OPT_NUMBER,
>> +        }, {
>> +            .name = "clusters",
>> +            .type = QEMU_OPT_NUMBER,
>>           }, {
>>               .name = "cores",
>>               .type = QEMU_OPT_NUMBER,
>> -- 
>> 2.19.1
>>
> Thanks,
> drew
>
> .


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-28 10:31   ` Andrew Jones
@ 2021-04-29  2:14     ` wangyanan (Y)
  2021-04-29  7:16       ` Andrew Jones
  0 siblings, 1 reply; 22+ messages in thread
From: wangyanan (Y) @ 2021-04-29  2:14 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

Hi Drew,

On 2021/4/28 18:31, Andrew Jones wrote:
> On Tue, Apr 13, 2021 at 04:31:45PM +0800, Yanan Wang wrote:
>> There is a separate function virt_smp_parse() in hw/virt/arm.c used
>> to parse cpu topology for the ARM machines. So add parsing of -smp
>> cluster parameter in it, then total number of logical cpus will be
>> calculated like: max_cpus = sockets * clusters * cores * threads.
>>
>> In virt_smp_parse(), the computing logic of missing values prefers
>> cores over sockets over threads. And for compatibility, the value
>> of clusters will be set as default 1 if not explicitly specified.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
>> ---
>>   hw/arm/virt.c | 32 ++++++++++++++++++--------------
>>   1 file changed, 18 insertions(+), 14 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index 57ef961cb5..51797628db 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -2639,35 +2639,38 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>>       if (opts) {
>>           unsigned cpus = qemu_opt_get_number(opts, "cpus", 0);
>>           unsigned sockets = qemu_opt_get_number(opts, "sockets", 0);
>> +        unsigned clusters = qemu_opt_get_number(opts, "clusters", 1);
>>           unsigned cores = qemu_opt_get_number(opts, "cores", 0);
>>           unsigned threads = qemu_opt_get_number(opts, "threads", 0);
>> +        VirtMachineState *vms = VIRT_MACHINE(ms);
>>   
>>           /*
>> -         * Compute missing values; prefer cores over sockets and
>> -         * sockets over threads.
>> +         * Compute missing values; prefer cores over sockets and sockets
>> +         * over threads. For compatibility, value of clusters will have
>> +         * been set as default 1 if not explicitly specified.
>>            */
>>           if (cpus == 0 || cores == 0) {
>>               sockets = sockets > 0 ? sockets : 1;
>>               threads = threads > 0 ? threads : 1;
>>               if (cpus == 0) {
>>                   cores = cores > 0 ? cores : 1;
>> -                cpus = cores * threads * sockets;
>> +                cpus = sockets * clusters * cores * threads;
>>               } else {
>>                   ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
>> -                cores = ms->smp.max_cpus / (sockets * threads);
>> +                cores = ms->smp.max_cpus / (sockets * clusters * threads);
>>               }
>>           } else if (sockets == 0) {
>>               threads = threads > 0 ? threads : 1;
>> -            sockets = cpus / (cores * threads);
>> +            sockets = cpus / (clusters * cores * threads);
>>               sockets = sockets > 0 ? sockets : 1;
> If we initialize clusters to zero instead of one and add lines in
> 'cpus == 0 || cores == 0' and 'sockets == 0' like
> 'clusters = clusters > 0 ? clusters : 1' as needed, then I think we can
> add
>
>   } else if (clusters == 0) {
>       threads = threads > 0 ? threads : 1;
>       clusters = cpus / (sockets * cores * thread);
>       clusters = clusters > 0 ? clusters : 1;
>   }
>
> here.
I have thought about this kind of format before, but there is a little bit
difference between these two ways. Let's chose the better and more
reasonable one of the two.

Way A currently in this patch:
If value of clusters is not explicitly specified in -smp command line, 
we assume
that users don't want to support clusters, for compatibility we 
initialized the
value to 1. So that with cmdline "-smp cpus=24, sockets=2, cores=6", we will
parse out the topology description like below:
cpus=24, sockets=2, clusters=1, cores=6, threads=2

Way B that you suggested for this patch:
Whether value of clusters is explicitly specified in -smp command line 
or not,
we assume that clusters are supported and calculate the value. So that with
cmdline "-smp cpus=24, sockets=2, cores=6", we will parse out the topology
description like below:
cpus =24, sockets=2, clusters=2, cores=6, threads=1

But I think maybe we should not assume too much about what users think
through the -smp command line. We should just assume that all levels of
cpu topology are supported and calculate them, and users should be more
careful if they want to get the expected results with not so complete 
cmdline.
If I'm right, then Way B should be better. :)

Thanks,
Yanan
>>           } else if (threads == 0) {
>> -            threads = cpus / (cores * sockets);
>> +            threads = cpus / (sockets * clusters * cores);
>>               threads = threads > 0 ? threads : 1;
>> -        } else if (sockets * cores * threads < cpus) {
>> +        } else if (sockets * clusters * cores * threads < cpus) {
>>               error_report("cpu topology: "
>> -                         "sockets (%u) * cores (%u) * threads (%u) < "
>> -                         "smp_cpus (%u)",
>> -                         sockets, cores, threads, cpus);
>> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
>> +                         "threads (%u) < smp_cpus (%u)",
>> +                         sockets, clusters, cores, threads, cpus);
>>               exit(1);
>>           }
>>   
>> @@ -2678,11 +2681,11 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>>               exit(1);
>>           }
>>   
>> -        if (sockets * cores * threads != ms->smp.max_cpus) {
>> +        if (sockets * clusters * cores * threads != ms->smp.max_cpus) {
>>               error_report("cpu topology: "
>> -                         "sockets (%u) * cores (%u) * threads (%u)"
>> -                         "!= maxcpus (%u)",
>> -                         sockets, cores, threads,
>> +                         "sockets (%u) * clusters (%u) * cores (%u) * "
>> +                         "threads (%u) != maxcpus (%u)",
>> +                         sockets, clusters, cores, threads,
>>                            ms->smp.max_cpus);
>>               exit(1);
>>           }
>> @@ -2691,6 +2694,7 @@ static void virt_smp_parse(MachineState *ms, QemuOpts *opts)
>>           ms->smp.cores = cores;
>>           ms->smp.threads = threads;
>>           ms->smp.sockets = sockets;
>> +        vms->smp_clusters = clusters;
>>       }
>>   
>>       if (ms->smp.cpus > 1) {
>> -- 
>> 2.19.1
>>
> Thanks,
> drew
>
> .


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-29  2:14     ` wangyanan (Y)
@ 2021-04-29  7:16       ` Andrew Jones
  2021-04-29  8:56         ` wangyanan (Y)
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Jones @ 2021-04-29  7:16 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

On Thu, Apr 29, 2021 at 10:14:37AM +0800, wangyanan (Y) wrote:
> On 2021/4/28 18:31, Andrew Jones wrote:
> > On Tue, Apr 13, 2021 at 04:31:45PM +0800, Yanan Wang wrote:
> > >           } else if (sockets == 0) {
> > >               threads = threads > 0 ? threads : 1;
> > > -            sockets = cpus / (cores * threads);
> > > +            sockets = cpus / (clusters * cores * threads);
> > >               sockets = sockets > 0 ? sockets : 1;
> > If we initialize clusters to zero instead of one and add lines in
> > 'cpus == 0 || cores == 0' and 'sockets == 0' like
> > 'clusters = clusters > 0 ? clusters : 1' as needed, then I think we can
> > add
> > 
> >   } else if (clusters == 0) {
> >       threads = threads > 0 ? threads : 1;
> >       clusters = cpus / (sockets * cores * thread);
> >       clusters = clusters > 0 ? clusters : 1;
> >   }
> > 
> > here.
> I have thought about this kind of format before, but there is a little bit
> difference between these two ways. Let's chose the better and more
> reasonable one of the two.
> 
> Way A currently in this patch:
> If value of clusters is not explicitly specified in -smp command line, we
> assume
> that users don't want to support clusters, for compatibility we initialized
> the
> value to 1. So that with cmdline "-smp cpus=24, sockets=2, cores=6", we will
> parse out the topology description like below:
> cpus=24, sockets=2, clusters=1, cores=6, threads=2
> 
> Way B that you suggested for this patch:
> Whether value of clusters is explicitly specified in -smp command line or
> not,
> we assume that clusters are supported and calculate the value. So that with
> cmdline "-smp cpus=24, sockets=2, cores=6", we will parse out the topology
> description like below:
> cpus =24, sockets=2, clusters=2, cores=6, threads=1
> 
> But I think maybe we should not assume too much about what users think
> through the -smp command line. We should just assume that all levels of
> cpu topology are supported and calculate them, and users should be more
> careful if they want to get the expected results with not so complete
> cmdline.
> If I'm right, then Way B should be better. :)
>

Hi Yanan,

We're already assuming the user wants to describe clusters to the guest
because we require at least one per socket. If we want the user to have a
choice between using clusters or not, then I guess we need to change the
logic in the PPTT and the cpu-map to only generate the cluster level when
the number of clusters is not zero. And then change this parser to not
require clusters at all.

I'm not a big fan of these auto-calculated values either, but the
documentation says that it'll do that, and it's been done that way
forever, so I think we're stuck with it for the -smp option. Hmm, I was
just about to say that x86 computes all its values, but I see the most
recently added one, 'dies', is implemented the way you're proposing we
implement 'clusters', i.e. default to one and don't calculate it when it's
missing. I actually consider that either a documentation bug or an smp
parsing bug, though.

Another possible option, for Arm, because only the cpus and maxcpus
parameters of -smp have ever worked, is to document, for Arm, that if even
one parameter other than cpus or maxcpus is provided, then all parameters
must be provided. We can still decide if clusters=0 is valid, but we'll
enforce that everything is explicit and that the product (with or without
clusters) matches maxcpus.

Requiring every parameter might be stricter than necessary, though, I
think we're mostly concerned with cpus/maxcpus, sockets, and cores.
clusters can default to one or zero (whatever we choose and document),
threads can default to one, and cpus can default to maxcpus or maxcpus can
default to cpus, but at least one of those must be provided. And, if
sockets are provided, then cores must be provided and vice versa. If
neither sockets nor cores are provided, then nothing else besides cpus and
maxcpus may be provided, and that would mean to not generate any topology
descriptions for the guest.

Thanks,
drew



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-29  7:16       ` Andrew Jones
@ 2021-04-29  8:56         ` wangyanan (Y)
  2021-04-29 11:02           ` Andrew Jones
  0 siblings, 1 reply; 22+ messages in thread
From: wangyanan (Y) @ 2021-04-29  8:56 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson


On 2021/4/29 15:16, Andrew Jones wrote:
> On Thu, Apr 29, 2021 at 10:14:37AM +0800, wangyanan (Y) wrote:
>> On 2021/4/28 18:31, Andrew Jones wrote:
>>> On Tue, Apr 13, 2021 at 04:31:45PM +0800, Yanan Wang wrote:
>>>>            } else if (sockets == 0) {
>>>>                threads = threads > 0 ? threads : 1;
>>>> -            sockets = cpus / (cores * threads);
>>>> +            sockets = cpus / (clusters * cores * threads);
>>>>                sockets = sockets > 0 ? sockets : 1;
>>> If we initialize clusters to zero instead of one and add lines in
>>> 'cpus == 0 || cores == 0' and 'sockets == 0' like
>>> 'clusters = clusters > 0 ? clusters : 1' as needed, then I think we can
>>> add
>>>
>>>    } else if (clusters == 0) {
>>>        threads = threads > 0 ? threads : 1;
>>>        clusters = cpus / (sockets * cores * thread);
>>>        clusters = clusters > 0 ? clusters : 1;
>>>    }
>>>
>>> here.
>> I have thought about this kind of format before, but there is a little bit
>> difference between these two ways. Let's chose the better and more
>> reasonable one of the two.
>>
>> Way A currently in this patch:
>> If value of clusters is not explicitly specified in -smp command line, we
>> assume
>> that users don't want to support clusters, for compatibility we initialized
>> the
>> value to 1. So that with cmdline "-smp cpus=24, sockets=2, cores=6", we will
>> parse out the topology description like below:
>> cpus=24, sockets=2, clusters=1, cores=6, threads=2
>>
>> Way B that you suggested for this patch:
>> Whether value of clusters is explicitly specified in -smp command line or
>> not,
>> we assume that clusters are supported and calculate the value. So that with
>> cmdline "-smp cpus=24, sockets=2, cores=6", we will parse out the topology
>> description like below:
>> cpus =24, sockets=2, clusters=2, cores=6, threads=1
>>
>> But I think maybe we should not assume too much about what users think
>> through the -smp command line. We should just assume that all levels of
>> cpu topology are supported and calculate them, and users should be more
>> careful if they want to get the expected results with not so complete
>> cmdline.
>> If I'm right, then Way B should be better. :)
>>
> Hi Yanan,
>
> We're already assuming the user wants to describe clusters to the guest
> because we require at least one per socket. If we want the user to have a
> choice between using clusters or not, then I guess we need to change the
> logic in the PPTT and the cpu-map to only generate the cluster level when
> the number of clusters is not zero. And then change this parser to not
> require clusters at all.
Hi Drew,

I think this kind of change will introduce more complexity and actually 
is not necessary.
Not generating cluster level at all and generating cluster level (one 
per socket) are same
to kernel. Without cluster description provided, kernel will initialize 
all cores in the same
cluster which also means one cluster per socket.

So we should only ensure value of clusters per socket is one if we don't 
want to use clusters,
and don't need to care about whether or not to generate description in 
PPTT and cpu-map.
Is this right?
> I'm not a big fan of these auto-calculated values either, but the
> documentation says that it'll do that, and it's been done that way
> forever, so I think we're stuck with it for the -smp option. Hmm, I was
> just about to say that x86 computes all its values, but I see the most
> recently added one, 'dies', is implemented the way you're proposing we
> implement 'clusters', i.e. default to one and don't calculate it when it's
> missing. I actually consider that either a documentation bug or an smp
> parsing bug, though.
My propose originally came from implementation of x86.
> Another possible option, for Arm, because only the cpus and maxcpus
> parameters of -smp have ever worked, is to document, for Arm, that if even
> one parameter other than cpus or maxcpus is provided, then all parameters
> must be provided. We can still decide if clusters=0 is valid, but we'll
> enforce that everything is explicit and that the product (with or without
> clusters) matches maxcpus.
Requiring every parameter explicitly will be most stable but indeed strict.

Currently all the parsers use way B to calculate value of thread if it 
is not provided explicitly.
So users should ensure the -smp cmdline they provided can result in that 
parsed threads will
be 1 if they don't want to support multiple threads in one core.

Very similar to thread, users should also ensure the provided cmdline 
can result in that parsed
clusters will be 1 if they don't want to support multiple clusters in 
one socket.

So I'm wondering if we can just add some commit in the documentation to 
tell users that they
should ensure this if they don't want support it. And as for calculation 
of clusters, we follow
the logic of other parameters as you suggested in way B.

Thanks,
Yanan
>
> Requiring every parameter might be stricter than necessary, though, I
> think we're mostly concerned with cpus/maxcpus, sockets, and cores.
> clusters can default to one or zero (whatever we choose and document),
> threads can default to one, and cpus can default to maxcpus or maxcpus can
> default to cpus, but at least one of those must be provided. And, if
> sockets are provided, then cores must be provided and vice versa. If
> neither sockets nor cores are provided, then nothing else besides cpus and
> maxcpus may be provided, and that would mean to not generate any topology
> descriptions for the guest.
> Thanks,
> drew
>
> .


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-29  8:56         ` wangyanan (Y)
@ 2021-04-29 11:02           ` Andrew Jones
  2021-04-30  5:09             ` wangyanan (Y)
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Jones @ 2021-04-29 11:02 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

On Thu, Apr 29, 2021 at 04:56:06PM +0800, wangyanan (Y) wrote:
> 
> On 2021/4/29 15:16, Andrew Jones wrote:
> > On Thu, Apr 29, 2021 at 10:14:37AM +0800, wangyanan (Y) wrote:
> > > On 2021/4/28 18:31, Andrew Jones wrote:
> > > > On Tue, Apr 13, 2021 at 04:31:45PM +0800, Yanan Wang wrote:
> > > > >            } else if (sockets == 0) {
> > > > >                threads = threads > 0 ? threads : 1;
> > > > > -            sockets = cpus / (cores * threads);
> > > > > +            sockets = cpus / (clusters * cores * threads);
> > > > >                sockets = sockets > 0 ? sockets : 1;
> > > > If we initialize clusters to zero instead of one and add lines in
> > > > 'cpus == 0 || cores == 0' and 'sockets == 0' like
> > > > 'clusters = clusters > 0 ? clusters : 1' as needed, then I think we can
> > > > add
> > > > 
> > > >    } else if (clusters == 0) {
> > > >        threads = threads > 0 ? threads : 1;
> > > >        clusters = cpus / (sockets * cores * thread);
> > > >        clusters = clusters > 0 ? clusters : 1;
> > > >    }
> > > > 
> > > > here.
> > > I have thought about this kind of format before, but there is a little bit
> > > difference between these two ways. Let's chose the better and more
> > > reasonable one of the two.
> > > 
> > > Way A currently in this patch:
> > > If value of clusters is not explicitly specified in -smp command line, we
> > > assume
> > > that users don't want to support clusters, for compatibility we initialized
> > > the
> > > value to 1. So that with cmdline "-smp cpus=24, sockets=2, cores=6", we will
> > > parse out the topology description like below:
> > > cpus=24, sockets=2, clusters=1, cores=6, threads=2
> > > 
> > > Way B that you suggested for this patch:
> > > Whether value of clusters is explicitly specified in -smp command line or
> > > not,
> > > we assume that clusters are supported and calculate the value. So that with
> > > cmdline "-smp cpus=24, sockets=2, cores=6", we will parse out the topology
> > > description like below:
> > > cpus =24, sockets=2, clusters=2, cores=6, threads=1
> > > 
> > > But I think maybe we should not assume too much about what users think
> > > through the -smp command line. We should just assume that all levels of
> > > cpu topology are supported and calculate them, and users should be more
> > > careful if they want to get the expected results with not so complete
> > > cmdline.
> > > If I'm right, then Way B should be better. :)
> > > 
> > Hi Yanan,
> > 
> > We're already assuming the user wants to describe clusters to the guest
> > because we require at least one per socket. If we want the user to have a
> > choice between using clusters or not, then I guess we need to change the
> > logic in the PPTT and the cpu-map to only generate the cluster level when
> > the number of clusters is not zero. And then change this parser to not
> > require clusters at all.
> Hi Drew,
> 
> I think this kind of change will introduce more complexity and actually is
> not necessary.
> Not generating cluster level at all and generating cluster level (one per
> socket) are same
> to kernel. Without cluster description provided, kernel will initialize all
> cores in the same
> cluster which also means one cluster per socket.

Which kernel? All kernels? One without the cluster support patches [1]?

[1] https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@hisilicon.com/#t

> 
> So we should only ensure value of clusters per socket is one if we don't
> want to use clusters,
> and don't need to care about whether or not to generate description in PPTT
> and cpu-map.
> Is this right?

Depends on your answer to my 'which kernel' questions.

> > I'm not a big fan of these auto-calculated values either, but the
> > documentation says that it'll do that, and it's been done that way
> > forever, so I think we're stuck with it for the -smp option. Hmm, I was
> > just about to say that x86 computes all its values, but I see the most
> > recently added one, 'dies', is implemented the way you're proposing we
> > implement 'clusters', i.e. default to one and don't calculate it when it's
> > missing. I actually consider that either a documentation bug or an smp
> > parsing bug, though.
> My propose originally came from implementation of x86.
> > Another possible option, for Arm, because only the cpus and maxcpus
> > parameters of -smp have ever worked, is to document, for Arm, that if even
> > one parameter other than cpus or maxcpus is provided, then all parameters
> > must be provided. We can still decide if clusters=0 is valid, but we'll
> > enforce that everything is explicit and that the product (with or without
> > clusters) matches maxcpus.
> Requiring every parameter explicitly will be most stable but indeed strict.
> 
> Currently all the parsers use way B to calculate value of thread if it is
> not provided explicitly.
> So users should ensure the -smp cmdline they provided can result in that
> parsed threads will
> be 1 if they don't want to support multiple threads in one core.
> 
> Very similar to thread, users should also ensure the provided cmdline can
> result in that parsed
> clusters will be 1 if they don't want to support multiple clusters in one
> socket.
> 
> So I'm wondering if we can just add some commit in the documentation to tell
> users that they
> should ensure this if they don't want support it. And as for calculation of
> clusters, we follow
> the logic of other parameters as you suggested in way B.
> 
> Thanks,
> Yanan
> > 
> > Requiring every parameter might be stricter than necessary, though, I
> > think we're mostly concerned with cpus/maxcpus, sockets, and cores.
> > clusters can default to one or zero (whatever we choose and document),
> > threads can default to one, and cpus can default to maxcpus or maxcpus can
> > default to cpus, but at least one of those must be provided. And, if
> > sockets are provided, then cores must be provided and vice versa. If
> > neither sockets nor cores are provided, then nothing else besides cpus and
> > maxcpus may be provided, and that would mean to not generate any topology
> > descriptions for the guest.

I still don't know. I think I prefer making -smp more strict (even for
other architectures, but that's more difficult to do than for Arm.) What I
wrote above isn't that bad. We only require one of cpus or maxcpus (pretty
much everybody already does that anyway), and then, if given sockets
or cores, the other will also be required. I assume anybody who bothers to
specify one or the other would already specify both anyway.

Thanks,
drew



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-29 11:02           ` Andrew Jones
@ 2021-04-30  5:09             ` wangyanan (Y)
  2021-04-30  6:41               ` Andrew Jones
  0 siblings, 1 reply; 22+ messages in thread
From: wangyanan (Y) @ 2021-04-30  5:09 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

Hi Drew,

On 2021/4/29 19:02, Andrew Jones wrote:
> On Thu, Apr 29, 2021 at 04:56:06PM +0800, wangyanan (Y) wrote:
>> On 2021/4/29 15:16, Andrew Jones wrote:
>>> On Thu, Apr 29, 2021 at 10:14:37AM +0800, wangyanan (Y) wrote:
>>>> On 2021/4/28 18:31, Andrew Jones wrote:
>>>>> On Tue, Apr 13, 2021 at 04:31:45PM +0800, Yanan Wang wrote:
>>>>>>             } else if (sockets == 0) {
>>>>>>                 threads = threads > 0 ? threads : 1;
>>>>>> -            sockets = cpus / (cores * threads);
>>>>>> +            sockets = cpus / (clusters * cores * threads);
>>>>>>                 sockets = sockets > 0 ? sockets : 1;
>>>>> If we initialize clusters to zero instead of one and add lines in
>>>>> 'cpus == 0 || cores == 0' and 'sockets == 0' like
>>>>> 'clusters = clusters > 0 ? clusters : 1' as needed, then I think we can
>>>>> add
>>>>>
>>>>>     } else if (clusters == 0) {
>>>>>         threads = threads > 0 ? threads : 1;
>>>>>         clusters = cpus / (sockets * cores * thread);
>>>>>         clusters = clusters > 0 ? clusters : 1;
>>>>>     }
>>>>>
>>>>> here.
>>>> I have thought about this kind of format before, but there is a little bit
>>>> difference between these two ways. Let's chose the better and more
>>>> reasonable one of the two.
>>>>
>>>> Way A currently in this patch:
>>>> If value of clusters is not explicitly specified in -smp command line, we
>>>> assume
>>>> that users don't want to support clusters, for compatibility we initialized
>>>> the
>>>> value to 1. So that with cmdline "-smp cpus=24, sockets=2, cores=6", we will
>>>> parse out the topology description like below:
>>>> cpus=24, sockets=2, clusters=1, cores=6, threads=2
>>>>
>>>> Way B that you suggested for this patch:
>>>> Whether value of clusters is explicitly specified in -smp command line or
>>>> not,
>>>> we assume that clusters are supported and calculate the value. So that with
>>>> cmdline "-smp cpus=24, sockets=2, cores=6", we will parse out the topology
>>>> description like below:
>>>> cpus =24, sockets=2, clusters=2, cores=6, threads=1
>>>>
>>>> But I think maybe we should not assume too much about what users think
>>>> through the -smp command line. We should just assume that all levels of
>>>> cpu topology are supported and calculate them, and users should be more
>>>> careful if they want to get the expected results with not so complete
>>>> cmdline.
>>>> If I'm right, then Way B should be better. :)
>>>>
>>> Hi Yanan,
>>>
>>> We're already assuming the user wants to describe clusters to the guest
>>> because we require at least one per socket. If we want the user to have a
>>> choice between using clusters or not, then I guess we need to change the
>>> logic in the PPTT and the cpu-map to only generate the cluster level when
>>> the number of clusters is not zero. And then change this parser to not
>>> require clusters at all.
>> Hi Drew,
>>
>> I think this kind of change will introduce more complexity and actually is
>> not necessary.
>> Not generating cluster level at all and generating cluster level (one per
>> socket) are same
>> to kernel. Without cluster description provided, kernel will initialize all
>> cores in the same
>> cluster which also means one cluster per socket.
> Which kernel? All kernels? One without the cluster support patches [1]?
>
> [1] https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@hisilicon.com/#t
I'm sorry, I didn't make it clear. :)
I actually mean the ARM64 kernel, with or without [1].

Without [1]: Kernel doesn't care about cluster. When populating cpu 
topology, it directly
finds the hierarchy node with "physical package flag" as package when 
parsing ACPI, and
finds the core node's parent as package when parsing DT. So even we 
provide cluster level
description (one per socket), the parsing results will be the same as 
not providing at all.

With [1]: Kernel finds the core hierarchy node's parent as cluster when 
parsing ACPI. So if
we don't provide cluster level description, package will be taken as 
cluster. And if we provide
the description (one per socket), the parsing result will also be the same.

That's why I said that we just need to provide description of cluster 
(one per socket) if we
don't want to make use of it in VMs.

[1] 
https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@hisilicon.com/#t
>> So we should only ensure value of clusters per socket is one if we don't
>> want to use clusters,
>> and don't need to care about whether or not to generate description in PPTT
>> and cpu-map.
>> Is this right?
> Depends on your answer to my 'which kernel' questions.
>
>>> I'm not a big fan of these auto-calculated values either, but the
>>> documentation says that it'll do that, and it's been done that way
>>> forever, so I think we're stuck with it for the -smp option. Hmm, I was
>>> just about to say that x86 computes all its values, but I see the most
>>> recently added one, 'dies', is implemented the way you're proposing we
>>> implement 'clusters', i.e. default to one and don't calculate it when it's
>>> missing. I actually consider that either a documentation bug or an smp
>>> parsing bug, though.
>> My propose originally came from implementation of x86.
>>> Another possible option, for Arm, because only the cpus and maxcpus
>>> parameters of -smp have ever worked, is to document, for Arm, that if even
>>> one parameter other than cpus or maxcpus is provided, then all parameters
>>> must be provided. We can still decide if clusters=0 is valid, but we'll
>>> enforce that everything is explicit and that the product (with or without
>>> clusters) matches maxcpus.
>> Requiring every parameter explicitly will be most stable but indeed strict.
>>
>> Currently all the parsers use way B to calculate value of thread if it is
>> not provided explicitly.
>> So users should ensure the -smp cmdline they provided can result in that
>> parsed threads will
>> be 1 if they don't want to support multiple threads in one core.
>>
>> Very similar to thread, users should also ensure the provided cmdline can
>> result in that parsed
>> clusters will be 1 if they don't want to support multiple clusters in one
>> socket.
>>
>> So I'm wondering if we can just add some commit in the documentation to tell
>> users that they
>> should ensure this if they don't want support it. And as for calculation of
>> clusters, we follow
>> the logic of other parameters as you suggested in way B.
>>
>> Thanks,
>> Yanan
>>> Requiring every parameter might be stricter than necessary, though, I
>>> think we're mostly concerned with cpus/maxcpus, sockets, and cores.
>>> clusters can default to one or zero (whatever we choose and document),
>>> threads can default to one, and cpus can default to maxcpus or maxcpus can
>>> default to cpus, but at least one of those must be provided. And, if
>>> sockets are provided, then cores must be provided and vice versa. If
>>> neither sockets nor cores are provided, then nothing else besides cpus and
>>> maxcpus may be provided, and that would mean to not generate any topology
>>> descriptions for the guest.
> I still don't know. I think I prefer making -smp more strict (even for
> other architectures, but that's more difficult to do than for Arm.) What I
> wrote above isn't that bad. We only require one of cpus or maxcpus (pretty
> much everybody already does that anyway), and then, if given sockets
> or cores, the other will also be required. I assume anybody who bothers to
> specify one or the other would already specify both anyway.
I agree to make -smp more strict. We want to expose the cpu topology 
information
to guest kernel, so that it can take advantage of the information for 
better scheduling.
 From this point of view, we hope the topology descriptions are 
accurately specified
but not automatically populated.

But I think the requirement for ARM "if even one parameter other than 
cpus or maxcpus
is provided then all parameters must be provided" will be better. This 
can ensure the
whole accurate users-specified topology. As you mentioned, if anybody 
who bothers
to specify one, why not also specify the others.

I can add the requirement for ARM in the documentation, and also check 
the parameters
in virt_smp_parse. Will this be fine?

Thanks,
Yanan
> Thanks,
> drew
>
> .


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-30  5:09             ` wangyanan (Y)
@ 2021-04-30  6:41               ` Andrew Jones
  2021-04-30  7:01                 ` Andrew Jones
  2021-04-30  8:59                 ` wangyanan (Y)
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Jones @ 2021-04-30  6:41 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

On Fri, Apr 30, 2021 at 01:09:00PM +0800, wangyanan (Y) wrote:
> Hi Drew,
> 
> On 2021/4/29 19:02, Andrew Jones wrote:
> > On Thu, Apr 29, 2021 at 04:56:06PM +0800, wangyanan (Y) wrote:
> > > On 2021/4/29 15:16, Andrew Jones wrote:
> > > > On Thu, Apr 29, 2021 at 10:14:37AM +0800, wangyanan (Y) wrote:
> > > > > On 2021/4/28 18:31, Andrew Jones wrote:
> > > > > > On Tue, Apr 13, 2021 at 04:31:45PM +0800, Yanan Wang wrote:
> > > > > > >             } else if (sockets == 0) {
> > > > > > >                 threads = threads > 0 ? threads : 1;
> > > > > > > -            sockets = cpus / (cores * threads);
> > > > > > > +            sockets = cpus / (clusters * cores * threads);
> > > > > > >                 sockets = sockets > 0 ? sockets : 1;
> > > > > > If we initialize clusters to zero instead of one and add lines in
> > > > > > 'cpus == 0 || cores == 0' and 'sockets == 0' like
> > > > > > 'clusters = clusters > 0 ? clusters : 1' as needed, then I think we can
> > > > > > add
> > > > > > 
> > > > > >     } else if (clusters == 0) {
> > > > > >         threads = threads > 0 ? threads : 1;
> > > > > >         clusters = cpus / (sockets * cores * thread);
> > > > > >         clusters = clusters > 0 ? clusters : 1;
> > > > > >     }
> > > > > > 
> > > > > > here.
> > > > > I have thought about this kind of format before, but there is a little bit
> > > > > difference between these two ways. Let's chose the better and more
> > > > > reasonable one of the two.
> > > > > 
> > > > > Way A currently in this patch:
> > > > > If value of clusters is not explicitly specified in -smp command line, we
> > > > > assume
> > > > > that users don't want to support clusters, for compatibility we initialized
> > > > > the
> > > > > value to 1. So that with cmdline "-smp cpus=24, sockets=2, cores=6", we will
> > > > > parse out the topology description like below:
> > > > > cpus=24, sockets=2, clusters=1, cores=6, threads=2
> > > > > 
> > > > > Way B that you suggested for this patch:
> > > > > Whether value of clusters is explicitly specified in -smp command line or
> > > > > not,
> > > > > we assume that clusters are supported and calculate the value. So that with
> > > > > cmdline "-smp cpus=24, sockets=2, cores=6", we will parse out the topology
> > > > > description like below:
> > > > > cpus =24, sockets=2, clusters=2, cores=6, threads=1
> > > > > 
> > > > > But I think maybe we should not assume too much about what users think
> > > > > through the -smp command line. We should just assume that all levels of
> > > > > cpu topology are supported and calculate them, and users should be more
> > > > > careful if they want to get the expected results with not so complete
> > > > > cmdline.
> > > > > If I'm right, then Way B should be better. :)
> > > > > 
> > > > Hi Yanan,
> > > > 
> > > > We're already assuming the user wants to describe clusters to the guest
> > > > because we require at least one per socket. If we want the user to have a
> > > > choice between using clusters or not, then I guess we need to change the
> > > > logic in the PPTT and the cpu-map to only generate the cluster level when
> > > > the number of clusters is not zero. And then change this parser to not
> > > > require clusters at all.
> > > Hi Drew,
> > > 
> > > I think this kind of change will introduce more complexity and actually is
> > > not necessary.
> > > Not generating cluster level at all and generating cluster level (one per
> > > socket) are same
> > > to kernel. Without cluster description provided, kernel will initialize all
> > > cores in the same
> > > cluster which also means one cluster per socket.
> > Which kernel? All kernels? One without the cluster support patches [1]?
> > 
> > [1] https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@hisilicon.com/#t
> I'm sorry, I didn't make it clear. :)
> I actually mean the ARM64 kernel, with or without [1].
> 
> Without [1]: Kernel doesn't care about cluster. When populating cpu
> topology, it directly
> finds the hierarchy node with "physical package flag" as package when
> parsing ACPI, and
> finds the core node's parent as package when parsing DT. So even we provide
> cluster level
> description (one per socket), the parsing results will be the same as not
> providing at all.
> 
> With [1]: Kernel finds the core hierarchy node's parent as cluster when
> parsing ACPI. So if
> we don't provide cluster level description, package will be taken as
> cluster. And if we provide
> the description (one per socket), the parsing result will also be the same.
> 
> That's why I said that we just need to provide description of cluster (one
> per socket) if we
> don't want to make use of it in VMs.

OK, that sounds good.

> 
> [1] https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@hisilicon.com/#t
> > > So we should only ensure value of clusters per socket is one if we don't
> > > want to use clusters,
> > > and don't need to care about whether or not to generate description in PPTT
> > > and cpu-map.
> > > Is this right?
> > Depends on your answer to my 'which kernel' questions.
> > 
> > > > I'm not a big fan of these auto-calculated values either, but the
> > > > documentation says that it'll do that, and it's been done that way
> > > > forever, so I think we're stuck with it for the -smp option. Hmm, I was
> > > > just about to say that x86 computes all its values, but I see the most
> > > > recently added one, 'dies', is implemented the way you're proposing we
> > > > implement 'clusters', i.e. default to one and don't calculate it when it's
> > > > missing. I actually consider that either a documentation bug or an smp
> > > > parsing bug, though.
> > > My propose originally came from implementation of x86.
> > > > Another possible option, for Arm, because only the cpus and maxcpus
> > > > parameters of -smp have ever worked, is to document, for Arm, that if even
> > > > one parameter other than cpus or maxcpus is provided, then all parameters
> > > > must be provided. We can still decide if clusters=0 is valid, but we'll
> > > > enforce that everything is explicit and that the product (with or without
> > > > clusters) matches maxcpus.
> > > Requiring every parameter explicitly will be most stable but indeed strict.
> > > 
> > > Currently all the parsers use way B to calculate value of thread if it is
> > > not provided explicitly.
> > > So users should ensure the -smp cmdline they provided can result in that
> > > parsed threads will
> > > be 1 if they don't want to support multiple threads in one core.
> > > 
> > > Very similar to thread, users should also ensure the provided cmdline can
> > > result in that parsed
> > > clusters will be 1 if they don't want to support multiple clusters in one
> > > socket.
> > > 
> > > So I'm wondering if we can just add some commit in the documentation to tell
> > > users that they
> > > should ensure this if they don't want support it. And as for calculation of
> > > clusters, we follow
> > > the logic of other parameters as you suggested in way B.
> > > 
> > > Thanks,
> > > Yanan
> > > > Requiring every parameter might be stricter than necessary, though, I
> > > > think we're mostly concerned with cpus/maxcpus, sockets, and cores.
> > > > clusters can default to one or zero (whatever we choose and document),
> > > > threads can default to one, and cpus can default to maxcpus or maxcpus can
> > > > default to cpus, but at least one of those must be provided. And, if
> > > > sockets are provided, then cores must be provided and vice versa. If
> > > > neither sockets nor cores are provided, then nothing else besides cpus and
> > > > maxcpus may be provided, and that would mean to not generate any topology
> > > > descriptions for the guest.
> > I still don't know. I think I prefer making -smp more strict (even for
> > other architectures, but that's more difficult to do than for Arm.) What I
> > wrote above isn't that bad. We only require one of cpus or maxcpus (pretty
> > much everybody already does that anyway), and then, if given sockets
> > or cores, the other will also be required. I assume anybody who bothers to
> > specify one or the other would already specify both anyway.
> I agree to make -smp more strict. We want to expose the cpu topology
> information
> to guest kernel, so that it can take advantage of the information for better
> scheduling.
> From this point of view, we hope the topology descriptions are accurately
> specified
> but not automatically populated.
> 
> But I think the requirement for ARM "if even one parameter other than cpus
> or maxcpus
> is provided then all parameters must be provided" will be better. This can
> ensure the
> whole accurate users-specified topology. As you mentioned, if anybody who
> bothers
> to specify one, why not also specify the others.
> 
> I can add the requirement for ARM in the documentation, and also check the
> parameters
> in virt_smp_parse. Will this be fine?

We sort of have to support command lines that are missing 'maxcpus' and
'clusters', unless we work together with libvirt to make the change.
Currently libvirt will generate '-smp 16,sockets=16,cores=1,threads=1'
from '<vcpu placement='static'>16</vcpu>'. That's sufficient for our
stricter, but not completely strict requirements. And, I still think
'threads' could be optional, because there's a good chance the user
doesn't want to describe them, so a default of 1 is good enough. Also,
given maxcpus, but not cpus, it's pretty obvious that cpus should equal
maxcpus.

Thanks,
drew



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-30  6:41               ` Andrew Jones
@ 2021-04-30  7:01                 ` Andrew Jones
  2021-04-30  9:33                   ` wangyanan (Y)
  2021-04-30  8:59                 ` wangyanan (Y)
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Jones @ 2021-04-30  7:01 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

On Fri, Apr 30, 2021 at 08:41:25AM +0200, Andrew Jones wrote:
> On Fri, Apr 30, 2021 at 01:09:00PM +0800, wangyanan (Y) wrote:
> > But I think the requirement for ARM "if even one parameter other than cpus
> > or maxcpus
> > is provided then all parameters must be provided" will be better. This can
> > ensure the
> > whole accurate users-specified topology. As you mentioned, if anybody who
> > bothers
> > to specify one, why not also specify the others.
> > 
> > I can add the requirement for ARM in the documentation, and also check the
> > parameters
> > in virt_smp_parse. Will this be fine?
> 
> We sort of have to support command lines that are missing 'maxcpus' and
> 'clusters', unless we work together with libvirt to make the change.
> Currently libvirt will generate '-smp 16,sockets=16,cores=1,threads=1'
> from '<vcpu placement='static'>16</vcpu>'. That's sufficient for our
> stricter, but not completely strict requirements. And, I still think
> 'threads' could be optional, because there's a good chance the user
> doesn't want to describe them, so a default of 1 is good enough. Also,
> given maxcpus, but not cpus, it's pretty obvious that cpus should equal
> maxcpus.
>

We also still need just 'cpus' or just 'maxcpus' to work, since that
already works now. So, at least these should work

 -smp N
 -smp maxcpus=N
 -smp N,maxcpus=M
 -smp N,sockets=N,cores=1,threads=1
 -smp N,maxcpus=M,sockets=M,cores=1,threads=1

since they work today, even though no topology is described. If we want to
describe a topology for the first three, then we'll have to pick one,
which brings us back to the sockets over cores stuff. Or, we could choose
to just not generate topology descriptions when none is provided.

Thanks,
drew



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-30  6:41               ` Andrew Jones
  2021-04-30  7:01                 ` Andrew Jones
@ 2021-04-30  8:59                 ` wangyanan (Y)
  2021-04-30 10:48                   ` Andrew Jones
  1 sibling, 1 reply; 22+ messages in thread
From: wangyanan (Y) @ 2021-04-30  8:59 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson


On 2021/4/30 14:41, Andrew Jones wrote:
> On Fri, Apr 30, 2021 at 01:09:00PM +0800, wangyanan (Y) wrote:
>> Hi Drew,
>>
>> On 2021/4/29 19:02, Andrew Jones wrote:
>>> On Thu, Apr 29, 2021 at 04:56:06PM +0800, wangyanan (Y) wrote:
>>>> On 2021/4/29 15:16, Andrew Jones wrote:
>>>>> On Thu, Apr 29, 2021 at 10:14:37AM +0800, wangyanan (Y) wrote:
>>>>>> On 2021/4/28 18:31, Andrew Jones wrote:
>>>>>>> On Tue, Apr 13, 2021 at 04:31:45PM +0800, Yanan Wang wrote:
>>>>>>>>              } else if (sockets == 0) {
>>>>>>>>                  threads = threads > 0 ? threads : 1;
>>>>>>>> -            sockets = cpus / (cores * threads);
>>>>>>>> +            sockets = cpus / (clusters * cores * threads);
>>>>>>>>                  sockets = sockets > 0 ? sockets : 1;
>>>>>>> If we initialize clusters to zero instead of one and add lines in
>>>>>>> 'cpus == 0 || cores == 0' and 'sockets == 0' like
>>>>>>> 'clusters = clusters > 0 ? clusters : 1' as needed, then I think we can
>>>>>>> add
>>>>>>>
>>>>>>>      } else if (clusters == 0) {
>>>>>>>          threads = threads > 0 ? threads : 1;
>>>>>>>          clusters = cpus / (sockets * cores * thread);
>>>>>>>          clusters = clusters > 0 ? clusters : 1;
>>>>>>>      }
>>>>>>>
>>>>>>> here.
>>>>>> I have thought about this kind of format before, but there is a little bit
>>>>>> difference between these two ways. Let's chose the better and more
>>>>>> reasonable one of the two.
>>>>>>
>>>>>> Way A currently in this patch:
>>>>>> If value of clusters is not explicitly specified in -smp command line, we
>>>>>> assume
>>>>>> that users don't want to support clusters, for compatibility we initialized
>>>>>> the
>>>>>> value to 1. So that with cmdline "-smp cpus=24, sockets=2, cores=6", we will
>>>>>> parse out the topology description like below:
>>>>>> cpus=24, sockets=2, clusters=1, cores=6, threads=2
>>>>>>
>>>>>> Way B that you suggested for this patch:
>>>>>> Whether value of clusters is explicitly specified in -smp command line or
>>>>>> not,
>>>>>> we assume that clusters are supported and calculate the value. So that with
>>>>>> cmdline "-smp cpus=24, sockets=2, cores=6", we will parse out the topology
>>>>>> description like below:
>>>>>> cpus =24, sockets=2, clusters=2, cores=6, threads=1
>>>>>>
>>>>>> But I think maybe we should not assume too much about what users think
>>>>>> through the -smp command line. We should just assume that all levels of
>>>>>> cpu topology are supported and calculate them, and users should be more
>>>>>> careful if they want to get the expected results with not so complete
>>>>>> cmdline.
>>>>>> If I'm right, then Way B should be better. :)
>>>>>>
>>>>> Hi Yanan,
>>>>>
>>>>> We're already assuming the user wants to describe clusters to the guest
>>>>> because we require at least one per socket. If we want the user to have a
>>>>> choice between using clusters or not, then I guess we need to change the
>>>>> logic in the PPTT and the cpu-map to only generate the cluster level when
>>>>> the number of clusters is not zero. And then change this parser to not
>>>>> require clusters at all.
>>>> Hi Drew,
>>>>
>>>> I think this kind of change will introduce more complexity and actually is
>>>> not necessary.
>>>> Not generating cluster level at all and generating cluster level (one per
>>>> socket) are same
>>>> to kernel. Without cluster description provided, kernel will initialize all
>>>> cores in the same
>>>> cluster which also means one cluster per socket.
>>> Which kernel? All kernels? One without the cluster support patches [1]?
>>>
>>> [1] https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@hisilicon.com/#t
>> I'm sorry, I didn't make it clear. :)
>> I actually mean the ARM64 kernel, with or without [1].
>>
>> Without [1]: Kernel doesn't care about cluster. When populating cpu
>> topology, it directly
>> finds the hierarchy node with "physical package flag" as package when
>> parsing ACPI, and
>> finds the core node's parent as package when parsing DT. So even we provide
>> cluster level
>> description (one per socket), the parsing results will be the same as not
>> providing at all.
>>
>> With [1]: Kernel finds the core hierarchy node's parent as cluster when
>> parsing ACPI. So if
>> we don't provide cluster level description, package will be taken as
>> cluster. And if we provide
>> the description (one per socket), the parsing result will also be the same.
>>
>> That's why I said that we just need to provide description of cluster (one
>> per socket) if we
>> don't want to make use of it in VMs.
> OK, that sounds good.
>
>> [1] https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@hisilicon.com/#t
>>>> So we should only ensure value of clusters per socket is one if we don't
>>>> want to use clusters,
>>>> and don't need to care about whether or not to generate description in PPTT
>>>> and cpu-map.
>>>> Is this right?
>>> Depends on your answer to my 'which kernel' questions.
>>>
>>>>> I'm not a big fan of these auto-calculated values either, but the
>>>>> documentation says that it'll do that, and it's been done that way
>>>>> forever, so I think we're stuck with it for the -smp option. Hmm, I was
>>>>> just about to say that x86 computes all its values, but I see the most
>>>>> recently added one, 'dies', is implemented the way you're proposing we
>>>>> implement 'clusters', i.e. default to one and don't calculate it when it's
>>>>> missing. I actually consider that either a documentation bug or an smp
>>>>> parsing bug, though.
>>>> My propose originally came from implementation of x86.
>>>>> Another possible option, for Arm, because only the cpus and maxcpus
>>>>> parameters of -smp have ever worked, is to document, for Arm, that if even
>>>>> one parameter other than cpus or maxcpus is provided, then all parameters
>>>>> must be provided. We can still decide if clusters=0 is valid, but we'll
>>>>> enforce that everything is explicit and that the product (with or without
>>>>> clusters) matches maxcpus.
>>>> Requiring every parameter explicitly will be most stable but indeed strict.
>>>>
>>>> Currently all the parsers use way B to calculate value of thread if it is
>>>> not provided explicitly.
>>>> So users should ensure the -smp cmdline they provided can result in that
>>>> parsed threads will
>>>> be 1 if they don't want to support multiple threads in one core.
>>>>
>>>> Very similar to thread, users should also ensure the provided cmdline can
>>>> result in that parsed
>>>> clusters will be 1 if they don't want to support multiple clusters in one
>>>> socket.
>>>>
>>>> So I'm wondering if we can just add some commit in the documentation to tell
>>>> users that they
>>>> should ensure this if they don't want support it. And as for calculation of
>>>> clusters, we follow
>>>> the logic of other parameters as you suggested in way B.
>>>>
>>>> Thanks,
>>>> Yanan
>>>>> Requiring every parameter might be stricter than necessary, though, I
>>>>> think we're mostly concerned with cpus/maxcpus, sockets, and cores.
>>>>> clusters can default to one or zero (whatever we choose and document),
>>>>> threads can default to one, and cpus can default to maxcpus or maxcpus can
>>>>> default to cpus, but at least one of those must be provided. And, if
>>>>> sockets are provided, then cores must be provided and vice versa. If
>>>>> neither sockets nor cores are provided, then nothing else besides cpus and
>>>>> maxcpus may be provided, and that would mean to not generate any topology
>>>>> descriptions for the guest.
>>> I still don't know. I think I prefer making -smp more strict (even for
>>> other architectures, but that's more difficult to do than for Arm.) What I
>>> wrote above isn't that bad. We only require one of cpus or maxcpus (pretty
>>> much everybody already does that anyway), and then, if given sockets
>>> or cores, the other will also be required. I assume anybody who bothers to
>>> specify one or the other would already specify both anyway.
>> I agree to make -smp more strict. We want to expose the cpu topology
>> information
>> to guest kernel, so that it can take advantage of the information for better
>> scheduling.
>>  From this point of view, we hope the topology descriptions are accurately
>> specified
>> but not automatically populated.
>>
>> But I think the requirement for ARM "if even one parameter other than cpus
>> or maxcpus
>> is provided then all parameters must be provided" will be better. This can
>> ensure the
>> whole accurate users-specified topology. As you mentioned, if anybody who
>> bothers
>> to specify one, why not also specify the others.
>>
>> I can add the requirement for ARM in the documentation, and also check the
>> parameters
>> in virt_smp_parse. Will this be fine?
> We sort of have to support command lines that are missing 'maxcpus' and
> 'clusters', unless we work together with libvirt to make the change.
> Currently libvirt will generate '-smp 16,sockets=16,cores=1,threads=1'
> from '<vcpu placement='static'>16</vcpu>'.
I see. And libvirt currently doesn't support cluster in xml, which means
we can not generate complete cmdlines with cluster in it through
<topology ...> specification in xml.
> That's sufficient for our
> stricter, but not completely strict requirements. And, I still think
> 'threads' could be optional, because there's a good chance the user
> doesn't want to describe them, so a default of 1 is good enough.
So the parsing logic can be repeated like this:
We require at least one of cpus and maxcpus specified explicitly, and 
default
cluster/thread to 1 if not explicitly specified. And require both of 
sockets and
cores if one of them is specified.

This is consistent with what you mentioned yesterday.

Thanks,
Yanan
>   Also,
> given maxcpus, but not cpus, it's pretty obvious that cpus should equal
> maxcpus.
>
> Thanks,
> drew
>
> .


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-30  7:01                 ` Andrew Jones
@ 2021-04-30  9:33                   ` wangyanan (Y)
  2021-04-30 10:49                     ` Andrew Jones
  0 siblings, 1 reply; 22+ messages in thread
From: wangyanan (Y) @ 2021-04-30  9:33 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson


On 2021/4/30 15:01, Andrew Jones wrote:
> On Fri, Apr 30, 2021 at 08:41:25AM +0200, Andrew Jones wrote:
>> On Fri, Apr 30, 2021 at 01:09:00PM +0800, wangyanan (Y) wrote:
>>> But I think the requirement for ARM "if even one parameter other than cpus
>>> or maxcpus
>>> is provided then all parameters must be provided" will be better. This can
>>> ensure the
>>> whole accurate users-specified topology. As you mentioned, if anybody who
>>> bothers
>>> to specify one, why not also specify the others.
>>>
>>> I can add the requirement for ARM in the documentation, and also check the
>>> parameters
>>> in virt_smp_parse. Will this be fine?
>> We sort of have to support command lines that are missing 'maxcpus' and
>> 'clusters', unless we work together with libvirt to make the change.
>> Currently libvirt will generate '-smp 16,sockets=16,cores=1,threads=1'
>> from '<vcpu placement='static'>16</vcpu>'. That's sufficient for our
>> stricter, but not completely strict requirements. And, I still think
>> 'threads' could be optional, because there's a good chance the user
>> doesn't want to describe them, so a default of 1 is good enough. Also,
>> given maxcpus, but not cpus, it's pretty obvious that cpus should equal
>> maxcpus.
>>
> We also still need just 'cpus' or just 'maxcpus' to work, since that
> already works now. So, at least these should work
>
>   -smp N
>   -smp maxcpus=N
>   -smp N,maxcpus=M
>   -smp N,sockets=N,cores=1,threads=1
>   -smp N,maxcpus=M,sockets=M,cores=1,threads=1
>
> since they work today, even though no topology is described.
Yes. I forgot this point that we should consider the compat.
> If we want to
> describe a topology for the first three, then we'll have to pick one,
> which brings us back to the sockets over cores stuff. Or, we could choose
> to just not generate topology descriptions when none is provided.
I find that both preferring cores over sockets and not generating 
topology descriptions
when none is provided can't solve everything. We can only ensure guest 
will get one
socket with multiple cores with qemu cmdline "-smp N".

But if we specify N cpus without any other parameters in libvirt xml, a 
qemu cmdline like
"-smp N, sockets=N, cores=1,threads=1" will be generated, and as a 
result guest will get
N sockets. In this case, we still can't avoid the silent change.

So maybe we should just prefer sockets over cores like the general 
smp_parse() and libvirt,
and let users use "-smp N, sockets=1, cores=N,threads=1" to get what 
they want if they
use version 6.0 or later.

Thanks,
Yanan
> Thanks,
> drew
>
> .


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-30  8:59                 ` wangyanan (Y)
@ 2021-04-30 10:48                   ` Andrew Jones
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Jones @ 2021-04-30 10:48 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

On Fri, Apr 30, 2021 at 04:59:36PM +0800, wangyanan (Y) wrote:
> 
> On 2021/4/30 14:41, Andrew Jones wrote:
> > On Fri, Apr 30, 2021 at 01:09:00PM +0800, wangyanan (Y) wrote:
> > > Hi Drew,
> > > 
> > > On 2021/4/29 19:02, Andrew Jones wrote:
> > > > On Thu, Apr 29, 2021 at 04:56:06PM +0800, wangyanan (Y) wrote:
> > > > > On 2021/4/29 15:16, Andrew Jones wrote:
> > > > > > On Thu, Apr 29, 2021 at 10:14:37AM +0800, wangyanan (Y) wrote:
> > > > > > > On 2021/4/28 18:31, Andrew Jones wrote:
> > > > > > > > On Tue, Apr 13, 2021 at 04:31:45PM +0800, Yanan Wang wrote:
> > > > > > > > >              } else if (sockets == 0) {
> > > > > > > > >                  threads = threads > 0 ? threads : 1;
> > > > > > > > > -            sockets = cpus / (cores * threads);
> > > > > > > > > +            sockets = cpus / (clusters * cores * threads);
> > > > > > > > >                  sockets = sockets > 0 ? sockets : 1;
> > > > > > > > If we initialize clusters to zero instead of one and add lines in
> > > > > > > > 'cpus == 0 || cores == 0' and 'sockets == 0' like
> > > > > > > > 'clusters = clusters > 0 ? clusters : 1' as needed, then I think we can
> > > > > > > > add
> > > > > > > > 
> > > > > > > >      } else if (clusters == 0) {
> > > > > > > >          threads = threads > 0 ? threads : 1;
> > > > > > > >          clusters = cpus / (sockets * cores * thread);
> > > > > > > >          clusters = clusters > 0 ? clusters : 1;
> > > > > > > >      }
> > > > > > > > 
> > > > > > > > here.
> > > > > > > I have thought about this kind of format before, but there is a little bit
> > > > > > > difference between these two ways. Let's chose the better and more
> > > > > > > reasonable one of the two.
> > > > > > > 
> > > > > > > Way A currently in this patch:
> > > > > > > If value of clusters is not explicitly specified in -smp command line, we
> > > > > > > assume
> > > > > > > that users don't want to support clusters, for compatibility we initialized
> > > > > > > the
> > > > > > > value to 1. So that with cmdline "-smp cpus=24, sockets=2, cores=6", we will
> > > > > > > parse out the topology description like below:
> > > > > > > cpus=24, sockets=2, clusters=1, cores=6, threads=2
> > > > > > > 
> > > > > > > Way B that you suggested for this patch:
> > > > > > > Whether value of clusters is explicitly specified in -smp command line or
> > > > > > > not,
> > > > > > > we assume that clusters are supported and calculate the value. So that with
> > > > > > > cmdline "-smp cpus=24, sockets=2, cores=6", we will parse out the topology
> > > > > > > description like below:
> > > > > > > cpus =24, sockets=2, clusters=2, cores=6, threads=1
> > > > > > > 
> > > > > > > But I think maybe we should not assume too much about what users think
> > > > > > > through the -smp command line. We should just assume that all levels of
> > > > > > > cpu topology are supported and calculate them, and users should be more
> > > > > > > careful if they want to get the expected results with not so complete
> > > > > > > cmdline.
> > > > > > > If I'm right, then Way B should be better. :)
> > > > > > > 
> > > > > > Hi Yanan,
> > > > > > 
> > > > > > We're already assuming the user wants to describe clusters to the guest
> > > > > > because we require at least one per socket. If we want the user to have a
> > > > > > choice between using clusters or not, then I guess we need to change the
> > > > > > logic in the PPTT and the cpu-map to only generate the cluster level when
> > > > > > the number of clusters is not zero. And then change this parser to not
> > > > > > require clusters at all.
> > > > > Hi Drew,
> > > > > 
> > > > > I think this kind of change will introduce more complexity and actually is
> > > > > not necessary.
> > > > > Not generating cluster level at all and generating cluster level (one per
> > > > > socket) are same
> > > > > to kernel. Without cluster description provided, kernel will initialize all
> > > > > cores in the same
> > > > > cluster which also means one cluster per socket.
> > > > Which kernel? All kernels? One without the cluster support patches [1]?
> > > > 
> > > > [1] https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@hisilicon.com/#t
> > > I'm sorry, I didn't make it clear. :)
> > > I actually mean the ARM64 kernel, with or without [1].
> > > 
> > > Without [1]: Kernel doesn't care about cluster. When populating cpu
> > > topology, it directly
> > > finds the hierarchy node with "physical package flag" as package when
> > > parsing ACPI, and
> > > finds the core node's parent as package when parsing DT. So even we provide
> > > cluster level
> > > description (one per socket), the parsing results will be the same as not
> > > providing at all.
> > > 
> > > With [1]: Kernel finds the core hierarchy node's parent as cluster when
> > > parsing ACPI. So if
> > > we don't provide cluster level description, package will be taken as
> > > cluster. And if we provide
> > > the description (one per socket), the parsing result will also be the same.
> > > 
> > > That's why I said that we just need to provide description of cluster (one
> > > per socket) if we
> > > don't want to make use of it in VMs.
> > OK, that sounds good.
> > 
> > > [1] https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@hisilicon.com/#t
> > > > > So we should only ensure value of clusters per socket is one if we don't
> > > > > want to use clusters,
> > > > > and don't need to care about whether or not to generate description in PPTT
> > > > > and cpu-map.
> > > > > Is this right?
> > > > Depends on your answer to my 'which kernel' questions.
> > > > 
> > > > > > I'm not a big fan of these auto-calculated values either, but the
> > > > > > documentation says that it'll do that, and it's been done that way
> > > > > > forever, so I think we're stuck with it for the -smp option. Hmm, I was
> > > > > > just about to say that x86 computes all its values, but I see the most
> > > > > > recently added one, 'dies', is implemented the way you're proposing we
> > > > > > implement 'clusters', i.e. default to one and don't calculate it when it's
> > > > > > missing. I actually consider that either a documentation bug or an smp
> > > > > > parsing bug, though.
> > > > > My propose originally came from implementation of x86.
> > > > > > Another possible option, for Arm, because only the cpus and maxcpus
> > > > > > parameters of -smp have ever worked, is to document, for Arm, that if even
> > > > > > one parameter other than cpus or maxcpus is provided, then all parameters
> > > > > > must be provided. We can still decide if clusters=0 is valid, but we'll
> > > > > > enforce that everything is explicit and that the product (with or without
> > > > > > clusters) matches maxcpus.
> > > > > Requiring every parameter explicitly will be most stable but indeed strict.
> > > > > 
> > > > > Currently all the parsers use way B to calculate value of thread if it is
> > > > > not provided explicitly.
> > > > > So users should ensure the -smp cmdline they provided can result in that
> > > > > parsed threads will
> > > > > be 1 if they don't want to support multiple threads in one core.
> > > > > 
> > > > > Very similar to thread, users should also ensure the provided cmdline can
> > > > > result in that parsed
> > > > > clusters will be 1 if they don't want to support multiple clusters in one
> > > > > socket.
> > > > > 
> > > > > So I'm wondering if we can just add some commit in the documentation to tell
> > > > > users that they
> > > > > should ensure this if they don't want support it. And as for calculation of
> > > > > clusters, we follow
> > > > > the logic of other parameters as you suggested in way B.
> > > > > 
> > > > > Thanks,
> > > > > Yanan
> > > > > > Requiring every parameter might be stricter than necessary, though, I
> > > > > > think we're mostly concerned with cpus/maxcpus, sockets, and cores.
> > > > > > clusters can default to one or zero (whatever we choose and document),
> > > > > > threads can default to one, and cpus can default to maxcpus or maxcpus can
> > > > > > default to cpus, but at least one of those must be provided. And, if
> > > > > > sockets are provided, then cores must be provided and vice versa. If
> > > > > > neither sockets nor cores are provided, then nothing else besides cpus and
> > > > > > maxcpus may be provided, and that would mean to not generate any topology
> > > > > > descriptions for the guest.
> > > > I still don't know. I think I prefer making -smp more strict (even for
> > > > other architectures, but that's more difficult to do than for Arm.) What I
> > > > wrote above isn't that bad. We only require one of cpus or maxcpus (pretty
> > > > much everybody already does that anyway), and then, if given sockets
> > > > or cores, the other will also be required. I assume anybody who bothers to
> > > > specify one or the other would already specify both anyway.
> > > I agree to make -smp more strict. We want to expose the cpu topology
> > > information
> > > to guest kernel, so that it can take advantage of the information for better
> > > scheduling.
> > >  From this point of view, we hope the topology descriptions are accurately
> > > specified
> > > but not automatically populated.
> > > 
> > > But I think the requirement for ARM "if even one parameter other than cpus
> > > or maxcpus
> > > is provided then all parameters must be provided" will be better. This can
> > > ensure the
> > > whole accurate users-specified topology. As you mentioned, if anybody who
> > > bothers
> > > to specify one, why not also specify the others.
> > > 
> > > I can add the requirement for ARM in the documentation, and also check the
> > > parameters
> > > in virt_smp_parse. Will this be fine?
> > We sort of have to support command lines that are missing 'maxcpus' and
> > 'clusters', unless we work together with libvirt to make the change.
> > Currently libvirt will generate '-smp 16,sockets=16,cores=1,threads=1'
> > from '<vcpu placement='static'>16</vcpu>'.
> I see. And libvirt currently doesn't support cluster in xml, which means
> we can not generate complete cmdlines with cluster in it through
> <topology ...> specification in xml.
> > That's sufficient for our
> > stricter, but not completely strict requirements. And, I still think
> > 'threads' could be optional, because there's a good chance the user
> > doesn't want to describe them, so a default of 1 is good enough.
> So the parsing logic can be repeated like this:
> We require at least one of cpus and maxcpus specified explicitly, and
> default
> cluster/thread to 1 if not explicitly specified. And require both of sockets
> and
> cores if one of them is specified.
> 
> This is consistent with what you mentioned yesterday.
>

Yup, I think this should work for both compatibility concerns, concerns
with bad assumptions, and with supporting users which would like more
terse command lines.

Thanks,
drew



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-30  9:33                   ` wangyanan (Y)
@ 2021-04-30 10:49                     ` Andrew Jones
  2021-05-06  7:04                       ` wangyanan (Y)
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Jones @ 2021-04-30 10:49 UTC (permalink / raw)
  To: wangyanan (Y)
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

On Fri, Apr 30, 2021 at 05:33:42PM +0800, wangyanan (Y) wrote:
> 
> On 2021/4/30 15:01, Andrew Jones wrote:
> > On Fri, Apr 30, 2021 at 08:41:25AM +0200, Andrew Jones wrote:
> > > On Fri, Apr 30, 2021 at 01:09:00PM +0800, wangyanan (Y) wrote:
> > > > But I think the requirement for ARM "if even one parameter other than cpus
> > > > or maxcpus
> > > > is provided then all parameters must be provided" will be better. This can
> > > > ensure the
> > > > whole accurate users-specified topology. As you mentioned, if anybody who
> > > > bothers
> > > > to specify one, why not also specify the others.
> > > > 
> > > > I can add the requirement for ARM in the documentation, and also check the
> > > > parameters
> > > > in virt_smp_parse. Will this be fine?
> > > We sort of have to support command lines that are missing 'maxcpus' and
> > > 'clusters', unless we work together with libvirt to make the change.
> > > Currently libvirt will generate '-smp 16,sockets=16,cores=1,threads=1'
> > > from '<vcpu placement='static'>16</vcpu>'. That's sufficient for our
> > > stricter, but not completely strict requirements. And, I still think
> > > 'threads' could be optional, because there's a good chance the user
> > > doesn't want to describe them, so a default of 1 is good enough. Also,
> > > given maxcpus, but not cpus, it's pretty obvious that cpus should equal
> > > maxcpus.
> > > 
> > We also still need just 'cpus' or just 'maxcpus' to work, since that
> > already works now. So, at least these should work
> > 
> >   -smp N
> >   -smp maxcpus=N
> >   -smp N,maxcpus=M
> >   -smp N,sockets=N,cores=1,threads=1
> >   -smp N,maxcpus=M,sockets=M,cores=1,threads=1
> > 
> > since they work today, even though no topology is described.
> Yes. I forgot this point that we should consider the compat.
> > If we want to
> > describe a topology for the first three, then we'll have to pick one,
> > which brings us back to the sockets over cores stuff. Or, we could choose
> > to just not generate topology descriptions when none is provided.
> I find that both preferring cores over sockets and not generating topology
> descriptions
> when none is provided can't solve everything. We can only ensure guest will
> get one
> socket with multiple cores with qemu cmdline "-smp N".
> 
> But if we specify N cpus without any other parameters in libvirt xml, a qemu
> cmdline like
> "-smp N, sockets=N, cores=1,threads=1" will be generated, and as a result
> guest will get
> N sockets. In this case, we still can't avoid the silent change.
> 
> So maybe we should just prefer sockets over cores like the general
> smp_parse() and libvirt,
> and let users use "-smp N, sockets=1, cores=N,threads=1" to get what they
> want if they
> use version 6.0 or later.

Rather than doing the preference of sockets over cores thing, let's
require that sockets, cores, and one of maxcpus or cpus are provided,
as we discussed in the other mail.

Thanks,
drew



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse
  2021-04-30 10:49                     ` Andrew Jones
@ 2021-05-06  7:04                       ` wangyanan (Y)
  0 siblings, 0 replies; 22+ messages in thread
From: wangyanan (Y) @ 2021-05-06  7:04 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Barry Song, Peter Maydell, Michael S . Tsirkin, wanghaibin.wang,
	qemu-devel, yangyicong, Shannon Zhao, qemu-arm, Alistair Francis,
	prime.zeng, Igor Mammedov, yuzenghui, Paolo Bonzini, zhukeqian1,
	Jiajie Li, David Gibson

Hi Drew,

On 2021/4/30 18:49, Andrew Jones wrote:
> On Fri, Apr 30, 2021 at 05:33:42PM +0800, wangyanan (Y) wrote:
>> On 2021/4/30 15:01, Andrew Jones wrote:
>>> On Fri, Apr 30, 2021 at 08:41:25AM +0200, Andrew Jones wrote:
>>>> On Fri, Apr 30, 2021 at 01:09:00PM +0800, wangyanan (Y) wrote:
>>>>> But I think the requirement for ARM "if even one parameter other than cpus
>>>>> or maxcpus
>>>>> is provided then all parameters must be provided" will be better. This can
>>>>> ensure the
>>>>> whole accurate users-specified topology. As you mentioned, if anybody who
>>>>> bothers
>>>>> to specify one, why not also specify the others.
>>>>>
>>>>> I can add the requirement for ARM in the documentation, and also check the
>>>>> parameters
>>>>> in virt_smp_parse. Will this be fine?
>>>> We sort of have to support command lines that are missing 'maxcpus' and
>>>> 'clusters', unless we work together with libvirt to make the change.
>>>> Currently libvirt will generate '-smp 16,sockets=16,cores=1,threads=1'
>>>> from '<vcpu placement='static'>16</vcpu>'. That's sufficient for our
>>>> stricter, but not completely strict requirements. And, I still think
>>>> 'threads' could be optional, because there's a good chance the user
>>>> doesn't want to describe them, so a default of 1 is good enough. Also,
>>>> given maxcpus, but not cpus, it's pretty obvious that cpus should equal
>>>> maxcpus.
>>>>
>>> We also still need just 'cpus' or just 'maxcpus' to work, since that
>>> already works now. So, at least these should work
>>>
>>>    -smp N
>>>    -smp maxcpus=N
>>>    -smp N,maxcpus=M
>>>    -smp N,sockets=N,cores=1,threads=1
>>>    -smp N,maxcpus=M,sockets=M,cores=1,threads=1
>>>
>>> since they work today, even though no topology is described.
>> Yes. I forgot this point that we should consider the compat.
>>> If we want to
>>> describe a topology for the first three, then we'll have to pick one,
>>> which brings us back to the sockets over cores stuff. Or, we could choose
>>> to just not generate topology descriptions when none is provided.
>> I find that both preferring cores over sockets and not generating topology
>> descriptions
>> when none is provided can't solve everything. We can only ensure guest will
>> get one
>> socket with multiple cores with qemu cmdline "-smp N".
>>
>> But if we specify N cpus without any other parameters in libvirt xml, a qemu
>> cmdline like
>> "-smp N, sockets=N, cores=1,threads=1" will be generated, and as a result
>> guest will get
>> N sockets. In this case, we still can't avoid the silent change.
>>
>> So maybe we should just prefer sockets over cores like the general
>> smp_parse() and libvirt,
>> and let users use "-smp N, sockets=1, cores=N,threads=1" to get what they
>> want if they
>> use version 6.0 or later.
> Rather than doing the preference of sockets over cores thing, let's
> require that sockets, cores, and one of maxcpus or cpus are provided,
> as we discussed in the other mail.
Ok, I see.

Thanks for these discussions about parsing of -smp command line. :)
Now it seems that a separate virt_smp_parse() function for ARM is still 
necessary,
I will implement it in v3 of the other "Introduce cpu topology support" 
series.

Thanks,
Yanan
>
> Thanks,
> drew
>
> .


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-05-06  7:05 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-13  8:31 [RFC PATCH v2 0/4] hw/arm/virt: Introduce cluster cpu topology support Yanan Wang
2021-04-13  8:31 ` [RFC PATCH v2 1/4] vl.c: Add -smp, clusters=* command line support for ARM cpu Yanan Wang
2021-04-28 10:23   ` Andrew Jones
2021-04-29  1:22     ` wangyanan (Y)
2021-04-13  8:31 ` [RFC PATCH v2 2/4] hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse Yanan Wang
2021-04-28 10:31   ` Andrew Jones
2021-04-29  2:14     ` wangyanan (Y)
2021-04-29  7:16       ` Andrew Jones
2021-04-29  8:56         ` wangyanan (Y)
2021-04-29 11:02           ` Andrew Jones
2021-04-30  5:09             ` wangyanan (Y)
2021-04-30  6:41               ` Andrew Jones
2021-04-30  7:01                 ` Andrew Jones
2021-04-30  9:33                   ` wangyanan (Y)
2021-04-30 10:49                     ` Andrew Jones
2021-05-06  7:04                       ` wangyanan (Y)
2021-04-30  8:59                 ` wangyanan (Y)
2021-04-30 10:48                   ` Andrew Jones
2021-04-13  8:31 ` [RFC PATCH v2 3/4] hw/arm/virt-acpi-build: Add cluster level for PPTT table Yanan Wang
2021-04-28 10:41   ` Andrew Jones
2021-04-13  8:31 ` [RFC PATCH v2 4/4] hw/arm/virt: Add cluster level for device tree Yanan Wang
2021-04-28 10:46   ` Andrew Jones

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).