All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to node mapping
@ 2015-03-19 17:09 Igor Mammedov
  2015-03-19 17:09 ` [Qemu-devel] [PATCH v3 for-2.3 1/2] numa: introduce machine callback for VCPU " Igor Mammedov
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Igor Mammedov @ 2015-03-19 17:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, afaerber

Changes since v2:
 - fix spelling errors
 - split out PC hunks itno a separate patch

Igor Mammedov (2):
  numa: introduce machine callback for VCPU to node mapping
  pc: fix default VCPU to NUMA node mapping

 hw/i386/pc.c          |  9 +++++++++
 include/hw/boards.h   |  5 +++++
 include/sysemu/numa.h |  3 ++-
 numa.c                | 18 +++++++++++++-----
 vl.c                  |  2 +-
 5 files changed, 30 insertions(+), 7 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v3 for-2.3 1/2] numa: introduce machine callback for VCPU to node mapping
  2015-03-19 17:09 [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to node mapping Igor Mammedov
@ 2015-03-19 17:09 ` Igor Mammedov
  2015-03-19 17:14   ` Andreas Färber
  2015-03-19 17:09 ` [Qemu-devel] [PATCH v3 for-2.3 2/2] pc: fix default VCPU to NUMA " Igor Mammedov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Igor Mammedov @ 2015-03-19 17:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, afaerber

Current default round-robin way of distributing VCPUs among
NUMA nodes might be wrong in case on multi-core/threads
CPUs. Making guests confused wrt topology where cores from
the same socket are on different nodes.

Allow a machine to override default mapping by providing
 MachineClass->cpu_index_to_socket_id()
callback which would allow it group VCPUs from a socket
on the same NUMA node.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v3:
  - split out numa/machine change into a separate patch
---
 include/hw/boards.h   |  5 +++++
 include/sysemu/numa.h |  3 ++-
 numa.c                | 18 +++++++++++++-----
 vl.c                  |  2 +-
 4 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 1feea2b..78838d1 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -82,6 +82,10 @@ bool machine_mem_merge(MachineState *machine);
  *    of HotplugHandler object, which handles hotplug operation
  *    for a given @dev. It may return NULL if @dev doesn't require
  *    any actions to be performed by hotplug handler.
+ * @cpu_index_to_socket_id:
+ *    used to provide @cpu_index to socket number mapping, allowing
+ *    a machine to group CPU threads belonging to the same socket/package
+ *    Returns: socket number given cpu_index belongs to.
  */
 struct MachineClass {
     /*< private >*/
@@ -118,6 +122,7 @@ struct MachineClass {
 
     HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
                                            DeviceState *dev);
+    unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
 };
 
 /**
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 5633b85..6523b4d 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -6,6 +6,7 @@
 #include "qemu/option.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/hostmem.h"
+#include "hw/boards.h"
 
 extern int nb_numa_nodes;   /* Number of NUMA nodes */
 
@@ -16,7 +17,7 @@ typedef struct node_info {
     bool present;
 } NodeInfo;
 extern NodeInfo numa_info[MAX_NODES];
-void parse_numa_opts(void);
+void parse_numa_opts(MachineClass *mc);
 void numa_post_machine_init(void);
 void query_numa_node_mem(uint64_t node_mem[]);
 extern QemuOptsList qemu_numa_opts;
diff --git a/numa.c b/numa.c
index ffbec68..f1f571a 100644
--- a/numa.c
+++ b/numa.c
@@ -165,7 +165,7 @@ error:
     return -1;
 }
 
-void parse_numa_opts(void)
+void parse_numa_opts(MachineClass *mc)
 {
     int i;
 
@@ -233,13 +233,21 @@ void parse_numa_opts(void)
                 break;
             }
         }
-        /* assigning the VCPUs round-robin is easier to implement, guest OSes
-         * must cope with this anyway, because there are BIOSes out there in
-         * real machines which also use this scheme.
+        /* Historically VCPUs were assigned in round-robin order to NUMA
+         * nodes. However it causes issues with guest not handling it nice
+         * in case where cores/threads from a multicore CPU appear on
+         * different nodes. So allow boards to override default distribution
+         * rule grouping VCPUs by socket so that VCPUs from the same socket
+         * would be on the same node.
          */
         if (i == nb_numa_nodes) {
             for (i = 0; i < max_cpus; i++) {
-                set_bit(i, numa_info[i % nb_numa_nodes].node_cpu);
+                unsigned node_id = i % nb_numa_nodes;
+                if (mc->cpu_index_to_socket_id) {
+                    node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
+                }
+
+                set_bit(i, numa_info[node_id].node_cpu);
             }
         }
     }
diff --git a/vl.c b/vl.c
index 69617d6..75ec292 100644
--- a/vl.c
+++ b/vl.c
@@ -4170,7 +4170,7 @@ int main(int argc, char **argv, char **envp)
     default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
     default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
 
-    parse_numa_opts();
+    parse_numa_opts(machine_class);
 
     if (qemu_opts_foreach(qemu_find_opts("mon"), mon_init_func, NULL, 1) != 0) {
         exit(1);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v3 for-2.3 2/2] pc: fix default VCPU to NUMA node mapping
  2015-03-19 17:09 [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to node mapping Igor Mammedov
  2015-03-19 17:09 ` [Qemu-devel] [PATCH v3 for-2.3 1/2] numa: introduce machine callback for VCPU " Igor Mammedov
@ 2015-03-19 17:09 ` Igor Mammedov
  2015-03-19 17:13 ` [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to " Andreas Färber
  2015-03-20 10:24 ` Igor Mammedov
  3 siblings, 0 replies; 9+ messages in thread
From: Igor Mammedov @ 2015-03-19 17:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, afaerber

Since commit
   dd0247e0 pc: acpi: mark all possible CPUs as enabled in SRAT
Linux kernel actually tries to use CPU to Node mapping from
QEMU provided SRAT table instead of discarding it, and that
in some cases breaks build_sched_domains() which expects
sane mapping where cores/threads belonging to the same socket
are on the same NUMA node.

With current default round-robin mapping of VCPUs to nodes
guest ends-up with cores/threads belonging to the same socket
being on different NUMA nodes.

For example with following CLI:

   qemu-system-x86_64 -m 4G \
         -cpu Opteron_G3,vendor=AuthenticAMD \
         -smp 5,sockets=1,cores=4,threads=1,maxcpus=8 \
         -numa node,nodeid=0 -numa node,nodeid=1

2.6.32 based kernels will hang on boot due to incorrectly built
sched_group-s list in update_sd_lb_stats()

Replacing default mapping with a manual, where VCPUs belonging to
the same socket are on the same NUMA node, fixes the issue for
guests which can't handle nonsense topology i.e. changing CLI to:
  -numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7

So instead of simply scattering VCPUs around nodes, provide
callback to map the same socket VCPUs to the same NUMA node,
which is what guests would expect from a sane hardware/BIOS.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v3:
  - split out pc change into a separate patch
  - fix spelling mistakes
v2:
  - add machine callback cpu_index_to_socket_id() and use it
    instead of stub approach

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 hw/i386/pc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 4b46c29..a52d2af 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1851,6 +1851,14 @@ static void pc_machine_initfn(Object *obj)
                              NULL, NULL);
 }
 
+static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
+{
+    unsigned pkg_id, core_id, smt_id;
+    x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
+                          &pkg_id, &core_id, &smt_id);
+    return pkg_id;
+}
+
 static void pc_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -1859,6 +1867,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 
     pcmc->get_hotplug_handler = mc->get_hotplug_handler;
     mc->get_hotplug_handler = pc_get_hotpug_handler;
+    mc->cpu_index_to_socket_id = pc_cpu_index_to_socket_id;
     hc->plug = pc_machine_device_plug_cb;
     hc->unplug_request = pc_machine_device_unplug_request_cb;
     hc->unplug = pc_machine_device_unplug_cb;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to node mapping
  2015-03-19 17:09 [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to node mapping Igor Mammedov
  2015-03-19 17:09 ` [Qemu-devel] [PATCH v3 for-2.3 1/2] numa: introduce machine callback for VCPU " Igor Mammedov
  2015-03-19 17:09 ` [Qemu-devel] [PATCH v3 for-2.3 2/2] pc: fix default VCPU to NUMA " Igor Mammedov
@ 2015-03-19 17:13 ` Andreas Färber
  2015-03-19 17:44   ` Eduardo Habkost
  2015-03-20 10:24 ` Igor Mammedov
  3 siblings, 1 reply; 9+ messages in thread
From: Andreas Färber @ 2015-03-19 17:13 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel, ehabkost

Am 19.03.2015 um 18:09 schrieb Igor Mammedov:
> Changes since v2:
>  - fix spelling errors
>  - split out PC hunks itno a separate patch
> 
> Igor Mammedov (2):
>   numa: introduce machine callback for VCPU to node mapping
>   pc: fix default VCPU to NUMA node mapping

Reviewed-by: Andreas Färber <afaerber@suse.de>

Eduardo, are you planning to handle this?

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu,
Graham Norton; HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.3 1/2] numa: introduce machine callback for VCPU to node mapping
  2015-03-19 17:09 ` [Qemu-devel] [PATCH v3 for-2.3 1/2] numa: introduce machine callback for VCPU " Igor Mammedov
@ 2015-03-19 17:14   ` Andreas Färber
  0 siblings, 0 replies; 9+ messages in thread
From: Andreas Färber @ 2015-03-19 17:14 UTC (permalink / raw)
  To: Igor Mammedov, qemu-devel; +Cc: ehabkost

Am 19.03.2015 um 18:09 schrieb Igor Mammedov:
> Current default round-robin way of distributing VCPUs among
> NUMA nodes might be wrong in case on multi-core/threads
> CPUs. Making guests confused wrt topology where cores from
> the same socket are on different nodes.
> 
> Allow a machine to override default mapping by providing
>  MachineClass->cpu_index_to_socket_id()

MachineClass::cpu_index_to_socket_id() please - can be fixed when applying.

> callback which would allow it group VCPUs from a socket
> on the same NUMA node.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> v3:
>   - split out numa/machine change into a separate patch
> ---
>  include/hw/boards.h   |  5 +++++
>  include/sysemu/numa.h |  3 ++-
>  numa.c                | 18 +++++++++++++-----
>  vl.c                  |  2 +-
>  4 files changed, 21 insertions(+), 7 deletions(-)

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu,
Graham Norton; HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to node mapping
  2015-03-19 17:13 ` [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to " Andreas Färber
@ 2015-03-19 17:44   ` Eduardo Habkost
  0 siblings, 0 replies; 9+ messages in thread
From: Eduardo Habkost @ 2015-03-19 17:44 UTC (permalink / raw)
  To: Andreas Färber; +Cc: Igor Mammedov, qemu-devel

On Thu, Mar 19, 2015 at 06:13:42PM +0100, Andreas Färber wrote:
> Am 19.03.2015 um 18:09 schrieb Igor Mammedov:
> > Changes since v2:
> >  - fix spelling errors
> >  - split out PC hunks itno a separate patch
> > 
> > Igor Mammedov (2):
> >   numa: introduce machine callback for VCPU to node mapping
> >   pc: fix default VCPU to NUMA node mapping
> 
> Reviewed-by: Andreas Färber <afaerber@suse.de>

Thanks!

> 
> Eduardo, are you planning to handle this?

Yes.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to node mapping
  2015-03-19 17:09 [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to node mapping Igor Mammedov
                   ` (2 preceding siblings ...)
  2015-03-19 17:13 ` [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to " Andreas Färber
@ 2015-03-20 10:24 ` Igor Mammedov
  2015-03-20 14:52   ` Eduardo Habkost
  3 siblings, 1 reply; 9+ messages in thread
From: Igor Mammedov @ 2015-03-20 10:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ehabkost, qemu-stable

On Thu, 19 Mar 2015 17:09:20 +0000
Igor Mammedov <imammedo@redhat.com> wrote:

> Changes since v2:
>  - fix spelling errors
>  - split out PC hunks itno a separate patch
> 
> Igor Mammedov (2):
>   numa: introduce machine callback for VCPU to node mapping
>   pc: fix default VCPU to NUMA node mapping
> 
>  hw/i386/pc.c          |  9 +++++++++
>  include/hw/boards.h   |  5 +++++
>  include/sysemu/numa.h |  3 ++-
>  numa.c                | 18 +++++++++++++-----
>  vl.c                  |  2 +-
>  5 files changed, 30 insertions(+), 7 deletions(-)
> 

CCing stable which I've forgotten to do earlier

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to node mapping
  2015-03-20 10:24 ` Igor Mammedov
@ 2015-03-20 14:52   ` Eduardo Habkost
  2015-03-20 15:01     ` Igor Mammedov
  0 siblings, 1 reply; 9+ messages in thread
From: Eduardo Habkost @ 2015-03-20 14:52 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, qemu-stable

On Fri, Mar 20, 2015 at 11:24:14AM +0100, Igor Mammedov wrote:
> On Thu, 19 Mar 2015 17:09:20 +0000
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > Changes since v2:
> >  - fix spelling errors
> >  - split out PC hunks itno a separate patch
> > 
> > Igor Mammedov (2):
> >   numa: introduce machine callback for VCPU to node mapping
> >   pc: fix default VCPU to NUMA node mapping
> > 
> >  hw/i386/pc.c          |  9 +++++++++
> >  include/hw/boards.h   |  5 +++++
> >  include/sysemu/numa.h |  3 ++-
> >  numa.c                | 18 +++++++++++++-----
> >  vl.c                  |  2 +-
> >  5 files changed, 30 insertions(+), 7 deletions(-)
> > 
> 
> CCing stable which I've forgotten to do earlier

Why stable? This can be easily worked around by explicitly setting the
CPUs for each node. I see this as an usbility improvement, not a bug
fix. And it breaks compatibility.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to node mapping
  2015-03-20 14:52   ` Eduardo Habkost
@ 2015-03-20 15:01     ` Igor Mammedov
  0 siblings, 0 replies; 9+ messages in thread
From: Igor Mammedov @ 2015-03-20 15:01 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: qemu-devel, qemu-stable

On Fri, 20 Mar 2015 11:52:46 -0300
Eduardo Habkost <ehabkost@redhat.com> wrote:

> On Fri, Mar 20, 2015 at 11:24:14AM +0100, Igor Mammedov wrote:
> > On Thu, 19 Mar 2015 17:09:20 +0000
> > Igor Mammedov <imammedo@redhat.com> wrote:
> > 
> > > Changes since v2:
> > >  - fix spelling errors
> > >  - split out PC hunks itno a separate patch
> > > 
> > > Igor Mammedov (2):
> > >   numa: introduce machine callback for VCPU to node mapping
> > >   pc: fix default VCPU to NUMA node mapping
> > > 
> > >  hw/i386/pc.c          |  9 +++++++++
> > >  include/hw/boards.h   |  5 +++++
> > >  include/sysemu/numa.h |  3 ++-
> > >  numa.c                | 18 +++++++++++++-----
> > >  vl.c                  |  2 +-
> > >  5 files changed, 30 insertions(+), 7 deletions(-)
> > > 
> > 
> > CCing stable which I've forgotten to do earlier
> 
> Why stable? This can be easily worked around by explicitly setting the
> CPUs for each node. I see this as an usbility improvement, not a bug
> fix. And it breaks compatibility.
For me if guest hangs due to incorrect topology provided by QEMU, it's a bug.
I'm not sure about what compatibility it breaks though,
but I won't insist on making downstream (2.2 based) live easier
in this case since there is workaround.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-03-20 15:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-19 17:09 [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to node mapping Igor Mammedov
2015-03-19 17:09 ` [Qemu-devel] [PATCH v3 for-2.3 1/2] numa: introduce machine callback for VCPU " Igor Mammedov
2015-03-19 17:14   ` Andreas Färber
2015-03-19 17:09 ` [Qemu-devel] [PATCH v3 for-2.3 2/2] pc: fix default VCPU to NUMA " Igor Mammedov
2015-03-19 17:13 ` [Qemu-devel] [PATCH v3 for-2.3 0/2] numa: Fix default VCPUs to " Andreas Färber
2015-03-19 17:44   ` Eduardo Habkost
2015-03-20 10:24 ` Igor Mammedov
2015-03-20 14:52   ` Eduardo Habkost
2015-03-20 15:01     ` Igor Mammedov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.