All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-devel@nongnu.org, Eduardo Habkost <ehabkost@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Andrew Jones <drjones@redhat.com>, Eric Blake <eblake@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Shannon Zhao <zhaoshenglong@huawei.com>,
	qemu-arm@nongnu.org, qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards
Date: Tue, 28 Mar 2017 15:19:20 +1100	[thread overview]
Message-ID: <20170328041920.GC21068@umbus.fritz.box> (raw)
In-Reply-To: <1490189568-167621-6-git-send-email-imammedo@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 13096 bytes --]

On Wed, Mar 22, 2017 at 02:32:30PM +0100, Igor Mammedov wrote:
> Originally CPU threads were by default assigned in
> round-robin fashion. However it was causing issues in
> guest since CPU threads from the same socket/core could
> be placed on different NUMA nodes.
> Commit fb43b73b (pc: fix default VCPU to NUMA node mapping)
> fixed it by grouping threads within a socket on the same node
> introducing cpu_index_to_socket_id() callback and commit
> 20bb648d (spapr: Fix default NUMA node allocation for threads)
> reused callback to fix similar issues for SPAPR machine
> even though socket doesn't make much sense there.
> 
> As result QEMU ended up having 3 default distribution rules
> used by 3 targets /virt-arm, spapr, pc/.
> 
> In effort of moving NUMA mapping for CPUs into possible_cpus,
> generalize default mapping in numa.c by making boards decide
> on default mapping and let them explicitly tell generic
> numa code to which node a CPU thread belongs to by replacing
> cpu_index_to_socket_id() with @cpu_index_to_instance_props()
> which provides default node_id assigned by board to specified
> cpu_index.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> Patch only moves source of default mapping to possible_cpus[]
> and leaves the rest of NUMA handling to numa_info[node_id].node_cpu
> bitmaps. It's up to follow up patches to replace bitmaps
> with possible_cpus[] internally.
> ---
>  include/hw/boards.h   |  8 ++++++--
>  include/sysemu/numa.h |  2 +-
>  hw/arm/virt.c         | 19 +++++++++++++++++--
>  hw/i386/pc.c          | 22 ++++++++++++++++------
>  hw/ppc/spapr.c        | 27 ++++++++++++++++++++-------
>  numa.c                | 15 +++++++++------
>  vl.c                  |  2 +-
>  7 files changed, 70 insertions(+), 25 deletions(-)
> 
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 269d0ba..1dd0fde 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -74,7 +74,10 @@ typedef struct {
>   *    of HotplugHandler object, which handles hotplug operation
>   *    for a given @dev. It may return NULL if @dev doesn't require
>   *    any actions to be performed by hotplug handler.
> - * @cpu_index_to_socket_id:
> + * @cpu_index_to_instance_props:
> + *    used to provide @cpu_index to socket/core/thread number mapping, allowing
> + *    legacy code to perform maping from cpu_index to topology properties
> + *    Returns: tuple of socket/core/thread ids given cpu_index belongs to.
>   *    used to provide @cpu_index to socket number mapping, allowing
>   *    a machine to group CPU threads belonging to the same socket/package
>   *    Returns: socket number given cpu_index belongs to.
> @@ -138,7 +141,8 @@ struct MachineClass {
>  
>      HotplugHandler *(*get_hotplug_handler)(MachineState *machine,
>                                             DeviceState *dev);
> -    unsigned (*cpu_index_to_socket_id)(unsigned cpu_index);
> +    CpuInstanceProperties (*cpu_index_to_instance_props)(MachineState *machine,
> +                                                         unsigned cpu_index);
>      const CPUArchIdList *(*possible_cpu_arch_ids)(MachineState *machine);
>  };
>  
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 8f09dcf..46ea6c7 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -24,7 +24,7 @@ typedef struct node_info {
>  } NodeInfo;
>  
>  extern NodeInfo numa_info[MAX_NODES];
> -void parse_numa_opts(MachineClass *mc);
> +void parse_numa_opts(MachineState *ms);
>  void numa_post_machine_init(void);
>  void query_numa_node_mem(uint64_t node_mem[]);
>  extern QemuOptsList qemu_numa_opts;
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 0cbcbc1..8748d25 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1554,6 +1554,16 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
>      }
>  }
>  
> +static CpuInstanceProperties
> +virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
> +{
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;;
> +}
> +

It seems a bit weird to have a machine specific hook to pull the
property information when one way or another it's coming from the
possible_cpus table, which is already constructed by a machine
specific hook.  Could we add a range or list of cpu_index values to
each possible_cpus entry instead, and have a generic lookup of the
right entry based on that?


>  static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>  {
>      int n;
> @@ -1573,8 +1583,12 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>          ms->possible_cpus->cpus[n].props.has_thread_id = true;
>          ms->possible_cpus->cpus[n].props.thread_id = n;
>  
> -        /* TODO: add 'has_node/node' here to describe
> -           to which node core belongs */
> +        /* default distribution of CPUs over NUMA nodes */
> +        if (nb_numa_nodes) {
> +            /* preset values but do not enable them i.e. 'has_node_id = false',
> +             * board will enable them if manual mapping wasn't present on CLI */

I'm a little confused by this comment, since I don't see any board
code altering has_node_id.

> +            ms->possible_cpus->cpus[n].props.node_id = n % nb_numa_nodes;;
> +        }
>      }
>      return ms->possible_cpus;
>  }
> @@ -1596,6 +1610,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>      /* We know we will never create a pre-ARMv7 CPU which needs 1K pages */
>      mc->minimum_page_bits = 12;
>      mc->possible_cpu_arch_ids = virt_possible_cpu_arch_ids;
> +    mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
>  }
>  
>  static const TypeInfo virt_machine_info = {
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index d24388e..7031100 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -2245,12 +2245,14 @@ static void pc_machine_reset(void)
>      }
>  }
>  
> -static unsigned pc_cpu_index_to_socket_id(unsigned cpu_index)
> +static CpuInstanceProperties
> +pc_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>  {
> -    X86CPUTopoInfo topo;
> -    x86_topo_ids_from_idx(smp_cores, smp_threads, cpu_index,
> -                          &topo);
> -    return topo.pkg_id;
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
> +    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
> +
> +    assert(cpu_index < possible_cpus->len);
> +    return possible_cpus->cpus[cpu_index].props;;

Since the pc and arm version of this are basically identical, I wonder
if that should actually be the default implementation.  If we need it
at all.

>  }
>  
>  static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
> @@ -2282,6 +2284,14 @@ static const CPUArchIdList *pc_possible_cpu_arch_ids(MachineState *ms)
>          ms->possible_cpus->cpus[i].props.core_id = topo.core_id;
>          ms->possible_cpus->cpus[i].props.has_thread_id = true;
>          ms->possible_cpus->cpus[i].props.thread_id = topo.smt_id;
> +
> +        /* default distribution of CPUs over NUMA nodes */
> +        if (nb_numa_nodes) {
> +            /* preset values but do not enable them i.e. 'has_node_id = false',
> +             * board will enable them if manual mapping wasn't present on CLI */
> +            ms->possible_cpus->cpus[i].props.node_id =
> +                topo.pkg_id % nb_numa_nodes;
> +        }
>      }
>      return ms->possible_cpus;
>  }
> @@ -2324,7 +2334,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
>      pcmc->acpi_data_size = 0x20000 + 0x8000;
>      pcmc->save_tsc_khz = true;
>      mc->get_hotplug_handler = pc_get_hotpug_handler;
> -    mc->cpu_index_to_socket_id = pc_cpu_index_to_socket_id;
> +    mc->cpu_index_to_instance_props = pc_cpu_index_to_props;
>      mc->possible_cpu_arch_ids = pc_possible_cpu_arch_ids;
>      mc->has_hotpluggable_cpus = true;
>      mc->default_boot_order = "cad";
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 6ee566d..9dcbbcc 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2921,11 +2921,18 @@ static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
>      return NULL;
>  }
>  
> -static unsigned spapr_cpu_index_to_socket_id(unsigned cpu_index)
> +static CpuInstanceProperties
> +spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
>  {
> -    /* Allocate to NUMA nodes on a "socket" basis (not that concept of
> -     * socket means much for the paravirtualized PAPR platform) */
> -    return cpu_index / smp_threads / smp_cores;
> +    CPUArchId *core_slot;
> +    MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    int core_id = cpu_index / smp_threads * smp_threads;

I don't think you need this.  AIUI the purpose of
spapr_find_cpu_slot() is that it already finds the right CPU slot from
a cpu_index, so you can just pass the cpu_index directly.

> +
> +    /* make sure possible_cpu are intialized */
> +    mc->possible_cpu_arch_ids(machine);
> +    core_slot = spapr_find_cpu_slot(machine, core_id, NULL);
> +    assert(core_slot);
> +    return core_slot->props;
>  }
>  
>  static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
> @@ -2952,8 +2959,14 @@ static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
>          machine->possible_cpus->cpus[i].arch_id = core_id;
>          machine->possible_cpus->cpus[i].props.has_core_id = true;
>          machine->possible_cpus->cpus[i].props.core_id = core_id;
> -        /* TODO: add 'has_node/node' here to describe
> -           to which node core belongs */
> +
> +        /* default distribution of CPUs over NUMA nodes */
> +        if (nb_numa_nodes) {
> +            /* preset values but do not enable them i.e. 'has_node_id = false',
> +             * board will enable them if manual mapping wasn't present on CLI */
> +            machine->possible_cpus->cpus[i].props.node_id =
> +                core_id / smp_threads / smp_cores % nb_numa_nodes;
> +        }
>      }
>      return machine->possible_cpus;
>  }
> @@ -3076,7 +3089,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      hc->pre_plug = spapr_machine_device_pre_plug;
>      hc->plug = spapr_machine_device_plug;
>      hc->unplug = spapr_machine_device_unplug;
> -    mc->cpu_index_to_socket_id = spapr_cpu_index_to_socket_id;
> +    mc->cpu_index_to_instance_props = spapr_cpu_index_to_props;
>      mc->possible_cpu_arch_ids = spapr_possible_cpu_arch_ids;
>      hc->unplug_request = spapr_machine_device_unplug_request;
>  
> diff --git a/numa.c b/numa.c
> index e01cb54..b6e71bc 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -294,9 +294,10 @@ static void validate_numa_cpus(void)
>      g_free(seen_cpus);
>  }
>  
> -void parse_numa_opts(MachineClass *mc)
> +void parse_numa_opts(MachineState *ms)
>  {
>      int i;
> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
>  
>      for (i = 0; i < MAX_NODES; i++) {
>          numa_info[i].node_cpu = bitmap_new(max_cpus);
> @@ -378,14 +379,16 @@ void parse_numa_opts(MachineClass *mc)
>           * rule grouping VCPUs by socket so that VCPUs from the same socket
>           * would be on the same node.
>           */
> +        if (!mc->cpu_index_to_instance_props) {
> +            error_report("default CPUs to NUMA node mapping isn't supported");
> +            exit(1);
> +        }
>          if (i == nb_numa_nodes) {
>              for (i = 0; i < max_cpus; i++) {
> -                unsigned node_id = i % nb_numa_nodes;
> -                if (mc->cpu_index_to_socket_id) {
> -                    node_id = mc->cpu_index_to_socket_id(i) % nb_numa_nodes;
> -                }
> +                CpuInstanceProperties props;
> +                props = mc->cpu_index_to_instance_props(ms, i);
>  
> -                set_bit(i, numa_info[node_id].node_cpu);
> +                set_bit(i, numa_info[props.node_id].node_cpu);
>              }
>          }
>  
> diff --git a/vl.c b/vl.c
> index 0b4ed52..5ffb9c3 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4498,7 +4498,7 @@ int main(int argc, char **argv, char **envp)
>      default_drive(default_floppy, snapshot, IF_FLOPPY, 0, FD_OPTS);
>      default_drive(default_sdcard, snapshot, IF_SD, 0, SD_OPTS);
>  
> -    parse_numa_opts(machine_class);
> +    parse_numa_opts(current_machine);
>  
>      if (qemu_opts_foreach(qemu_find_opts("mon"),
>                            mon_init_func, NULL, NULL)) {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  parent reply	other threads:[~2017-03-28  4:24 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-22 13:32 [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 01/23] tests: add CPUs to numa node mapping test Igor Mammedov
2017-03-27  0:31   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 02/23] hw/arm/virt: extract mp-affinity calculation in separate function Igor Mammedov
2017-04-25 14:09   ` Andrew Jones
2017-04-25 14:39     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 03/23] hw/arm/virt: use machine->possible_cpus for storing possible topology info Igor Mammedov
2017-04-25 14:28   ` Andrew Jones
2017-04-25 14:36     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 04/23] hw/arm/virt: explicitly allocate cpu_index for cpus Igor Mammedov
2017-04-25 14:33   ` Andrew Jones
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 05/23] numa: move source of default CPUs to NUMA node mapping into boards Igor Mammedov
2017-03-23  6:10   ` Bharata B Rao
2017-03-23  8:48     ` Igor Mammedov
2017-03-28  4:19   ` David Gibson [this message]
2017-03-28 10:53     ` Igor Mammedov
2017-03-29  2:24       ` David Gibson
2017-03-29 11:48         ` Igor Mammedov
2017-04-20 14:29     ` Igor Mammedov
2017-04-25 14:48   ` Andrew Jones
2017-04-25 15:07     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 06/23] spapr: add node-id property to sPAPR core Igor Mammedov
2017-03-28  4:23   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 07/23] pc: add node-id property to CPU Igor Mammedov
2017-04-12 21:02   ` Eduardo Habkost
2017-04-19 11:14     ` Igor Mammedov
2017-04-26 12:21       ` Eduardo Habkost
2017-04-27 13:14         ` Igor Mammedov
2017-04-27 16:32           ` Eduardo Habkost
2017-04-27 17:25             ` Igor Mammedov
2017-04-27 17:32               ` Eduardo Habkost
2017-05-02  4:27             ` David Gibson
2017-05-02  8:28               ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 08/23] virt-arm: " Igor Mammedov
2017-04-25 17:16   ` Andrew Jones
2017-04-26 10:47     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 09/23] numa: add check that board supports cpu_index to node mapping Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 10/23] numa: mirror cpu to node mapping in MachineState::possible_cpus Igor Mammedov
2017-03-28  4:44   ` David Gibson
2017-04-12 21:15   ` Eduardo Habkost
2017-04-19  9:52     ` Igor Mammedov
2017-04-26 11:04       ` Eduardo Habkost
2017-04-13 13:58   ` Eduardo Habkost
2017-04-19  9:31     ` Igor Mammedov
2017-04-26 11:02       ` Eduardo Habkost
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 11/23] numa: do default mapping based on possible_cpus instead of node_cpu bitmaps Igor Mammedov
2017-03-28  4:46   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 12/23] pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 13/23] spapr: " Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 14/23] virt-arm: " Igor Mammedov
2017-04-25 17:06   ` Andrew Jones
2017-04-26 10:54     ` Igor Mammedov
2017-04-26 11:27       ` Andrew Jones
2017-04-27 13:24         ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 15/23] QMP: include CpuInstanceProperties into query_cpus output output Igor Mammedov
2017-03-23 13:19   ` Eric Blake
2017-03-24 12:20     ` Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 16/23] tests: numa: add case for QMP command query-cpus Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 17/23] numa: remove no longer used numa_get_node_for_cpu() Igor Mammedov
2017-03-28  4:54   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 18/23] numa: remove no longer need numa_post_machine_init() Igor Mammedov
2017-03-28  4:55   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 19/23] machine: call machine init from wrapper Igor Mammedov
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 20/23] numa: use possible_cpus for not mapped CPUs check Igor Mammedov
2017-03-28  5:13   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 21/23] numa: remove node_cpu bitmaps as they are no longer used Igor Mammedov
2017-03-28  5:13   ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 22/23] numa: add '-numa cpu, ...' option for property based node mapping Igor Mammedov
2017-03-23 13:23   ` Eric Blake
2017-03-24 13:29     ` Igor Mammedov
2017-03-28  5:16   ` David Gibson
2017-03-28 11:09     ` Igor Mammedov
2017-03-29  2:27       ` David Gibson
2017-03-29 12:08         ` Igor Mammedov
2017-04-03  4:40           ` David Gibson
2017-03-22 13:32 ` [Qemu-devel] [PATCH for-2.10 23/23] tests: check -numa node, cpu=props_list usecase Igor Mammedov
2017-04-12 20:18 ` [Qemu-devel] [PATCH for-2.10 00/23] numa: add '-numa cpu' option Eduardo Habkost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170328041920.GC21068@umbus.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=drjones@redhat.com \
    --cc=eblake@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=zhaoshenglong@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.