QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Babu Moger <babu.moger@amd.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: "mst@redhat.com" <mst@redhat.com>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"imammedo@redhat.com" <imammedo@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [RFC 2 PATCH 01/16] numa: Split the numa functionality
Date: Mon, 2 Dec 2019 14:18:36 -0600
Message-ID: <a1a0328b-4adf-eeaf-bab4-c02e430cd44a@amd.com> (raw)
In-Reply-To: <20191010032538.GB21666@habkost.net>


Eduardo,
Sorry taking time to respond to your comments earlier. Started working on
something else so it took a while to come back.

Working on v3 right now and taken care most of your comments on v2. Will
plan to submit v3 sometime this week.

On 10/9/19 10:25 PM, Eduardo Habkost wrote:
> Hi,
> 
> First of all, sorry for taking more than a month to start
> reviewing this.
> 
> On Fri, Sep 06, 2019 at 07:11:43PM +0000, Moger, Babu wrote:
>> To support new epyc mode, we need to know the number of numa nodes
>> in advance to generate apic id correctly. [...]
> 
> This explains that we need to initialize numa_info earlier than
> something...
> 
>>                                     [...] So, split the numa
>> initialization into two. The function parse_numa initializes numa_info
>> and updates nb_numa_nodes. And then parse_numa_node does the numa node
>> initialization.
> 
> ...but I miss what "something" is.
> 
> The sequence of events here will be:
> 
> * parse_numa_opts()
>   * for each -numa option:
>     * parse_numa()
>       * set_numa_options()
>         * parse_numa_info()
>           * here ms->numa_state->num_nodes is incremented [1]
> * parse_numa_node_opts()
>   * for each -numa option:
>     * parse_numa_node()
>       * set_numa_node_options()
>         * here are the operations that are being delayed by this
>           patch [2]
> 
> What exactly makes it necessary for [2] to happen after [1] is
> done for all NUMA nodes?

We dont need these changes anymore. Look at my comments below.

> 
> This needs to be clear in the code, otherwise somebody will try to refactor
> this in the future and merge set_numa_node_options() and parse_numa_info()
> again, not knowing why ordering between [1] and [2] is so important.
> 
> In addition to documenting it better, I suggest saving the CPU
> index list in NodeInfo, instead of calling qemu_opts_foreach()
> twice.  (Probably a good idea to document the new field as
> internal, though.  We don't want machine-specific code to be
> looking at the CPU index list.)
> 
> Also, would it work if the delayed initialization is done at
> numa_complete_configuration() instead of a new
> parse_numa_node_opts() function?  We already have 2 separate
> steps in NUMA initialization (parse_numa_node() and
> numa_complete_configuration()), so it would be nice to avoid
> adding a 3rd one.
> 
> Putting all the suggestions together, the code would look like this:
> 
> 
> static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>                             Error **errp)
> {
>     /* ... */
>     numa_info[nodenr].cpu_indexes = QAPI_CLONE(node->cpus, uint16List);
>     /* ... */
> }
> 
> void numa_complete_configuration(MachineState *ms)
> {
>     /* ... */
>     for (i = 0; i < ms->numa_state->num_nodes; i++) {
>         /*
>          * numa_node_complete_configuration() needs to be called after all
>          * nodes were already parsed, because <insert reason here>,
>          */
>         numa_node_complete_configuration(numa_info[i]);
>     }
> }
> 
> void numa_node_complete_configuration(MachineState *ms, NodeInfo *node)
> {
>     for (cpu_index = node->cpu_indexes; cpu_index; cpu_index = cpu_index->next) {
>         CpuInstanceProperties props;
>         props = mc->cpu_index_to_instance_props(ms, cpu_index->value);
>         props.node_id = nodenr;
>         props.has_node_id = true;
>         machine_set_cpu_numa_node(ms, &props, &err);
>     }
> }

Yes. I this works fine. Also makes the code simple. Only requirement was
to know the number of numa nodes in advance. Thanks

> 
> 
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>>  hw/core/numa.c        |  106 +++++++++++++++++++++++++++++++++++--------------
>>  include/sysemu/numa.h |    2 +
>>  vl.c                  |    2 +
>>  3 files changed, 80 insertions(+), 30 deletions(-)
>>
>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>> index a11431483c..27fa6b5e1d 100644
>> --- a/hw/core/numa.c
>> +++ b/hw/core/numa.c
>> @@ -55,14 +55,10 @@ bool have_numa_distance;
>>  NodeInfo numa_info[MAX_NODES];
>>  
>>  
>> -static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>> +static void parse_numa_info(MachineState *ms, NumaNodeOptions *node,
>>                              Error **errp)
>>  {
>> -    Error *err = NULL;
>>      uint16_t nodenr;
>> -    uint16List *cpus = NULL;
>> -    MachineClass *mc = MACHINE_GET_CLASS(ms);
>> -    unsigned int max_cpus = ms->smp.max_cpus;
>>  
>>      if (node->has_nodeid) {
>>          nodenr = node->nodeid;
>> @@ -81,29 +77,6 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>>          return;
>>      }
>>  
>> -    if (!mc->cpu_index_to_instance_props || !mc->get_default_cpu_node_id) {
>> -        error_setg(errp, "NUMA is not supported by this machine-type");
>> -        return;
>> -    }
>> -    for (cpus = node->cpus; cpus; cpus = cpus->next) {
>> -        CpuInstanceProperties props;
>> -        if (cpus->value >= max_cpus) {
>> -            error_setg(errp,
>> -                       "CPU index (%" PRIu16 ")"
>> -                       " should be smaller than maxcpus (%d)",
>> -                       cpus->value, max_cpus);
>> -            return;
>> -        }
>> -        props = mc->cpu_index_to_instance_props(ms, cpus->value);
>> -        props.node_id = nodenr;
>> -        props.has_node_id = true;
>> -        machine_set_cpu_numa_node(ms, &props, &err);
>> -        if (err) {
>> -            error_propagate(errp, err);
>> -            return;
>> -        }
>> -    }
>> -
>>      have_memdevs = have_memdevs ? : node->has_memdev;
>>      have_mem = have_mem ? : node->has_mem;
>>      if ((node->has_mem && have_memdevs) || (node->has_memdev && have_mem)) {
>> @@ -177,7 +150,7 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>>  
>>      switch (object->type) {
>>      case NUMA_OPTIONS_TYPE_NODE:
>> -        parse_numa_node(ms, &object->u.node, &err);
>> +        parse_numa_info(ms, &object->u.node, &err);
>>          if (err) {
>>              goto end;
>>          }
>> @@ -242,6 +215,73 @@ end:
>>      return 0;
>>  }
>>  
>> +void set_numa_node_options(MachineState *ms, NumaOptions *object, Error **errp)
>> +{
>> +    MachineClass *mc = MACHINE_GET_CLASS(ms);
>> +    NumaNodeOptions *node = &object->u.node;
>> +    unsigned int max_cpus = ms->smp.max_cpus;
>> +    uint16List *cpus = NULL;
>> +    Error *err = NULL;
>> +    uint16_t nodenr;
>> +
>> +    if (node->has_nodeid) {
>> +        nodenr = node->nodeid;
>> +    } else {
>> +        error_setg(errp, "NUMA node information is not available");
>> +    }
>> +
>> +    if (!mc->cpu_index_to_instance_props || !mc->get_default_cpu_node_id) {
>> +        error_setg(errp, "NUMA is not supported by this machine-type");
>> +        return;
>> +    }
>> +
>> +    for (cpus = node->cpus; cpus; cpus = cpus->next) {
>> +        CpuInstanceProperties props;
>> +        if (cpus->value >= max_cpus) {
>> +            error_setg(errp,
>> +                       "CPU index (%" PRIu16 ")"
>> +                       " should be smaller than maxcpus (%d)",
>> +                       cpus->value, max_cpus);
>> +            return;
>> +         }
>> +         props = mc->cpu_index_to_instance_props(ms, cpus->value);
>> +         props.node_id = nodenr;
>> +         props.has_node_id = true;
>> +         machine_set_cpu_numa_node(ms, &props, &err);
>> +         if (err) {
>> +            error_propagate(errp, err);
>> +            return;
>> +         }
>> +    }
>> +}
>> +
>> +static int parse_numa_node(void *opaque, QemuOpts *opts, Error **errp)
>> +{
>> +    NumaOptions *object = NULL;
>> +    MachineState *ms = MACHINE(opaque);
>> +    Error *err = NULL;
>> +    Visitor *v = opts_visitor_new(opts);
>> +
>> +    visit_type_NumaOptions(v, NULL, &object, &err);
>> +    visit_free(v);
>> +    if (err) {
>> +        goto end;
>> +    }
>> +
>> +    if (object->type == NUMA_OPTIONS_TYPE_NODE) {
>> +        set_numa_node_options(ms, object, &err);
>> +    }
>> +
>> +end:
>> +    qapi_free_NumaOptions(object);
>> +    if (err) {
>> +        error_propagate(errp, err);
>> +        return -1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>  /* If all node pair distances are symmetric, then only distances
>>   * in one direction are enough. If there is even one asymmetric
>>   * pair, though, then all distances must be provided. The
>> @@ -368,7 +408,7 @@ void numa_complete_configuration(MachineState *ms)
>>      if (ms->ram_slots > 0 && nb_numa_nodes == 0 &&
>>          mc->auto_enable_numa_with_memhp) {
>>              NumaNodeOptions node = { };
>> -            parse_numa_node(ms, &node, &error_abort);
>> +            parse_numa_info(ms, &node, &error_abort);
>>      }
>>  
>>      assert(max_numa_nodeid <= MAX_NODES);
>> @@ -448,6 +488,12 @@ void parse_numa_opts(MachineState *ms)
>>      qemu_opts_foreach(qemu_find_opts("numa"), parse_numa, ms, &error_fatal);
>>  }
>>  
>> +void parse_numa_node_opts(MachineState *ms)
>> +{
>> +    qemu_opts_foreach(qemu_find_opts("numa"), parse_numa_node,
>> +                      ms, &error_fatal);
>> +}
>> +
>>  void numa_cpu_pre_plug(const CPUArchId *slot, DeviceState *dev, Error **errp)
>>  {
>>      int node_id = object_property_get_int(OBJECT(dev), "node-id", &error_abort);
>> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
>> index 01a263eba2..ca109adaa6 100644
>> --- a/include/sysemu/numa.h
>> +++ b/include/sysemu/numa.h
>> @@ -24,7 +24,9 @@ struct NumaNodeMem {
>>  extern NodeInfo numa_info[MAX_NODES];
>>  
>>  void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
>> +void set_numa_node_options(MachineState *ms, NumaOptions *object, Error **errp);
>>  void parse_numa_opts(MachineState *ms);
>> +void parse_numa_node_opts(MachineState *ms);
>>  void numa_complete_configuration(MachineState *ms);
>>  void query_numa_node_mem(NumaNodeMem node_mem[]);
>>  extern QemuOptsList qemu_numa_opts;
>> diff --git a/vl.c b/vl.c
>> index b426b32134..711d2ae5da 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -4339,6 +4339,8 @@ int main(int argc, char **argv, char **envp)
>>      }
>>      parse_numa_opts(current_machine);
>>  
>> +    parse_numa_node_opts(current_machine);
>> +
>>      /* do monitor/qmp handling at preconfig state if requested */
>>      main_loop();
>>  
>>
> 


  reply index

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-06 19:11 [Qemu-devel] [RFC 2 PATCH 00/16] APIC ID fixes for AMD EPYC CPU models Moger, Babu
2019-09-06 19:11 ` [Qemu-devel] [RFC 2 PATCH 01/16] numa: Split the numa functionality Moger, Babu
2019-10-10  3:25   ` Eduardo Habkost
2019-12-02 20:18     ` Babu Moger [this message]
2019-09-06 19:11 ` [Qemu-devel] [RFC 2 PATCH 02/16] hw/i386: Rename X86CPUTopoInfo structure to X86CPUTopoIDs Moger, Babu
2019-10-10  3:26   ` Eduardo Habkost
2019-09-06 19:11 ` [Qemu-devel] [RFC 2 PATCH 03/16] hw/i386: Introduce X86CPUTopoInfo to contain topology info Moger, Babu
2019-10-11  2:29   ` Eduardo Habkost
2019-12-02 20:23     ` Babu Moger
2019-10-11  3:54   ` Eduardo Habkost
2019-12-02 20:25     ` Babu Moger
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 04/16] machine: Add SMP Sockets in CpuTopology Moger, Babu
2019-10-11  2:31   ` Eduardo Habkost
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 05/16] hw/i386: Simplify topology Offset/width Calculation Moger, Babu
2019-10-11  2:32   ` Eduardo Habkost
2019-12-02 20:29     ` Babu Moger
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 06/16] hw/core: Add core complex id in X86CPU topology Moger, Babu
2019-09-06 19:20   ` Eric Blake
2019-09-06 19:34     ` Moger, Babu
2019-09-22 12:48   ` Michael S. Tsirkin
2019-09-23 14:38     ` Moger, Babu
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 07/16] hw/386: Add new epyc mode topology decoding functions Moger, Babu
2019-10-11  3:29   ` Eduardo Habkost
2019-12-02 20:38     ` Babu Moger
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 08/16] i386: Cleanup and use the new epyc mode topology functions Moger, Babu
2019-10-11  3:51   ` Eduardo Habkost
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 09/16] hw/i386: Introduce initialize_topo_info function Moger, Babu
2019-10-11  3:54   ` Eduardo Habkost
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 10/16] hw/i386: Introduce apicid_from_cpu_idx in PCMachineState Moger, Babu
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 11/16] Introduce-topo_ids_from_apicid-handler Moger, Babu
2019-09-06 19:13 ` [Qemu-devel] [RFC 2 PATCH 12/16] hw/i386: Introduce apic_id_from_topo_ids handler in PCMachineState Moger, Babu
2019-09-06 19:13 ` [Qemu-devel] [RFC 2 PATCH 13/16] machine: Add new epyc property " Moger, Babu
2019-10-11  3:59   ` Eduardo Habkost
2019-10-11 16:23     ` Moger, Babu
2019-10-11 16:59     ` Moger, Babu
2019-10-11 19:03       ` Eduardo Habkost
2019-09-06 19:13 ` [Qemu-devel] [RFC 2 PATCH 14/16] hw/i386: Introduce epyc mode function handlers Moger, Babu
2019-09-06 19:13 ` [Qemu-devel] [RFC 2 PATCH 15/16] i386: Fix pkg_id offset for epyc mode Moger, Babu
2019-10-11  4:03   ` Eduardo Habkost
2019-09-06 19:13 ` [Qemu-devel] [RFC 2 PATCH 16/16] hw/core: Fix up the machine_set_cpu_numa_node for epyc Moger, Babu
2019-10-11  4:10   ` Eduardo Habkost
2019-12-02 20:44     ` Babu Moger
2019-09-20 22:44 ` [RFC 2 PATCH 00/16] APIC ID fixes for AMD EPYC CPU models Moger, Babu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a1a0328b-4adf-eeaf-bab4-c02e430cd44a@amd.com \
    --to=babu.moger@amd.com \
    --cc=armbru@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git