qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Tao Xu <tao3.xu@intel.com>
To: Dan Williams <dan.j.williams@intel.com>,
	Igor Mammedov <imammedo@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	Jingqi Liu <jingqi.liu@intel.com>, "Du, Fan" <fan.du@intel.com>,
	Qemu Developers <qemu-devel@nongnu.org>,
	daniel@linux.ibm.com,
	Jonathan Cameron <jonathan.cameron@huawei.com>
Subject: Re: [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes
Date: Wed, 14 Aug 2019 13:13:06 +0800	[thread overview]
Message-ID: <a73f7c81-0363-c10f-8ae1-9344abc55449@intel.com> (raw)
In-Reply-To: <CAPcyv4j=wBtBcscJBtrMNDDz=d6+HYYDF=4QLU69d0EPMkTTqg@mail.gmail.com>

On 8/14/2019 10:39 AM, Dan Williams wrote:
> On Tue, Aug 13, 2019 at 8:00 AM Igor Mammedov <imammedo@redhat.com> wrote:
>>
>> On Fri,  9 Aug 2019 14:57:25 +0800
>> Tao <tao3.xu@intel.com> wrote:
>>
>>> From: Tao Xu <tao3.xu@intel.com>
>>>
>>> In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
>>> The initiator represents processor which access to memory. And in 5.2.27.3
>>> Memory Proximity Domain Attributes Structure, the attached initiator is
>>> defined as where the memory controller responsible for a memory proximity
>>> domain. With attached initiator information, the topology of heterogeneous
>>> memory can be described.
>>>
>>> Extend CLI of "-numa node" option to indicate the initiator numa node-id.
>>> In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
>>> the platform's HMAT tables.
>>>
>>> Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
>>> Suggested-by: Dan Williams <dan.j.williams@intel.com>
>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>>> ---
>>>
>>> No changes in v9
>>> ---
>>>   hw/core/machine.c     | 24 ++++++++++++++++++++++++
>>>   hw/core/numa.c        | 13 +++++++++++++
>>>   include/sysemu/numa.h |  3 +++
>>>   qapi/machine.json     |  6 +++++-
>>>   qemu-options.hx       | 27 +++++++++++++++++++++++----
>>>   5 files changed, 68 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>>> index 3c55470103..113184a9df 100644
>>> --- a/hw/core/machine.c
>>> +++ b/hw/core/machine.c
>>> @@ -640,6 +640,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>>>                                  const CpuInstanceProperties *props, Error **errp)
>>>   {
>>>       MachineClass *mc = MACHINE_GET_CLASS(machine);
>>> +    NodeInfo *numa_info = machine->numa_state->nodes;
>>>       bool match = false;
>>>       int i;
>>>
>>> @@ -709,6 +710,16 @@ void machine_set_cpu_numa_node(MachineState *machine,
>>>           match = true;
>>>           slot->props.node_id = props->node_id;
>>>           slot->props.has_node_id = props->has_node_id;
>>> +
>>> +        if (numa_info[props->node_id].initiator_valid &&
>>> +            (props->node_id != numa_info[props->node_id].initiator)) {
>>> +            error_setg(errp, "The initiator of CPU NUMA node %" PRId64
>>> +                       " should be itself.", props->node_id);
>>> +            return;
>>> +        }
>>> +        numa_info[props->node_id].initiator_valid = true;
>>> +        numa_info[props->node_id].has_cpu = true;
>>> +        numa_info[props->node_id].initiator = props->node_id;
>>>       }
>>>
>>>       if (!match) {
>>> @@ -1050,6 +1061,7 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
>>>       GString *s = g_string_new(NULL);
>>>       MachineClass *mc = MACHINE_GET_CLASS(machine);
>>>       const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
>>> +    NodeInfo *numa_info = machine->numa_state->nodes;
>>>
>>>       assert(machine->numa_state->num_nodes);
>>>       for (i = 0; i < possible_cpus->len; i++) {
>>> @@ -1083,6 +1095,18 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
>>>               machine_set_cpu_numa_node(machine, &props, &error_fatal);
>>>           }
>>>       }
>>> +
>>> +    for (i = 0; i < machine->numa_state->num_nodes; i++) {
>>> +        if (numa_info[i].initiator_valid &&
>>> +            !numa_info[numa_info[i].initiator].has_cpu) {
>>                            ^^^^^^^^^^^^^^^^^^^^^^ possible out of bounds read, see bellow
>>
>>> +            error_report("The initiator-id %"PRIu16 " of NUMA node %d"
>>> +                         " does not exist.", numa_info[i].initiator, i);
>>> +            error_printf("\n");
>>> +
>>> +            exit(1);
>>> +        }
>> it takes care only about nodes that have cpus or memory-only ones that have
>> initiator explicitly provided on CLI. And leaves possibility to have
>> memory-only nodes without initiator mixed with nodes that have initiator.
>> Is it valid to have mixed configuration?
>> Should we forbid it?
> 
> The spec talks about the "Proximity Domain for the Attached Initiator"
> field only being valid if the memory controller for the memory can be
> identified by an initiator id in the SRAT. So I expect the only way to
> define a memory proximity domain without this local initiator is to
> allow specifying a node-id that does not have an entry in the SRAT.
> 
Hi Dan,

So there may be a situation for the Attached Initiator field is not
valid? If true, I would allow user to input Initiator invalid.

> That would be a useful feature for testing OS HMAT parsing behavior,
> and may match platforms that exist in practice.
> 
>>
>>> +    }
>>> +
>>>       if (s->len && !qtest_enabled()) {
>>>           warn_report("CPU(s) not present in any NUMA nodes: %s",
>>>                       s->str);
>>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>>> index 8fcbba05d6..cfb6339810 100644
>>> --- a/hw/core/numa.c
>>> +++ b/hw/core/numa.c
>>> @@ -128,6 +128,19 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>>>           numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
>>>           numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
>>>       }
>>> +
>>> +    if (node->has_initiator) {
>>> +        if (numa_info[nodenr].initiator_valid &&
>>> +            (node->initiator != numa_info[nodenr].initiator)) {
>>> +            error_setg(errp, "The initiator of NUMA node %" PRIu16 " has been "
>>> +                       "set to node %" PRIu16, nodenr,
>>> +                       numa_info[nodenr].initiator);
>>> +            return;
>>> +        }
>>> +
>>> +        numa_info[nodenr].initiator_valid = true;
>>> +        numa_info[nodenr].initiator = node->initiator;
>>                                               ^^^
>> not validated  user input? (which could lead to read beyond numa_info[] boundaries
>> in previous hunk).
>>
>>> +    }
>>>       numa_info[nodenr].present = true;
>>>       max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
>>>       ms->numa_state->num_nodes++;
>>> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
>>> index 76da3016db..46ad06e000 100644
>>> --- a/include/sysemu/numa.h
>>> +++ b/include/sysemu/numa.h
>>> @@ -10,6 +10,9 @@ struct NodeInfo {
>>>       uint64_t node_mem;
>>>       struct HostMemoryBackend *node_memdev;
>>>       bool present;
>>> +    bool has_cpu;
>>> +    bool initiator_valid;
>>> +    uint16_t initiator;
>>>       uint8_t distance[MAX_NODES];
>>>   };
>>>
>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>> index 6db8a7e2ec..05e367d26a 100644
>>> --- a/qapi/machine.json
>>> +++ b/qapi/machine.json
>>> @@ -414,6 +414,9 @@
>>>   # @memdev: memory backend object.  If specified for one node,
>>>   #          it must be specified for all nodes.
>>>   #
>>> +# @initiator: the initiator numa nodeid that is closest (as in directly
>>> +#             attached) to this numa node (since 4.2)
>> well, it's pretty unclear what doc comment means (unless reader knows well
>> specific part of ACPI spec)
>>
>> suggest to rephrase to something more understandable for unaware
>> readers (+ possible reference to spec for those who is interested
>> in spec definition since this doc is meant for developers).
>>
>>> +#
>>>   # Since: 2.1
>>>   ##
>>>   { 'struct': 'NumaNodeOptions',
>>> @@ -421,7 +424,8 @@
>>>      '*nodeid': 'uint16',
>>>      '*cpus':   ['uint16'],
>>>      '*mem':    'size',
>>> -   '*memdev': 'str' }}
>>> +   '*memdev': 'str',
>>> +   '*initiator': 'uint16' }}
>>>
>>>   ##
>>>   # @NumaDistOptions:
>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>> index 9621e934c0..c480781992 100644
>>> --- a/qemu-options.hx
>>> +++ b/qemu-options.hx
>>> @@ -161,14 +161,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
>>>   ETEXI
>>>
>>>   DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>>> -    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
>>> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
>>> +    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>>>       "-numa dist,src=source,dst=destination,val=distance\n"
>>>       "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
>>>       QEMU_ARCH_ALL)
>>>   STEXI
>>> -@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
>>> -@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
>>> +@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>>> +@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>>>   @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>>>   @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>>>   @findex -numa
>>> @@ -215,6 +215,25 @@ split equally between them.
>>>   @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
>>>   if one node uses @samp{memdev}, all of them have to use it.
>>>
>>> +@samp{initiator} indicate the initiator NUMA @var{initiator} that is
>>                                    ^^^^^^^       ^^^^^^^^^^^^^^
>> above will result in "initiator NUMA initiator", was it your intention?
>>
>>> +closest (as in directly attached) to this NUMA @var{node}.
>> Again suggest replace spec language with something more user friendly
>> (this time without spec reference as it's geared for end user)
>>
>>> +For example, the following option assigns 2 NUMA nodes, node 0 has CPU.
>> Following example creates a machine with 2 NUMA ...
>>
>>> +node 1 has only memory, and its' initiator is node 0. Note that because
>>> +node 0 has CPU, by default the initiator of node 0 is itself and must be
>>> +itself.
>>> +@example
>>> +-M pc \
>>> +-m 2G,slots=2,maxmem=4G \
>>> +-object memory-backend-ram,size=1G,id=m0 \
>>> +-object memory-backend-ram,size=1G,id=m1 \
>>> +-numa node,nodeid=0,memdev=m0 \
>>> +-numa node,nodeid=1,memdev=m1,initiator=0 \
>>> +-smp 2,sockets=2,maxcpus=2  \
>>> +-numa cpu,node-id=0,socket-id=0 \
>>> +-numa cpu,node-id=0,socket-id=1 \
>>> +@end example
>>> +
>>>   @var{source} and @var{destination} are NUMA node IDs.
>>>   @var{distance} is the NUMA distance from @var{source} to @var{destination}.
>>>   The distance from a node to itself is always 10. If any pair of nodes is
>>



  reply	other threads:[~2019-08-14  5:13 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb Tao
2019-08-13 21:55   ` Alistair Francis
2019-08-14  1:19     ` Andrew Jeffery
2019-08-13 21:55   ` Eduardo Habkost
2019-08-14 13:08     ` Cédric Le Goater
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 02/11] numa: move numa global variable nb_numa_nodes into MachineState Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 03/11] numa: move numa global variable have_numa_distance " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 04/11] numa: move numa global variable numa_info " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes Tao
2019-08-13 15:00   ` Igor Mammedov
2019-08-14  2:24     ` Tao Xu
2019-08-16 14:47       ` Igor Mammedov
2019-08-14  2:39     ` Dan Williams
2019-08-14  5:13       ` Tao Xu [this message]
2019-08-14 21:29         ` Dan Williams
2019-08-15  1:56           ` Tao Xu
2019-08-15  2:31             ` Dan Williams
2019-08-16 14:57               ` Igor Mammedov
2019-08-20  8:34                 ` Tao Xu
2019-08-27 13:12                   ` Igor Mammedov
2019-08-28  1:09                     ` Tao Xu
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 06/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 07/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 08/11] hmat acpi: Build Memory Side Cache " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information Tao
2019-08-12  5:13   ` Daniel Black
2019-08-12  6:11     ` Tao Xu
2019-08-13 15:11   ` Eric Blake
2019-08-14  2:58     ` Tao Xu
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 10/11] numa: Extend the CLI to provide memory side cache information Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 11/11] tests/bios-tables-test: add test cases for ACPI HMAT Tao
2019-08-09 11:11 ` [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
2019-08-13  8:53 ` Tao Xu
2019-08-14 20:57   ` Eduardo Habkost
2019-08-15  0:53     ` Tao Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a73f7c81-0363-c10f-8ae1-9344abc55449@intel.com \
    --to=tao3.xu@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel@linux.ibm.com \
    --cc=ehabkost@redhat.com \
    --cc=fan.du@intel.com \
    --cc=imammedo@redhat.com \
    --cc=jingqi.liu@intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).