qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Tao Xu <tao3.xu@intel.com>
To: Daniel Black <daniel@linux.ibm.com>
Cc: ehabkost@redhat.com, jingqi.liu@intel.com, fan.du@intel.com,
	qemu-devel@nongnu.org, jonathan.cameron@huawei.com,
	imammedo@redhat.com, dan.j.williams@intel.com
Subject: Re: [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information
Date: Mon, 12 Aug 2019 14:11:11 +0800	[thread overview]
Message-ID: <a31b11d4-52d8-1c91-a8c9-70470fc3159d@intel.com> (raw)
In-Reply-To: <20190812151340.567e1cb6@volution.ozlabs.ibm.com>

On 8/12/2019 1:13 PM, Daniel Black wrote:
> 
> 
> Tao Xu, Liu Jingqi,
> 
> Thanks for doing these updates.
> 
> On Fri,  9 Aug 2019 14:57:29 +0800
> Tao <tao3.xu@intel.com> wrote:
> 
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> Add -numa hmat-lb option to provide System Locality Latency and
>> Bandwidth Information. These memory attributes help to build
>> System Locality Latency and Bandwidth Information Structure(s)
>> in ACPI Heterogeneous Memory Attribute Table (HMAT).
>>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>   hw/acpi/hmat.h        |   3 +
>>   hw/core/numa.c        | 185
>> ++++++++++++++++++++++++++++++++++++++++++ include/sysemu/numa.h |
>> 2 + qapi/machine.json     |  95 +++++++++++++++++++++-
>>   qemu-options.hx       |  44 +++++++++-
>>   5 files changed, 326 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
>> index 6c32f12e78..b7c1e02cf0 100644
>> --- a/hw/acpi/hmat.h
>> +++ b/hw/acpi/hmat.h
>> @@ -42,6 +42,9 @@
>>   
>>   #define HMAT_IS_LATENCY(type) (type <= HMAT_LB_DATA_WRITE_LATENCY)
>>   
>> +#define PICO_PER_USEC 1000000
>> +#define PICO_PER_NSEC 1000
>> +
>>   struct HMAT_LB_Info {
>>       /*
>>        * Indicates total number of Proximity Domains
>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>> index cfb6339810..9a494145f3 100644
>> --- a/hw/core/numa.c
>> +++ b/hw/core/numa.c
>> @@ -37,6 +37,7 @@
>>   #include "qemu/option.h"
>>   #include "qemu/config-file.h"
>>   #include "qemu/cutils.h"
>> +#include "hw/acpi/hmat.h"
>>   
>>   QemuOptsList qemu_numa_opts = {
>>       .name = "numa",
>> @@ -183,6 +184,184 @@ void parse_numa_distance(MachineState *ms,
>> NumaDistOptions *dist, Error **errp)
>> ms->numa_state->have_numa_distance = true; }
>>   
>> +void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
>> +                        Error **errp)
>> +{
> ..
> 
> Optional; you could support not connected (0xffff) for latency/bandwidth in
> this parsing.
> 
>> +        if (*endptr == '\0') {
>> +            base_lat = 1;
>> +        } else if (*(endptr + 1) == 's') {
>> +            switch (*endptr) {
>> +            case 'p':
>> +                base_lat = 1;
>> +                break;
>> +            case 'n':
>> +                base_lat = PICO_PER_NSEC;
>> +                break;
>> +            case 'u':
> 
> Glad you picked up my mismatch of "u/micro".
> 
>> +        } else {
>> +            error_setg(errp, "Invalid latency unit %s,"
>> +                "vaild units are \"ps\" \"ns\" \"us\"",
>> node->latency);
> 
> typo "valid"
> 
>> +        } else if (hmat_lb->base_lat != base_lat) {
>> +            error_setg(errp, "Invalid latency unit %s,"
>> +                " please unify the units.", node->latency);
> 
> This error is misleading. Should be something like "all latencies must be
> specified in the same units"
> 
>> +        switch (toupper(*endptr)) {
>> +        case '\0':
>> +        case 'M':
>> +            base_bw = 1;
>> +            break;
>> +        case 'G':
>> +            base_bw = UINT64_C(1) << 10;
>> +            break;
> 
> There was one more gap - Terra.
> 
>          case 'T':
>             base_bw = UINT64_C(1) << 20;
>             break;
> 
Oh, my mistake, I should use TB/s instead of PB/s.
Thank you for pointing out this.

>> +        case 'P':
>> +            base_bw = UINT64_C(1) << 20;
> and:
>                 base_bw = UINT64_C(1) << 30;
> 
>> +            break;
>> +        }
> 
> 
> Currently Linux 5.3.0-rc3+ doesn't cope with real corrected "bandwidth=2P" so
> maybe not worth it. >
> [    2.092060] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator
> Domains:1 Target Domains:2 Base:1073741824 [    2.092326]   Initiator-Target[0-0]:-2147483648 MB/s
> 
> On values, testing for overflow is required. e.g:
> 
>   -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=4096T
> bandwidth=4096T
> 
> [    2.047676] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:1048576
> [    2.048084]   Initiator-Target[0-0]:0 MB/s
> 
> Technically ACPI could support up to 4P with base/offset but you'd need to be a
> lot trickier (i.e. base is highest common multiple of all entries and then see
> if entry/base > 2^32-2 ) with base/entry values to arrive at this number.
> 
> +docs/commit message propagation of this.
>

I agree. I also test the overflow case. Thank you for your suggestion. I 
will add a docs for it.
> 
>> +        } else if (hmat_lb->base_lat != base_lat) {
> 
> Bug: Incorrectly copied - base_lat should be base_bw (twice)
> 

My mistake, I will correct it.
>> +            error_setg(errp, "Invalid bandwidth unit %s,"
>> +                " please unify the units.", node->bandwidth);
> 
> This error is misleading. Should be something like "all bandwidths must be
> specified in the same units"
> 
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index c480781992..cda4607f3a 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
> 
>> +@example
>> +-m 2G \
>> +-object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 -numa node,nodeid=0,memdev=ram-node0 \
>> +-object memory-backend-ram,size=1024M,policy=bind,host-nodes=1,id=ram-node1 -numa node,nodeid=1,memdev=ram-node1 \
>> +-smp 2 \
>> +-numa cpu,node-id=0,socket-id=0 \
>> +-numa cpu,node-id=0,socket-id=1 \
>> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
>> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
>> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
>> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
>> +@end example
> 
> nit: remove slash on last line
> 
> Is this a valid example? I get
> 
> qemu-system-x86_64: -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=11us: Invalid target=1, it hasn't a valid initiator proximity domain.
> 
> (I tested with host-nodes=1 changed to 0 as local machine is single node)
> 

I forget to update the example. It should add as follow:

   -numa node,nodeid=0,memdev=ram-node0 \
   -numa node,nodeid=1,memdev=ram-node1,initiator=1 \

> Technically on [PATCH v9 07/11]
> diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
> index abf99b1adc..431818dc82 100644
> --- a/hw/acpi/hmat.c
> +++ b/hw/acpi/hmat.c
> @@ -67,11 +67,81 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags, int initiator,
>       build_append_int_noprefix(table_data, 0, 8);
>   }
>   
> +/*
> + * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
> + * Structure: Table 5-142
> 
> nit: 5-146
> 
> Test as follows:
> 
> qemu-system-x86_64   -kernel /home/dan/repos/linux/vmlinux   -nographic -append  console=ttyS0  \
>     -m 2G -object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
>     -numa node,nodeid=0,memdev=ram-node0 \
>     -object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node1 \
>     -numa node,nodeid=1,memdev=ram-node1 -smp 2 -numa cpu,node-id=0,socket-id=0 \
>     -numa cpu,node-id=0,socket-id=1  \
>     -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=123us \
>     -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
>     -numa hmat-cache,node-id=0,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
>     -numa hmat-cache,node-id=1,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8 \
> | grep -A 5 HMAT
> 
> 
> [    0.038912] ACPI: HMAT 0x000000007FFE16C5 000118 (v02 BOCHS  BXPCHMAT 00000001 BXPC 00000001)
> [    0.040954] SRAT: PXM 0 -> APIC 0x00 -> Node 0
> [    0.040999] SRAT: PXM 0 -> APIC 0x01 -> Node 0
> [    0.041189] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
> [    0.041250] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x3fffffff]
> [    0.041276] ACPI: SRAT: Node 1 PXM 1 [mem 0x40000000-0x7fffffff]
> --
> [    1.984572] HMAT: Memory Flags:0001 Processor Domain:0 Memory Domain:0
> [    1.984792] HMAT: Memory Flags:0000 Processor Domain:0 Memory Domain:1
> [    1.985435] HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:1 Target Domains:2 Base:1000000
> [    1.986424]   Initiator-Target[0-0]:123000 nsec
> [    1.986664]   Initiator-Target[0-1]:0 nsec
> [    1.986910] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator Domains:1 Target Domains:2 Base:1
> [    1.987229]   Initiator-Target[0-0]:200 MB/s
> [    1.987356]   Initiator-Target[0-1]:0 MB/s
> [    1.987549] HMAT: Cache: Domain:0 Size:131072 Attrs:00081111 SMBIOS Handles:0
> [    1.988393] HMAT: Cache: Domain:1 Size:131072 Attrs:00081111 SMBIOS Handles:0
> 
> Leaving default latency/bw as 0 if ok as spec says '0: the corresponding latency
> or bandwidth information is not provided.'. Potentially the kernel could display this better.

Yes, we set default as 0, because spec said. But the kernel code has no 
warning just show 0. But in qemu-options.hx we note this:

if NUM is 0, means the corresponding latency or bandwidth information is 
not provided.

> 
> Also note https://marc.info/?l=linux-acpi&m=156506549410279&w=2 submitted as
> hmat_build_table_structs only calls build_hmat_mpda with flags=0 or HMAT_PROX_INIT_VALID (0x1) which is right looking at ACPI-6.3. An Ack/(Nack if I'm wrong) there would be good to have both kernel and this patch series working together.

Sounds good!
> 
> for entire series:
> 
> Reviewed-by: Daniel Black <daniel@linux.ibm.com>
> 



  reply	other threads:[~2019-08-12  6:12 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-09  6:57 [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 01/11] hw/arm: simplify arm_load_dtb Tao
2019-08-13 21:55   ` Alistair Francis
2019-08-14  1:19     ` Andrew Jeffery
2019-08-13 21:55   ` Eduardo Habkost
2019-08-14 13:08     ` Cédric Le Goater
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 02/11] numa: move numa global variable nb_numa_nodes into MachineState Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 03/11] numa: move numa global variable have_numa_distance " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 04/11] numa: move numa global variable numa_info " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 05/11] numa: Extend CLI to provide initiator information for numa nodes Tao
2019-08-13 15:00   ` Igor Mammedov
2019-08-14  2:24     ` Tao Xu
2019-08-16 14:47       ` Igor Mammedov
2019-08-14  2:39     ` Dan Williams
2019-08-14  5:13       ` Tao Xu
2019-08-14 21:29         ` Dan Williams
2019-08-15  1:56           ` Tao Xu
2019-08-15  2:31             ` Dan Williams
2019-08-16 14:57               ` Igor Mammedov
2019-08-20  8:34                 ` Tao Xu
2019-08-27 13:12                   ` Igor Mammedov
2019-08-28  1:09                     ` Tao Xu
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 06/11] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 07/11] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 08/11] hmat acpi: Build Memory Side Cache " Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information Tao
2019-08-12  5:13   ` Daniel Black
2019-08-12  6:11     ` Tao Xu [this message]
2019-08-13 15:11   ` Eric Blake
2019-08-14  2:58     ` Tao Xu
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 10/11] numa: Extend the CLI to provide memory side cache information Tao
2019-08-09  6:57 ` [Qemu-devel] [PATCH v9 11/11] tests/bios-tables-test: add test cases for ACPI HMAT Tao
2019-08-09 11:11 ` [Qemu-devel] [PATCH v9 00/11] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
2019-08-13  8:53 ` Tao Xu
2019-08-14 20:57   ` Eduardo Habkost
2019-08-15  0:53     ` Tao Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a31b11d4-52d8-1c91-a8c9-70470fc3159d@intel.com \
    --to=tao3.xu@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=daniel@linux.ibm.com \
    --cc=ehabkost@redhat.com \
    --cc=fan.du@intel.com \
    --cc=imammedo@redhat.com \
    --cc=jingqi.liu@intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).