All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4] Allow setting NUMA distance for different NUMA nodes
@ 2017-04-01 10:25 He Chen
  2017-04-02 15:42 ` no-reply
  2017-04-03  8:38 ` Andrew Jones
  0 siblings, 2 replies; 6+ messages in thread
From: He Chen @ 2017-04-01 10:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Igor Mammedov, Paolo Bonzini,
	Richard Henderson, Eduardo Habkost, Eric Blake,
	Markus Armbruster, Andrew Jones

Current, QEMU does not provide a clear command to set vNUMA distance for
guest although we already have `-numa` command to set vNUMA nodes.

vNUMA distance makes sense in certain scenario.
But now, if we create a guest that has 4 vNUMA nodes, when we check NUMA
info via `numactl -H`, we will see:

node distance:
node    0    1    2    3
  0:   10   20   20   20
  1:   20   10   20   20
  2:   20   20   10   20
  3:   20   20   20   10

Guest kernel regards all local node as distance 10, and all remote node
as distance 20 when there is no SLIT table since QEMU doesn't build it.
It looks like a little strange when you have seen the distance in an
actual physical machine that contains 4 NUMA nodes. My machine shows:

node distance:
node    0    1    2    3
  0:   10   21   31   41
  1:   21   10   21   31
  2:   31   21   10   21
  3:   41   31   21   10

To set vNUMA distance, guest should see a complete SLIT table.
I found QEMU has provide `-acpitable` command that allows users to add
a ACPI table into guest, but it requires users building ACPI table by
themselves first. Using `-acpitable` to add a SLIT table may be not so
straightforward or flexible, imagine that when the vNUMA configuration
is changes and we need to generate another SLIT table manually. It may
not be friendly to users or upper software like libvirt.

This patch is going to add SLIT table support in QEMU, and provides
additional option `dist` for command `-numa` to allow user set vNUMA
distance by QEMU command.

With this patch, when a user wants to create a guest that contains
several vNUMA nodes and also wants to set distance among those nodes,
the QEMU command would like:

```
-numa node,nodeid=0,cpus=0 \
-numa node,nodeid=1,cpus=1 \
-numa node,nodeid=2,cpus=2 \
-numa node,nodeid=3,cpus=3 \
-numa dist,src=0,dst=0,val=10 \
-numa dist,src=0,dst=1,val=21 \
-numa dist,src=0,dst=2,val=31 \
-numa dist,src=0,dst=3,val=41 \
-numa dist,src=1,dst=0,val=21 \
-numa dist,src=1,dst=1,val=10 \
-numa dist,src=1,dst=2,val=21 \
-numa dist,src=1,dst=3,val=31 \
-numa dist,src=2,dst=0,val=31 \
-numa dist,src=2,dst=1,val=21 \
-numa dist,src=2,dst=2,val=10 \
-numa dist,src=2,dst=3,val=21 \
-numa dist,src=3,dst=0,val=41 \
-numa dist,src=3,dst=1,val=31 \
-numa dist,src=3,dst=2,val=21 \
-numa dist,src=3,dst=3,val=10 \
```

Signed-off-by: He Chen <he.chen@linux.intel.com>
---
 hw/acpi/aml-build.c         | 26 +++++++++++++++++
 hw/i386/acpi-build.c        |  2 ++
 include/hw/acpi/aml-build.h |  1 +
 include/sysemu/numa.h       |  1 +
 include/sysemu/sysemu.h     |  4 +++
 numa.c                      | 70 +++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json            | 28 ++++++++++++++++--
 qemu-options.hx             | 11 ++++++-
 8 files changed, 140 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index c6f2032..410b30e 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -24,6 +24,7 @@
 #include "hw/acpi/aml-build.h"
 #include "qemu/bswap.h"
 #include "qemu/bitops.h"
+#include "sysemu/numa.h"
 
 static GArray *build_alloc_array(void)
 {
@@ -1609,3 +1610,28 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
     numamem->base_addr = cpu_to_le64(base);
     numamem->range_length = cpu_to_le64(len);
 }
+
+/*
+ * ACPI spec 5.2.17 System Locality Distance Information Table
+ * (Revision 2.0 or later)
+ */
+void build_slit(GArray *table_data, BIOSLinker *linker)
+{
+    int slit_start, i, j;
+    slit_start = table_data->len;
+
+    acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+    build_append_int_noprefix(table_data, nb_numa_nodes, 8);
+    for (i = 0; i < nb_numa_nodes; i++) {
+        for (j = 0; j < nb_numa_nodes; j++) {
+            build_append_int_noprefix(table_data, numa_info[i].distance[j], 1);
+        }
+    }
+
+    build_header(linker, table_data,
+                 (void *)(table_data->data + slit_start),
+                 "SLIT",
+                 table_data->len - slit_start, 1, NULL, NULL);
+}
+
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 2073108..12730ea 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2678,6 +2678,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
     if (pcms->numa_nodes) {
         acpi_add_table(table_offsets, tables_blob);
         build_srat(tables_blob, tables->linker, machine);
+        acpi_add_table(table_offsets, tables_blob);
+        build_slit(tables_blob, tables->linker);
     }
     if (acpi_get_mcfg(&mcfg)) {
         acpi_add_table(table_offsets, tables_blob);
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 00c21f1..329a0d0 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -389,4 +389,5 @@ GCC_FMT_ATTR(2, 3);
 void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
                        uint64_t len, int node, MemoryAffinityFlags flags);
 
+void build_slit(GArray *table_data, BIOSLinker *linker);
 #endif
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 8f09dcf..2f7a941 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -21,6 +21,7 @@ typedef struct node_info {
     struct HostMemoryBackend *node_memdev;
     bool present;
     QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */
+    uint8_t distance[MAX_NODES];
 } NodeInfo;
 
 extern NodeInfo numa_info[MAX_NODES];
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 576c7ce..6999545 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -169,6 +169,10 @@ extern int mem_prealloc;
 
 #define MAX_NODES 128
 #define NUMA_NODE_UNASSIGNED MAX_NODES
+#define NUMA_DISTANCE_MIN         10
+#define NUMA_DISTANCE_DEFAULT     20
+#define NUMA_DISTANCE_MAX         254
+#define NUMA_DISTANCE_UNREACHABLE 255
 
 #define MAX_OPTION_ROMS 16
 typedef struct QEMUOptionRom {
diff --git a/numa.c b/numa.c
index e01cb54..421c383 100644
--- a/numa.c
+++ b/numa.c
@@ -212,6 +212,40 @@ static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
 }
 
+static void numa_distance_parse(NumaDistOptions *dist, QemuOpts *opts, Error **errp)
+{
+    uint16_t src = dist->src;
+    uint16_t dst = dist->dst;
+    uint8_t val = dist->val;
+
+    if (!numa_info[src].present || !numa_info[dst].present) {
+        error_setg(errp, "Source/Destination NUMA node is missing. "
+                   "Please use '-numa node' option to declare it first.");
+        return;
+    }
+
+    if (src >= MAX_NODES || dst >= MAX_NODES) {
+        error_setg(errp, "Max number of NUMA nodes reached: %"
+                   PRIu16 "", src > dst ? src : dst);
+        return;
+    }
+
+    if (val < NUMA_DISTANCE_MIN) {
+        error_setg(errp, "NUMA distance (%" PRIu8 ") is invalid, "
+                   "it should be larger than %d.",
+                   val, NUMA_DISTANCE_MIN);
+        return;
+    }
+
+    if (src == dst && val != NUMA_DISTANCE_MIN) {
+        error_setg(errp, "Local distance of node %d should be %d.",
+                   src, NUMA_DISTANCE_MIN);
+        return;
+    }
+
+    numa_info[src].distance[dst] = val;
+}
+
 static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
 {
     NumaOptions *object = NULL;
@@ -235,6 +269,12 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
         }
         nb_numa_nodes++;
         break;
+    case NUMA_OPTIONS_TYPE_DIST:
+        numa_distance_parse(&object->u.dist, opts, &err);
+        if (err) {
+            goto end;
+        }
+        break;
     default:
         abort();
     }
@@ -294,6 +334,35 @@ static void validate_numa_cpus(void)
     g_free(seen_cpus);
 }
 
+static void validate_numa_distance(void)
+{
+    int src, dst;
+    bool have_distance = false;
+
+    for (src = 0; src < nb_numa_nodes; src++) {
+        for (dst = 0; dst < nb_numa_nodes; dst++) {
+            if (numa_info[src].present &&
+                numa_info[src].distance[dst] != 0)
+                have_distance = true;
+        }
+    }
+
+    if (!have_distance)
+        return;
+
+    for (src = 0; src < nb_numa_nodes; src++) {
+        for (dst = 0; dst < nb_numa_nodes; dst++) {
+            if (numa_info[src].present &&
+                numa_info[src].distance[dst] == 0) {
+                error_report("The distance between node %d and %d is missing, "
+                             "please provide the complete NUMA distance information.",
+                             src, dst);
+                exit(EXIT_FAILURE);
+            }
+        }
+    }
+}
+
 void parse_numa_opts(MachineClass *mc)
 {
     int i;
@@ -390,6 +459,7 @@ void parse_numa_opts(MachineClass *mc)
         }
 
         validate_numa_cpus();
+        validate_numa_distance();
     } else {
         numa_set_mem_node_id(0, ram_size, 0);
     }
diff --git a/qapi-schema.json b/qapi-schema.json
index 32b4a4b..b432e13 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -5644,10 +5644,14 @@
 ##
 # @NumaOptionsType:
 #
+# @node: NUMA nodes configuration
+#
+# @dist: NUMA distance configuration
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node' ] }
+  'data': [ 'node', 'dist' ] }
 
 ##
 # @NumaOptions:
@@ -5660,7 +5664,8 @@
   'base': { 'type': 'NumaOptionsType' },
   'discriminator': 'type',
   'data': {
-    'node': 'NumaNodeOptions' }}
+    'node': 'NumaNodeOptions',
+    'dist': 'NumaDistOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -5689,6 +5694,25 @@
    '*memdev': 'str' }}
 
 ##
+# @NumaDistOptions:
+#
+# Set the distance between 2 NUMA nodes.
+#
+# @src: source NUMA node.
+#
+# @dst: destination NUMA node.
+#
+# @val: NUMA distance from source node to destination node.
+#
+# Since: 2.10
+##
+{ 'struct': 'NumaDistOptions',
+  'data': {
+   'src': 'uint16',
+   'dst': 'uint16',
+   'val': 'uint8' }}
+
+##
 # @HostMemPolicy:
 #
 # Host memory policy types
diff --git a/qemu-options.hx b/qemu-options.hx
index 8dd8ee3..ce1a8ad 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -139,12 +139,15 @@ ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
-    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
+    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
+    "-numa dist,src=source,dst=destination,val=distance\n", QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
 @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
+@itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
+Set the NUMA distance from a source node to a destination node.
 
 @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
 @samp{cpus} option represent a contiguous range of CPU indexes
@@ -167,6 +170,12 @@ split equally between them.
 @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
 if one node uses @samp{memdev}, all of them have to use it.
 
+@var{source} and @var{destination} are NUMA node IDs.
+@var{distance} is the NUMA distance from @var{source} to @var{destination}.
+The distance from node A to node B may be different from the distance from
+node B to node A as the distance can to be asymmetrical. If a node is
+unreachable, set 255 as distance.
+
 Note that the -@option{numa} option doesn't allocate any of the
 specified resources, it just assigns existing resources to NUMA
 nodes. This means that one still has to use the @option{-m},
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH v4] Allow setting NUMA distance for different NUMA nodes
  2017-04-01 10:25 [Qemu-devel] [PATCH v4] Allow setting NUMA distance for different NUMA nodes He Chen
@ 2017-04-02 15:42 ` no-reply
  2017-04-03  8:38 ` Andrew Jones
  1 sibling, 0 replies; 6+ messages in thread
From: no-reply @ 2017-04-02 15:42 UTC (permalink / raw)
  To: he.chen
  Cc: famz, qemu-devel, drjones, ehabkost, mst, armbru, pbonzini,
	imammedo, rth

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1491042326-24335-1-git-send-email-he.chen@linux.intel.com
Subject: [Qemu-devel] [PATCH v4] Allow setting NUMA distance for different NUMA nodes

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
78dd9f1 Allow setting NUMA distance for different NUMA nodes

=== OUTPUT BEGIN ===
Checking PATCH 1/1: Allow setting NUMA distance for different NUMA nodes...
WARNING: line over 80 characters
#173: FILE: numa.c:215:
+static void numa_distance_parse(NumaDistOptions *dist, QemuOpts *opts, Error **errp)

ERROR: braces {} are necessary for all arms of this statement
#234: FILE: numa.c:344:
+            if (numa_info[src].present &&
[...]

ERROR: braces {} are necessary for all arms of this statement
#240: FILE: numa.c:350:
+    if (!have_distance)
[...]

total: 2 errors, 1 warnings, 236 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH v4] Allow setting NUMA distance for different NUMA nodes
  2017-04-01 10:25 [Qemu-devel] [PATCH v4] Allow setting NUMA distance for different NUMA nodes He Chen
  2017-04-02 15:42 ` no-reply
@ 2017-04-03  8:38 ` Andrew Jones
  2017-04-03 18:58   ` Eric Blake
  2017-04-05 10:03   ` He Chen
  1 sibling, 2 replies; 6+ messages in thread
From: Andrew Jones @ 2017-04-03  8:38 UTC (permalink / raw)
  To: He Chen
  Cc: qemu-devel, Eduardo Habkost, Michael S . Tsirkin,
	Markus Armbruster, Paolo Bonzini, Igor Mammedov,
	Richard Henderson

On Sat, Apr 01, 2017 at 06:25:26PM +0800, He Chen wrote:
> Current, QEMU does not provide a clear command to set vNUMA distance for
> guest although we already have `-numa` command to set vNUMA nodes.
> 
> vNUMA distance makes sense in certain scenario.
> But now, if we create a guest that has 4 vNUMA nodes, when we check NUMA
> info via `numactl -H`, we will see:
> 
> node distance:
> node    0    1    2    3
>   0:   10   20   20   20
>   1:   20   10   20   20
>   2:   20   20   10   20
>   3:   20   20   20   10
> 
> Guest kernel regards all local node as distance 10, and all remote node
> as distance 20 when there is no SLIT table since QEMU doesn't build it.
> It looks like a little strange when you have seen the distance in an
> actual physical machine that contains 4 NUMA nodes. My machine shows:
> 
> node distance:
> node    0    1    2    3
>   0:   10   21   31   41
>   1:   21   10   21   31
>   2:   31   21   10   21
>   3:   41   31   21   10
> 
> To set vNUMA distance, guest should see a complete SLIT table.
> I found QEMU has provide `-acpitable` command that allows users to add
> a ACPI table into guest, but it requires users building ACPI table by
> themselves first. Using `-acpitable` to add a SLIT table may be not so
> straightforward or flexible, imagine that when the vNUMA configuration
> is changes and we need to generate another SLIT table manually. It may
> not be friendly to users or upper software like libvirt.
> 
> This patch is going to add SLIT table support in QEMU, and provides
> additional option `dist` for command `-numa` to allow user set vNUMA
> distance by QEMU command.
> 
> With this patch, when a user wants to create a guest that contains
> several vNUMA nodes and also wants to set distance among those nodes,
> the QEMU command would like:
> 
> ```
> -numa node,nodeid=0,cpus=0 \
> -numa node,nodeid=1,cpus=1 \
> -numa node,nodeid=2,cpus=2 \
> -numa node,nodeid=3,cpus=3 \
> -numa dist,src=0,dst=0,val=10 \
> -numa dist,src=0,dst=1,val=21 \
> -numa dist,src=0,dst=2,val=31 \
> -numa dist,src=0,dst=3,val=41 \
> -numa dist,src=1,dst=0,val=21 \
> -numa dist,src=1,dst=1,val=10 \
> -numa dist,src=1,dst=2,val=21 \
> -numa dist,src=1,dst=3,val=31 \
> -numa dist,src=2,dst=0,val=31 \
> -numa dist,src=2,dst=1,val=21 \
> -numa dist,src=2,dst=2,val=10 \
> -numa dist,src=2,dst=3,val=21 \
> -numa dist,src=3,dst=0,val=41 \
> -numa dist,src=3,dst=1,val=31 \
> -numa dist,src=3,dst=2,val=21 \
> -numa dist,src=3,dst=3,val=10 \
> ```
> 
> Signed-off-by: He Chen <he.chen@linux.intel.com>
> ---
>  hw/acpi/aml-build.c         | 26 +++++++++++++++++
>  hw/i386/acpi-build.c        |  2 ++
>  include/hw/acpi/aml-build.h |  1 +
>  include/sysemu/numa.h       |  1 +
>  include/sysemu/sysemu.h     |  4 +++
>  numa.c                      | 70 +++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json            | 28 ++++++++++++++++--
>  qemu-options.hx             | 11 ++++++-
>  8 files changed, 140 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index c6f2032..410b30e 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -24,6 +24,7 @@
>  #include "hw/acpi/aml-build.h"
>  #include "qemu/bswap.h"
>  #include "qemu/bitops.h"
> +#include "sysemu/numa.h"
>  
>  static GArray *build_alloc_array(void)
>  {
> @@ -1609,3 +1610,28 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
>      numamem->base_addr = cpu_to_le64(base);
>      numamem->range_length = cpu_to_le64(len);
>  }
> +
> +/*
> + * ACPI spec 5.2.17 System Locality Distance Information Table
> + * (Revision 2.0 or later)
> + */
> +void build_slit(GArray *table_data, BIOSLinker *linker)
> +{
> +    int slit_start, i, j;
> +    slit_start = table_data->len;
> +
> +    acpi_data_push(table_data, sizeof(AcpiTableHeader));
> +
> +    build_append_int_noprefix(table_data, nb_numa_nodes, 8);
> +    for (i = 0; i < nb_numa_nodes; i++) {
> +        for (j = 0; j < nb_numa_nodes; j++) {
> +            build_append_int_noprefix(table_data, numa_info[i].distance[j], 1);
> +        }
> +    }
> +
> +    build_header(linker, table_data,
> +                 (void *)(table_data->data + slit_start),
> +                 "SLIT",
> +                 table_data->len - slit_start, 1, NULL, NULL);
> +}
> +
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 2073108..12730ea 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2678,6 +2678,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>      if (pcms->numa_nodes) {
>          acpi_add_table(table_offsets, tables_blob);
>          build_srat(tables_blob, tables->linker, machine);
> +        acpi_add_table(table_offsets, tables_blob);
> +        build_slit(tables_blob, tables->linker);
>      }
>      if (acpi_get_mcfg(&mcfg)) {
>          acpi_add_table(table_offsets, tables_blob);
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 00c21f1..329a0d0 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -389,4 +389,5 @@ GCC_FMT_ATTR(2, 3);
>  void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
>                         uint64_t len, int node, MemoryAffinityFlags flags);
>  
> +void build_slit(GArray *table_data, BIOSLinker *linker);
>  #endif
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 8f09dcf..2f7a941 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -21,6 +21,7 @@ typedef struct node_info {
>      struct HostMemoryBackend *node_memdev;
>      bool present;
>      QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */
> +    uint8_t distance[MAX_NODES];
>  } NodeInfo;
>  
>  extern NodeInfo numa_info[MAX_NODES];
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index 576c7ce..6999545 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -169,6 +169,10 @@ extern int mem_prealloc;
>  
>  #define MAX_NODES 128
>  #define NUMA_NODE_UNASSIGNED MAX_NODES
> +#define NUMA_DISTANCE_MIN         10
> +#define NUMA_DISTANCE_DEFAULT     20
> +#define NUMA_DISTANCE_MAX         254
> +#define NUMA_DISTANCE_UNREACHABLE 255
>  
>  #define MAX_OPTION_ROMS 16
>  typedef struct QEMUOptionRom {
> diff --git a/numa.c b/numa.c
> index e01cb54..421c383 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -212,6 +212,40 @@ static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
>      max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
>  }
>  
> +static void numa_distance_parse(NumaDistOptions *dist, QemuOpts *opts, Error **errp)
> +{
> +    uint16_t src = dist->src;
> +    uint16_t dst = dist->dst;
> +    uint8_t val = dist->val;
> +
> +    if (!numa_info[src].present || !numa_info[dst].present) {
> +        error_setg(errp, "Source/Destination NUMA node is missing. "
> +                   "Please use '-numa node' option to declare it first.");
> +        return;
> +    }
> +
> +    if (src >= MAX_NODES || dst >= MAX_NODES) {
> +        error_setg(errp, "Max number of NUMA nodes reached: %"
> +                   PRIu16 "", src > dst ? src : dst);
> +        return;
> +    }
> +
> +    if (val < NUMA_DISTANCE_MIN) {
> +        error_setg(errp, "NUMA distance (%" PRIu8 ") is invalid, "
> +                   "it should be larger than %d.",
> +                   val, NUMA_DISTANCE_MIN);
> +        return;
> +    }
> +
> +    if (src == dst && val != NUMA_DISTANCE_MIN) {
> +        error_setg(errp, "Local distance of node %d should be %d.",
> +                   src, NUMA_DISTANCE_MIN);
> +        return;
> +    }
> +
> +    numa_info[src].distance[dst] = val;
> +}
> +
>  static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
>  {
>      NumaOptions *object = NULL;
> @@ -235,6 +269,12 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
>          }
>          nb_numa_nodes++;
>          break;
> +    case NUMA_OPTIONS_TYPE_DIST:
> +        numa_distance_parse(&object->u.dist, opts, &err);
> +        if (err) {
> +            goto end;
> +        }
> +        break;
>      default:
>          abort();
>      }
> @@ -294,6 +334,35 @@ static void validate_numa_cpus(void)
>      g_free(seen_cpus);
>  }
>  
> +static void validate_numa_distance(void)
> +{
> +    int src, dst;
> +    bool have_distance = false;
> +
> +    for (src = 0; src < nb_numa_nodes; src++) {
> +        for (dst = 0; dst < nb_numa_nodes; dst++) {
> +            if (numa_info[src].present &&
> +                numa_info[src].distance[dst] != 0)
> +                have_distance = true;
> +        }
> +    }
> +
> +    if (!have_distance)
> +        return;
> +
> +    for (src = 0; src < nb_numa_nodes; src++) {
> +        for (dst = 0; dst < nb_numa_nodes; dst++) {
> +            if (numa_info[src].present &&
> +                numa_info[src].distance[dst] == 0) {
> +                error_report("The distance between node %d and %d is missing, "
> +                             "please provide the complete NUMA distance information.",
> +                             src, dst);
> +                exit(EXIT_FAILURE);
> +            }
> +        }
> +    }
> +}

This validation is stricter than what Eduardo and I agreed was sufficient.
This says if any distance is given, they must all be given. We agreed that
the symmetrical shortcut was probably OK, but if any asymmetrical distance
is given, then they must all be given. Here a couple examples

Given:
 A -> B : 25
 A -> C : 35
 A -> D : 45
 B -> C : 25
 B -> D : 35
 C -> D : 25

The above is OK. All reverse directions are assumed symmetrical.

Given:
 A -> B : 25
 A -> C : 35
 A -> D : 45
 B -> C : 25
 B -> D : 35
 C -> D : 25
 D -> C : 35

The above is not OK, as C -> D and D -> C are given asymmetrical
distances, but no others are. We can no longer trust that the user meant
the rest are symmetrical, so all must be given now.

We should also ensure that when even one node pair's distance is given,
then all unique node pair's must have a distance given.

I've also attempted to describe this below as a suggestion for the
documentation.

> +
>  void parse_numa_opts(MachineClass *mc)
>  {
>      int i;
> @@ -390,6 +459,7 @@ void parse_numa_opts(MachineClass *mc)
>          }
>  
>          validate_numa_cpus();
> +        validate_numa_distance();
>      } else {
>          numa_set_mem_node_id(0, ram_size, 0);
>      }
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 32b4a4b..b432e13 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -5644,10 +5644,14 @@
>  ##
>  # @NumaOptionsType:
>  #
> +# @node: NUMA nodes configuration
> +#
> +# @dist: NUMA distance configuration
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'NumaOptionsType',
> -  'data': [ 'node' ] }
> +  'data': [ 'node', 'dist' ] }
>  
>  ##
>  # @NumaOptions:
> @@ -5660,7 +5664,8 @@
>    'base': { 'type': 'NumaOptionsType' },
>    'discriminator': 'type',
>    'data': {
> -    'node': 'NumaNodeOptions' }}
> +    'node': 'NumaNodeOptions',
> +    'dist': 'NumaDistOptions' }}
>  
>  ##
>  # @NumaNodeOptions:
> @@ -5689,6 +5694,25 @@
>     '*memdev': 'str' }}
>  
>  ##
> +# @NumaDistOptions:
> +#
> +# Set the distance between 2 NUMA nodes.
> +#
> +# @src: source NUMA node.
> +#
> +# @dst: destination NUMA node.
> +#
> +# @val: NUMA distance from source node to destination node.
> +#
> +# Since: 2.10
> +##
> +{ 'struct': 'NumaDistOptions',
> +  'data': {
> +   'src': 'uint16',
> +   'dst': 'uint16',
> +   'val': 'uint8' }}
> +
> +##
>  # @HostMemPolicy:
>  #
>  # Host memory policy types
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 8dd8ee3..ce1a8ad 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -139,12 +139,15 @@ ETEXI
>  
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> +    "-numa dist,src=source,dst=destination,val=distance\n", QEMU_ARCH_ALL)
>  STEXI
>  @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
>  @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> +@itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>  @findex -numa
>  Define a NUMA node and assign RAM and VCPUs to it.
> +Set the NUMA distance from a source node to a destination node.
>  
>  @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
>  @samp{cpus} option represent a contiguous range of CPU indexes
> @@ -167,6 +170,12 @@ split equally between them.
>  @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
>  if one node uses @samp{memdev}, all of them have to use it.
>  
> +@var{source} and @var{destination} are NUMA node IDs.
> +@var{distance} is the NUMA distance from @var{source} to @var{destination}.
> +The distance from node A to node B may be different from the distance from
> +node B to node A as the distance can to be asymmetrical. If a node is
> +unreachable, set 255 as distance.

The distance from a node to itself is always 10.  If no distance values
are given for node pairs, then the default distance of 20 is used for each
pair.  If any pair of nodes is given a distance, then all pairs must be
given distances.  Although, when distances are only given in one direction
for each pair of nodes, then the distances in the opposite directions are
assumed to be the same.  If, however, an asymmetrical pair of distances is
given for even one node pair, then all node pairs must be provided
distance values for both directions, even when they are symmetrical.  When
a node is unreachable from another node, set the pair's distance to 255.

> +
>  Note that the -@option{numa} option doesn't allocate any of the
>  specified resources, it just assigns existing resources to NUMA
>  nodes. This means that one still has to use the @option{-m},
> -- 
> 2.7.4
> 
>

Thanks,
drew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH v4] Allow setting NUMA distance for different NUMA nodes
  2017-04-03  8:38 ` Andrew Jones
@ 2017-04-03 18:58   ` Eric Blake
  2017-04-05 10:17     ` He Chen
  2017-04-05 10:03   ` He Chen
  1 sibling, 1 reply; 6+ messages in thread
From: Eric Blake @ 2017-04-03 18:58 UTC (permalink / raw)
  To: Andrew Jones, He Chen
  Cc: Eduardo Habkost, Michael S . Tsirkin, Markus Armbruster,
	qemu-devel, Paolo Bonzini, Igor Mammedov, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 2671 bytes --]

On 04/03/2017 03:38 AM, Andrew Jones wrote:
> On Sat, Apr 01, 2017 at 06:25:26PM +0800, He Chen wrote:
>> Current, QEMU does not provide a clear command to set vNUMA distance for
>> guest although we already have `-numa` command to set vNUMA nodes.
>>

>> +++ b/qapi-schema.json
>> @@ -5644,10 +5644,14 @@
>>  ##
>>  # @NumaOptionsType:
>>  #
>> +# @node: NUMA nodes configuration
>> +#
>> +# @dist: NUMA distance configuration
>> +#

Missing a '(since 2.10)' tag on @dist.


>>  ##
>> +# @NumaDistOptions:
>> +#
>> +# Set the distance between 2 NUMA nodes.
>> +#
>> +# @src: source NUMA node.
>> +#
>> +# @dst: destination NUMA node.
>> +#
>> +# @val: NUMA distance from source node to destination node.

Missing mention of the magic of 255.

>> +#
>> +# Since: 2.10
>> +##
>> +{ 'struct': 'NumaDistOptions',
>> +  'data': {
>> +   'src': 'uint16',
>> +   'dst': 'uint16',
>> +   'val': 'uint8' }}
>> +

Is a user allowed to pass { "src":0, "dst":0, "val":11 }, or do you
hard-validate that the diagonal of the matrix is always 10 (the user can
omit it, but if specified it must be 10).  Do you enforce that no
distance is less than 10?

>> +@var{source} and @var{destination} are NUMA node IDs.
>> +@var{distance} is the NUMA distance from @var{source} to @var{destination}.
>> +The distance from node A to node B may be different from the distance from
>> +node B to node A as the distance can to be asymmetrical. If a node is
>> +unreachable, set 255 as distance.
> 
> The distance from a node to itself is always 10.  If no distance values
> are given for node pairs, then the default distance of 20 is used for each
> pair.  If any pair of nodes is given a distance, then all pairs must be
> given distances.  Although, when distances are only given in one direction
> for each pair of nodes, then the distances in the opposite directions are
> assumed to be the same.  If, however, an asymmetrical pair of distances is
> given for even one node pair, then all node pairs must be provided
> distance values for both directions, even when they are symmetrical.  When
> a node is unreachable from another node, set the pair's distance to 255.

I don't like duplication where it is not necessary, but don't know if
there's an easy way to make the .json file and qemu-options.hx refer to
one another, since they both feed separate user-visible documentation.
So you may have to repeat some of this in the .json file (such as my
mention above that at least documenting that 255 is special).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH v4] Allow setting NUMA distance for different NUMA nodes
  2017-04-03  8:38 ` Andrew Jones
  2017-04-03 18:58   ` Eric Blake
@ 2017-04-05 10:03   ` He Chen
  1 sibling, 0 replies; 6+ messages in thread
From: He Chen @ 2017-04-05 10:03 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Eduardo Habkost, Michael S . Tsirkin, Markus Armbruster,
	qemu-devel, Paolo Bonzini, Igor Mammedov, Richard Henderson

On Mon, Apr 03, 2017 at 10:38:51AM +0200, Andrew Jones wrote:
> On Sat, Apr 01, 2017 at 06:25:26PM +0800, He Chen wrote:
> > Current, QEMU does not provide a clear command to set vNUMA distance for
> > guest although we already have `-numa` command to set vNUMA nodes.
> > 
> > vNUMA distance makes sense in certain scenario.
> > But now, if we create a guest that has 4 vNUMA nodes, when we check NUMA
> > info via `numactl -H`, we will see:
> > 
> > node distance:
> > node    0    1    2    3
> >   0:   10   20   20   20
> >   1:   20   10   20   20
> >   2:   20   20   10   20
> >   3:   20   20   20   10
> > 
> > Guest kernel regards all local node as distance 10, and all remote node
> > as distance 20 when there is no SLIT table since QEMU doesn't build it.
> > It looks like a little strange when you have seen the distance in an
> > actual physical machine that contains 4 NUMA nodes. My machine shows:
> > 
> > node distance:
> > node    0    1    2    3
> >   0:   10   21   31   41
> >   1:   21   10   21   31
> >   2:   31   21   10   21
> >   3:   41   31   21   10
> > 
> > To set vNUMA distance, guest should see a complete SLIT table.
> > I found QEMU has provide `-acpitable` command that allows users to add
> > a ACPI table into guest, but it requires users building ACPI table by
> > themselves first. Using `-acpitable` to add a SLIT table may be not so
> > straightforward or flexible, imagine that when the vNUMA configuration
> > is changes and we need to generate another SLIT table manually. It may
> > not be friendly to users or upper software like libvirt.
> > 
> > This patch is going to add SLIT table support in QEMU, and provides
> > additional option `dist` for command `-numa` to allow user set vNUMA
> > distance by QEMU command.
> > 
> > With this patch, when a user wants to create a guest that contains
> > several vNUMA nodes and also wants to set distance among those nodes,
> > the QEMU command would like:
> > 
> > ```
> > -numa node,nodeid=0,cpus=0 \
> > -numa node,nodeid=1,cpus=1 \
> > -numa node,nodeid=2,cpus=2 \
> > -numa node,nodeid=3,cpus=3 \
> > -numa dist,src=0,dst=0,val=10 \
> > -numa dist,src=0,dst=1,val=21 \
> > -numa dist,src=0,dst=2,val=31 \
> > -numa dist,src=0,dst=3,val=41 \
> > -numa dist,src=1,dst=0,val=21 \
> > -numa dist,src=1,dst=1,val=10 \
> > -numa dist,src=1,dst=2,val=21 \
> > -numa dist,src=1,dst=3,val=31 \
> > -numa dist,src=2,dst=0,val=31 \
> > -numa dist,src=2,dst=1,val=21 \
> > -numa dist,src=2,dst=2,val=10 \
> > -numa dist,src=2,dst=3,val=21 \
> > -numa dist,src=3,dst=0,val=41 \
> > -numa dist,src=3,dst=1,val=31 \
> > -numa dist,src=3,dst=2,val=21 \
> > -numa dist,src=3,dst=3,val=10 \
> > ```
> > 
> > Signed-off-by: He Chen <he.chen@linux.intel.com>
> > ---
> >  hw/acpi/aml-build.c         | 26 +++++++++++++++++
> >  hw/i386/acpi-build.c        |  2 ++
> >  include/hw/acpi/aml-build.h |  1 +
> >  include/sysemu/numa.h       |  1 +
> >  include/sysemu/sysemu.h     |  4 +++
> >  numa.c                      | 70 +++++++++++++++++++++++++++++++++++++++++++++
> >  qapi-schema.json            | 28 ++++++++++++++++--
> >  qemu-options.hx             | 11 ++++++-
> >  8 files changed, 140 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > index c6f2032..410b30e 100644
> > --- a/hw/acpi/aml-build.c
> > +++ b/hw/acpi/aml-build.c
> > @@ -24,6 +24,7 @@
> >  #include "hw/acpi/aml-build.h"
> >  #include "qemu/bswap.h"
> >  #include "qemu/bitops.h"
> > +#include "sysemu/numa.h"
> >  
> >  static GArray *build_alloc_array(void)
> >  {
> > @@ -1609,3 +1610,28 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
> >      numamem->base_addr = cpu_to_le64(base);
> >      numamem->range_length = cpu_to_le64(len);
> >  }
> > +
> > +/*
> > + * ACPI spec 5.2.17 System Locality Distance Information Table
> > + * (Revision 2.0 or later)
> > + */
> > +void build_slit(GArray *table_data, BIOSLinker *linker)
> > +{
> > +    int slit_start, i, j;
> > +    slit_start = table_data->len;
> > +
> > +    acpi_data_push(table_data, sizeof(AcpiTableHeader));
> > +
> > +    build_append_int_noprefix(table_data, nb_numa_nodes, 8);
> > +    for (i = 0; i < nb_numa_nodes; i++) {
> > +        for (j = 0; j < nb_numa_nodes; j++) {
> > +            build_append_int_noprefix(table_data, numa_info[i].distance[j], 1);
> > +        }
> > +    }
> > +
> > +    build_header(linker, table_data,
> > +                 (void *)(table_data->data + slit_start),
> > +                 "SLIT",
> > +                 table_data->len - slit_start, 1, NULL, NULL);
> > +}
> > +
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 2073108..12730ea 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -2678,6 +2678,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
> >      if (pcms->numa_nodes) {
> >          acpi_add_table(table_offsets, tables_blob);
> >          build_srat(tables_blob, tables->linker, machine);
> > +        acpi_add_table(table_offsets, tables_blob);
> > +        build_slit(tables_blob, tables->linker);
> >      }
> >      if (acpi_get_mcfg(&mcfg)) {
> >          acpi_add_table(table_offsets, tables_blob);
> > diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> > index 00c21f1..329a0d0 100644
> > --- a/include/hw/acpi/aml-build.h
> > +++ b/include/hw/acpi/aml-build.h
> > @@ -389,4 +389,5 @@ GCC_FMT_ATTR(2, 3);
> >  void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
> >                         uint64_t len, int node, MemoryAffinityFlags flags);
> >  
> > +void build_slit(GArray *table_data, BIOSLinker *linker);
> >  #endif
> > diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> > index 8f09dcf..2f7a941 100644
> > --- a/include/sysemu/numa.h
> > +++ b/include/sysemu/numa.h
> > @@ -21,6 +21,7 @@ typedef struct node_info {
> >      struct HostMemoryBackend *node_memdev;
> >      bool present;
> >      QLIST_HEAD(, numa_addr_range) addr; /* List to store address ranges */
> > +    uint8_t distance[MAX_NODES];
> >  } NodeInfo;
> >  
> >  extern NodeInfo numa_info[MAX_NODES];
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index 576c7ce..6999545 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -169,6 +169,10 @@ extern int mem_prealloc;
> >  
> >  #define MAX_NODES 128
> >  #define NUMA_NODE_UNASSIGNED MAX_NODES
> > +#define NUMA_DISTANCE_MIN         10
> > +#define NUMA_DISTANCE_DEFAULT     20
> > +#define NUMA_DISTANCE_MAX         254
> > +#define NUMA_DISTANCE_UNREACHABLE 255
> >  
> >  #define MAX_OPTION_ROMS 16
> >  typedef struct QEMUOptionRom {
> > diff --git a/numa.c b/numa.c
> > index e01cb54..421c383 100644
> > --- a/numa.c
> > +++ b/numa.c
> > @@ -212,6 +212,40 @@ static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
> >      max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
> >  }
> >  
> > +static void numa_distance_parse(NumaDistOptions *dist, QemuOpts *opts, Error **errp)
> > +{
> > +    uint16_t src = dist->src;
> > +    uint16_t dst = dist->dst;
> > +    uint8_t val = dist->val;
> > +
> > +    if (!numa_info[src].present || !numa_info[dst].present) {
> > +        error_setg(errp, "Source/Destination NUMA node is missing. "
> > +                   "Please use '-numa node' option to declare it first.");
> > +        return;
> > +    }
> > +
> > +    if (src >= MAX_NODES || dst >= MAX_NODES) {
> > +        error_setg(errp, "Max number of NUMA nodes reached: %"
> > +                   PRIu16 "", src > dst ? src : dst);
> > +        return;
> > +    }
> > +
> > +    if (val < NUMA_DISTANCE_MIN) {
> > +        error_setg(errp, "NUMA distance (%" PRIu8 ") is invalid, "
> > +                   "it should be larger than %d.",
> > +                   val, NUMA_DISTANCE_MIN);
> > +        return;
> > +    }
> > +
> > +    if (src == dst && val != NUMA_DISTANCE_MIN) {
> > +        error_setg(errp, "Local distance of node %d should be %d.",
> > +                   src, NUMA_DISTANCE_MIN);
> > +        return;
> > +    }
> > +
> > +    numa_info[src].distance[dst] = val;
> > +}
> > +
> >  static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
> >  {
> >      NumaOptions *object = NULL;
> > @@ -235,6 +269,12 @@ static int parse_numa(void *opaque, QemuOpts *opts, Error **errp)
> >          }
> >          nb_numa_nodes++;
> >          break;
> > +    case NUMA_OPTIONS_TYPE_DIST:
> > +        numa_distance_parse(&object->u.dist, opts, &err);
> > +        if (err) {
> > +            goto end;
> > +        }
> > +        break;
> >      default:
> >          abort();
> >      }
> > @@ -294,6 +334,35 @@ static void validate_numa_cpus(void)
> >      g_free(seen_cpus);
> >  }
> >  
> > +static void validate_numa_distance(void)
> > +{
> > +    int src, dst;
> > +    bool have_distance = false;
> > +
> > +    for (src = 0; src < nb_numa_nodes; src++) {
> > +        for (dst = 0; dst < nb_numa_nodes; dst++) {
> > +            if (numa_info[src].present &&
> > +                numa_info[src].distance[dst] != 0)
> > +                have_distance = true;
> > +        }
> > +    }
> > +
> > +    if (!have_distance)
> > +        return;
> > +
> > +    for (src = 0; src < nb_numa_nodes; src++) {
> > +        for (dst = 0; dst < nb_numa_nodes; dst++) {
> > +            if (numa_info[src].present &&
> > +                numa_info[src].distance[dst] == 0) {
> > +                error_report("The distance between node %d and %d is missing, "
> > +                             "please provide the complete NUMA distance information.",
> > +                             src, dst);
> > +                exit(EXIT_FAILURE);
> > +            }
> > +        }
> > +    }
> > +}
> 
> This validation is stricter than what Eduardo and I agreed was sufficient.
> This says if any distance is given, they must all be given. We agreed that
> the symmetrical shortcut was probably OK, but if any asymmetrical distance
> is given, then they must all be given. Here a couple examples
> 
> Given:
>  A -> B : 25
>  A -> C : 35
>  A -> D : 45
>  B -> C : 25
>  B -> D : 35
>  C -> D : 25
> 
> The above is OK. All reverse directions are assumed symmetrical.
> 
> Given:
>  A -> B : 25
>  A -> C : 35
>  A -> D : 45
>  B -> C : 25
>  B -> D : 35
>  C -> D : 25
>  D -> C : 35
> 
> The above is not OK, as C -> D and D -> C are given asymmetrical
> distances, but no others are. We can no longer trust that the user meant
> the rest are symmetrical, so all must be given now.
> 
> We should also ensure that when even one node pair's distance is given,
> then all unique node pair's must have a distance given.
> 
> I've also attempted to describe this below as a suggestion for the
> documentation.
> 

Thanks for your clear explain, I will cook a better patch soon.
> > +
> >  void parse_numa_opts(MachineClass *mc)
> >  {
> >      int i;
> > @@ -390,6 +459,7 @@ void parse_numa_opts(MachineClass *mc)
> >          }
> >  
> >          validate_numa_cpus();
> > +        validate_numa_distance();
> >      } else {
> >          numa_set_mem_node_id(0, ram_size, 0);
> >      }
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 32b4a4b..b432e13 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -5644,10 +5644,14 @@
> >  ##
> >  # @NumaOptionsType:
> >  #
> > +# @node: NUMA nodes configuration
> > +#
> > +# @dist: NUMA distance configuration
> > +#
> >  # Since: 2.1
> >  ##
> >  { 'enum': 'NumaOptionsType',
> > -  'data': [ 'node' ] }
> > +  'data': [ 'node', 'dist' ] }
> >  
> >  ##
> >  # @NumaOptions:
> > @@ -5660,7 +5664,8 @@
> >    'base': { 'type': 'NumaOptionsType' },
> >    'discriminator': 'type',
> >    'data': {
> > -    'node': 'NumaNodeOptions' }}
> > +    'node': 'NumaNodeOptions',
> > +    'dist': 'NumaDistOptions' }}
> >  
> >  ##
> >  # @NumaNodeOptions:
> > @@ -5689,6 +5694,25 @@
> >     '*memdev': 'str' }}
> >  
> >  ##
> > +# @NumaDistOptions:
> > +#
> > +# Set the distance between 2 NUMA nodes.
> > +#
> > +# @src: source NUMA node.
> > +#
> > +# @dst: destination NUMA node.
> > +#
> > +# @val: NUMA distance from source node to destination node.
> > +#
> > +# Since: 2.10
> > +##
> > +{ 'struct': 'NumaDistOptions',
> > +  'data': {
> > +   'src': 'uint16',
> > +   'dst': 'uint16',
> > +   'val': 'uint8' }}
> > +
> > +##
> >  # @HostMemPolicy:
> >  #
> >  # Host memory policy types
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 8dd8ee3..ce1a8ad 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -139,12 +139,15 @@ ETEXI
> >  
> >  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> >      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> > -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n", QEMU_ARCH_ALL)
> > +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> > +    "-numa dist,src=source,dst=destination,val=distance\n", QEMU_ARCH_ALL)
> >  STEXI
> >  @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> >  @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> > +@itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
> >  @findex -numa
> >  Define a NUMA node and assign RAM and VCPUs to it.
> > +Set the NUMA distance from a source node to a destination node.
> >  
> >  @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
> >  @samp{cpus} option represent a contiguous range of CPU indexes
> > @@ -167,6 +170,12 @@ split equally between them.
> >  @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
> >  if one node uses @samp{memdev}, all of them have to use it.
> >  
> > +@var{source} and @var{destination} are NUMA node IDs.
> > +@var{distance} is the NUMA distance from @var{source} to @var{destination}.
> > +The distance from node A to node B may be different from the distance from
> > +node B to node A as the distance can to be asymmetrical. If a node is
> > +unreachable, set 255 as distance.
> 
> The distance from a node to itself is always 10.  If no distance values
> are given for node pairs, then the default distance of 20 is used for each
> pair.  If any pair of nodes is given a distance, then all pairs must be
> given distances.  Although, when distances are only given in one direction
> for each pair of nodes, then the distances in the opposite directions are
> assumed to be the same.  If, however, an asymmetrical pair of distances is
> given for even one node pair, then all node pairs must be provided
> distance values for both directions, even when they are symmetrical.  When
> a node is unreachable from another node, set the pair's distance to 255.
> 

Thanks for your time to help me refine document here, really appreciate
it!
> > +
> >  Note that the -@option{numa} option doesn't allocate any of the
> >  specified resources, it just assigns existing resources to NUMA
> >  nodes. This means that one still has to use the @option{-m},
> > -- 
> > 2.7.4
> > 
> >
> 
> Thanks,
> drew
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH v4] Allow setting NUMA distance for different NUMA nodes
  2017-04-03 18:58   ` Eric Blake
@ 2017-04-05 10:17     ` He Chen
  0 siblings, 0 replies; 6+ messages in thread
From: He Chen @ 2017-04-05 10:17 UTC (permalink / raw)
  To: Eric Blake
  Cc: Andrew Jones, Eduardo Habkost, Michael S . Tsirkin,
	Markus Armbruster, qemu-devel, Paolo Bonzini, Igor Mammedov,
	Richard Henderson

On Mon, Apr 03, 2017 at 01:58:34PM -0500, Eric Blake wrote:
> On 04/03/2017 03:38 AM, Andrew Jones wrote:
> > On Sat, Apr 01, 2017 at 06:25:26PM +0800, He Chen wrote:
> >> Current, QEMU does not provide a clear command to set vNUMA distance for
> >> guest although we already have `-numa` command to set vNUMA nodes.
> >>
> 
> >> +++ b/qapi-schema.json
> >> @@ -5644,10 +5644,14 @@
> >>  ##
> >>  # @NumaOptionsType:
> >>  #
> >> +# @node: NUMA nodes configuration
> >> +#
> >> +# @dist: NUMA distance configuration
> >> +#
> 
> Missing a '(since 2.10)' tag on @dist.
> 
> 
> >>  ##
> >> +# @NumaDistOptions:
> >> +#
> >> +# Set the distance between 2 NUMA nodes.
> >> +#
> >> +# @src: source NUMA node.
> >> +#
> >> +# @dst: destination NUMA node.
> >> +#
> >> +# @val: NUMA distance from source node to destination node.
> 
> Missing mention of the magic of 255.
> 
> >> +#
> >> +# Since: 2.10
> >> +##
> >> +{ 'struct': 'NumaDistOptions',
> >> +  'data': {
> >> +   'src': 'uint16',
> >> +   'dst': 'uint16',
> >> +   'val': 'uint8' }}
> >> +
> 
> Is a user allowed to pass { "src":0, "dst":0, "val":11 }, or do you
> hard-validate that the diagonal of the matrix is always 10 (the user can
> omit it, but if specified it must be 10).  Do you enforce that no
> distance is less than 10?
> 

I see your concern, I think it's reasonable that a user is allowed to
set distance 11 to local node but qemu refuses and complains about it.

If the distance of local is omitted, qemu can fill it, but if specified,
I think it must be 10.

In current version patch, the distance is enforced to be >= 10, this
check is done in numa_distance_parser.
Anyway, I will imporve it in next patch, thanks for your review.

> >> +@var{source} and @var{destination} are NUMA node IDs.
> >> +@var{distance} is the NUMA distance from @var{source} to @var{destination}.
> >> +The distance from node A to node B may be different from the distance from
> >> +node B to node A as the distance can to be asymmetrical. If a node is
> >> +unreachable, set 255 as distance.
> > 
> > The distance from a node to itself is always 10.  If no distance values
> > are given for node pairs, then the default distance of 20 is used for each
> > pair.  If any pair of nodes is given a distance, then all pairs must be
> > given distances.  Although, when distances are only given in one direction
> > for each pair of nodes, then the distances in the opposite directions are
> > assumed to be the same.  If, however, an asymmetrical pair of distances is
> > given for even one node pair, then all node pairs must be provided
> > distance values for both directions, even when they are symmetrical.  When
> > a node is unreachable from another node, set the pair's distance to 255.
> 
> I don't like duplication where it is not necessary, but don't know if
> there's an easy way to make the .json file and qemu-options.hx refer to
> one another, since they both feed separate user-visible documentation.
> So you may have to repeat some of this in the .json file (such as my
> mention above that at least documenting that 255 is special).
> 

I am afraid that I am not an expert in qapi, I do see some parts in
these two files are duplicate and I am not sure I can deal with it well,
if you agree, I may probably just document magic 255 in next patch.

Thanks,
-He

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-04-05 10:17 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-01 10:25 [Qemu-devel] [PATCH v4] Allow setting NUMA distance for different NUMA nodes He Chen
2017-04-02 15:42 ` no-reply
2017-04-03  8:38 ` Andrew Jones
2017-04-03 18:58   ` Eric Blake
2017-04-05 10:17     ` He Chen
2017-04-05 10:03   ` He Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.