qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
@ 2019-10-20 11:11 Tao Xu
  2019-10-20 11:11 ` [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps() Tao Xu
                   ` (16 more replies)
  0 siblings, 17 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

This series of patches will build Heterogeneous Memory Attribute Table (HMAT)
according to the command line. The ACPI HMAT describes the memory attributes,
such as memory side cache attributes and bandwidth and latency details,
related to the Memory Proximity Domain.
The software is expected to use HMAT information as hint for optimization.

In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
the platform's HMAT tables.

The V12 patches link:
https://patchwork.kernel.org/cover/11153861/

Changelog:
v13:
    - Modify some text description
    - Drop "initiator_valid" field in struct NodeInfo
    - Reuse Garray to store the raw bandwidth and bandwidth data
    - Calculate common base unit using range bitmap
    - Add a patch to alculate hmat latency and bandwidth entry list
    - Drop the total_levels option and use readable cache size
    - Remove the unnecessary head file
    - Use decimal notation with appropriate suffix for cache size
v12:
    - Fix a bug that a memory-only node without initiator setting
      doesn't report error. (reported by Danmei Wei)
    - Fix a bug that if HMAT is enabled and without hmat-lb setting,
      QEMU will crash. (reported by Danmei Wei)
v11:
    - Move numa option patches forward.
    - Add num_initiator in Numa_state to record the number of
    initiators.
    - Simplify struct HMAT_LB_Info, use uint64_t array to store data.
    - Drop hmat_get_base().
    - Calculate base in build_hmat_lb().
v10:
    - Add qemu_strtotime_ps() to convert strings with time suffixes
    to numbers, and add some tests for it.
    - Add qapi buildin type time, and add some tests for it.
    - Add machine oprion properties "-machine hmat=on|off" for
	  enabling or disabling HMAT in QEMU.
v9:
    - change the CLI input way, make it more user firendly (Daniel Black)
    use latency=NUM[p|n|u]s and bandwidth=NUM[M|G|P](B/s) as input and drop
    the base-lat and base-bw input.
Liu Jingqi (5):
  numa: Extend CLI to provide memory latency and bandwidth information
  numa: Extend CLI to provide memory side cache information
  hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  hmat acpi: Build System Locality Latency and Bandwidth Information
    Structure(s)
  hmat acpi: Build Memory Side Cache Information Structure(s)

Tao Xu (7):
  util/cutils: Add qemu_strtotime_ps()
  tests/cutils: Add test for qemu_strtotime_ps()
  qapi: Add builtin type time
  tests: Add test for QAPI builtin type time
  numa: Extend CLI to provide initiator information for numa nodes
  numa: Calculate hmat latency and bandwidth entry list
  tests/bios-tables-test: add test cases for ACPI HMAT

 hw/acpi/Kconfig                    |   7 +-
 hw/acpi/Makefile.objs              |   1 +
 hw/acpi/hmat.c                     | 263 +++++++++++++++++++++++++++
 hw/acpi/hmat.h                     |  42 +++++
 hw/core/machine.c                  |  70 ++++++++
 hw/core/numa.c                     | 273 ++++++++++++++++++++++++++++-
 hw/i386/acpi-build.c               |   5 +
 include/qapi/visitor-impl.h        |   4 +
 include/qapi/visitor.h             |   9 +
 include/qemu/cutils.h              |   1 +
 include/sysemu/numa.h              | 104 +++++++++++
 qapi/machine.json                  | 179 ++++++++++++++++++-
 qapi/opts-visitor.c                |  22 +++
 qapi/qapi-visit-core.c             |  12 ++
 qapi/qobject-input-visitor.c       |  18 ++
 qapi/trace-events                  |   1 +
 qemu-options.hx                    |  96 +++++++++-
 scripts/qapi/common.py             |   1 +
 tests/bios-tables-test.c           |  44 +++++
 tests/test-cutils.c                | 199 +++++++++++++++++++++
 tests/test-keyval.c                | 125 +++++++++++++
 tests/test-qobject-input-visitor.c |  29 +++
 util/cutils.c                      |  82 +++++++++
 23 files changed, 1575 insertions(+), 12 deletions(-)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h

-- 
2.20.1



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps()
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-23  1:08   ` Eduardo Habkost
                     ` (2 more replies)
  2019-10-20 11:11 ` [PATCH v13 02/12] tests/cutils: Add test for qemu_strtotime_ps() Tao Xu
                   ` (15 subsequent siblings)
  16 siblings, 3 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

To convert strings with time suffixes to numbers, support time unit are
"ps" for picosecond, "ns" for nanosecond, "us" for microsecond, "ms"
for millisecond or "s" for second.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v13.
---
 include/qemu/cutils.h |  1 +
 util/cutils.c         | 82 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index b54c847e0f..7b6d106bdd 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -182,5 +182,6 @@ int uleb128_decode_small(const uint8_t *in, uint32_t *n);
  * *str1 is <, == or > than *str2.
  */
 int qemu_pstrcmp0(const char **str1, const char **str2);
+int qemu_strtotime_ps(const char *nptr, const char **end, uint64_t *result);
 
 #endif
diff --git a/util/cutils.c b/util/cutils.c
index fd591cadf0..a50c15f46a 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -847,3 +847,85 @@ int qemu_pstrcmp0(const char **str1, const char **str2)
 {
     return g_strcmp0(*str1, *str2);
 }
+
+static int64_t timeunit_mul(const char *unitstr)
+{
+    if (g_strcmp0(unitstr, "ps") == 0) {
+        return 1;
+    } else if (g_strcmp0(unitstr, "ns") == 0) {
+        return 1000;
+    } else if (g_strcmp0(unitstr, "us") == 0) {
+        return 1000000;
+    } else if (g_strcmp0(unitstr, "ms") == 0) {
+        return 1000000000LL;
+    } else if (g_strcmp0(unitstr, "s") == 0) {
+        return 1000000000000LL;
+    } else {
+        return -1;
+    }
+}
+
+
+/*
+ * Convert string to time, support time unit are ps for picosecond,
+ * ns for nanosecond, us for microsecond, ms for millisecond or s for second.
+ * End pointer will be returned in *end, if not NULL. Return -ERANGE on
+ * overflow, and -EINVAL on other error.
+ */
+static int do_strtotime(const char *nptr, const char **end,
+                      const char *default_unit, uint64_t *result)
+{
+    int retval;
+    const char *endptr;
+    int mul_required = 0;
+    int64_t mul;
+    double val, integral, fraction;
+
+    retval = qemu_strtod_finite(nptr, &endptr, &val);
+    if (retval) {
+        goto out;
+    }
+    fraction = modf(val, &integral);
+    if (fraction != 0) {
+        mul_required = 1;
+    }
+
+    mul = timeunit_mul(endptr);
+
+    if (mul == 1000000000000LL) {
+        endptr++;
+    } else if (mul != -1) {
+        endptr += 2;
+    } else {
+        mul = timeunit_mul(default_unit);
+        assert(mul >= 0);
+    }
+    if (mul == 1 && mul_required) {
+        retval = -EINVAL;
+        goto out;
+    }
+    /*
+     * Values >= 0xfffffffffffffc00 overflow uint64_t after their trip
+     * through double (53 bits of precision).
+     */
+    if ((val * (double)mul >= 0xfffffffffffffc00) || val < 0) {
+        retval = -ERANGE;
+        goto out;
+    }
+    *result = val * (double)mul;
+    retval = 0;
+
+out:
+    if (end) {
+        *end = endptr;
+    } else if (*endptr) {
+        retval = -EINVAL;
+    }
+
+    return retval;
+}
+
+int qemu_strtotime_ps(const char *nptr, const char **end, uint64_t *result)
+{
+    return do_strtotime(nptr, end, "ps", result);
+}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 02/12] tests/cutils: Add test for qemu_strtotime_ps()
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
  2019-10-20 11:11 ` [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps() Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-20 11:11 ` [PATCH v13 03/12] qapi: Add builtin type time Tao Xu
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

Test the input of basic, time suffixes, float, invaild, trailing and
overflow.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v13.
---
 tests/test-cutils.c | 199 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 199 insertions(+)

diff --git a/tests/test-cutils.c b/tests/test-cutils.c
index 1aa8351520..19c967d3d5 100644
--- a/tests/test-cutils.c
+++ b/tests/test-cutils.c
@@ -2179,6 +2179,193 @@ static void test_qemu_strtosz_metric(void)
     g_assert(endptr == str + 6);
 }
 
+static void test_qemu_strtotime_ps_simple(void)
+{
+    const char *str;
+    const char *endptr;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "0";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0);
+    g_assert(endptr == str + 1);
+
+    str = "56789";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 56789);
+    g_assert(endptr == str + 5);
+
+    err = qemu_strtotime_ps(str, NULL, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 56789);
+
+    /* Note: precision is 53 bits since we're parsing with strtod() */
+
+    str = "9007199254740991"; /* 2^53-1 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0x1fffffffffffff);
+    g_assert(endptr == str + 16);
+
+    str = "9007199254740992"; /* 2^53 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0x20000000000000);
+    g_assert(endptr == str + 16);
+
+    str = "9007199254740993"; /* 2^53+1 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0x20000000000000); /* rounded to 53 bits */
+    g_assert(endptr == str + 16);
+
+    str = "18446744073709549568"; /* 0xfffffffffffff800 (53 msbs set) */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0xfffffffffffff800);
+    g_assert(endptr == str + 20);
+
+    str = "18446744073709550591"; /* 0xfffffffffffffbff */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 0xfffffffffffff800); /* rounded to 53 bits */
+    g_assert(endptr == str + 20);
+
+    /* 0x7ffffffffffffe00..0x7fffffffffffffff get rounded to
+     * 0x8000000000000000, thus -ERANGE; see test_qemu_strtosz_erange() */
+}
+
+static void test_qemu_strtotime_ps_units(void)
+{
+    const char *ps = "1ps";
+    const char *ns = "1ns";
+    const char *us = "1us";
+    const char *ms = "1ms";
+    const char *s = "1s";
+    int err;
+    const char *endptr;
+    uint64_t res = 0xbaadf00d;
+
+    err = qemu_strtotime_ps(ps, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1);
+    g_assert(endptr == ps + 3);
+
+    err = qemu_strtotime_ps(ns, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000);
+    g_assert(endptr == ns + 3);
+
+    err = qemu_strtotime_ps(us, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000000);
+    g_assert(endptr == us + 3);
+
+    err = qemu_strtotime_ps(ms, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000000000LL);
+    g_assert(endptr == ms + 3);
+
+    err = qemu_strtotime_ps(s, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 1000000000000ULL);
+    g_assert(endptr == s + 2);
+}
+
+static void test_qemu_strtotime_ps_float(void)
+{
+    const char *str = "56.789ns";
+    int err;
+    const char *endptr;
+    uint64_t res = 0xbaadf00d;
+
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, 0);
+    g_assert_cmpint(res, ==, 56.789 * 1000);
+    g_assert(endptr == str + 8);
+}
+
+static void test_qemu_strtotime_ps_invalid(void)
+{
+    const char *str;
+    const char *endptr;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = " \t ";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = "crap";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = "inf";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+
+    str = "NaN";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+    g_assert(endptr == str);
+}
+
+static void test_qemu_strtotime_ps_trailing(void)
+{
+    const char *str;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "123xxx";
+
+    err = qemu_strtotime_ps(str, NULL, &res);
+    g_assert_cmpint(err, ==, -EINVAL);
+}
+
+static void test_qemu_strtotime_ps_erange(void)
+{
+    const char *str;
+    const char *endptr;
+    int err;
+    uint64_t res = 0xbaadf00d;
+
+    str = "-1";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 2);
+
+    str = "18446744073709550592"; /* 0xfffffffffffffc00 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 20);
+
+    str = "18446744073709551615"; /* 2^64-1 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 20);
+
+    str = "18446744073709551616"; /* 2^64 */
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 20);
+
+    str = "200000000000000s";
+    err = qemu_strtotime_ps(str, &endptr, &res);
+    g_assert_cmpint(err, ==, -ERANGE);
+    g_assert(endptr == str + 16);
+}
+
 int main(int argc, char **argv)
 {
     g_test_init(&argc, &argv, NULL);
@@ -2456,5 +2643,17 @@ int main(int argc, char **argv)
     g_test_add_func("/cutils/strtosz/metric",
                     test_qemu_strtosz_metric);
 
+    g_test_add_func("/cutils/strtotime/simple",
+                    test_qemu_strtotime_ps_simple);
+    g_test_add_func("/cutils/strtotime/units",
+                    test_qemu_strtotime_ps_units);
+    g_test_add_func("/cutils/strtotime/float",
+                    test_qemu_strtotime_ps_float);
+    g_test_add_func("/cutils/strtotime/invalid",
+                    test_qemu_strtotime_ps_invalid);
+    g_test_add_func("/cutils/strtotime/trailing",
+                    test_qemu_strtotime_ps_trailing);
+    g_test_add_func("/cutils/strtotime/erange",
+                    test_qemu_strtotime_ps_erange);
     return g_test_run();
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 03/12] qapi: Add builtin type time
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
  2019-10-20 11:11 ` [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps() Tao Xu
  2019-10-20 11:11 ` [PATCH v13 02/12] tests/cutils: Add test for qemu_strtotime_ps() Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-20 11:11 ` [PATCH v13 04/12] tests: Add test for QAPI " Tao Xu
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

Add optional builtin type time, fallback is uint64. This type use
qemu_strtotime_ps() for pre-converting time suffix to numbers.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v13.
---
 include/qapi/visitor-impl.h  |  4 ++++
 include/qapi/visitor.h       |  9 +++++++++
 qapi/opts-visitor.c          | 22 ++++++++++++++++++++++
 qapi/qapi-visit-core.c       | 12 ++++++++++++
 qapi/qobject-input-visitor.c | 18 ++++++++++++++++++
 qapi/trace-events            |  1 +
 scripts/qapi/common.py       |  1 +
 7 files changed, 67 insertions(+)

diff --git a/include/qapi/visitor-impl.h b/include/qapi/visitor-impl.h
index 8ccb3b6c20..e0979563c7 100644
--- a/include/qapi/visitor-impl.h
+++ b/include/qapi/visitor-impl.h
@@ -88,6 +88,10 @@ struct Visitor
     void (*type_size)(Visitor *v, const char *name, uint64_t *obj,
                       Error **errp);
 
+    /* Optional; fallback is type_uint64() */
+    void (*type_time)(Visitor *v, const char *name, uint64_t *obj,
+                      Error **errp);
+
     /* Must be set */
     void (*type_bool)(Visitor *v, const char *name, bool *obj, Error **errp);
 
diff --git a/include/qapi/visitor.h b/include/qapi/visitor.h
index c5b23851a1..62565f4cb5 100644
--- a/include/qapi/visitor.h
+++ b/include/qapi/visitor.h
@@ -554,6 +554,15 @@ void visit_type_int64(Visitor *v, const char *name, int64_t *obj,
 void visit_type_size(Visitor *v, const char *name, uint64_t *obj,
                      Error **errp);
 
+/*
+ * Visit a uint64_t value.
+ * Like visit_type_uint64(), except that some visitors may choose to
+ * recognize numbers with timeunit suffix, such as "ps", "ns", "us"
+ * "ms" and "s".
+ */
+void visit_type_time(Visitor *v, const char *name, uint64_t *obj,
+                     Error **errp);
+
 /*
  * Visit a boolean value.
  *
diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c
index 5fe0276c1c..e289c63b52 100644
--- a/qapi/opts-visitor.c
+++ b/qapi/opts-visitor.c
@@ -526,6 +526,27 @@ opts_type_size(Visitor *v, const char *name, uint64_t *obj, Error **errp)
     processed(ov, name);
 }
 
+static void
+opts_type_time(Visitor *v, const char *name, uint64_t *obj, Error **errp)
+{
+    OptsVisitor *ov = to_ov(v);
+    const QemuOpt *opt;
+    int err;
+
+    opt = lookup_scalar(ov, name, errp);
+    if (!opt) {
+        return;
+    }
+
+    err = qemu_strtotime_ps(opt->str ? opt->str : "", NULL, obj);
+    if (err < 0) {
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE, opt->name,
+                   "a time value");
+        return;
+    }
+
+    processed(ov, name);
+}
 
 static void
 opts_optional(Visitor *v, const char *name, bool *present)
@@ -573,6 +594,7 @@ opts_visitor_new(const QemuOpts *opts)
     ov->visitor.type_int64  = &opts_type_int64;
     ov->visitor.type_uint64 = &opts_type_uint64;
     ov->visitor.type_size   = &opts_type_size;
+    ov->visitor.type_time   = &opts_type_time;
     ov->visitor.type_bool   = &opts_type_bool;
     ov->visitor.type_str    = &opts_type_str;
 
diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
index 5365561b07..ac8896455c 100644
--- a/qapi/qapi-visit-core.c
+++ b/qapi/qapi-visit-core.c
@@ -277,6 +277,18 @@ void visit_type_size(Visitor *v, const char *name, uint64_t *obj,
     }
 }
 
+void visit_type_time(Visitor *v, const char *name, uint64_t *obj,
+                     Error **errp)
+{
+    assert(obj);
+    trace_visit_type_time(v, name, obj);
+    if (v->type_time) {
+        v->type_time(v, name, obj, errp);
+    } else {
+        v->type_uint64(v, name, obj, errp);
+    }
+}
+
 void visit_type_bool(Visitor *v, const char *name, bool *obj, Error **errp)
 {
     assert(obj);
diff --git a/qapi/qobject-input-visitor.c b/qapi/qobject-input-visitor.c
index 32236cbcb1..9b66941d8a 100644
--- a/qapi/qobject-input-visitor.c
+++ b/qapi/qobject-input-visitor.c
@@ -627,6 +627,23 @@ static void qobject_input_type_size_keyval(Visitor *v, const char *name,
     }
 }
 
+static void qobject_input_type_time_keyval(Visitor *v, const char *name,
+                                           uint64_t *obj, Error **errp)
+{
+    QObjectInputVisitor *qiv = to_qiv(v);
+    const char *str = qobject_input_get_keyval(qiv, name, errp);
+
+    if (!str) {
+        return;
+    }
+
+    if (qemu_strtotime_ps(str, NULL, obj) < 0) {
+        /* TODO report -ERANGE more nicely */
+        error_setg(errp, QERR_INVALID_PARAMETER_VALUE,
+                   full_name(qiv, name), "time");
+    }
+}
+
 static void qobject_input_optional(Visitor *v, const char *name, bool *present)
 {
     QObjectInputVisitor *qiv = to_qiv(v);
@@ -708,6 +725,7 @@ Visitor *qobject_input_visitor_new_keyval(QObject *obj)
     v->visitor.type_any = qobject_input_type_any;
     v->visitor.type_null = qobject_input_type_null;
     v->visitor.type_size = qobject_input_type_size_keyval;
+    v->visitor.type_time = qobject_input_type_time_keyval;
     v->keyval = true;
 
     return &v->visitor;
diff --git a/qapi/trace-events b/qapi/trace-events
index 5eb4afa110..c4605a7ccc 100644
--- a/qapi/trace-events
+++ b/qapi/trace-events
@@ -29,6 +29,7 @@ visit_type_int16(void *v, const char *name, int16_t *obj) "v=%p name=%s obj=%p"
 visit_type_int32(void *v, const char *name, int32_t *obj) "v=%p name=%s obj=%p"
 visit_type_int64(void *v, const char *name, int64_t *obj) "v=%p name=%s obj=%p"
 visit_type_size(void *v, const char *name, uint64_t *obj) "v=%p name=%s obj=%p"
+visit_type_time(void *v, const char *name, uint64_t *obj) "v=%p name=%s obj=%p"
 visit_type_bool(void *v, const char *name, bool *obj) "v=%p name=%s obj=%p"
 visit_type_str(void *v, const char *name, char **obj) "v=%p name=%s obj=%p"
 visit_type_number(void *v, const char *name, void *obj) "v=%p name=%s obj=%p"
diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
index d6e00c80ea..20c47bae95 100644
--- a/scripts/qapi/common.py
+++ b/scripts/qapi/common.py
@@ -1822,6 +1822,7 @@ class QAPISchema(object):
                   ('uint32', 'int',     'uint32_t'),
                   ('uint64', 'int',     'uint64_t'),
                   ('size',   'int',     'uint64_t'),
+                  ('time',   'int',     'uint64_t'),
                   ('bool',   'boolean', 'bool'),
                   ('any',    'value',   'QObject' + pointer_suffix),
                   ('null',   'null',    'QNull' + pointer_suffix)]:
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 04/12] tests: Add test for QAPI builtin type time
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (2 preceding siblings ...)
  2019-10-20 11:11 ` [PATCH v13 03/12] qapi: Add builtin type time Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-20 11:11 ` [PATCH v13 05/12] numa: Extend CLI to provide initiator information for numa nodes Tao Xu
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

Add tests for time input such as zero, around limit of precision,
signed upper limit, actual upper limit, beyond limits, time suffixes,
and etc.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

No changes in v13.
---
 tests/test-keyval.c                | 125 +++++++++++++++++++++++++++++
 tests/test-qobject-input-visitor.c |  29 +++++++
 2 files changed, 154 insertions(+)

diff --git a/tests/test-keyval.c b/tests/test-keyval.c
index 09b0ae3c68..b36914f0fc 100644
--- a/tests/test-keyval.c
+++ b/tests/test-keyval.c
@@ -490,6 +490,130 @@ static void test_keyval_visit_size(void)
     visit_free(v);
 }
 
+static void test_keyval_visit_time(void)
+{
+    Error *err = NULL;
+    Visitor *v;
+    QDict *qdict;
+    uint64_t time;
+
+    /* Lower limit zero */
+    qdict = keyval_parse("time1=0", NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmpuint(time, ==, 0);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Around limit of precision: 2^53-1, 2^53, 2^53+1 */
+    qdict = keyval_parse("time1=9007199254740991,"
+                         "time2=9007199254740992,"
+                         "time3=9007199254740993",
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0x1fffffffffffff);
+    visit_type_time(v, "time2", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0x20000000000000);
+    visit_type_time(v, "time3", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0x20000000000000);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Close to signed upper limit 0x7ffffffffffffc00 (53 msbs set) */
+    qdict = keyval_parse("time1=9223372036854774784," /* 7ffffffffffffc00 */
+                         "time2=9223372036854775295", /* 7ffffffffffffdff */
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0x7ffffffffffffc00);
+    visit_type_time(v, "time2", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0x7ffffffffffffc00);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Close to actual upper limit 0xfffffffffffff800 (53 msbs set) */
+    qdict = keyval_parse("time1=18446744073709549568," /* fffffffffffff800 */
+                         "time2=18446744073709550591", /* fffffffffffffbff */
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0xfffffffffffff800);
+    visit_type_time(v, "time2", &time, &error_abort);
+    g_assert_cmphex(time, ==, 0xfffffffffffff800);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Beyond limits */
+    qdict = keyval_parse("time1=-1,"
+                         "time2=18446744073709550592", /* fffffffffffffc00 */
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &err);
+    error_free_or_abort(&err);
+    visit_type_time(v, "time2", &time, &err);
+    error_free_or_abort(&err);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Suffixes */
+    qdict = keyval_parse("time1=2ps,time2=3.4ns,time3=5us,"
+                         "time4=0.6ms,time5=700s",
+                         NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &error_abort);
+    g_assert_cmpuint(time, ==, 2);
+    visit_type_time(v, "time2", &time, &error_abort);
+    g_assert_cmpuint(time, ==, 3400);
+    visit_type_time(v, "time3", &time, &error_abort);
+    g_assert_cmphex(time, ==, 5 * 1000 * 1000);
+    visit_type_time(v, "time4", &time, &error_abort);
+    g_assert_cmphex(time, ==, 600 * 1000 * 1000);
+    visit_type_time(v, "time5", &time, &error_abort);
+    g_assert_cmphex(time, ==, 700 * 1000000000000ULL);
+    visit_check_struct(v, &error_abort);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Beyond limit with suffix */
+    qdict = keyval_parse("time1=18446745s", NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &err);
+    error_free_or_abort(&err);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+
+    /* Trailing crap */
+    qdict = keyval_parse("time1=89ks,time2=ns", NULL, &error_abort);
+    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+    qobject_unref(qdict);
+    visit_start_struct(v, NULL, NULL, 0, &error_abort);
+    visit_type_time(v, "time1", &time, &err);
+    error_free_or_abort(&err);
+    visit_type_time(v, "time2", &time, &err);;
+    error_free_or_abort(&err);
+    visit_end_struct(v, NULL);
+    visit_free(v);
+}
+
 static void test_keyval_visit_dict(void)
 {
     Error *err = NULL;
@@ -678,6 +802,7 @@ int main(int argc, char *argv[])
     g_test_add_func("/keyval/visit/bool", test_keyval_visit_bool);
     g_test_add_func("/keyval/visit/number", test_keyval_visit_number);
     g_test_add_func("/keyval/visit/size", test_keyval_visit_size);
+    g_test_add_func("/keyval/visit/time", test_keyval_visit_time);
     g_test_add_func("/keyval/visit/dict", test_keyval_visit_dict);
     g_test_add_func("/keyval/visit/list", test_keyval_visit_list);
     g_test_add_func("/keyval/visit/optional", test_keyval_visit_optional);
diff --git a/tests/test-qobject-input-visitor.c b/tests/test-qobject-input-visitor.c
index 6bacabf063..4b5820b744 100644
--- a/tests/test-qobject-input-visitor.c
+++ b/tests/test-qobject-input-visitor.c
@@ -366,6 +366,31 @@ static void test_visitor_in_size_str_fail(TestInputVisitorData *data,
     error_free_or_abort(&err);
 }
 
+static void test_visitor_in_time_str_keyval(TestInputVisitorData *data,
+                                            const void *unused)
+{
+    uint64_t res, value = 265 * 1000 * 1000;
+    Visitor *v;
+
+    v = visitor_input_test_init_full(data, true, "\"265us\"");
+
+    visit_type_time(v, NULL, &res, &error_abort);
+    g_assert_cmpfloat(res, ==, value);
+}
+
+static void test_visitor_in_time_str_fail(TestInputVisitorData *data,
+                                          const void *unused)
+{
+    uint64_t res = 0;
+    Visitor *v;
+    Error *err = NULL;
+
+    v = visitor_input_test_init(data, "\"265us\"");
+
+    visit_type_time(v, NULL, &res, &err);
+    error_free_or_abort(&err);
+}
+
 static void test_visitor_in_string(TestInputVisitorData *data,
                                    const void *unused)
 {
@@ -1311,6 +1336,10 @@ int main(int argc, char **argv)
                            NULL, test_visitor_in_size_str_keyval);
     input_visitor_test_add("/visitor/input/size_str_fail",
                            NULL, test_visitor_in_size_str_fail);
+    input_visitor_test_add("/visitor/input/time_str_keyval",
+                           NULL, test_visitor_in_time_str_keyval);
+    input_visitor_test_add("/visitor/input/time_str_fail",
+                           NULL, test_visitor_in_time_str_fail);
     input_visitor_test_add("/visitor/input/string",
                            NULL, test_visitor_in_string);
     input_visitor_test_add("/visitor/input/enum",
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 05/12] numa: Extend CLI to provide initiator information for numa nodes
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (3 preceding siblings ...)
  2019-10-20 11:11 ` [PATCH v13 04/12] tests: Add test for QAPI " Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-21 12:29   ` Igor Mammedov
  2019-10-20 11:11 ` [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information Tao Xu
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron, Dan Williams

In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
The initiator represents processor which access to memory. And in 5.2.27.3
Memory Proximity Domain Attributes Structure, the attached initiator is
defined as where the memory controller responsible for a memory proximity
domain. With attached initiator information, the topology of heterogeneous
memory can be described.

Extend CLI of "-numa node" option to indicate the initiator numa node-id.
In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
the platform's HMAT tables.

Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v13:
    - Modify some text description
    - Drop "initiator_valid" field in struct NodeInfo (Igor)

Changes in v12:
    - Fix the bug that a memory-only node without initiator setting
      doesn't report error. (reported by Danmei Wei)
---
 hw/core/machine.c     | 70 +++++++++++++++++++++++++++++++++++++++++++
 hw/core/numa.c        | 23 ++++++++++++++
 include/sysemu/numa.h |  5 ++++
 qapi/machine.json     | 10 ++++++-
 qemu-options.hx       | 35 ++++++++++++++++++----
 5 files changed, 137 insertions(+), 6 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 1689ad3bf8..32a451bd84 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -518,6 +518,20 @@ static void machine_set_nvdimm(Object *obj, bool value, Error **errp)
     ms->nvdimms_state->is_enabled = value;
 }
 
+static bool machine_get_hmat(Object *obj, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    return ms->numa_state->hmat_enabled;
+}
+
+static void machine_set_hmat(Object *obj, bool value, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    ms->numa_state->hmat_enabled = value;
+}
+
 static char *machine_get_nvdimm_persistence(Object *obj, Error **errp)
 {
     MachineState *ms = MACHINE(obj);
@@ -645,6 +659,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
                                const CpuInstanceProperties *props, Error **errp)
 {
     MachineClass *mc = MACHINE_GET_CLASS(machine);
+    NodeInfo *numa_info = machine->numa_state->nodes;
     bool match = false;
     int i;
 
@@ -714,6 +729,17 @@ void machine_set_cpu_numa_node(MachineState *machine,
         match = true;
         slot->props.node_id = props->node_id;
         slot->props.has_node_id = props->has_node_id;
+
+        if (machine->numa_state->hmat_enabled) {
+            if ((numa_info[props->node_id].initiator < MAX_NODES) &&
+                (props->node_id != numa_info[props->node_id].initiator)) {
+                error_setg(errp, "The initiator of CPU NUMA node %" PRId64
+                        " should be itself.", props->node_id);
+                return;
+            }
+            numa_info[props->node_id].has_cpu = true;
+            numa_info[props->node_id].initiator = props->node_id;
+        }
     }
 
     if (!match) {
@@ -960,6 +986,13 @@ static void machine_initfn(Object *obj)
 
     if (mc->numa_mem_supported) {
         ms->numa_state = g_new0(NumaState, 1);
+        object_property_add_bool(obj, "hmat",
+                                 machine_get_hmat, machine_set_hmat,
+                                 &error_abort);
+        object_property_set_description(obj, "hmat",
+                                        "Set on/off to enable/disable "
+                                        "ACPI Heterogeneous Memory Attribute "
+                                        "Table (HMAT)", NULL);
     }
 
     /* Register notifier when init is done for sysbus sanity checks */
@@ -1048,6 +1081,38 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
     return g_string_free(s, false);
 }
 
+static void numa_validate_initiator(NumaState *numa_state)
+{
+    int i;
+    NodeInfo *numa_info = numa_state->nodes;
+
+    for (i = 0; i < numa_state->num_nodes; i++) {
+        if (numa_info[i].initiator == MAX_NODES) {
+            error_report("The initiator of NUMA node %d is missing, use "
+                         "'-numa node,initiator' option to declare it.", i);
+            goto err;
+        }
+
+        if (!numa_info[numa_info[i].initiator].present) {
+            error_report("NUMA node %" PRIu16 " is missing, use "
+                         "'-numa node' option to declare it first.",
+                         numa_info[i].initiator);
+            goto err;
+        }
+
+        if (!numa_info[numa_info[i].initiator].has_cpu) {
+            error_report("The initiator of NUMA node %d is invalid.", i);
+            goto err;
+        }
+    }
+
+    return;
+
+err:
+    error_printf("\n");
+    exit(1);
+}
+
 static void machine_numa_finish_cpu_init(MachineState *machine)
 {
     int i;
@@ -1088,6 +1153,11 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
             machine_set_cpu_numa_node(machine, &props, &error_fatal);
         }
     }
+
+    if (machine->numa_state->hmat_enabled) {
+        numa_validate_initiator(machine->numa_state);
+    }
+
     if (s->len && !qtest_enabled()) {
         warn_report("CPU(s) not present in any NUMA nodes: %s",
                     s->str);
diff --git a/hw/core/numa.c b/hw/core/numa.c
index 038c96d4ab..eba66ab768 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -133,6 +133,29 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
         numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
         numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
     }
+
+    /*
+     * If not set the initiator, set it to MAX_NODES. And if
+     * HMAT is enabled and this node has no cpus, QEMU will raise error.
+     */
+    numa_info[nodenr].initiator = MAX_NODES;
+    if (node->has_initiator) {
+        if (!ms->numa_state->hmat_enabled) {
+            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+                       "(HMAT) is disabled, enable it with -machine hmat=on "
+                       "before using any of hmat specific options.");
+            return;
+        }
+
+        if (node->initiator >= MAX_NODES) {
+            error_report("The initiator id %" PRIu16 " expects an integer "
+                         "between 0 and %d", node->initiator,
+                         MAX_NODES - 1);
+            return;
+        }
+
+        numa_info[nodenr].initiator = node->initiator;
+    }
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
     ms->numa_state->num_nodes++;
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index ae9c41d02b..788cbec7a2 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -18,6 +18,8 @@ struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
     bool present;
+    bool has_cpu;
+    uint16_t initiator;
     uint8_t distance[MAX_NODES];
 };
 
@@ -33,6 +35,9 @@ struct NumaState {
     /* Allow setting NUMA distance for different NUMA nodes */
     bool have_numa_distance;
 
+    /* Detect if HMAT support is enabled. */
+    bool hmat_enabled;
+
     /* NUMA nodes information */
     NodeInfo nodes[MAX_NODES];
 };
diff --git a/qapi/machine.json b/qapi/machine.json
index ca26779f1a..f1b07b3486 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -463,6 +463,13 @@
 # @memdev: memory backend object.  If specified for one node,
 #          it must be specified for all nodes.
 #
+# @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145,
+#             points to the nodeid which has the memory controller
+#             responsible for this NUMA node. This field provides
+#             additional information as to the initiator node that
+#             is closest (as in directly attached) to this node, and
+#             therefore has the best performance (since 4.2)
+#
 # Since: 2.1
 ##
 { 'struct': 'NumaNodeOptions',
@@ -470,7 +477,8 @@
    '*nodeid': 'uint16',
    '*cpus':   ['uint16'],
    '*mem':    'size',
-   '*memdev': 'str' }}
+   '*memdev': 'str',
+   '*initiator': 'uint16' }}
 
 ##
 # @NumaDistOptions:
diff --git a/qemu-options.hx b/qemu-options.hx
index 996b6fba74..1f96399521 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -43,7 +43,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
     "                nvdimm=on|off controls NVDIMM support (default=off)\n"
     "                enforce-config-section=on|off enforce configuration section migration (default=off)\n"
-    "                memory-encryption=@var{} memory encryption object to use (default=none)\n",
+    "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
+    "                hmat=on|off controls ACPI HMAT support (default=off)\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
@@ -103,6 +104,9 @@ NOTE: this parameter is deprecated. Please use @option{-global}
 @option{migration.send-configuration}=@var{on|off} instead.
 @item memory-encryption=@var{}
 Memory encryption object to use. The default is none.
+@item hmat=on|off
+Enables or disables ACPI Heterogeneous Memory Attribute Table (HMAT) support.
+The default is off.
 @end table
 ETEXI
 
@@ -161,14 +165,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
 ETEXI
 
 DEF("numa", HAS_ARG, QEMU_OPTION_numa,
-    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
-    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
+    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
+    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
     "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
     QEMU_ARCH_ALL)
 STEXI
-@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
-@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
+@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
+@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
 @findex -numa
@@ -215,6 +219,27 @@ split equally between them.
 @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
 if one node uses @samp{memdev}, all of them have to use it.
 
+@samp{initiator} is an additional option that points to an @var{initiator}
+NUMA node that has best performance (the lowest latency or largest bandwidth)
+to this NUMA @var{node}. Note that this option can be set only when
+the machine property 'hmat' is set to 'on'.
+
+Following example creates a machine with 2 NUMA nodes, node 0 has CPU.
+node 1 has only memory, and its initiator is node 0. Note that because
+node 0 has CPU, by default the initiator of node 0 is itself and must be
+itself.
+@example
+-machine hmat=on \
+-m 2G,slots=2,maxmem=4G \
+-object memory-backend-ram,size=1G,id=m0 \
+-object memory-backend-ram,size=1G,id=m1 \
+-numa node,nodeid=0,memdev=m0 \
+-numa node,nodeid=1,memdev=m1,initiator=0 \
+-smp 2,sockets=2,maxcpus=2  \
+-numa cpu,node-id=0,socket-id=0 \
+-numa cpu,node-id=0,socket-id=1
+@end example
+
 @var{source} and @var{destination} are NUMA node IDs.
 @var{distance} is the NUMA distance from @var{source} to @var{destination}.
 The distance from a node to itself is always 10. If any pair of nodes is
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (4 preceding siblings ...)
  2019-10-20 11:11 ` [PATCH v13 05/12] numa: Extend CLI to provide initiator information for numa nodes Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-22  7:08   ` Igor Mammedov
  2019-10-23 15:28   ` Igor Mammedov
  2019-10-20 11:11 ` [PATCH v13 07/12] numa: Calculate hmat latency and bandwidth entry list Tao Xu
                   ` (10 subsequent siblings)
  16 siblings, 2 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

From: Liu Jingqi <jingqi.liu@intel.com>

Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT).

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v13:
    - Reuse Garray to store the raw bandwidth and bandwidth data
    - Calculate common base unit using range bitmap (Igor)
---
 hw/core/numa.c        | 127 ++++++++++++++++++++++++++++++++++++++++++
 include/sysemu/numa.h |  68 ++++++++++++++++++++++
 qapi/machine.json     |  95 ++++++++++++++++++++++++++++++-
 qemu-options.hx       |  49 +++++++++++++++-
 4 files changed, 336 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index eba66ab768..3cf77f6ac9 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "sysemu/hostmem.h"
 #include "sysemu/numa.h"
 #include "sysemu/sysemu.h"
@@ -198,6 +199,119 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
     ms->numa_state->have_numa_distance = true;
 }
 
+void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
+                        Error **errp)
+{
+    int first_bit, last_bit;
+    uint64_t temp_latency;
+    NodeInfo *numa_info = numa_state->nodes;
+    HMAT_LB_Info *hmat_lb =
+        numa_state->hmat_lb[node->hierarchy][node->data_type];
+    HMAT_LB_Data lb_data;
+
+    /* Error checking */
+    if (node->initiator >= numa_state->num_nodes) {
+        error_setg(errp, "Invalid initiator=%d, it should be less than %d.",
+                   node->initiator, numa_state->num_nodes);
+        return;
+    }
+    if (node->target >= numa_state->num_nodes) {
+        error_setg(errp, "Invalid target=%d, it should be less than %d.",
+                   node->target, numa_state->num_nodes);
+        return;
+    }
+    if (!numa_info[node->initiator].has_cpu) {
+        error_setg(errp, "Invalid initiator=%d, it isn't an "
+                   "initiator proximity domain.", node->initiator);
+        return;
+    }
+    if (!numa_info[node->target].present) {
+        error_setg(errp, "Invalid target=%d, it hasn't a valid NUMA node.",
+                   node->target);
+        return;
+    }
+
+    if (!hmat_lb) {
+        hmat_lb = g_malloc0(sizeof(*hmat_lb));
+        numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+        hmat_lb->latency = g_array_new(false, true, sizeof(HMAT_LB_Data));
+        hmat_lb->bandwidth = g_array_new(false, true, sizeof(HMAT_LB_Data));
+    }
+    hmat_lb->hierarchy = node->hierarchy;
+    hmat_lb->data_type = node->data_type;
+    lb_data.initiator = node->initiator;
+    lb_data.target = node->target;
+
+    /* Input latency data */
+    if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
+        if (!node->has_latency) {
+            error_setg(errp, "Missing 'latency' option.");
+            return;
+        }
+        if (node->has_bandwidth) {
+            error_setg(errp, "Invalid option 'bandwidth' since "
+                       "the data type is latency.");
+            return;
+        }
+
+        temp_latency = node->latency;
+        hmat_lb->base_latency = 1;
+        while (QEMU_IS_ALIGNED(temp_latency, 10)) {
+            temp_latency /= 10;
+            hmat_lb->base_latency *= 10;
+        }
+
+        if (temp_latency >= UINT64_MAX) {
+            error_setg(errp, "Latency %" PRIu64 " between initiator=%d and "
+                       "target=%d should not differ from previously entered "
+                       "values on more than %d.", node->latency,
+                       node->initiator, node->target, UINT16_MAX - 1);
+            return;
+        }
+        if (temp_latency > hmat_lb->range_left_la) {
+            hmat_lb->range_left_la = temp_latency;
+        }
+
+        lb_data.rawdata = node->latency;
+        g_array_append_val(hmat_lb->latency, lb_data);
+    }
+
+    /* Input bandwidth data */
+    if (node->data_type >= HMATLB_DATA_TYPE_ACCESS_BANDWIDTH) {
+        if (!node->has_bandwidth) {
+            error_setg(errp, "Missing 'bandwidth' option.");
+            return;
+        }
+        if (node->has_latency) {
+            error_setg(errp, "Invalid option 'latency' since "
+                       "the data type is bandwidth.");
+            return;
+        }
+        if (!QEMU_IS_ALIGNED(node->bandwidth, MiB)) {
+            error_setg(errp, "Bandwidth %" PRIu64 " between initiator=%d and "
+                       "target=%d should be 1MB aligned.", node->bandwidth,
+                       node->initiator, node->target);
+            return;
+        }
+
+        hmat_lb->range_bitmap_bw |= node->bandwidth;
+
+        first_bit = __builtin_ffs(hmat_lb->range_bitmap_bw);
+        last_bit = __builtin_ctz(hmat_lb->range_bitmap_bw);
+        if ((last_bit - first_bit) > UINT16_BITS) {
+            error_setg(errp, "Bandwidth %" PRIu64 " between initiator=%d and "
+                       "target=%d should not differ from previously entered "
+                       "values on more than %d.", node->bandwidth,
+                       node->initiator, node->target, UINT16_MAX - 1);
+            return;
+        }
+
+        hmat_lb->base_bandwidth = UINT64_C(1) << (first_bit - 1);
+        lb_data.rawdata = node->bandwidth;
+        g_array_append_val(hmat_lb->bandwidth, lb_data);
+    }
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
@@ -236,6 +350,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
         machine_set_cpu_numa_node(ms, qapi_NumaCpuOptions_base(&object->u.cpu),
                                   &err);
         break;
+    case NUMA_OPTIONS_TYPE_HMAT_LB:
+        if (!ms->numa_state->hmat_enabled) {
+            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+                       "(HMAT) is disabled, enable it with -machine hmat=on "
+                       "before using any of hmat specific options.");
+            return;
+        }
+
+        parse_numa_hmat_lb(ms->numa_state, &object->u.hmat_lb, &err);
+        if (err) {
+            goto end;
+        }
+        break;
     default:
         abort();
     }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 788cbec7a2..b45afcb29e 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -14,6 +14,29 @@ struct CPUArchId;
 #define NUMA_DISTANCE_MAX         254
 #define NUMA_DISTANCE_UNREACHABLE 255
 
+/* the value of AcpiHmatLBInfo flags */
+enum {
+    HMAT_LB_MEM_MEMORY           = 0,
+    HMAT_LB_MEM_CACHE_1ST_LEVEL  = 1,
+    HMAT_LB_MEM_CACHE_2ND_LEVEL  = 2,
+    HMAT_LB_MEM_CACHE_3RD_LEVEL  = 3,
+};
+
+/* the value of AcpiHmatLBInfo data type */
+enum {
+    HMAT_LB_DATA_ACCESS_LATENCY   = 0,
+    HMAT_LB_DATA_READ_LATENCY     = 1,
+    HMAT_LB_DATA_WRITE_LATENCY    = 2,
+    HMAT_LB_DATA_ACCESS_BANDWIDTH = 3,
+    HMAT_LB_DATA_READ_BANDWIDTH   = 4,
+    HMAT_LB_DATA_WRITE_BANDWIDTH  = 5,
+};
+
+#define UINT16_BITS       16
+
+#define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
+#define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
+
 struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
@@ -28,6 +51,46 @@ struct NumaNodeMem {
     uint64_t node_plugged_mem;
 };
 
+struct HMAT_LB_Data {
+    uint8_t     initiator;
+    uint8_t     target;
+    uint64_t    rawdata;
+};
+typedef struct HMAT_LB_Data HMAT_LB_Data;
+
+struct HMAT_LB_Info {
+    /* Indicates it's memory or the specified level memory side cache. */
+    uint8_t     hierarchy;
+
+    /* Present the type of data, access/read/write latency or bandwidth. */
+    uint8_t     data_type;
+
+    /* The left range of latency for calculating common latency base */
+    uint64_t    range_left_la;
+
+    /* The range bitmap of bandwidth for calculating common bandwidth base */
+    uint64_t    range_bitmap_bw;
+
+    /* The common base unit for latencies */
+    uint64_t    base_latency;
+
+    /* The common base unit for bandwidths */
+    uint64_t    base_bandwidth;
+
+    /* Array to store the compressed latencies */
+    uint16_t    *entry_latency;
+
+    /* Array to store the compressed latencies */
+    uint16_t    *entry_bandwidth;
+
+    /* Array to store the latencies */
+    GArray      *latency;
+
+    /* Array to store the bandwidthes */
+    GArray      *bandwidth;
+};
+typedef struct HMAT_LB_Info HMAT_LB_Info;
+
 struct NumaState {
     /* Number of NUMA nodes */
     int num_nodes;
@@ -40,11 +103,16 @@ struct NumaState {
 
     /* NUMA nodes information */
     NodeInfo nodes[MAX_NODES];
+
+    /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
+    HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
 };
 typedef struct NumaState NumaState;
 
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
 void parse_numa_opts(MachineState *ms);
+void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
+                        Error **errp);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
diff --git a/qapi/machine.json b/qapi/machine.json
index f1b07b3486..9ca008810b 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -426,10 +426,12 @@
 #
 # @cpu: property based CPU(s) to node mapping (Since: 2.10)
 #
+# @hmat-lb: memory latency and bandwidth information (Since: 4.2)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
 
 ##
 # @NumaOptions:
@@ -444,7 +446,8 @@
   'data': {
     'node': 'NumaNodeOptions',
     'dist': 'NumaDistOptions',
-    'cpu': 'NumaCpuOptions' }}
+    'cpu': 'NumaCpuOptions',
+    'hmat-lb': 'NumaHmatLBOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -557,6 +560,94 @@
    'base': 'CpuInstanceProperties',
    'data' : {} }
 
+##
+# @HmatLBMemoryHierarchy:
+#
+# The memory hierarchy in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# For more information of @HmatLBMemoryHierarchy see
+# the chapter 5.2.27.4: Table 5-142: Field "Flags" of ACPI 6.3 spec.
+#
+# @memory: the structure represents the memory performance
+#
+# @first-level: first level memory of memory side cached memory
+#
+# @second-level: second level memory of memory side cached memory
+#
+# @third-level: third level memory of memory side cached memory
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatLBMemoryHierarchy',
+  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
+
+##
+# @HmatLBDataType:
+#
+# Data type in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# For more information of @HmatLBDataType see
+# the chapter 5.2.27.4: Table 5-142:  Field "Data Type" of ACPI 6.3 spec.
+#
+# @access-latency: access latency (nanoseconds)
+#
+# @read-latency: read latency (nanoseconds)
+#
+# @write-latency: write latency (nanoseconds)
+#
+# @access-bandwidth: access bandwidth (MB/s)
+#
+# @read-bandwidth: read bandwidth (MB/s)
+#
+# @write-bandwidth: write bandwidth (MB/s)
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatLBDataType',
+  'data': [ 'access-latency', 'read-latency', 'write-latency',
+            'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
+
+##
+# @NumaHmatLBOptions:
+#
+# Set the system locality latency and bandwidth information
+# between Initiator and Target proximity Domains.
+#
+# For more information of @NumaHmatLBOptions see
+# the chapter 5.2.27.4: Table 5-142 of ACPI 6.3 spec.
+#
+# @initiator: the Initiator Proximity Domain.
+#
+# @target: the Target Proximity Domain.
+#
+# @hierarchy: the Memory Hierarchy. Indicates the performance
+#             of memory or side cache.
+#
+# @data-type: presents the type of data, access/read/write
+#             latency or hit latency.
+#
+# @latency: the value of latency from @initiator to @target proximity domain,
+#           the latency units are "ps(picosecond)", "ns(nanosecond)" or
+#           "us(microsecond)".
+#
+# @bandwidth: the value of bandwidth between @initiator and @target proximity
+#             domain, the bandwidth units are "MB(/s)","GB(/s)" or "TB(/s)".
+#
+# Since: 4.2
+##
+{ 'struct': 'NumaHmatLBOptions',
+    'data': {
+    'initiator': 'uint16',
+    'target': 'uint16',
+    'hierarchy': 'HmatLBMemoryHierarchy',
+    'data-type': 'HmatLBDataType',
+    '*latency': 'time',
+    '*bandwidth': 'size' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/qemu-options.hx b/qemu-options.hx
index 1f96399521..de97939f9a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -168,16 +168,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
-    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
+    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
+    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
+@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
 Set the NUMA distance from a source node to a destination node.
+Set the ACPI Heterogeneous Memory Attributes for the given nodes.
 
 Legacy VCPU assignment uses @samp{cpus} option where
 @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
@@ -256,6 +259,50 @@ specified resources, it just assigns existing resources to NUMA
 nodes. This means that one still has to use the @option{-m},
 @option{-smp} options to allocate RAM and VCPUs respectively.
 
+Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
+between initiator and target NUMA nodes in ACPI Heterogeneous Attribute Memory Table (HMAT).
+Initiator NUMA node can create memory requests, usually including one or more processors.
+Target NUMA node contains addressable memory.
+
+In @samp{hmat-lb} option, @var{node} are NUMA node IDs. @var{str} of 'hierarchy'
+is the memory hierarchy of the target NUMA node: if @var{str} is 'memory', the structure
+represents the memory performance; if @var{str} is 'first-level|second-level|third-level',
+this structure represents aggregated performance of memory side caches for each domain.
+@var{str} of 'data-type' is type of data represented by this structure instance:
+if 'hierarchy' is 'memory', 'data-type' is 'access|read|write' latency(nanoseconds)
+or 'access|read|write' bandwidth(MB/s) of the target memory; if 'hierarchy' is
+'first-level|second-level|third-level', 'data-type' is 'access|read|write' hit latency
+or 'access|read|write' hit bandwidth of the target memory side cache.
+
+@var{lat} of 'latency' is latency value, the possible value and units are
+NUM[ps|ns|us] (picosecond|nanosecond|microsecond), the recommended unit is 'ns'. @var{bw}
+is bandwidth value, the possible value and units are NUM[M|G|T], mean that
+the bandwidth value are NUM MB/s, GB/s or TB/s. Note that max NUM is 65534,
+if NUM is 0, means the corresponding latency or bandwidth information is not provided.
+And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
+will be MB/s.
+
+For example, the following option assigns NUMA node 0 and 1. Node 0 has 2 cpus and
+a ram, node 1 has only a ram. The processors in node 0 access memory in node
+0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
+The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
+nanoseconds, access-bandwidth is 100 MB/s.
+@example
+-machine hmat=on \
+-m 2G \
+-object memory-backend-ram,size=1G,id=m0 \
+-object memory-backend-ram,size=1G,id=m1 \
+-smp 2 \
+-numa node,nodeid=0,memdev=m0 \
+-numa node,nodeid=1,memdev=m1,initiator=0 \
+-numa cpu,node-id=0,socket-id=0 \
+-numa cpu,node-id=0,socket-id=1 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
+@end example
+
 ETEXI
 
 DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 07/12] numa: Calculate hmat latency and bandwidth entry list
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (5 preceding siblings ...)
  2019-10-20 11:11 ` [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-20 11:11 ` [PATCH v13 08/12] numa: Extend CLI to provide memory side cache information Tao Xu
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

Compress HMAT latency and bandwidth raw data into uint16_t data,
which can be stored in HMAT table.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

New patch in v13.
---
 hw/core/numa.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 3cf77f6ac9..4033a5a470 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -474,6 +474,45 @@ static void complete_init_numa_distance(MachineState *ms)
     }
 }
 
+static void calculate_hmat_entry_list(HMAT_LB_Info *hmat_lb, int num_nodes)
+{
+    int i, index;
+    uint16_t *entry_list;
+    uint64_t base;
+    GArray *lb_data_list;
+    HMAT_LB_Data *lb_data;
+
+    if (hmat_lb->data_type <= HMAT_LB_DATA_WRITE_LATENCY) {
+        base = hmat_lb->base_latency;
+        lb_data_list = hmat_lb->latency;
+    } else {
+        base = hmat_lb->base_bandwidth;
+        lb_data_list = hmat_lb->bandwidth;
+    }
+
+    entry_list = g_malloc0(lb_data_list->len * sizeof(uint16_t));
+    for (i = 0; i < lb_data_list->len; i++) {
+        lb_data = &g_array_index(lb_data_list, HMAT_LB_Data, i);
+        index = lb_data->initiator * num_nodes + lb_data->target;
+        if (entry_list[index]) {
+            error_report("Duplicate configuration of the latency for "
+                "initiator=%d and target=%d.", lb_data->initiator,
+                lb_data->target);
+            exit(1);
+        }
+
+        entry_list[index] = (uint16_t)(lb_data->rawdata / base);
+    }
+
+    if (hmat_lb->data_type <= HMAT_LB_DATA_WRITE_LATENCY) {
+        hmat_lb->entry_latency = entry_list;
+    } else {
+        /* Convert base from Byte to Megabyte */
+        hmat_lb->base_bandwidth = base / MiB;
+        hmat_lb->entry_bandwidth = entry_list;
+    }
+}
+
 void numa_legacy_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
                                  int nb_nodes, ram_addr_t size)
 {
@@ -512,9 +551,10 @@ void numa_default_auto_assign_ram(MachineClass *mc, NodeInfo *nodes,
 
 void numa_complete_configuration(MachineState *ms)
 {
-    int i;
+    int i, hierarchy, type;
     MachineClass *mc = MACHINE_GET_CLASS(ms);
     NodeInfo *numa_info = ms->numa_state->nodes;
+    HMAT_LB_Info *numa_hmat_lb;
 
     /*
      * If memory hotplug is enabled (slots > 0) but without '-numa'
@@ -611,6 +651,21 @@ void numa_complete_configuration(MachineState *ms)
             /* Validation succeeded, now fill in any missing distances. */
             complete_init_numa_distance(ms);
         }
+
+        if (ms->numa_state->hmat_enabled) {
+            for (hierarchy = HMAT_LB_MEM_MEMORY;
+                 hierarchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hierarchy++) {
+                for (type = HMAT_LB_DATA_ACCESS_LATENCY;
+                    type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
+                    numa_hmat_lb = ms->numa_state->hmat_lb[hierarchy][type];
+
+                    if (numa_hmat_lb) {
+                        calculate_hmat_entry_list(numa_hmat_lb,
+                                                  ms->numa_state->num_nodes);
+                    }
+                }
+            }
+        }
     }
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 08/12] numa: Extend CLI to provide memory side cache information
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (6 preceding siblings ...)
  2019-10-20 11:11 ` [PATCH v13 07/12] numa: Calculate hmat latency and bandwidth entry list Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-20 11:11 ` [PATCH v13 09/12] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao Xu
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, Daniel Black, jonathan.cameron

From: Liu Jingqi <jingqi.liu@intel.com>

Add -numa hmat-cache option to provide Memory Side Cache Information.
These memory attributes help to build Memory Side Cache Information
Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).

Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v13:
    - Drop the total_levels option.
    - Use readable cache size (Igor)
---
 hw/core/numa.c        | 66 ++++++++++++++++++++++++++++++++++++
 include/sysemu/numa.h | 31 +++++++++++++++++
 qapi/machine.json     | 78 +++++++++++++++++++++++++++++++++++++++++--
 qemu-options.hx       | 16 +++++++--
 4 files changed, 187 insertions(+), 4 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 4033a5a470..4ef7a94a84 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -312,6 +312,59 @@ void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
     }
 }
 
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+                           Error **errp)
+{
+    int nb_numa_nodes = ms->numa_state->num_nodes;
+    HMAT_Cache_Info *hmat_cache = NULL;
+
+    if (node->node_id >= nb_numa_nodes) {
+        error_setg(errp, "Invalid node-id=%" PRIu32
+                   ", it should be less than %d.",
+                   node->node_id, nb_numa_nodes);
+        return;
+    }
+
+    if (node->level > MAX_HMAT_CACHE_LEVEL) {
+        error_setg(errp, "Invalid level=%" PRIu8
+                   ", it should be less than or equal to %d.",
+                   node->level, MAX_HMAT_CACHE_LEVEL);
+        return;
+    }
+    if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
+        error_setg(errp, "Duplicate configuration of the side cache for "
+                   "node-id=%" PRIu32 " and level=%" PRIu8 ".",
+                   node->node_id, node->level);
+        return;
+    }
+
+    if ((node->level > 1) &&
+        ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
+        (node->size >=
+            ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
+        error_setg(errp, "Invalid size=0x%" PRIx64
+                   ", the size of level=%" PRIu8
+                   " should be less than the size(0x%" PRIx64
+                   ") of level=%" PRIu8 ".",
+                   node->size, node->level,
+                   ms->numa_state->hmat_cache[node->node_id]
+                                             [node->level - 1]->size,
+                   node->level - 1);
+        return;
+    }
+
+    hmat_cache = g_malloc0(sizeof(*hmat_cache));
+
+    hmat_cache->proximity = node->node_id;
+    hmat_cache->size = node->size;
+    hmat_cache->level = node->level;
+    hmat_cache->associativity = node->assoc;
+    hmat_cache->write_policy = node->policy;
+    hmat_cache->line_size = node->line;
+
+    ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
     Error *err = NULL;
@@ -363,6 +416,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
             goto end;
         }
         break;
+    case NUMA_OPTIONS_TYPE_HMAT_CACHE:
+        if (!ms->numa_state->hmat_enabled) {
+            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+                       "(HMAT) is disabled, enable it with -machine hmat=on "
+                       "before using any of hmat specific options.");
+            return;
+        }
+
+        parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err);
+        if (err) {
+            goto end;
+        }
+        break;
     default:
         abort();
     }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index b45afcb29e..e80883819e 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -37,6 +37,8 @@ enum {
 #define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
 #define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
 
+#define MAX_HMAT_CACHE_LEVEL    HMAT_LB_MEM_CACHE_3RD_LEVEL
+
 struct NodeInfo {
     uint64_t node_mem;
     struct HostMemoryBackend *node_memdev;
@@ -91,6 +93,30 @@ struct HMAT_LB_Info {
 };
 typedef struct HMAT_LB_Info HMAT_LB_Info;
 
+struct HMAT_Cache_Info {
+    /* The memory proximity domain to which the memory belongs. */
+    uint32_t    proximity;
+
+    /* Size of memory side cache in bytes. */
+    uint64_t    size;
+
+    /* Total cache levels for this memory proximity domain. */
+    uint8_t     total_levels;
+
+    /* Cache level described in this structure. */
+    uint8_t     level;
+
+    /* Cache Associativity: None/Direct Mapped/Comple Cache Indexing */
+    uint8_t     associativity;
+
+    /* Write Policy: None/Write Back(WB)/Write Through(WT) */
+    uint8_t     write_policy;
+
+    /* Cache Line size in bytes. */
+    uint16_t    line_size;
+};
+typedef struct HMAT_Cache_Info HMAT_Cache_Info;
+
 struct NumaState {
     /* Number of NUMA nodes */
     int num_nodes;
@@ -106,6 +132,9 @@ struct NumaState {
 
     /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
     HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
+
+    /* Memory Side Cache Information Structure */
+    HMAT_Cache_Info *hmat_cache[MAX_NODES][MAX_HMAT_CACHE_LEVEL + 1];
 };
 typedef struct NumaState NumaState;
 
@@ -113,6 +142,8 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
 void parse_numa_opts(MachineState *ms);
 void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
                         Error **errp);
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+                           Error **errp);
 void numa_complete_configuration(MachineState *ms);
 void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
 extern QemuOptsList qemu_numa_opts;
diff --git a/qapi/machine.json b/qapi/machine.json
index 9ca008810b..402e607f9c 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -428,10 +428,12 @@
 #
 # @hmat-lb: memory latency and bandwidth information (Since: 4.2)
 #
+# @hmat-cache: memory side cache information (Since: 4.2)
+#
 # Since: 2.1
 ##
 { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] }
 
 ##
 # @NumaOptions:
@@ -447,7 +449,8 @@
     'node': 'NumaNodeOptions',
     'dist': 'NumaDistOptions',
     'cpu': 'NumaCpuOptions',
-    'hmat-lb': 'NumaHmatLBOptions' }}
+    'hmat-lb': 'NumaHmatLBOptions',
+    'hmat-cache': 'NumaHmatCacheOptions' }}
 
 ##
 # @NumaNodeOptions:
@@ -648,6 +651,77 @@
     '*latency': 'time',
     '*bandwidth': 'size' }}
 
+##
+# @HmatCacheAssociativity:
+#
+# Cache associativity in the Memory Side Cache
+# Information Structure of HMAT
+#
+# For more information of @HmatCacheAssociativity see
+# the chapter 5.2.27.5: Table 5-143 of ACPI 6.3 spec.
+#
+# @none: None
+#
+# @direct: Direct Mapped
+#
+# @complex: Complex Cache Indexing (implementation specific)
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatCacheAssociativity',
+  'data': [ 'none', 'direct', 'complex' ] }
+
+##
+# @HmatCacheWritePolicy:
+#
+# Cache write policy in the Memory Side Cache
+# Information Structure of HMAT
+#
+# For more information of @HmatCacheWritePolicy see
+# the chapter 5.2.27.5: Table 5-143: Field "Cache Attributes" of ACPI 6.3 spec.
+#
+# @none: None
+#
+# @write-back: Write Back (WB)
+#
+# @write-through: Write Through (WT)
+#
+# Since: 4.2
+##
+{ 'enum': 'HmatCacheWritePolicy',
+  'data': [ 'none', 'write-back', 'write-through' ] }
+
+##
+# @NumaHmatCacheOptions:
+#
+# Set the memory side cache information for a given memory domain.
+#
+# For more information of @NumaHmatCacheOptions see
+# the chapter 5.2.27.5: Table 5-143: Field "Cache Attributes" of ACPI 6.3 spec.
+#
+# @node-id: the memory proximity domain to which the memory belongs.
+#
+# @size: the size of memory side cache in bytes.
+#
+# @level: the cache level described in this structure.
+#
+# @assoc: the cache associativity, none/direct-mapped/complex(complex cache indexing).
+#
+# @policy: the write policy, none/write-back/write-through.
+#
+# @line: the cache Line size in bytes.
+#
+# Since: 4.2
+##
+{ 'struct': 'NumaHmatCacheOptions',
+  'data': {
+   'node-id': 'uint32',
+   'size': 'size',
+   'level': 'uint8',
+   'assoc': 'HmatCacheAssociativity',
+   'policy': 'HmatCacheWritePolicy',
+   'line': 'uint16' }}
+
 ##
 # @HostMemPolicy:
 #
diff --git a/qemu-options.hx b/qemu-options.hx
index de97939f9a..af8742f944 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -169,7 +169,8 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
     "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
     "-numa dist,src=source,dst=destination,val=distance\n"
     "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
-    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
+    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n"
+    "-numa hmat-cache,node-id=node,size=size,level=level[,assoc=none|direct|complex][,policy=none|write-back|write-through][,line=size]\n",
     QEMU_ARCH_ALL)
 STEXI
 @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
@@ -177,6 +178,7 @@ STEXI
 @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
 @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
 @itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
+@itemx -numa hmat-cache,node-id=@var{node},size=@var{size},level=@var{level}[,assoc=@var{str}][,policy=@var{str}][,line=@var{size}]
 @findex -numa
 Define a NUMA node and assign RAM and VCPUs to it.
 Set the NUMA distance from a source node to a destination node.
@@ -282,11 +284,19 @@ if NUM is 0, means the corresponding latency or bandwidth information is not pro
 And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
 will be MB/s.
 
+In @samp{hmat-cache} option, @var{node-id} is the NUMA-id of the memory belongs.
+@var{size} is the size of memory side cache in bytes. @var{level} is the cache
+level described in this structure. @var{assoc} is the cache associativity,
+the possible value is 'none/direct(direct-mapped)/complex(complex cache indexing)'.
+@var{policy} is the write policy. @var{line} is the cache Line size in bytes.
+
 For example, the following option assigns NUMA node 0 and 1. Node 0 has 2 cpus and
 a ram, node 1 has only a ram. The processors in node 0 access memory in node
 0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
 The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
 nanoseconds, access-bandwidth is 100 MB/s.
+And for memory side cache information, NUMA node 0 and 1 both have 1 level memory
+cache, size is 10KB, policy is write-back, the cache Line size is 8 bytes:
 @example
 -machine hmat=on \
 -m 2G \
@@ -300,7 +310,9 @@ nanoseconds, access-bandwidth is 100 MB/s.
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
 -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
 -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
--numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
+-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M \
+-numa hmat-cache,node-id=0,size=10K,level=1,assoc=direct,policy=write-back,line=8 \
+-numa hmat-cache,node-id=1,size=10K,level=1,assoc=direct,policy=write-back,line=8
 @end example
 
 ETEXI
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 09/12] hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (7 preceding siblings ...)
  2019-10-20 11:11 ` [PATCH v13 08/12] numa: Extend CLI to provide memory side cache information Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-20 11:11 ` [PATCH v13 10/12] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao Xu
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, Daniel Black, Jonathan Cameron

From: Liu Jingqi <jingqi.liu@intel.com>

HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
(HMAT). The specification references below link:
http://www.uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf

It describes the memory attributes, such as memory side cache
attributes and bandwidth and latency details, related to the
Memory Proximity Domain. The software is
expected to use this information as hint for optimization.

This structure describes Memory Proximity Domain Attributes by memory
subsystem and its associativity with processor proximity domain as well as
hint for memory usage.

In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
the platform's HMAT tables.

Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v13:
    - Remove the unnecessary head file.
---
 hw/acpi/Kconfig       |  7 ++-
 hw/acpi/Makefile.objs |  1 +
 hw/acpi/hmat.c        | 99 +++++++++++++++++++++++++++++++++++++++++++
 hw/acpi/hmat.h        | 42 ++++++++++++++++++
 hw/i386/acpi-build.c  |  5 +++
 5 files changed, 152 insertions(+), 2 deletions(-)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 12e3f1e86e..54209c6f2f 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -7,6 +7,7 @@ config ACPI_X86
     select ACPI_NVDIMM
     select ACPI_CPU_HOTPLUG
     select ACPI_MEMORY_HOTPLUG
+    select ACPI_HMAT
 
 config ACPI_X86_ICH
     bool
@@ -23,6 +24,10 @@ config ACPI_NVDIMM
     bool
     depends on ACPI
 
+config ACPI_HMAT
+    bool
+    depends on ACPI
+
 config ACPI_PCI
     bool
     depends on ACPI && PCI
@@ -33,5 +38,3 @@ config ACPI_VMGENID
     depends on PC
 
 config ACPI_HW_REDUCED
-    bool
-    depends on ACPI
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 655a9c1973..517bd88704 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -7,6 +7,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
 common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
+common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
new file mode 100644
index 0000000000..c595098ba7
--- /dev/null
+++ b/hw/acpi/hmat.c
@@ -0,0 +1,99 @@
+/*
+ * HMAT ACPI Implementation
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu jingqi <jingqi.liu@linux.intel.com>
+ *  Tao Xu <tao3.xu@intel.com>
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/numa.h"
+#include "hw/acpi/hmat.h"
+
+/*
+ * ACPI 6.3:
+ * 5.2.27.3 Memory Proximity Domain Attributes Structure: Table 5-145
+ */
+static void build_hmat_mpda(GArray *table_data, uint16_t flags,
+                            uint16_t initiator, uint16_t mem_node)
+{
+
+    /* Memory Proximity Domain Attributes Structure */
+    /* Type */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 40, 4);
+    /* Flags */
+    build_append_int_noprefix(table_data, flags, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Proximity Domain for the Attached Initiator */
+    build_append_int_noprefix(table_data, initiator, 4);
+    /* Proximity Domain for the Memory */
+    build_append_int_noprefix(table_data, mem_node, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+    /*
+     * Reserved:
+     * Previously defined as the Start Address of the System Physical
+     * Address Range. Deprecated since ACPI Spec 6.3.
+     */
+    build_append_int_noprefix(table_data, 0, 8);
+    /*
+     * Reserved:
+     * Previously defined as the Range Length of the region in bytes.
+     * Deprecated since ACPI Spec 6.3.
+     */
+    build_append_int_noprefix(table_data, 0, 8);
+}
+
+/* Build HMAT sub table structures */
+static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
+{
+    uint16_t flags;
+    int i;
+
+    for (i = 0; i < numa_state->num_nodes; i++) {
+        flags = 0;
+
+        if (numa_state->nodes[i].initiator < MAX_NODES) {
+            flags |= HMAT_PROXIMITY_INITIATOR_VALID;
+        }
+
+        build_hmat_mpda(table_data, flags, numa_state->nodes[i].initiator, i);
+    }
+}
+
+void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state)
+{
+    int hmat_start = table_data->len;
+
+    /* reserve space for HMAT header  */
+    acpi_data_push(table_data, 40);
+
+    hmat_build_table_structs(table_data, numa_state);
+
+    build_header(linker, table_data,
+                 (void *)(table_data->data + hmat_start),
+                 "HMAT", table_data->len - hmat_start, 2, NULL, NULL);
+}
diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
new file mode 100644
index 0000000000..437dbc6872
--- /dev/null
+++ b/hw/acpi/hmat.h
@@ -0,0 +1,42 @@
+/*
+ * HMAT ACPI Implementation Header
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu jingqi <jingqi.liu@linux.intel.com>
+ *  Tao Xu <tao3.xu@intel.com>
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#ifndef HMAT_H
+#define HMAT_H
+
+#include "hw/acpi/aml-build.h"
+
+/*
+ * ACPI 6.3: 5.2.27.3 Memory Proximity Domain Attributes Structure,
+ * Table 5-145, Field "flag", Bit [0]: set to 1 to indicate that data in
+ * the Proximity Domain for the Attached Initiator field is valid.
+ * Other bits reserved.
+ */
+#define HMAT_PROXIMITY_INITIATOR_VALID  0x1
+
+void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state);
+
+#endif
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 1d077a7cb7..df98945b31 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -68,6 +68,7 @@
 #include "hw/i386/intel_iommu.h"
 
 #include "hw/acpi/ipmi.h"
+#include "hw/acpi/hmat.h"
 
 /* These are used to size the ACPI tables for -M pc-i440fx-1.7 and
  * -M pc-i440fx-2.0.  Even if the actual amount of AML generated grows
@@ -2718,6 +2719,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
             acpi_add_table(table_offsets, tables_blob);
             build_slit(tables_blob, tables->linker, machine);
         }
+        if (machine->numa_state->hmat_enabled) {
+            acpi_add_table(table_offsets, tables_blob);
+            build_hmat(tables_blob, tables->linker, machine->numa_state);
+        }
     }
     if (acpi_get_mcfg(&mcfg)) {
         acpi_add_table(table_offsets, tables_blob);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 10/12] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (8 preceding siblings ...)
  2019-10-20 11:11 ` [PATCH v13 09/12] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-20 11:11 ` [PATCH v13 11/12] hmat acpi: Build Memory Side Cache " Tao Xu
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, jonathan.cameron

From: Liu Jingqi <jingqi.liu@intel.com>

This structure describes the memory access latency and bandwidth
information from various memory access initiator proximity domains.
The latency and bandwidth numbers represented in this structure
correspond to rated latency and bandwidth for the platform.
The software could use this information as hint for optimization.

Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v13:
    - Calculate the entries in a new patch.
---
 hw/acpi/hmat.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 95 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index c595098ba7..6ec1310e62 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -27,6 +27,7 @@
 #include "qemu/osdep.h"
 #include "sysemu/numa.h"
 #include "hw/acpi/hmat.h"
+#include "qemu/error-report.h"
 
 /*
  * ACPI 6.3:
@@ -67,11 +68,81 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags,
     build_append_int_noprefix(table_data, 0, 8);
 }
 
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-146
+ */
+static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
+                          uint32_t num_initiator, uint32_t num_target,
+                          uint32_t *initiator_list)
+{
+    int i;
+    uint16_t *lb_data;
+    uint32_t base;
+    /*
+     * Length in bytes for entire structure, including 32 bytes of
+     * fixed length, length of initiator proximity domain list,
+     * length of target proximity domain list and length of entries
+     * provides latency/bandwidth values.
+     */
+    uint32_t lb_length = 32 + 4 * num_initiator + 4 * num_target +
+                              2 * num_initiator * num_target;
+
+    /* Type */
+    build_append_int_noprefix(table_data, 1, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, lb_length, 4);
+    /* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
+    assert(!(hmat_lb->hierarchy >> 4));
+    build_append_int_noprefix(table_data, hmat_lb->hierarchy, 1);
+    /* Data Type */
+    build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Number of Initiator Proximity Domains (s) */
+    build_append_int_noprefix(table_data, num_initiator, 4);
+    /* Number of Target Proximity Domains (t) */
+    build_append_int_noprefix(table_data, num_target, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+
+    if (hmat_lb->data_type <= HMAT_LB_DATA_WRITE_LATENCY) {
+        base = hmat_lb->base_latency;
+        lb_data = hmat_lb->entry_latency;
+    } else {
+        base = hmat_lb->base_bandwidth;
+        lb_data = hmat_lb->entry_bandwidth;
+    }
+
+    /* Entry Base Unit */
+    build_append_int_noprefix(table_data, base, 8);
+
+    /* Initiator Proximity Domain List */
+    for (i = 0; i < num_initiator; i++) {
+        build_append_int_noprefix(table_data, initiator_list[i], 4);
+    }
+
+    /* Target Proximity Domain List */
+    for (i = 0; i < num_target; i++) {
+        build_append_int_noprefix(table_data, i, 4);
+    }
+
+    /* Latency or Bandwidth Entries */
+    for (i = 0; i < num_initiator * num_target; i++) {
+        build_append_int_noprefix(table_data, lb_data[i], 2);
+    }
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
 {
     uint16_t flags;
-    int i;
+    uint32_t num_initiator = 0;
+    uint32_t initiator_list[MAX_NODES];
+    int i, hierarchy, type;
+    HMAT_LB_Info *hmat_lb;
 
     for (i = 0; i < numa_state->num_nodes; i++) {
         flags = 0;
@@ -82,6 +153,29 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
 
         build_hmat_mpda(table_data, flags, numa_state->nodes[i].initiator, i);
     }
+
+    for (i = 0; i < numa_state->num_nodes; i++) {
+        if (numa_state->nodes[i].has_cpu) {
+            initiator_list[num_initiator++] = i;
+        }
+    }
+
+    /*
+     * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+     * Structure: Table 5-146
+     */
+    for (hierarchy = HMAT_LB_MEM_MEMORY;
+         hierarchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hierarchy++) {
+        for (type = HMAT_LB_DATA_ACCESS_LATENCY;
+             type <= HMAT_LB_DATA_WRITE_BANDWIDTH; type++) {
+            hmat_lb = numa_state->hmat_lb[hierarchy][type];
+
+            if (hmat_lb) {
+                build_hmat_lb(table_data, hmat_lb, num_initiator,
+                              numa_state->num_nodes, initiator_list);
+            }
+        }
+    }
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 11/12] hmat acpi: Build Memory Side Cache Information Structure(s)
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (9 preceding siblings ...)
  2019-10-20 11:11 ` [PATCH v13 10/12] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-20 11:11 ` [PATCH v13 12/12] tests/bios-tables-test: add test cases for ACPI HMAT Tao Xu
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: jingqi.liu, tao3.xu, fan.du, qemu-devel, Daniel Black, Jonathan Cameron

From: Liu Jingqi <jingqi.liu@intel.com>

This structure describes memory side cache information for memory
proximity domains if the memory side cache is present and the
physical device forms the memory side cache.
The software could use this information to effectively place
the data in memory to maximize the performance of the system
memory that use the memory side cache.

Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v13:
    - rename level as cache_level
---
 hw/acpi/hmat.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index 6ec1310e62..931aa60678 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -135,14 +135,63 @@ static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
     }
 }
 
+/* ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure: Table 5-147 */
+static void build_hmat_cache(GArray *table_data, HMAT_Cache_Info *hmat_cache)
+{
+    /*
+     * Cache Attributes: Bits [3:0] – Total Cache Levels
+     * for this Memory Proximity Domain
+     */
+    uint32_t cache_attr = hmat_cache->total_levels & 0xF;
+
+    /* Bits [7:4] : Cache Level described in this structure */
+    cache_attr |= (hmat_cache->level & 0xF) << 4;
+
+    /* Bits [11:8] - Cache Associativity */
+    cache_attr |= (hmat_cache->associativity & 0x7) << 8;
+
+    /* Bits [15:12] - Write Policy */
+    cache_attr |= (hmat_cache->write_policy & 0x7) << 12;
+
+    /* Bits [31:16] - Cache Line size in bytes */
+    cache_attr |= (hmat_cache->line_size & 0xFFFF) << 16;
+
+    cache_attr = cpu_to_le32(cache_attr);
+
+    /* Type */
+    build_append_int_noprefix(table_data, 2, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 32, 4);
+    /* Proximity Domain for the Memory */
+    build_append_int_noprefix(table_data, hmat_cache->proximity, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+    /* Memory Side Cache Size */
+    build_append_int_noprefix(table_data, hmat_cache->size, 8);
+    /* Cache Attributes */
+    build_append_int_noprefix(table_data, cache_attr, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /*
+     * Number of SMBIOS handles (n)
+     * Linux kernel uses Memory Side Cache Information Structure
+     * without SMBIOS entries for now, so set Number of SMBIOS handles
+     * as 0.
+     */
+    build_append_int_noprefix(table_data, 0, 2);
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
 {
     uint16_t flags;
     uint32_t num_initiator = 0;
     uint32_t initiator_list[MAX_NODES];
-    int i, hierarchy, type;
+    int i, hierarchy, type, cache_level, total_levels;
     HMAT_LB_Info *hmat_lb;
+    HMAT_Cache_Info *hmat_cache;
 
     for (i = 0; i < numa_state->num_nodes; i++) {
         flags = 0;
@@ -176,6 +225,27 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
             }
         }
     }
+
+    /*
+     * ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure:
+     * Table 5-147
+     */
+    for (i = 0; i < numa_state->num_nodes; i++) {
+        total_levels = 0;
+        for (cache_level = 1; cache_level <= MAX_HMAT_CACHE_LEVEL;
+             cache_level++) {
+            if (numa_state->hmat_cache[i][cache_level]) {
+                total_levels++;
+            }
+        }
+        for (cache_level = 0; cache_level <= total_levels; cache_level++) {
+            hmat_cache = numa_state->hmat_cache[i][cache_level];
+            if (hmat_cache) {
+                hmat_cache->total_levels = total_levels;
+                build_hmat_cache(table_data, hmat_cache);
+            }
+        }
+    }
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v13 12/12] tests/bios-tables-test: add test cases for ACPI HMAT
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (10 preceding siblings ...)
  2019-10-20 11:11 ` [PATCH v13 11/12] hmat acpi: Build Memory Side Cache " Tao Xu
@ 2019-10-20 11:11 ` Tao Xu
  2019-10-20 11:43 ` [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-20 11:11 UTC (permalink / raw)
  To: imammedo, eblake, ehabkost
  Cc: Jingqi Liu, tao3.xu, fan.du, qemu-devel, Daniel Black, jonathan.cameron

ACPI table HMAT has been introduced, QEMU now builds HMAT tables for
Heterogeneous Memory with boot option '-numa node'.

Add test cases on PC and Q35 machines with 2 numa nodes.
Because HMAT is generated when system enable numa, the
following tables need to be added for this test:
  tests/acpi-test-data/pc/*.acpihmat
  tests/acpi-test-data/pc/HMAT.*
  tests/acpi-test-data/q35/*.acpihmat
  tests/acpi-test-data/q35/HMAT.*

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Daniel Black <daniel@linux.ibm.com>
Reviewed-by: Jingqi Liu <Jingqi.liu@intel.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Tao Xu <tao3.xu@intel.com>
---

Changes in v13:
    - Use decimal notation with appropriate suffix for cache size
---
 tests/bios-tables-test.c | 44 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 0b33fb265f..96803c1f20 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -947,6 +947,48 @@ static void test_acpi_virt_tcg_numamem(void)
 
 }
 
+static void test_acpi_tcg_acpi_hmat(const char *machine)
+{
+    test_data data;
+
+    memset(&data, 0, sizeof(data));
+    data.machine = machine;
+    data.variant = ".acpihmat";
+    test_acpi_one(" -machine hmat=on"
+                  " -smp 2,sockets=2"
+                  " -m 128M,slots=2,maxmem=1G"
+                  " -object memory-backend-ram,size=64M,id=m0"
+                  " -object memory-backend-ram,size=64M,id=m1"
+                  " -numa node,nodeid=0,memdev=m0"
+                  " -numa node,nodeid=1,memdev=m1,initiator=0"
+                  " -numa cpu,node-id=0,socket-id=0"
+                  " -numa cpu,node-id=0,socket-id=1"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-latency,latency=5ns"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=500M"
+                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+                  "data-type=access-latency,latency=10ns"
+                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=100M"
+                  " -numa hmat-cache,node-id=0,size=10K,level=1,assoc=direct,"
+                  "policy=write-back,line=8"
+                  " -numa hmat-cache,node-id=1,size=10K,level=1,assoc=direct,"
+                  "policy=write-back,line=8",
+                  &data);
+    free_test_data(&data);
+}
+
+static void test_acpi_q35_tcg_acpi_hmat(void)
+{
+    test_acpi_tcg_acpi_hmat(MACHINE_Q35);
+}
+
+static void test_acpi_piix4_tcg_acpi_hmat(void)
+{
+    test_acpi_tcg_acpi_hmat(MACHINE_PC);
+}
+
 static void test_acpi_virt_tcg(void)
 {
     test_data data = {
@@ -991,6 +1033,8 @@ int main(int argc, char *argv[])
         qtest_add_func("acpi/q35/numamem", test_acpi_q35_tcg_numamem);
         qtest_add_func("acpi/piix4/dimmpxm", test_acpi_piix4_tcg_dimm_pxm);
         qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm);
+        qtest_add_func("acpi/piix4/acpihmat", test_acpi_piix4_tcg_acpi_hmat);
+        qtest_add_func("acpi/q35/acpihmat", test_acpi_q35_tcg_acpi_hmat);
     } else if (strcmp(arch, "aarch64") == 0) {
         qtest_add_func("acpi/virt", test_acpi_virt_tcg);
         qtest_add_func("acpi/virt/numamem", test_acpi_virt_tcg_numamem);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (11 preceding siblings ...)
  2019-10-20 11:11 ` [PATCH v13 12/12] tests/bios-tables-test: add test cases for ACPI HMAT Tao Xu
@ 2019-10-20 11:43 ` no-reply
  2019-10-20 12:13 ` no-reply
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: no-reply @ 2019-10-20 11:43 UTC (permalink / raw)
  To: tao3.xu
  Cc: ehabkost, jingqi.liu, tao3.xu, fan.du, qemu-devel,
	jonathan.cameron, imammedo

Patchew URL: https://patchew.org/QEMU/20191020111125.27659-1-tao3.xu@intel.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

Looking for expected file 'tests/data/acpi/pc/SRAT.acpihmat'
Looking for expected file 'tests/data/acpi/pc/SRAT'
**
ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:354:load_expected_aml: assertion failed: (exp_sdt.aml_file)
ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:354:load_expected_aml: assertion failed: (exp_sdt.aml_file)
make: *** [check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs....
  TEST    iotest-qcow2: 025
  TEST    iotest-qcow2: 027
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=d317dcfdadb14357ab4e6e456e466c28', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-jvsh36ab/src/docker-src.2019-10-20-07.31.22.24116:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=d317dcfdadb14357ab4e6e456e466c28
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-jvsh36ab/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    12m1.295s
user    0m8.390s


The full log is available at
http://patchew.org/logs/20191020111125.27659-1-tao3.xu@intel.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (12 preceding siblings ...)
  2019-10-20 11:43 ` [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
@ 2019-10-20 12:13 ` no-reply
  2019-10-20 12:57 ` no-reply
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 38+ messages in thread
From: no-reply @ 2019-10-20 12:13 UTC (permalink / raw)
  To: tao3.xu
  Cc: ehabkost, jingqi.liu, tao3.xu, fan.du, qemu-devel,
	jonathan.cameron, imammedo

Patchew URL: https://patchew.org/QEMU/20191020111125.27659-1-tao3.xu@intel.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

Looking for expected file 'tests/data/acpi/pc/SRAT.acpihmat'
Looking for expected file 'tests/data/acpi/pc/SRAT'
**
ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:354:load_expected_aml: assertion failed: (exp_sdt.aml_file)
ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:354:load_expected_aml: assertion failed: (exp_sdt.aml_file)
make: *** [check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs....
  TEST    iotest-qcow2: 036
  TEST    iotest-qcow2: 037
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=1a622dbdc82341f49970cd92203d9bcd', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-6i8n4o9m/src/docker-src.2019-10-20-08.03.22.16115:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=1a622dbdc82341f49970cd92203d9bcd
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-6i8n4o9m/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    9m56.614s
user    0m6.591s


The full log is available at
http://patchew.org/logs/20191020111125.27659-1-tao3.xu@intel.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (13 preceding siblings ...)
  2019-10-20 12:13 ` no-reply
@ 2019-10-20 12:57 ` no-reply
  2019-10-20 13:19 ` no-reply
  2019-10-22 11:22 ` Markus Armbruster
  16 siblings, 0 replies; 38+ messages in thread
From: no-reply @ 2019-10-20 12:57 UTC (permalink / raw)
  To: tao3.xu
  Cc: ehabkost, jingqi.liu, tao3.xu, fan.du, qemu-devel,
	jonathan.cameron, imammedo

Patchew URL: https://patchew.org/QEMU/20191020111125.27659-1-tao3.xu@intel.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

Looking for expected file 'tests/data/acpi/pc/SRAT.acpihmat'
Looking for expected file 'tests/data/acpi/pc/SRAT'
**
ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:354:load_expected_aml: assertion failed: (exp_sdt.aml_file)
ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:354:load_expected_aml: assertion failed: (exp_sdt.aml_file)
make: *** [check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs....
  TEST    iotest-qcow2: 035
  TEST    check-unit: tests/test-bufferiszero
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=333528b3def44bedb7868163d167de8a', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-83x7cpec/src/docker-src.2019-10-20-08.46.30.16537:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=333528b3def44bedb7868163d167de8a
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-83x7cpec/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    10m40.886s
user    0m8.703s


The full log is available at
http://patchew.org/logs/20191020111125.27659-1-tao3.xu@intel.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (14 preceding siblings ...)
  2019-10-20 12:57 ` no-reply
@ 2019-10-20 13:19 ` no-reply
  2019-10-22 11:22 ` Markus Armbruster
  16 siblings, 0 replies; 38+ messages in thread
From: no-reply @ 2019-10-20 13:19 UTC (permalink / raw)
  To: tao3.xu
  Cc: ehabkost, jingqi.liu, tao3.xu, fan.du, qemu-devel,
	jonathan.cameron, imammedo

Patchew URL: https://patchew.org/QEMU/20191020111125.27659-1-tao3.xu@intel.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

Looking for expected file 'tests/data/acpi/pc/SRAT.acpihmat'
Looking for expected file 'tests/data/acpi/pc/SRAT'
**
ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:354:load_expected_aml: assertion failed: (exp_sdt.aml_file)
ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/bios-tables-test.c:354:load_expected_aml: assertion failed: (exp_sdt.aml_file)
make: *** [check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs....
  TEST    iotest-qcow2: 035
  TEST    iotest-qcow2: 036
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=086575f112be4f06b5b6628ea1ada77d', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-ug9vqabf/src/docker-src.2019-10-20-09.09.36.21970:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=086575f112be4f06b5b6628ea1ada77d
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-ug9vqabf/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    10m16.781s
user    0m8.561s


The full log is available at
http://patchew.org/logs/20191020111125.27659-1-tao3.xu@intel.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 05/12] numa: Extend CLI to provide initiator information for numa nodes
  2019-10-20 11:11 ` [PATCH v13 05/12] numa: Extend CLI to provide initiator information for numa nodes Tao Xu
@ 2019-10-21 12:29   ` Igor Mammedov
  2019-10-22  1:01     ` Tao Xu
  0 siblings, 1 reply; 38+ messages in thread
From: Igor Mammedov @ 2019-10-21 12:29 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, jonathan.cameron, Dan Williams

On Sun, 20 Oct 2019 19:11:18 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
> The initiator represents processor which access to memory. And in 5.2.27.3
> Memory Proximity Domain Attributes Structure, the attached initiator is
> defined as where the memory controller responsible for a memory proximity
> domain. With attached initiator information, the topology of heterogeneous
> memory can be described.
> 
> Extend CLI of "-numa node" option to indicate the initiator numa node-id.
> In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
> the platform's HMAT tables.
> 
> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
> Suggested-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> Changes in v13:
>     - Modify some text description
>     - Drop "initiator_valid" field in struct NodeInfo (Igor)
> 
> Changes in v12:
>     - Fix the bug that a memory-only node without initiator setting
>       doesn't report error. (reported by Danmei Wei)
> ---
>  hw/core/machine.c     | 70 +++++++++++++++++++++++++++++++++++++++++++
>  hw/core/numa.c        | 23 ++++++++++++++
>  include/sysemu/numa.h |  5 ++++
>  qapi/machine.json     | 10 ++++++-
>  qemu-options.hx       | 35 ++++++++++++++++++----
>  5 files changed, 137 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 1689ad3bf8..32a451bd84 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -518,6 +518,20 @@ static void machine_set_nvdimm(Object *obj, bool value, Error **errp)
>      ms->nvdimms_state->is_enabled = value;
>  }
>  
> +static bool machine_get_hmat(Object *obj, Error **errp)
> +{
> +    MachineState *ms = MACHINE(obj);
> +
> +    return ms->numa_state->hmat_enabled;
> +}
> +
> +static void machine_set_hmat(Object *obj, bool value, Error **errp)
> +{
> +    MachineState *ms = MACHINE(obj);
> +
> +    ms->numa_state->hmat_enabled = value;
> +}
> +
>  static char *machine_get_nvdimm_persistence(Object *obj, Error **errp)
>  {
>      MachineState *ms = MACHINE(obj);
> @@ -645,6 +659,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>                                 const CpuInstanceProperties *props, Error **errp)
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(machine);
> +    NodeInfo *numa_info = machine->numa_state->nodes;
>      bool match = false;
>      int i;
>  
> @@ -714,6 +729,17 @@ void machine_set_cpu_numa_node(MachineState *machine,
>          match = true;
>          slot->props.node_id = props->node_id;
>          slot->props.has_node_id = props->has_node_id;
> +
> +        if (machine->numa_state->hmat_enabled) {
> +            if ((numa_info[props->node_id].initiator < MAX_NODES) &&
> +                (props->node_id != numa_info[props->node_id].initiator)) {
> +                error_setg(errp, "The initiator of CPU NUMA node %" PRId64
> +                        " should be itself.", props->node_id);
> +                return;
> +            }
> +            numa_info[props->node_id].has_cpu = true;
> +            numa_info[props->node_id].initiator = props->node_id;
> +        }
>      }
>  
>      if (!match) {
> @@ -960,6 +986,13 @@ static void machine_initfn(Object *obj)
>  
>      if (mc->numa_mem_supported) {
>          ms->numa_state = g_new0(NumaState, 1);
> +        object_property_add_bool(obj, "hmat",
> +                                 machine_get_hmat, machine_set_hmat,
> +                                 &error_abort);
> +        object_property_set_description(obj, "hmat",
> +                                        "Set on/off to enable/disable "
> +                                        "ACPI Heterogeneous Memory Attribute "
> +                                        "Table (HMAT)", NULL);
>      }
>  
>      /* Register notifier when init is done for sysbus sanity checks */
> @@ -1048,6 +1081,38 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
>      return g_string_free(s, false);
>  }
>  
> +static void numa_validate_initiator(NumaState *numa_state)
> +{
> +    int i;
> +    NodeInfo *numa_info = numa_state->nodes;
> +
> +    for (i = 0; i < numa_state->num_nodes; i++) {
> +        if (numa_info[i].initiator == MAX_NODES) {
> +            error_report("The initiator of NUMA node %d is missing, use "
> +                         "'-numa node,initiator' option to declare it.", i);
> +            goto err;
> +        }
> +
> +        if (!numa_info[numa_info[i].initiator].present) {
> +            error_report("NUMA node %" PRIu16 " is missing, use "
> +                         "'-numa node' option to declare it first.",
> +                         numa_info[i].initiator);
> +            goto err;
> +        }
> +
> +        if (!numa_info[numa_info[i].initiator].has_cpu) {
> +            error_report("The initiator of NUMA node %d is invalid.", i);
> +            goto err;
> +        }
> +    }
> +
> +    return;
> +
> +err:
> +    error_printf("\n");
error_report() does ^^^, drop it

with this fixed,

Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> +    exit(1);
> +}
> +
>  static void machine_numa_finish_cpu_init(MachineState *machine)
>  {
>      int i;
> @@ -1088,6 +1153,11 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
>              machine_set_cpu_numa_node(machine, &props, &error_fatal);
>          }
>      }
> +
> +    if (machine->numa_state->hmat_enabled) {
> +        numa_validate_initiator(machine->numa_state);
> +    }
> +
>      if (s->len && !qtest_enabled()) {
>          warn_report("CPU(s) not present in any NUMA nodes: %s",
>                      s->str);
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index 038c96d4ab..eba66ab768 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -133,6 +133,29 @@ static void parse_numa_node(MachineState *ms, NumaNodeOptions *node,
>          numa_info[nodenr].node_mem = object_property_get_uint(o, "size", NULL);
>          numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
>      }
> +
> +    /*
> +     * If not set the initiator, set it to MAX_NODES. And if
> +     * HMAT is enabled and this node has no cpus, QEMU will raise error.
> +     */
> +    numa_info[nodenr].initiator = MAX_NODES;
> +    if (node->has_initiator) {
> +        if (!ms->numa_state->hmat_enabled) {
> +            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
> +                       "(HMAT) is disabled, enable it with -machine hmat=on "
> +                       "before using any of hmat specific options.");
> +            return;
> +        }
> +
> +        if (node->initiator >= MAX_NODES) {
> +            error_report("The initiator id %" PRIu16 " expects an integer "
> +                         "between 0 and %d", node->initiator,
> +                         MAX_NODES - 1);
> +            return;
> +        }
> +
> +        numa_info[nodenr].initiator = node->initiator;
> +    }
>      numa_info[nodenr].present = true;
>      max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
>      ms->numa_state->num_nodes++;
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index ae9c41d02b..788cbec7a2 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -18,6 +18,8 @@ struct NodeInfo {
>      uint64_t node_mem;
>      struct HostMemoryBackend *node_memdev;
>      bool present;
> +    bool has_cpu;
> +    uint16_t initiator;
>      uint8_t distance[MAX_NODES];
>  };
>  
> @@ -33,6 +35,9 @@ struct NumaState {
>      /* Allow setting NUMA distance for different NUMA nodes */
>      bool have_numa_distance;
>  
> +    /* Detect if HMAT support is enabled. */
> +    bool hmat_enabled;
> +
>      /* NUMA nodes information */
>      NodeInfo nodes[MAX_NODES];
>  };
> diff --git a/qapi/machine.json b/qapi/machine.json
> index ca26779f1a..f1b07b3486 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -463,6 +463,13 @@
>  # @memdev: memory backend object.  If specified for one node,
>  #          it must be specified for all nodes.
>  #
> +# @initiator: defined in ACPI 6.3 Chapter 5.2.27.3 Table 5-145,
> +#             points to the nodeid which has the memory controller
> +#             responsible for this NUMA node. This field provides
> +#             additional information as to the initiator node that
> +#             is closest (as in directly attached) to this node, and
> +#             therefore has the best performance (since 4.2)
> +#
>  # Since: 2.1
>  ##
>  { 'struct': 'NumaNodeOptions',
> @@ -470,7 +477,8 @@
>     '*nodeid': 'uint16',
>     '*cpus':   ['uint16'],
>     '*mem':    'size',
> -   '*memdev': 'str' }}
> +   '*memdev': 'str',
> +   '*initiator': 'uint16' }}
>  
>  ##
>  # @NumaDistOptions:
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 996b6fba74..1f96399521 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -43,7 +43,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>      "                suppress-vmdesc=on|off disables self-describing migration (default=off)\n"
>      "                nvdimm=on|off controls NVDIMM support (default=off)\n"
>      "                enforce-config-section=on|off enforce configuration section migration (default=off)\n"
> -    "                memory-encryption=@var{} memory encryption object to use (default=none)\n",
> +    "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
> +    "                hmat=on|off controls ACPI HMAT support (default=off)\n",
>      QEMU_ARCH_ALL)
>  STEXI
>  @item -machine [type=]@var{name}[,prop=@var{value}[,...]]
> @@ -103,6 +104,9 @@ NOTE: this parameter is deprecated. Please use @option{-global}
>  @option{migration.send-configuration}=@var{on|off} instead.
>  @item memory-encryption=@var{}
>  Memory encryption object to use. The default is none.
> +@item hmat=on|off
> +Enables or disables ACPI Heterogeneous Memory Attribute Table (HMAT) support.
> +The default is off.
>  @end table
>  ETEXI
>  
> @@ -161,14 +165,14 @@ If any on the three values is given, the total number of CPUs @var{n} can be omi
>  ETEXI
>  
>  DEF("numa", HAS_ARG, QEMU_OPTION_numa,
> -    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> -    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node]\n"
> +    "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
> +    "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
>      "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
>      QEMU_ARCH_ALL)
>  STEXI
> -@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> -@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}]
> +@item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> +@itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>  @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>  @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>  @findex -numa
> @@ -215,6 +219,27 @@ split equally between them.
>  @samp{mem} and @samp{memdev} are mutually exclusive. Furthermore,
>  if one node uses @samp{memdev}, all of them have to use it.
>  
> +@samp{initiator} is an additional option that points to an @var{initiator}
> +NUMA node that has best performance (the lowest latency or largest bandwidth)
> +to this NUMA @var{node}. Note that this option can be set only when
> +the machine property 'hmat' is set to 'on'.
> +
> +Following example creates a machine with 2 NUMA nodes, node 0 has CPU.
> +node 1 has only memory, and its initiator is node 0. Note that because
> +node 0 has CPU, by default the initiator of node 0 is itself and must be
> +itself.
> +@example
> +-machine hmat=on \
> +-m 2G,slots=2,maxmem=4G \
> +-object memory-backend-ram,size=1G,id=m0 \
> +-object memory-backend-ram,size=1G,id=m1 \
> +-numa node,nodeid=0,memdev=m0 \
> +-numa node,nodeid=1,memdev=m1,initiator=0 \
> +-smp 2,sockets=2,maxcpus=2  \
> +-numa cpu,node-id=0,socket-id=0 \
> +-numa cpu,node-id=0,socket-id=1
> +@end example
> +
>  @var{source} and @var{destination} are NUMA node IDs.
>  @var{distance} is the NUMA distance from @var{source} to @var{destination}.
>  The distance from a node to itself is always 10. If any pair of nodes is



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 05/12] numa: Extend CLI to provide initiator information for numa nodes
  2019-10-21 12:29   ` Igor Mammedov
@ 2019-10-22  1:01     ` Tao Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-22  1:01 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron,
	Williams, Dan J

On 10/21/2019 8:29 PM, Igor Mammedov wrote:
> On Sun, 20 Oct 2019 19:11:18 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
>> The initiator represents processor which access to memory. And in 5.2.27.3
>> Memory Proximity Domain Attributes Structure, the attached initiator is
>> defined as where the memory controller responsible for a memory proximity
>> domain. With attached initiator information, the topology of heterogeneous
>> memory can be described.
>>
>> Extend CLI of "-numa node" option to indicate the initiator numa node-id.
>> In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
>> the platform's HMAT tables.
>>
>> Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
>> Suggested-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
[...]
>> +static void numa_validate_initiator(NumaState *numa_state)
>> +{
>> +    int i;
>> +    NodeInfo *numa_info = numa_state->nodes;
>> +
>> +    for (i = 0; i < numa_state->num_nodes; i++) {
>> +        if (numa_info[i].initiator == MAX_NODES) {
>> +            error_report("The initiator of NUMA node %d is missing, use "
>> +                         "'-numa node,initiator' option to declare it.", i);
>> +            goto err;
>> +        }
>> +
>> +        if (!numa_info[numa_info[i].initiator].present) {
>> +            error_report("NUMA node %" PRIu16 " is missing, use "
>> +                         "'-numa node' option to declare it first.",
>> +                         numa_info[i].initiator);
>> +            goto err;
>> +        }
>> +
>> +        if (!numa_info[numa_info[i].initiator].has_cpu) {
>> +            error_report("The initiator of NUMA node %d is invalid.", i);
>> +            goto err;
>> +        }
>> +    }
>> +
>> +    return;
>> +
>> +err:
>> +    error_printf("\n");
> error_report() does ^^^, drop it

OK, I will drop it.
> 
> with this fixed,
> 
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> 
[...]


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-20 11:11 ` [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information Tao Xu
@ 2019-10-22  7:08   ` Igor Mammedov
  2019-10-22  8:22     ` Tao Xu
  2019-10-23 15:28   ` Igor Mammedov
  1 sibling, 1 reply; 38+ messages in thread
From: Igor Mammedov @ 2019-10-22  7:08 UTC (permalink / raw)
  To: Tao Xu; +Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, jonathan.cameron

On Sun, 20 Oct 2019 19:11:19 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> Add -numa hmat-lb option to provide System Locality Latency and
> Bandwidth Information. These memory attributes help to build
> System Locality Latency and Bandwidth Information Structure(s)
> in ACPI Heterogeneous Memory Attribute Table (HMAT).
> 
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> Changes in v13:
>     - Reuse Garray to store the raw bandwidth and bandwidth data
>     - Calculate common base unit using range bitmap (Igor)
> ---
>  hw/core/numa.c        | 127 ++++++++++++++++++++++++++++++++++++++++++
>  include/sysemu/numa.h |  68 ++++++++++++++++++++++
>  qapi/machine.json     |  95 ++++++++++++++++++++++++++++++-
>  qemu-options.hx       |  49 +++++++++++++++-
>  4 files changed, 336 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index eba66ab768..3cf77f6ac9 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -23,6 +23,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/units.h"
>  #include "sysemu/hostmem.h"
>  #include "sysemu/numa.h"
>  #include "sysemu/sysemu.h"
> @@ -198,6 +199,119 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
>      ms->numa_state->have_numa_distance = true;
>  }
>  
> +void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
> +                        Error **errp)
> +{
> +    int first_bit, last_bit;
> +    uint64_t temp_latency;
> +    NodeInfo *numa_info = numa_state->nodes;
> +    HMAT_LB_Info *hmat_lb =
> +        numa_state->hmat_lb[node->hierarchy][node->data_type];
> +    HMAT_LB_Data lb_data;
> +
> +    /* Error checking */
> +    if (node->initiator >= numa_state->num_nodes) {
> +        error_setg(errp, "Invalid initiator=%d, it should be less than %d.",
> +                   node->initiator, numa_state->num_nodes);
> +        return;
> +    }
> +    if (node->target >= numa_state->num_nodes) {
> +        error_setg(errp, "Invalid target=%d, it should be less than %d.",
> +                   node->target, numa_state->num_nodes);
> +        return;
> +    }
> +    if (!numa_info[node->initiator].has_cpu) {
> +        error_setg(errp, "Invalid initiator=%d, it isn't an "
> +                   "initiator proximity domain.", node->initiator);
> +        return;
> +    }
> +    if (!numa_info[node->target].present) {
> +        error_setg(errp, "Invalid target=%d, it hasn't a valid NUMA node.",
> +                   node->target);
> +        return;
> +    }
> +
> +    if (!hmat_lb) {
> +        hmat_lb = g_malloc0(sizeof(*hmat_lb));
> +        numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
> +        hmat_lb->latency = g_array_new(false, true, sizeof(HMAT_LB_Data));
> +        hmat_lb->bandwidth = g_array_new(false, true, sizeof(HMAT_LB_Data));
> +    }
> +    hmat_lb->hierarchy = node->hierarchy;
> +    hmat_lb->data_type = node->data_type;
> +    lb_data.initiator = node->initiator;
> +    lb_data.target = node->target;
> +
> +    /* Input latency data */
> +    if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
> +        if (!node->has_latency) {
> +            error_setg(errp, "Missing 'latency' option.");
> +            return;
> +        }
> +        if (node->has_bandwidth) {
> +            error_setg(errp, "Invalid option 'bandwidth' since "
> +                       "the data type is latency.");
> +            return;
> +        }
> +
> +        temp_latency = node->latency;
> +        hmat_lb->base_latency = 1;
> +        while (QEMU_IS_ALIGNED(temp_latency, 10)) {
> +            temp_latency /= 10;
> +            hmat_lb->base_latency *= 10;
> +        }
> +
> +        if (temp_latency >= UINT64_MAX) {
                            ^  ^^^^ doesn't make sense

can't you use range bitmap here as well?

> +            error_setg(errp, "Latency %" PRIu64 " between initiator=%d and "
> +                       "target=%d should not differ from previously entered "
> +                       "values on more than %d.", node->latency,
> +                       node->initiator, node->target, UINT16_MAX - 1);
> +            return;
> +        }
> +        if (temp_latency > hmat_lb->range_left_la) {
> +            hmat_lb->range_left_la = temp_latency;
> +        }
> +
> +        lb_data.rawdata = node->latency;
> +        g_array_append_val(hmat_lb->latency, lb_data);
> +    }
> +
> +    /* Input bandwidth data */
> +    if (node->data_type >= HMATLB_DATA_TYPE_ACCESS_BANDWIDTH) {
> +        if (!node->has_bandwidth) {
> +            error_setg(errp, "Missing 'bandwidth' option.");
> +            return;
> +        }
> +        if (node->has_latency) {
> +            error_setg(errp, "Invalid option 'latency' since "
> +                       "the data type is bandwidth.");
> +            return;
> +        }
> +        if (!QEMU_IS_ALIGNED(node->bandwidth, MiB)) {
> +            error_setg(errp, "Bandwidth %" PRIu64 " between initiator=%d and "
> +                       "target=%d should be 1MB aligned.", node->bandwidth,
> +                       node->initiator, node->target);
> +            return;
> +        }
> +
> +        hmat_lb->range_bitmap_bw |= node->bandwidth;
> +
> +        first_bit = __builtin_ffs(hmat_lb->range_bitmap_bw);
> +        last_bit = __builtin_ctz(hmat_lb->range_bitmap_bw);
> +        if ((last_bit - first_bit) > UINT16_BITS) {
> +            error_setg(errp, "Bandwidth %" PRIu64 " between initiator=%d and "
> +                       "target=%d should not differ from previously entered "
> +                       "values on more than %d.", node->bandwidth,
> +                       node->initiator, node->target, UINT16_MAX - 1);
> +            return;
> +        }
> +
> +        hmat_lb->base_bandwidth = UINT64_C(1) << (first_bit - 1);
> +        lb_data.rawdata = node->bandwidth;
> +        g_array_append_val(hmat_lb->bandwidth, lb_data);
> +    }
> +}
> +
>  void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>  {
>      Error *err = NULL;
> @@ -236,6 +350,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
>          machine_set_cpu_numa_node(ms, qapi_NumaCpuOptions_base(&object->u.cpu),
>                                    &err);
>          break;
> +    case NUMA_OPTIONS_TYPE_HMAT_LB:
> +        if (!ms->numa_state->hmat_enabled) {
> +            error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
> +                       "(HMAT) is disabled, enable it with -machine hmat=on "
> +                       "before using any of hmat specific options.");
> +            return;
> +        }
> +
> +        parse_numa_hmat_lb(ms->numa_state, &object->u.hmat_lb, &err);
> +        if (err) {
> +            goto end;
> +        }
> +        break;
>      default:
>          abort();
>      }
> diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
> index 788cbec7a2..b45afcb29e 100644
> --- a/include/sysemu/numa.h
> +++ b/include/sysemu/numa.h
> @@ -14,6 +14,29 @@ struct CPUArchId;
>  #define NUMA_DISTANCE_MAX         254
>  #define NUMA_DISTANCE_UNREACHABLE 255
>  
> +/* the value of AcpiHmatLBInfo flags */
> +enum {
> +    HMAT_LB_MEM_MEMORY           = 0,
> +    HMAT_LB_MEM_CACHE_1ST_LEVEL  = 1,
> +    HMAT_LB_MEM_CACHE_2ND_LEVEL  = 2,
> +    HMAT_LB_MEM_CACHE_3RD_LEVEL  = 3,
> +};
> +
> +/* the value of AcpiHmatLBInfo data type */
> +enum {
> +    HMAT_LB_DATA_ACCESS_LATENCY   = 0,
> +    HMAT_LB_DATA_READ_LATENCY     = 1,
> +    HMAT_LB_DATA_WRITE_LATENCY    = 2,
> +    HMAT_LB_DATA_ACCESS_BANDWIDTH = 3,
> +    HMAT_LB_DATA_READ_BANDWIDTH   = 4,
> +    HMAT_LB_DATA_WRITE_BANDWIDTH  = 5,
> +};
> +
> +#define UINT16_BITS       16
> +
> +#define HMAT_LB_LEVELS    (HMAT_LB_MEM_CACHE_3RD_LEVEL + 1)
> +#define HMAT_LB_TYPES     (HMAT_LB_DATA_WRITE_BANDWIDTH + 1)
> +
>  struct NodeInfo {
>      uint64_t node_mem;
>      struct HostMemoryBackend *node_memdev;
> @@ -28,6 +51,46 @@ struct NumaNodeMem {
>      uint64_t node_plugged_mem;
>  };
>  
> +struct HMAT_LB_Data {
> +    uint8_t     initiator;
> +    uint8_t     target;
> +    uint64_t    rawdata;
> +};
> +typedef struct HMAT_LB_Data HMAT_LB_Data;
> +
> +struct HMAT_LB_Info {
> +    /* Indicates it's memory or the specified level memory side cache. */
> +    uint8_t     hierarchy;
> +
> +    /* Present the type of data, access/read/write latency or bandwidth. */
> +    uint8_t     data_type;
> +
> +    /* The left range of latency for calculating common latency base */
> +    uint64_t    range_left_la;
> +
> +    /* The range bitmap of bandwidth for calculating common bandwidth base */
> +    uint64_t    range_bitmap_bw;
> +
> +    /* The common base unit for latencies */
> +    uint64_t    base_latency;
> +
> +    /* The common base unit for bandwidths */
> +    uint64_t    base_bandwidth;
> +
> +    /* Array to store the compressed latencies */
> +    uint16_t    *entry_latency;
> +
> +    /* Array to store the compressed latencies */
> +    uint16_t    *entry_bandwidth;
> +
> +    /* Array to store the latencies */
> +    GArray      *latency;
> +
> +    /* Array to store the bandwidthes */
> +    GArray      *bandwidth;
> +};
> +typedef struct HMAT_LB_Info HMAT_LB_Info;
> +
>  struct NumaState {
>      /* Number of NUMA nodes */
>      int num_nodes;
> @@ -40,11 +103,16 @@ struct NumaState {
>  
>      /* NUMA nodes information */
>      NodeInfo nodes[MAX_NODES];
> +
> +    /* NUMA nodes HMAT Locality Latency and Bandwidth Information */
> +    HMAT_LB_Info *hmat_lb[HMAT_LB_LEVELS][HMAT_LB_TYPES];
>  };
>  typedef struct NumaState NumaState;
>  
>  void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp);
>  void parse_numa_opts(MachineState *ms);
> +void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
> +                        Error **errp);
>  void numa_complete_configuration(MachineState *ms);
>  void query_numa_node_mem(NumaNodeMem node_mem[], MachineState *ms);
>  extern QemuOptsList qemu_numa_opts;
> diff --git a/qapi/machine.json b/qapi/machine.json
> index f1b07b3486..9ca008810b 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -426,10 +426,12 @@
>  #
>  # @cpu: property based CPU(s) to node mapping (Since: 2.10)
>  #
> +# @hmat-lb: memory latency and bandwidth information (Since: 4.2)
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'NumaOptionsType',
> -  'data': [ 'node', 'dist', 'cpu' ] }
> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
>  
>  ##
>  # @NumaOptions:
> @@ -444,7 +446,8 @@
>    'data': {
>      'node': 'NumaNodeOptions',
>      'dist': 'NumaDistOptions',
> -    'cpu': 'NumaCpuOptions' }}
> +    'cpu': 'NumaCpuOptions',
> +    'hmat-lb': 'NumaHmatLBOptions' }}
>  
>  ##
>  # @NumaNodeOptions:
> @@ -557,6 +560,94 @@
>     'base': 'CpuInstanceProperties',
>     'data' : {} }
>  
> +##
> +# @HmatLBMemoryHierarchy:
> +#
> +# The memory hierarchy in the System Locality Latency
> +# and Bandwidth Information Structure of HMAT (Heterogeneous
> +# Memory Attribute Table)
> +#
> +# For more information of @HmatLBMemoryHierarchy see
> +# the chapter 5.2.27.4: Table 5-142: Field "Flags" of ACPI 6.3 spec.
> +#
> +# @memory: the structure represents the memory performance
> +#
> +# @first-level: first level memory of memory side cached memory
> +#
> +# @second-level: second level memory of memory side cached memory
> +#
> +# @third-level: third level memory of memory side cached memory
> +#
> +# Since: 4.2
> +##
> +{ 'enum': 'HmatLBMemoryHierarchy',
> +  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
> +
> +##
> +# @HmatLBDataType:
> +#
> +# Data type in the System Locality Latency
> +# and Bandwidth Information Structure of HMAT (Heterogeneous
> +# Memory Attribute Table)
> +#
> +# For more information of @HmatLBDataType see
> +# the chapter 5.2.27.4: Table 5-142:  Field "Data Type" of ACPI 6.3 spec.
> +#
> +# @access-latency: access latency (nanoseconds)
> +#
> +# @read-latency: read latency (nanoseconds)
> +#
> +# @write-latency: write latency (nanoseconds)
> +#
> +# @access-bandwidth: access bandwidth (MB/s)
> +#
> +# @read-bandwidth: read bandwidth (MB/s)
> +#
> +# @write-bandwidth: write bandwidth (MB/s)
> +#
> +# Since: 4.2
> +##
> +{ 'enum': 'HmatLBDataType',
> +  'data': [ 'access-latency', 'read-latency', 'write-latency',
> +            'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
> +
> +##
> +# @NumaHmatLBOptions:
> +#
> +# Set the system locality latency and bandwidth information
> +# between Initiator and Target proximity Domains.
> +#
> +# For more information of @NumaHmatLBOptions see
> +# the chapter 5.2.27.4: Table 5-142 of ACPI 6.3 spec.
> +#
> +# @initiator: the Initiator Proximity Domain.
> +#
> +# @target: the Target Proximity Domain.
> +#
> +# @hierarchy: the Memory Hierarchy. Indicates the performance
> +#             of memory or side cache.
> +#
> +# @data-type: presents the type of data, access/read/write
> +#             latency or hit latency.
> +#
> +# @latency: the value of latency from @initiator to @target proximity domain,
> +#           the latency units are "ps(picosecond)", "ns(nanosecond)" or
> +#           "us(microsecond)".
> +#
> +# @bandwidth: the value of bandwidth between @initiator and @target proximity
> +#             domain, the bandwidth units are "MB(/s)","GB(/s)" or "TB(/s)".
> +#
> +# Since: 4.2
> +##
> +{ 'struct': 'NumaHmatLBOptions',
> +    'data': {
> +    'initiator': 'uint16',
> +    'target': 'uint16',
> +    'hierarchy': 'HmatLBMemoryHierarchy',
> +    'data-type': 'HmatLBDataType',
> +    '*latency': 'time',
> +    '*bandwidth': 'size' }}
> +
>  ##
>  # @HostMemPolicy:
>  #
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 1f96399521..de97939f9a 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -168,16 +168,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
> -    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
> +    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
> +    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
>      QEMU_ARCH_ALL)
>  STEXI
>  @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>  @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>  @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>  @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
> +@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
>  @findex -numa
>  Define a NUMA node and assign RAM and VCPUs to it.
>  Set the NUMA distance from a source node to a destination node.
> +Set the ACPI Heterogeneous Memory Attributes for the given nodes.
>  
>  Legacy VCPU assignment uses @samp{cpus} option where
>  @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
> @@ -256,6 +259,50 @@ specified resources, it just assigns existing resources to NUMA
>  nodes. This means that one still has to use the @option{-m},
>  @option{-smp} options to allocate RAM and VCPUs respectively.
>  
> +Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
> +between initiator and target NUMA nodes in ACPI Heterogeneous Attribute Memory Table (HMAT).
> +Initiator NUMA node can create memory requests, usually including one or more processors.
> +Target NUMA node contains addressable memory.
> +
> +In @samp{hmat-lb} option, @var{node} are NUMA node IDs. @var{str} of 'hierarchy'
> +is the memory hierarchy of the target NUMA node: if @var{str} is 'memory', the structure
> +represents the memory performance; if @var{str} is 'first-level|second-level|third-level',
> +this structure represents aggregated performance of memory side caches for each domain.
> +@var{str} of 'data-type' is type of data represented by this structure instance:
> +if 'hierarchy' is 'memory', 'data-type' is 'access|read|write' latency(nanoseconds)
> +or 'access|read|write' bandwidth(MB/s) of the target memory; if 'hierarchy' is
> +'first-level|second-level|third-level', 'data-type' is 'access|read|write' hit latency
> +or 'access|read|write' hit bandwidth of the target memory side cache.
> +
> +@var{lat} of 'latency' is latency value, the possible value and units are
> +NUM[ps|ns|us] (picosecond|nanosecond|microsecond), the recommended unit is 'ns'. @var{bw}
> +is bandwidth value, the possible value and units are NUM[M|G|T], mean that
> +the bandwidth value are NUM MB/s, GB/s or TB/s. Note that max NUM is 65534,
> +if NUM is 0, means the corresponding latency or bandwidth information is not provided.
> +And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
> +will be MB/s.
> +
> +For example, the following option assigns NUMA node 0 and 1. Node 0 has 2 cpus and
> +a ram, node 1 has only a ram. The processors in node 0 access memory in node
> +0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
> +The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
> +nanoseconds, access-bandwidth is 100 MB/s.
> +@example
> +-machine hmat=on \
> +-m 2G \
> +-object memory-backend-ram,size=1G,id=m0 \
> +-object memory-backend-ram,size=1G,id=m1 \
> +-smp 2 \
> +-numa node,nodeid=0,memdev=m0 \
> +-numa node,nodeid=1,memdev=m1,initiator=0 \
> +-numa cpu,node-id=0,socket-id=0 \
> +-numa cpu,node-id=0,socket-id=1 \
> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
> +@end example
> +
>  ETEXI
>  
>  DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-22  7:08   ` Igor Mammedov
@ 2019-10-22  8:22     ` Tao Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-22  8:22 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron

On 10/22/2019 3:08 PM, Igor Mammedov wrote:
> On Sun, 20 Oct 2019 19:11:19 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
> 
>> From: Liu Jingqi <jingqi.liu@intel.com>
>>
>> Add -numa hmat-lb option to provide System Locality Latency and
>> Bandwidth Information. These memory attributes help to build
>> System Locality Latency and Bandwidth Information Structure(s)
>> in ACPI Heterogeneous Memory Attribute Table (HMAT).
>>
>> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
[...]
>> +void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
>> +                        Error **errp)
>> +{
>> +    int first_bit, last_bit;
>> +    uint64_t temp_latency;
>> +    NodeInfo *numa_info = numa_state->nodes;
>> +    HMAT_LB_Info *hmat_lb =
>> +        numa_state->hmat_lb[node->hierarchy][node->data_type];
>> +    HMAT_LB_Data lb_data;
>> +
>> +    /* Error checking */
>> +    if (node->initiator >= numa_state->num_nodes) {
>> +        error_setg(errp, "Invalid initiator=%d, it should be less than %d.",
>> +                   node->initiator, numa_state->num_nodes);
>> +        return;
>> +    }
>> +    if (node->target >= numa_state->num_nodes) {
>> +        error_setg(errp, "Invalid target=%d, it should be less than %d.",
>> +                   node->target, numa_state->num_nodes);
>> +        return;
>> +    }
>> +    if (!numa_info[node->initiator].has_cpu) {
>> +        error_setg(errp, "Invalid initiator=%d, it isn't an "
>> +                   "initiator proximity domain.", node->initiator);
>> +        return;
>> +    }
>> +    if (!numa_info[node->target].present) {
>> +        error_setg(errp, "Invalid target=%d, it hasn't a valid NUMA node.",
>> +                   node->target);
>> +        return;
>> +    }
>> +
>> +    if (!hmat_lb) {
>> +        hmat_lb = g_malloc0(sizeof(*hmat_lb));
>> +        numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
>> +        hmat_lb->latency = g_array_new(false, true, sizeof(HMAT_LB_Data));
>> +        hmat_lb->bandwidth = g_array_new(false, true, sizeof(HMAT_LB_Data));
>> +    }
>> +    hmat_lb->hierarchy = node->hierarchy;
>> +    hmat_lb->data_type = node->data_type;
>> +    lb_data.initiator = node->initiator;
>> +    lb_data.target = node->target;
>> +
>> +    /* Input latency data */
>> +    if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
>> +        if (!node->has_latency) {
>> +            error_setg(errp, "Missing 'latency' option.");
>> +            return;
>> +        }
>> +        if (node->has_bandwidth) {
>> +            error_setg(errp, "Invalid option 'bandwidth' since "
>> +                       "the data type is latency.");
>> +            return;
>> +        }
>> +
>> +        temp_latency = node->latency;
>> +        hmat_lb->base_latency = 1;
>> +        while (QEMU_IS_ALIGNED(temp_latency, 10)) {
>> +            temp_latency /= 10;
>> +            hmat_lb->base_latency *= 10;
>> +        }
>> +
>> +        if (temp_latency >= UINT64_MAX) {
>                              ^  ^^^^ doesn't make sense
> 
> can't you use range bitmap here as well?
> 
Because the time unit is decimal. For example, if latency are 
1ns(1000ps, bit 0011 1110 1000) and 65534ns(0011 1110 0111 1111 1000 
0011 0000)

0000 0000 0000 0000 0011 1110 1000
0011 1110 0111 1111 1000 0011 0000

Then we can get the first_bit is 3 and last_bit is 25, 25 - 3 > 16.

But if we use 10 to find base, we can find the base is 1000ps, 
compressed latency are 1 and 65534(< UINT16_MAX).



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
                   ` (15 preceding siblings ...)
  2019-10-20 13:19 ` no-reply
@ 2019-10-22 11:22 ` Markus Armbruster
  2019-10-23  1:46   ` Tao Xu
  16 siblings, 1 reply; 38+ messages in thread
From: Markus Armbruster @ 2019-10-22 11:22 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, jonathan.cameron, imammedo

I just stumbled over this series.  It touches the QAPI visitors and even
the generator, without cc'ing its maintainers.  Such changes require
review.  There's precious little time until the soft freeze now.  I'll
try, but no promises.

Please cc me and Michael Roth <mdroth@linux.vnet.ibm.com> on future
revisions.

In general, peruse output of scripts/get_maintainer.pl to find relevant
maintainers.

Tao Xu <tao3.xu@intel.com> writes:

> This series of patches will build Heterogeneous Memory Attribute Table (HMAT)
> according to the command line. The ACPI HMAT describes the memory attributes,
> such as memory side cache attributes and bandwidth and latency details,
> related to the Memory Proximity Domain.
> The software is expected to use HMAT information as hint for optimization.
>
> In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
> the platform's HMAT tables.
>
> The V12 patches link:
> https://patchwork.kernel.org/cover/11153861/
>
> Changelog:
> v13:
>     - Modify some text description
>     - Drop "initiator_valid" field in struct NodeInfo
>     - Reuse Garray to store the raw bandwidth and bandwidth data
>     - Calculate common base unit using range bitmap
>     - Add a patch to alculate hmat latency and bandwidth entry list
>     - Drop the total_levels option and use readable cache size
>     - Remove the unnecessary head file
>     - Use decimal notation with appropriate suffix for cache size
> v12:
>     - Fix a bug that a memory-only node without initiator setting
>       doesn't report error. (reported by Danmei Wei)
>     - Fix a bug that if HMAT is enabled and without hmat-lb setting,
>       QEMU will crash. (reported by Danmei Wei)
> v11:
>     - Move numa option patches forward.
>     - Add num_initiator in Numa_state to record the number of
>     initiators.
>     - Simplify struct HMAT_LB_Info, use uint64_t array to store data.
>     - Drop hmat_get_base().
>     - Calculate base in build_hmat_lb().
> v10:
>     - Add qemu_strtotime_ps() to convert strings with time suffixes
>     to numbers, and add some tests for it.
>     - Add qapi buildin type time, and add some tests for it.
>     - Add machine oprion properties "-machine hmat=on|off" for
> 	  enabling or disabling HMAT in QEMU.

I guess this is where you started messing with QAPI.  Seven weeks ago.
The time crunch is of your own making, I'm afraid.

> v9:
>     - change the CLI input way, make it more user firendly (Daniel Black)
>     use latency=NUM[p|n|u]s and bandwidth=NUM[M|G|P](B/s) as input and drop
>     the base-lat and base-bw input.
> Liu Jingqi (5):
>   numa: Extend CLI to provide memory latency and bandwidth information
>   numa: Extend CLI to provide memory side cache information
>   hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
>   hmat acpi: Build System Locality Latency and Bandwidth Information
>     Structure(s)
>   hmat acpi: Build Memory Side Cache Information Structure(s)
>
> Tao Xu (7):
>   util/cutils: Add qemu_strtotime_ps()
>   tests/cutils: Add test for qemu_strtotime_ps()
>   qapi: Add builtin type time
>   tests: Add test for QAPI builtin type time
>   numa: Extend CLI to provide initiator information for numa nodes
>   numa: Calculate hmat latency and bandwidth entry list
>   tests/bios-tables-test: add test cases for ACPI HMAT
>
>  hw/acpi/Kconfig                    |   7 +-
>  hw/acpi/Makefile.objs              |   1 +
>  hw/acpi/hmat.c                     | 263 +++++++++++++++++++++++++++
>  hw/acpi/hmat.h                     |  42 +++++
>  hw/core/machine.c                  |  70 ++++++++
>  hw/core/numa.c                     | 273 ++++++++++++++++++++++++++++-
>  hw/i386/acpi-build.c               |   5 +
>  include/qapi/visitor-impl.h        |   4 +
>  include/qapi/visitor.h             |   9 +
>  include/qemu/cutils.h              |   1 +
>  include/sysemu/numa.h              | 104 +++++++++++
>  qapi/machine.json                  | 179 ++++++++++++++++++-
>  qapi/opts-visitor.c                |  22 +++
>  qapi/qapi-visit-core.c             |  12 ++
>  qapi/qobject-input-visitor.c       |  18 ++
>  qapi/trace-events                  |   1 +
>  qemu-options.hx                    |  96 +++++++++-
>  scripts/qapi/common.py             |   1 +
>  tests/bios-tables-test.c           |  44 +++++
>  tests/test-cutils.c                | 199 +++++++++++++++++++++
>  tests/test-keyval.c                | 125 +++++++++++++
>  tests/test-qobject-input-visitor.c |  29 +++
>  util/cutils.c                      |  82 +++++++++
>  23 files changed, 1575 insertions(+), 12 deletions(-)
>  create mode 100644 hw/acpi/hmat.c
>  create mode 100644 hw/acpi/hmat.h


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps()
  2019-10-20 11:11 ` [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps() Tao Xu
@ 2019-10-23  1:08   ` Eduardo Habkost
  2019-10-23  6:07     ` Tao Xu
  2019-10-23  1:13   ` Eric Blake
  2019-10-24  9:54   ` Daniel P. Berrangé
  2 siblings, 1 reply; 38+ messages in thread
From: Eduardo Habkost @ 2019-10-23  1:08 UTC (permalink / raw)
  To: Tao Xu; +Cc: jingqi.liu, fan.du, qemu-devel, jonathan.cameron, imammedo

Hi,

First of all, sorry for not reviewing this earlier.  I thought
other people were already looking at the first 4 patches.

On Sun, Oct 20, 2019 at 07:11:14PM +0800, Tao Xu wrote:
> To convert strings with time suffixes to numbers, support time unit are
> "ps" for picosecond, "ns" for nanosecond, "us" for microsecond, "ms"
> for millisecond or "s" for second.
> 
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> No changes in v13.
> ---
>  include/qemu/cutils.h |  1 +
>  util/cutils.c         | 82 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 83 insertions(+)
> 
> diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
> index b54c847e0f..7b6d106bdd 100644
> --- a/include/qemu/cutils.h
> +++ b/include/qemu/cutils.h
> @@ -182,5 +182,6 @@ int uleb128_decode_small(const uint8_t *in, uint32_t *n);
>   * *str1 is <, == or > than *str2.
>   */
>  int qemu_pstrcmp0(const char **str1, const char **str2);
> +int qemu_strtotime_ps(const char *nptr, const char **end, uint64_t *result);
>  
>  #endif
> diff --git a/util/cutils.c b/util/cutils.c
> index fd591cadf0..a50c15f46a 100644
> --- a/util/cutils.c
> +++ b/util/cutils.c
> @@ -847,3 +847,85 @@ int qemu_pstrcmp0(const char **str1, const char **str2)
>  {
>      return g_strcmp0(*str1, *str2);
>  }
> +
> +static int64_t timeunit_mul(const char *unitstr)
> +{
> +    if (g_strcmp0(unitstr, "ps") == 0) {
> +        return 1;

This makes do_strtotime("123ps,...", &end, ...) fail because of
trailing data.

> +    } else if (g_strcmp0(unitstr, "ns") == 0) {
> +        return 1000;
> +    } else if (g_strcmp0(unitstr, "us") == 0) {
> +        return 1000000;
> +    } else if (g_strcmp0(unitstr, "ms") == 0) {
> +        return 1000000000LL;
> +    } else if (g_strcmp0(unitstr, "s") == 0) {
> +        return 1000000000000LL;

Same as above, for the other suffixes.

> +    } else {
> +        return -1;

But this makes do_strtotime("123,...", &end, ...) work as
expected.

This is inconsistent.  I see that you are not testing non-NULL
`end` argument at test_qemu_strtotime_ps_trailing(), so that's
probably why this issue wasn't detected.

> +    }
> +}
> +
> +
> +/*
> + * Convert string to time, support time unit are ps for picosecond,
> + * ns for nanosecond, us for microsecond, ms for millisecond or s for second.
> + * End pointer will be returned in *end, if not NULL. Return -ERANGE on
> + * overflow, and -EINVAL on other error.
> + */
> +static int do_strtotime(const char *nptr, const char **end,
> +                      const char *default_unit, uint64_t *result)
> +{
> +    int retval;
> +    const char *endptr;
> +    int mul_required = 0;
> +    int64_t mul;
> +    double val, integral, fraction;
> +
> +    retval = qemu_strtod_finite(nptr, &endptr, &val);
> +    if (retval) {
> +        goto out;
> +    }
> +    fraction = modf(val, &integral);
> +    if (fraction != 0) {
> +        mul_required = 1;
> +    }
> +
> +    mul = timeunit_mul(endptr);
> +
> +    if (mul == 1000000000000LL) {
> +        endptr++;
> +    } else if (mul != -1) {
> +        endptr += 2;

This is fragile.  It can break very easily if additional
one-letter suffixes are added to timeunit_mul() in the future.

One option to make this safer is to make timeunit_mul() get a
pointer to endptr.


> +    } else {
> +        mul = timeunit_mul(default_unit);
> +        assert(mul >= 0);
> +    }
> +    if (mul == 1 && mul_required) {
> +        retval = -EINVAL;
> +        goto out;
> +    }
> +    /*
> +     * Values >= 0xfffffffffffffc00 overflow uint64_t after their trip
> +     * through double (53 bits of precision).
> +     */
> +    if ((val * (double)mul >= 0xfffffffffffffc00) || val < 0) {
> +        retval = -ERANGE;
> +        goto out;
> +    }
> +    *result = val * (double)mul;
> +    retval = 0;
> +
> +out:
> +    if (end) {
> +        *end = endptr;

This indicates that having trailing data after the string is OK
if `end` is not NULL, but timeunit_mul() doesn't take that into
account.

Considering that this function is just a copy of do_strtosz(), I
suggest making do_strtosz() and suffix_mul() get a
suffix/multiplier table as input, instead of duplicating the
code.


> +    } else if (*endptr) {
> +        retval = -EINVAL;
> +    }
> +
> +    return retval;
> +}
> +
> +int qemu_strtotime_ps(const char *nptr, const char **end, uint64_t *result)
> +{
> +    return do_strtotime(nptr, end, "ps", result);
> +}
> -- 
> 2.20.1
> 

-- 
Eduardo



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps()
  2019-10-20 11:11 ` [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps() Tao Xu
  2019-10-23  1:08   ` Eduardo Habkost
@ 2019-10-23  1:13   ` Eric Blake
  2019-10-23  1:50     ` Tao Xu
  2019-10-24  9:54   ` Daniel P. Berrangé
  2 siblings, 1 reply; 38+ messages in thread
From: Eric Blake @ 2019-10-23  1:13 UTC (permalink / raw)
  To: Tao Xu, imammedo, ehabkost
  Cc: jingqi.liu, fan.du, qemu-devel, jonathan.cameron

On 10/20/19 6:11 AM, Tao Xu wrote:
> To convert strings with time suffixes to numbers, support time unit are
> "ps" for picosecond, "ns" for nanosecond, "us" for microsecond, "ms"
> for millisecond or "s" for second.

I haven't yet reviewed the patch itself, but my off-hand observation:

picosecond is probably too narrow to ever be useful.  POSIX interfaces 
only go down to nanoseconds, and when you start adding in vmexit delay 
times and such, we're lucky when we get anything better than microsecond 
accuracies.  Supporting just three sub-second suffixes instead of four 
would slightly simplify the code, and not cost you any real precision.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT)
  2019-10-22 11:22 ` Markus Armbruster
@ 2019-10-23  1:46   ` Tao Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-23  1:46 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron, imammedo

On 10/22/2019 7:22 PM, Markus Armbruster wrote:
> I just stumbled over this series.  It touches the QAPI visitors and even
> the generator, without cc'ing its maintainers.  Such changes require
> review.  There's precious little time until the soft freeze now.  I'll
> try, but no promises.
> 
> Please cc me and Michael Roth <mdroth@linux.vnet.ibm.com> on future
> revisions.
> 
> In general, peruse output of scripts/get_maintainer.pl to find relevant
> maintainers.

[...]
>> v10:
>>      - Add qemu_strtotime_ps() to convert strings with time suffixes
>>      to numbers, and add some tests for it.
>>      - Add qapi buildin type time, and add some tests for it.
>>      - Add machine oprion properties "-machine hmat=on|off" for
>> 	  enabling or disabling HMAT in QEMU.
> 
> I guess this is where you started messing with QAPI.  Seven weeks ago.
> The time crunch is of your own making, I'm afraid.
> 

I am sorry. I will cc all related maintainers next time.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps()
  2019-10-23  1:13   ` Eric Blake
@ 2019-10-23  1:50     ` Tao Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-23  1:50 UTC (permalink / raw)
  To: Eric Blake
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron, imammedo

On 10/23/2019 9:13 AM, Eric Blake wrote:
> On 10/20/19 6:11 AM, Tao Xu wrote:
>> To convert strings with time suffixes to numbers, support time unit are
>> "ps" for picosecond, "ns" for nanosecond, "us" for microsecond, "ms"
>> for millisecond or "s" for second.
> 
> I haven't yet reviewed the patch itself, but my off-hand observation:
> 
> picosecond is probably too narrow to ever be useful.  POSIX interfaces
> only go down to nanoseconds, and when you start adding in vmexit delay
> times and such, we're lucky when we get anything better than microsecond
> accuracies.  Supporting just three sub-second suffixes instead of four
> would slightly simplify the code, and not cost you any real precision.
> 
Thank you for your suggestion.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps()
  2019-10-23  1:08   ` Eduardo Habkost
@ 2019-10-23  6:07     ` Tao Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-23  6:07 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron, imammedo

On 10/23/2019 9:08 AM, Eduardo Habkost wrote:
> Hi,
> 
> First of all, sorry for not reviewing this earlier.  I thought
> other people were already looking at the first 4 patches.
> 
> On Sun, Oct 20, 2019 at 07:11:14PM +0800, Tao Xu wrote:
>> To convert strings with time suffixes to numbers, support time unit are
>> "ps" for picosecond, "ns" for nanosecond, "us" for microsecond, "ms"
>> for millisecond or "s" for second.
>>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> ---
>>
>> No changes in v13.
>> ---
>>   include/qemu/cutils.h |  1 +
>>   util/cutils.c         | 82 +++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 83 insertions(+)
>>
>> diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
>> index b54c847e0f..7b6d106bdd 100644
>> --- a/include/qemu/cutils.h
>> +++ b/include/qemu/cutils.h
>> @@ -182,5 +182,6 @@ int uleb128_decode_small(const uint8_t *in, uint32_t *n);
>>    * *str1 is <, == or > than *str2.
>>    */
>>   int qemu_pstrcmp0(const char **str1, const char **str2);
>> +int qemu_strtotime_ps(const char *nptr, const char **end, uint64_t *result);
>>   
>>   #endif
>> diff --git a/util/cutils.c b/util/cutils.c
>> index fd591cadf0..a50c15f46a 100644
>> --- a/util/cutils.c
>> +++ b/util/cutils.c
>> @@ -847,3 +847,85 @@ int qemu_pstrcmp0(const char **str1, const char **str2)
>>   {
>>       return g_strcmp0(*str1, *str2);
>>   }
>> +
>> +static int64_t timeunit_mul(const char *unitstr)
>> +{
>> +    if (g_strcmp0(unitstr, "ps") == 0) {
>> +        return 1;
> 
> This makes do_strtotime("123ps,...", &end, ...) fail because of
> trailing data.
> 
>> +    } else if (g_strcmp0(unitstr, "ns") == 0) {
>> +        return 1000;
>> +    } else if (g_strcmp0(unitstr, "us") == 0) {
>> +        return 1000000;
>> +    } else if (g_strcmp0(unitstr, "ms") == 0) {
>> +        return 1000000000LL;
>> +    } else if (g_strcmp0(unitstr, "s") == 0) {
>> +        return 1000000000000LL;
> 
> Same as above, for the other suffixes.
> 
>> +    } else {
>> +        return -1;
> 
> But this makes do_strtotime("123,...", &end, ...) work as
> expected.
> 
> This is inconsistent.  I see that you are not testing non-NULL
> `end` argument at test_qemu_strtotime_ps_trailing(), so that's
> probably why this issue wasn't detected.
> 
>> +    }
>> +}
>> +
>> +
>> +/*
>> + * Convert string to time, support time unit are ps for picosecond,
>> + * ns for nanosecond, us for microsecond, ms for millisecond or s for second.
>> + * End pointer will be returned in *end, if not NULL. Return -ERANGE on
>> + * overflow, and -EINVAL on other error.
>> + */
>> +static int do_strtotime(const char *nptr, const char **end,
>> +                      const char *default_unit, uint64_t *result)
>> +{
>> +    int retval;
>> +    const char *endptr;
>> +    int mul_required = 0;
>> +    int64_t mul;
>> +    double val, integral, fraction;
>> +
>> +    retval = qemu_strtod_finite(nptr, &endptr, &val);
>> +    if (retval) {
>> +        goto out;
>> +    }
>> +    fraction = modf(val, &integral);
>> +    if (fraction != 0) {
>> +        mul_required = 1;
>> +    }
>> +
>> +    mul = timeunit_mul(endptr);
>> +
>> +    if (mul == 1000000000000LL) {
>> +        endptr++;
>> +    } else if (mul != -1) {
>> +        endptr += 2;
> 
> This is fragile.  It can break very easily if additional
> one-letter suffixes are added to timeunit_mul() in the future.
> 
> One option to make this safer is to make timeunit_mul() get a
> pointer to endptr.
> 
> 
>> +    } else {
>> +        mul = timeunit_mul(default_unit);
>> +        assert(mul >= 0);
>> +    }
>> +    if (mul == 1 && mul_required) {
>> +        retval = -EINVAL;
>> +        goto out;
>> +    }
>> +    /*
>> +     * Values >= 0xfffffffffffffc00 overflow uint64_t after their trip
>> +     * through double (53 bits of precision).
>> +     */
>> +    if ((val * (double)mul >= 0xfffffffffffffc00) || val < 0) {
>> +        retval = -ERANGE;
>> +        goto out;
>> +    }
>> +    *result = val * (double)mul;
>> +    retval = 0;
>> +
>> +out:
>> +    if (end) {
>> +        *end = endptr;
> 
> This indicates that having trailing data after the string is OK
> if `end` is not NULL, but timeunit_mul() doesn't take that into
> account.
> 
> Considering that this function is just a copy of do_strtosz(), I
> suggest making do_strtosz() and suffix_mul() get a
> suffix/multiplier table as input, instead of duplicating the
> code.
> 
> Thanks for your suggestion, I will try it.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-20 11:11 ` [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information Tao Xu
  2019-10-22  7:08   ` Igor Mammedov
@ 2019-10-23 15:28   ` Igor Mammedov
  2019-10-25  6:33     ` Tao Xu
  1 sibling, 1 reply; 38+ messages in thread
From: Igor Mammedov @ 2019-10-23 15:28 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, Markus Armbruster,
	jonathan.cameron

On Sun, 20 Oct 2019 19:11:19 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> From: Liu Jingqi <jingqi.liu@intel.com>
> 
> Add -numa hmat-lb option to provide System Locality Latency and
> Bandwidth Information. These memory attributes help to build
> System Locality Latency and Bandwidth Information Structure(s)
> in ACPI Heterogeneous Memory Attribute Table (HMAT).
> 
> Signed-off-by: Liu Jingqi <jingqi.liu@intel.com>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> Changes in v13:
>     - Reuse Garray to store the raw bandwidth and bandwidth data
>     - Calculate common base unit using range bitmap (Igor)
> ---
>  hw/core/numa.c        | 127 ++++++++++++++++++++++++++++++++++++++++++
>  include/sysemu/numa.h |  68 ++++++++++++++++++++++
>  qapi/machine.json     |  95 ++++++++++++++++++++++++++++++-
>  qemu-options.hx       |  49 +++++++++++++++-
>  4 files changed, 336 insertions(+), 3 deletions(-)
Below some comments on doc parts of the patch
(since I'm too familiar with the topic y now I can't properly review doc parts)

perhaps Eric and Markus can suggest a better way to describe new options.

[...]

> diff --git a/qapi/machine.json b/qapi/machine.json
> index f1b07b3486..9ca008810b 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -426,10 +426,12 @@
>  #
>  # @cpu: property based CPU(s) to node mapping (Since: 2.10)
>  #
> +# @hmat-lb: memory latency and bandwidth information (Since: 4.2)
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'NumaOptionsType',
> -  'data': [ 'node', 'dist', 'cpu' ] }
> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
>  
>  ##
>  # @NumaOptions:
> @@ -444,7 +446,8 @@
>    'data': {
>      'node': 'NumaNodeOptions',
>      'dist': 'NumaDistOptions',
> -    'cpu': 'NumaCpuOptions' }}
> +    'cpu': 'NumaCpuOptions',
> +    'hmat-lb': 'NumaHmatLBOptions' }}
>  
>  ##
>  # @NumaNodeOptions:
> @@ -557,6 +560,94 @@
>     'base': 'CpuInstanceProperties',
>     'data' : {} }
>  
> +##
> +# @HmatLBMemoryHierarchy:
> +#
> +# The memory hierarchy in the System Locality Latency
> +# and Bandwidth Information Structure of HMAT (Heterogeneous
> +# Memory Attribute Table)
> +#
> +# For more information of @HmatLBMemoryHierarchy see
> +# the chapter 5.2.27.4: Table 5-142: Field "Flags" of ACPI 6.3 spec.
> +#
> +# @memory: the structure represents the memory performance
> +#
> +# @first-level: first level memory of memory side cached memory
> +#
> +# @second-level: second level memory of memory side cached memory
> +#
> +# @third-level: third level memory of memory side cached memory
> +#
> +# Since: 4.2
> +##
> +{ 'enum': 'HmatLBMemoryHierarchy',
> +  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
> +
> +##
> +# @HmatLBDataType:
> +#
> +# Data type in the System Locality Latency
> +# and Bandwidth Information Structure of HMAT (Heterogeneous
> +# Memory Attribute Table)
> +#
> +# For more information of @HmatLBDataType see
> +# the chapter 5.2.27.4: Table 5-142:  Field "Data Type" of ACPI 6.3 spec.
> +#
> +# @access-latency: access latency (nanoseconds)
> +#
> +# @read-latency: read latency (nanoseconds)
> +#
> +# @write-latency: write latency (nanoseconds)
> +#
> +# @access-bandwidth: access bandwidth (MB/s)
> +#
> +# @read-bandwidth: read bandwidth (MB/s)
> +#
> +# @write-bandwidth: write bandwidth (MB/s)
I think units here are not appropriate, values stored in fields are
minimal base units only and nothing else (i.e. ps and B/s)

> +#
> +# Since: 4.2
> +##
> +{ 'enum': 'HmatLBDataType',
> +  'data': [ 'access-latency', 'read-latency', 'write-latency',
> +            'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
> +
> +##
> +# @NumaHmatLBOptions:
> +#
> +# Set the system locality latency and bandwidth information
> +# between Initiator and Target proximity Domains.
> +#
> +# For more information of @NumaHmatLBOptions see
> +# the chapter 5.2.27.4: Table 5-142 of ACPI 6.3 spec.
> +#
> +# @initiator: the Initiator Proximity Domain.
> +#
> +# @target: the Target Proximity Domain.
> +#
> +# @hierarchy: the Memory Hierarchy. Indicates the performance
> +#             of memory or side cache.
> +#
> +# @data-type: presents the type of data, access/read/write
> +#             latency or hit latency.

> +# @latency: the value of latency from @initiator to @target proximity domain,
> +#           the latency units are "ps(picosecond)", "ns(nanosecond)" or
> +#           "us(microsecond)".
> +#
> +# @bandwidth: the value of bandwidth between @initiator and @target proximity
> +#             domain, the bandwidth units are "MB(/s)","GB(/s)" or "TB(/s)".
ditto

> +# Since: 4.2
> +##
> +{ 'struct': 'NumaHmatLBOptions',
> +    'data': {
> +    'initiator': 'uint16',
> +    'target': 'uint16',
> +    'hierarchy': 'HmatLBMemoryHierarchy',
> +    'data-type': 'HmatLBDataType',
> +    '*latency': 'time',
> +    '*bandwidth': 'size' }}
> +
>  ##
>  # @HostMemPolicy:
>  #
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 1f96399521..de97939f9a 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -168,16 +168,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
>      "-numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
>      "-numa dist,src=source,dst=destination,val=distance\n"
> -    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
> +    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
> +    "-numa hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",
>      QEMU_ARCH_ALL)
>  STEXI
>  @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>  @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>  @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>  @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
> +@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
                                                                              ^^^                 ^^^
Using the same 'str' for 2 different enums is confusing.
Suggest for 1st use 'level' and for the second just 'type'

>  @findex -numa
>  Define a NUMA node and assign RAM and VCPUs to it.
>  Set the NUMA distance from a source node to a destination node.
> +Set the ACPI Heterogeneous Memory Attributes for the given nodes.
>  
>  Legacy VCPU assignment uses @samp{cpus} option where
>  @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
> @@ -256,6 +259,50 @@ specified resources, it just assigns existing resources to NUMA
>  nodes. This means that one still has to use the @option{-m},
>  @option{-smp} options to allocate RAM and VCPUs respectively.
>  
> +Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
> +between initiator and target NUMA nodes in ACPI Heterogeneous Attribute Memory Table (HMAT).
> +Initiator NUMA node can create memory requests, usually including one or more processors.
s/including/it has/

> +Target NUMA node contains addressable memory.
> +
> +In @samp{hmat-lb} option, @var{node} are NUMA node IDs. @var{str} of 'hierarchy'
> +is the memory hierarchy of the target NUMA node: if @var{str} is 'memory', the structure
> +represents the memory performance; if @var{str} is 'first-level|second-level|third-level',
> +this structure represents aggregated performance of memory side caches for each domain.
> +@var{str} of 'data-type' is type of data represented by this structure instance:
> +if 'hierarchy' is 'memory', 'data-type' is 'access|read|write' latency(nanoseconds)
is nanoseconds is right here? Looking at previous patches default value of suffix-less
should be picoseconds. I'd just drop '(nanoseconds)'. User will use appropriate suffix.

> +or 'access|read|write' bandwidth(MB/s) of the target memory; if 'hierarchy' is
ditto (MB/s), probably should be Bytes/s for default suffix-less value
(well, I'm not sure how to express it better)

> +'first-level|second-level|third-level', 'data-type' is 'access|read|write' hit latency
> +or 'access|read|write' hit bandwidth of the target memory side cache.
> +
> +@var{lat} of 'latency' is latency value, the possible value and units are
> +NUM[ps|ns|us] (picosecond|nanosecond|microsecond), the recommended unit is 'ns'. @var{bw}
> +is bandwidth value, the possible value and units are NUM[M|G|T], mean that

> +the bandwidth value are NUM MB/s, GB/s or TB/s. Note that max NUM is 65534,
> +if NUM is 0, means the corresponding latency or bandwidth information is not provided.
> +And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
> +will be MB/s.
 1st: above is applicable to both bw and lat values and should be documented as such
 2nd: 'max NUM is 65534' when different suffixes is fleeting target,
      spec says that entry with 0xFFFF is unreachable, so how about documenting
      unreachable value as 0xFFFFFFFFFFFFFFFF (then CLI parsing code will
      exclude it from range detection and acpi table building code translate it
      to internal 0xFFFF it could fit into the tables)

> +For example, the following option assigns NUMA node 0 and 1. Node 0 has 2 cpus and
> +a ram, node 1 has only a ram. The processors in node 0 access memory in node
> +0 with access-latency 5 nanoseconds, access-bandwidth is 200 MB/s;
> +The processors in NUMA node 0 access memory in NUMA node 1 with access-latency 10
> +nanoseconds, access-bandwidth is 100 MB/s.
> +@example
> +-machine hmat=on \
> +-m 2G \
> +-object memory-backend-ram,size=1G,id=m0 \
> +-object memory-backend-ram,size=1G,id=m1 \
> +-smp 2 \
> +-numa node,nodeid=0,memdev=m0 \
> +-numa node,nodeid=1,memdev=m1,initiator=0 \
> +-numa cpu,node-id=0,socket-id=0 \
> +-numa cpu,node-id=0,socket-id=1 \
> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns \
> +-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns \
> +-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
> +@end example
> +
>  ETEXI
>  
>  DEF("add-fd", HAS_ARG, QEMU_OPTION_add_fd,



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps()
  2019-10-20 11:11 ` [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps() Tao Xu
  2019-10-23  1:08   ` Eduardo Habkost
  2019-10-23  1:13   ` Eric Blake
@ 2019-10-24  9:54   ` Daniel P. Berrangé
  2019-10-24 13:20     ` Eduardo Habkost
  2 siblings, 1 reply; 38+ messages in thread
From: Daniel P. Berrangé @ 2019-10-24  9:54 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, jingqi.liu, fan.du, qemu-devel, jonathan.cameron, imammedo

On Sun, Oct 20, 2019 at 07:11:14PM +0800, Tao Xu wrote:
> To convert strings with time suffixes to numbers, support time unit are
> "ps" for picosecond, "ns" for nanosecond, "us" for microsecond, "ms"
> for millisecond or "s" for second.
> 
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> ---
> 
> No changes in v13.
> ---
>  include/qemu/cutils.h |  1 +
>  util/cutils.c         | 82 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 83 insertions(+)

This really ought to have an addition to the unit tests to validating
the parsing, both success and error scenarios, so that we're clear on
exactly what strings are accepted & rejected.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps()
  2019-10-24  9:54   ` Daniel P. Berrangé
@ 2019-10-24 13:20     ` Eduardo Habkost
  2019-10-25  1:22       ` Tao Xu
  0 siblings, 1 reply; 38+ messages in thread
From: Eduardo Habkost @ 2019-10-24 13:20 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: jingqi.liu, Tao Xu, fan.du, qemu-devel, jonathan.cameron, imammedo

On Thu, Oct 24, 2019 at 10:54:57AM +0100, Daniel P. Berrangé wrote:
> On Sun, Oct 20, 2019 at 07:11:14PM +0800, Tao Xu wrote:
> > To convert strings with time suffixes to numbers, support time unit are
> > "ps" for picosecond, "ns" for nanosecond, "us" for microsecond, "ms"
> > for millisecond or "s" for second.
> > 
> > Signed-off-by: Tao Xu <tao3.xu@intel.com>
> > ---
> > 
> > No changes in v13.
> > ---
> >  include/qemu/cutils.h |  1 +
> >  util/cutils.c         | 82 +++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 83 insertions(+)
> 
> This really ought to have an addition to the unit tests to validating
> the parsing, both success and error scenarios, so that we're clear on
> exactly what strings are accepted & rejected.

Unit tests are in patch 02/12.  It's a good idea to squash
patches 01 and 02 together.

-- 
Eduardo



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps()
  2019-10-24 13:20     ` Eduardo Habkost
@ 2019-10-25  1:22       ` Tao Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-25  1:22 UTC (permalink / raw)
  To: Eduardo Habkost, Daniel P. Berrangé
  Cc: Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron, imammedo

On 10/24/2019 9:20 PM, Eduardo Habkost wrote:
> On Thu, Oct 24, 2019 at 10:54:57AM +0100, Daniel P. Berrangé wrote:
>> On Sun, Oct 20, 2019 at 07:11:14PM +0800, Tao Xu wrote:
>>> To convert strings with time suffixes to numbers, support time unit are
>>> "ps" for picosecond, "ns" for nanosecond, "us" for microsecond, "ms"
>>> for millisecond or "s" for second.
>>>
>>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>>> ---
>>>
>>> No changes in v13.
>>> ---
>>>   include/qemu/cutils.h |  1 +
>>>   util/cutils.c         | 82 +++++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 83 insertions(+)
>>
>> This really ought to have an addition to the unit tests to validating
>> the parsing, both success and error scenarios, so that we're clear on
>> exactly what strings are accepted & rejected.
> 
> Unit tests are in patch 02/12.  It's a good idea to squash
> patches 01 and 02 together.
> 
Yes it is in 02/12. OK I will squash them.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-23 15:28   ` Igor Mammedov
@ 2019-10-25  6:33     ` Tao Xu
  2019-10-25 13:27       ` Igor Mammedov
  0 siblings, 1 reply; 38+ messages in thread
From: Tao Xu @ 2019-10-25  6:33 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Du, Fan, qemu-devel, Markus Armbruster,
	jonathan.cameron

On 10/23/2019 11:28 PM, Igor Mammedov wrote:
> On Sun, 20 Oct 2019 19:11:19 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
[...]
>> +#
>> +# @access-bandwidth: access bandwidth (MB/s)
>> +#
>> +# @read-bandwidth: read bandwidth (MB/s)
>> +#
>> +# @write-bandwidth: write bandwidth (MB/s)
> I think units here are not appropriate, values stored in fields are
> minimal base units only and nothing else (i.e. ps and B/s)
> 
Eric suggest me to drop picoseconds. So here I can use ns. For 
bandwidth, if we use B/s here, does it let user or developer to 
misunderstand that the smallest unit is B/s ?

>>   @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>>   @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>>   @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>>   @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>> +@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]
>                                                                                ^^^                 ^^^
> Using the same 'str' for 2 different enums is confusing.
> Suggest for 1st use 'level' and for the second just 'type'
> 
Ok

>>   @findex -numa
>>   Define a NUMA node and assign RAM and VCPUs to it.
>>   Set the NUMA distance from a source node to a destination node.
>> +Set the ACPI Heterogeneous Memory Attributes for the given nodes.
>>   
>>   Legacy VCPU assignment uses @samp{cpus} option where
>>   @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
>> @@ -256,6 +259,50 @@ specified resources, it just assigns existing resources to NUMA
>>   nodes. This means that one still has to use the @option{-m},
>>   @option{-smp} options to allocate RAM and VCPUs respectively.
>>   
>> +Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
>> +between initiator and target NUMA nodes in ACPI Heterogeneous Attribute Memory Table (HMAT).
>> +Initiator NUMA node can create memory requests, usually including one or more processors.
> s/including/it has/
> 
>> +Target NUMA node contains addressable memory.
>> +
>> +In @samp{hmat-lb} option, @var{node} are NUMA node IDs. @var{str} of 'hierarchy'
>> +is the memory hierarchy of the target NUMA node: if @var{str} is 'memory', the structure
>> +represents the memory performance; if @var{str} is 'first-level|second-level|third-level',
>> +this structure represents aggregated performance of memory side caches for each domain.
>> +@var{str} of 'data-type' is type of data represented by this structure instance:
>> +if 'hierarchy' is 'memory', 'data-type' is 'access|read|write' latency(nanoseconds)
> is nanoseconds is right here? Looking at previous patches default value of suffix-less
> should be picoseconds. I'd just drop '(nanoseconds)'. User will use appropriate suffix.
> 
OK, I will drop it.
>> +or 'access|read|write' bandwidth(MB/s) of the target memory; if 'hierarchy' is
> ditto (MB/s), probably should be Bytes/s for default suffix-less value
> (well, I'm not sure how to express it better)
> 

But last version, we let !QEMU_IS_ALIGNED(node->bandwidth, MiB) as error.
>> +'first-level|second-level|third-level', 'data-type' is 'access|read|write' hit latency
>> +or 'access|read|write' hit bandwidth of the target memory side cache.
>> +
>> +@var{lat} of 'latency' is latency value, the possible value and units are
>> +NUM[ps|ns|us] (picosecond|nanosecond|microsecond), the recommended unit is 'ns'. @var{bw}
>> +is bandwidth value, the possible value and units are NUM[M|G|T], mean that
> 
>> +the bandwidth value are NUM MB/s, GB/s or TB/s. Note that max NUM is 65534,
>> +if NUM is 0, means the corresponding latency or bandwidth information is not provided.
>> +And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
>> +will be MB/s.
>   1st: above is applicable to both bw and lat values and should be documented as such
>   2nd: 'max NUM is 65534' when different suffixes is fleeting target,
>        spec says that entry with 0xFFFF is unreachable, so how about documenting
>        unreachable value as 0xFFFFFFFFFFFFFFFF (then CLI parsing code will
>        exclude it from range detection and acpi table building code translate it
>        to internal 0xFFFF it could fit into the tables)
> 

If we input 0xFFFFFFFFFFFFFFFF, qemu will raise error that parameter 
expect a size value.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-25  6:33     ` Tao Xu
@ 2019-10-25 13:27       ` Igor Mammedov
  2019-10-25 19:44         ` Markus Armbruster
  0 siblings, 1 reply; 38+ messages in thread
From: Igor Mammedov @ 2019-10-25 13:27 UTC (permalink / raw)
  To: Tao Xu
  Cc: ehabkost, Liu, Jingqi, Du, Fan, Markus Armbruster, qemu-devel,
	jonathan.cameron

On Fri, 25 Oct 2019 14:33:53 +0800
Tao Xu <tao3.xu@intel.com> wrote:

> On 10/23/2019 11:28 PM, Igor Mammedov wrote:
> > On Sun, 20 Oct 2019 19:11:19 +0800
> > Tao Xu <tao3.xu@intel.com> wrote:  
> [...]
> >> +#
> >> +# @access-bandwidth: access bandwidth (MB/s)
> >> +#
> >> +# @read-bandwidth: read bandwidth (MB/s)
> >> +#
> >> +# @write-bandwidth: write bandwidth (MB/s)  
> > I think units here are not appropriate, values stored in fields are
> > minimal base units only and nothing else (i.e. ps and B/s)
> >   
> Eric suggest me to drop picoseconds. So here I can use ns. For 
> bandwidth, if we use B/s here, does it let user or developer to 
> misunderstand that the smallest unit is B/s ?

It's not nanoseconds or MB/s stored in theses fields, isn't it?
I'd specify units in which value is stored or drop units altogether.

Maybe Eric and Markus can suggest a better way to describe fields.

> >>   @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> >>   @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
> >>   @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
> >>   @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
> >> +@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]  
> >                                                                                ^^^                 ^^^
> > Using the same 'str' for 2 different enums is confusing.
> > Suggest for 1st use 'level' and for the second just 'type'
> >   
> Ok
> 
> >>   @findex -numa
> >>   Define a NUMA node and assign RAM and VCPUs to it.
> >>   Set the NUMA distance from a source node to a destination node.
> >> +Set the ACPI Heterogeneous Memory Attributes for the given nodes.
> >>   
> >>   Legacy VCPU assignment uses @samp{cpus} option where
> >>   @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
> >> @@ -256,6 +259,50 @@ specified resources, it just assigns existing resources to NUMA
> >>   nodes. This means that one still has to use the @option{-m},
> >>   @option{-smp} options to allocate RAM and VCPUs respectively.
> >>   
> >> +Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
> >> +between initiator and target NUMA nodes in ACPI Heterogeneous Attribute Memory Table (HMAT).
> >> +Initiator NUMA node can create memory requests, usually including one or more processors.  
> > s/including/it has/
> >   
> >> +Target NUMA node contains addressable memory.
> >> +
> >> +In @samp{hmat-lb} option, @var{node} are NUMA node IDs. @var{str} of 'hierarchy'
> >> +is the memory hierarchy of the target NUMA node: if @var{str} is 'memory', the structure
> >> +represents the memory performance; if @var{str} is 'first-level|second-level|third-level',
> >> +this structure represents aggregated performance of memory side caches for each domain.
> >> +@var{str} of 'data-type' is type of data represented by this structure instance:
> >> +if 'hierarchy' is 'memory', 'data-type' is 'access|read|write' latency(nanoseconds)  
> > is nanoseconds is right here? Looking at previous patches default value of suffix-less
> > should be picoseconds. I'd just drop '(nanoseconds)'. User will use appropriate suffix.
> >   
> OK, I will drop it.
> >> +or 'access|read|write' bandwidth(MB/s) of the target memory; if 'hierarchy' is  
> > ditto (MB/s), probably should be Bytes/s for default suffix-less value
> > (well, I'm not sure how to express it better)
> >   
> 
> But last version, we let !QEMU_IS_ALIGNED(node->bandwidth, MiB) as error.
it's alignment requirement and it doesn't mean that value could not be
represented in bytes/s.
And it is bytes/s if suffix isn't used.

As for alignment and precision of values you probably need to document
expectations here as well.

> >> +'first-level|second-level|third-level', 'data-type' is 'access|read|write' hit latency
> >> +or 'access|read|write' hit bandwidth of the target memory side cache.
> >> +
> >> +@var{lat} of 'latency' is latency value, the possible value and units are
> >> +NUM[ps|ns|us] (picosecond|nanosecond|microsecond), the recommended unit is 'ns'. @var{bw}
> >> +is bandwidth value, the possible value and units are NUM[M|G|T], mean that  
> >   
> >> +the bandwidth value are NUM MB/s, GB/s or TB/s. Note that max NUM is 65534,
> >> +if NUM is 0, means the corresponding latency or bandwidth information is not provided.
> >> +And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
> >> +will be MB/s.  
> >   1st: above is applicable to both bw and lat values and should be documented as such
> >   2nd: 'max NUM is 65534' when different suffixes is fleeting target,
> >        spec says that entry with 0xFFFF is unreachable, so how about documenting
> >        unreachable value as 0xFFFFFFFFFFFFFFFF (then CLI parsing code will
> >        exclude it from range detection and acpi table building code translate it
> >        to internal 0xFFFF it could fit into the tables)
> >   
> 
> If we input 0xFFFFFFFFFFFFFFFF, qemu will raise error that parameter 
> expect a size value.

opts_type_size() can't handle values >= 0xfffffffffffffc00

commit f46bfdbfc8f (util/cutils: Change qemu_strtosz*() from int64_t to uint64_t)

so behavior would change depending on if the value is parsed by CLI (size) or QMP (unit64) parsers.

we can cannibalize 0x0 as the unreachable value and an absent bandwidth/lat option
for not specified case.
It would be conflicting with matrix [1] values in spec, but CLI/QMP deals with
absolute values which are later processed into HMAT substructure.

Markus,
Can we make opts_type_size() handle full range of uint64_t?

1)
ACPI 6.3 spec:
5.2.27.4 System Locality Latency and Bandwidth Information Structure



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-25 13:27       ` Igor Mammedov
@ 2019-10-25 19:44         ` Markus Armbruster
  2019-10-25 20:51           ` Eduardo Habkost
  2019-10-28  7:25           ` Tao Xu
  0 siblings, 2 replies; 38+ messages in thread
From: Markus Armbruster @ 2019-10-25 19:44 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Liu, Jingqi, Tao Xu, Du, Fan, qemu-devel, jonathan.cameron

Igor Mammedov <imammedo@redhat.com> writes:

> On Fri, 25 Oct 2019 14:33:53 +0800
> Tao Xu <tao3.xu@intel.com> wrote:
>
>> On 10/23/2019 11:28 PM, Igor Mammedov wrote:
>> > On Sun, 20 Oct 2019 19:11:19 +0800
>> > Tao Xu <tao3.xu@intel.com> wrote:  
>> [...]
>> >> +#
>> >> +# @access-bandwidth: access bandwidth (MB/s)
>> >> +#
>> >> +# @read-bandwidth: read bandwidth (MB/s)
>> >> +#
>> >> +# @write-bandwidth: write bandwidth (MB/s)  
>> > I think units here are not appropriate, values stored in fields are
>> > minimal base units only and nothing else (i.e. ps and B/s)
>> >   
>> Eric suggest me to drop picoseconds. So here I can use ns. For 
>> bandwidth, if we use B/s here, does it let user or developer to 
>> misunderstand that the smallest unit is B/s ?
>
> It's not nanoseconds or MB/s stored in theses fields, isn't it?
> I'd specify units in which value is stored or drop units altogether.
>
> Maybe Eric and Markus can suggest a better way to describe fields.

This isn't review (yet), just an attempt to advise more quickly on
general QAPI/QMP conventions.

Unit prefixes like Mebi- are nice for humans, because 1MiB is clearer
than 1048576B.

QMP is for machines.  We eschew unit prefixes and unit symbols there.
The unit is implied.  Unit prefixes only complicate things.  Machines
can deal with 1048576 easily.  Also dealing 1024Ki and 1Mi is additional
work.  We therefore use JSON numbers for byte counts, not strings with
units.

The general rule is "always use the plainest implied unit that would
do."  There are exceptions, mostly due to review failure.

Byte rates should be in bytes per second.

For time, we've made a godawful mess.  The plainest unit is clearly the
second.  We commonly need sub-second granularity, though.
Floating-point seconds are unpopular for some reason :)  Instead we use
milli-, micro-, and nanoseconds, and even (seconds, microseconds) pairs.

QAPI schema documentation describes both the generated C and the QMP
wire protocol.  It must be written with the implied unit.  If you send a
byte rate in bytes per second via QMP, that's what you document.  Even
if a human interface lets you specify the byte rate in MiB/s.

Does this make sense?

>> >>   @item -numa node[,mem=@var{size}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>> >>   @itemx -numa node[,memdev=@var{id}][,cpus=@var{firstcpu}[-@var{lastcpu}]][,nodeid=@var{node}][,initiator=@var{initiator}]
>> >>   @itemx -numa dist,src=@var{source},dst=@var{destination},val=@var{distance}
>> >>   @itemx -numa cpu,node-id=@var{node}[,socket-id=@var{x}][,core-id=@var{y}][,thread-id=@var{z}]
>> >> +@itemx -numa hmat-lb,initiator=@var{node},target=@var{node},hierarchy=@var{str},data-type=@var{str}[,latency=@var{lat}][,bandwidth=@var{bw}]  
>> >                                                                                ^^^                 ^^^
>> > Using the same 'str' for 2 different enums is confusing.
>> > Suggest for 1st use 'level' and for the second just 'type'
>> >   
>> Ok
>> 
>> >>   @findex -numa
>> >>   Define a NUMA node and assign RAM and VCPUs to it.
>> >>   Set the NUMA distance from a source node to a destination node.
>> >> +Set the ACPI Heterogeneous Memory Attributes for the given nodes.
>> >>   
>> >>   Legacy VCPU assignment uses @samp{cpus} option where
>> >>   @var{firstcpu} and @var{lastcpu} are CPU indexes. Each
>> >> @@ -256,6 +259,50 @@ specified resources, it just assigns existing resources to NUMA
>> >>   nodes. This means that one still has to use the @option{-m},
>> >>   @option{-smp} options to allocate RAM and VCPUs respectively.
>> >>   
>> >> +Use @samp{hmat-lb} to set System Locality Latency and Bandwidth Information
>> >> +between initiator and target NUMA nodes in ACPI Heterogeneous Attribute Memory Table (HMAT).
>> >> +Initiator NUMA node can create memory requests, usually including one or more processors.  
>> > s/including/it has/
>> >   
>> >> +Target NUMA node contains addressable memory.
>> >> +
>> >> +In @samp{hmat-lb} option, @var{node} are NUMA node IDs. @var{str} of 'hierarchy'
>> >> +is the memory hierarchy of the target NUMA node: if @var{str} is 'memory', the structure
>> >> +represents the memory performance; if @var{str} is 'first-level|second-level|third-level',
>> >> +this structure represents aggregated performance of memory side caches for each domain.
>> >> +@var{str} of 'data-type' is type of data represented by this structure instance:
>> >> +if 'hierarchy' is 'memory', 'data-type' is 'access|read|write' latency(nanoseconds)  
>> > is nanoseconds is right here? Looking at previous patches default value of suffix-less
>> > should be picoseconds. I'd just drop '(nanoseconds)'. User will use appropriate suffix.
>> >   
>> OK, I will drop it.
>> >> +or 'access|read|write' bandwidth(MB/s) of the target memory; if 'hierarchy' is  
>> > ditto (MB/s), probably should be Bytes/s for default suffix-less value
>> > (well, I'm not sure how to express it better)
>> >   
>> 
>> But last version, we let !QEMU_IS_ALIGNED(node->bandwidth, MiB) as error.
> it's alignment requirement and it doesn't mean that value could not be
> represented in bytes/s.
> And it is bytes/s if suffix isn't used.
>
> As for alignment and precision of values you probably need to document
> expectations here as well.
>
>> >> +'first-level|second-level|third-level', 'data-type' is 'access|read|write' hit latency
>> >> +or 'access|read|write' hit bandwidth of the target memory side cache.
>> >> +
>> >> +@var{lat} of 'latency' is latency value, the possible value and units are
>> >> +NUM[ps|ns|us] (picosecond|nanosecond|microsecond), the recommended unit is 'ns'. @var{bw}
>> >> +is bandwidth value, the possible value and units are NUM[M|G|T], mean that  
>> >   
>> >> +the bandwidth value are NUM MB/s, GB/s or TB/s. Note that max NUM is 65534,
>> >> +if NUM is 0, means the corresponding latency or bandwidth information is not provided.
>> >> +And if input numbers without any unit, the latency unit will be 'ps' and the bandwidth
>> >> +will be MB/s.  
>> >   1st: above is applicable to both bw and lat values and should be documented as such
>> >   2nd: 'max NUM is 65534' when different suffixes is fleeting target,
>> >        spec says that entry with 0xFFFF is unreachable, so how about documenting
>> >        unreachable value as 0xFFFFFFFFFFFFFFFF (then CLI parsing code will
>> >        exclude it from range detection and acpi table building code translate it
>> >        to internal 0xFFFF it could fit into the tables)
>> >   
>> 
>> If we input 0xFFFFFFFFFFFFFFFF, qemu will raise error that parameter 
>> expect a size value.
>
> opts_type_size() can't handle values >= 0xfffffffffffffc00
>
> commit f46bfdbfc8f (util/cutils: Change qemu_strtosz*() from int64_t to uint64_t)
>
> so behavior would change depending on if the value is parsed by CLI (size) or QMP (unit64) parsers.

Do these values matter?  Is there a use case for passing
18446744073709550593 via CLI?

> we can cannibalize 0x0 as the unreachable value and an absent bandwidth/lat option
> for not specified case.
> It would be conflicting with matrix [1] values in spec, but CLI/QMP deals with
> absolute values which are later processed into HMAT substructure.
>
> Markus,
> Can we make opts_type_size() handle full range of uint64_t?

Maybe.

> 1)
> ACPI 6.3 spec:
> 5.2.27.4 System Locality Latency and Bandwidth Information Structure



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-25 19:44         ` Markus Armbruster
@ 2019-10-25 20:51           ` Eduardo Habkost
  2019-10-28  2:05             ` Tao Xu
  2019-10-28  7:25           ` Tao Xu
  1 sibling, 1 reply; 38+ messages in thread
From: Eduardo Habkost @ 2019-10-25 20:51 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Liu, Jingqi, Tao Xu, Du, Fan, qemu-devel, jonathan.cameron,
	Igor Mammedov

On Fri, Oct 25, 2019 at 09:44:50PM +0200, Markus Armbruster wrote:
> Igor Mammedov <imammedo@redhat.com> writes:
> 
> > On Fri, 25 Oct 2019 14:33:53 +0800
> > Tao Xu <tao3.xu@intel.com> wrote:
> >
> >> On 10/23/2019 11:28 PM, Igor Mammedov wrote:
> >> > On Sun, 20 Oct 2019 19:11:19 +0800
> >> > Tao Xu <tao3.xu@intel.com> wrote:  
> >> [...]
> >> >> +#
> >> >> +# @access-bandwidth: access bandwidth (MB/s)
> >> >> +#
> >> >> +# @read-bandwidth: read bandwidth (MB/s)
> >> >> +#
> >> >> +# @write-bandwidth: write bandwidth (MB/s)  
> >> > I think units here are not appropriate, values stored in fields are
> >> > minimal base units only and nothing else (i.e. ps and B/s)
> >> >   
> >> Eric suggest me to drop picoseconds. So here I can use ns. For 
> >> bandwidth, if we use B/s here, does it let user or developer to 
> >> misunderstand that the smallest unit is B/s ?
> >
> > It's not nanoseconds or MB/s stored in theses fields, isn't it?
> > I'd specify units in which value is stored or drop units altogether.
> >
> > Maybe Eric and Markus can suggest a better way to describe fields.
> 
> This isn't review (yet), just an attempt to advise more quickly on
> general QAPI/QMP conventions.
> 
> Unit prefixes like Mebi- are nice for humans, because 1MiB is clearer
> than 1048576B.
> 
> QMP is for machines.  We eschew unit prefixes and unit symbols there.
> The unit is implied.  Unit prefixes only complicate things.  Machines
> can deal with 1048576 easily.  Also dealing 1024Ki and 1Mi is additional
> work.  We therefore use JSON numbers for byte counts, not strings with
> units.
> 
> The general rule is "always use the plainest implied unit that would
> do."  There are exceptions, mostly due to review failure.
> 
> Byte rates should be in bytes per second.
> 
> For time, we've made a godawful mess.  The plainest unit is clearly the
> second.  We commonly need sub-second granularity, though.
> Floating-point seconds are unpopular for some reason :)  Instead we use
> milli-, micro-, and nanoseconds, and even (seconds, microseconds) pairs.
> 
> QAPI schema documentation describes both the generated C and the QMP
> wire protocol.  It must be written with the implied unit.  If you send a
> byte rate in bytes per second via QMP, that's what you document.  Even
> if a human interface lets you specify the byte rate in MiB/s.
> 
> Does this make sense?

This makes sense for the bandwidth fields.  We still need to
decide how to represent the latency field, though.

Seconds would be the obvious choice, if only it didn't risk
silently losing precision when converting numbers to floats.

-- 
Eduardo



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-25 20:51           ` Eduardo Habkost
@ 2019-10-28  2:05             ` Tao Xu
  2019-10-28  5:46               ` Markus Armbruster
  0 siblings, 1 reply; 38+ messages in thread
From: Tao Xu @ 2019-10-28  2:05 UTC (permalink / raw)
  To: Eduardo Habkost, Markus Armbruster, Igor Mammedov
  Cc: Liu, Jingqi, Du, Fan, qemu-devel, jonathan.cameron

On 10/26/2019 4:51 AM, Eduardo Habkost wrote:
> On Fri, Oct 25, 2019 at 09:44:50PM +0200, Markus Armbruster wrote:
>> Igor Mammedov <imammedo@redhat.com> writes:
>>
>>> On Fri, 25 Oct 2019 14:33:53 +0800
>>> Tao Xu <tao3.xu@intel.com> wrote:
>>>
>>>> On 10/23/2019 11:28 PM, Igor Mammedov wrote:
>>>>> On Sun, 20 Oct 2019 19:11:19 +0800
>>>>> Tao Xu <tao3.xu@intel.com> wrote:
>>>> [...]
>>>>>> +#
>>>>>> +# @access-bandwidth: access bandwidth (MB/s)
>>>>>> +#
>>>>>> +# @read-bandwidth: read bandwidth (MB/s)
>>>>>> +#
>>>>>> +# @write-bandwidth: write bandwidth (MB/s)
>>>>> I think units here are not appropriate, values stored in fields are
>>>>> minimal base units only and nothing else (i.e. ps and B/s)
>>>>>    
>>>> Eric suggest me to drop picoseconds. So here I can use ns. For
>>>> bandwidth, if we use B/s here, does it let user or developer to
>>>> misunderstand that the smallest unit is B/s ?
>>>
>>> It's not nanoseconds or MB/s stored in theses fields, isn't it?
>>> I'd specify units in which value is stored or drop units altogether.
>>>
>>> Maybe Eric and Markus can suggest a better way to describe fields.
>>
>> This isn't review (yet), just an attempt to advise more quickly on
>> general QAPI/QMP conventions.
>>
>> Unit prefixes like Mebi- are nice for humans, because 1MiB is clearer
>> than 1048576B.
>>
>> QMP is for machines.  We eschew unit prefixes and unit symbols there.
>> The unit is implied.  Unit prefixes only complicate things.  Machines
>> can deal with 1048576 easily.  Also dealing 1024Ki and 1Mi is additional
>> work.  We therefore use JSON numbers for byte counts, not strings with
>> units.
>>
>> The general rule is "always use the plainest implied unit that would
>> do."  There are exceptions, mostly due to review failure.
>>
>> Byte rates should be in bytes per second.
>>
>> For time, we've made a godawful mess.  The plainest unit is clearly the
>> second.  We commonly need sub-second granularity, though.
>> Floating-point seconds are unpopular for some reason :)  Instead we use
>> milli-, micro-, and nanoseconds, and even (seconds, microseconds) pairs.
>>
>> QAPI schema documentation describes both the generated C and the QMP
>> wire protocol.  It must be written with the implied unit.  If you send a
>> byte rate in bytes per second via QMP, that's what you document.  Even
>> if a human interface lets you specify the byte rate in MiB/s.
>>
>> Does this make sense?
> 
> This makes sense for the bandwidth fields.  We still need to
> decide how to represent the latency field, though.
> 
> Seconds would be the obvious choice, if only it didn't risk
> silently losing precision when converting numbers to floats.
> 
Got it. I will use bytes per second for bandwidth here. Usually we use 
nanosecond for memory latency, so if we use second for latency, it may 
lose precision. So can I use nanosecond here, because we now use 
nanosecond as smallest time unit.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-28  2:05             ` Tao Xu
@ 2019-10-28  5:46               ` Markus Armbruster
  0 siblings, 0 replies; 38+ messages in thread
From: Markus Armbruster @ 2019-10-28  5:46 UTC (permalink / raw)
  To: Tao Xu
  Cc: Eduardo Habkost, Liu, Jingqi, Du, Fan, qemu-devel,
	jonathan.cameron, Igor Mammedov

Tao Xu <tao3.xu@intel.com> writes:

> Got it. I will use bytes per second for bandwidth here. Usually we use
> nanosecond for memory latency, so if we use second for latency, it may
> lose precision. So can I use nanosecond here, because we now use
> nanosecond as smallest time unit.

Sounds fair, go ahead.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information
  2019-10-25 19:44         ` Markus Armbruster
  2019-10-25 20:51           ` Eduardo Habkost
@ 2019-10-28  7:25           ` Tao Xu
  1 sibling, 0 replies; 38+ messages in thread
From: Tao Xu @ 2019-10-28  7:25 UTC (permalink / raw)
  To: Markus Armbruster, Igor Mammedov
  Cc: Liu, Jingqi, Du, Fan, ehabkost, jonathan.cameron, qemu-devel

On 10/26/2019 3:44 AM, Markus Armbruster wrote:
> Igor Mammedov <imammedo@redhat.com> writes:
> 
[...]
>>>>    1st: above is applicable to both bw and lat values and should be documented as such
>>>>    2nd: 'max NUM is 65534' when different suffixes is fleeting target,
>>>>         spec says that entry with 0xFFFF is unreachable, so how about documenting
>>>>         unreachable value as 0xFFFFFFFFFFFFFFFF (then CLI parsing code will
>>>>         exclude it from range detection and acpi table building code translate it
>>>>         to internal 0xFFFF it could fit into the tables)
>>>>    
>>>
>>> If we input 0xFFFFFFFFFFFFFFFF, qemu will raise error that parameter
>>> expect a size value.
>>
>> opts_type_size() can't handle values >= 0xfffffffffffffc00
>>
>> commit f46bfdbfc8f (util/cutils: Change qemu_strtosz*() from int64_t to uint64_t)
>>
>> so behavior would change depending on if the value is parsed by CLI (size) or QMP (unit64) parsers.
> 
> Do these values matter?  Is there a use case for passing
> 18446744073709550593 via CLI?
> 

There is a case that we need to input 0xFFFF as ACPI HMAT entry (an 
unreachable case). But I am thinking drop this case because Linux kernel 
HMAT as blow:

          /*
          * Check for invalid and overflow values
          */
         if (entry == 0xffff || !entry)
                 return 0;
         else if (base > (UINT_MAX / (entry)))
                 return 0;

So 0xFFFF and 0 are the same.

>> we can cannibalize 0x0 as the unreachable value and an absent bandwidth/lat option
>> for not specified case.
>> It would be conflicting with matrix [1] values in spec, but CLI/QMP deals with
>> absolute values which are later processed into HMAT substructure.
>>
>> Markus,
>> Can we make opts_type_size() handle full range of uint64_t?
> 
> Maybe.
> 




^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2019-10-28  7:26 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-20 11:11 [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) Tao Xu
2019-10-20 11:11 ` [PATCH v13 01/12] util/cutils: Add qemu_strtotime_ps() Tao Xu
2019-10-23  1:08   ` Eduardo Habkost
2019-10-23  6:07     ` Tao Xu
2019-10-23  1:13   ` Eric Blake
2019-10-23  1:50     ` Tao Xu
2019-10-24  9:54   ` Daniel P. Berrangé
2019-10-24 13:20     ` Eduardo Habkost
2019-10-25  1:22       ` Tao Xu
2019-10-20 11:11 ` [PATCH v13 02/12] tests/cutils: Add test for qemu_strtotime_ps() Tao Xu
2019-10-20 11:11 ` [PATCH v13 03/12] qapi: Add builtin type time Tao Xu
2019-10-20 11:11 ` [PATCH v13 04/12] tests: Add test for QAPI " Tao Xu
2019-10-20 11:11 ` [PATCH v13 05/12] numa: Extend CLI to provide initiator information for numa nodes Tao Xu
2019-10-21 12:29   ` Igor Mammedov
2019-10-22  1:01     ` Tao Xu
2019-10-20 11:11 ` [PATCH v13 06/12] numa: Extend CLI to provide memory latency and bandwidth information Tao Xu
2019-10-22  7:08   ` Igor Mammedov
2019-10-22  8:22     ` Tao Xu
2019-10-23 15:28   ` Igor Mammedov
2019-10-25  6:33     ` Tao Xu
2019-10-25 13:27       ` Igor Mammedov
2019-10-25 19:44         ` Markus Armbruster
2019-10-25 20:51           ` Eduardo Habkost
2019-10-28  2:05             ` Tao Xu
2019-10-28  5:46               ` Markus Armbruster
2019-10-28  7:25           ` Tao Xu
2019-10-20 11:11 ` [PATCH v13 07/12] numa: Calculate hmat latency and bandwidth entry list Tao Xu
2019-10-20 11:11 ` [PATCH v13 08/12] numa: Extend CLI to provide memory side cache information Tao Xu
2019-10-20 11:11 ` [PATCH v13 09/12] hmat acpi: Build Memory Proximity Domain Attributes Structure(s) Tao Xu
2019-10-20 11:11 ` [PATCH v13 10/12] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s) Tao Xu
2019-10-20 11:11 ` [PATCH v13 11/12] hmat acpi: Build Memory Side Cache " Tao Xu
2019-10-20 11:11 ` [PATCH v13 12/12] tests/bios-tables-test: add test cases for ACPI HMAT Tao Xu
2019-10-20 11:43 ` [PATCH v13 00/12] Build ACPI Heterogeneous Memory Attribute Table (HMAT) no-reply
2019-10-20 12:13 ` no-reply
2019-10-20 12:57 ` no-reply
2019-10-20 13:19 ` no-reply
2019-10-22 11:22 ` Markus Armbruster
2019-10-23  1:46   ` Tao Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).